Regression Testing
Before any changes are committed to trunk or vivified to the live site, developers must verify the correctness of the system by running regression tests. These tests are a series of saved choices files, test suites, and gold-standard parsing profiles. The testing framework uses the customization system to create a grammar from the choices file, then uses [incr tsdb()] and the LKB to parse the test suite with the grammar and verify that the results match those in the gold-standard profile.
See below for information on how to run, update, add, or maintain regression tests, as well as a description of the directory structure of the regression test framework.
Running Regression Tests
Before you can run the regression tests, you must make sure the environment variable CUSTOMIZATIONROOT is set.
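As a minimal sketch of a pre-flight check, the following Python function confirms that CUSTOMIZATIONROOT is set before anything else runs. The function name customization_root is hypothetical and is not part of matrix.py; the sketch makes no claim about what value the variable should hold.

```python
import os


def customization_root(environ=None):
    """Return the value of CUSTOMIZATIONROOT, or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of matrix.py.
    """
    if environ is None:
        environ = os.environ
    root = environ.get("CUSTOMIZATIONROOT", "")
    if not root:
        raise RuntimeError(
            "CUSTOMIZATIONROOT is not set; set it before running the regression tests"
        )
    # Expand a leading "~" so that later path joins behave predictably.
    return os.path.expanduser(root)
```

Calling customization_root() with no arguments reads the real environment; passing a dictionary makes the check easy to try out without changing your shell.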
To run all tests, run the "regression-test" (short form: "r") command of matrix.py with no arguments:
python matrix.py regression-test
To run a single test, run the "regression-test" command with the test name as the argument:
python matrix.py regression-test infl-q-aux-verb
You may also run a single test from a Lisp buffer:
- From the Lisp buffer (of a running LKB), first perform the steps that only need to be done once per LKB session (substituting appropriate path names):
:pa lkb
(setf *customization-root* "~/matrix/gmcs/")
(load "~/matrix/lisp/matrix-regression-tests.lsp")
- Then to run a particular regression test:
(run-matrix-regression-test "lg-name")
This leaves you with the grammar loaded in the LKB so you can explore the results. Some further instructions on what to do next with [incr tsdb()] are printed to the Lisp buffer.
For each test, a result will be printed to STDOUT. A correct grammar will result in a "Success!" message, while a faulty one will result in a "DIFFS!" message. If you ran a single test, the items causing error messages will be printed to STDOUT; otherwise they will be suppressed. The results will also be written to a log file.
After running the tests, check the logs for differences, and either fix the regressions or update any progressions.
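When many tests are run, it can help to separate the "Success!" lines from the "DIFFS!" lines mechanically. The sketch below does that for captured output. It assumes only that each result line contains one of those two strings literally; the rest of the line format is not specified here, and the function names are hypothetical.

```python
import subprocess
import sys


def summarize_results(output):
    """Split captured output into lines reporting success and lines reporting diffs."""
    passed, failed = [], []
    for line in output.splitlines():
        if "DIFFS!" in line:
            failed.append(line.strip())
        elif "Success!" in line:
            passed.append(line.strip())
    return passed, failed


def run_all_tests():
    """Run 'python matrix.py regression-test' and summarize what it prints.

    Assumes the current directory is the one containing matrix.py.
    """
    completed = subprocess.run(
        [sys.executable, "matrix.py", "regression-test"],
        capture_output=True,
        text=True,
    )
    return summarize_results(completed.stdout)
```

A non-empty second list means at least one test needs attention, either as a regression to fix or as a progression to review.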
Checking Logs
The results of a run of the regression tests are saved in several log files. These files are:
gmcs/
  regression_tests/
    logs/
      regression-tests.date
      test-name.date
      tsdb.date
- "regression-tests.date" shows the "Success!"/"DIFFS!" messages for each test, and the items causing diffs.
- "test-name.date" shows the diffs from a particular test.
- "tsdb.date" shows the output of [incr tsdb()], which may be useful in debugging (if the grammar fails to load, for instance).
Note: The date is only granular to the day, so multiple runs in a single day are in the same file.
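Because the exact form of the date suffix is not spelled out here, a simple way to find the log from the latest run is to pick the most recently modified file with the right prefix. The following is a sketch under that assumption; latest_log is a hypothetical helper, and logs_dir is whatever path your logs/ directory has.

```python
import glob
import os


def latest_log(logs_dir, prefix):
    """Return the most recently modified file named '<prefix>.<suffix>' in logs_dir.

    Returns None if no such file exists. The suffix is not parsed, so no
    assumption is made about how the date is written.
    """
    pattern = glob.escape(os.path.join(logs_dir, prefix)) + ".*"
    candidates = glob.glob(pattern)
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)
```

For example, latest_log(logs_dir, "regression-tests") would return the newest summary log, and latest_log(logs_dir, "tsdb") the newest [incr tsdb()] output.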
Updating Regression Tests
Sometimes, changes to the customization system yield grammars that perform better than the gold standard. These changes will produce a "DIFFS!" result, but this is not necessarily a bad thing. After opening the current and gold profiles in [incr tsdb()] and verifying (by hand) that the current results are more desirable, you can update the gold standard to use the current results. This is done with the "regression-test-update" (short form: "ru") command of matrix.py, with the name of the test as an argument:
python matrix.py ru test-to-update
Be sure to run svn commit so that any new gold standards are committed.
Adding Regression Tests
- Develop choices files and corresponding txt-suites for regression tests, and store them in gmcs/regression_tests/scratch/
- Make sure that there's no iso-code line in the choices file.
- Run logon (M-x logon within emacs), and type the following commands into the lisp buffer (substituting the actual path to your matrix directory):
:pa lkb
(setf *customization-root* "~/matrix/gmcs/")
(load "~/matrix/lisp/matrix-regression-tests.lsp")
- Invoke create-matrix-regression-test() from within Lisp. create-matrix-regression-test() takes the names of two files (choices and txt-suite) and ends with an open dvi file showing a summary of the results (typing 'q' will make this window go away).
(create-matrix-regression-test "choices-file" "txt-suite")
- If you need to get rid of the files created by this process, you can invoke clean-up-regression-test():
(clean-up-regression-test "lg-name")
Examine the results in [incr tsdb()] to see if the grammar behaved as expected. If the results are suitable, this profile will become the initial gold standard for this new regression test. You can examine the results from the [incr tsdb()] podium as follows (assuming your database root is set properly, i.e., to ~/matrix/gmcs/regression_tests/home/):
- Options | Update | Database list
- select the new profile
- Browse | Results
- Use the "regression-test-add" command of matrix.py to add the new regression test to the system (putting all of the files in the right places). Be prepared to offer a descriptive comment for the test. Do not use any quotation marks in your comment string. Here "choices-file" and "txt-suite" should be relative paths (or just filenames) to those files, starting from gmcs/regression_tests/scratch/. If successful, the files will be added (but not committed) to SVN.
python matrix.py regression-test-add choices-file txt-suite
- When you're finished adding all your regression tests:
svn commit
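Two of the requirements in the steps above (no iso-code line in the choices file, no quotation marks in the comment) can be checked before invoking regression-test-add. The sketch below assumes the choices file consists of key=value lines, which is an assumption about the file format rather than something stated here; both function names are hypothetical.

```python
def choices_problems(choices_text):
    """Return a list of problems that should be fixed before adding the test."""
    problems = []
    for number, line in enumerate(choices_text.splitlines(), start=1):
        # Assumption: each line has the form key=value.
        key = line.split("=", 1)[0].strip()
        if key == "iso-code":
            problems.append("line %d: remove the iso-code line" % number)
    return problems


def comment_is_safe(comment):
    """Return True if the comment contains no single or double quotation marks."""
    return not any(mark in comment for mark in "\"'")
```

An empty list from choices_problems and a True result from comment_is_safe mean that these two particular requirements are met; they say nothing about whether the grammar behaves as expected.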
Maintaining Regression Tests
DO avoid editing regression-test-index directly or changing the contents of choices/, txt-suites/, skeletons/, or home/gold by hand. This code is probably fairly brittle with respect to files not being where they are expected.
Otherwise, there is no need to do anything here: we want to keep choices files in old versions so that we are routinely testing the up-rev code.
Directory structure:
regression_tests/
  add_regression_test.py
  call-customize
  run_regression_tests.sh
  regression-test-index
  regressiontestindex.py
  update-gold-standard.sh
  choices/
  txt-suites/
  skeletons/ [tsdb skeletons]
    Index.lisp
  home/ [tsdb home]
    gold/
    current/
  grammars/
  logs/
  scratch/
Notes
Everything in the home/current/, grammars/, and logs/ subdirectories is placed there and named by the scripts.
The language name in the choices file is used as the basis for the naming. We need a convention for these names.
There is a one-to-one mapping between choices files, txt-suites, and skeletons. We might end up with multiple txt-suites for the same "language", but we would still require separate choices files with the same choices except for the language name.
Scratch is the default place for putting choices files and txt-suites to play with and then eventually add to the system. The contents of scratch should be local only, not under svn.
After creating a regression test, be sure to close [incr tsdb()] before running regression tests. Ongoing processes in [incr tsdb()] can block actions needed during the regression tests.
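The one-to-one mapping between choices files, txt-suites, and skeletons noted above can be checked without editing anything by hand. The sketch below assumes that each test appears under the same name in choices/, txt-suites/, and skeletons/; that naming is an assumption, and unmatched_tests is a hypothetical helper that only reads directory listings.

```python
import os


def unmatched_tests(root):
    """Map each test name to the subdirectories of root in which it is missing.

    root is the regression_tests/ directory. Names present in all three
    subdirectories are omitted from the result.
    """
    subdirs = ("choices", "txt-suites", "skeletons")
    names = {d: set(os.listdir(os.path.join(root, d))) for d in subdirs}
    # Index.lisp belongs to the skeletons directory itself and is not a test.
    names["skeletons"].discard("Index.lisp")
    all_names = set().union(*names.values())
    result = {}
    for name in sorted(all_names):
        missing = [d for d in subdirs if name not in names[d]]
        if missing:
            result[name] = missing
    return result
```

An empty dictionary means every name found in one of the three directories is also present in the other two.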