Before any changes are committed to trunk or vivified to the live site, developers must verify the correctness of the system by running regression tests. These tests are a series of saved choices files, test suites, and gold-standard parsing profiles. The testing framework uses the customization system to create a grammar from the choices file, then uses [incr tsdb()] and ACE to parse the test suite with the grammar and verify the results are the same as those in the gold-standard profile.
Regression testing uses ACE (AceTop) for speed and because it handles ICONS. As of September 18th, 2014, a full run takes about 8 minutes with 273 choices files. Regression testing with ACE requires about 2 GB of free disk space on your machine. You can download the latest version of ACE here.
Setting Up a Regression Testing Environment
Regression tests require a system with Ubuntu 12.04+, emacs23+, LOGON, ACE, and other requirements. For a step-by-step guide to setting up a virtual machine that meets these requirements, see MatrixRegressionTestingSetup.
Note: if you try to run most or all of the tests in one command, a current bug causes every test to produce errors. To avoid this bug, raise the open-file limit (nofile) to above 10,000. For instance, in Ubuntu, the limit can be changed by editing this file: /etc/security/limits.conf
* soft nofile 40000
* hard nofile 40000
Then, you can change it at the terminal:
ulimit -n 40000
Log out and log back in for this change to take effect.
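After logging back in, it can be useful to confirm that the new limit is actually in effect for your session. The following is a minimal sketch using Python's standard resource module (Unix only). The function names and the 10,000 threshold parameter are illustrative and are not part of matrix.py.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_is_sufficient(soft, minimum=10000):
    """Return True if the soft limit is above the minimum.

    An unlimited setting (RLIM_INFINITY) also counts as sufficient.
    """
    return soft == resource.RLIM_INFINITY or soft > minimum


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft=%s hard=%s" % (soft, hard))
    if not limit_is_sufficient(soft):
        print("Open-file limit is too low for a full regression run.")
```

The soft limit is the one that applies to running processes, which is why the check looks at it rather than at the hard limit.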
Running Regression Tests
To run all tests, run the "regression-test" (short form: "r") command of matrix.py with no arguments:
python matrix.py regression-test
To run a single test, run the "regression-test" command with the test name as the argument:
python matrix.py regression-test infl-q-aux-verb
You can also run a suite of tests using the wildcard character. The following runs all of the tests beginning with "neg":
python matrix.py r neg*
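As an illustration of which names such a pattern selects, here is a small sketch using shell-style matching from Python's standard fnmatch module. It assumes the wildcard behaves like a shell glob; how matrix.py actually matches names is not confirmed here, and select_tests and the name "case-marking-example" are hypothetical.

```python
from fnmatch import fnmatchcase


def select_tests(pattern, test_names):
    """Return the test names matching a shell-style wildcard pattern."""
    return [name for name in test_names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    names = [
        "neg-head-comp-vpauxbefore-compbefore",
        "infl-q-aux-verb",
        "case-marking-example",  # hypothetical test name
    ]
    for name in select_tests("neg*", names):
        print(name)
```

If files in the current directory happen to start with "neg", the shell may expand an unquoted neg* before matrix.py sees it. Quoting the pattern ('neg*') avoids that.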
For each test, a result will be printed to STDOUT. A correct grammar will result in a "Success!" message, while a faulty one will result in a "DIFFS!" message. If you ran a single test, the items causing error messages will be printed to STDOUT; otherwise they will be suppressed. The results will also be written to a log file.
The results of a run of the regression tests are saved in several log files. These files are:
gmcs/
  regression_tests/
    logs/
      regression-tests.date
      test-name.date
      tsdb.date
- "regression-tests.date" shows the "Success!"/"DIFFS!" messages for each test, and the items causing diffs.
- "test-name.date" shows the diffs from a particular test.
- "tsdb.date" shows the output of [incr tsdb()], which may be useful in debugging (if the grammar fails to load, for instance).
Note: The date is only granular to the day, so multiple runs in a single day are in the same file.
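Because several runs on one day share a log file, a quick tally of the outcome messages can help when skimming it. The sketch below only counts lines containing the "Success!" and "DIFFS!" markers described above. The exact line layout of regression-tests.date is not documented here, so the sample lines are hypothetical and the function is not part of the testing framework.

```python
def summarize_log(lines):
    """Count log lines containing each outcome marker."""
    counts = {"Success!": 0, "DIFFS!": 0}
    for line in lines:
        for marker in counts:
            if marker in line:
                counts[marker] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log lines; the real layout may differ.
    sample = [
        "infl-q-aux-verb ... Success!",
        "neg-head-comp-vpauxbefore-compbefore ... DIFFS!",
    ]
    print(summarize_log(sample))
```

To use it on a real log, pass an open file object, for example summarize_log(open(path)) with path pointing at a file in gmcs/regression_tests/logs/.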
Updating Regression Tests
Sometimes, changes to the customization system yield grammars that perform better than the gold standard. These changes will produce a "DIFFS!" result, but this is not necessarily a bad thing. After opening the current and gold profiles in [incr tsdb()] and verifying (by hand) that the current results are more desirable, you can update the gold standard to use the current results. This is done with the "regression-test-update" (short form: "ru") command of matrix.py, with the name of the test as an argument. This command is designed to be used after discovering a difference. It relies on the new gold standard file being stored in the testing environment (regression_tests/home/current), so make sure to run the test first:
python matrix.py r test-to-update
python matrix.py ru test-to-update
This should add the new gold standard files to svn, so be sure to commit them!
Adding Regression Tests
Whenever a new feature is added to the customization system, regression tests should be created to test this feature. Regression tests consist of a choices file and a test suite. Choices files can be any set of choices in the choices file format, but they must validate.
Ensure that the Grammar Matrix produces the grammar you expect given the choices file before adding the test. Here is a general workflow:
- Create a txtsuite file with the sentences for the test suite
- Create the corresponding choices file
- Use the current customization system version to create a grammar from the choices file
- Start up the lkb & [incr tsdb()]
- Load the grammar into the lkb
- In [incr tsdb()] use "import test items" to create a test suite profile from the txtsuite file
- In [incr tsdb()] use "process | all items" to parse the test items with the grammar
- In [incr tsdb()] use "browse | results" to manually explore both grammaticality and MRSs for each test item
- Adjust the customization system until the previous step yields the desired result
- For each change to the customization system, ensure that all other regression tests continue to pass
Tests take their name from the choices file's language name, so make sure to name each language appropriately. Test/language names generally follow this template: LIBRARY-PHENOMENON-INSTANCE; e.g. "neg-head-comp-vpauxbefore-compbefore". However, this is not an enforced standard.
Test suites are text files with one test item per line, negative (ungrammatical) items marked with an asterisk (*), such as:
he likes her
the boy plays
the dog barks
*he likes she
*the boy play
*the dogs barks
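To make the format concrete, here is a minimal sketch of reading such a file into (sentence, grammatical) pairs. It assumes only what is stated above: one item per line, with a leading asterisk marking an ungrammatical item. The function name parse_txt_suite is illustrative and is not the parser used by the regression testing scripts.

```python
def parse_txt_suite(text):
    """Parse test-suite text into a list of (sentence, is_grammatical) pairs."""
    items = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        grammatical = not line.startswith("*")
        sentence = line.lstrip("*").strip()
        items.append((sentence, grammatical))
    return items


if __name__ == "__main__":
    suite = "he likes her\nthe dog barks\n*he likes she\n"
    for sentence, grammatical in parse_txt_suite(suite):
        print("%s\t%s" % ("ok" if grammatical else "bad", sentence))
```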
New regression tests are created using matrix.py. First, put your choices file and test suite file into mybranch/gmcs/regression_tests/scratch/, and then do the following:
python matrix.py regression-test-add CHOICES_FILENAME TXT-SUITE_FILENAME
CHOICES_FILENAME and TXT-SUITE_FILENAME are just the filenames (not the full paths) of the files you added into the scratch/ directory.
This is what regression-test-add does under the bonnet (paths relative to gmcs/regression_tests):
- Customizes a grammar from the choices file
- Compiles the grammar with ACE
- Copies the choices file to choices/ and test suite to txt-suites/
- Adds the regression test to regression-test-index
- Creates an unparsed profile
- Parses the profile with ACE and art
- Uses the parsed profile as gold (in home/gold/lang-name/), and the item and relations files for a skeleton (in skeletons/lang-name)
- Updates the skeletons index (skeletons/Index.lisp)
Make sure to test your regression test after you add it:
python matrix.py regression-test NAME_OF_TEST_LANGUAGE
When you're finished adding all your regression tests:
Maintaining Regression Tests
Avoid editing regression-test-index directly or changing the contents of choices/, txt-suites/, skeletons/, or home/gold by hand. This code is probably fairly brittle with respect to files not being where they are expected.
Otherwise, no need to do anything here: We want choices files in old versions so that we are routinely testing the up-rev code.
regression_tests/
  add_regression_test.py
  call-customize
  run_regression_tests.sh
  regression-test-Index
  regressiontestindex.py
  update-gold-standard.sh
  choices/
  txt-suites/
  skeletons/ [tsdb skeletons]
    Index.lisp
  home/ [tsdb home]
    gold/
    current/
  grammars/
  logs/
  scratch/
A typical reason for all tests failing is using a wrong customization root.
Everything in the home/current/, grammars/, and logs/ subdirectories is placed there and named by the scripts.
The language name in the choices file is used as the basis for naming. We need a convention for these names.
There is a one-to-one mapping between choices files, txt-suites, and skeletons. We might end up with multiple txt-suites for the same "language", but we would still require separate choices files with the same choices except for the language name.
Scratch is the default place for putting choices files and txt-suites to play with and then eventually add to the system. The contents of scratch should be local only, not under svn.
Comparison of the number of readings is still made by a function called compare-in-detail() that [incr tsdb()] provides. Comparison of MRSs is made by an external program, mrs-compare. This works for now, but in the future it may cause problems with ambiguous sentences that have multiple MRSs. An alternative would be to use PyDelphin instead of mrs-compare.
Sometimes a new regression test name is not added to regression-test-index properly. Specifically, a newline is sometimes missing after the last added test. This results in an error, and there is currently no helpful error message to go with it. One of the error messages that can appear is Python's "too many values to unpack"; another is "test not found". Check that the new test name has been added properly to regression-test-index.
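A quick way to rule out the missing-newline case is to check the last byte of the index file. The sketch below is a hedged illustration, not part of the regression testing scripts: the function names are hypothetical, and the path shown in the usage section is an assumption based on the directory layout above.

```python
def missing_final_newline(data):
    """Return True if non-empty file contents do not end with a newline."""
    return bool(data) and not data.endswith(b"\n")


def index_needs_fix(path):
    """Read the file at path and report whether its final newline is missing."""
    with open(path, "rb") as handle:
        return missing_final_newline(handle.read())


if __name__ == "__main__":
    # Assumed location, relative to the gmcs/ directory.
    index_path = "regression_tests/regression-test-index"
    if index_needs_fix(index_path):
        print("Missing newline at the end of %s" % index_path)
```

This only reports the problem. Since the index should not be edited casually, it leaves the repair to you.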