Before any changes are committed to trunk or vivified to the live site, developers must verify the correctness of the system by running regression tests. These tests are a series of saved choices files, test suites, and gold-standard parsing profiles.
Regression Testing with Pydelphin
The newest framework for regression testing is built on pydelphin. It works on Linux and macOS, and only requires that you check out the Matrix code and install pydelphin and ACE. Pydelphin allows you to:
Run all or some of the regression tests listed in the regression-test-index file (located in gmcs/regression_tests/)
Note that you should refer to the official pydelphin docs for information on how to use pydelphin in general.
Also remember to add new tests to SVN manually. This might be for the better, as it encourages extra care.
Last but not least, note that, while pydelphin allows you to inspect the MRS and compare two profiles, it has no GUI and will not allow you to click around the MRS, highlight variables, click on results to see trees and feature structures etc. Therefore, it is recommended that you still use [incr tsdb()] to inspect the gold profiles before adding new tests.
For information on the original regression testing framework, see the section "Classical (LOGON-dependent) system for regression testing" below.
Running regression tests
NB: Below, it is assumed that your matrix/trunk and any matrix/branches are located right in your home directory. Please adjust according to your actual directory structure.
After you have checked out the matrix code and have descended into the matrix/trunk directory (or to your branch), create and activate a virtual environment (not necessary but strongly recommended):
[~]$ cd matrix/trunk
[~/matrix/trunk]$ virtualenv -p python3 py3env
[~/matrix/trunk]$ source py3env/bin/activate
Now install pydelphin v. 1.0:
(py3env) [~/matrix/trunk]$ pip install pydelphin
Now, assuming ACE is on your PATH, you can run all the regression tests listed in gmcs/regression_tests/regression-test-index:
(py3env) [~/matrix/trunk]$ ./rtest.py
Or, you can run a specific test:
(py3env) [~/matrix/trunk]$ ./rtest.py Cree
Or, a number of tests starting with some string:
(py3env) [~/matrix/trunk]$ ./rtest.py adj*
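To make the wildcard behaviour concrete, here is a minimal sketch of shell-style pattern matching over a list of test names. It is only an illustration: this page does not show how rtest.py matches patterns internally, so the use of Python's fnmatch module is an assumption, and select_tests and the test names below are hypothetical.

```python
from fnmatch import fnmatchcase


def select_tests(pattern, test_names):
    """Return the test names matching a shell-style pattern such as 'adj*'."""
    return [name for name in test_names if fnmatchcase(name, pattern)]


# Hypothetical test names, for illustration only.
names = ["adj-test-a", "adj-test-b", "neg-test", "Cree"]

print(select_tests("adj*", names))  # every name starting with "adj"
print(select_tests("Cree", names))  # a pattern without wildcards matches one name
```

If files matching the pattern exist in the working directory, the shell expands an unquoted pattern before the script sees it. Quoting the pattern on the command line avoids this.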
Creating a new regression test
To add a new regression test:
- Come up with a meaningful name for the test, e.g. wh3-svo-sg-oblig-det for "wh-questions library, test number 3, a pseudolanguage which has SVO word order, obligatory single fronting, and determiners".
- You probably have a grammar corresponding to this new test which you created with the Grammar Matrix system. Take note of the location of the choices file and the test_sentences file, or put them somewhere convenient, such as gmcs/regression_tests/scratch/.
- Run rtest.py with the --add option:
(py3env) [~/matrix/trunk]$ ./rtest.py --add gmcs/regression_tests/scratch/choices gmcs/regression_tests/scratch/test_sentences myNewTestName
Removing a regression test
If something is not right, you can always remove a fully or partially created test by running:
(py3env) [~/matrix/trunk]$ ./rtest.py --remove myOldTestName
Updating a regression test
Sometimes gold profiles contain mistakes, or the current profile is otherwise preferable to the gold one. In this case, the gold profile should simply be replaced by the current one. You can do this by running:
(py3env) [~/matrix/trunk]$ ./rtest.py --update TestName
Debugging regression diffs
If some of the tests fail, look in the corresponding log file. For example, you might see there:
Current profile: MyBranch/gmcs/regression_tests/home/current/testname
Gold profile: MyBranch/gmcs/regression_tests/home/gold/testname
1 <0,1,0>
3 <0,1,0>
5 <0,1,0>
7 <0,1,0>
10 <0,1,0>
11 <1,0,1>
15 <0,1,0>
16 <0,1,0>
Result: FAIL
The log above tells you that the item with ID 11 is parsed differently in the current and gold profiles. You can tell because it has 0 in the middle column, which counts the results shared by both profiles. The other two columns count the results unique to the current profile and to the gold profile, so for a test to pass, those columns must be zero for every item.
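For long logs it can help to extract the differing items automatically. The following is a minimal sketch, not part of rtest.py: it assumes each item line has the form `ID <a,b,c>` as in the example log, and that the first and third numbers are the counts unique to the current and gold profiles. The function name differing_items is hypothetical.

```python
import re

# One item line: an item ID followed by three counts in angle brackets.
ITEM_LINE = re.compile(r"^\s*(\d+)\s*<(\d+),(\d+),(\d+)>\s*$")


def differing_items(log_text):
    """Return (item_id, current_only, shared, gold_only) for items that differ."""
    diffs = []
    for line in log_text.splitlines():
        match = ITEM_LINE.match(line)
        if not match:
            continue  # skip the profile paths and the "Result:" line
        item_id, current_only, shared, gold_only = map(int, match.groups())
        # Assumed column order: unique to current, shared, unique to gold.
        if current_only or gold_only:
            diffs.append((item_id, current_only, shared, gold_only))
    return diffs


sample = """\
Current profile: MyBranch/gmcs/regression_tests/home/current/testname
Gold profile: MyBranch/gmcs/regression_tests/home/gold/testname
1 <0,1,0>
10 <0,1,0>
11 <1,0,1>
15 <0,1,0>
Result: FAIL
"""

print(differing_items(sample))  # [(11, 1, 0, 1)]
```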
To inspect, you can either use [incr tsdb()] or pydelphin, as below:
(py3env) [~/matrix/trunk]$ delphin select 'mrs where i-id = 11' gmcs/regression_tests/home/gold/testname/ | delphin convert --indent
(py3env) [~/matrix/trunk]$ delphin select 'mrs where i-id = 11' gmcs/regression_tests/home/current/testname/ | delphin convert --indent
That will nicely display the two MRSs in your terminal. You may need to install the delphin.highlight plugin:
pip install delphin.highlight
Classical (LOGON-dependent) system for regression testing
This older testing framework uses the customization system to create a grammar from the choices file, then uses [incr tsdb()] and ACE to parse the test suite with the grammar and verify the results are the same as those in the gold-standard profile.
Regression testing works with ACE (AceTop) for speed and handling ICONS. As of September 18th, 2014, it takes about 8 minutes with 273 choices files. Regression testing with ACE requires about 2 GB free disk space on your machine. You can download the latest version of ACE here.
Setting Up a Regression Testing Environment
Regression tests require a system with Ubuntu 12.04+, emacs23+, LOGON, ACE, and other requirements. For a step by step guide to setting up a virtual machine that meets these requirements, see MatrixRegressionTestingSetup.
Note that if you try to run most or all of the tests in one command, a current bug causes all of the tests to produce errors. To avoid this bug, raise the per-user limit on open files (nofile) to above 10,000. For instance, in Ubuntu, the limit can be changed by editing the file /etc/security/limits.conf to include:
* soft nofile 40000
* hard nofile 40000
Then, you can change it at the terminal:
ulimit -n 40000
Log out and log back in for the change to take effect.
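To confirm that the new limit is in effect before starting a long run, you can query it from Python. This is a minimal sketch using the standard library's resource module (available on Unix-like systems). The helper names are hypothetical, and the 10,000 threshold is taken from the note above.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_is_sufficient(soft, minimum=10000):
    """True if the soft limit is unlimited or above the given minimum."""
    return soft == resource.RLIM_INFINITY or soft > minimum


soft, hard = open_file_limits()
print(f"soft={soft} hard={hard} sufficient={limit_is_sufficient(soft)}")
```

The soft limit is the one that applies to running processes, which is why the check looks at it rather than at the hard limit.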
Running Regression Tests
To run all tests, run the "regression-test" (short form: "r") command of matrix.py with no arguments:
python matrix.py regression-test
To run a single test, run the "regression-test" command with the test name as the argument:
python matrix.py regression-test infl-q-aux-verb
You can also run a suite of tests using the wildcard character. The following runs all of the tests beginning with "neg":
python matrix.py r neg*
For each test, a result will be printed to STDOUT. A correct grammar will result in a "Success!" message, while a faulty one will result in a "DIFFS!" message. If you ran a single test, the items causing error messages will be printed to STDOUT; otherwise they will be suppressed. The results will also be written to a log file.
The results of a run of the regression tests are saved in several log files. These files are:
gmcs/
  regression_tests/
    logs/
      regression-tests.date
      test-name.date
      tsdb.date
- "regression-tests.date" shows the "Success!"/"DIFFS!" messages for each test, and the items causing diffs.
- "test-name.date" shows the diffs from a particular test.
- "tsdb.date" shows the output of [incr tsdb()], which may be useful in debugging (if the grammar fails to load, for instance).
Note: The date is only granular to the day, so multiple runs in a single day are in the same file.
Updating Regression Tests
Sometimes, changes to the customization system yield grammars that perform better than the gold standard. Such changes produce a "DIFFS!" result, but this is not necessarily a bad thing. After opening the current and gold profiles in [incr tsdb()] and verifying by hand that the current results are more desirable, you can update the gold standard to use the current results. This is done with the "regression-test-update" (short form: "ru") command of matrix.py, with the name of the test as its argument. The command is designed to be used after discovering a difference. It relies on the new gold standard files being stored in the testing environment (regression_tests/home/current), so make sure to run the test first:
python matrix.py r test-to-update
python matrix.py ru test-to-update
This should add the new gold standard files to svn, so be sure to commit them!
Adding Regression Tests
Whenever a new feature is added to the customization system, regression tests should be created to test this feature. Regression tests consist of a choices file and a test suite. Choices files can be any set of choices in the choices file format, but they must validate.
Ensure that the Grammar Matrix produces the grammar you expect given the choices file before adding the test. Here is a general workflow:
- Create a txtsuite file with the sentences for the test suite
- Create the corresponding choices file
- Use the current customization system version to create a grammar from the choices file
- Start up the lkb & [incr tsdb()]
- Load the grammar into the lkb
- In [incr tsdb()] use "import test items" to create a test suite profile from the txtsuite file
- In [incr tsdb()] use "process | all items" to parse the test items with the grammar
- In [incr tsdb()] use "browse | results" to manually explore both grammaticality and MRSs for each test item
- Adjust the customization system until the previous step yields the desired result
- For each change to the customization system, ensure that all other regression tests continue to pass
Tests take their name from the choices file's language name, so make sure to name each language appropriately. Test/language names generally follow the template LIBRARY-PHENOMENON-INSTANCE, e.g. "neg-head-comp-vpauxbefore-compbefore". However, this is not an enforced standard.
Test suites are text files with one test item per line, negative (ungrammatical) items marked with an asterisk (*), such as:
- he likes her
- the boy plays
- the dog barks
- *he likes she
- *the boy play
- *the dogs barks
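The test suite format described above can be read with a few lines of Python. This is a minimal sketch under two assumptions: the file itself contains only the sentences (the list bullets shown above are taken to be page formatting), and a leading asterisk is the only marker of ungrammaticality. The function name parse_test_suite is hypothetical and not part of matrix.py.

```python
def parse_test_suite(text):
    """Split test-suite text into (sentence, is_grammatical) pairs."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("*"):
            # A leading asterisk marks a negative (ungrammatical) item.
            items.append((line[1:].strip(), False))
        else:
            items.append((line, True))
    return items


sample = "he likes her\nthe dog barks\n*he likes she\n"
for sentence, grammatical in parse_test_suite(sample):
    print("good" if grammatical else "bad ", sentence)
```

A quick count of positive and negative items obtained this way is a cheap sanity check before adding the test.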
New regression tests are created using matrix.py. First, put your choices file and test suite file into mybranch/gmcs/regression_tests/scratch/, and then do the following:
python matrix.py regression-test-add CHOICES_FILENAME TXT-SUITE_FILENAME
CHOICES_FILENAME and TXT-SUITE_FILENAME are just the filenames (not the full paths) of the files you added into the scratch/ directory.
This is what regression-test-add does under the bonnet (paths relative to gmcs/regression_tests):
- Customizes a grammar from the choices file
- Compiles the grammar with ACE
- Copies the choices file to choices/ and test suite to txt-suites/
- Adds the regression test to regression-test-index
- Creates an unparsed profile
- Parses the profile with ACE and art
- Uses the parsed profile as gold (in home/gold/lang-name/), and the item and relations files for a skeleton (in skeletons/lang-name)
- Updates the skeletons index (skeletons/Index.lisp)
Make sure to test your regression test after you add it:
python matrix.py regression-test NAME_OF_TEST_LANGUAGE
When you're finished adding all your regression tests:
Maintaining regression tests
DO avoid editing regression-test-index directly or changing the contents of choices/, txt-suites/, skeletons/, or home/gold/ by hand. This code is probably fairly brittle with respect to files not being where they are expected.
Otherwise, no need to do anything here: We want choices files in old versions so that we are routinely testing the up-rev code.
regression_tests/
  add_regression_test.py
  call-customize
  run_regression_tests.sh
  regression-test-Index
  regressiontestindex.py
  update-gold-standard.sh
  choices/
  txt-suites/
  skeletons/ [tsdb skeletons]
    Index.lisp
  home/ [tsdb home]
    gold/
    current/
  grammars/
  logs/
  scratch/
A typical reason for all tests failing is using a wrong customization root.
Everything in the home/current/, grammars/, and logs/ subdirectories is placed there and named by the scripts.
The language name in the choices file is used as the basis for naming. We need a convention for these names.
There is a one-to-one mapping between choices files, txt-suites, and skeletons. We might end up with multiple txt-suites for the same "language", but we would still require separate choices files with the same choices except for the language name.
Scratch is the default place for putting choices files and txt-suites to play with and then eventually add to the system. The contents of scratch should be local only, not under svn.
Comparison of the number of readings is still made by a function called compare-in-detail() that [incr tsdb()] provides. Comparison of MRSs is made by an external program, mrs-compare. There is no problem in using this for now, but in the future it may cause problems with ambiguous sentences that have multiple MRSs. An alternative would be to use PyDelphin instead of mrs-compare.
Sometimes a new regression test name is not added to regression-test-index properly; specifically, a newline is sometimes missing after the last added test. This results in an error, and there is currently no helpful error message to go with it. The error messages that show up include Python's "too many values to unpack" and "test not found". If you see either, check that the new test name has been added to regression-test-index properly.
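The missing-newline problem can be detected, and repaired, without opening the index by hand. The following is a minimal sketch, not part of the Matrix scripts: the helper names are hypothetical, and it makes no assumption about the format of the index lines beyond the file being plain text. The demonstration runs on a temporary file with placeholder content rather than on the real regression-test-index.

```python
import os
import tempfile
from pathlib import Path


def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline."""
    data = Path(path).read_bytes()
    return not data or data.endswith(b"\n")


def ensure_trailing_newline(path):
    """Append a newline if one is missing; return True if the file was changed."""
    if ends_with_newline(path):
        return False
    with open(path, "ab") as handle:
        handle.write(b"\n")
    return True


# Demonstration on a temporary file with placeholder lines.
with tempfile.TemporaryDirectory() as tmp:
    index = os.path.join(tmp, "regression-test-index")
    Path(index).write_bytes(b"placeholder-line-1\nplaceholder-line-2")
    print(ends_with_newline(index))        # False
    print(ensure_trailing_newline(index))  # True
    print(ends_with_newline(index))        # True
```

Since the page advises against editing regression-test-index by hand, running only the read-only check first and reviewing the result before repairing is the more cautious route.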