Regression Testing

Before any changes are committed to trunk or vivified to the live site, developers must verify the correctness of the system by running regression tests. These tests are a series of saved choices files, test suites, and gold-standard parsing profiles.

Regression Testing with Pydelphin

The most recent framework for regression testing is based on pydelphin. It works on Linux and Mac OS X, and only requires that you check out the Matrix code and install pydelphin and ACE. With it you can run, add, remove, and update regression tests, as described in the sections below.

For general information on how to use pydelphin, refer to the official pydelphin docs.

Also note that new tests must be added to SVN manually, so do not forget this step. This might be for the better, as it encourages extra care.

Finally, note that, while pydelphin allows you to inspect MRSs and compare two profiles, it has no GUI: you cannot click around an MRS, highlight variables, or click on results to see trees and feature structures. It is therefore recommended that you still use [incr tsdb()] to inspect the gold profiles before adding new tests.

For information on the original regression testing framework, see "Classical (LOGON-dependent) system for regression testing" below.

Running regression tests

NB: Below, it is assumed that your matrix/trunk and any matrix/branches are located right in your home directory. Please adjust according to your actual directory structure.

After you have checked out the matrix code and have descended into the matrix/trunk directory (or to your branch), create and activate a virtual environment (not necessary but strongly recommended):

[~]$ cd matrix/trunk
[~/matrix/trunk]$ virtualenv -p python3 py3env
[~/matrix/trunk]$ source py3env/bin/activate

Now install pydelphin v. 1.0:

(py3env) [~/matrix/trunk]$ pip install pydelphin

Now, assuming ACE is on your PATH, you can run all the regression tests listed in gmcs/regression_tests/regression-test-index:

(py3env) [~/matrix/trunk] ./

Or, you can run a specific test:

(py3env) [~/matrix/trunk] ./ Cree

Or, a number of tests starting with some string:

(py3env) [~/matrix/trunk] ./ adj*

Creating a new regression test

To add a new regression test:

  1. Come up with a meaningful name for the test, e.g. wh3-svo-sg-oblig-det for "wh-questions library, test number 3, a pseudolanguage which has SVO word order, obligatory single fronting, and determiners".
  2. You probably have a grammar corresponding to this new test which you created with the Grammar Matrix system. Take a note of the location of the choices file and the test_sentences file, or put them somewhere convenient, such as gmcs/regression_tests/scratch/.
  3. Run with the --add command:

(py3env) [~/matrix/trunk] ./ --add gmcs/regression_tests/scratch/choices gmcs/regression_tests/scratch/test_sentences myNewTestName

Removing a regression test

If something is not right, you can always remove a fully or partially created test by running:

(py3env) [~/matrix/trunk] ./ --remove myOldTestName

Updating a regression test

Sometimes a gold profile contains mistakes or, at any rate, the current profile is preferable to the gold one. In this case, the gold profile should simply be replaced by the current one. You can do this by running:

(py3env) [~/matrix/trunk] ./ --update TestName

Debugging regression diffs

If some of the tests fail, look in the corresponding log file. For example, you might see there:

  Current profile: MyBranch/gmcs/regression_tests/home/current/testname
  Gold profile: MyBranch/gmcs/regression_tests/home/gold/testname
  1                                        <0,1,0>
  3                                        <0,1,0>
  5                                        <0,1,0>
  7                                        <0,1,0>
  10                                       <0,1,0>
  11                                       <1,0,1>
  15                                       <0,1,0>
  16                                       <0,1,0>
Result: FAIL

The above log tells you that the item whose ID is 11 is parsed differently in the current and the gold profiles. You can tell because it has 0 results in the middle column, which counts the results shared by both profiles. The other two columns count the results unique to the current profile and to the gold profile, so for a test to pass, both of those columns must be zero for every item.
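
For long logs, the differing items can be picked out automatically. The following is a minimal sketch, assuming the log lines have exactly the shape shown in the excerpt above (an item ID followed by a <n,n,n> triple). The function name differing_items is made up for illustration and is not part of pydelphin or the Matrix scripts.

```python
import re

# Matches lines such as "  11      <1,0,1>" from the log excerpt above.
LINE_RE = re.compile(r"^\s*(\d+)\s+<(\d+),(\d+),(\d+)>\s*$")


def differing_items(log_text):
    """Return the IDs of items whose first or last count is non-zero."""
    failed = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines such as "Current profile: ..." or "Result: FAIL"
        item_id, first, _shared, last = (int(group) for group in match.groups())
        # The outer columns count results unique to one of the two profiles.
        if first != 0 or last != 0:
            failed.append(item_id)
    return failed


sample = """\
  10                                       <0,1,0>
  11                                       <1,0,1>
  15                                       <0,1,0>
Result: FAIL
"""
print(differing_items(sample))  # prints [11]
```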

To inspect, you can either use [incr tsdb()] or pydelphin, as below:

(py3env) [~/matrix/trunk] delphin select 'mrs where i-id = 11' gmcs/regression_tests/home/gold/testname/ | delphin convert --indent
(py3env) [~/matrix/trunk] delphin select 'mrs where i-id = 11' gmcs/regression_tests/home/current/testname/ | delphin convert --indent

That will display the two MRSs in your terminal. You may need to install the delphin highlight plugin:

pip install delphin.highlight
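
When the two MRSs are long, a plain textual diff can help locate where they diverge. The sketch below assumes you have saved the output of the two commands above as text; the sample strings are invented placeholders, not real MRS output. A textual diff is cruder than a real MRS comparison (it is sensitive to variable naming and ordering), so treat it only as a first look.

```python
import difflib


def diff_mrs(gold_text, current_text):
    """Return unified-diff lines between two indented MRS dumps."""
    return list(
        difflib.unified_diff(
            gold_text.splitlines(),
            current_text.splitlines(),
            fromfile="gold",
            tofile="current",
            lineterm="",
        )
    )


# Placeholder strings standing in for the saved command output.
gold = "[ TOP: h0\n  RELS: < [ _dog_n_1 LBL: h1 ] > ]"
current = "[ TOP: h0\n  RELS: < [ _cat_n_1 LBL: h1 ] > ]"

for line in diff_mrs(gold, current):
    print(line)
```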

Classical (LOGON-dependent) system for regression testing

This older testing framework uses the customization system to create a grammar from the choices file, then uses [incr tsdb()] and ACE to parse the test suite with the grammar and verify the results are the same as those in the gold-standard profile.

See below for information on how to run, update, add, or maintain regression tests, as well as a description of the directory structure of the regression test framework.

Regression testing works with ACE (AceTop) for speed and for handling ICONS. As of September 18th, 2014, a full run over 273 choices files takes about 8 minutes. Regression testing with ACE requires about 2 GB of free disk space on your machine. You can download the latest version of ACE here.

Setting Up a Regression Testing Environment

Regression tests require a system with Ubuntu 12.04+, emacs23+, LOGON, ACE, and other requirements. For a step by step guide to setting up a virtual machine that meets these requirements, see MatrixRegressionTestingSetup.

Note that if you try to run most or all of the tests in one command, a current bug causes all of the tests to produce errors. To avoid this bug, raise the per-user limit on open files (nofile) above 10,000. For instance, in Ubuntu, the limit can be changed by adding the following lines to /etc/security/limits.conf:

* soft nofile 40000
* hard nofile 40000

Then, you can change it at the terminal:

ulimit -n 40000

Log out and log back in for this change to take effect.
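
To confirm that the new limit is in effect, you can query it from Python. This is a minimal sketch using the standard resource module (Unix only); the helper name nofile_limit_ok and the default threshold of 10,000 simply mirror the recommendation above.

```python
import resource


def nofile_limit_ok(soft_limit, minimum=10000):
    """Return True if the soft open-file limit is above the minimum."""
    # RLIM_INFINITY means "no limit", which always satisfies the check.
    return soft_limit == resource.RLIM_INFINITY or soft_limit > minimum


# getrlimit returns the (soft, hard) limits for the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft:", soft, "hard:", hard, "ok:", nofile_limit_ok(soft))
```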

Running Regression Tests

Before you can run the regression tests, you must make sure the environment variables CUSTOMIZATIONROOT and ACEROOT are set.
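
A quick way to check this before a long run is sketched below. It only verifies that the two variables are set and non-empty; it does not check that they point to the right locations. The helper name missing_vars is made up for illustration.

```python
import os

REQUIRED_VARS = ("CUSTOMIZATIONROOT", "ACEROOT")


def missing_vars(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_vars()
if missing:
    print("Please set:", ", ".join(missing))
else:
    print("Environment variables are set.")
```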

To run all tests, run the "regression-test" (short form: "r") command with no arguments:

python regression-test

To run a single test, run the "regression-test" command with the test name as the argument:

python regression-test infl-q-aux-verb

You can also run a suite of tests using the wildcard character. The following runs all of the tests beginning with "neg":

python r neg*
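
To preview which tests a pattern would select, the sketch below applies shell-style wildcard matching to a list of test names. This is an assumption made for illustration: how the script itself resolves wildcards is not described here. The first two names appear elsewhere on this page; "neg-example" is a hypothetical name.

```python
from fnmatch import fnmatchcase


def select_tests(names, pattern):
    """Return the names matching a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


names = [
    "infl-q-aux-verb",
    "neg-head-comp-vpauxbefore-compbefore",
    "neg-example",  # hypothetical test name
]
print(select_tests(names, "neg*"))
# prints ['neg-head-comp-vpauxbefore-compbefore', 'neg-example']
```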

For each test, a result will be printed to STDOUT. A correct grammar will result in a "Success!" message, while a faulty one will result in a "DIFFS!" message. If you ran a single test, the items causing error messages will be printed to STDOUT; otherwise they will be suppressed. The results will also be written to a log file.

After running the tests, check the logs for differences, and either fix the regressions or update any progressions.

Checking Logs

The results of a run of the regression tests are saved in several log files. These files are:


Note: The date is only granular to the day, so multiple runs in a single day are in the same file.

Updating Regression Tests

Sometimes, changes to the customization system yield grammars that perform better than the gold standard. These changes will produce a "DIFFS!" result, but that is not necessarily a bad thing. After opening the current and gold profiles in [incr tsdb()] and verifying (by hand) that the current results are more desirable, you can update the gold standard to use the current results. This is done with the "regression-test-update" (short form: "ru") command, with the name of the test as an argument. This command is designed to be used after discovering a difference. It relies on the new gold standard files being stored in the testing environment (regression_tests/home/current), so make sure to run the test first:

python r test-to-update
python ru test-to-update

This should add the new gold standard files to svn, so be sure to commit them!

Adding Regression Tests

Whenever a new feature is added to the customization system, regression tests should be created to test this feature. Regression tests consist of a choices file and a test suite. Choices files can be any set of choices in the choices file format, but they must validate.

Ensure that the Grammar Matrix produces the grammar you expect given the choices file before adding the test. Here is a general workflow:

  1. Create a txtsuite file with the sentences for the test suite
  2. Create the corresponding choices file
  3. Use the current customization system version to create a grammar from the choices file
  4. Start up the LKB and [incr tsdb()]
  5. Load the grammar into the LKB
  6. In [incr tsdb()] use "import test items" to create a test suite profile from the txtsuite file
  7. In [incr tsdb()] use "process | all items" to parse the test items with the grammar
  8. In [incr tsdb()] use "browse | results" to manually explore both grammaticality and MRSs for each test item
  9. Adjust the customization system until the previous step yields the desired result
  10. For each change to the customization system, ensure that all other regression tests continue to pass

Tests take their name from the choices file's language name, so make sure to name each language appropriately. Test/language names generally follow this template: LIBRARY-PHENOMENON-INSTANCE; e.g. "neg-head-comp-vpauxbefore-compbefore". However, this is not an enforced standard.

Test suites are text files with one test item per line; negative (ungrammatical) items are marked with an asterisk (*).
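
The format can be read with a few lines of Python. This is a minimal sketch: the two sample items are invented placeholders, and skipping blank lines is an assumption rather than documented behavior of the regression test scripts.

```python
def parse_txt_suite(text):
    """Return (sentence, is_grammatical) pairs, one per non-empty line."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are not test items
        if line.startswith("*"):
            # A leading asterisk marks a negative (ungrammatical) item.
            items.append((line[1:].strip(), False))
        else:
            items.append((line, True))
    return items


sample = "n1 iv\n*iv n1\n"  # hypothetical test items
print(parse_txt_suite(sample))
# prints [('n1 iv', True), ('iv n1', False)]
```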

New regression tests are created using the "regression-test-add" command. First, put your choices file and test suite file into mybranch/gmcs/regression_tests/scratch/, and then do the following:

python regression-test-add CHOICES_FILENAME TXT-SUITE_FILENAME

CHOICES_FILENAME and TXT-SUITE_FILENAME are just the filenames (not the full paths) of the files you added into the scratch/ directory.

This is what regression-test-add does under the bonnet (paths relative to gmcs/regression_tests):

  1. Customizes a grammar from the choices file
  2. Compiles the grammar with ACE
  3. Copies the choices file to choices/ and test suite to txt-suites/
  4. Adds the regression test to regression-test-index
  5. Creates an unparsed profile
  6. Parses the profile with ACE and art
  7. Uses the parsed profile as gold (in home/gold/lang-name/), and the item and relations files for a skeleton (in skeletons/lang-name)
  8. Updates the skeletons index (skeletons/Index.lisp)

Make sure to test your regression test after you add it:

python regression-test NAME_OF_TEST_LANGUAGE

When you're finished adding all your regression tests:

svn commit

If you're having trouble, make sure your CUSTOMIZATIONROOT and ACEROOT are set up properly and take a look at MatrixRegressionTestingSetup.

Maintaining regression tests

DO avoid editing regression-test-index directly or changing the contents of choices/, txt-suites/, skeletons/, or home/gold by hand. This code is probably fairly brittle with respect to files not being where they are expected.

Otherwise, there is no need to do anything here: we want to keep choices files in old versions so that we are routinely testing the up-rev code.

Directory structure:

        skeletons/      [tsdb skeletons]
        home/           [tsdb home]


A typical reason for all tests failing is using a wrong customization root.

Everything in the home/current/, grammars/, and logs/ subdirectories is placed there and named by the scripts.

The language name in the choices file is used as the basis for naming. We need a convention for them :)

One-to-one mapping between choices files and txt-suites and skeletons. We might end up with multiple txt-suites for the same "language", but we would still require separate choices files with the same choices except for the language name.

Scratch is the default place for putting choices files and txt-suites to play with and then eventually add to the system. The contents of scratch should be local only, not under svn.

After creating a regression test, be sure to close [incr tsdb()] before running regression tests. Ongoing processes in [incr tsdb()] can block actions needed during the regression tests.

Comparison of the number of readings is still made by a function called compare-in-detail() that [incr tsdb()] provides. Comparison of MRSs is made by an external program, mrs-compare. There is no problem in using this for now, but in the future it may cause problems with ambiguous sentences that have multiple MRSs. An alternative would be to use PyDelphin instead of mrs-compare.

Sometimes a new regression test name is not added to regression-test-index properly; specifically, a newline is sometimes missing after the last added test. This results in an error, and there is currently no helpful error message to go with it. One of the error messages that pops up here is Python's "too many values to unpack"; another is "test not found". Check that the new test name has been added properly to regression-test-index.
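
The missing-newline case can be checked mechanically. The sketch below only tests whether a file ends with a newline; it makes no assumption about the format of the index entries. The demonstration writes a throwaway file with a placeholder entry rather than touching the real index.

```python
import os
import tempfile


def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as handle:
        data = handle.read()
    # An empty file has no last line to terminate, so treat it as fine.
    return data == b"" or data.endswith(b"\n")


# Demonstration on a throwaway file; the entry text is a placeholder,
# not the real index format.
with tempfile.TemporaryDirectory() as tmp:
    index = os.path.join(tmp, "regression-test-index")
    with open(index, "w") as handle:
        handle.write("placeholder-entry")  # no trailing newline
    print(ends_with_newline(index))  # prints False
```

To check the real index, call ends_with_newline("gmcs/regression_tests/regression-test-index") from the top of your checkout.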

MatrixRegressionTesting (last edited 2019-09-10 23:02:28 by OlgaZamaraeva)
