Regression Testing

Before any changes are committed to trunk or vivified to the live site, developers must verify the correctness of the system by running regression tests. These tests are a series of saved choices files, test suites, and gold-standard parsing profiles. The testing framework uses the customization system to create a grammar from the choices file, then uses [incr tsdb()] and ACE to parse the test suite with the grammar and verify the results are the same as those in the gold-standard profile.

See below for information on how to run, update, add, or maintain regression tests, as well as a description of the directory structure of the regression test framework.

Regression testing works with ACE (AceTop), chosen for its speed and its handling of ICONS. As of September 18th, 2014, a full run takes about 8 minutes with 273 choices files. Regression testing with ACE requires about 2 GB of free disk space on your machine. The latest version of ACE can be downloaded from the ACE home page.

Setting Up a Regression Testing Environment

Regression tests require a system with Ubuntu 12.04+, emacs23+, LOGON, ACE, and a few other dependencies. For a step-by-step guide to setting up a virtual machine that meets these requirements, see MatrixRegressionTestingSetup.

Note: if you try to run most or all of the tests in one command, a current bug causes all of the tests to produce errors. To avoid this, raise the per-user limit on open files (nofile) above 10,000. For instance, on Ubuntu, the limit can be changed by adding the following lines to /etc/security/limits.conf:

* soft nofile 40000
* hard nofile 40000

Then, you can change it at the terminal:

ulimit -n 40000

Log out and log back in for the change to take effect.
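To confirm which limit is actually in effect, you can query it from Python. The following is a minimal sketch using the standard-library resource module (Unix only). The function name is illustrative and not part of matrix.py; the 10,000 threshold is the one mentioned above.

```python
import resource

MIN_OPEN_FILES = 10000  # threshold suggested in the note above


def open_file_limit_ok(minimum=MIN_OPEN_FILES):
    """Return (ok, soft, hard) for this process's open-file limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit", which always satisfies the requirement.
    ok = soft == resource.RLIM_INFINITY or soft > minimum
    return ok, soft, hard


if __name__ == "__main__":
    ok, soft, hard = open_file_limit_ok()
    print("soft=%s hard=%s -> %s" % (soft, hard, "OK" if ok else "too low"))
```

The soft limit is the one that applies to running processes, so that is the value compared against the threshold.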

Running Regression Tests

Before you can run the regression tests, you must make sure the environment variables CUSTOMIZATIONROOT and ACEROOT are set.
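As a quick sanity check, the sketch below reports which of the two variables are unset or empty. The helper name is hypothetical and not part of matrix.py; it only inspects the environment and does not verify that the paths are correct.

```python
import os

REQUIRED_VARS = ("CUSTOMIZATIONROOT", "ACEROOT")


def missing_env_vars(environ=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Please set: " + ", ".join(missing))
    else:
        print("CUSTOMIZATIONROOT and ACEROOT are both set.")
```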

To run all tests, run the "regression-test" (short form: "r") command of matrix.py with no arguments:

python matrix.py regression-test

To run a single test, run the "regression-test" command with the test name as the argument:

python matrix.py regression-test infl-q-aux-verb

You can also run a suite of tests using the wildcard character. The following runs all of the tests beginning with "neg":

python matrix.py r neg*
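To illustrate what such a pattern selects, the sketch below filters a list of test names with shell-style matching from Python's fnmatch module. This is an assumption made for illustration: how matrix.py itself expands the pattern is not described here. The two test names are the ones used as examples on this page.

```python
from fnmatch import fnmatchcase


def select_tests(test_names, pattern):
    """Return the test names that match a shell-style pattern."""
    return [name for name in test_names if fnmatchcase(name, pattern)]


names = ["infl-q-aux-verb", "neg-head-comp-vpauxbefore-compbefore"]
print(select_tests(names, "neg*"))  # ['neg-head-comp-vpauxbefore-compbefore']
```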

For each test, a result will be printed to STDOUT. A correct grammar will result in a "Success!" message, while a faulty one will result in a "DIFFS!" message. If you ran a single test, the items causing error messages will be printed to STDOUT; otherwise they will be suppressed. The results will also be written to a log file.

After running the tests, check the logs for differences, and either fix the regressions or update any progressions.
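When many tests are run at once, it can help to summarize the printed results. The sketch below counts the two messages quoted above in captured output. The exact layout of the output lines is not specified on this page, so the function only looks for the strings "Success!" and "DIFFS!"; the function name and the sample lines are made up for illustration.

```python
def tally_results(output_lines):
    """Count success and difference reports; collect the lines with diffs."""
    counts = {"Success!": 0, "DIFFS!": 0}
    diff_lines = []
    for line in output_lines:
        if "DIFFS!" in line:
            counts["DIFFS!"] += 1
            diff_lines.append(line.strip())
        elif "Success!" in line:
            counts["Success!"] += 1
    return counts, diff_lines


# Hypothetical captured output, for illustration only.
sample = [
    "infl-q-aux-verb ... Success!",
    "neg-head-comp-vpauxbefore-compbefore ... DIFFS!",
]
counts, diff_lines = tally_results(sample)
print(counts["Success!"], counts["DIFFS!"])  # 1 1
```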

Checking Logs

The results of a run of the regression tests are saved in several log files. These files are:

    gmcs/
        regression_tests/
            logs/
                regression-tests.date
                test-name.date
                tsdb.date

Note: The date is only granular to the day, so multiple runs in a single day are in the same file.
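Because the date suffix format is not spelled out here, a simple way to find the most recent logs is to sort the files by modification time. The following is a minimal sketch under that assumption; the function name is hypothetical, and the logs directory is passed in explicitly rather than derived from any environment variable.

```python
import os


def logs_newest_first(log_dir, prefix=""):
    """List log file names in log_dir starting with prefix, newest first."""
    entries = []
    for name in os.listdir(log_dir):
        path = os.path.join(log_dir, name)
        if name.startswith(prefix) and os.path.isfile(path):
            entries.append((os.path.getmtime(path), name))
    # Sort by modification time, most recent first.
    return [name for _, name in sorted(entries, reverse=True)]
```

For example, logs_newest_first("gmcs/regression_tests/logs", "tsdb") would list the tsdb logs with the latest one first, assuming it is run from the directory containing gmcs/.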

Updating Regression Tests

Sometimes, changes to the customization system yield grammars that perform better than the gold standard. These changes will produce a "DIFFS!" result, but that is not necessarily a bad thing. After opening the current and gold profiles in [incr tsdb()] and verifying (by hand) that the current results are more desirable, you can update the gold standard to use the current results. This is done with the "regression-test-update" (short form: "ru") command of matrix.py, with the name of the test as an argument. This command is designed to be used after discovering a difference. It relies on the new gold standard files being stored in the testing environment (regression_tests/home/current), so make sure to run the test first:

python matrix.py r test-to-update
python matrix.py ru test-to-update

This should add the new gold standard files to svn, so be sure to commit them!

Adding Regression Tests

Whenever a new feature is added to the customization system, regression tests should be created to test this feature. Regression tests consist of a choices file and a test suite. Choices files can be any set of choices in the choices file format, but they must validate.

Ensure that the Grammar Matrix produces the grammar you expect given the choices file before adding the test. Here is a general workflow:

  1. Create a txtsuite file with the sentences for the test suite
  2. Create the corresponding choices file
  3. Use the current customization system version to create a grammar from the choices file
  4. Start up the LKB and [incr tsdb()]
  5. Load the grammar into the LKB
  6. In [incr tsdb()] use "import test items" to create a test suite profile from the txtsuite file
  7. In [incr tsdb()] use "process | all items" to parse the test items with the grammar
  8. In [incr tsdb()] use "browse | results" to manually explore both grammaticality and MRSs for each test item
  9. Adjust the customization system until the previous step yields the desired result
  10. For each change to the customization system, ensure that all other regression tests continue to pass

Tests take their name from the choices file's language name, so make sure to name each language appropriately. Test/language names generally follow this template: LIBRARY-PHENOMENON-INSTANCE; e.g. "neg-head-comp-vpauxbefore-compbefore". However, this is not an enforced standard.

Test suites are text files with one test item per line; negative (ungrammatical) items are marked with a leading asterisk (*).
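As an illustration of this format, the following sketch splits a test suite into grammatical and ungrammatical items. The three example sentences are invented, and the function name is hypothetical; it is not part of matrix.py. Skipping blank lines is an assumption of the sketch, not a documented property of the format.

```python
def read_txt_suite(lines):
    """Split test items into (grammatical, ungrammatical) lists."""
    grammatical, ungrammatical = [], []
    for raw in lines:
        item = raw.strip()
        if not item:
            continue  # assumption: blank lines carry no test item
        if item.startswith("*"):
            # Drop the asterisk that marks a negative item.
            ungrammatical.append(item[1:].strip())
        else:
            grammatical.append(item)
    return grammatical, ungrammatical


# Invented example test suite, one item per line.
example = """\
the dog sleeps
the dog does not sleep
*dog the sleeps not
"""
good, bad = read_txt_suite(example.splitlines())
print(good)  # ['the dog sleeps', 'the dog does not sleep']
print(bad)   # ['dog the sleeps not']
```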

New regression tests are created using matrix.py. First, put your choices file and test suite file into mybranch/gmcs/regression_tests/scratch/, and then do the following:

python matrix.py regression-test-add CHOICES_FILENAME TXT-SUITE_FILENAME

CHOICES_FILENAME and TXT-SUITE_FILENAME are just the filenames (not the full paths) of the files you added into the scratch/ directory.

This is what regression-test-add does under the bonnet (paths relative to gmcs/regression_tests):

  1. Customizes a grammar from the choices file
  2. Compiles the grammar with ACE
  3. Copies the choices file to choices/ and test suite to txt-suites/
  4. Adds the regression test to regression-test-index
  5. Creates an unparsed profile
  6. Parses the profile with ACE and art
  7. Uses the parsed profile as gold (in home/gold/lang-name/), and the item and relations files for a skeleton (in skeletons/lang-name)
  8. Updates the skeletons index (skeletons/Index.lisp)

Make sure to test your regression test after you add it:

python matrix.py regression-test NAME_OF_TEST_LANGUAGE

When you're finished adding all your regression tests:

svn commit

If you're having trouble, make sure your CUSTOMIZATIONROOT and ACEROOT are set up properly and take a look at MatrixRegressionTestingSetup.

Maintaining Regression Tests

DO avoid editing regression-test-index directly or changing the contents of choices/, txt-suites/, skeletons/, or home/gold by hand. This code is probably fairly brittle with respect to files not being where they are expected.

Otherwise, nothing needs to be done here: we deliberately keep choices files in old versions so that the up-rev code is routinely tested.

Directory structure:

regression_tests/
        add_regression_test.py
        call-customize
        run_regression_tests.sh
        regression-test-index
        regressiontestindex.py
        update-gold-standard.sh
        choices/
        txt-suites/
        skeletons/      [tsdb skeletons]
                Index.lisp
        home/           [tsdb home]
                gold/
                current/
        grammars/
        logs/
        scratch/

Notes

A typical reason for all tests failing is that CUSTOMIZATIONROOT points to the wrong customization root.

Everything in the home/current/, grammars/, and logs/ subdirectories is placed there and named by the scripts.

The language name in the choices file is used as the basis for naming. We need a convention for language names :)

There is a one-to-one mapping between choices files, txt-suites, and skeletons. We might end up with multiple txt-suites for the same "language", but we would still require separate choices files with the same choices except for the language name.

Scratch is the default place for putting choices files and txt-suites to play with and then eventually add to the system. The contents of scratch should be local only, not under svn.

After creating a regression test, be sure to close [incr tsdb()] before running regression tests. Ongoing processes in [incr tsdb()] can block actions needed during the regression tests.

The comparison of the number of readings is still made by a function called compare-in-detail() that [incr tsdb()] provides. The comparison of MRSs is made by an external program, mrs-compare. This works for now, but in the future it may cause problems with ambiguous sentences that have multiple MRSs. An alternative would be to use PyDelphin instead of mrs-compare.

Sometimes a new regression test name is not added to regression-test-index properly; specifically, the newline after the last added test is sometimes missing. This results in an error, and there is currently no helpful error message to go with it. One of the error messages that can appear is Python's "too many values to unpack"; another is "test not found". If you see either, check that the new test name has been added to regression-test-index properly.
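The missing-newline case can be detected mechanically. The sketch below checks only whether a file ends with a newline; it makes no assumption about the format of the index entries, and the function name is hypothetical.

```python
def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as f:
        data = f.read()
    return not data or data.endswith(b"\n")
```

Calling ends_with_newline on the path to regression-test-index before and after adding a test would reveal whether the last entry is properly terminated.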

MatrixRegressionTesting (last edited 2017-11-11 19:53:03 by OlgaZamaraeva)
