The script that runs the coverage test is under zhong/cmn/zhs/utils. (Test scripts for other languages will be added later.)
$ ./coverage.sh [--data LIST_FILE_NAME] \
                [--best b] \
                [--timeout sec] \
                [--max-chart-megabytes mb] \
                [--max-unpack-megabytes mb]
The default option values are as follows.
timeout: 5 seconds
max-chart-megabytes: 256 MB for parsing; 2048 MB for generation (fixed)
max-unpack-megabytes: 256 MB for parsing; 2048 MB for generation (fixed)
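As an illustration, the parsing defaults listed above could be passed explicitly as in the sketch below. It assumes the script is run from zhong/cmn/zhs/utils; the variable names are made up for this example, and --best is left out because no default for it is stated here.

```shell
# Illustrative values: the DEV list and the documented parsing defaults.
DATA=DEV
TIMEOUT=5
CHART_MB=256
UNPACK_MB=256

# Collect the options as positional parameters so they can be shown and reused.
set -- --data "$DATA" \
       --timeout "$TIMEOUT" \
       --max-chart-megabytes "$CHART_MB" \
       --max-unpack-megabytes "$UNPACK_MB"

# Show the command line that would be run.
COVERAGE_CMD="./coverage.sh $*"
echo "$COVERAGE_CMD"

# Run the script only if it is present in the current directory.
if [ -x ./coverage.sh ]; then
    ./coverage.sh "$@"
else
    echo "coverage.sh not found; run this from zhong/cmn/zhs/utils" >&2
fi
```

Replacing DEV with HELDOUT or TEST would select one of the other lists described below.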
The script runs both the regression test over all test suites (for comparison) and the coverage test over all of the following datasets.
Development Set (dev)
This set is used to develop the grammar, so developers may inspect its items to find out where a processing problem comes from. This set includes the following. (The LIST_FILE_NAME is DEV.)
mrs: Matrix MRS testsuite (107 sentences) (see MatrixMrsTestSuiteMandarin)
ntumc: the NTU-MC corpus (http://compling.hss.ntu.edu.sg/ntumc/)
pctb-dev: Penn Chinese Treebank (LDC10T07; the first 5,000 sentences). This profile is not included in the repository, because we have no permission to redistribute it.
sinica-dev: The Sinica Treebank included in NLTK (~/nltk_data/treebank/sinica). This profile contains the first 5,000 sentences, and it is not included in the repository for the same reason as pctb-dev.
The held-out dataset consists of similarly constructed test sets created by other groups, possibly from a different point of view. It is intended to measure objectively how well our grammar performs. This set includes the following. (The LIST_FILE_NAME is HELDOUT.)
jec: JEC Basic Sentence Data (created by the Kurohashi and Kawahara Lab., http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php). This is a trilingual dataset whose components are written in Japanese, English, and Chinese. These data can be used for testing machine translation later.
fu-berlin: This testsuite was created in the project "An HPSG Fragment of Chinese" by Stefan Müller's group (Freie Universität Berlin). It contains basic grammatical and ungrammatical sentences to be checked when developing the core of a Chinese grammar in HPSG. (see https://hpsg.fu-berlin.de/Fragments/Chinese/)
mcg-smallest: Mandarin Chinese Grammar
mcg-wxl: This and mcg-smallest above were constructed from the phenomenon-oriented test suite by Dr. Xiangli Wang. They are similar to fu-berlin but are more closely based on the DELPH-IN framework.
Test Set (test)
These data must not be inspected or modified by developers. They are tested every Friday to track the progress of the current grammar. For copyright reasons, these data are not included in the repository. (The LIST_FILE_NAME is TEST.)
pctb-test: Penn Chinese Treebank (LDC10T07; the second 5,000 sentences).
sinica-test: The Sinica Treebank included in NLTK (~/nltk_data/treebank/sinica; the second 5,000 sentences).
The coverage test relies on several DELPH-IN tools. The paths of all the tools below must be added to PATH in ~/.bashrc (or the equivalent startup file for your shell).
ace: see AceTop
pyDelphin: see https://github.com/goodmami/pydelphin
gTest: see https://github.com/goodmami/gtest
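Before running the coverage test, it may help to confirm that the tools are reachable through PATH. The following is a minimal sketch, not part of coverage.sh: check_tools is a hypothetical helper, and the command names ace and delphin are assumptions about how ACE and pyDelphin are installed. Add the gTest command under whatever name it has on your system.

```shell
# Example ~/.bashrc entry (the directory is hypothetical):
#   export PATH="$HOME/delphin/ace:$PATH"

# Report every required command that cannot be found through PATH.
# Prints one "missing: NAME" line per absent tool; returns 1 if any is absent.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed command names; adjust them to the local installation.
check_tools ace delphin || echo "Add the directories of the missing tools to PATH in ~/.bashrc"
```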
What is calculated
Parsing Coverage: This is computed under four settings: ordinary parsing, unknown-word handling, robust parsing, and the combination of the last two.
Generation Coverage: This is computed by one-best parsing and generation.
End-to-end Coverage: Parsing Coverage * Generation Coverage
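The end-to-end figure is a plain product of the two coverages. As a worked sketch with made-up numbers (not measured results), and with end_to_end as a hypothetical helper that takes both coverages as percentages:

```shell
# Multiply two coverage percentages and print the end-to-end percentage.
# Usage: end_to_end PARSING_PERCENT GENERATION_PERCENT
end_to_end() {
    awk -v p="$1" -v g="$2" 'BEGIN { printf "%.2f\n", p * g / 100 }'
}

# Made-up figures: 80% parsing coverage and 50% generation coverage.
end_to_end 80 50   # prints 40.00
```

With 80% parsing coverage and 50% generation coverage, only 40% of the items make it through both steps.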
History of Coverage