Elementary Dependency Match (EDM; see Dridan & Oepen, 2011) is a granular evaluation metric based on the so-called 'ltriples' format that can be exported from a [incr tsdb()] profile. These triples are derived from the variable-free reduction of MRSs known as Elementary Dependency Structures (EDS), described by Oepen & Lønning, 2006. EDS describes almost all the semantics contained in an MRS (excepting scopal information), and can be divided into three types of semantic information:
- NAMES: predicate name to char span
- ARGS: ARG-type relations between char spans
- PROPS: features of predicates, such as TENSE and GENDER
An EDM evaluation is normally measured over all three types of elementary dependency, but other combinations are possible. EDM_NA evaluates only predicate names and arguments, and is the closest equivalent to other metrics such as GR or CCG dependencies. The default output of the evaluation script shows precision, recall and f-score over each relation separately, as well as typical aggregations.
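As a rough illustration of how precision, recall and f-score over triples can be computed, here is a minimal Python sketch. It is not the logic of edm_eval.pl: the tuple layout, the example values and the helper names `prf` and `select` are assumptions made for this example only.

```python
from collections import Counter

def prf(gold, test):
    """Return (precision, recall, f_score) for two multisets of triples."""
    gold_counts, test_counts = Counter(gold), Counter(test)
    # Multiset intersection: a triple counts as correct at most as often
    # as it occurs in both the gold and the test analysis.
    correct = sum((gold_counts & test_counts).values())
    n_gold, n_test = sum(gold_counts.values()), sum(test_counts.values())
    precision = correct / n_test if n_test else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score

def select(triples, kinds):
    """Keep only the triples whose type tag (first field) is in kinds."""
    return [t for t in triples if t[0] in kinds]

# Hypothetical triples, tagged with their type so subsets can be selected.
gold = [
    ("NAMES", "<10:17>", "_treat_v_1"),
    ("NAMES", "<23:27>", "_user_n_of"),
    ("ARGS", "<10:17>", "ARG2", "<23:27>"),
    ("PROPS", "<10:17>", "TENSE", "pres"),
]
test = [
    ("NAMES", "<10:17>", "_treat_v_1"),
    ("NAMES", "<23:27>", "_user_n_of"),
    ("ARGS", "<10:17>", "ARG1", "<23:27>"),  # wrong argument label
    ("PROPS", "<10:17>", "TENSE", "pres"),
]

edm_all = prf(gold, test)  # all three types
edm_na = prf(select(gold, {"NAMES", "ARGS"}), select(test, {"NAMES", "ARGS"}))
print("EDM    P/R/F:", edm_all)
print("EDM_NA P/R/F:", edm_na)
```

In this toy case the single mislabelled argument costs one triple out of four for the full metric and one out of three for EDM_NA, so the narrower metric is hit harder by the same error.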
To use this evaluation (following the original procedure of Dridan, 2007), you first need to set the data up as described below. Alternatively, there is a more recent reference implementation of EDM available as part of mtool, the Swiss Army Knife of Meaning Representation.
1. Export gold:
$LOGONROOT/lingo/lkb/src/tsdb/home/export --binary --format ltriples <gold profile>
2. Export test:
$LOGONROOT/lingo/lkb/src/tsdb/home/export --binary --format ltriples --active=all <test profile>
This should produce directories containing one gzipped file per item parsed. The ltriples should look like:
_treat_v_1<10:17> ARG2 _user_n_of<23:27>
The links (e.g. <10:17>) are necessary for the evaluation; if your output doesn't have them, ask Stephan Oepen why.
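To check whether exported triples carry the character links, a line can be matched against a pattern like the one below. This is a sketch based only on the single example line shown above; the regular expression and the function name `parse_arg_triple` are assumptions, and other triple types in the export may look different.

```python
import re

# Pattern for an argument triple such as:
#   _treat_v_1<10:17> ARG2 _user_n_of<23:27>
ARG_TRIPLE = re.compile(
    r"^(?P<src_pred>\S+?)<(?P<src_from>\d+):(?P<src_to>\d+)>\s+"
    r"(?P<role>\S+)\s+"
    r"(?P<tgt_pred>\S+?)<(?P<tgt_from>\d+):(?P<tgt_to>\d+)>$"
)

def parse_arg_triple(line):
    """Split an argument triple into source, role and target.

    Returns None when the line does not match, for example because
    the <from:to> links are missing.
    """
    m = ARG_TRIPLE.match(line.strip())
    if m is None:
        return None
    return {
        "source": (m["src_pred"], int(m["src_from"]), int(m["src_to"])),
        "role": m["role"],
        "target": (m["tgt_pred"], int(m["tgt_from"]), int(m["tgt_to"])),
    }

parsed = parse_arg_triple("_treat_v_1<10:17> ARG2 _user_n_of<23:27>")
missing_links = parse_arg_triple("_treat_v_1 ARG2 _user_n_of")
print(parsed)
print(missing_links)
```

A `None` result for lines that otherwise look like triples is a hint that the export was produced without links.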
The Perl implementation is available in SVN:
svn co http://svn.delph-in.net/mu/evaluation/EDM/trunk
Usage: cat <goldfilelist>|./edm_eval.pl [-i] [-v] [-p <num>] [-s] <export directory>
- -i: ignore gold where parse failed
- -v: verbose output
- -p <num>: parse number
- -s: raw figures for statistical significance calculations
To evaluate a profile:
- ls -1 jhk.gold/*|edm_eval.pl jhk.test
To evaluate a profile, only over files that received a parse:
- ls -1 jhk.gold/*|edm_eval.pl -i jhk.test
To evaluate a single item:
- echo jhk.gold/3025231.gz |edm_eval.pl jhk.test
To evaluate a specific analysis of a single item:
- echo jhk.gold/3025231.gz |edm_eval.pl -p 100 jhk.test
To examine the errors in a single item:
- echo jhk.gold/3025231.gz |edm_eval.pl -v jhk.test
To produce the files needed for statistical significance testing:
- for file in jhk.gold/*; do echo "$file" | edm_eval.pl -s jhk.test; done > jhk.test.stats
Significance can then be tested with statsig_shuffle.pl, an implementation of the computationally-intensive randomisation test described in:
- Alexander Yeh. 2000. More accurate tests for the statistical significance of result differences. In Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000), pages 947–953, Saarbruecken, Germany.
Usage: statsig_shuffle.pl <first stats file> <second stats file> [iterations]
- statsig_shuffle.pl jhk.gold.stats jhk.test.stats 10000
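The general idea of such a randomisation (shuffle) test can be sketched in Python as follows. This is not the code of statsig_shuffle.pl, and it does not read the stats files produced above, whose format is not described here. It assumes, hypothetically, that each item is summarised as a `(correct, n_gold, n_test)` tuple and that both systems cover the same items in the same order.

```python
import random

def f_score(items):
    """Micro-averaged f-score over (correct, n_gold, n_test) tuples."""
    correct = sum(c for c, _, _ in items)
    n_gold = sum(g for _, g, _ in items)
    n_test = sum(t for _, _, t in items)
    if correct == 0:
        return 0.0
    precision, recall = correct / n_test, correct / n_gold
    return 2 * precision * recall / (precision + recall)

def randomisation_test(sys_a, sys_b, iterations=10000, seed=0):
    """Estimate a p-value for the f-score difference between two systems."""
    if len(sys_a) != len(sys_b):
        raise ValueError("both systems must cover the same items")
    rng = random.Random(seed)
    observed = abs(f_score(sys_a) - f_score(sys_b))
    at_least_as_extreme = 0
    for _ in range(iterations):
        shuffled_a, shuffled_b = [], []
        for a, b in zip(sys_a, sys_b):
            # Under the null hypothesis the system labels are exchangeable,
            # so swap each item's two results with probability 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            shuffled_a.append(a)
            shuffled_b.append(b)
        if abs(f_score(shuffled_a) - f_score(shuffled_b)) >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (iterations + 1)

# Hypothetical per-item figures for two systems on the same 20 items.
system_a = [(10, 10, 10)] * 20
system_b = [(1, 10, 10)] * 20
p_value = randomisation_test(system_a, system_b, iterations=1000)
print("estimated p-value:", p_value)
```

More iterations give a finer-grained p-value estimate at the cost of run time, which is why the iteration count is left as a parameter.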