Elementary Dependency Match (EDM; see Dridan & Oepen, 2011) is a granular evaluation metric based on the so-called 'ltriples' format that can be exported from a [incr tsdb()] profile. These triples are derived from the variable-free reduction of MRSs known as Elementary Dependency Structures (EDS), described by Oepen & Lønning, 2006 (http://www.emmtee.net/bib/Oep:Lon:06.pdf). An EDS captures almost all the semantics contained in an MRS (excepting scopal information), and its content can be divided into three types of semantic information:

  • NAMES: predicate name to char span
  • ARGS: ARG-type relations between char spans
  • PROPS: features of predicates, such as TENSE and GENDER

By default, an EDM evaluation is measured over all three types of elementary dependency, but other combinations are possible. EDM_NA evaluates only predicate names and arguments, and is the closest to other metrics such as GR or CCG dependencies. The default output of the evaluation script shows precision, recall and F-score for each relation separately, as well as typical aggregations.
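
As a rough illustration of how such scores can be derived, the Python sketch below counts matching triples and computes precision, recall and F-score, either over all three types or over names and arguments only. It is a minimal sketch under stated assumptions, not the logic of the evaluation script: the (type, source, label, target) tuples, the exact-match multiset intersection, the function names and the TENSE value are all invented for illustration.

```python
from collections import Counter


def prf(correct, n_test, n_gold):
    """Return (precision, recall, F1) from raw counts; empty denominators give 0.0."""
    p = correct / n_test if n_test else 0.0
    r = correct / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def score(gold, test, types=("NAMES", "ARGS", "PROPS")):
    """Score test triples against gold triples, restricted to the given types.

    Each triple is a hypothetical (type, source, label, target) tuple.
    """
    g = Counter(t for t in gold if t[0] in types)
    s = Counter(t for t in test if t[0] in types)
    correct = sum((g & s).values())  # multiset intersection = exact matches
    return prf(correct, sum(s.values()), sum(g.values()))


# Toy data (assumed representation): the test analysis has one wrong argument role.
gold = [
    ("NAMES", "<10:17>", "NAME", "_treat_v_1"),
    ("NAMES", "<23:27>", "NAME", "_user_n_of"),
    ("ARGS", "<10:17>", "ARG2", "<23:27>"),
    ("PROPS", "<10:17>", "TENSE", "past"),
]
test = [
    ("NAMES", "<10:17>", "NAME", "_treat_v_1"),
    ("NAMES", "<23:27>", "NAME", "_user_n_of"),
    ("ARGS", "<10:17>", "ARG1", "<23:27>"),
    ("PROPS", "<10:17>", "TENSE", "past"),
]

edm_all = score(gold, test)                           # all three types
edm_na = score(gold, test, types=("NAMES", "ARGS"))   # names and arguments only
```

Restricting the types, as in the second call, is one way to think about variants such as EDM_NA: the same counting, applied to a subset of the triples.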

To use this evaluation (following the original procedure of Dridan, 2007), first set up the data as described below. Alternatively, a more recent reference implementation of EDM is available as part of mtool (https://github.com/cfmrp/mtool), the Swiss Army Knife of Meaning Representation.

Set up

1. Export gold:

  • $LOGONROOT/lingo/lkb/src/tsdb/home/export --binary --format ltriples <gold profile>

2. Export test:

  • $LOGONROOT/lingo/lkb/src/tsdb/home/export --binary --format ltriples --active=all <test profile>

This should produce directories containing one gzipped file per item parsed. The ltriples should look like:

  • _treat_v_1<10:17> ARG2 _user_n_of<23:27>

The character-span links (e.g. <10:17>) are necessary for the evaluation; if your output doesn't have them, ask StephanOepen why.
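
To check that exported lines carry these links, a line of the shape shown above can be picked apart with a regular expression. The following Python sketch is only an illustration: it assumes the "predicate<from:to> ROLE predicate<from:to>" layout of the example line, the function name is hypothetical, and other line shapes in the export (which are not shown here) are simply reported as unparsed.

```python
import re

# Assumed layout, taken from the example line: predicate<from:to> ROLE predicate<from:to>
TRIPLE = re.compile(
    r"^\s*(?P<src>\S+)<(?P<sfrom>\d+):(?P<sto>\d+)>\s+"
    r"(?P<role>\S+)\s+"
    r"(?P<tgt>\S+)<(?P<tfrom>\d+):(?P<tto>\d+)>\s*$"
)


def parse_arg_triple(line):
    """Parse one argument-style ltriples line; return None if it does not match."""
    m = TRIPLE.match(line)
    if m is None:
        return None
    return {
        "source": (m.group("src"), int(m.group("sfrom")), int(m.group("sto"))),
        "role": m.group("role"),
        "target": (m.group("tgt"), int(m.group("tfrom")), int(m.group("tto"))),
    }


parsed = parse_arg_triple("_treat_v_1<10:17> ARG2 _user_n_of<23:27>")
missing_links = parse_arg_triple("_treat_v_1 ARG2 _user_n_of")  # no <from:to> links
```

A line without the links does not match and yields None, which makes missing spans easy to spot before running the evaluation.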

Evaluate

The Perl implementation is available in SVN:

svn co http://svn.delph-in.net/mu/evaluation/EDM/trunk

Usage: cat <goldfilelist>|./edm_eval.pl [-i] [-v] [-p <num>] [-s] <export directory>

  • -i: ignore gold where parse failed
  • -v: verbose output
  • -p <num>: parse number
  • -s: raw figures for statistical significance calculations

To evaluate a profile:

  • ls -1 jhk.gold/*|edm_eval.pl jhk.test

To evaluate a profile, only over files that received a parse:

  • ls -1 jhk.gold/*|edm_eval.pl -i jhk.test

To evaluate a single item:

  • echo jhk.gold/3025231.gz |edm_eval.pl jhk.test

To evaluate a specific analysis of a single item:

  • echo jhk.gold/3025231.gz |edm_eval.pl -p 100 jhk.test

To examine the errors in a single item:

  • echo jhk.gold/3025231.gz |edm_eval.pl -v jhk.test

To produce the files needed for statistical significance testing:

  • for file in jhk.gold/*;
    do echo $file|edm_eval.pl -s jhk.test;
    done > jhk.test.stats

Significance Testing

The script statsig_shuffle.pl is an implementation of the computationally-intensive randomisation test described in:

  • Alexander Yeh. 2000. More accurate tests for the statistical significance of result differences. In Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000), pages 947–953, Saarbruecken, Germany.

Usage:

  • statsig_shuffle.pl <first stats file> <second stats file> [iterations]

For example:

  • statsig_shuffle.pl jhk.gold.stats jhk.test.stats 10000
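
The idea behind this kind of randomisation (shuffle) test can be sketched in Python as follows. This is a minimal sketch under assumptions, not a description of statsig_shuffle.pl: the format of the stats files is not documented here, so per-item results are represented as hypothetical (correct, n_test, n_gold) tuples, the compared statistic is assumed to be micro-averaged F-score, and all function names are invented. In each iteration, the two systems' results are swapped per item with probability 0.5, and the p-value estimates how often a difference at least as large as the observed one arises by chance.

```python
import random


def f_score(stats):
    """Micro-averaged F1 over per-item (correct, n_test, n_gold) counts."""
    correct = sum(c for c, _, _ in stats)
    n_test = sum(t for _, t, _ in stats)
    n_gold = sum(g for _, _, g in stats)
    p = correct / n_test if n_test else 0.0
    r = correct / n_gold if n_gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


def shuffle_test(stats_a, stats_b, iterations=10000, seed=0):
    """Estimate a two-sided p-value for the F1 difference between two systems."""
    if len(stats_a) != len(stats_b):
        raise ValueError("both systems must be scored on the same items")
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    observed = abs(f_score(stats_a) - f_score(stats_b))
    at_least = 0
    for _ in range(iterations):
        shuf_a, shuf_b = [], []
        for a, b in zip(stats_a, stats_b):
            if rng.random() < 0.5:  # swap this item's results between systems
                a, b = b, a
            shuf_a.append(a)
            shuf_b.append(b)
        if abs(f_score(shuf_a) - f_score(shuf_b)) >= observed:
            at_least += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (at_least + 1) / (iterations + 1)


# Toy per-item counts (invented): system B matches more triples than system A.
system_a = [(3, 4, 4), (2, 4, 4), (4, 5, 5)]
system_b = [(4, 4, 4), (4, 4, 4), (5, 5, 5)]
p_value = shuffle_test(system_a, system_b, iterations=1000)
```

The test needs per-item figures for both systems over the same items, which is consistent with producing one stats file per system as described above.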

ElementaryDependencyMatch (last edited 2020-01-14 23:56:36 by StephanOepen)
