To facilitate comparison and reproducibility of experiments that use DELPH-IN data and tool sets, this page documents standard training and testing data sets for each grammar, along with standard evaluation metrics and terminology. We encourage everyone to use the standards listed here, or to describe any deviations in terms of these standards.

Data

Evaluation Metrics

Coverage

Coverage is conventionally the proportion of test items for which the parser produces at least one analysis.

Accuracy

Accuracy is conventionally the proportion of test items for which the parser produces a correct analysis, for example one that matches a gold-standard treebank.

It is important to specify whether these metrics are calculated over all items in a data set or over some subset, for example only the grammatical items, or (for accuracy) only the items that are within coverage.

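As a minimal sketch of how these conventions differ, the Python below computes coverage and accuracy over a toy list of items; the Item fields (readings, correct) are hypothetical stand-ins for whatever a given test profile actually records.

# Minimal sketch: coverage and accuracy under two common conventions.
# The Item structure is hypothetical; real test profiles record this
# information in their own schema.

from dataclasses import dataclass

@dataclass
class Item:
    readings: int   # number of analyses the parser produced
    correct: bool   # whether a produced analysis matches the gold standard

def coverage(items):
    """Proportion of items with at least one analysis."""
    return sum(1 for i in items if i.readings > 0) / len(items)

def accuracy(items, over_covered_only=False):
    """Proportion of items with a correct analysis.

    With over_covered_only=True, the denominator is restricted to items
    that received at least one analysis; otherwise all items count.
    """
    pool = [i for i in items if i.readings > 0] if over_covered_only else items
    return sum(1 for i in pool if i.correct) / len(pool)

items = [Item(2, True), Item(1, False), Item(0, False), Item(3, True)]
print(f"coverage: {coverage(items):.2f}")                       # 0.75
print(f"accuracy (all items): {accuracy(items):.2f}")           # 0.50
print(f"accuracy (covered only): {accuracy(items, True):.2f}")  # 0.67

The same raw counts yield noticeably different accuracy figures depending on the denominator, which is why reports should always state which convention they use.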
