Background
This page (a work in progress, like so many others on this wiki) collects practical and historical information about official snapshots of the ERG, i.e. officially released versions of the grammar.
(Pre-)Release MRS Quality Control
Once new treebanked and thinned profiles are available, run a set of automated well-formedness tests on the MRSs, for example:
$LOGONROOT/redwoods --terg --default \
--filter syntax,lnk,fragmentation erg/trunk/mrs/12-02-06/pet.1
Re-Generate the Core SEM-I
The bulk of the semantic interface (SEM-I) is auto-generated from the lexicon (recorded as core.smi):
(in-package :mt)
(with-open-file (stream "~/src/logon/lingo/terg/core.smi"
                 :direction :output :if-exists :supersede)
  (print-semi (construct-semi) :format :compact :stream stream))
The master file erg.smi is manually maintained and includes the auto-generated entries.
Validate and Update the Head Table
(in-package :tsdb)
(read-heads "~/src/logon/lingo/terg/erg.hds" :test t)
Generate Maximum Entropy and PCFG Models
By default, the Maximum Entropy training scripts (re-)generate a fresh feature cache; hence, the following two jobs must not run in parallel:
sbatch ${LOGONROOT}/uio/titan/redwoods \
  --redwoods --run train.wescience.lisp
sbatch ${LOGONROOT}/uio/titan/redwoods \
  --redwoods --run train.redwoods.lisp
For the time being, there is only one PCFG model, trained off the full (non-testing) Redwoods collection:
sbatch ${LOGONROOT}/uio/titan/redwoods \
--redwoods --run pcfg.lisp
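One way to keep the two Maximum Entropy training jobs above from overlapping is to let Slurm serialize them through a job dependency. The following is only a sketch and not part of the LOGON scripts: the function name submit_me_training is hypothetical, and it assumes that the local sbatch supports the standard --parsable and --dependency=afterok options and that the second job should start only if the first one succeeds.

```shell
# Hypothetical helper (not part of LOGON): submit the two Maximum Entropy
# training jobs so that the second starts only after the first has succeeded.
submit_me_training() {
  local script="${LOGONROOT}/uio/titan/redwoods"
  local first second

  # --parsable makes sbatch print just the job id (optionally "id;cluster").
  first=$(sbatch --parsable "$script" \
    --redwoods --run train.wescience.lisp) || return 1
  first=${first%%;*}

  # afterok: hold this job until the first one has finished with exit code 0.
  second=$(sbatch --parsable --dependency=afterok:"$first" "$script" \
    --redwoods --run train.redwoods.lisp) || return 1
  second=${second%%;*}

  # Report both job ids, in submission order.
  echo "$first $second"
}
```

Calling submit_me_training prints the two job ids; the second job then stays pending until the first has completed, so both can be submitted in one go without sharing the feature cache at the same time.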
Update Summary Statistics of Redwoods Treebanks
Since its October 2010 release, the ERG includes a spreadsheet that summarizes key statistics of the gold-standard profiles that comprise the Redwoods Treebank. The raw data for addition to the file redwoods.xls can be generated automagically:
(in-package :tsdb)
(loop
    with *phenomena* = nil
    with *statistics-aggregate-dimension* = :phenomena
    with *statistics-all-rejections-p* = t
    with *tsdb-home* = (logon-directory "lingo/terg/tsdb/gold" :string)
    initially (purge-profile-cache :all)
    for db in (find-tsdb-directories)
    for name = (get-field :database db)
    do (analyze-trees name :append "/tmp/redwoods.csv" :format :csv))
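Because the loop above writes to /tmp/redwoods.csv with :append, it may be worth clearing any file left over from an earlier run beforehand and checking afterwards that rows were actually written. A minimal shell sketch follows, assuming that :append adds to an existing file rather than overwriting it (the source does not say so explicitly); both function names are hypothetical.

```shell
# Hypothetical helpers; the CSV path defaults to the one used in the loop above.

# Remove a CSV left over from a previous run (no error if it is absent).
reset_redwoods_csv() {
  rm -f "${1:-/tmp/redwoods.csv}"
}

# Print the number of lines in the CSV, or 0 if it does not exist yet.
count_redwoods_rows() {
  local csv="${1:-/tmp/redwoods.csv}"
  if [ -f "$csv" ]; then
    wc -l < "$csv" | tr -d ' '
  else
    echo 0
  fi
}
```

Run reset_redwoods_csv before evaluating the loop and count_redwoods_rows afterwards; a count of 0 would suggest that no profile was analyzed.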