Differences between revisions 8 and 9
Revision 8 as of 2013-06-24 21:26:23
Size: 3355
Editor: StephanOepen
Comment:
Revision 9 as of 2013-06-24 22:00:28
Size: 3695
Editor: StephanOepen
Comment:

Background

This page provides (emerging) information on support in the LOGON tree for the Answer Constraint Engine (ACE). As of mid-2013, binaries for parsing-generating and treebanking with ACE are included with the LOGON tree, albeit (currently) only for 64-bit environments. These binaries are maintained by the ACE developer, WoodleyPackard.

Parsing

  $LOGONROOT/parse --binary --terg+tnt/ace --protocol 2 --best 1 --limit 0 --count 8 cb

Full-Forest Treebanking

To invoke full-forest treebanking, in the [incr tsdb()] podium, select ‘Trees|Switches|External Treebanking Tool’. While this toggle is in effect, the ‘Trees|Annotate’ and ‘Trees|Update’ commands will invoke the external ACE Full-Forest Treebanker. To give an illusion of tight integration, the following [incr tsdb()] parameters will have an effect on the external treebanking tool: (a) selection of a ‘working set’ of items, through a condition on profile entries (e.g. as determined through ‘Options|TSQL Condition’ or ‘Options|New Condition’); (b) the selection of a ‘gold’ profile (by clicking the middle mouse button in the [incr tsdb()] podium), as the source for update information; and (c) the toggle for batch vs. interactive updates (‘Trees|Switches|Automatic Update’).

Furthermore, the invocation of the Answer treebanker application can be customized through ‘Trees|Variables|Treebanking Tool’ or by setting the [incr tsdb()] variable *redwoods-treebanker-application* in the per-user ‘.tsdbrc’. The default value, for the time being, is "answer --annotate --terg".
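
As a minimal sketch of the ‘.tsdbrc’ route, the shell lines below append a setting for *redwoods-treebanker-application* to the per-user file unless one is already present. The sketch assumes that ‘.tsdbrc’ lives in the home directory and is read as Lisp in the package that owns the variable (check an existing ‘.tsdbrc’ for the conventions in use); the TSDBRC override and the names ‘rc’ and ‘setting’ are hypothetical helpers introduced here, and the value simply repeats the default quoted above.

```shell
# Sketch only: TSDBRC is a hypothetical override, not part of LOGON.
rc="${TSDBRC:-$HOME/.tsdbrc}"

# Assumed Lisp form; the value shown is the default named on this page.
setting='(setf *redwoods-treebanker-application* "answer --annotate --terg")'

# Create the file if needed, then append the setting only when the
# variable is not mentioned in the file yet.
touch "$rc"
if ! grep -qF '*redwoods-treebanker-application*' "$rc"; then
  printf '%s\n' "$setting" >> "$rc"
fi

# Show the line(s) now in effect.
grep -F '*redwoods-treebanker-application*' "$rc"
```

To use a different invocation, edit the quoted string in ‘setting’ before running the lines, or edit the file by hand afterwards.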

In principle, it should work to follow the ‘common’ release procedure for treebank creation, i.e. first call for an automatic update immediately following parsing, by adding the option ‘--update/external’ to the ‘parse’ command line. This functionality remains to be validated, though.
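
A sketch of what such a command line might look like, assuming the option can simply be combined with the options from the parsing example on this page (the combination and the position of ‘--update/external’ are assumptions, and, as noted, the functionality itself is unvalidated). The lines only print the command as a dry run; LOGON_PARSE and ‘cmd’ are hypothetical names introduced here.

```shell
# Dry run: build and print the command instead of executing it.
# If LOGONROOT is unset, a literal "$LOGONROOT" is printed as a placeholder.
LOGON_PARSE="${LOGONROOT:-\$LOGONROOT}/parse"

# Options copied from the parsing example, plus '--update/external'.
cmd="$LOGON_PARSE --binary --terg+tnt/ace --update/external --protocol 2 --best 1 --limit 0 --count 8 cb"

echo "$cmd"
```

Once the printed command looks right, it can be typed or pasted at the shell prompt to run it.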

Known Issues

Edge identifiers in full ACE derivations (as reported to [incr tsdb()]) are not unique in the context of one input to the parser-generator; this means that ‘classic’ [incr tsdb()] treebanking tools (i.e. tree-based annotation, using syntactic or semantic discriminants) cannot be used on these profiles.
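
One way to check a profile for this problem is to look for identifiers that occur with more than one combination of label, start, and end among the derivations of a single item. The sketch below does this on two made-up derivation strings, assuming each edge is written as ‘(id label score start end ...’; the strings, the labels, and the variable names are illustrative only, and real derivations (e.g. with quoted token strings at the leaves) may need a more careful parser.

```shell
# Toy derivations for one item (invented for illustration):
# identifier 7 is used for two different edges.
derivations='(root (10 rule_a 1.2 0 2 (7 rule_b 0.3 0 1) (8 rule_c 0.5 1 2)))
(root (11 rule_a 0.9 0 2 (7 rule_d 0.1 0 1) (8 rule_c 0.5 1 2)))'

# 1. pull out every "(id label score start end" prefix;
# 2. reduce each to "id label start end";
# 3. drop exact repeats (the same edge seen in several derivations);
# 4. report identifiers that still occur more than once.
dups=$(printf '%s\n' "$derivations" |
  grep -oE '\([0-9]+ [^ ()]+ [^ ()]+ [0-9]+ [0-9]+' |
  awk '{ sub(/^\(/, "", $1); print $1, $2, $4, $5 }' |
  sort -u |
  awk '{ n[$1]++ } END { for (id in n) if (n[id] > 1) print id }')

echo "$dups"
```

An empty result would mean that, under these assumptions, every identifier names a single edge across the derivations examined.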

Derivations deposited in [incr tsdb()] profiles upon successful forest disambiguation fail to preserve the token identifiers (from the entries of the original forest), which may not matter to many downstream applications but would be a missing link in the conversion of ERG derivations to PTB-style tokenization.

The ACE parser-generator does not (yet) report the token lattices (before and after token mapping, i.e. the tsdb(1) fields ‘p-input’ and ‘p-tokens’) to [incr tsdb()]; both in exporting and post-processing results as well as in regression testing (or cross-platform comparison), precise token information is often desirable.

The exact interpretation of the ‘--best’ and ‘--limit’ parameters remains to be defined for forest-based parsing-generating (unlike in the classic setups, a ‘--limit’ of 0 should probably mean recording no derivations, rather than an unlimited number of them; arguably, this interpretation should also be applied retroactively to n-best parsing-generating).

Resource limit specifications through command-line options to the LOGON parse or generate scripts (i.e. ‘--time’, ‘--memory’, and ‘--edges’) are not communicated to the ACE parser-generator.

LogonAnswer (last edited 2016-05-23 22:07:28 by StephanOepen)
