Differences between revisions 21 and 22
Revision 21 as of 2013-07-09 19:38:09
Size: 5227
Editor: StephanOepen
Comment:
Revision 22 as of 2013-07-26 09:08:36
Size: 4065
Comment:
Line 53: Line 53:
Edge identifiers in full ACE derivations (as reported to <<itsdb>>) are not unique in the
context of one input to the parser-generator; this means that ‘classic’ <<itsdb>> treebanking
tools (i.e. tree-based annotation, using syntactic or semantic discriminants) cannot be
used on these profiles.

The ACE parser-generator does not (yet) report the token lattices (before and after
token mapping, i.e. the `tsdb(1)` fields ‘`p-input`’ and ‘`p-tokens`’) to <<itsdb>>;
both in exporting and post-processing results as well as in regression testing (or
cross-platform comparison), precise token information is often desirable.

The exact interpretation of the ‘`--best`’ and ‘`--limit`’ parameters remains to be
defined for forest-based parsing-generating (unlike in the classic setups, a
‘`--limit`’ of `0` should probably mean recording no derivations, rather than an
unlimited number of them; arguably, this interpretation should also be applied
retroactively to n-best parsing-generating).
Line 72: Line 56:

Derivations deposited in <<itsdb>> profiles upon successful forest disambiguation
fail to preserve the token identifiers (from the entries of the original
forest), which may not matter to many downstream applications but would be
a missing link in the conversion of ERG derivations to PTB-style tokenization.
Line 88: Line 66:
select the sub-set of remaining items that require annotation. (As of July 26 2013,
the treebanking tool flags these by setting t-active = -1, but for some reason
<<itsdb>> is currently unwilling to apply a TSQL condition to the FFTB invocation.)

Background

This page provides (emerging) information on support in the LOGON tree for the Answer Constraint Engine (ACE). As of mid-2013, binaries for parsing-generating and treebanking with ACE are included with the LOGON tree, albeit (currently) only for 64-bit environments. These binaries are maintained by the ACE developer, WoodleyPackard.

Parsing

  $LOGONROOT/parse --binary --terg+tnt/ace --protocol 2 --best 1 --limit 0 --count 8 cb

Full-Forest Treebanking

The answer wrapper script in the LOGON tree currently does not support the full range of standard command-line options to activate grammars (that come with the LOGON distribution), but only ‘--erg’ and ‘--terg’, for the release and trunk versions of the ERG, respectively.

To invoke full-forest treebanking, in the [incr tsdb()] podium, select ‘Trees|Switches|External Treebanking Tool’. While this toggle is in effect, the ‘Trees|Annotate’ and ‘Trees|Update’ commands will invoke the external ACE Full-Forest Treebanker. To give an illusion of tight integration, the following [incr tsdb()] parameters will have an effect on the external treebanking tool: (a) selection of a ‘working set’ of items, through a condition on profile entries (e.g. as determined through ‘Options|TSQL Condition’ or ‘Options|New Condition’); (b) the selection of a ‘gold’ profile (by clicking the middle mouse button in the [incr tsdb()] podium), as the source for update information; and (c) the toggle for batch vs. interactive updates (‘Trees|Switches|Automatic Update’).

Furthermore, the invocation of the Answer treebanker application can be customized through ‘Trees|Variables|Treebanking Tool’ or setting the [incr tsdb()] variable *redwoods-treebanker-application* in the per-user ‘.tsdbrc’. The default value, for the time being, is "answer --annotate --terg".
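
As a minimal sketch of the ‘.tsdbrc’ route, the shell function below appends a setting for *redwoods-treebanker-application* to a given file. It rests on assumptions not confirmed on this page: that ‘.tsdbrc’ is read as Lisp in the [incr tsdb()] package, so that a plain setf form is appropriate, and that the value "answer --annotate --erg" (selecting the release ERG instead of the trunk) is acceptable to the wrapper script. The function name set_treebanker is hypothetical.

```shell
# Sketch only: append a treebanker setting to a per-user .tsdbrc file.
# Assumption: .tsdbrc is evaluated as Lisp with the [incr tsdb()] package
# current, so an unqualified setf form reaches the intended variable.
set_treebanker() {
  rcfile="$1"    # path to the .tsdbrc file to extend
  command="$2"   # treebanker invocation, e.g. "answer --annotate --erg"
  printf '(setf *redwoods-treebanker-application* "%s")\n' "$command" >> "$rcfile"
}
```

For instance, set_treebanker "$HOME/.tsdbrc" "answer --annotate --erg" would add one line to the per-user file; it is worth inspecting the file afterwards, as the function does not check for an earlier setting of the same variable.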

In principle, it should work to follow the ‘common’ release procedure for treebank creation, i.e. to call for an automatic update immediately following parsing, by adding the option ‘--update/external’ to the ‘parse’ command line. This functionality remains to be validated, though (in fact, items that do not auto-update successfully are currently not flagged, i.e. it will not be possible to select remaining unannotated items by virtue of a TSQL condition).
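
To illustrate, the sketch below prints (but does not run) a ‘parse’ command line with ‘--update/external’ added to the options shown in the Parsing section. The placement of the option before the test suite name is an assumption, as is the hypothetical helper name parse_with_update; since the functionality itself remains to be validated, this should be read as a starting point only.

```shell
# Sketch only: print a parse command that requests an automatic update
# through the external treebanker. Nothing is executed.
# Assumption: '--update/external' may appear anywhere before the suite name.
parse_with_update() {
  root="$1"    # value of $LOGONROOT
  suite="$2"   # test suite to parse, e.g. cb
  printf '%s/parse --binary --terg+tnt/ace --protocol 2 --best 1 --limit 0 --count 8 --update/external %s\n' "$root" "$suite"
}
```

Calling parse_with_update "$LOGONROOT" cb shows the full command for review before it is run by hand.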

Known Issues

Resource limit specifications through command-line options to the LOGON parse or generate scripts (i.e. --time, --memory, and --edges) are not communicated to the ACE parser-generator.

Use of the Answer treebanking tool for updates from an existing ‘gold’ profile presupposes prior normalization of the ‘gold’ profile, i.e. the treebanker will not honor in-profile versioning (through the ‘t-version’ field in the various relations used to record annotations).

During automated updates, the treebanking tool will not explicitly flag items that remain unannotated, i.e. for which the update was not successful. [incr tsdb()] provides the flag ‘Trees|Switches|Update Flag Failures’ (on by default, but currently not communicated to the external treebanking tool) to make it easy to select the sub-set of remaining items that require annotation. (As of July 26 2013, the treebanking tool flags these by setting t-active = -1, but for some reason [incr tsdb()] is currently unwilling to apply a TSQL condition to the FFTB invocation.)

The <Control-G> and <Control-C> key bindings in the [incr tsdb()] podium are not yet enabled during external treebanking. Hence, it is vital to exit from the external treebanking tool by clicking on ‘Exit’ in its browser window (or otherwise making it terminate, e.g. through a shell command like ‘killall fftb’) to regain control in the [incr tsdb()] podium.
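
The shell route can be wrapped in a small helper, sketched below, that first checks whether a treebanker process is running at all. It assumes the process is named ‘fftb’, as in the ‘killall’ example above, and that ‘pgrep’ is available; the function name stop_treebanker is hypothetical.

```shell
# Sketch only: terminate a lingering external treebanker so that the
# [incr tsdb()] podium regains control.
# Assumption: the process name is "fftb" unless another name is given.
stop_treebanker() {
  name="${1:-fftb}"
  if pgrep -x "$name" >/dev/null 2>&1; then
    # At least one process with exactly this name exists: terminate it.
    killall "$name"
  else
    echo "no $name process running"
  fi
}
```

Running stop_treebanker without arguments targets ‘fftb’; note that ‘killall’ terminates every process of that name owned by the user, including treebanker sessions started from other podiums.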

LogonAnswer (last edited 2016-05-23 22:07:28 by StephanOepen)
