Page history: revision 1 as of 2013-06-23 16:39:33 and revision 21 as of 2013-07-09 19:38:09, both edited by StephanOepen.
Background

This page provides (emerging) information on support in the LOGON tree for the Answer Constraint Engine (ACE). As of mid-2013, binaries for parsing-generating and treebanking with ACE are included with the LOGON tree, albeit (currently) only for 64-bit environments. These binaries are maintained by the ACE developer, Woodley Packard.

Parsing

  $LOGONROOT/parse --binary --terg+tnt/ace --protocol 2 --best 1 --limit 0 --count 8 cb

Full-Forest Treebanking

The ‘answer’ wrapper script in the LOGON tree currently does not support the full range of standard command-line options for activating the grammars that come with the LOGON distribution. It only accepts ‘--erg’ and ‘--terg’, for the release and trunk versions of the ERG, respectively.

To invoke full-forest treebanking, select ‘Trees|Switches|External Treebanking Tool’ in the [incr tsdb()] podium. While this toggle is in effect, the ‘Trees|Annotate’ and ‘Trees|Update’ commands will invoke the external ACE Full-Forest Treebanker. To give an illusion of tight integration, the following [incr tsdb()] parameters will have an effect on the external treebanking tool: (a) the selection of a ‘working set’ of items, through a condition on profile entries (e.g. as determined through ‘Options|TSQL Condition’ or ‘Options|New Condition’); (b) the selection of a ‘gold’ profile (by clicking the middle mouse button in the [incr tsdb()] podium), as the source for update information; and (c) the toggle for batch vs. interactive updates (‘Trees|Switches|Automatic Update’).

Furthermore, the invocation of the Answer treebanker application can be customized through ‘Trees|Variables|Treebanking Tool’, or by setting the [incr tsdb()] variable ‘*redwoods-treebanker-application*’ in the per-user ‘.tsdbrc’ file. The default value, for the time being, is "answer --annotate --terg".
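For example, a user who wants the treebanker to run on the ERG release version (‘--erg’) rather than trunk could add a setting along the following lines to ‘.tsdbrc’. This is a sketch under several assumptions. It assumes that the file lives in the home directory, that it is loaded as Lisp, and that the variable belongs to the ‘tsdb’ package. It also assumes that ‘--erg’ combines with ‘--annotate’ just as ‘--terg’ does in the default value.

```shell
#!/bin/sh
# Sketch: make 'Trees|Annotate' and 'Trees|Update' run the treebanker on the
# ERG release version (--erg) instead of trunk (--terg, the default).
# Assumes ~/.tsdbrc is loaded as Lisp and the variable lives in package TSDB.
rc="$HOME/.tsdbrc"
# Append the setting only if the file does not mention the variable yet,
# so running this more than once leaves a single definition.
if ! grep -q 'redwoods-treebanker-application' "$rc" 2>/dev/null; then
  cat >> "$rc" <<'EOF'
(setf tsdb::*redwoods-treebanker-application* "answer --annotate --erg")
EOF
fi
```

If ‘.tsdbrc’ already sets the variable, the script leaves the file alone, and the existing line should be edited by hand instead. The same change can also be made for a single session through ‘Trees|Variables|Treebanking Tool’.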

In principle, it should be possible to follow the ‘common’ release procedure for treebank creation, i.e. to request an automatic update immediately after parsing, by adding the option ‘--update/external’ to the ‘parse’ command line. This functionality remains to be validated, though. Items that do not auto-update successfully are currently not flagged, so it will not be possible to select the remaining unannotated items through a TSQL condition.
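Under that procedure, the command from the Parsing section would simply gain one more option. The sketch below wraps it in a shell function, and several details are assumptions. The function name ‘parse_and_update’ is hypothetical, and placing ‘--update/external’ just before the profile name is a guess. As noted above, the procedure itself is not yet validated. The sketch also does not show how the ‘gold’ profile for the update is chosen in this batch setting.

```shell
#!/bin/sh
# Hypothetical helper: parse a profile with ACE (trunk ERG plus TnT, as in the
# parsing example) and request an automatic update through the external
# treebanker straight away.  Unvalidated; see the caveats in the text.
parse_and_update() {
  profile="${1:-cb}"   # profile to parse; defaults to 'cb' as in the example
  # Abort with a message if LOGONROOT is unset, rather than running '/parse'.
  "${LOGONROOT:?LOGONROOT must point to the LOGON tree}/parse" \
    --binary --terg+tnt/ace --protocol 2 --best 1 --limit 0 --count 8 \
    --update/external "$profile"
}
```

Calling ‘parse_and_update cb’ would then parse the ‘cb’ profile and attempt the automatic update in one step.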

Known Issues

Edge identifiers in full ACE derivations (as reported to [incr tsdb()]) are not unique in the context of one input to the parser-generator; this means that ‘classic’ [incr tsdb()] treebanking tools (i.e. tree-based annotation, using syntactic or semantic discriminants) cannot be used on these profiles.

The ACE parser-generator does not (yet) report the token lattices (before and after token mapping, i.e. the tsdb(1) fields ‘p-input’ and ‘p-tokens’) to [incr tsdb()]; both in exporting and post-processing results as well as in regression testing (or cross-platform comparison), precise token information is often desirable.

The exact interpretation of the ‘--best’ and ‘--limit’ parameters remains to be defined for forest-based parsing-generating (unlike in the classic setups, a ‘--limit’ of 0 should probably mean recording no derivations, rather than an unlimited number of them; arguably, this interpretation should also be applied retroactively to n-best parsing-generating).

Resource limit specifications through command-line options to the LOGON ‘parse’ or ‘generate’ scripts (i.e. ‘--time’, ‘--memory’, and ‘--edges’) are not communicated to the ACE parser-generator.

Derivations deposited in [incr tsdb()] profiles upon successful forest disambiguation fail to preserve the token identifiers (from the entries of the original forest), which may not matter to many downstream applications but would be a missing link in the conversion of ERG derivations to PTB-style tokenization.

Use of the Answer treebanking tool for updates from an existing ‘gold’ profile presupposes prior normalization of the ‘gold’ profile, i.e. the treebanker will not honor in-profile versioning (through the ‘t-version’ field in the various relations used to record annotations).

During automated updates, the treebanking tool will not explicitly flag items that remain unannotated, i.e. those for which the update was not successful. [incr tsdb()] provides the flag ‘Trees|Switches|Update Flag Failures’ (on by default, but currently not communicated to the external treebanking tool) to make it easy to select the subset of remaining items that require annotation.

The <Control-G> or <Control-C> key bindings in the [incr tsdb()] podium are not yet enabled during external treebanking. Hence, it is vital to exit from the external treebanking tool by clicking on ‘Exit’ in its browser window (or otherwise making it terminate, e.g. through a shell command like ‘killall fftb’) to regain control in the [incr tsdb()] podium.

LogonAnswer (last edited 2016-05-23 22:07:28 by StephanOepen)

(The DELPH-IN infrastructure is hosted at the University of Oslo)