Background

This is an informal, grass-roots effort (with partial support from the WeSearch project) to work towards an ‘encyclopedia’ of the semantic analyses implemented in the English Resource Grammar (ERG), i.e. what one might call documentation of the downstream interface to the ERG parser, or of the input interface to its generator.

This page and its descendants represent work in progress, for the time being at least, so read on with a grain of salt and at your own risk.

We organize this documentation in terms of what we consider semantic phenomena; the emerging inventory of phenomena is available as the ErgSemantics/Inventory, ordered lexicographically.

Fundamentals

One of our goals in documenting the ERG semantics is to make explicit important differences in our degrees of confidence in individual analyses. In some cases, current semantic analyses reflect a careful design process (possibly building on supporting background literature or revisions of earlier attempts); in other cases, there may be known minor deficiencies; and for yet another group of semantic phenomena, current analyses may be mere placeholders (‘tying things together’ somehow, without a deep commitment to the specifics of the analysis) or plainly broken, i.e. formally not well-formed or otherwise ludicrous.

Discovery Procedure

We developed a discovery procedure which starts from grammar entities (phrase structure rules, lexical rules, and lexical types) in the current version of the ERG to enable a data-driven exploration of semantic phenomena which have received treatments in the ERG to date. The discovery procedure starts by identifying grammar entities which are likely to contribute to the composition of semantic representations that go beyond the basics. In the case of phrase structure rules and lexical rules, we identified all instances with non-empty C-CONT.RELS values. In the case of lexical types, we found all those which exhibit at least one of the following properties:

We plan to also look for (classes of) lexical entries with CARGs and grammar predicates (suggesting lexical decomposition), but this has yet to happen. Each extracted grammar entity is associated with a signature, for now just the (multi-)set of PRED values in the EPs in its RELS list.
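As a minimal sketch of the signature notion described above, the following Python fragment computes the multiset of PRED values for one grammar entity. The representation of EPs as plain dictionaries with a "PRED" key, and the function name signature, are assumptions made for illustration only; they do not reflect the tooling actually used.

```python
from collections import Counter


def signature(rels):
    """Return the multiset of PRED values found in a RELS list.

    Each EP is modelled as a dict with a "PRED" key (a hypothetical
    stand-in for the real grammar data structures).
    """
    return Counter(ep["PRED"] for ep in rels)


# Hypothetical RELS list of an entity contributing two EPs.
rels = [{"PRED": "compound"}, {"PRED": "udef_q"}]
sig = signature(rels)
```

A Counter keeps duplicate predicates apart from single occurrences, which is what distinguishes a multiset from a plain set.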

We then took the grammar entities and created rough clusters based on shared signatures, clustering only within each broad class of grammar entity (phrase structure rules, lexical rules, or lexical types). While it might be more informative to extend the sets of EPs to a more proper semantic signature, this was not done in the first pass.
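The clustering step above might be sketched roughly as follows, assuming that each entity is given as a (name, entity class, PRED list) triple. The entity names and the function cluster_by_signature are hypothetical and serve only to show grouping by shared signature within one entity class.

```python
from collections import defaultdict


def cluster_by_signature(entities):
    """Group entity names by (entity class, signature).

    `entities` is an iterable of (name, entity_class, preds) triples,
    where `preds` lists the PRED values in the entity's RELS list.
    """
    clusters = defaultdict(list)
    for name, entity_class, preds in entities:
        # A sorted tuple ignores order but keeps duplicates, so it
        # serves as a hashable multiset key.
        key = (entity_class, tuple(sorted(preds)))
        clusters[key].append(name)
    return dict(clusters)


# Hypothetical entities; the names are invented for illustration.
entities = [
    ("rule_a", "phrase structure rule", ["compound", "udef_q"]),
    ("rule_b", "phrase structure rule", ["udef_q", "compound"]),
    ("lexrule_c", "lexical rule", ["compound", "udef_q"]),
]
clusters = cluster_by_signature(entities)
```

Because the entity class is part of the key, rule_a and rule_b fall into one cluster while lexrule_c stays separate, despite sharing the same signature.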

Once the grammar entities and clusters were extracted, we indexed the existing collection of Redwoods treebanks (including DeepBank) for all grammar entity types. This enabled us to extract examples for each grammar entity of interest. These lists of examples include the three shortest available across the Redwoods corpora, plus all examples from the MRS test suite. Working from the phrase structure and lexical rule clusters and their associated examples, we produced an initial set of proposed phenomena to document (listed on the inventory page). Among the phenomena, we found a few types of information in the MRS which we believe to be MRS-based encodings of quasi-semantic or para-semantic phenomena. These are listed separately.
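The example selection mentioned above (the three shortest examples, plus all examples from the MRS test suite) could be approximated as in the sketch below. Measuring length in whitespace-separated tokens, the source tags ("mrs", "wsj", "csli"), and the function name select_examples are all assumptions for the purpose of illustration.

```python
def select_examples(items, k=3):
    """Pick the k shortest sentences plus every MRS test suite sentence.

    `items` is a list of (sentence, source) pairs; the source tag "mrs"
    marking MRS test suite items is a hypothetical convention.
    """
    # Length is measured in whitespace-separated tokens (an assumption).
    shortest = sorted(items, key=lambda item: len(item[0].split()))[:k]
    selected = [sentence for sentence, _ in shortest]
    for sentence, source in items:
        if source == "mrs" and sentence not in selected:
            selected.append(sentence)
    return selected


# Hypothetical examples attested for one grammar entity.
items = [
    ("The garden dog barked loudly today.", "wsj"),
    ("Dogs bark.", "mrs"),
    ("It rained.", "csli"),
    ("Kim arrived late.", "wsj"),
    ("The dog arrived.", "mrs"),
]
examples = select_examples(items)
```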

This procedure seems to have been effective for the rules, but less so for the lexical types. We looked through the clusters and examples for lexical types to some extent, and from there were able to extract a non-exhaustive list of phenomena as well as a candidate set of basic components of semantic analyses which should be documented. These, too, are noted on the inventory page.

ERG Semantic Documentation (ESD) Test Suite

One aspect of the documentation produced in this work is a test suite illustrating each identified phenomenon with one or more short, simple sentences, attempting to balance restricted vocabulary size with the clarity of the intended reading of each example. This test suite can be viewed as an extension of the MRS Test Suite.

Semantic Fingerprints

In capturing semantic phenomena (and hopefully also in future work on automated regression testing) we invoke a notion of semantic fingerprints, i.e. characteristics of the MRS configuration that identify the phenomenon. We utilize a compact template language for MRS fingerprints (similar in form to the MRS LaTeX style) that makes the specification of labels and (characterization) links optional, and further allows wild-carding of predicate symbols and role labels (using ‘_’, i.e. just an underscore). For plain N–N compounding, as in garden dog, for example, we take the semantic fingerprint to look something like the following:

  h0:compound[ARG1 x1, ARG2 x2]
  h0:[ARG0 x1]
  [ARG0 x2]

In other words, the phenomenon is characterized by the appearance of the two-place compound relation, which links together two other EPs in the configuration indicated by the shared label h0 (of the compound head and the two-place modifier relation) and the shared referential indices x1 and x2. We do not include the covert quantifier required when the modifier is a non-quantified nominal, nor the =q handle constraint holding between the udef_q and the EP introducing x2 (corresponding to garden in our example). This part of the semantic analysis of the compound construction follows from the analyses of separate phenomena (though ones that are typically co-present with this type of compounding), i.e. general ERG assumptions about the representations of common nouns and quantifiers.
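To make the matching idea concrete, here is a rough Python sketch of how a fingerprint like the one above might be checked against a set of EPs. It is not the actual fingerprint implementation. EPs and patterns are modelled as (label, predicate, roles) triples, with None standing for an omitted label or predicate. Wild-carded role labels are not handled, and the MRS for garden dog is simplified, with invented variable names.

```python
def unify(bindings, pattern_var, actual):
    """Bind a fingerprint variable to an MRS variable, or check consistency."""
    if pattern_var in bindings:
        return bindings if bindings[pattern_var] == actual else None
    return {**bindings, pattern_var: actual}


def match_ep(pattern, ep, bindings):
    """Match one pattern against one EP; return extended bindings or None."""
    p_label, p_pred, p_roles = pattern
    label, pred, roles = ep
    if p_pred is not None and p_pred != pred:
        return None
    if p_label is not None:
        bindings = unify(bindings, p_label, label)
        if bindings is None:
            return None
    for role, var in p_roles.items():
        if role not in roles:
            return None
        bindings = unify(bindings, var, roles[role])
        if bindings is None:
            return None
    return bindings


def match(fingerprint, eps, bindings=None, used=frozenset()):
    """Backtracking search assigning each pattern to a distinct EP."""
    bindings = {} if bindings is None else bindings
    if not fingerprint:
        return bindings
    first, rest = fingerprint[0], fingerprint[1:]
    for i, ep in enumerate(eps):
        if i in used:
            continue
        extended = match_ep(first, ep, bindings)
        if extended is not None:
            result = match(rest, eps, extended, used | {i})
            if result is not None:
                return result
    return None


# The N-N compound fingerprint from the text.
fingerprint = [
    ("h0", "compound", {"ARG1": "x1", "ARG2": "x2"}),
    ("h0", None, {"ARG0": "x1"}),
    (None, None, {"ARG0": "x2"}),
]

# Simplified, illustrative EPs for "garden dog" (variable names invented).
eps = [
    ("h1", "compound", {"ARG0": "e2", "ARG1": "x3", "ARG2": "x4"}),
    ("h5", "udef_q", {"ARG0": "x4", "RSTR": "h6", "BODY": "h7"}),
    ("h8", "_garden_n_1", {"ARG0": "x4"}),
    ("h1", "_dog_n_1", {"ARG0": "x3"}),
]
result = match(fingerprint, eps)
```

In this sketch a successful match returns the variable bindings (here h0 to h1, x1 to x3, and x2 to x4), and None signals that the fingerprint is absent. Note that the last, underspecified pattern is satisfied by any unused EP whose ARG0 is x2, which in this example includes the quantifier.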

How to Cite this Work

Links

ErgSemantics (last edited 2014-07-14 15:28:52 by StephanOepen)
