SEM-I

A SEM-I, or SEMantic-Interface, is a description of the semantic structures output by the grammar, and may include entries for variables, properties, predicates, and roles. SEM-Is can be useful for validating the semantic output of grammars without having to load the entire grammar.

A related, but separate, component is the Variable Property Mapping (VPM), which maps grammar-internal variable types, properties, and property values into grammar-external ones. A SEM-I describes the valid grammar-external values, and hence the primary VPM for a grammar is conventionally called semi.vpm.

Note for Developers

As of March 2016, the 1.0 version of the SEM-I is available, which introduces support for predicate hierarchies among other changes. Previous iterations of SEM-Is were underexploited and are not described in the primary text of this wiki.

Sections

There are four sections in a SEM-I:

variables

Define variable types, their hierarchical relations, and their allowed properties. E.g.:

u.
i < u.
e < i : PERF bool, PROGR bool, MOOD bool, TENSE tense, SF sf.
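
The entries above can be read as a small type hierarchy. The following is a minimal sketch of a subtype check over them, assuming a plain dictionary from each variable type to its parents; the names VAR_PARENTS and is_subtype are illustrative and not part of any SEM-I tool.

```python
# Hypothetical in-memory form of the entries above: type -> list of parents.
VAR_PARENTS = {
    "u": [],
    "i": ["u"],
    "e": ["i"],
}


def is_subtype(child, ancestor, parents=VAR_PARENTS):
    """Return True if `child` equals `ancestor` or inherits from it."""
    if child == ancestor:
        return True
    return any(is_subtype(p, ancestor, parents) for p in parents.get(child, []))


print(is_subtype("e", "u"))  # True, since e < i < u
print(is_subtype("u", "e"))  # False, u is above e
```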

properties

Define allowed property values and value hierarchies. E.g.:

bool.
+ < bool.
- < bool.

roles

Define allowed predicate roles and constraints on values. E.g.:

ARG0 : i.
RSTR : h.
CARG : string.

predicates

Define the predicate hierarchy and predicate synopses (required and optional roles and constraints on role values). E.g.:

_a+little_q < abstract_q : ARG0 i { NUM sg }, RSTR h, BODY h.
_accuse_v_of : ARG0 e, ARG1 i, ARG2 p, [ ARG3 i ].
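
To illustrate how a synopsis might be used for validation, here is a minimal sketch that checks the role names of a predication against the _accuse_v_of entry above. The tuple representation and the function name check_roles are assumptions made for illustration; the types of role values and any property constraints are not checked.

```python
# Hypothetical encoding of: _accuse_v_of : ARG0 e, ARG1 i, ARG2 p, [ ARG3 i ].
# Each tuple is (role, value type, optional?).
ACCUSE_V_OF = [
    ("ARG0", "e", False),
    ("ARG1", "i", False),
    ("ARG2", "p", False),
    ("ARG3", "i", True),
]


def check_roles(synopsis, roles):
    """Return a list of problems found in `roles` (a set of role names)."""
    required = {role for role, _, optional in synopsis if not optional}
    allowed = {role for role, _, _ in synopsis}
    problems = []
    for role in sorted(required - set(roles)):
        problems.append(f"missing required role {role}")
    for role in sorted(set(roles) - allowed):
        problems.append(f"unexpected role {role}")
    return problems


# Leaving out the optional ARG3 is fine; leaving out ARG2 is not.
print(check_roles(ACCUSE_V_OF, {"ARG0", "ARG1", "ARG2"}))
print(check_roles(ACCUSE_V_OF, {"ARG0", "ARG1", "ARG4"}))
```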

Predicate entries may be divided among several files. One file may contain just the hierarchical relations (e.g. hierarchy.smi in the ERG 1214), another the abstract predicates (e.g. abstract.smi), and another the surface predicates (e.g. surface.smi). Some very top-level, perhaps extragrammatical, entries may appear in the main .smi file as well (e.g. erg.smi).

.smi file syntax

The .smi files (e.g. erg.smi, hierarchy.smi, etc.) use a simplified (non-TDL) syntax to characterize notions of inheritance (e.g. specializations of predicates) and appropriateness (e.g. the frame of arguments of each predicate and the value constraints on those arguments). Here is a descriptive example:

; comments begin with semicolons

; sections begin at column 0 and are followed by a colon
variables:
  ; definitions (by convention) are indented
  ; entries end in .
  u.
  ; inheritance is specified by < with supertypes delimited by &
  i < u.
  ; features/properties follow : and are delimited by ,
  x < i & p : DIV bool, IND bool, GEND gender, PERS person, NUM number, PT pt.

predicates:
  ; variable property constraints are bound by { and }, and are delimited by ,
  _acclimitization_n_1 : ARG0 x { NUM sg, IND - }.
  ; optional roles are bound by [ and ] (note that commas appear outside of [ and ])
  _advance_v_1 : ARG0 e, ARG1 i, [ ARG2 i ], [ ARG3 i ].

; external files can be included
; sections in included files are merged with sections in the main file
include: surface.smi

This BNF describes the general syntax (whitespace is allowed around tokens):

SEMI        := (Comment | Section | Include)*

Comment     := /;.*/ EOL
Section     := ("variables" | "properties" | "roles" | "predicates") ":" EOL Contents
Include     := "include" ":" Filename EOL

Contents    := (Comment | Entry)*
Entry       := Identifier Parents? Features? "." EOL

Parents     := "<" Identifier ("&" Identifier)*

Features    := ":" (ReqFeatures ("," OptFeatures)? | OptFeatures)
ReqFeatures := Feature ("," Feature)*
OptFeatures := "[" Feature "]" ("," "[" Feature "]")*
Feature     := Identifier Value
Value       := Identifier Constraints?
Constraints := "{" Identifier Identifier ("," Identifier Identifier)* "}"

Identifier  := /[^ ]+/
EOL         := "\n"

To keep the BNF simple, I didn't specialize the sections, but some paths, such as OptFeatures and Constraints, are only valid on entries in the "predicates" section. Likewise, the values of features on variables must be properties, whereas on predicates they are variables, etc.

Also, we can assume that entries that don't specify a list of parents inherit from some top symbol (like *top*). The symbol string is also available and can be used without being defined.
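
As a rough illustration of the syntax, the following sketch parses a single entry into a name, a list of parents, and a list of features. It is not a reference implementation: the function name parse_entry and the returned tuple layout are assumptions, the top symbol is taken to be *top*, tokens are assumed to be separated by whitespace as in the examples, and entries from the "roles" section (whose value is a bare identifier) are not handled.

```python
import re

# One feature: an optional "[", a name, a value, and an optional { ... } block.
FEATURE_RE = re.compile(
    r"(\[\s*)?([^\s,\[\]{}]+)\s+([^\s,\[\]{}]+)(?:\s*\{([^}]*)\})?"
)


def parse_entry(line):
    """Parse one entry, e.g. 'x < i & p : DIV bool, IND bool.'"""
    text = line.strip()
    if not text.endswith("."):
        raise ValueError("entries must end in '.'")
    # Split the head (name and parents) from the feature list.
    head, _, feats = text[:-1].partition(" : ")
    tokens = head.split()
    name = tokens[0]
    # Entries without explicit parents fall back to the assumed top symbol.
    parents = [t for t in tokens[2:] if t != "&"] or ["*top*"]
    features = []
    for m in FEATURE_RE.finditer(feats):
        props = {}
        for pair in (m.group(4) or "").split(","):
            if pair.strip():
                prop, value = pair.split()
                props[prop] = value
        optional = m.group(1) is not None
        features.append((m.group(2), m.group(3), optional, props))
    return name, parents, features


print(parse_entry("x < i & p : DIV bool, IND bool."))
print(parse_entry("_advance_v_1 : ARG0 e, ARG1 i, [ ARG2 i ], [ ARG3 i ]."))
```

Each feature is returned as (name, value, optional, properties), so an optional role such as ARG2 above comes back with its third element set to True.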

Including Files

The directory of an including file is used as the parent directory of the included file (i.e. the filename is relative). Thus, given the following directory structure:

start.smi
next.smi
subdir/
    a1.smi
    a2.smi

The start.smi file can include next.smi and a1.smi like this:

include: next.smi
include: subdir/a1.smi

And then a1.smi can subsequently include a2.smi like this:

include: a2.smi
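
The relative-path rule can be sketched in a few lines of Python. The helper name resolve_include is hypothetical, and PurePosixPath is used only so the example behaves the same on every platform; a real processor would also open and read the resolved file.

```python
from pathlib import PurePosixPath


def resolve_include(including_file, included_name):
    """Resolve an include relative to the directory of the including file."""
    return PurePosixPath(including_file).parent / included_name


print(resolve_include("start.smi", "next.smi"))       # next.smi
print(resolve_include("start.smi", "subdir/a1.smi"))  # subdir/a1.smi
print(resolve_include("subdir/a1.smi", "a2.smi"))     # subdir/a2.smi
```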

Implementation

Details concerning the implementation of SEM-Is in a grammar processor go here.

Redefined Predicate Hierarchies

When a predicate's hierarchical relationship is redefined (with the < operator), subsequent definitions should completely override previous definitions. This allows users of a grammar to dynamically make changes to a SEM-I (e.g., for use in some application) without having to rewrite the grammar.

For example, with the ERG 1214 release, there is a quantifier hierarchy that has existential_q as a relatively high-level node with many subtypes. In translation, one may use such a type for an underspecified quantifier, but this type may be too broad (e.g. for an MRS about dogs barking, you might get "Dogs bark.", "The dogs bark.", "Those dogs bark.", "Some dogs bark.", "Many a dog barks.", etc.). To restrict the hierarchy so there's a quantifier predicate that only generates "The dogs bark.", "The dog barks.", "A dog barks.", and "Dogs bark.", one could rewrite the hierarchy by including an additional SEM-I file as follows:

predicates:
  def_udef_a_q < existential_q.
  def_explicit_q < def_udef_a_q.
  def_implicit_q < def_udef_a_q.
  udef_q < def_udef_a_q.
  _the_q < def_udef_a_q.
  _a_q < def_udef_a_q.

Here, def_udef_a_q is the supertype that will generate those four sentences.
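
The override behavior could be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the grammar_entries list is an invented stand-in rather than the actual ERG hierarchy. The point is that a later definition replaces a predicate's parents outright instead of being merged with the earlier ones.

```python
def load_hierarchy(entries):
    """Map each predicate to its parents; later entries replace earlier ones."""
    parents = {}
    for pred, supertypes in entries:
        parents[pred] = list(supertypes)  # complete override, not a merge
    return parents


def subsumes(parents, ancestor, pred):
    """Return True if `pred` is `ancestor` or one of its descendants."""
    if pred == ancestor:
        return True
    return any(subsumes(parents, ancestor, p) for p in parents.get(pred, []))


# Invented stand-in for a grammar's own relations (not the real ERG hierarchy).
grammar_entries = [
    ("_the_q", ["existential_q"]),
    ("_a_q", ["existential_q"]),
    ("_some_q", ["existential_q"]),
]
# Relations from the additional SEM-I file shown above.
extra_entries = [
    ("def_udef_a_q", ["existential_q"]),
    ("def_explicit_q", ["def_udef_a_q"]),
    ("def_implicit_q", ["def_udef_a_q"]),
    ("udef_q", ["def_udef_a_q"]),
    ("_the_q", ["def_udef_a_q"]),
    ("_a_q", ["def_udef_a_q"]),
]

hierarchy = load_hierarchy(grammar_entries + extra_entries)
print(hierarchy["_the_q"])                             # ['def_udef_a_q']
print(subsumes(hierarchy, "def_udef_a_q", "_some_q"))  # False
```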

Proposal based on discussion at the Abbey on 2013-08-28

* Predicate hierarchies done

* Linking preds that differ by sense (e.g. number of arguments, like "he ate" vs "he ate a banana"), or mass/count distinctions ("every paper" vs "all the paper"). This is not trying to recreate something like WordNet.

SemiRfc (last edited 2017-03-24 18:36:24 by MichaelGoodman)
