Request For Comments: MRS
Overview
Minimal Recursion Semantics (MRS; see Copestake et al., 2005) is a framework for computational semantics characterised by a flat structure (hence the "minimal recursion"). It allows for underspecification, so true scopal ambiguities can be left ambiguous, or fully specified if needed. This RFC aims to be a reference document for developers writing code to process MRSs. See below for the formal properties of MRS objects. For ways to represent MRS objects textually, see the serialization formats section.
Unless otherwise noted, most information below is adapted from Copestake et al. (2005). The reader is referred to this paper for more information on the theory of MRS.
Here is an example in the Simple-MRS serialization of the sentence "The road rises from there."
[ LTOP: h1
  INDEX: e2 [ e SF: PROP TENSE: PRES MOOD: INDICATIVE PROG: - PERF: - ]
  RELS: < [ _the_q_rel<0:3> LBL: h3 ARG0: x5 [ x PERS: 3 NUM: SG IND: + ] RSTR: h6 BODY: h4 ]
          [ "_road_n_1_rel"<4:8> LBL: h7 ARG0: x5 ]
          [ "_rise_v_1_rel"<9:14> LBL: h8 ARG0: e2 ARG1: x5 ]
          [ _from_p_dir_rel<15:19> LBL: h8 ARG0: e9 [ e SF: PROP TENSE: UNTENSED MOOD: INDICATIVE PROG: - PERF: - ] ARG1: e2 ARG2: x10 [ x PERS: 3 NUM: SG ] ]
          [ place_n_rel<20:26> LBL: h11 ARG0: x10 ]
          [ def_implicit_q_rel<20:26> LBL: h12 ARG0: x10 RSTR: h13 BODY: h14 ]
          [ _there_a_1_rel<20:26> LBL: h11 ARG0: e15 [ e SF: PROP TENSE: UNTENSED MOOD: INDICATIVE PROG: - PERF: - ] ARG1: x10 ] >
  HCONS: < h6 qeq h7 h13 qeq h11 > ]
Formal Properties
As defined by Copestake et al. 2005, MRS objects are partly composed of Elementary Predications (EPs), which are defined as the following 4-tuple:
EP : < h, p, a, s >
Where:
- h is the handle, or label, of the EP
- p is the relation, or "predicate"
- a is a list of 0 or more variable arguments of the relation
- s is a list of 0 or more scopal arguments of the relation
An MRS structure is the following triple:
MRS : < gt, r, c >
Where:
- gt is the top handle (the label of the highest EP)
- r is a bag of EPs
- c is a bag of handle constraints
Modern usage of MRS, however, introduces several other elements that were not discussed in Copestake et al. 2005, leading to these expanded definitions:
EP : < h, p, a, s, c >
Where:
- c is a constant value (e.g. a string, for numbers, names, etc.)
MRS : < gt, ind, r, c, i >
Where:
- ind is an index (i.e. the top individual as opposed to the top handle)
- i is a bag of individual constraints (i.e. ICONS)
The additional elements are explained below.
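The expanded tuples can be mirrored directly in code. The following is a minimal sketch, not a reference implementation: the class and field names (`EP`, `MRS`, `args`, `scargs`, `carg`) are illustrative assumptions, and storing arguments as dictionaries keyed by role name is a convenience rather than part of the formal definition, which only speaks of lists.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class EP:
    """Elementary predication < h, p, a, s, c > (field names are assumptions)."""
    label: str                                            # h: handle (label)
    pred: str                                             # p: relation / predicate
    args: Dict[str, str] = field(default_factory=dict)    # a: variable arguments
    scargs: Dict[str, str] = field(default_factory=dict)  # s: scopal arguments
    carg: Optional[str] = None                            # c: constant value, if any


@dataclass
class MRS:
    """MRS structure < gt, ind, r, c, i > (field names are assumptions)."""
    top: str                                              # gt: top handle
    index: Optional[str] = None                           # ind: top individual
    rels: List[EP] = field(default_factory=list)          # r: bag of EPs
    hcons: List[Tuple[str, str, str]] = field(default_factory=list)  # c: (hi, reln, lo)
    icons: List[Tuple[str, str, str]] = field(default_factory=list)  # i: (var, reln, var)


# Two EPs taken from the "The road rises from there." example above.
the_q = EP(label="h3", pred="_the_q_rel",
           args={"ARG0": "x5"}, scargs={"RSTR": "h6", "BODY": "h4"})
road = EP(label="h7", pred="_road_n_1_rel", args={"ARG0": "x5"})
m = MRS(top="h1", index="e2", rels=[the_q, road], hcons=[("h6", "qeq", "h7")])
```

Keeping `carg` and `icons` optional or empty by default reflects the remarks above that most predications have no constant argument and most grammars do not yet use ICONS.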
Constant Value
Proper names, like "Kim" or "IBM", do not always get their own predicate, but rather their name value is a constant string on a general-purpose named_rel. The same thing happens with numbers and the generic card_rel. The value of a constant argument is not a variable, handle, or hole, but just a string that is not reentrant to the MRS graph. Most kinds of predications do not include a constant argument.
Note for Developers
The XML format for MRS clearly distinguishes <constant> and <var> argument values, but other serializations do not make it as clear. If one has access to grammar definition files, then there are definitions for the argument name for a constant argument (*value-feats* for the LKB and mrs-value-feat for PET). These definitions suggest that a predication may contain no more than one constant argument. Without such definitions, a solution may be to look for argument values that are quoted, or simply ones that don't look like variables, and treat those as constant values.
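The fallback heuristic described in this note could be sketched as follows. This is an assumption-laden illustration: the function names are hypothetical, and the variable pattern is borrowed from the `Variable` symbol of the SimpleMRS BNF in this document. An unquoted constant that happens to be variable-shaped (for example `IBM2`) would be misclassified, which is why grammar definitions are preferable when available.

```python
import re

# Same shape as the Variable symbol in the SimpleMRS BNF: letters, then digits.
VARIABLE_RE = re.compile(r'[A-Za-z][-A-Za-z]*\d+')


def looks_like_constant(value: str) -> bool:
    """Heuristic: quoted values, or values not shaped like variables, are constants."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return True
    return VARIABLE_RE.fullmatch(value) is None


def split_arguments(args):
    """Partition a role -> value mapping into (variables, constants)."""
    variables, constants = {}, {}
    for role, value in args.items():
        target = constants if looks_like_constant(value) else variables
        target[role] = value
    return variables, constants


variables, constants = split_arguments({"ARG0": "x5", "CARG": '"Kim"'})
```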
INDEX
There is some debate about the status of INDEX. It is not part of the formal definition of a complete MRS (see Copestake et al., 2005), hence Ann has at times argued it should be suppressed when constructing an MRS from its TFS description. On that point of view, INDEX is an element of the composition process, but not the 'final' product.
Conversely, it has been argued (by Dan and Francis, among others) that composition does not stop at the utterance level, i.e. if we were to move into discourse-level analysis, we might still need access to INDEX. Furthermore, in semantic transfer it is often convenient to have access to the INDEX (even more so as the current ERG leaves the TOP underspecified). In conclusion, as of mid-2011, I believe INDEX can be considered a legitimate component of MRSs, even though it remains true that it has a slightly different formal status than the others.
Individual Constraints (ICONS)
Where handle constraints encode relations between holes and labels, individual constraints encode relations between individual (referential-index or eventuality) variables. One use of ICONS is for encoding Information Structure (see Song and Bender, 2012). Individual Constraints are supported by most processors of MRS, but, being relatively new, are not yet used by most grammars.
Note for Developers
Unlike handle constraints, individual constraints may use variables that are not instantiated on a predication (e.g., for a dropped or contextual predication). Furthermore, they do not always co-occur with any scopal or non-scopal arguments.
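One practical consequence is that a processor should not assume every ICONS variable can be found on some predication. The sketch below lists such variables; the function name, the data layout (one role-to-variable mapping per EP, ICONS as triples), and the relation names `topic` and `focus` are hypothetical and used only for illustration.

```python
def icons_only_variables(ep_args, icons):
    """Return, sorted, the variables that occur in ICONS but on no predication."""
    instantiated = {var for args in ep_args for var in args.values()}
    mentioned = {var for left, _reln, right in icons for var in (left, right)}
    return sorted(mentioned - instantiated)


# Hypothetical data: x9 stands for a dropped argument with no EP of its own.
ep_args = [{"ARG0": "e2", "ARG1": "x5"}, {"ARG0": "x5"}]
icons = [("e2", "topic", "x5"), ("e2", "focus", "x9")]
unbound = icons_only_variables(ep_args, icons)
```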
General Remarks
Predicate names are not case-sensitive, but constants (CARGs) are. Furthermore, much current MRS manipulation software maintains a distinction between double-quoted predicate names (corresponding to Lisp strings) and non-quoted ones (corresponding to Lisp symbols, often naming types in some hierarchy), but this distinction is not meaningful either and arguably should be suppressed in MRS input and output. More information is available at PredicateRfc.
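A comparison that follows these remarks might look like the sketch below. The helper names are assumptions, and the normalization is deliberately limited to the two points made above (quoting and case); it does not try to add or remove a `_rel` suffix.

```python
def normalize_pred(pred: str) -> str:
    """Drop surrounding double quotes and fold case, per the remarks above."""
    pred = pred.strip()
    if len(pred) >= 2 and pred.startswith('"') and pred.endswith('"'):
        pred = pred[1:-1]
    return pred.lower()


def same_pred(a: str, b: str) -> bool:
    """Compare two predicate names, ignoring quoting and case."""
    return normalize_pred(a) == normalize_pred(b)


def same_carg(a: str, b: str) -> bool:
    """Constants (CARGs) are case-sensitive, so compare them verbatim."""
    return a == b
```

For example, `same_pred('"_road_n_1_rel"', '_ROAD_N_1_REL')` holds, while `same_carg("Kim", "kim")` does not.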
Serialization Formats
Simple MRS
This is a BNF for the commonly used v1.0 form of SimpleMRS:
MRS := "[" Top Index Rels Hcons "]"
Top := "LTOP" ":" Handle
Index := "INDEX" ":" Var
Handle := Variable
Var := Variable VarProps?
Variable := /[A-Za-z][-A-Za-z]*\d+/
VarProps := "[" VarSort ExtraPair* "]"
VarSort := /[A-Za-z][-A-Za-z]*/
ExtraPair := Path ":" Value
Path := /[A-Za-z]\w+/
Value := Token | QuotedString
Token := /[^:\]>\s]+/
QuotedString := /"[^"\\]*(?:\\.[^"\\]*)*"/
Rels := "RELS" ":" "<" EP* ">"
EP := "[" Pred Lnk? Label Rarg* Carg? "]"
Pred := StringPred | TypePred
StringPred := QuotedString
TypePred := /_?([^_\s]+_)*(_rel)?/
Lnk := "<" /-?\d+/ ":" /-?\d+/ ">"
Label := "LBL" ":" Handle
Rarg := RargName ":" Var
RargName := Token
Carg := "CARG" ":" Value
Hcons := "HCONS" ":" "<" Hcon* ">"
Hcon := Var HconReln Handle
HconReln := "qeq" | "lheq" | "outscopes"
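Two of the smaller productions, `Lnk` and `Hcons`, translate almost directly into regular expressions. The sketch below is not a full SimpleMRS parser: the function names are assumptions, and for brevity it ignores the optional `VarProps` that the BNF allows on the first variable of an `Hcon`.

```python
import re

VARIABLE = r'[A-Za-z][-A-Za-z]*\d+'  # Variable symbol from the BNF above
LNK_RE = re.compile(r'<\s*(-?\d+)\s*:\s*(-?\d+)\s*>')
HCONS_RE = re.compile(r'HCONS\s*:\s*<(.*?)>', re.DOTALL)
HCON_RE = re.compile(rf'\b({VARIABLE})\s+(qeq|lheq|outscopes)\s+({VARIABLE})\b')


def parse_lnk(text):
    """Parse a character-span Lnk such as '<0:3>' into (start, end)."""
    match = LNK_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f'not a character-span Lnk: {text!r}')
    return int(match.group(1)), int(match.group(2))


def parse_hcons(text):
    """Return (hi, relation, lo) triples from the HCONS list found in text."""
    match = HCONS_RE.search(text)
    if match is None:
        raise ValueError('no HCONS list found')
    return [m.groups() for m in HCON_RE.finditer(match.group(1))]


# The HCONS line of the "The road rises from there." example.
constraints = parse_hcons('HCONS: < h6 qeq h7 h13 qeq h11 > ]')
```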
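The newer v1.1 format redefines some symbols to allow SimpleMRS to encode nearly all the same information as the XML format, including MRS-level Lnk values and surface strings. It also changes "LTOP" to "TOP" and includes ICONS. The following are just the redefined and additional symbols from the v1.0 BNF.
MRS := "[" Lnk? Surface? Top Index Rels Hcons Icons? "]"
Top := "TOP" ":" Handle
EP := "[" Pred Lnk? Surface? Label Rarg* Carg? "]"
Surface := QuotedString
Icons := "ICONS" ":" "<" Icon ">"
Icon := Var IconReln Var
IconReln := Token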
Other notes
The BNFs above are strict in some places but permissive in others, and they leave out some possible variations:
- Variable, VarSort, and Path (variable property key) are perhaps too strict
- Token, StringPred, and TypePred are perhaps too permissive
- IconReln and RargName (and even Path) could be more specifically defined, but they depend on the grammar
- Lnk only includes the character-span lnk type, but there are 3 other (uncommon and less-supported) types: chart-span, edge-number, and token-list
XML MRS (aka MRX)
This RelaxNG compact schema defines the XML format for MRS. Note that this version includes ICONS, while the original DTD does not.
start = MrsList
MrsList = element mrs-list { Mrs* }
Mrs = element mrs {
  Label,
  Var,
  (EP|Hcons|Icons)*,
  attribute cfrom { xsd:int }?,
  attribute cto { xsd:int }?,
  attribute surface { text }?,
  attribute ident { text }?
}
EP = element ep {
  (Pred|SPred|RealPred),
  Label,
  FVPair*,
  attribute cfrom { xsd:int }?,
  attribute cto { xsd:int }?,
  attribute surface { text }?,
  attribute base { text }?
}
Pred = element pred { text }
SPred = element spred { text }
RealPred = element realpred {
  attribute lemma { text },
  attribute pos { xsd:string { pattern="[nvajrscpqxud]" } },
  attribute sense { text }?
}
Label = element label {
  ExtraPair*,
  attribute vid { xsd:int }
}
Var = element var {
  ExtraPair*,
  attribute vid { xsd:int },
  attribute sort { text }
}
ExtraPair = element extrapair {
  element path { text },
  element value { text }
}
FVPair = element fvpair {
  element rargname { text },
  ( Var | element constant { text } )
}
Hcons = element hcons {
  element hi { Var },
  element lo { ( Label | Var ) },
  attribute hreln { "qeq" | "lheq" | "outscopes" }
}
Icons = element icons {
  element iarg1 { Var },
  element iarg2 { Var },
  attribute ireln { text }
}
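Because the XML format keeps `<var>` and `<constant>` values apart, reading it requires no guessing about constants. The sketch below uses Python's standard `xml.etree.ElementTree` on a small hand-written sample that follows the schema above. Several choices are assumptions made for illustration: the function names, rendering a variable as sort plus vid (`x4`), rendering a label as `h` plus vid, and joining `realpred` attributes with underscores.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<mrs-list>
  <mrs cfrom="0" cto="3">
    <label vid="1"/>
    <var vid="2" sort="e"/>
    <ep cfrom="0" cto="3">
      <pred>named_rel</pred>
      <label vid="3"/>
      <fvpair><rargname>ARG0</rargname><var vid="4" sort="x"/></fvpair>
      <fvpair><rargname>CARG</rargname><constant>Kim</constant></fvpair>
    </ep>
  </mrs>
</mrs-list>"""


def pred_string(ep):
    """Return the predicate of an <ep>, whichever of the three forms it uses."""
    for tag in ('pred', 'spred'):
        node = ep.find(tag)
        if node is not None:
            return (node.text or '').strip()
    real = ep.find('realpred')
    if real is None:
        raise ValueError('ep has no pred, spred, or realpred child')
    parts = [real.get('lemma'), real.get('pos'), real.get('sense')]
    return '_' + '_'.join(p for p in parts if p)  # assumed rendering


def read_eps(xml_text):
    """Collect each EP's predicate, label, variable args, and constant args."""
    eps = []
    for ep in ET.fromstring(xml_text).iter('ep'):
        variables, constants = {}, {}
        for fv in ep.findall('fvpair'):
            role = fv.findtext('rargname').strip()
            var = fv.find('var')
            if var is not None:
                variables[role] = f"{var.get('sort')}{var.get('vid')}"
            else:
                constants[role] = fv.findtext('constant')
        eps.append({'pred': pred_string(ep),
                    'label': 'h' + ep.find('label').get('vid'),
                    'variables': variables,
                    'constants': constants})
    return eps


eps = read_eps(SAMPLE)
```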
Indexed MRS
Mostly for historical interest, here is an incomplete BNF for the Indexed MRS format, adapted from comments in the LKB source:
MRS := "<" Ltop "," Index "," Rels "," Hcons ">"
Ltop := Var
Index := Var
Rels := "{" ( Rel ( "," Rel )* )? "}"
Hcons := "{" ( Qeq ( "," Qeq )* )? "}"
Rel := VarName ":" PredName "(" Arg ( "," Arg )* ")"
Arg := Var | QuotedString | ConstName
Var := VarName (":" ConstName)*
Qeq := VarName RelnName VarName