Request For Comments: MRS
Overview
Minimal Recursion Semantics (MRS; see Copestake et al., 2005) is a framework for computational semantics characterised by a flat structure (hence the "minimal recursion"). It allows for underspecification, so that genuine scope ambiguities can be left unresolved, or fully specified if needed. This RFC aims to be a reference document for developers writing code to process MRSs. See below for the formal properties of MRS objects. For ways to represent MRS objects textually, see the serialization formats section.
Unless otherwise noted, most information below is adapted from Copestake et al. (2005). The reader is referred to this paper for more information on the theory of MRS.
Here is an example in the Simple-MRS serialization of the sentence "The road rises from there."
[ LTOP: h1
  INDEX: e2 [ e SF: PROP TENSE: PRES MOOD: INDICATIVE PROG: - PERF: - ]
  RELS: < [ _the_q_rel<0:3> LBL: h3 ARG0: x5 [ x PERS: 3 NUM: SG IND: + ] RSTR: h6 BODY: h4 ]
          [ "_road_n_1_rel"<4:8> LBL: h7 ARG0: x5 ]
          [ "_rise_v_1_rel"<9:14> LBL: h8 ARG0: e2 ARG1: x5 ]
          [ _from_p_dir_rel<15:19> LBL: h8 ARG0: e9 [ e SF: PROP TENSE: UNTENSED MOOD: INDICATIVE PROG: - PERF: - ] ARG1: e2 ARG2: x10 [ x PERS: 3 NUM: SG ] ]
          [ place_n_rel<20:26> LBL: h11 ARG0: x10 ]
          [ def_implicit_q_rel<20:26> LBL: h12 ARG0: x10 RSTR: h13 BODY: h14 ]
          [ _there_a_1_rel<20:26> LBL: h11 ARG0: e15 [ e SF: PROP TENSE: UNTENSED MOOD: INDICATIVE PROG: - PERF: - ] ARG1: x10 ] >
  HCONS: < h6 qeq h7 h13 qeq h11 > ]
Formal Properties
MRS objects are partly composed of Elementary Predications (EPs), each of which is defined as the following 4-tuple:
EP : < h, r, a*, s* >
Where:
- h is the handle, or label, of the EP
- r is the relation
- a* is a list of 0 or more variable arguments of the relation
- s* is a list of 0 or more scopal arguments of the relation
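As a rough illustration, the 4-tuple could be held in a small Python record. This is only a sketch: the class name `EP`, its field names, and the choice to key arguments by role name (rather than keeping plain lists) are assumptions made here, not part of the formal definition.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class EP:
    """Elementary predication < h, r, a*, s* > (hypothetical representation)."""
    label: str                                                   # h: handle (label) of the EP
    relation: str                                                # r: the relation
    args: Dict[str, str] = field(default_factory=dict)           # a*: role name -> variable
    scopal_args: Dict[str, str] = field(default_factory=dict)    # s*: role name -> handle


# Two EPs taken from the Simple-MRS example in the overview.
the_q = EP("h3", "_the_q_rel", {"ARG0": "x5"}, {"RSTR": "h6", "BODY": "h4"})
rise = EP("h8", "_rise_v_1_rel", {"ARG0": "e2", "ARG1": "x5"})
```

Separating `args` from `scopal_args` mirrors the a*/s* split in the definition; a consumer that only needs role/value pairs could merge the two dictionaries.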
An MRS structure is the following triple:
MRS : < gt, r, c >
Where:
- gt is the top handle (the label of the highest EP)
- r is a bag of EPs
- c is a bag of handle constraints
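The triple can be sketched in the same way. In the hypothetical code below, the bags are modelled as Python lists (so duplicates are preserved), EPs are reduced to `(label, relation)` pairs to keep the example short, and `dangling_qeqs` is an illustrative sanity check rather than part of the formal definition: it assumes that the lower handle of a qeq should be the label of some EP.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

EPStub = Tuple[str, str]       # (label, relation); stands in for a full EP
HCons = Tuple[str, str, str]   # (higher handle, relation, lower handle)


@dataclass
class MRS:
    """MRS structure < gt, r, c > (hypothetical representation)."""
    top: str                                            # gt: the top handle
    rels: List[EPStub] = field(default_factory=list)    # r: bag of EPs
    hcons: List[HCons] = field(default_factory=list)    # c: bag of handle constraints

    def labels(self) -> Set[str]:
        """Return the set of handles used as EP labels."""
        return {label for label, _ in self.rels}

    def dangling_qeqs(self) -> List[HCons]:
        """Return qeq constraints whose lower handle labels no EP."""
        known = self.labels()
        return [hc for hc in self.hcons if hc[1] == "qeq" and hc[2] not in known]


# Labels, relations and constraints from the example in the overview.
road_mrs = MRS(
    top="h1",
    rels=[("h3", "_the_q_rel"), ("h7", "_road_n_1_rel"), ("h8", "_rise_v_1_rel"),
          ("h8", "_from_p_dir_rel"), ("h11", "place_n_rel"),
          ("h12", "def_implicit_q_rel"), ("h11", "_there_a_1_rel")],
    hcons=[("h6", "qeq", "h7"), ("h13", "qeq", "h11")],
)
```

Note that two EPs may share a label (here `h8` and `h11`), which is why `rels` is a list rather than a mapping keyed by label.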
INDEX
MRS structures have one further property: INDEX. Its status is debated. It is not part of the formal definition of a complete MRS (see Copestake et al., 2005), and Ann has at times argued that it should therefore be suppressed when constructing an MRS from its TFS description. On that view, INDEX is an element of the composition process, but not of the 'final' product.
Conversely, it has been argued (by Dan and Francis, among others) that composition does not stop at the utterance level: if we were to move into discourse-level analysis, we might still need access to INDEX. Furthermore, in semantic transfer it is often convenient to have access to INDEX (even more so as the current ERG leaves TOP underspecified). In conclusion, as of mid-2011, I believe INDEX can be considered a legitimate component of MRSs, even though its formal status remains slightly different from that of the other components.
General Remarks
Predicate names are not case-sensitive, but constants (CARGs) are. Furthermore, although much current MRS manipulation software maintains a distinction between double-quoted predicate names (corresponding to Lisp strings) and non-quoted ones (corresponding to Lisp symbols, often naming types in some hierarchy), this distinction is not meaningful either and arguably should be suppressed in MRS input and output.
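One way to apply both remarks when comparing predicates is to normalise names before comparison. The helpers below are a minimal sketch; the function names `normalize_pred` and `same_pred` are hypothetical, and lower-casing is just one possible canonical form. CARG values should not be passed through this function, since constants remain case-sensitive.

```python
def normalize_pred(pred: str) -> str:
    """Strip one pair of surrounding double quotes and lower-case the name."""
    if len(pred) >= 2 and pred.startswith('"') and pred.endswith('"'):
        pred = pred[1:-1]
    return pred.lower()


def same_pred(a: str, b: str) -> bool:
    """Compare two predicate names, ignoring case and quoting."""
    return normalize_pred(a) == normalize_pred(b)
```

With this, `"_road_n_1_rel"` (quoted) and `_ROAD_N_1_REL` (unquoted, upper case) compare as equal.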
Variable properties on HCONS variables
The DTD seemingly allows variable properties on any var or label, including those that appear in HCONS. Is this truly possible, or encouraged?
Serialization Formats
Simple MRS
<mrs> ::= "[" "LTOP:" <var> "INDEX:" <var> [ "[" <vtype> <proplist> "]" ] "RELS:" "<" <eplist> ">" "HCONS:" "<" <hconss> ">" "]"
<var> ::= /[A-Za-z][^\d\s]*\d+/
<vtype> ::= /[A-Za-z][^\d\s]*/
<proplist> ::= (<prop> <val>)*
<prop> ::= /[^\s]+:/
<val> ::= /[^\s]+/
<eplist> ::= <ep>*
<ep> ::= "[" <pred> <roles> "]"
<pred> ::= (<realpred>|<grammarpred>)[<span>]
<realpred> ::= /^"_((\\")|[^"]*)*_rel"/
<grammarpred> ::= /\w+_rel/
<span> ::= /<(-)?\d+:(-)?\d+>/
<roles> ::= <role>*
<role> ::= "LBL:" <var> | "CARG:" <string> | <rolename> <var> ["[" <vtype> <proplist> "]"]
<rolename> ::= /[^\s]+:/
<string> ::= "string" | <starredstring> | <quotedstring>
<starredstring> ::= /\*\w*\*/
<quotedstring> ::= /"((\\")|[^"]*)*"/
<hconss> ::= <hcons>*
<hcons> ::= <var> <relation> <var>
<relation> ::= "qeq" | "lheq" | "outscopes"
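The terminal patterns above map directly onto regular expressions. The sketch below covers only `<var>`, `<span>` and `<hconss>`, and assumes that tokens inside the HCONS list are separated by whitespace; the function names are hypothetical, and this is not a complete Simple MRS parser.

```python
import re

# <var>, with capture groups added to split the type prefix from the numeric id
VAR = re.compile(r"([A-Za-z][^\d\s]*)(\d+)")
# <span>, with capture groups for the two offsets
SPAN = re.compile(r"<(-?\d+):(-?\d+)>")
# <relation>
RELATIONS = {"qeq", "lheq", "outscopes"}


def split_var(var):
    """Split a variable such as 'h13' into ('h', 13)."""
    m = VAR.fullmatch(var)
    if m is None:
        raise ValueError(f"not a valid <var>: {var!r}")
    return m.group(1), int(m.group(2))


def parse_span(span):
    """Parse a span such as '<0:3>' into (0, 3)."""
    m = SPAN.fullmatch(span)
    if m is None:
        raise ValueError(f"not a valid <span>: {span!r}")
    return int(m.group(1)), int(m.group(2))


def parse_hconss(text):
    """Parse the contents of HCONS: < ... > into (var, relation, var) triples."""
    tokens = text.split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected a sequence of <var> <relation> <var>")
    triples = []
    for i in range(0, len(tokens), 3):
        hi, rel, lo = tokens[i:i + 3]
        split_var(hi)  # raises ValueError if hi is not a <var>
        split_var(lo)  # raises ValueError if lo is not a <var>
        if rel not in RELATIONS:
            raise ValueError(f"unknown <relation>: {rel!r}")
        triples.append((hi, rel, lo))
    return triples
```

For instance, `parse_hconss("h6 qeq h7 h13 qeq h11")` yields the two qeq constraints of the example in the overview.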
Indexed MRS
<mrs> ::= "<" <ltop> "," <index> "," <rels> "," <hcons> ">"
<ltop> ::= <var>
<index> ::= <var>
<rels> ::= "{" "}" | "{" [<rel> ","]* <rel> "}"
<rel> ::= <varname>":"<predname>"("[<arg>","]* <arg>")"
<arg> ::= <var> | <string> | <constname>
<var> ::= <varname>[":"<constname>]*
<hcons> ::= "{" "}" | "{" [<qeq> ","]* <qeq> "}"
<qeq> ::= <varname> <relnname> <varname>
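A serializer following these productions might look like the sketch below. Several points are assumptions: the grammar leaves `<varname>`, `<predname>`, `<constname>` and `<relnname>` undefined, so the code passes strings through unchecked; it emits no optional whitespace around commas and braces; and the function names are hypothetical.

```python
def format_rel(label, pred, args):
    """Format one <rel>; the grammar requires at least one <arg>."""
    if not args:
        raise ValueError("a <rel> needs at least one argument")
    return f"{label}:{pred}({','.join(args)})"


def format_indexed(ltop, index, rels, hcons):
    """Format an Indexed MRS string.

    rels  -- iterable of (label, pred, [arg, ...]) triples
    hcons -- iterable of (var, relation, var) triples
    """
    rels_s = "{" + ",".join(format_rel(*rel) for rel in rels) + "}"
    hcons_s = "{" + ",".join(f"{hi} {rel} {lo}" for hi, rel, lo in hcons) + "}"
    return f"<{ltop},{index},{rels_s},{hcons_s}>"


text = format_indexed(
    "h1", "e2",
    [("h7", "_road_n_1_rel", ["x5"])],
    [("h6", "qeq", "h7")],
)
# text == "<h1,e2,{h7:_road_n_1_rel(x5)},{h6 qeq h7}>"
```

Empty `rels` or `hcons` produce `{}`, matching the first alternative of the `<rels>` and `<hcons>` productions.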