Type Description Language and other aspects of DELPH-IN Joint Reference Formalism
Case Sensitivity
Case Sensitive
Things inside quotes (NB: strings passed from the TFS world into MRS can be treated as case insensitive in MRS processing, i.e., as predicate symbols, but not CARGs)
Case Insensitive
- Everything in TDL not inside of quotes.
- Lexicon look-up.
- Proper names?
- Acronyms?
- Approach proper names and acronyms with token mapping (preserve the case information, then downcase anyway)
Unknown
- Orthographic subrules (agree: case sensitive, ACE: [intended] case insensitive)
Notes: Arguments for case insensitivity include shouting (all caps); arguments for case sensitivity include the use of upper-case vowels in vowel-harmony languages (linguistic representations, not orthography).
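As a rough illustration of the "case insensitive outside of quotes" convention, the sketch below lowercases TDL text while leaving double-quoted strings untouched. It reuses the DQString pattern from the TDL File Syntax section; the function name `fold_case` is hypothetical, and the sketch assumes that comments contain no unbalanced quote characters.

```python
import re

# DQString pattern from the grammar (without the trailing Spacing).
DQSTRING = re.compile(r'"(?:[^"\\]|\\.)*"', re.DOTALL)

def fold_case(tdl):
    """Lowercase everything outside double-quoted strings (hypothetical helper)."""
    out, last = [], 0
    for m in DQSTRING.finditer(tdl):
        out.append(tdl[last:m.start()].lower())  # unquoted text: fold case
        out.append(m.group(0))                   # quoted text: keep as written
        last = m.end()
    out.append(tdl[last:].lower())
    return ''.join(out)

print(fold_case('N_-_C_LE := N_INTR_LEX_ENTRY & [ STEM < "Dog" > ].'))
```

Whether a processor folds to lower or upper case internally is an implementation detail; only the treatment of quoted material matters here.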
Doc Strings
TDL types allow a doc string:
n_-_c_le := n_intr_lex_entry & "Intransitive count noun (icn) <ex>The dog barked. <nex>Much dog bark.".
TDL File Syntax
# File Contents

TdlTypeFile  := ( TypeDef | Spacing )* EOF
TdlRuleFile  := ( LexRuleDef | MorphSet | Spacing )* EOF

# Types and Lexical Rules

TypeDef      := Type ( AvmDef | AvmAddendum ) Dot
AvmDef       := DefOp DefBody
AvmAddendum  := AddOp ( DefBody
                      | DocString? Conjunction
                      | DocString )
LexRuleDef   := LexRuleId DefOp Affix? DefBody Dot
DefBody      := Supertypes ( And DocString? Conjunction | DocString? )
Supertypes   := Type ( And Type )*
Type         := Identifier Spacing
LexRuleId    := Identifier Spacing
DocString    := DQString
Conjunction  := Term ( And Term )*
Term         := ( Type
                | FeatureTerm
                | DiffList
                | ConsList
                | Coreference
                | DQString
                | QSymbol
                | Regex
                )
FeatureTerm  := LBrack AttrVals? RBrack
AttrVals     := AttrVal ( Comma AttrVal )*
AttrVal      := Attribute ( Dot Attribute )* Conjunction
Attribute    := Identifier Spacing
DiffList     := DLOpen Conjunctions? DLClose
ConsList     := CLOpen ( Conjunctions ConsEnd? )? CLClose
ConsEnd      := Comma Ellipsis | Dot Conjunction
Conjunctions := Conjunction ( Comma Conjunction )*
Coreference  := "#" Identifier Spacing

# Letter-sets, Wild-cards, and Affixes

MorphSet     := "%" "(" ( LetterSetDef | WildCardDef ) ")"
LetterSetDef := "letter-set" Space? "(" LetterSetVar Space LetterSet ")"
WildCardDef  := "wild-card" Space? "(" WildCardVar Space LetterSet ")"
LetterSetVar := /![^ ]/
WildCardVar  := /\?[^ ]/
LetterSet    := /([^)\\]|\\.)+/
Affix        := AffixClass AffixPattern+ Spacing
AffixClass   := "%prefix" | "%suffix"
AffixPattern := Space? "(" ( NullChar | CharList ) Space CharList ")"
CharList     := ( LetterSetVar | WildCardVar | AffixChar )+
NullChar     := "*"
AffixChar    := /([^!?\s*\\]|\\[^ ])+/

# Whitespace and Comments

Spacing      := Space? Comment*
Space        := /\s+/
Comment      := ( LineComment | BlockComment ) Space?
LineComment  := /;.*$/
BlockComment := "#|" /([^|\\]|\\.|\|[^#])*/ "|#"

# Literals

DefOp        := ":=" Spacing
AddOp        := ":+" Spacing
Identifier   := /[^\s.:<=&,#[]$()>!^\/]+/
Dot          := "." Spacing
And          := "&" Spacing
Comma        := "," Spacing
LBrack       := "[" Spacing
RBrack       := "]" Spacing
DLOpen       := "<!" Spacing
DLClose      := "!>" Spacing
CLOpen       := "<" Spacing
CLClose      := ">" Spacing
Ellipsis     := "..." Spacing
DQString     := /"([^"\\]|\\.)*"/ Spacing
QSymbol      := "'" Identifier Spacing
Regex        := "^" /([^$\\]|\\.)*/ "$"
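To make the comment and string productions above more concrete, here is a minimal Python sketch that removes LineComment and BlockComment spans while leaving DQString literals intact. The names `TOKEN` and `strip_comments` are hypothetical, and the sketch deliberately ignores Regex terms, letter-sets, and affix patterns, where `;` or `#|` might not start a comment.

```python
import re

# Patterns transcribed from the productions above (without trailing Spacing).
DQSTRING = r'"(?:[^"\\]|\\.)*"'
BLOCK_COMMENT = r'#\|(?:[^|\\]|\\.|\|[^#])*\|#'
LINE_COMMENT = r';[^\n]*'  # same effect as /;.*$/ applied line by line

# Strings are listed first so that a ";" inside a string is never
# mistaken for the start of a comment.
TOKEN = re.compile(
    f'(?P<string>{DQSTRING})|(?P<comment>{BLOCK_COMMENT}|{LINE_COMMENT})',
    re.DOTALL,
)

def strip_comments(text):
    """Replace each comment with one space; keep strings as written."""
    def repl(m):
        return m.group(0) if m.group('string') is not None else ' '
    return TOKEN.sub(repl, text)

sample = 'a := b & "x ; y". ; trailing\n#| block\ncomment |# c := d.'
print(strip_comments(sample).split())
```

Scanning strings and comments in a single alternation is one simple way to approximate the grammar's behavior; a full parser would instead consume Spacing at the positions where the productions allow it.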
Docstring Revision
Currently docstrings are regular strings that appear before a Term in a TypeDef, presumably after the list of supertypes:
type := supertype1 & supertype2 & "Docstring" [ ... ].
But this syntax is not supported in all processors (namely PET), and the others allow variations. At the 2018 summit in Paris (see DiderotSchedule), there was a decision to distinguish docstrings from other strings by using triple-quotes (three double-quotes in a row, similar to Python), which additionally allows quotes to appear inside the docstring.
type := supertype1 & supertype2 & """Docstring""" [ ... ].
This changed the DocString production like so:
DocString := /"""([^"\\]|\\.|"[^"]|""[^"])*"""/ Spacing
Note that an unescaped quote cannot appear directly before the closing triple-quotes. More precisely, it can, but the string would then be terminated one character early and an extra quote character would be left in the stream.
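The behavior of the triple-quoted DocString pattern, including the early-termination caveat, can be checked with a short Python sketch. The regular expression is transcribed from the production above (minus the trailing Spacing); the helper name `match_docstring` is hypothetical.

```python
import re

# DocString pattern from the revised production (without trailing Spacing).
DOCSTRING = re.compile(r'"""([^"\\]|\\.|"[^"]|""[^"])*"""', re.DOTALL)

def match_docstring(text, pos=0):
    """Return the docstring literal starting at pos, or None if there is none."""
    m = DOCSTRING.match(text, pos)
    return m.group(0) if m else None

# Quotes may appear inside the docstring.
print(match_docstring('"""say "hi" now""" [ ... ].'))

# A quote directly before the closing delimiter ends the match early:
# the first three of the four final quotes close the docstring.
text = '"""ends with "quote""""'
literal = match_docstring(text)
print(literal)              # the matched literal
print(text[len(literal):])  # the leftover quote character
```

Escaping the final quote with a backslash avoids the problem, since the `\\.` alternative consumes the escaped character.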
There are remaining questions about their placement.
Option 1: Placed before any Term with multiple docstrings per type allowed
Where multiple docstrings occur, the type's final docstring is the concatenation of them.
Example:
type := """here""" supertype1 & """here""" supertype2 & """here, too""" [ ... ] """maybe here?""".
This can be implemented by changing the following productions:
TypeDef     := Type ( AvmDef | AvmAddendum ) DocString? Dot  # maybe
LexRuleDef  := LexRuleId DefOp Affix? DefBody DocString? Dot  # maybe
AvmAddendum := AddOp ( DefBody | Conjunction | DocString )
DefBody     := Supertypes ( And Conjunction )?
Supertypes  := DocString? Type ( And DocString? Type )*
Term        := DocString? ( Type
                          | FeatureTerm
                          | DiffList
                          | ConsList
                          | Coreference
                          | DQString
                          | QSymbol
                          | Regex
                          )
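The concatenation behavior described for Option 1 might look like the following Python sketch, which gathers every triple-quoted docstring in a single definition and joins them. The function name `collect_docstring` and the `sep` parameter are hypothetical; the text above only says the docstrings are concatenated, so the newline separator is an assumption.

```python
import re

# Triple-quoted DocString pattern, with the content captured as group 1.
DOCSTRING = re.compile(r'"""((?:[^"\\]|\\.|"[^"]|""[^"])*)"""', re.DOTALL)

def collect_docstring(definition, sep='\n'):
    """Join all docstrings found in one definition; None if there are none.

    The separator is an assumption: the proposal does not say how the
    individual docstrings should be joined.
    """
    parts = [m.group(1) for m in DOCSTRING.finditer(definition)]
    return sep.join(parts) if parts else None

example = ('type := """here""" supertype1 & """here""" supertype2 & '
           '"""here, too""" [ ... ] """maybe here?""".')
print(collect_docstring(example, sep=' | '))
```

A real implementation would collect docstrings during parsing rather than by scanning the raw text, which would also keep triple-quotes inside comments from being picked up.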
Option 2: Placed before any Term with only one docstring per type allowed
Example:
type := supertype1 & """just one, somewhere""" supertype2 & [ ... ].
This is more complicated to describe as production rules (need to duplicate several productions; some for use before docstring is encountered, then others for use after), but the implementation may be simple (just set a flag after reading a docstring).
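The flag-based implementation mentioned for Option 2 could be sketched as below: the scanner remembers whether a docstring has already been read and rejects a second one. The names `single_docstring` and `MultipleDocstrings` are hypothetical, and the sketch works on the raw text of one definition rather than inside a full parser.

```python
import re

# Triple-quoted DocString pattern (without trailing Spacing).
DOCSTRING = re.compile(r'"""(?:[^"\\]|\\.|"[^"]|""[^"])*"""', re.DOTALL)

class MultipleDocstrings(ValueError):
    """Raised when a definition contains more than one docstring."""

def single_docstring(definition):
    """Return the content of the only docstring in a definition, or None."""
    seen = False  # the flag: set once the first docstring has been read
    doc = None
    for m in DOCSTRING.finditer(definition):
        if seen:
            raise MultipleDocstrings('second docstring at offset %d' % m.start())
        seen = True
        doc = m.group(0)[3:-3]  # strip the triple-quote delimiters
    return doc

print(single_docstring(
    'type := supertype1 & """just one, somewhere""" supertype2 & [ ... ].'))
```

The flag keeps the parser itself unchanged from Option 1; only the check on the number of docstrings differs.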
Option 3: Once, after the list of supertypes and before any feature list
Example:
type := supertype1 & supertype2 & """just one, here""" [ ... ].
type2 := supertype1 & """what about this?""" [ ... ] & supertype2.
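This is not hard to implement. If the docstring only needs to appear after *a* list of supertypes (both examples above), the grammar is the same as in the full production list above (but other supertypes could appear after a feature list, for instance). If one wants to ensure that all supertypes appear before any docstring or feature list (only the first example above), then the Conjunction and Term productions need to be duplicated to disallow Types at the top level. If that is desired, it would look like this:
AvmAddendum := AddOp ( DefBody | DocString? NoTypeConj | DocString )
DefBody     := Supertypes DocString? ( And NoTypeConj )?
NoTypeConj  := NoTypeTerm ( And NoTypeTerm )*
NoTypeTerm  := ( FeatureTerm | DiffList | ConsList | Coreference | DQString | QSymbol | Regex )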
Option 4: Once, immediately after the typedef or addendum operators
Example:
type := """just one, here""" supertype1 & supertype2 & [ ... ].
type := """
        example with
        multiple lines
        """
        supertype1 & supertype2 & [ ... ].
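This is the simplest to implement, and the DefBody and Supertypes productions would be unnecessary (unless we still want supertypes to appear first):
AvmDef      := DefOp DocString? Conjunction
AvmAddendum := AddOp ( DocString? Conjunction | DocString )
LexRuleDef  := LexRuleId DefOp DocString? Affix? Conjunction Dot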
Some people previously disliked this option for aesthetic reasons, though that is subjective.
Questions
1. The ^ character is used to signal "expanded-syntax" in the LKB, but is this only used for regular expressions? Are there other expanded syntaxes? Do non-LKB processors support them?
2. Are instances distinguishable from types? Are they (or other entities) restricted to having exactly one supertype?
3. When supertypes are required (e.g., on a TypeDef), must they appear before other Terms in the Conjunction? (see Docstring Revision above)
4. Should the (deprecated or repurposed) subtype operator (:<) be included in the syntax description?
5. Is variation allowed with regard to the position of docstrings? (see Docstring Revision above)
6. Are spaces allowed inside a feature path? Comments?
type := supertype &
  [ ATTR1 .  ; comment here?
    ATTR2 value ].
For that matter, are comments allowed anywhere that whitespace is (except maybe letter-sets and lex-rule affix patterns)?