Type Description Language and other aspects of DELPH-IN Joint Reference Formalism
Case Sensitivity
Case Sensitive
- Things inside quotes. NB: strings passed from the TFS world into MRS can be treated as case insensitive in MRS processing (i.e., as predicate symbols, but not CARGs).
Case Insensitive
- Everything in TDL not inside of quotes.
- Lexicon look-up.
- Proper names?
- Acronyms?
- Proper names and acronyms could be approached with token mapping (preserve the case information, then downcase anyway).
Unknown
- Orthographic subrules (agree: case sensitive; ACE: [intended] case insensitive)
Notes: Arguments for case insensitivity include shouting (all caps). Arguments for case sensitivity include the use of upper-case vowels in vowel-harmony languages (linguistic representations, not orthography).
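The "everything not inside of quotes is case insensitive" rule can be pictured as a normalization pass over the TDL text. The following is only a minimal Python sketch under that reading, not a description of how the LKB, ACE, or agree behave. The name `fold_case` is hypothetical, and the sketch ignores comments (a quote character inside a comment would confuse it).

```python
import re

# Double-quoted string with backslash escapes, mirroring the DQString
# pattern from the syntax description (without the trailing Spacing).
DQ_STRING = re.compile(r'"(?:[^"\\]|\\.)*"')

def fold_case(tdl: str) -> str:
    """Downcase everything outside double-quoted strings (hypothetical helper)."""
    out = []
    pos = 0
    for m in DQ_STRING.finditer(tdl):
        out.append(tdl[pos:m.start()].lower())  # unquoted text: fold case
        out.append(m.group(0))                  # quoted text: keep as written
        pos = m.end()
    out.append(tdl[pos:].lower())               # trailing unquoted text
    return "".join(out)

print(fold_case('N_-_C_LE := N_Intr_Lex_Entry & [ ORTH "Dog" ].'))
# prints: n_-_c_le := n_intr_lex_entry & [ orth "Dog" ].
```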
Doc Strings
TDL types allow a doc string:
n_-_c_le := n_intr_lex_entry & "Intransitive count noun (icn) <ex>The dog barked. <nex>Much dog bark.".
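As an illustration of what a tool might do with such a doc string, here is a small Python sketch that splits it into a description and its tagged examples. It assumes that `<ex>` introduces a grammatical example and `<nex>` an ungrammatical one; that reading is inferred from the example above and is not stated on this page. The function name `split_docstring` is hypothetical.

```python
import re

# Assumed tags: <ex> for examples, <nex> for negative examples.
TAG = re.compile(r"<(ex|nex)>")

def split_docstring(doc: str) -> dict:
    """Split a doc string into its description and tagged examples."""
    # re.split with a capturing group keeps the tag names in the result:
    # [description, tag1, text1, tag2, text2, ...]
    parts = TAG.split(doc)
    result = {"description": parts[0].strip(), "ex": [], "nex": []}
    for tag, text in zip(parts[1::2], parts[2::2]):
        result[tag].append(text.strip())
    return result

doc = "Intransitive count noun (icn) <ex>The dog barked. <nex>Much dog bark."
info = split_docstring(doc)
print(info["description"])  # prints: Intransitive count noun (icn)
print(info["ex"])           # prints: ['The dog barked.']
print(info["nex"])          # prints: ['Much dog bark.']
```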
TDL File Syntax
# File Contents

TdlTypeFile  := ( TypeDef | Spacing )* EOF
TdlRuleFile  := ( LexRuleDef | MorphSet | Spacing )* EOF

# Types and Lexical Rules

TypeDef      := Type ( AvmDef | AvmAddendum )
AvmDef       := DefOp DefBody
AvmAddendum  := AddOp ( DefBody
                      | DocString? Conjunction
                      | DocString )
LexRuleDef   := Type DefOp Affix? DefBody
DefBody      := Supertypes ( And DocString? Conjunction | DocString? )
Supertypes   := Type ( And Type )*
Type         := Identifier Spacing
DocString    := DQString
Conjunction  := Term ( And Term )*
Term         := ( Type
                | FeatureTerm
                | DiffList
                | ConsList
                | Coreference
                | DQString
                | QSymbol
                | Regex
                )
FeatureTerm  := LBrack AttrVals? RBrack
AttrVals     := AttrVal ( Comma AttrVal )*
AttrVal      := Attribute ( Dot Attribute )* Conjunction
Attribute    := Identifier
DiffList     := DLOpen Conjunctions? DLClose
ConsList     := CLOpen ( Conjunctions ConsEnd? )? CLClose
ConsEnd      := Comma Ellipsis | Dot Conjunction
Conjunctions := Conjunction ( Comma Conjunction )*
Coreference  := "#" Identifier Spacing

# Letter-sets, Wild-cards, and Affixes

MorphSet     := "%" "(" ( LetterSetDef | WildCardDef ) ")"
LetterSetDef := "letter-set" Space? "(" LetterSetVar Space LetterSet ")"
WildCardDef  := "wild-card" Space? "(" WildCardVar Space LetterSet ")"
LetterSetVar := /![^ ]/
WildCardVar  := /\?[^ ]/
LetterSet    := /([^)\\]|\\.)+/
Affix        := AffixClass AffixPattern+ Spacing
AffixClass   := "%prefix" | "%suffix"
AffixPattern := Space? "(" AffixSubPat Space AffixSubPat ")"
AffixSubPat  := LetterSetVar | WildCardVar | AffixNull | AffixChar
AffixNull    := "*"
AffixChar    := /([^!?\s*\\]|\\[^ ])+/

# Whitespace and Comments

Spacing      := Space? Comment*
Space        := /\s+/
Comment      := ( LineComment | BlockComment ) Space?
LineComment  := /;.*$/
BlockComment := "#|" /([^|\\]|\\.|\|[^#])*/ "|#"

# Literals

DefOp        := ":=" Spacing
AddOp        := ":+" Spacing
Identifier   := /[^\s.:<=&,#\[\]$()>!^\/]+/
Dot          := "." Spacing
And          := "&" Spacing
Comma        := "," Spacing
LBrack       := "[" Spacing
RBrack       := "]" Spacing
DLOpen       := "<!" Spacing
DLClose      := "!>" Spacing
CLOpen       := "<" Spacing
CLClose      := ">" Spacing
Ellipsis     := "..." Spacing
DQString     := /"([^"\\]|\\.)*"/ Spacing
QSymbol      := "'" Identifier Spacing
Regex        := "^" /([^$\\]|\\.)*/ "$"
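To make the Literals and Whitespace rules concrete, the following Python sketch turns them into a small tokenizer for type files. It is an illustration under several assumptions, not a reference implementation: the token names and the `tokenize` function are hypothetical, the square brackets in the Identifier class are escaped for Python's `re`, `;[^\n]*` stands in for `/;.*$/`, and letter-sets and affix patterns are not handled at all.

```python
import re

# Identifier character class from the syntax description; brackets are
# escaped because Python's re would otherwise end the class early.
IDENT = r"[^\s.:<=&,#\[\]$()>!^/]+"

# Order matters: longer or more specific patterns come first.
TOKEN_SPEC = [
    ("BLOCK_COMMENT", r"#\|(?:[^|\\]|\\.|\|[^#])*\|#"),
    ("LINE_COMMENT",  r";[^\n]*"),
    ("SPACE",         r"\s+"),
    ("DQSTRING",      r'"(?:[^"\\]|\\.)*"'),
    ("REGEX",         r"\^(?:[^$\\]|\\.)*\$"),
    ("DEFOP",         r":="),
    ("ADDOP",         r":\+"),
    ("DLOPEN",        r"<!"),
    ("DLCLOSE",       r"!>"),
    ("ELLIPSIS",      r"\.\.\."),
    ("DOT",           r"\."),
    ("AND",           r"&"),
    ("COMMA",         r","),
    ("LBRACK",        r"\["),
    ("RBRACK",        r"\]"),
    ("CLOPEN",        r"<"),
    ("CLCLOSE",       r">"),
    ("COREF",         "#" + IDENT),
    ("QSYMBOL",       "'" + IDENT),
    ("IDENTIFIER",    IDENT),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,  # let escaped newlines inside strings and comments match
)
SKIP = {"BLOCK_COMMENT", "LINE_COMMENT", "SPACE"}  # the Spacing rule

def tokenize(text):
    """Return (kind, lexeme) pairs, dropping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup not in SKIP:
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

for kind, lexeme in tokenize('n_-_c_le := n_intr_lex_entry & "Intransitive count noun".'):
    print(kind, lexeme)
```

Because the sketch discards comments wherever they occur, it sidesteps the question of where comments are actually permitted; a stricter tokenizer would have to attach Spacing only to the rules that list it.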
Questions
1. The ^ character is used to signal "expanded-syntax" in the LKB, but is this only used for regular expressions? Are there other expanded syntaxes? Do non-LKB processors support them?
2. Are instances distinguishable from types? Are they (or other entities) restricted to having exactly one supertype?
3. When supertypes are required (e.g., on a TypeDef), must they appear before other Terms in the Conjunction?
4. Should the (deprecated or repurposed) subtype operator (:<) be included in the syntax description?
5. Is variation allowed with regard to the position of docstrings?
6. Are spaces allowed inside a feature path? Comments?
    type := supertype &
      [ ATTR1 . ; comment here?
        ATTR2 value ];
For that matter, are comments allowed anywhere that whitespace is allowed (except maybe in letter-sets and lex-rule affix patterns)?
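Question 6 can at least be checked against the syntax description as it is written on this page. The Python sketch below assembles a feature-path pattern from the `Attribute ( Dot Attribute )*` part of AttrVal, with `Dot := "." Spacing`. It reflects only a literal reading of those rules and says nothing about what the LKB, ACE, agree, or other processors accept; the name `is_feature_path` is hypothetical.

```python
import re

IDENT = r"[^\s.:<=&,#\[\]$()>!^/]+"           # Identifier (brackets escaped)
LINE_COMMENT = r";[^\n]*"                      # stands in for /;.*$/
BLOCK_COMMENT = r"#\|(?:[^|\\]|\\.|\|[^#])*\|#"
COMMENT = rf"(?:(?:{LINE_COMMENT}|{BLOCK_COMMENT})\s*)"  # Comment := (...) Space?
SPACING = rf"(?:\s*{COMMENT}*)"                # Spacing := Space? Comment*
DOT = rf"\.{SPACING}"                          # Dot := "." Spacing

# Attribute ( Dot Attribute )*, where Attribute := Identifier (no Spacing)
PATH = re.compile(rf"{IDENT}(?:{DOT}{IDENT})*", re.DOTALL)

def is_feature_path(text: str) -> bool:
    """True if text is a feature path under a literal reading of the rules."""
    return PATH.fullmatch(text) is not None

print(is_feature_path("ATTR1.ATTR2"))                         # prints: True
print(is_feature_path("ATTR1. ; comment here?\n    ATTR2"))   # prints: True
print(is_feature_path("ATTR1 . ATTR2"))                       # prints: False
```

Read this way, whitespace and comments are permitted after the dot (because Dot carries Spacing) but not before it (because Attribute is a bare Identifier). Whether that asymmetry is intended is part of what the question asks.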
Discussions
- ParisDefeasibleConstraints
- StanfordDefaults
- (Diff)List Appends in TDL: http://www.delph-in.net/2017/append.pdf