Type Description Language and other aspects of DELPH-IN Joint Reference Formalism
Case Sensitivity
Case Sensitive
- Things inside quotes. (NB: strings passed from the TFS world into MRS can be treated as case insensitive in MRS processing, i.e., as predicate symbols, but not as CARGs.)
Case Insensitive
- Everything in TDL not inside of quotes.
- Lexicon look-up.
- Proper names?
- Acronyms?
- ... approach these with token mapping (preserve the information, then downcase anyway)
Unknown
- Orthographic subrules (agree: case sensitive, ACE: [intended] case insensitive)
Notes: Arguments for case insensitivity include shouting (all caps); arguments for case sensitivity include the use of upper-case vowels in vowel harmony languages (linguistic representations, not orthography).
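The split above (quoted material is case sensitive, everything else is not) can be illustrated with a small Python sketch. It is not taken from any DELPH-IN processor: the helper name `normalize_case` is hypothetical, the string pattern is borrowed from the DQString rule further down this page, and the sketch assumes no quote characters occur inside comments and ignores orthographic subrules, whose treatment is listed as unknown.

```python
import re

# Double-quoted string, following the DQString pattern on this page.
DQ_STRING = re.compile(r'"(?:[^"\\]|\\.)*"')


def normalize_case(tdl: str) -> str:
    """Downcase everything outside double-quoted strings.

    Hypothetical helper for illustration only; it does not look at
    comments, letter-sets, or affix patterns.
    """
    pieces = []
    pos = 0
    for match in DQ_STRING.finditer(tdl):
        pieces.append(tdl[pos:match.start()].lower())  # unquoted: fold case
        pieces.append(match.group(0))                  # quoted: keep as written
        pos = match.end()
    pieces.append(tdl[pos:].lower())
    return "".join(pieces)


if __name__ == "__main__":
    print(normalize_case('N_-_C_LE := N_Intr & [ ORTH "Dog" ].'))
```

Under these assumptions, type and feature names are folded to lower case while the quoted string keeps its capitalization.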
Doc Strings
TDL types allow a doc string:
n_-_c_le := n_intr_lex_entry & "Intransitive count noun (icn) <ex>The dog barked. <nex>Much dog bark.".
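As a rough illustration of where the doc string sits in such a definition, the following Python sketch pulls the type name, the supertypes, and the doc string out of the start of a definition. The function name `split_type_head` is hypothetical, and the identifier pattern is a simplification of the Identifier rule on this page: the double quote is additionally excluded (an assumption) so that a doc string is not read as a supertype name.

```python
import re

# Identifier characters: modelled on the page's Identifier class, with brackets
# escaped for Python and '"' also excluded (assumption, see lead-in).
IDENT = r'[^\s.:<=&,#\[\]$()>!^/"]+'
DQSTRING = r'"(?:[^"\\]|\\.)*"'

TYPE_HEAD = re.compile(
    rf'\s*(?P<name>{IDENT})\s*:=\s*'
    rf'(?P<supers>{IDENT}(?:\s*&\s*{IDENT})*)'
    rf'\s*(?:&\s*)?(?P<doc>{DQSTRING})?'
)


def split_type_head(tdl):
    """Return (type name, supertypes, doc string or None), or None if no match."""
    match = TYPE_HEAD.match(tdl)
    if match is None:
        return None
    doc = match.group("doc")
    supers = re.split(r"\s*&\s*", match.group("supers"))
    # Strip the surrounding quotes; escapes inside the string are left as-is.
    return match.group("name"), supers, doc[1:-1] if doc else None


if __name__ == "__main__":
    example = ('n_-_c_le := n_intr_lex_entry & "Intransitive count noun (icn) '
               '<ex>The dog barked. <nex>Much dog bark.".')
    print(split_type_head(example))
```

The sketch only handles a doc string that directly follows the supertypes; whether other positions are allowed is one of the open questions below.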
TDL File Syntax
# File Contents

TdlTypeFile  := ( TypeDef | Spacing )* EOF
TdlRuleFile  := ( LexRuleDef | MorphSet | Spacing )* EOF

# Types and Lexical Rules

TypeDef      := Type ( AvmDef | AvmAddendum )
AvmDef       := DefOp DefBody
AvmAddendum  := AddOp ( DefBody
                      | DocString? Conjunction
                      | DocString )
LexRuleDef   := Type DefOp Affix? DefBody
DefBody      := Supertypes ( And DocString? Conjunction | DocString? )
Supertypes   := Type ( And Type )*
Type         := Identifier Spacing
DocString    := DQString
Conjunction  := Term ( And Term )*
Term         := ( Type
                | FeatureTerm
                | DiffList
                | ConsList
                | Coreference
                | DQString
                | QSymbol
                | Regex
                )
FeatureTerm  := LBrack AttrVals? RBrack
AttrVals     := AttrVal ( Comma AttrVal )*
AttrVal      := Attribute ( Dot Attribute )* Conjunction
Attribute    := Identifier Spacing
DiffList     := DLOpen Conjunctions? DLClose
ConsList     := CLOpen ( Conjunctions ConsEnd? )? CLClose
ConsEnd      := Comma Ellipsis | Dot Conjunction
Conjunctions := Conjunction ( Comma Conjunction )*
Coreference  := "#" Identifier Spacing

# Letter-sets, Wild-cards, and Affixes

MorphSet     := "%" "(" ( LetterSetDef | WildCardDef ) ")"
LetterSetDef := "letter-set" Space? "(" LetterSetVar Space LetterSet ")"
WildCardDef  := "wild-card" Space? "(" WildCardVar Space LetterSet ")"
LetterSetVar := /![^ ]/
WildCardVar  := /\?[^ ]/
LetterSet    := /([^)\\]|\\.)+/
Affix        := AffixClass AffixPattern+ Spacing
AffixClass   := "%prefix" | "%suffix"
AffixPattern := Space? "(" ( NullChar | CharList ) Space CharList ")"
CharList     := ( LetterSetVar | WildCardVar | AffixChar )+
NullChar     := "*"
AffixChar    := /([^!?\s*\\]|\\[^ ])+/

# Whitespace and Comments

Spacing      := Space? Comment*
Space        := /\s+/
Comment      := ( LineComment | BlockComment ) Space?
LineComment  := /;.*$/
BlockComment := "#|" /([^|\\]|\\.|\|[^#])*/ "|#"

# Literals

DefOp        := ":=" Spacing
AddOp        := ":+" Spacing
Identifier   := /[^\s.:<=&,#\[\]$()>!^\/]+/
Dot          := "." Spacing
And          := "&" Spacing
Comma        := "," Spacing
LBrack       := "[" Spacing
RBrack       := "]" Spacing
DLOpen       := "<!" Spacing
DLClose      := "!>" Spacing
CLOpen       := "<" Spacing
CLClose      := ">" Spacing
Ellipsis     := "..." Spacing
DQString     := /"([^"\\]|\\.)*"/ Spacing
QSymbol      := "'" Identifier Spacing
Regex        := "^" /([^$\\]|\\.)*/ "$"
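To make the MorphSet rules more concrete, here is a minimal Python sketch that transliterates LetterSetDef and WildCardDef into regular expressions and collects the declared variables. The function names `read_morph_sets` and `unescape` are hypothetical, and resolving backslash escapes by simply dropping the backslash is an assumption about how the escapes in LetterSet are meant to be read.

```python
import re

# "%" "(" "letter-set" Space? "(" LetterSetVar Space LetterSet ")" ")"
LETTER_SET_DEF = re.compile(
    r'%\(letter-set\s*\((![^ ])\s+((?:[^)\\]|\\.)+)\)\)'
)
# "%" "(" "wild-card" Space? "(" WildCardVar Space LetterSet ")" ")"
WILD_CARD_DEF = re.compile(
    r'%\(wild-card\s*\((\?[^ ])\s+((?:[^)\\]|\\.)+)\)\)'
)


def unescape(chars):
    """Drop the backslash from each escaped character (assumed semantics)."""
    return re.sub(r'\\(.)', r'\1', chars)


def read_morph_sets(text):
    """Map each letter-set or wild-card variable to its characters."""
    table = {}
    for pattern in (LETTER_SET_DEF, WILD_CARD_DEF):
        for var, chars in pattern.findall(text):
            table[var] = unescape(chars)
    return table


if __name__ == "__main__":
    rules = '%(letter-set (!c bdfglmnprstz))\n%(wild-card (?v aeiou))'
    print(read_morph_sets(rules))
```

The sketch does not skip comments, so a definition that appears inside a comment would also be picked up.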
Questions
1. The ^ character is used to signal "expanded-syntax" in the LKB, but is this only used for regular expressions? Are there other expanded syntaxes? Do non-LKB processors support them?
2. Are instances distinguishable from types? Are they (or other entities) restricted to having exactly one supertype?
3. When supertypes are required (e.g., on a TypeDef), must they appear before other Terms in the Conjunction?
4. Should the (deprecated or repurposed) subtype operator (:<) be included in the syntax description?
5. Is variation allowed with regard to the position of docstrings?
6. Are spaces allowed inside a feature path? Comments?
type := supertype & [ ATTR1 .   ; comment here?
                      ATTR2 value ];
For that matter, are comments allowed anywhere that whitespace is (except maybe letter-sets and lex-rule affix patterns)?
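As far as the draft grammar on this page goes, Dot and Attribute are each followed by Spacing, and Spacing includes comments, so the grammar itself would accept spaces and comments inside a feature path. Whether the LKB, ACE, or agree actually behave this way is not settled here. The Python sketch below transliterates those rules to check what the grammar permits; the function name `parse_path` is hypothetical, and LineComment is approximated as running to the end of the line.

```python
import re

SPACE = r'\s+'
LINE_COMMENT = r';[^\n]*'                        # approximates /;.*$/
BLOCK_COMMENT = r'#\|(?:[^|\\]|\\.|\|[^#])*\|#'
COMMENT = rf'(?:(?:{LINE_COMMENT}|{BLOCK_COMMENT})(?:{SPACE})?)'
SPACING = rf'(?:(?:{SPACE})?{COMMENT}*)'
# Identifier class from this page, with brackets escaped for Python's re.
IDENT = r'[^\s.:<=&,#\[\]$()>!^/]+'

ATTRIBUTE = re.compile(rf'({IDENT}){SPACING}')   # Attribute := Identifier Spacing
DOT = re.compile(rf'\.{SPACING}')                # Dot := "." Spacing


def parse_path(text):
    """Return the attributes of a leading feature path: Attribute ( Dot Attribute )*."""
    attrs = []
    first = ATTRIBUTE.match(text)
    if first is None:
        return attrs
    attrs.append(first.group(1))
    pos = first.end()
    while True:
        dot = DOT.match(text, pos)
        if dot is None:
            break
        attr = ATTRIBUTE.match(text, dot.end())
        if attr is None:
            break
        attrs.append(attr.group(1))
        pos = attr.end()
    return attrs


if __name__ == "__main__":
    print(parse_path('ATTR1 .   ; comment here?\n ATTR2 value'))
```

Under this reading, the example in question 6 yields the path ATTR1.ATTR2, with the comment treated as spacing after the dot.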