

Discussions

  • ParisDefeasibleConstraints
  • StanfordDefaults
  • (Diff)List Appends in TDL (http://www.delph-in.net/2017/append.pdf)

Type Description Language and other aspects of DELPH-IN Joint Reference Formalism

Case Sensitivity

Case Sensitive

  • Things inside quotes (NB: strings passed from the TFS world into MRS can be treated as case-insensitive in MRS processing, i.e., as predicate symbols, but not as CARGs)

Case Insensitive

  • Everything in TDL not inside of quotes.
  • Lexicon look-up.
    • Proper names?
    • Acronyms?
  • ... approach these with token mapping (preserve the original information, then downcase anyway)

Unknown

  • Orthographic subrules (agree: case-sensitive; ACE: [intended to be] case-insensitive)

Notes: Arguments for case insensitivity include shouting (all caps); arguments for case sensitivity include the use of uppercase vowels in vowel-harmony languages (in linguistic representations, not orthography).
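A minimal sketch of the split described above, assuming a tool simply lowercases every token that is not a double-quoted string before comparing it. Real processors may normalize at a different stage (for example, when reading the file), and the names `normalize_symbol` and `same_symbol` are hypothetical, not part of any DELPH-IN processor.

```python
def normalize_symbol(token: str) -> str:
    """Normalize a TDL token for comparison (hypothetical helper).

    Double-quoted strings are case-sensitive and are returned unchanged;
    every other token (type names, features, coreference tags) is folded
    to lowercase so that, e.g., NOUN and noun compare equal.
    """
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return token
    return token.lower()


def same_symbol(a: str, b: str) -> bool:
    """True if two tokens denote the same symbol under these rules."""
    return normalize_symbol(a) == normalize_symbol(b)


if __name__ == "__main__":
    print(same_symbol("n_-_c_le", "N_-_C_LE"))  # True: unquoted, case-insensitive
    print(same_symbol('"Dog"', '"dog"'))        # False: quoted, case-sensitive
```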

Doc Strings

TDL types allow a doc string:

n_-_c_le := n_intr_lex_entry &
"Intransitive count noun (icn)    
<ex>The dog barked.
<nex>Much dog bark.".
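One way a tool might pull the doc string out of a definition like the one above, sketched with Python's `re` module. It assumes the simple shape shown here: one or more supertypes, then `&`, then a double-quoted string ending the definition. It is not a general TDL reader (a real one would follow the grammar in the TDL File Syntax section), and `extract_docstring` is a hypothetical name.

```python
import re

# Type := Super1 & ... & "doc string" .
# Only definitions whose body is supertypes followed by a doc string are
# covered; feature structures and other terms are not handled.
DEF_WITH_DOC = re.compile(
    r'^\s*(?P<type>[^\s:]+)\s*:=\s*'   # type name and the := operator
    r'(?P<supers>[^"]+?)\s*&\s*'       # supertypes joined by &
    r'"(?P<doc>(?:[^"\\]|\\.)*)"'      # the double-quoted doc string
    r'\s*\.\s*$',                      # the terminating period
    re.DOTALL,
)


def extract_docstring(definition):
    """Return (type, supertypes, doc string), or None if there is no match."""
    m = DEF_WITH_DOC.match(definition)
    if m is None:
        return None
    supers = [s.strip() for s in m.group("supers").split("&")]
    return m.group("type"), supers, m.group("doc")


EXAMPLE = '''n_-_c_le := n_intr_lex_entry &
"Intransitive count noun (icn)
<ex>The dog barked.
<nex>Much dog bark.".'''

if __name__ == "__main__":
    name, supers, doc = extract_docstring(EXAMPLE)
    print(name)    # n_-_c_le
    print(supers)  # ['n_intr_lex_entry']
    print(doc)
```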

TDL File Syntax

    # File Contents

    TdlTypeFile  := ( TypeDef | Spacing )* EOF
    TdlRuleFile  := ( LexRuleDef | MorphSet | Spacing )* EOF

    # Types and Lexical Rules

    TypeDef      := Type ( AvmDef | AvmAddendum ) Dot
    AvmDef       := DefOp DefBody
    AvmAddendum  := AddOp ( DefBody
                          | DocString? Conjunction
                          | DocString )
    LexRuleDef   := Type DefOp Affix? DefBody Dot
    DefBody      := Supertypes ( And DocString? Conjunction | DocString? )
    Supertypes   := Type ( And Type )*
    Type         := Identifier Spacing
    DocString    := DQString
    Conjunction  := Term ( And Term )*
    Term         := ( Type
                    | FeatureTerm
                    | DiffList
                    | ConsList
                    | Coreference
                    | DQString
                    | QSymbol
                    | Regex
                    )
    FeatureTerm  := LBrack AttrVals? RBrack
    AttrVals     := AttrVal ( Comma AttrVal )*
    AttrVal      := Attribute ( Dot Attribute )* Conjunction
    Attribute    := Identifier Spacing
    DiffList     := DLOpen Conjunctions? DLClose
    ConsList     := CLOpen ( Conjunctions ConsEnd? )? CLClose
    ConsEnd      := Comma Ellipsis | Dot Conjunction
    Conjunctions := Conjunction ( Comma Conjunction )*
    Coreference  := "#" Identifier Spacing

    # Letter-sets, Wild-cards, and Affixes

    MorphSet     := "%" "(" ( LetterSetDef | WildCardDef ) ")"
    LetterSetDef := "letter-set" Space? "(" LetterSetVar Space LetterSet ")"
    WildCardDef  := "wild-card" Space? "(" WildCardVar Space LetterSet ")"
    LetterSetVar := /![^ ]/
    WildCardVar  := /\?[^ ]/
    LetterSet    := /([^)\\]|\\.)+/
    Affix        := AffixClass AffixPattern+ Spacing
    AffixClass   := "%prefix" | "%suffix"
    AffixPattern := Space? "(" ( NullChar | CharList ) Space CharList ")"
    CharList     := ( LetterSetVar | WildCardVar | AffixChar )+
    NullChar     := "*"
    AffixChar    := /([^!?\s*\\]|\\[^ ])+/

    # Whitespace and Comments

    Spacing      := Space? Comment*
    Space        := /\s+/
    Comment      := ( LineComment | BlockComment ) Space?
    LineComment  := /;.*$/
    BlockComment := "#|" /([^|\\]|\\.|\|[^#])*/ "|#"

    # Literals

    DefOp        := ":=" Spacing
    AddOp        := ":+" Spacing
    Identifier   := /[^\s.:<=&,#[\]$()>!^\/]+/
    Dot          := "." Spacing
    And          := "&" Spacing
    Comma        := "," Spacing
    LBrack       := "[" Spacing
    RBrack       := "]" Spacing
    DLOpen       := "<!" Spacing
    DLClose      := "!>" Spacing
    CLOpen       := "<" Spacing
    CLClose      := ">" Spacing
    Ellipsis     := "..." Spacing
    DQString     := /"([^"\\]|\\.)*"/ Spacing
    QSymbol      := "'" Identifier Spacing
    Regex        := "^" /([^$\\]|\\.)*/ "$"
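To make the letter-set and affix productions concrete, the sketch below transcribes the `LetterSetDef`, `AffixClass`, and `AffixPattern` shapes into Python regular expressions and splits a `%suffix` line into its left/right pattern pairs. It only checks syntax; it does not perform the orthographic matching a processor does. One assumption goes beyond the grammar: `AffixChar` here also excludes `)`, so the closing parenthesis ends a pattern rather than being read as an affix character. The sample inputs and the names `parse_letter_set` and `parse_affix` are illustrative, not taken from a particular grammar or processor.

```python
import re

# Token shapes transcribed from the Letter-sets, Wild-cards, and Affixes
# productions. Assumption: AffixChar also excludes ")" so that the closing
# parenthesis ends a pattern instead of being read as an affix character.
LETTER_SET_VAR = r"![^ ]"
WILD_CARD_VAR = r"\?[^ ]"
AFFIX_CHAR = r"(?:[^!?\s*\\)]|\\[^ ])+"
CHAR_LIST = rf"(?:{LETTER_SET_VAR}|{WILD_CARD_VAR}|{AFFIX_CHAR})+"

# %(letter-set (!v chars))
LETTER_SET_DEF = re.compile(r"%\(letter-set\s*\((![^ ])\s+((?:[^)\\]|\\.)+)\)\)")
# AffixClass := "%prefix" | "%suffix"
AFFIX_CLASS = re.compile(r"(%prefix|%suffix)")
# AffixPattern := Space? "(" ( NullChar | CharList ) Space CharList ")"
AFFIX_PATTERN = re.compile(rf"\s*\((\*|{CHAR_LIST})\s+({CHAR_LIST})\)")


def parse_letter_set(text):
    """Return (variable, characters) for a letter-set definition, or None."""
    m = LETTER_SET_DEF.fullmatch(text.strip())
    return (m.group(1), m.group(2)) if m else None


def parse_affix(text):
    """Split '%suffix (a b) (c d) ...' into (class, [(lhs, rhs), ...]).

    Returns None if the text does not follow the Affix production.
    """
    text = text.strip()
    m = AFFIX_CLASS.match(text)
    if m is None:
        return None
    pos = m.end()
    patterns = []
    while pos < len(text):
        p = AFFIX_PATTERN.match(text, pos)
        if p is None:
            return None
        patterns.append((p.group(1), p.group(2)))
        pos = p.end()
    if not patterns:  # AffixPattern+ needs at least one pattern
        return None
    return m.group(1), patterns


if __name__ == "__main__":
    print(parse_letter_set("%(letter-set (!s sxz))"))  # ('!s', 'sxz')
    print(parse_affix("%suffix (* s) (!s !ses)"))
    # ('%suffix', [('*', 's'), ('!s', '!ses')])
```

Because `NullChar` appears only in the first slot of `AffixPattern`, a pattern such as `(s *)` is rejected, matching the restriction that `*` is allowed only on the left-hand side.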

Questions

1. The ^ character is used to signal "expanded-syntax" in the LKB, but is this only used for regular expressions? Are there other expanded syntaxes? Do non-LKB processors support them?

2. Are instances distinguishable from types? Are they (or other entities) restricted to having exactly one supertype?

3. When supertypes are required (e.g., on a TypeDef), must they appear before other Terms in the Conjunction?

4. Should the (deprecated or repurposed) subtype operator (:<) be included in the syntax description?

5. Is variation allowed with regard to the position of docstrings?

6. Are spaces allowed inside a feature path? Comments?

  •    type := supertype &
         [ ATTR1
           .     ; comment here?
           ATTR2 value ].
    For that matter, are comments allowed anywhere whitespace is (except perhaps in letter-sets and lexical-rule affix patterns)?
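To check what the grammar as written says about question 6, the sketch below transcribes the `Space`, `Comment`, `Spacing`, `Identifier`, `Attribute`, and `Dot` productions into Python regular expressions and matches a dotted feature path (`Attribute ( Dot Attribute )*`, without the trailing `Conjunction` of `AttrVal`). Two assumptions: the `]` inside the `Identifier` character class is taken to be a literal, and `LineComment`'s `/;.*$/` is written as `;[^\n]*` so it stops at the end of the line without multiline mode. `is_feature_path` is a hypothetical helper.

```python
import re

# Lexical productions from the grammar, transcribed to Python regexes.
SPACE = r"\s+"
LINE_COMMENT = r";[^\n]*"                        # /;.*$/ : up to end of line
BLOCK_COMMENT = r"#\|(?:[^|\\]|\\.|\|[^#])*\|#"
COMMENT = rf"(?:{LINE_COMMENT}|{BLOCK_COMMENT})(?:{SPACE})?"
SPACING = rf"(?:{SPACE})?(?:{COMMENT})*"
# Assumption: "]" in Identifier's character class is a literal.
IDENTIFIER = r"[^\s.:<=&,#\[\]$()>!^/]+"
ATTRIBUTE = rf"{IDENTIFIER}{SPACING}"            # Attribute := Identifier Spacing
DOT = rf"\.{SPACING}"                            # Dot := "." Spacing

# A dotted feature path: Attribute ( Dot Attribute )*
FEATURE_PATH = re.compile(rf"{ATTRIBUTE}(?:{DOT}{ATTRIBUTE})*", re.DOTALL)


def is_feature_path(text: str) -> bool:
    """True if all of `text` matches Attribute ( Dot Attribute )*."""
    return FEATURE_PATH.fullmatch(text) is not None


if __name__ == "__main__":
    question_6_path = "ATTR1\n  .     ; comment here?\n  ATTR2"
    print(is_feature_path(question_6_path))  # True
```

Under these productions the path from the example is accepted, so the grammar as written already permits spaces and comments after an attribute or a dot; whether existing processors accept the same input is a separate matter the sketch cannot settle.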

Discussions

TdlRfc (last edited 2020-06-05 06:38:36 by FrancisBond)
