Differences between revisions 6 and 7
Revision 6 as of 2018-06-19 13:16:35
Size: 1052
Editor: FrancisBond
Comment: Added example of doc-string.
Revision 7 as of 2018-07-11 00:39:14
Size: 4641
Comment: Added an attempt at TDL file syntax, with related questions
== TDL File Syntax ==
{{{#!highlight ruby
# File Contents

TdlTypeFile := ( TypeDef | Spacing )* EOF
TdlRuleFile := ( LexRuleDef | MorphSet | Spacing )* EOF

# Types and Lexical Rules

TypeDef := Type ( AvmDef | AvmAddendum )
AvmDef := DefOp DefBody
AvmAddendum := AddOp ( DefBody
                      | DocString? Conjunction
                      | DocString )
LexRuleDef := Type DefOp Affix? DefBody
DefBody := Supertypes ( And DocString? Conjunction | DocString? )
Supertypes := Type ( And Type )*
Type := Identifier Spacing
DocString := DQString
Conjunction := Term ( And Term )*
Term := ( Type
                | FeatureTerm
                | DiffList
                | ConsList
                | Coreference
                | DQString
                | QSymbol
                | Regex
                )
FeatureTerm := LBrack AttrVals? RBrack
AttrVals := AttrVal ( Comma AttrVal )*
AttrVal := Attribute ( Dot Attribute )* Conjunction
Attribute := Identifier
DiffList := DLOpen Conjunctions? DLClose
ConsList := CLOpen ( Conjunctions ConsEnd? )? CLClose
ConsEnd := Comma Ellipsis | Dot Conjunction
Conjunctions := Conjunction ( Comma Conjunction )*
Coreference := "#" Identifier Spacing

# Letter-sets, Wild-cards, and Affixes

MorphSet := "%" "(" ( LetterSetDef | WildCardDef ) ")"
LetterSetDef := "letter-set" Space? "(" LetterSetVar Space LetterSet ")"
WildCardDef := "wild-card" Space? "(" WildCardVar Space LetterSet ")"
LetterSetVar := /![^ ]/
WildCardVar := /\?[^ ]/
LetterSet := /([^)\\]|\\.)+/
Affix := AffixClass AffixPattern+ Spacing
AffixClass := "%prefix" | "%suffix"
AffixPattern := Space? "(" AffixSubPat Space AffixSubPat ")"
AffixSubPat := LetterSetVar | WildCardVar | AffixNull | AffixChar
AffixNull := "*"
AffixChar := /([^!?\s*\\]|\\[^ ])+/

# Whitespace and Comments

Spacing := Space? Comment*
Space := /\s+/
Comment := ( LineComment | BlockComment ) Space?
LineComment := /;.*$/
BlockComment := "#|" /([^|\\]|\\.|\|[^#])*/ "|#"

# Literals

DefOp := ":=" Spacing
AddOp := ":+" Spacing
Identifier := /[^\s.:<=&,#\[\]$()>!^\/]+/
Dot := "." Spacing
And := "&" Spacing
Comma := "," Spacing
LBrack := "[" Spacing
RBrack := "]" Spacing
DLOpen := "<!" Spacing
DLClose := "!>" Spacing
CLOpen := "<" Spacing
CLClose := ">" Spacing
Ellipsis := "..." Spacing
DQString := /"([^"\\]|\\.)*"/ Spacing
QSymbol := "'" Identifier Spacing
Regex := "^" /([^$\\]|\\.)*/ "$"
}}}
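
To make the terminal patterns above concrete, here is a minimal Python sketch that transcribes two of them (`Identifier` and `DQString`) and tries them on short inputs. The names `IDENTIFIER`, `DQSTRING`, and `match_prefix` are hypothetical helpers, not part of any DELPH-IN tool. The square brackets inside the identifier class are escaped here, which is an assumption about the intended character class.

```python
import re

# Terminal patterns transcribed from the "Literals" part of the grammar.
# Assumption: "[" and "]" are meant to be excluded from identifiers, so they
# are escaped inside the character class.
IDENTIFIER = re.compile(r"[^\s.:<=&,#\[\]$()>!^/]+")

# DOTALL lets a backslash escape a newline too, since doc strings may span lines.
DQSTRING = re.compile(r'"(?:[^"\\]|\\.)*"', re.DOTALL)


def match_prefix(pattern, text):
    """Return the text matched by `pattern` at the start of `text`, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(match_prefix(IDENTIFIER, "n_-_c_le := n_intr_lex_entry"))
    print(match_prefix(IDENTIFIER, "ATTR1.ATTR2"))
    print(match_prefix(DQSTRING, '"a \\"quoted\\" word" & x'))
```

Under these patterns an identifier stops at the first excluded character (whitespace, `.`, `:`, and so on), and a double-quoted string may contain backslash-escaped quotes.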


== Questions ==

1. The `^` character is used to signal "expanded-syntax" in the LKB, but is this only used for regular expressions? Are there other expanded syntaxes? Do non-LKB processors support them?

2. Are instances distinguishable from types? Are they (or other entities) restricted to having exactly one supertype?

3. When supertypes are required (e.g., on a TypeDef), must they appear before other Terms in the Conjunction?

4. Should the (deprecated or repurposed) subtype operator (`:<`) be included in the syntax description?

5. Is variation allowed in the position of docstrings?

6. Are spaces allowed inside a feature path? Comments?
   {{{
   type := supertype &
     [ ATTR1
       . ; comment here?
       ATTR2 value ];
   }}}
   For that matter, are comments allowed anywhere that whitespace is (except maybe letter-sets and lex-rule affix patterns)?
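
As a way of exploring question 6, the following Python sketch applies the draft rule `Spacing := Space? Comment*` to the text that follows the dot in the example. It only shows what the grammar as written would consume; it says nothing about what the LKB, ACE, or other processors actually accept. The function name `skip_spacing` is hypothetical.

```python
import re

# Whitespace and comment terminals transcribed from the draft grammar.
SPACE = re.compile(r"\s+")
# MULTILINE so that "$" matches at the end of each line, not only at end of text.
LINE_COMMENT = re.compile(r";.*$", re.MULTILINE)
BLOCK_COMMENT = re.compile(r"#\|(?:[^|\\]|\\.|\|[^#])*\|#", re.DOTALL)


def skip_spacing(text, pos=0):
    """Apply Spacing := Space? Comment* at `pos` and return the new position."""
    m = SPACE.match(text, pos)
    if m:
        pos = m.end()
    while True:
        m = LINE_COMMENT.match(text, pos) or BLOCK_COMMENT.match(text, pos)
        if not m:
            return pos
        pos = m.end()
        # Comment := ( LineComment | BlockComment ) Space?
        m = SPACE.match(text, pos)
        if m:
            pos = m.end()


if __name__ == "__main__":
    path = "ATTR1\n  . ; comment here?\n  ATTR2 value"
    after_dot = path.index(".") + 1
    end = skip_spacing(path, after_dot)
    print(repr(path[end:]))
```

Because `Dot := "." Spacing`, the comment after the dot is consumed under the draft rules. The whitespace before the dot is a separate matter: `Attribute := Identifier` carries no trailing `Spacing`, so the draft grammar does not appear to cover it.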

Type Description Language and other aspects of DELPH-IN Joint Reference Formalism

Case Sensitivity

Case Sensitive

  • Things inside quotes (NB: strings passed from the TFS world into MRS can be treated as case insensitive in MRS processing, i.e. as predicate symbols, but not CARGs)

Case Insensitive

  • Everything in TDL not inside of quotes.
  • Lexicon look-up.
    • Proper names?
    • Acronyms?
  • Proper names and acronyms could be handled with token mapping (preserve the case information, then downcase anyway)

Unknown

  • Orthographic subrules (agree: case sensitive, ACE: [intended] case insensitive)

Notes: Arguments for case insensitivity include shouting (all caps); arguments for case sensitivity include the use of upper-case vowels in vowel harmony languages (linguistic representations, not orthography).
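
The split described above (quoted material is case sensitive, everything else is not) can be illustrated with a small Python sketch that lowercases TDL text outside double-quoted strings. This is only one possible normalization, and `fold_case` is a hypothetical helper. It deliberately ignores comments and quoted symbols, so a quote character inside a comment would confuse it.

```python
import re

# Double-quoted string pattern, following the DQString terminal of the grammar.
DQSTRING = re.compile(r'"(?:[^"\\]|\\.)*"', re.DOTALL)


def fold_case(tdl_text):
    """Lowercase everything except the contents of double-quoted strings."""
    parts = []
    last = 0
    for m in DQSTRING.finditer(tdl_text):
        parts.append(tdl_text[last:m.start()].lower())  # unquoted: fold case
        parts.append(m.group(0))                        # quoted: keep as written
        last = m.end()
    parts.append(tdl_text[last:].lower())
    return "".join(parts)


if __name__ == "__main__":
    print(fold_case('Foo := BAR & [ ORTH "Kim" ].'))
```

With this normalization, `Foo` and `foo` name the same type, while the string `"Kim"` keeps its capital letter.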

Doc Strings

TDL types allow a doc string:

n_-_c_le := n_intr_lex_entry &
"Intransitive count noun (icn)    
<ex>The dog barked.
<nex>Much dog bark.".
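
A doc string in this style can be pulled apart with a few lines of Python. The sketch below assumes that `<ex>` marks a grammatical example and `<nex>` an ungrammatical one, and that each tag starts its own line. Both points are inferred from the sample above, not stated by the page. The function name `split_docstring` is hypothetical.

```python
def split_docstring(doc):
    """Split a doc string into (description, examples, counterexamples).

    Assumption: lines starting with <ex> are positive examples and lines
    starting with <nex> are negative examples; all other non-empty lines
    belong to the description.
    """
    description, examples, counterexamples = [], [], []
    for line in doc.splitlines():
        line = line.strip()
        if line.startswith("<ex>"):
            examples.append(line[len("<ex>"):].strip())
        elif line.startswith("<nex>"):
            counterexamples.append(line[len("<nex>"):].strip())
        elif line:
            description.append(line)
    return " ".join(description), examples, counterexamples


if __name__ == "__main__":
    doc = "Intransitive count noun (icn)    \n<ex>The dog barked.\n<nex>Much dog bark."
    print(split_docstring(doc))
```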

TdlRfc (last edited 2020-06-05 06:38:36 by FrancisBond)
