Differences between revisions 1 and 11 (spanning 10 versions)
Revision 1 as of 2011-06-28 00:00:24
Size: 27
Comment:
Revision 11 as of 2018-07-15 02:47:46
Size: 9398
Comment:
## page was renamed from TdlRFC
Type Description Language and other aspects of DELPH-IN Joint Reference Formalism

== Case Sensitivity ==

=== Case Sensitive ===

 * Things inside quotes (NB: strings passed from the TFS world into MRS can be treated as case insensitive in MRS processing, i.e. as predicate symbols, but not as `CARG`s)

=== Case Insensitive ===
 
 * Everything in TDL not inside of quotes.
 * Lexicon look-up.
   * Proper names?
   * Acronyms?

 ... approach these with token-mapping (preserve the info, and then downcase anyway)

=== Unknown ===

 * Orthographic subrules (agree: case sensitive, ACE: [intended] case insensitive)

Notes: Arguments for case insensitivity include shouting (all caps); arguments for case sensitivity include the use of upper-case vowels in vowel-harmony languages (linguistic representations, not orthography).
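
One way to picture the split between quoted and unquoted material is a normalizer that downcases everything outside double-quoted strings and leaves the strings alone. This is only a sketch: the function name `normalize_case` is hypothetical, the choice of downcasing (rather than upcasing) is an assumption, and comments are not handled, so a quote character inside a comment would confuse it.

```python
import re

# Same shape as the DQString pattern in the grammar below (without Spacing).
DQSTRING = re.compile(r'"(?:[^"\\]|\\.)*"')

def normalize_case(tdl):
    # Downcase text outside double-quoted strings; copy strings verbatim.
    out = []
    pos = 0
    for m in DQSTRING.finditer(tdl):
        out.append(tdl[pos:m.start()].lower())  # unquoted: case insensitive
        out.append(m.group(0))                  # quoted: case sensitive
        pos = m.end()
    out.append(tdl[pos:].lower())
    return ''.join(out)

print(normalize_case('Noun-LE := Word & [ ORTH "Dog" ].'))
```

Under these assumptions, `Noun-LE` and `noun-le` would name the same type, while `"Dog"` and `"dog"` would remain distinct strings.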

== Doc Strings ==

TDL types allow a doc string:
{{{
n_-_c_le := n_intr_lex_entry &
"Intransitive count noun (icn)
<ex>The dog barked.
<nex>Much dog bark.".
}}}

== TDL File Syntax ==

{{{#!highlight ruby
# File Contents

TdlTypeFile := ( TypeDef | Spacing )* EOF
TdlRuleFile := ( LexRuleDef | MorphSet | Spacing )* EOF

# Types and Lexical Rules

TypeDef := Type ( AvmDef | AvmAddendum ) Dot
AvmDef := DefOp DefBody
AvmAddendum := AddOp ( DefBody
                      | DocString? Conjunction
                      | DocString )
LexRuleDef := LexRuleId DefOp Affix? DefBody Dot
DefBody := Supertypes ( And DocString? Conjunction | DocString? )
Supertypes := Type ( And Type )*
Type := Identifier Spacing
LexRuleId := Identifier Spacing
DocString := DQString
Conjunction := Term ( And Term )*
Term := ( Type
                | FeatureTerm
                | DiffList
                | ConsList
                | Coreference
                | DQString
                | QSymbol
                | Regex
                )
FeatureTerm := LBrack AttrVals? RBrack
AttrVals := AttrVal ( Comma AttrVal )*
AttrVal := Attribute ( Dot Attribute )* Conjunction
Attribute := Identifier Spacing
DiffList := DLOpen Conjunctions? DLClose
ConsList := CLOpen ( Conjunctions ConsEnd? )? CLClose
ConsEnd := Comma Ellipsis | Dot Conjunction
Conjunctions := Conjunction ( Comma Conjunction )*
Coreference := "#" Identifier Spacing

# Letter-sets, Wild-cards, and Affixes

MorphSet := "%" "(" ( LetterSetDef | WildCardDef ) ")"
LetterSetDef := "letter-set" Space? "(" LetterSetVar Space LetterSet ")"
WildCardDef := "wild-card" Space? "(" WildCardVar Space LetterSet ")"
LetterSetVar := /![^ ]/
WildCardVar := /\?[^ ]/
LetterSet := /([^)\\]|\\.)+/
Affix := AffixClass AffixPattern+ Spacing
AffixClass := "%prefix" | "%suffix"
AffixPattern := Space? "(" ( NullChar | CharList ) Space CharList ")"
CharList := ( LetterSetVar | WildCardVar | AffixChar )+
NullChar := "*"
AffixChar := /([^!?\s*\\]|\\[^ ])+/

# Whitespace and Comments

Spacing := Space? Comment*
Space := /\s+/
Comment := ( LineComment | BlockComment ) Space?
LineComment := /;.*$/
BlockComment := "#|" /([^|\\]|\\.|\|[^#])*/ "|#"

# Literals

DefOp := ":=" Spacing
AddOp := ":+" Spacing
Identifier := /[^\s.:<=&,#\[\]$()>!^\/]+/
Dot := "." Spacing
And := "&" Spacing
Comma := "," Spacing
LBrack := "[" Spacing
RBrack := "]" Spacing
DLOpen := "<!" Spacing
DLClose := "!>" Spacing
CLOpen := "<" Spacing
CLClose := ">" Spacing
Ellipsis := "..." Spacing
DQString := /"([^"\\]|\\.)*"/ Spacing
QSymbol := "'" Identifier Spacing
Regex := "^" /([^$\\]|\\.)*/ "$"
}}}
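
As an illustration of how the `Spacing` production behaves, the whitespace and comment patterns can be transcribed into Python's `re` module. This is a rough sketch, not a reference implementation: `skip_spacing` is a hypothetical helper, and the `$` anchor of `LineComment` is rendered as "everything up to the next newline".

```python
import re

LINE_COMMENT = r';[^\n]*'                        # LineComment  := /;.*$/
BLOCK_COMMENT = r'#\|(?:[^|\\]|\\.|\|[^#])*\|#'  # BlockComment := "#|" ... "|#"

# Spacing := Space? Comment*   where   Comment := ( Line | Block ) Space?
SPACING = re.compile(
    r'(?:\s+)?(?:(?:' + LINE_COMMENT + r'|' + BLOCK_COMMENT + r')(?:\s+)?)*',
    re.DOTALL,
)

def skip_spacing(text, pos=0):
    # Return the index of the first character after any spacing at pos.
    return SPACING.match(text, pos).end()

source = "  ; a line comment\n#| block\ncomment |#  type := supertype."
print(source[skip_spacing(source):])
```

Note that the `BlockComment` pattern as written consumes a `|` together with the character after it, so a comment body ending in `|` directly before the closing `|#` would not be recognized by this transcription.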


== Docstring Revision ==

Currently, docstrings are regular strings that appear before a Term in a !TypeDef, presumably after the list of supertypes:

{{{
type := supertype1 & supertype2 &
  "Docstring"
  [ ... ].
}}}

This syntax is not supported by all processors (PET does not accept it), and the processors that do accept it allow variations. At the 2018 summit in Paris (see DiderotSchedule), it was decided to distinguish docstrings from other strings by using triple quotes (three double-quote characters in a row, as in Python), which additionally allows quotes to appear inside the docstring.


{{{
type := supertype1 & supertype2 &
  """Docstring"""
  [ ... ].
}}}

This changed the !DocString production like so:

{{{#!highlight ruby
DocString := /"""([^"\\]|\\.|"[^"]|""[^"])*"""/ Spacing
}}}

Note that an unescaped quote cannot appear directly before the closing triple quotes. More precisely, it can, but the string would then be terminated one character early, leaving an extra quote character in the stream.
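
The behaviour of the revised pattern, including the early-termination case, can be checked with a direct transcription into Python's `re` module. This is a sketch only; `read_docstring` is a hypothetical helper, and the trailing `Spacing` is left out.

```python
import re

# Transcription of the revised DocString pattern (without Spacing).
DOCSTRING = re.compile(r'"""([^"\\]|\\.|"[^"]|""[^"])*"""')

def read_docstring(text):
    # Return (docstring body, remaining text), or None if text does not
    # start with a triple-quoted docstring.
    m = DOCSTRING.match(text)
    if m is None:
        return None
    return m.group(0)[3:-3], text[m.end():]

print(read_docstring('"""Docstring""" [ ... ].'))
print(read_docstring('"""say "hi" now"""'))
# A quote directly before the closing triple quotes ends the match early:
print(read_docstring('"""ends with "quote""""'))
```

In the last call the body comes back as `ends with "quote` and a single `"` is left over in the remaining text, which is the situation the note describes.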

There are remaining questions about their placement.

=== Option 1: Placed before any Term with multiple docstrings per type allowed ===

Where multiple docstrings occur, the type's final docstring is the concatenation of them.

Example:

{{{
type := """here""" supertype1 & """here""" supertype2 &
  """here, too"""
  [ ... ] """maybe here?""".
}}}

This can be implemented by changing the following productions:

{{{#!highlight ruby
TypeDef := Type ( AvmDef | AvmAddendum ) DocString? Dot # maybe
LexRuleDef := LexRuleId DefOp Affix? DefBody DocString? Dot # maybe
AvmAddendum := AddOp ( DefBody | Conjunction | DocString )
DefBody := Supertypes ( And Conjunction )?
Supertypes := DocString? Type ( And DocString? Type )*
Term := DocString? ( Type
                           | FeatureTerm
                           | DiffList
                           | ConsList
                           | Coreference
                           | DQString
                           | QSymbol
                           | Regex
                           )
}}}
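
On the processing side, the concatenation described for Option 1 might be sketched as follows. The helper name `collect_docstrings` is hypothetical, and the separator is an assumption (the proposal says only that the docstrings are concatenated). The sketch scans the raw text of a single definition and does not guard against triple quotes occurring inside comments.

```python
import re

# Revised DocString pattern with the body captured in group 1.
DOCSTRING = re.compile(r'"""((?:[^"\\]|\\.|"[^"]|""[^"])*)"""')

def collect_docstrings(definition, sep="\n"):
    # Gather every docstring in the definition, in order, and join them.
    parts = [m.group(1) for m in DOCSTRING.finditer(definition)]
    return sep.join(parts) if parts else None

example = ('type := """here""" supertype1 & """here""" supertype2 &\n'
           '  """here, too"""\n'
           '  [ ... ] """maybe here?""".')
print(collect_docstrings(example))
```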

=== Option 2: Placed before any Term with only one docstring per type allowed ===

Example:

{{{
type := supertype1 & """just one, somewhere""" supertype2 &
  [ ... ].
}}}

This is more complicated to describe as production rules, since several productions need to be duplicated (one set for use before a docstring is encountered, another for use after), but the implementation may be simple: just set a flag after reading a docstring.
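
The flag-based approach could look roughly like this. It is a sketch under the assumption that the text of one definition is scanned as a whole; `single_docstring` is a hypothetical name, and a real parser would set the flag while consuming Terms rather than by searching the raw text.

```python
import re

# Revised DocString pattern with the body captured in group 1.
DOCSTRING = re.compile(r'"""((?:[^"\\]|\\.|"[^"]|""[^"])*)"""')

def single_docstring(definition):
    # Return the definition's docstring (or None); reject a second one.
    doc = None
    seen = False  # the flag: set once a docstring has been read
    for m in DOCSTRING.finditer(definition):
        if seen:
            raise ValueError("more than one docstring in definition")
        doc = m.group(1)
        seen = True
    return doc

print(single_docstring(
    'type := supertype1 & """just one, somewhere""" supertype2 &\n  [ ... ].'))
```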

=== Option 3: Once, after the list of supertypes and before any feature list ===

Example:

{{{
type := supertype1 & supertype2 &
  """just one, here"""
  [ ... ].

type2 := supertype1 &
  """what about this?"""
  [ ... ] & supertype2.
}}}

This is not hard to implement. If the docstring only needs to appear after *a* list of supertypes (both examples above), the grammar is the same as in the full production list above (but other supertypes could then appear after a feature list, for instance). If one wants to ensure that all supertypes appear before any docstring or feature list (only the first example above), then the Conjunction and Term productions need to be duplicated to disallow Types at the top level. If that is desired, it would look like this:

{{{#!highlight ruby
AvmAddendum := AddOp ( DefBody | DocString? NoTypeConj | DocString )
DefBody := Supertypes DocString? ( And NoTypeConj )?
NoTypeConj := NoTypeTerm ( And NoTypeTerm )*
NoTypeTerm := ( FeatureTerm
                | DiffList
                | ConsList
                | Coreference
                | DQString
                | QSymbol
                | Regex
                )
}}}

=== Option 4: Once, immediately after the typedef or addendum operators ===

Example:

{{{
type := """just one, here"""
  supertype1 & supertype2 &
  [ ... ].

type :=
  """
  example
  with
  multiple
  lines
  """
  supertype1 & supertype2 &
  [ ... ].
}}}

This is the simplest to implement, and the !DefBody and !Supertypes productions would be unnecessary (unless we still want supertypes to appear first):

{{{#!highlight ruby
AvmDef := DefOp DocString? Conjunction
AvmAddendum := AddOp ( DocString? Conjunction | DocString )
LexRuleDef := LexRuleId DefOp DocString? Affix? Conjunction Dot
}}}

Some have previously objected to this option on aesthetic grounds, although that is subjective.
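
Part of the appeal of Option 4 is that the docstring sits at a fixed position, so the head of a definition can be read with a single pattern. The following is only a sketch: `read_head` is a hypothetical helper, the identifier class is taken from the `Identifier` production with the brackets escaped, and comments between the operator and the docstring are not handled.

```python
import re

IDENT = r'[^\s.:<=&,#\[\]$()>!^/]+'
DOC = r'"""(?P<doc>(?:[^"\\]|\\.|"[^"]|""[^"])*)"""'
HEAD = re.compile(
    r'\s*(?P<type>' + IDENT + r')\s*(?P<op>:=|:\+)\s*(?:' + DOC + r'\s*)?'
)

def read_head(definition):
    # Return (type name, operator, docstring or None, remaining text).
    m = HEAD.match(definition)
    if m is None:
        raise ValueError("not a type definition or addendum")
    return m.group('type'), m.group('op'), m.group('doc'), definition[m.end():]

print(read_head('type := """just one, here"""\n'
                '  supertype1 & supertype2 &\n  [ ... ].'))
```

Everything after the head is then an ordinary Conjunction followed by a Dot, with no further docstring handling needed.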

== Questions ==

1. The `^` character is used to signal "expanded-syntax" in the LKB, but is this only used for regular expressions? Are there other expanded syntaxes? Do non-LKB processors support them?

2. Are instances distinguishable from types? Are they (or other entities) restricted to having exactly one supertype?

3. When supertypes are required (e.g., on a !TypeDef), must they appear before other Terms in the Conjunction? (see [[#Docstring_Revision]] above)

4. Should the (deprecated or repurposed) subtype operator (`:<`) be included in the syntax description?

5. Is variation allowed with regard to the position of docstrings? (see [[#Docstring_Revision]] above)

6. Are spaces allowed inside a feature path? Comments?
   {{{
   type := supertype &
     [ ATTR1
       . ; comment here?
       ATTR2 value ].
   }}}
   For that matter, are comments allowed anywhere that whitespace is (except maybe letter-sets and lex-rule affix patterns)?


== Discussions ==

 * ParisDefeasibleConstraints
 * StanfordDefaults
 * [[http://www.delph-in.net/2017/append.pdf|(Diff)List Appends in TDL]]


TdlRfc (last edited 2020-06-05 06:38:36 by FrancisBond)
