Diff for "TdlRfc" - Deep Linguistic Processing with HPSG (DELPH-IN)

Differences between revisions 2 and 12 (spanning 10 versions)

Type Description Language and other aspects of DELPH-IN Joint Reference Formalism

Case Sensitivity

Case Sensitive

Things inside quotes (NB: strings passed from the TFS world into MRS can be treated as case insensitive in MRS processing, i.e., as predicate symbols, but not CARGs)

Case Insensitive

Everything in TDL not inside of quotes.
Lexicon look-up.
- Proper names?
- Acronyms?
... approach these with token-mapping (preserve the info, and then downcase anyway)

Unknown

Orthographic subrules (agree: case sensitive, ACE: [intended] case insensitive)

Notes: Arguments for case insensitivity include shouting (all caps); arguments for case sensitivity include the use of upper-case vowels in vowel harmony languages (linguistic representations, not orthography).
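
The token-mapping idea mentioned above (preserve the case information, then downcase anyway) might look like the following minimal Python sketch. The function names `case_class` and `normalize_token` and the four class labels are hypothetical, chosen for illustration only; they are not taken from any DELPH-IN processor.

```python
def case_class(form):
    """Classify the casing of a surface token (labels are illustrative)."""
    if form.islower() or not any(ch.isalpha() for ch in form):
        return "lower"          # also covers tokens with no letters at all
    if form.isupper():
        return "upper"          # e.g. acronyms or shouting
    if form[0].isupper() and form[1:].islower():
        return "capitalized"    # e.g. proper names
    return "mixed"


def normalize_token(form):
    """Return (lookup_key, case_info): the downcased form plus its casing."""
    return form.lower(), case_class(form)


for token in ["dog", "Kim", "NASA", "iPhone"]:
    print(token, normalize_token(token))
```

Lexicon look-up would then use only the downcased key, while the preserved class stays available to later processing that cares about proper names or acronyms.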

Doc Strings

TDL definitions allow documentation strings ("docstrings") before any term in the top-level conjunction or before the terminating dot (.) character:

n_-_c_le := n_intr_lex_entry
"""Intransitive count noun (icn)    
<ex>The dog barked.
<nex>Much dog bark.""".

TDL File Syntax

# File Contents

TdlTypeFile  := ( TypeDef | TypeAddendum | Spacing )* EOF
TdlRuleFile  := ( LexRuleDef | MorphSet | Spacing )* EOF

# Types and Lexical Rules

TypeDef      := Type DefOp TypedDefBody Dot
TypeAddendum := Type AddOp ( DefBody | DocString ) Dot
LexRuleDef   := LexRuleId DefOp Affix? TypedDefBody Dot
LexRuleId    := Identifier Spacing

# Definition Bodies (top-level conjunctions of terms)
#
#   The body of a type definition, type addendum, or lexical rule is
#   essentially a conjunction of Terms, but there are two special features
#   of top-level conjunctions (i.e., those outside of an AVM):
#
#     (1) """DocStrings""" may precede any Term or the final Dot (.)
#
#     (2) TypeDef and LexRuleDef require at least one Type (supertype)
#         somewhere in the conjunction (conventionally the first Term)

TypedDefBody := ( TopLevelConj And )? DocString? Type ( And TopLevelConj )? DocString?
DefBody      := TopLevelConj DocString?
TopLevelConj := DocString? Term ( And DocString? Term )*
DocString    := TQString

# Terms and Conjunctions

Conjunction  := Term ( And Term )*
Type         := Identifier Spacing
Term         := ( Type
                | FeatureTerm
                | DiffList
                | ConsList
                | Coreference
                | DQString
                | QSymbol
                | Regex
                )
FeatureTerm  := LBrack AttrVals? RBrack
AttrVals     := AttrVal ( Comma AttrVal )*
AttrVal      := Attribute ( Dot Attribute )* Conjunction
Attribute    := Identifier Spacing
DiffList     := DLOpen Conjunctions? DLClose
ConsList     := CLOpen ( Conjunctions ConsEnd? )? CLClose
ConsEnd      := Comma Ellipsis | Dot Conjunction
Conjunctions := Conjunction ( Comma Conjunction )*
Coreference  := "#" Identifier Spacing

# Letter-sets, Wild-cards, and Affixes

MorphSet     := "%" "(" ( LetterSetDef | WildCardDef ) ")"
LetterSetDef := "letter-set" Space? "(" LetterSetVar Space LetterSet ")"
WildCardDef  := "wild-card" Space? "(" WildCardVar Space LetterSet ")"
LetterSetVar := /![^ ]/
WildCardVar  := /\?[^ ]/
LetterSet    := /([^)\\]|\\.)+/
Affix        := AffixClass AffixPattern+ Spacing
AffixClass   := "%prefix" | "%suffix"
AffixPattern := Space? "(" ( NullChar | CharList ) Space CharList ")"
CharList     := ( LetterSetVar | WildCardVar | AffixChar )+
NullChar     := "*"
AffixChar    := /([^!?\s*\\]|\\[^ ])+/

# Whitespace and Comments

Spacing      := Space? Comment*
Space        := /\s+/
Comment      := ( LineComment | BlockComment ) Space?
LineComment  := /;.*$/
BlockComment := "#|" /([^|\\]|\\.|\|(?!#))*/ "|#"

# Literals

DefOp        := ":=" Spacing
AddOp        := ":+" Spacing
Identifier   := /[^\s.:<=&,#[\]$()>!^\/]+/
Dot          := "." Spacing
And          := "&" Spacing
Comma        := "," Spacing
LBrack       := "[" Spacing
RBrack       := "]" Spacing
DLOpen       := "<!" Spacing
DLClose      := "!>" Spacing
CLOpen       := "<" Spacing
CLClose      := ">" Spacing
Ellipsis     := "..." Spacing
DQString     := /"([^"\\]|\\.)*"/ Spacing
TQString     := /"""([^"\\]|\\.|"(?!")|""(?!"))*"""/ Spacing
QSymbol      := "'" Identifier Spacing
Regex        := "^" /([^$\\]|\\.)*/ "$"
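
As a rough illustration of how the DQString and TQString patterns above behave, the following Python sketch compiles them with the standard `re` module. The helper name `match_string` is hypothetical, and the use of `re.DOTALL` (so that a backslash may escape a newline) is an assumption of this sketch rather than something the syntax description states. The triple-quoted pattern is tried first because the double-quoted pattern would otherwise match the first two quotes of `"""` as an empty string.

```python
import re

# Patterns copied from the DQString and TQString rules (without Spacing).
DQ_STRING = re.compile(r'"([^"\\]|\\.)*"', re.DOTALL)
TQ_STRING = re.compile(r'"""([^"\\]|\\.|"(?!")|""(?!"))*"""', re.DOTALL)


def match_string(text, pos=0):
    """Return (kind, matched_text) for a string literal at pos, or None."""
    # Order matters: try the triple-quoted form before the double-quoted one.
    for kind, pattern in (("TQString", TQ_STRING), ("DQString", DQ_STRING)):
        m = pattern.match(text, pos)
        if m:
            return kind, m.group(0)
    return None


src = ('n_-_c_le := n_intr_lex_entry\n'
       '"""Intransitive count noun (icn)\n'
       '<ex>The dog barked.\n'
       '<nex>Much dog bark.""".')
kind, literal = match_string(src, src.index('"""'))
print(kind)
print(literal)
```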

Notes for implementation

DocStrings

Multiple docstrings may be present on a single definition, but only the first one encountered on a definition is considered its primary docstring, and implementers are free to store or discard the other docstrings as they see fit. Docstrings on type addenda should be concatenated with a newline to the previous docstring(s), or appended to a list of docstrings, associated with the type.
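
One way to read the policy above is sketched below in Python. The class `DocStore` and its methods are hypothetical names for illustration. The sketch takes the "concatenate with a newline" option, discards non-primary docstrings, and assumes that the first-docstring rule applies to each addendum as well as to each definition; an implementation could equally keep a list instead.

```python
class DocStore:
    """Illustrative store mapping type names to their docstring text."""

    def __init__(self):
        self._docs = {}

    @staticmethod
    def _primary(docstrings):
        # Only the first docstring encountered counts as the primary one.
        return docstrings[0] if docstrings else None

    def define(self, type_name, docstrings):
        """Record a type definition (:=); extra docstrings are discarded."""
        first = self._primary(docstrings)
        if first is not None:
            self._docs[type_name] = first

    def addend(self, type_name, docstrings):
        """Record a type addendum (:+); join to earlier text with a newline."""
        first = self._primary(docstrings)
        if first is None:
            return
        previous = self._docs.get(type_name)
        self._docs[type_name] = first if previous is None else previous + "\n" + first

    def get(self, type_name):
        return self._docs.get(type_name)


store = DocStore()
store.define("n_-_c_le", ["Intransitive count noun (icn)", "a second docstring"])
store.addend("n_-_c_le", ["Added by a later addendum"])
print(store.get("n_-_c_le"))
```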

Comments

The syntax description above allows for comments anywhere that separating whitespace is allowed (not including those within strings, regular expressions, letter sets, etc.). This includes within a dotted attribute path (e.g., [ SYNSEM #| comment |# . #| comment |# LOCAL ... ]), although grammar developers may want to use this flexibility sparingly.
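
To make the comment rule concrete, here is a small Python sketch that removes line and block comments while leaving quoted strings untouched. The function name `strip_comments` is hypothetical, and the sketch deliberately ignores regular expressions and letter sets, which a real reader would also have to protect. Each comment is replaced by a single space so that neighboring tokens are never fused.

```python
import re

# Strings are matched first so that ';' or '#|' inside them is not a comment.
_TOKEN = re.compile(
    r'(?P<tq>"""(?:[^"\\]|\\.|"(?!")|""(?!"))*""")'
    r'|(?P<dq>"(?:[^"\\]|\\.)*")'
    r'|(?P<block>#\|(?:[^|\\]|\\.|\|(?!#))*\|#)'
    r'|(?P<line>;[^\n]*)',
    re.DOTALL,
)


def strip_comments(text):
    """Replace each comment with one space; keep string literals verbatim."""
    def keep_or_drop(match):
        if match.lastgroup in ("tq", "dq"):
            return match.group(0)
        return " "
    return _TOKEN.sub(keep_or_drop, text)


path = '[ SYNSEM #| comment |# . #| comment |# LOCAL ]'
print(" ".join(strip_comments(path).split()))
```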

Questions

1. The ^ character is used to signal "expanded-syntax" in the LKB, but is this only used for regular expressions? Are there other expanded syntaxes? Do non-LKB processors support them? (see this thread on the 'developers' mailing list)

2. Are instances distinguishable from types? Are they (or other entities) restricted to having exactly one supertype?

-  ⇤ ← Revision 2 as of 2011-06-28 15:36:21 → 
  Size: 60
  Editor: StephanOepen
  Comment:
+   ← Revision 12 as of 2018-08-08 03:06:23 → ⇥
  Size: 6947
  Editor: MichaelGoodman
  Comment: Updated BNF for selected docstring pattern; removed obsolete discussion
 Line 2:
-Type Description Language
+Type Description Language and other aspects of DELPH-IN Joint Reference Formalism

== Case Sensitivity ==

=== Case Sensitive ===

 * Things inside quotes (NB: strings passed from the TFS world into MRS can be treated as case insensitive in MRS processing, i.e., as predicate symbols, but not `CARG`s)

=== Case Insensitive ===
 
 * Everything in TDL not inside of quotes.
 * Lexicon look-up.
   * Proper names?
   * Acronyms?

 ... approach these with token-mapping (preserve the info, and then downcase anyway)

=== Unknown ===

 * Orthographic subrules (agree: case sensitive, ACE: [intended] case insensitive)

Notes: Arguments for case insensitivity include shouting (all caps); arguments for case sensitivity include the use of upper-case vowels in vowel harmony languages (linguistic representations, not orthography).

== Doc Strings ==

TDL definitions allow documentation strings ("docstrings") before any term in the top-level conjunction or before the terminating dot (`.`) character:

{{{
n_-_c_le := n_intr_lex_entry
"""Intransitive count noun (icn)    
<ex>The dog barked.
<nex>Much dog bark.""".
}}}

== TDL File Syntax ==

{{{#!highlight ruby
# File Contents

TdlTypeFile  := ( TypeDef | TypeAddendum | Spacing )* EOF
TdlRuleFile  := ( LexRuleDef | MorphSet | Spacing )* EOF

# Types and Lexical Rules

TypeDef      := Type DefOp TypedDefBody Dot
TypeAddendum := Type AddOp ( DefBody | DocString ) Dot
LexRuleDef   := LexRuleId DefOp Affix? TypedDefBody Dot
LexRuleId    := Identifier Spacing

# Definition Bodies (top-level conjunctions of terms)
#
#   The body of a type definition, type addendum, or lexical rule is
#   essentially a conjunction of Terms, but there are two special features
#   of top-level conjunctions (i.e., those outside of an AVM):
#
#     (1) """DocStrings""" may precede any Term or the final Dot (.)
#
#     (2) TypeDef and LexRuleDef require at least one Type (supertype)
#         somewhere in the conjunction (conventionally the first Term)

TypedDefBody := ( TopLevelConj And )? DocString? Type ( And TopLevelConj )? DocString?
DefBody      := TopLevelConj DocString?
TopLevelConj := DocString? Term ( And DocString? Term )*
DocString    := TQString

# Terms and Conjunctions

Conjunction  := Term ( And Term )*
Type         := Identifier Spacing
Term         := ( Type
                | FeatureTerm
                | DiffList
                | ConsList
                | Coreference
                | DQString
                | QSymbol
                | Regex
                )
FeatureTerm  := LBrack AttrVals? RBrack
AttrVals     := AttrVal ( Comma AttrVal )*
AttrVal      := Attribute ( Dot Attribute )* Conjunction
Attribute    := Identifier Spacing
DiffList     := DLOpen Conjunctions? DLClose
ConsList     := CLOpen ( Conjunctions ConsEnd? )? CLClose
ConsEnd      := Comma Ellipsis | Dot Conjunction
Conjunctions := Conjunction ( Comma Conjunction )*
Coreference  := "#" Identifier Spacing

# Letter-sets, Wild-cards, and Affixes

MorphSet     := "%" "(" ( LetterSetDef | WildCardDef ) ")"
LetterSetDef := "letter-set" Space? "(" LetterSetVar Space LetterSet ")"
WildCardDef  := "wild-card" Space? "(" WildCardVar Space LetterSet ")"
LetterSetVar := /![^ ]/
WildCardVar  := /\?[^ ]/
LetterSet    := /([^)\\]|\\.)+/
Affix        := AffixClass AffixPattern+ Spacing
AffixClass   := "%prefix" | "%suffix"
AffixPattern := Space? "(" ( NullChar | CharList ) Space CharList ")"
CharList     := ( LetterSetVar | WildCardVar | AffixChar )+
NullChar     := "*"
AffixChar    := /([^!?\s*\\]|\\[^ ])+/

# Whitespace and Comments

Spacing      := Space? Comment*
Space        := /\s+/
Comment      := ( LineComment | BlockComment ) Space?
LineComment  := /;.*$/
BlockComment := "#|" /([^|\\]|\\.|\|(?!#))*/ "|#"

# Literals

DefOp        := ":=" Spacing
AddOp        := ":+" Spacing
Identifier   := /[^\s.:<=&,#[\]$()>!^\/]+/
Dot          := "." Spacing
And          := "&" Spacing
Comma        := "," Spacing
LBrack       := "[" Spacing
RBrack       := "]" Spacing
DLOpen       := "<!" Spacing
DLClose      := "!>" Spacing
CLOpen       := "<" Spacing
CLClose      := ">" Spacing
Ellipsis     := "..." Spacing
DQString     := /"([^"\\]|\\.)*"/ Spacing
TQString     := /"""([^"\\]|\\.|"(?!")|""(?!"))*"""/ Spacing
QSymbol      := "'" Identifier Spacing
Regex        := "^" /([^$\\]|\\.)*/ "$"
}}}

=== Notes for implementation ===

==== DocStrings ====

Multiple docstrings may be present on a single definition, but only the first one encountered on a definition is considered its primary docstring, and implementers are free to store or discard the other docstrings as they see fit. Docstrings on type addenda should be concatenated with a newline to the previous docstring(s), or appended to a list of docstrings, associated with the type.

==== Comments ====

The syntax description above allows for comments anywhere that separating whitespace is allowed (not including those within strings, regular expressions, letter sets, etc.). This includes within a dotted attribute path (e.g., `[ SYNSEM #| comment |# . #| comment |# LOCAL ... ]`), although grammar developers may want to use this flexibility sparingly.

== Questions ==

1. The `^` character is used to signal "expanded-syntax" in the LKB, but is this only used for regular expressions? Are there other expanded syntaxes? Do non-LKB processors support them? (see [[http://lists.delph-in.net/archives/developers/2009/thread.html#1082|this thread]] on the 'developers' mailing list)

2. Are instances distinguishable from types? Are they (or other entities) restricted to having exactly one supertype?


== Discussions ==

 * ParisDefeasibleConstraints
 * StanfordDefaults
 * [[http://www.delph-in.net/2017/append.pdf|(Diff)List Appends in TDL]]
 * [[http://lists.delph-in.net/archives/developers/2006/000419.html|Mailing list discussion about docstrings (Feb 2006)]]
 * [[http://lists.delph-in.net/archives/developers/2006/000550.html|Mailing list discussion about type addenda (Jul 2006)]]
 * [[http://lists.delph-in.net/archives/developers/2007/000762.html|Mailing list discussion about docstrings (Mar 2007)]]
 * [[http://lists.delph-in.net/archives/developers/2007/000868.html|Mailing list discussion about docstrings (Sep 2007)]]
 * [[http://lists.delph-in.net/archives/developers/2008/001037.html|Mailing list discussion about the :+ and :< operators (Nov 2008)]]
 * [[http://lists.delph-in.net/archives/developers/2009/001082.html|Mailing list discussion about regular expressions in TDL (Jan 2009)]]
 * [[http://lists.delph-in.net/archives/developers/2018/002754.html|Mailing list discussion about TDL syntax (Jul 2018)]]
 * [[http://lists.delph-in.net/archives/developers/2018/002792.html|Mailing list discussion about docstrings (Aug 2018)]]
