9398
Comment: Add options for docstring positioning
|
6947
Updated BNF for selected docstring pattern; removed obsolete discussion
|
Line 27: | Line 27: |
TDL types allow a doc string: | TDL definitions allow documentation strings ("docstrings") before any term in the top-level conjunction or before the terminating dot (`.`) character: |
Line 29: | Line 30: |
n_-_c_le := n_intr_lex_entry & "Intransitive count noun (icn) |
n_-_c_le := n_intr_lex_entry """Intransitive count noun (icn) |
Line 32: | Line 33: |
<nex>Much dog bark.". | <nex>Much dog bark.""". |
Line 40: | Line 41: |
TdlTypeFile := ( TypeDef | Spacing )* EOF | TdlTypeFile := ( TypeDef | TypeAddendum | Spacing )* EOF |
Line 45: | Line 46: |
TypeDef := Type ( AvmDef | AvmAddendum ) Dot
AvmDef := DefOp DefBody
AvmAddendum := AddOp ( DefBody | DocString? Conjunction | DocString )
LexRuleDef := LexRuleId DefOp Affix? DefBody Dot
DefBody := Supertypes ( And DocString? Conjunction | DocString? )
Supertypes := Type ( And Type )* |
TypeDef := Type DefOp TypedDefBody Dot
TypeAddendum := Type AddOp ( DefBody | DocString ) Dot
LexRuleDef := LexRuleId DefOp Affix? TypedDefBody Dot
LexRuleId := Identifier Spacing

# Definition Bodies (top-level conjunctions of terms)
#
# The body of a type definition, type addendum, or lexical rule is
# essentially a conjunction of Terms, but there are two special features
# of top-level conjunctions (i.e., those outside of an AVM):
#
#   (1) """DocStrings""" may precede any Term or the final Dot (.)
#
#   (2) TypeDef and LexRuleDef require at least one Type (supertype)
#       somewhere in the conjunction (conventionally the first Term)

TypedDefBody := ( TopLevelConj And )? DocString? Type ( And TopLevelConj )? DocString?
DefBody := TopLevelConj DocString?
TopLevelConj := DocString? Term ( And DocString? Term )*
DocString := TQString

# Terms and Conjunctions

Conjunction := Term ( And Term )* |
Line 54: | Line 71: |
LexRuleId := Identifier Spacing
DocString := DQString
Conjunction := Term ( And Term )* |
|
Line 97: | Line 111: |
BlockComment := "#|" /([^|\\]|\\.|\|[^#])*/ "|#" | BlockComment := "#|" /([^|\\]|\\.|\|(?!#))*/ "|#" |
Line 115: | Line 129: |
TQString := /"""([^"\\]|\\.|"(?!")|""(?!"))*"""/ Spacing | |
Line 119: | Line 134: |
=== Notes for implementation === | |
Line 120: | Line 136: |
== Docstring Revision == | ==== DocStrings ==== |
Line 122: | Line 138: |
Currently docstrings are regular strings that appear before a Term in an !TypeDef, presumably after the list of supertypes: | Multiple docstrings may be present on a single definition, but only the first one encountered on a definition is considered its primary docstring, and implementers are free to store or discard the other doc strings as they see fit. Docstrings on type-addenda should be concatenated with a newline to the previous docstring(s), or appended to a list of docstrings, associated with the type. |
Line 124: | Line 140: |
{{{
type := supertype1 & supertype2 &
  "Docstring"
  [ ... ].
}}}
==== Comments ==== |
Line 130: | Line 142: |
But this syntax is not supported in all processors (namely PET), and the others allow variations. At the 2018 summit in Paris (see DiderotSchedule), there was a decision to distinguish docstrings from other strings by using triple-quotes (three double-quotes in a row, similar to Python), which additionally allows quotes to appear inside the docstring.

{{{
type := supertype1 & supertype2 &
  """Docstring"""
  [ ... ].
}}}

This changed the !DocString production like so:

{{{#!highlight ruby
DocString := /"""([^"\\]|\\.|"[^"]|""[^"])*"""/ Spacing
}}}

(Note that an unescaped quote cannot appear directly before the ending triple-quotes; or rather, it can, but the string would be terminated early and there'd be an extra quote character in the stream.)

There are remaining questions about their placement.

=== Option 1: Placed before any Term with multiple docstrings per type allowed ===

Where multiple docstrings occur, the type's final docstring is the concatenation of them. Example:

{{{
type := """here"""
  supertype1 &
  """here"""
  supertype2 &
  """here, too"""
  [ ... ]
  """maybe here?""".
}}}

This can be implemented by changing the following productions:

{{{#!highlight ruby
TypeDef := Type ( AvmDef | AvmAddendum ) DocString? Dot # maybe
LexRuleDef := LexRuleId DefOp Affix? DefBody DocString? Dot # maybe
AvmAddendum := AddOp ( DefBody | Conjunction | DocString )
DefBody := Supertypes ( And Conjunction )?
Supertypes := DocString? Type ( And DocString? Type )*
Term := DocString? ( Type | FeatureTerm | DiffList | ConsList | Coreference | DQString | QSymbol | Regex )
}}}

=== Option 2: Placed before any Term with only one docstring per type allowed ===

Example:

{{{
type := supertype1 &
  """just one, somewhere"""
  supertype2 &
  [ ... ].
}}}

This is more complicated to describe as production rules (need to duplicate several productions; some for use before docstring is encountered, then others for use after), but the implementation may be simple (just set a flag after reading a docstring).

=== Option 3: Once, after the list of supertypes and before any feature list ===

Example:

{{{
type := supertype1 & supertype2 &
  """just one, here"""
  [ ... ].

type2 := supertype1 &
  """what about this?"""
  [ ... ] &
  supertype2.
}}}

This is not hard to implement. If it only needs to appear after *a* list of supertypes (both examples above), it's the same as in the full production list above (but other supertypes could appear after a feature list, for instance). If one wants to ensure that all supertypes appear before any docstring or feature list (only the first example above), then we need to duplicate the Conjunction and Term productions to disallow Types at the top level. If that's something desired, it would look like this:

{{{#!highlight
AvmAddendum := AddOp ( DefBody | DocString? NoTypeConj | DocString )
DefBody := Supertypes DocString? ( And NoTypeConj )?
NoTypeConj := NoTypeTerm ( And NoTypeTerm )*
NoTypeTerm := ( FeatureTerm | DiffList | ConsList | Coreference | DQString | QSymbol | Regex )
}}}

=== Option 4: Once, immediately after the typedef or addendum operators ===

Example:

{{{
type := """just one, here"""
  supertype1 & supertype2 &
  [ ... ].

type := """
  example with
  multiple lines
  """
  supertype1 & supertype2 &
  [ ... ].
}}}

This is the simplest to implement, and the !DefBody and !Supertypes productions would be unnecessary (unless we still want supertypes to appear first):

{{{#!highlight ruby
AvmDef := DefOp DocString? Conjunction
AvmAddendum := AddOp ( DocString? Conjunction | DocString )
LexRuleDef := LexRuleId DefOp DocString? Affix? Conjunction Dot
}}}

Previously some did not like it for aesthetic reasons, though (although that is subjective). |
The syntax description above allows for comments anywhere that separating whitespace is allowed (not including those within strings, regular expressions, letter sets, etc.). This includes within a dotted attribute path (e.g., `[ SYNSEM #| comment |# . #| comment |# LOCAL ... ]`), although grammar developers may want to use this flexibility sparingly. |
Line 253: | Line 146: |
1. The `^` character is used to signal "expanded-syntax" in the LKB, but is this only used for regular expressions? Are there other expanded syntaxes? Do non-LKB processors support them? | 1. The `^` character is used to signal "expanded-syntax" in the LKB, but is this only used for regular expressions? Are there other expanded syntaxes? Do non-LKB processors support them? (see [[http://lists.delph-in.net/archives/developers/2009/thread.html#1082|this thread]] on the 'developers' mailing list) |
Line 256: | Line 149: |
 3. When supertypes are required (e.g., on a !TypeDef), must they appear before other Terms in the Conjunction? (see [[#Docstring_Revision]] above)
 4. Should the (deprecated or repurposed) subtype operator (`:<`) be included in the syntax description?
 5. Is variation allowed with regards to the position of docstrings? (see [[#Docstring_Revision]] above)
 6. Are spaces allowed inside a feature path? Comments?
 {{{
type := supertype &
  [ ATTR1 . ; comment here?
    ATTR2 value ];
}}}
 For that matter, are comments allowed anywhere that whitespace is (except maybe letter-sets and lex-rule affix patterns)?
|
Line 278: | Line 156: |
 * [[http://lists.delph-in.net/archives/developers/2006/000419.html|Mailing list discussion about docstrings (Feb 2006)]]
 * [[http://lists.delph-in.net/archives/developers/2006/000550.html|Mailing list discussion about type addenda (Jul 2006)]]
 * [[http://lists.delph-in.net/archives/developers/2007/000762.html|Mailing list discussion about docstrings (Mar 2007)]]
 * [[http://lists.delph-in.net/archives/developers/2007/000868.html|Mailing list discussion about docstrings (Sep 2007)]]
 * [[http://lists.delph-in.net/archives/developers/2008/001037.html|Mailing list discussion about the :+ and :< operators (Nov 2008)]]
 * [[http://lists.delph-in.net/archives/developers/2009/001082.html|Mailing list discussion about regular expressions in TDL (Jan 2009)]]
 * [[http://lists.delph-in.net/archives/developers/2018/002754.html|Mailing list discussion about TDL syntax (Jul 2018)]]
 * [[http://lists.delph-in.net/archives/developers/2018/002792.html|Mailing list discussion about docstrings (Aug 2018)]] |
Type Description Language and other aspects of DELPH-IN Joint Reference Formalism
Case Sensitivity
Case Sensitive
- Things inside quotes (NB: strings passed from the TFS world into MRS can be treated as case insensitive in MRS processing, i.e., as predicate symbols, but not CARGs)
Case Insensitive
- Everything in TDL not inside of quotes.
- Lexicon look-up.
- Proper names?
- Acronyms?
- .. approach these with token-mapping (preserve the info, and then downcase anyway)
Unknown
- Orthographic subrules (agree: case sensitive, ACE: [intended] case insensitive)
Notes: Arguments for case insensitivity include shouting (all caps); arguments for case sensitivity include the use of upper-case vowels in vowel harmony languages (linguistic representations, not orthography).
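One way to act on the list above is to fold case everywhere except inside quoted strings. The following is only a sketch under that assumption: the function name `downcase_outside_quotes` is hypothetical, lower case is an arbitrary choice of canonical form, and comments, regular expressions, and affix patterns get no special treatment (the behaviour of orthographic subrules is listed as unknown above).

```python
import re

# Match a triple-quoted string, a double-quoted string, or any single character.
# The string patterns follow the TQString and DQString productions of the grammar.
_QUOTED_OR_CHAR = re.compile(
    r'"""(?:[^"\\]|\\.|"(?!")|""(?!"))*"""'  # TQString
    r'|"(?:[^"\\]|\\.)*"'                    # DQString
    r'|.',                                   # anything else, one character at a time
    re.DOTALL,
)


def downcase_outside_quotes(tdl_text):
    """Lower-case TDL text except for the contents of quoted strings."""
    parts = []
    for match in _QUOTED_OR_CHAR.finditer(tdl_text):
        chunk = match.group(0)
        # A complete string is at least two characters long and starts with a quote.
        is_string = len(chunk) >= 2 and chunk.startswith('"')
        parts.append(chunk if is_string else chunk.lower())
    return "".join(parts)


print(downcase_outside_quotes('N_-_C_LE := N_Intr_Lex_Entry & [ ORTH "Dog" ].'))
```

A real processor would more likely fold case on identifiers during tokenization; the text-level pass here is just the shortest way to show the quoted/unquoted distinction.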
Doc Strings
TDL definitions allow documentation strings ("docstrings") before any term in the top-level conjunction or before the terminating dot (.) character:
n_-_c_le := n_intr_lex_entry """Intransitive count noun (icn)
  <ex>The dog barked.
  <nex>Much dog bark.""".
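As an illustration, the triple-quoted form can be recognized with the `TQString` regular expression from the grammar on this page. This is a rough sketch, not a parser: the helper name `find_docstrings` is hypothetical, and the scan is naive in that it would also pick up triple quotes that happen to occur inside comments.

```python
import re

# TQString pattern from the grammar, with a capturing group around the contents.
# re.DOTALL lets an escaped newline (backslash followed by newline) match "\\.".
TQSTRING = re.compile(r'"""((?:[^"\\]|\\.|"(?!")|""(?!"))*)"""', re.DOTALL)


def find_docstrings(tdl_text):
    """Return the contents of every triple-quoted string found in tdl_text."""
    return [match.group(1) for match in TQSTRING.finditer(tdl_text)]


example = '''n_-_c_le := n_intr_lex_entry """Intransitive count noun (icn)
  <ex>The dog barked.
  <nex>Much dog bark.""".'''

for doc in find_docstrings(example):
    print(doc)
```

Because the pattern uses lookahead rather than a consuming character class, one or two double quotes may appear inside the docstring without ending it.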
TDL File Syntax
# File Contents

TdlTypeFile := ( TypeDef | TypeAddendum | Spacing )* EOF
TdlRuleFile := ( LexRuleDef | MorphSet | Spacing )* EOF

# Types and Lexical Rules

TypeDef := Type DefOp TypedDefBody Dot
TypeAddendum := Type AddOp ( DefBody | DocString ) Dot
LexRuleDef := LexRuleId DefOp Affix? TypedDefBody Dot
LexRuleId := Identifier Spacing

# Definition Bodies (top-level conjunctions of terms)
#
# The body of a type definition, type addendum, or lexical rule is
# essentially a conjunction of Terms, but there are two special features
# of top-level conjunctions (i.e., those outside of an AVM):
#
#   (1) """DocStrings""" may precede any Term or the final Dot (.)
#
#   (2) TypeDef and LexRuleDef require at least one Type (supertype)
#       somewhere in the conjunction (conventionally the first Term)

TypedDefBody := ( TopLevelConj And )? DocString? Type ( And TopLevelConj )? DocString?
DefBody := TopLevelConj DocString?
TopLevelConj := DocString? Term ( And DocString? Term )*
DocString := TQString

# Terms and Conjunctions

Conjunction := Term ( And Term )*
Type := Identifier Spacing
Term := ( Type
        | FeatureTerm
        | DiffList
        | ConsList
        | Coreference
        | DQString
        | QSymbol
        | Regex
        )
FeatureTerm := LBrack AttrVals? RBrack
AttrVals := AttrVal ( Comma AttrVal )*
AttrVal := Attribute ( Dot Attribute )* Conjunction
Attribute := Identifier Spacing
DiffList := DLOpen Conjunctions? DLClose
ConsList := CLOpen ( Conjunctions ConsEnd? )? CLClose
ConsEnd := Comma Ellipsis | Dot Conjunction
Conjunctions := Conjunction ( Comma Conjunction )*
Coreference := "#" Identifier Spacing

# Letter-sets, Wild-cards, and Affixes

MorphSet := "%" "(" ( LetterSetDef | WildCardDef ) ")"
LetterSetDef := "letter-set" Space? "(" LetterSetVar Space LetterSet ")"
WildCardDef := "wild-card" Space? "(" WildCardVar Space LetterSet ")"
LetterSetVar := /![^ ]/
WildCardVar := /\?[^ ]/
LetterSet := /([^)\\]|\\.)+/
Affix := AffixClass AffixPattern+ Spacing
AffixClass := "%prefix" | "%suffix"
AffixPattern := Space? "(" ( NullChar | CharList ) Space CharList ")"
CharList := ( LetterSetVar | WildCardVar | AffixChar )+
NullChar := "*"
AffixChar := /([^!?\s*\\]|\\[^ ])+/

# Whitespace and Comments

Spacing := Space? Comment*
Space := /\s+/
Comment := ( LineComment | BlockComment ) Space?
LineComment := /;.*$/
BlockComment := "#|" /([^|\\]|\\.|\|(?!#))*/ "|#"

# Literals

DefOp := ":=" Spacing
AddOp := ":+" Spacing
Identifier := /[^\s.:<=&,#[\]$()>!^\/]+/
Dot := "." Spacing
And := "&" Spacing
Comma := "," Spacing
LBrack := "[" Spacing
RBrack := "]" Spacing
DLOpen := "<!" Spacing
DLClose := "!>" Spacing
CLOpen := "<" Spacing
CLClose := ">" Spacing
Ellipsis := "..." Spacing
DQString := /"([^"\\]|\\.)*"/ Spacing
TQString := /"""([^"\\]|\\.|"(?!")|""(?!"))*"""/ Spacing
QSymbol := "'" Identifier Spacing
Regex := "^" /([^$\\]|\\.)*/ "$"
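To show how the literal productions fit together, here is a small tokenizer sketch built from the regular expressions above. It is an approximation, not a reference implementation: the names `TOKEN_SPECS` and `tokenize` are hypothetical, single-character punctuation is lumped into one `Punct` kind for brevity, and letter-set and affix syntax is not handled.

```python
import re

# (kind, pattern) pairs; order matters, e.g. TQString must be tried before DQString
# and BlockComment ("#|") before the bare "#" of a coreference.
TOKEN_SPECS = [
    ("TQString", r'"""(?:[^"\\]|\\.|"(?!")|""(?!"))*"""'),
    ("DQString", r'"(?:[^"\\]|\\.)*"'),
    ("BlockComment", r'#\|(?:[^|\\]|\\.|\|(?!#))*\|#'),
    ("LineComment", r';[^\n]*'),
    ("Regex", r'\^(?:[^$\\]|\\.)*\$'),
    ("DefOp", r':='),
    ("AddOp", r':\+'),
    ("Ellipsis", r'\.\.\.'),
    ("DLOpen", r'<!'),
    ("DLClose", r'!>'),
    ("Punct", r"[.&,\[\]<>'#]"),
    ("Identifier", r'[^\s.:<=&,#\[\]$()>!^/]+'),
    ("Space", r'\s+'),
]
TOKEN = re.compile(
    "|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPECS),
    re.DOTALL,
)
SKIPPED = {"Space", "LineComment", "BlockComment"}


def tokenize(text):
    """Split TDL text into (kind, lexeme) pairs, dropping spacing and comments."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup not in SKIPPED:
            tokens.append((match.lastgroup, match.group(0)))
        pos = match.end()
    return tokens


for kind, lexeme in tokenize('n_-_c_le := n_intr_lex_entry & """doc""" [ ORTH "dog" ]. ; note'):
    print(kind, lexeme)
```

Dropping `Space` and comment tokens in the tokenizer mirrors the way the grammar folds `Spacing` into each literal, so a parser built on top of this never has to mention whitespace.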
Notes for implementation
DocStrings
Multiple docstrings may be present on a single definition, but only the first one encountered is considered its primary docstring; implementers are free to store or discard the others as they see fit. Docstrings on type addenda should either be concatenated to the previous docstring(s), separated by a newline, or be appended to a list of docstrings associated with the type.
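The bookkeeping described here could look roughly like the sketch below. The class `DocstringStore` and its methods are hypothetical; it keeps only the primary docstring of each definition or addendum (one of the permitted choices) and offers the newline-joined view on request.

```python
from collections import defaultdict


class DocstringStore:
    """Collects docstrings per type name across a definition and its addenda."""

    def __init__(self):
        self._docs = defaultdict(list)

    def add(self, type_name, docstrings):
        """Record the docstrings found on one definition or addendum.

        Only the first (primary) docstring is stored; discarding the rest is
        an implementation choice, not a requirement.
        """
        if docstrings:
            self._docs[type_name].append(docstrings[0])

    def combined(self, type_name):
        """Return the stored docstrings joined with newlines ('' if none)."""
        return "\n".join(self._docs.get(type_name, []))


store = DocstringStore()
store.add("n_-_c_le", ["Intransitive count noun (icn)", "a secondary docstring"])
store.add("n_-_c_le", ["Added later by a type addendum."])
print(store.combined("n_-_c_le"))
```

Storing a list and joining on demand covers both alternatives mentioned above, since the list itself remains available to callers that prefer it.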
Comments
The syntax description above allows for comments anywhere that separating whitespace is allowed (not including those within strings, regular expressions, letter sets, etc.). This includes within a dotted attribute path (e.g., [ SYNSEM #| comment |# . #| comment |# LOCAL ... ]), although grammar developers may want to use this flexibility sparingly.
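A minimal sketch of comment removal that respects strings, using the `LineComment` and `BlockComment` patterns from the grammar. The function name `strip_comments` is hypothetical, and this version protects only quoted strings; regular expressions and letter sets containing `;` or `#|` would need the same treatment.

```python
import re

# Strings are matched first so that comment markers inside them are left alone.
_STRING_OR_COMMENT = re.compile(
    r'(?P<keep>"""(?:[^"\\]|\\.|"(?!")|""(?!"))*"""|"(?:[^"\\]|\\.)*")'
    r'|(?P<drop>;[^\n]*|#\|(?:[^|\\]|\\.|\|(?!#))*\|#)',
    re.DOTALL,
)


def strip_comments(tdl_text):
    """Replace each comment with a single space, leaving strings untouched."""
    def replace(match):
        kept = match.group("keep")
        return kept if kept is not None else " "
    return _STRING_OR_COMMENT.sub(replace, tdl_text)


source = '[ SYNSEM #| comment |# . #| comment |# LOCAL "a ; b" ] ; trailing'
print(strip_comments(source))
```

Replacing a comment with a space rather than deleting it keeps neighbouring tokens separated, which matches the idea that a comment may stand wherever separating whitespace may.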
Questions
1. The ^ character is used to signal "expanded-syntax" in the LKB, but is this only used for regular expressions? Are there other expanded syntaxes? Do non-LKB processors support them? (see the thread at http://lists.delph-in.net/archives/developers/2009/thread.html#1082 on the 'developers' mailing list)
2. Are instances distinguishable from types? Are they (or other entities) restricted to having exactly one supertype?