Heuristics for efficient treebanking



Prefer Simpler

Technical choices

Complex proper names


Capitalized words in name

Profession modifier

Native names preferred when available

Proper names and punctuation

PP/modifier attachment

Temporal modifiers

Complex compound nouns


Passive verb vs. adjective



Measure phrases

Quotations with explicit attribution

Partitive NPs

Modification in noun phrases

Notes from Tomar meeting

  1. Where lexical ambiguity is hard to decide (e.g. even-deg vs. even-conj), choose based on frequency in Redwoods/DeepBank
  2. Disprefer modifier attachment to semantically vacuous heads, e.g. attach adverbs to hiring..., not to be hiring...

  3. For there-copula:
    1. Avoid double-object choice and avoid modification of there-cop
    2. Also prefer low attachment of modifier after obj NP
    3. Accept extraction of PP for there-cop as is
  4. When there is a choice between verb-particle and verb-mod, as in go away: if you can modify the 'particle', as in go far away, it is not verb-particle.

  5. When there is a choice between spr-hd and mod-hd for Adv-Adj, choose mod-hd
  6. Avoid adv-add except for not

  7. For WH-questions of the form NP-be-NP: [EMB: guessing this is choose subj-head; Dan please confirm]
  8. For complement of saying, if there's a main clause option for the quoted material choose it:
    • |"Who did Kim hire" asked Mary| not |*Who Kim hired, asked Mary|
  9. No free relatives
  10. Attach three-dot punct as low as possible
  11. Reject ellipsis
  12. For an n-dash between clauses, use run-on
  13. For degree specifiers, when there's a choice, take the shortest lexent type name
  14. Attach subordinate clauses high [EMB: subordinate clauses are understood as clauses with all arguments overt; do not include in+order+to purposives, etc.]
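
Heuristics 1 and 13 above are mechanical tie-breakers, so they can be sketched in a few lines of Python. This is only an illustration, not part of any DELPH-IN tool: the function names, the frequency counts, and the `hypothetical_...` type names are all made up for the example. The names even-deg and even-conj come from the note above.

```python
from collections import Counter


def pick_by_frequency(candidates, treebank_counts):
    """Heuristic 1 (sketch): prefer the candidate that occurs most often
    in a reference treebank. Unseen candidates count as zero."""
    return max(candidates, key=lambda name: treebank_counts.get(name, 0))


def pick_shortest_name(candidates):
    """Heuristic 13 (sketch): prefer the candidate whose type name is shortest."""
    return min(candidates, key=len)


# Made-up counts, for illustration only; real counts would come from
# Redwoods/DeepBank.
counts = Counter({"even-deg": 12, "even-conj": 3})

by_frequency = pick_by_frequency(["even-deg", "even-conj"], counts)

# Hypothetical type names, not taken from the ERG lexicon.
by_length = pick_shortest_name(
    ["hypothetical_deg_spec_long_le", "hypothetical_deg_le"]
)

print(by_frequency)
print(by_length)
```

When two candidates tie, `max` and `min` return the first one listed, so the order of the candidate list acts as a final, arbitrary tie-breaker in this sketch.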

ErgTreebankingGuidelines (last edited 2020-07-23 14:50:48 by AlexandreRademaker)
