FrancisBond edited this page Oct 25, 2013 · 14 revisions

SC corpus sense annotation alignment

The SC corpus has now been automatically aligned to the SemCor sense annotations. The alignment process found realpred or gpred matches for 96.3% of SemCor word forms. The remaining word forms either mapped to elements that the ERG treats as semantically empty (e.g., copulas), or were treated as multiword expressions (MWEs) by the ERG but not by WordNet (e.g., ‘such+as’, ‘right+then’, ‘not+even’).

However, only 36.3% of the ERG predicates ended up with a sense tag: 55.6% of realpreds and 11.3% of gpreds.

The alignment program generated modified DMRS files in which each node may carry an optional <sense> element, for example:

```xml
<node nodeid='10002' cfrom='0' cto='6'>
   <realpred lemma='first' pos='a' sense='1'/>
   <sortinfo cvarsort='e' sf='prop' tense='untensed' mood='indicative' prog='minus' perf='minus'/>
   <sense wn='2' lexsn='5:00:00:ordinal:00' wn_lemma='first'/>
</node>
```
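The `lexsn` attribute appears to follow SemCor's convention of storing the lex_sense part of a WordNet sense key, so joining it to `wn_lemma` with `%` should give a full key such as `first%5:00:00:ordinal:00`. The Python sketch below does this for a single node. The function name `wordnet_sense_key` is hypothetical, and the mapping from attributes to key parts is an assumption based on the example above rather than something this page documents.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def wordnet_sense_key(node: ET.Element) -> Optional[str]:
    """Build a WordNet sense key (lemma%lex_sense) from a DMRS node's <sense> child.

    ASSUMPTION: wn_lemma holds the WordNet lemma and lexsn holds the
    lex_sense part of the key, as in SemCor. Returns None for untagged nodes.
    """
    sense = node.find("sense")
    if sense is None:
        return None  # this node was not aligned to a SemCor sense
    lemma = sense.get("wn_lemma", "").lower()  # WordNet sense keys use lower-case lemmas
    return f"{lemma}%{sense.get('lexsn', '')}"
```

With NLTK and its WordNet 3.0 data installed, `nltk.corpus.wordnet.lemma_from_key(key)` should then resolve such a key to a WordNet lemma.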

The sense-annotated DMRS output is available here.
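The tagging rates quoted above can be recomputed from this output. A minimal Python sketch follows, assuming only the node layout shown in the example: each `<node>` has a `<realpred>` or `<gpred>` child and, if it was aligned to a SemCor sense, a `<sense>` child. The function name `sense_tag_rates` is hypothetical, and because the sketch simply counts DMRS nodes it may not reproduce the published figures exactly if those were computed differently.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Iterable


def sense_tag_rates(documents: Iterable[str]) -> Dict[str, float]:
    """Return the share of realpred, gpred and all nodes that carry a <sense> child.

    Counts are pooled over every document before dividing, so the result is a
    corpus-level rate rather than an average of per-file rates.
    """
    total: Counter = Counter()
    tagged: Counter = Counter()
    for xml_text in documents:
        root = ET.fromstring(xml_text)
        for node in root.iter("node"):
            if node.find("realpred") is not None:
                kind = "realpred"
            elif node.find("gpred") is not None:
                kind = "gpred"
            else:
                continue  # no predicate child: not counted
            has_sense = node.find("sense") is not None
            for key in (kind, "all"):
                total[key] += 1
                tagged[key] += int(has_sense)
    return {key: tagged[key] / total[key] for key in total}
```

Passing the contents of every file (for example via `pathlib.Path.read_text`) accumulates counts across the whole corpus, which is what a corpus-wide percentage needs; averaging per-file rates would overweight short files.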

There is also an updated dmrs.dtd, as well as SemCoreMapping.csv, which maps each SC corpus item to its annotated SemCor 3.0 concordance, context, and sentence number.
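A loader for SemCoreMapping.csv might look like the Python sketch below. The column order (item id, concordance, context, sentence number), the absence of a header row, and the names `SemCorLocation` and `load_mapping` are all assumptions made for illustration; check them against the actual file before relying on this.

```python
import csv
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class SemCorLocation:
    concordance: str  # SemCor 3.0 concordance the item comes from
    context: str      # SemCor context (file) identifier
    sentence: int     # sentence number within that context


def load_mapping(lines: Iterable[str]) -> Dict[str, SemCorLocation]:
    """Map SC corpus item ids to SemCor locations.

    ASSUMPTION (hypothetical layout): one row per item with four columns,
    item id, concordance, context, sentence number, and no header row.
    """
    mapping: Dict[str, SemCorLocation] = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        item_id, concordance, context, sentence = row[:4]
        mapping[item_id] = SemCorLocation(concordance, context, int(sentence))
    return mapping
```

Because `csv.reader` accepts any iterable of strings, an open file works directly: `with open("SemCoreMapping.csv", newline="") as f: mapping = load_mapping(f)`.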

SemCor data from Rada Mihalcea.
