Tanaka Corpus Development Data

| Change | Date | Parse Coverage | Transfer Coverage | Generation Coverage | End-to-End Coverage | BLEU | Oracle |
|---|---|---|---|---|---|---|---|
| Initial run | 06/05 | 2779 / 4500 (61.76%) | 549 / 2779 (19.76%) | 383 / 549 (69.76%) | 383 / 4500 (8.51%) | 0.1433 | 0.2234 |
| Fixed _eba_c_rel etc. | 06/07 | 2779 / 4500 (61.76%) | 662 / 2779 (23.82%) | 478 / 662 (72.21%) | 478 / 4500 (10.62%) | 0.1436 | 0.2302 |
| Rel name debugging + handcrafted rules | 06/09 | 2779 / 4500 (61.76%) | 679 / 2779 (24.43%) | 487 / 679 (71.72%) | 487 / 4500 (10.82%) | 0.1470 | 0.2335 |
| Generic entries w/o にる,だける | 06/14 | 3014 / 4500 (66.98%) | 691 / 3014 (22.93%) | 491 / 691 (71.06%) | 491 / 4500 (10.91%) | 0.1404 | 0.2300 |
| New vn handling | 06/14 | 3014 / 4500 (66.98%) | 703 / 3014 (23.32%) | 500 / 703 (71.12%) | 500 / 4500 (11.11%) | 0.1369 | 0.2264 |
| Generic Edict rules | 06/15 | 3014 / 4500 (66.98%) | 722 / 3014 (23.95%) | 520 / 722 (72.02%) | 520 / 4500 (11.56%) | 0.1337 | 0.2231 |
| Removed _ga_5_rel, _iru_6_rel, _iru_7_rel; fixed conjunction_mtr | 06/15 | 3068 / 4500 (68.18%) | 777 / 3068 (25.33%) | 528 / 777 (67.95%) | 528 / 4500 (11.73%) | 0.1336 | 0.2232 |
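
The coverage figures above follow a simple pattern: each stage is measured against the output of the previous stage, and end-to-end coverage is the number of generated items over the full 4500-item set. The following is a minimal sketch of that arithmetic using the counts from the initial run; the function name `stage_coverage` and its parameter names are illustrative assumptions, not part of any DELPH-IN tool.

```python
def stage_coverage(total, parsed, transferred, generated):
    """Return per-stage and end-to-end coverage as percentages.

    Hypothetical helper: each stage is divided by the count that
    entered it, and end-to-end is generated items over all items.
    """
    def pct(numerator, denominator):
        # Guard against an empty stage to avoid division by zero.
        return round(100.0 * numerator / denominator, 2) if denominator else 0.0

    return {
        "parse": pct(parsed, total),
        "transfer": pct(transferred, parsed),
        "generation": pct(generated, transferred),
        "end_to_end": pct(generated, total),
    }


# Counts taken from the "Initial run" (06/05) entry.
initial = stage_coverage(total=4500, parsed=2779, transferred=549, generated=383)
print(initial)
```

Because each denominator is the previous stage's numerator, the product of the three stage rates equals the end-to-end rate, which is why a gain in parse coverage (as on 06/14) raises end-to-end coverage only to the extent that the newly parsed items also survive transfer and generation.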

MtJaen/MtJaenTanaka (last edited 2011-10-08 21:12:11 by localhost)
