
MtJaenTanaka

PetterHaugereid edited this page Aug 16, 2011 · 335 revisions

Tanaka Corpus Development Data (rtc000, rtc001, rtc002)

Changes Jaen Rev. Date Parse Coverage Transfer Coverage Generation Coverage End-to-End Coverage NEVA Oracle F1
settings as below, minus bad surface rules (aggressive version) 9388 2011/08/11 3726/4500 (82.8%) 2267/3726 (60.8%) 1247/2267 (55.0%) 1247/4500 (27.7%) 19.0 25.0 22.5
sg_relratio = 0.2, mwe_relratio = 1, sg_thresh = 0.15, mwe_thresh = 0.25 9388 2011/08/11 3726/4500 (82.8%) 2281/3726 (61.2%) 1252/2281 (54.9%) 1252/4500 (27.8%) 19.0 25.0 22.6
Passive 9388 2011/06/01 3726/4500 (82.8%) 2259/3726 (60.6%) 1240/2259 (54.9%) 1240/4500 (27.6%) 19.0 25.0 22.5
Relation prob > 0.1 9336 2011/05/31 3725/4499 (82.8%) 2295/3725 (61.6%) 1276/2295 (55.6%) 1276/4499 (28.4%) 18.7 24.3 22.5
MT version (more debugging + zero pronouns) 9343 2011/05/24 3726/4500 (82.8%) 2262/3726 (60.7%) 1246/2262 (55.1%) 1246/4500 (27.7%) 18.3 24.0 22.1
debugged alternation bug 9336 2011/05/20 3726/4500 (82.8%) 2079/3726 (55.8%) 1121/2079 (53.9%) 1121/4500 (24.9%) 18.0 23.7 20.9
+ unknown words / lower threshold on omtr 0.4>0.2 9336 2011/05/18 3726/4500 (82.8%) 1984/3726 (53.2%) 1086/1984 (54.7%) 1086/4500 (24.1%) 18.3 24.0 20.8
+ phrase table single rules (all MWEs) 9336 2011/05/16 3726/4500 (82.8%) 1933/3726 (51.9%) 1055/1933 (54.6%) 1055/4500 (23.4%) 18.3 23.7 20.6
+ phrase table single rules (some missing MWEs) 9319 2011/05/12 3726/4500 (82.8%) 1894/3726 (50.8%) 1037/1894 (54.8%) 1037/4500 (23.0%) 18.0 24.0 20.2
- phrase table single rules (some missing MWEs) 9319 2011/05/12 3725/4499 (82.8%) 1735/3725 (46.6%) 960/1735 (55.3%) 960/4499 (21.3%) 18.7 24.0 19.9
Selected rules for each profile 9319 2011/04/29 3726/4500 (82.8%) 1772/3726 (47.6%) 976/1772 (55.1%) 976/4500 (21.7%) 18.3 24.0 19.9
Selected transfer rules 9308 2011/04/28 3726/4500 (82.8%) 1772/3726 (47.6%) 973/1772 (54.9%) 973/4500 (21.6%) 18.3 24.0 19.8
All anymalign, small transfer model 9296 2011/04/21 3726/4500 (82.8%) 1788/3726 (48.0%) 983/1788 (55.0%) 983/4500 (21.8%) 18.3 23.3 19.9
New prepositions, - pcorp omtrs, + half anymalign 2011/04/20 3726/4500 (82.8%) 1779/3726 (47.7%) 955/1779 (53.7%) 955/4500 (21.2%) 18.3 23.7 19.7
Updated Edict, + MWEs 9140 2011/04/06 3725/4500 (82.8%) 1644/3725 (44.1%) 906/1644 (55.1%) 906/4500 (20.1%) 18.3 24.0 19.2
Updated Edict, - MWEs 9140 2011/04/04 3726/4500 (82.8%) 1600/3726 (42.9%) 874/1600 (54.6%) 874/4500 (19.4%) 18.0 23.7 18.7
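
The four coverage columns appear to chain: each stage's denominator is the previous stage's numerator, and End-to-End Coverage is the generated count over the full item count. The Python sketch below checks one row under that reading. The names `parse_cells` and `check_row` and the rounding tolerance are illustrative assumptions, not part of the jaen tooling.

```python
import re

# Matches coverage cells such as "3726/4500 (82.8%)" or "3722 / 4499 (82.73%)".
CELL = re.compile(r"(\d+)\s*/\s*(\d+)\s*\((\d+(?:\.\d+)?)%\)")


def parse_cells(row):
    """Return (numerator, denominator, reported_percent) for each coverage cell."""
    return [(int(n), int(d), float(p)) for n, d, p in CELL.findall(row)]


def check_row(row, tolerance=0.06):
    """Check that the four coverage cells chain and agree with their percentages.

    Assumed reading of the columns (inferred from the figures, not stated on the page):
    parse = parsed/total, transfer = transferred/parsed,
    generation = generated/transferred, end-to-end = generated/total.
    """
    cells = parse_cells(row)
    if len(cells) != 4:
        return False
    (p_n, p_d, _), (t_n, t_d, _), (g_n, g_d, _), (e_n, e_d, _) = cells
    chained = t_d == p_n and g_d == t_n and e_n == g_n and e_d == p_d
    rounded = all(abs(100.0 * n / d - pct) <= tolerance for n, d, pct in cells)
    return chained and rounded


# First row of the table above.
row = ("settings as below, minus bad surface rules (aggressive version) 9388 "
       "2011/08/11 3726/4500 (82.8%) 2267/3726 (60.8%) 1247/2267 (55.0%) "
       "1247/4500 (27.7%) 19.0 25.0 22.5")
print(check_row(row))
```

The default tolerance suits percentages reported to one decimal place; the older tables report two decimals, so a tighter value could be passed there.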

Tanaka Corpus Test Data (rtc003, rtc004, rtc005)

Changes Date Parse Coverage Transfer Coverage Generation Coverage End-to-End Coverage NEVA Oracle F1
ACL v. baseline 2011/06/09 3614/4500 (80.3%) 2117/3614 (58.6%) 1163/2117 (54.9%) 1163/4500 (25.8%) 17.7 24.0 21.0
MT v. + complex MWE rules (relratio = 1, threshold = 0.35) 2011/06/07 3613/4499 (80.3%) 2151/3613 (59.5%) 1172/2151 (54.5%) 1172/4499 (26.1%) 17.7 23.7 21.1
MT v. + complex MWE rules (relratio = 0.8, threshold = 0.25) 2011/06/07 3614/4500 (80.3%) 2154/3614 (59.6%) 1169/2154 (54.3%) 1169/4500 (26.0%) 18.0 24.0 21.3
MT v. + complex MWE rules (relratio = 0.2, threshold = 0.15) 2011/06/04 3614/4500 (80.3%) 2157/3614 (59.7%) 1158/2157 (53.7%) 1158/4500 (25.7%) 17.7 23.7 21.0
MT version (more debugging + zero pronouns) 2011/05/24 3612/4498 (80.3%) 2172/3612 (60.1%) 1175/2172 (54.1%) 1175/4498 (26.1%) 17.7 24.0 21.1
debugged alternation bug 2011/05/20 3613/4499 (80.3%) 1983/3613 (54.9%) 1053/1983 (53.1%) 1053/4499 (23.4%) 17.7 24.0 20.1
+ unknown words / lower threshold on omtr 0.4>0.2 2011/05/18 3614/4500 (80.3%) 1861/3614 (51.5%) 1001/1861 (53.8%) 1001/4500 (22.2%) 18.0 24.3 19.9
+ phrase table single rules (all MWEs) 2011/05/16 3614/4500 (80.3%) 1854/3614 (51.3%) 986/1854 (53.2%) 986/4500 (21.9%) 17.7 23.7 19.6
+ n and adj MWEs from Moses and Anymalign 2011/05/08 3614/4500 (80.3%) 1704/3614 (47.1%) 900/1704 (52.8%) 900/4500 (20.0%) 18.0 24.0 18.9
+ PP MWEs from Moses and Anymalign 2011/05/08 3614/4500 (80.3%) 1659/3614 (45.9%) 877/1659 (52.9%) 877/4500 (19.5%) 18.0 24.0 18.7
+ MWEs (All MWEs from Moses and Anymalign) 2011/05/06 3614/4500 (80.3%) 1729/3614 (47.8%) 906/1729 (52.4%) 906/4500 (20.1%) 18.0 24.3 19.0
+ Verb MWEs from Moses and Anymalign 2011/05/06 3612/4498 (80.3%) 1688/3612 (46.7%) 885/1688 (52.4%) 885/4498 (19.7%) 17.7 23.3 18.6
- MWEs (Baseline for MWE paper) 2011/05/04 3613/4499 (80.3%) 1647/3613 (45.6%) 870/1647 (52.8%) 870/4499 (19.3%) 17.7 23.7 18.5
Corrected lexicon, + MWEs 2011/04/06 3614/4500 (80.3%) 1576/3614 (43.6%) 852/1576 (54.1%) 852/4500 (18.9%) 18.3 23.7 18.6
Corrected lexicon, - MWEs 2011/03/31 3614/4500 (80.3%) 1505/3614 (41.6%) 824/1505 (54.8%) 824/4500 (18.3%) 17.7 23.3 18.0
New Edict (with particle bug) + Auto MWEs 2011/03/29 3614/4500 (80.3%) 1526/3614 (42.2%) 830/1526 (54.4%) 830/4500 (18.4%) 18.0 23.7 18.2
Auto MWEs 2011/03/28 3612/4498 (80.3%) 1507/3612 (41.7%) 816/1507 (54.1%) 816/4498 (18.1%) 18.3 23.7 18.2
New proper and relative nouns 2011/03/20 3614/4500 (80.3%) 1503/3614 (41.6%) 815/1503 (54.2%) 815/4500 (18.1%) 17.7 24.0 17.9
Old cheap + tc + end2end 2011/03/14 3614/4500 (80.3%) 1487/3614 (41.1%) 785/1487 (52.8%) 785/4500 (17.4%) 17.3 23.3 17.4
Old cheap + rtc000 + end2end 2011/03/14 3614/4500 (80.3%) 1487/3614 (41.1%) 785/1487 (52.8%) 785/4500 (17.4%) 16.0 22.3 16.7
New cheap + end to end model 2011/03/10 3612/4498 (80.3%) 1252/3612 (34.7%) 664/1252 (53.0%) 664/4498 (14.8%) 18.0 24.3 16.2
New auto rules with parallel corpus rules 2011/03/08 3614/4500 (80.3%) 1487/3614 (41.1%) 785/1487 (52.8%) 785/4500 (17.4%) 15.7 22.0 16.5
New auto rules (no parallel corpus rules) 2011/03/07 3614/4500 (80.3%) 1374/3614 (38.0%) 739/1374 (53.8%) 739/4500 (16.4%) 15.3 21.7 15.9
New ERG 2011/03/07 3613/4499 (80.3%) 1374/3613 (38.0%) 730/1374 (53.1%) 730/4499 (16.2%) 15.7 22.0 15.9
Wikipedia rules (first batch) 2011/02/17 3614/4500 (80.3%) 1465/3614 (40.5%) 775/1465 (52.9%) 775/4500 (17.2%) 16.3 23.0 16.8
Debugging 2011/02/17 3613/4499 (80.3%) 1463/3613 (40.5%) 776/1463 (53.0%) 776/4499 (17.2%) 16.3 23.0 16.8
Modifications 2011/01/27 3613/4499 (80.3%) 1422/3613 (39.4%) 754/1422 (53.0%) 754/4499 (16.8%) 16.3 22.7 16.5
Multi-word expressions 2011/01/06 3614/4500 (80.3%) 1506/3614 (41.7%) 787/1506 (52.3%) 787/4500 (17.5%) 16.0 22.3 16.7
Parallel corpus rules (new batch) 2010/12/28 3613/4499 (80.3%) 1517/3613 (42.0%) 793/1517 (52.3%) 793/4499 (17.6%) 16.3 22.7 17.0
Parallel corpus rules (fixed possessives) 2010/12/15 3614/4500 (80.3%) 1499/3614 (41.5%) 789/1499 (52.6%) 789/4500 (17.5%) 16.3 22.3 16.9
New Auto Transfer (fixed possessives) 2010/12/14 3614/4500 (80.3%) 1375/3614 (38.0%) 739/1375 (53.7%) 739/4500 (16.4%) 15.3 21.7 15.9
Parallel corpus rules (wnjpn) 2010/12/06 3614/4500 (80.3%) 1518/3614 (42.0%) 719/1518 (47.4%) 719/4500 (16.0%) 0.0 0.0 0.0
New Auto Transfer 2010/12/02 3614/4500 (80.3%) 1370/3614 (37.9%) 670/1370 (48.9%) 670/4500 (14.9%) 0.0 0.0 0.0
TC Gen Model 2010/10/26 3613/4499 (80.3%) 1383/3613 (38.3%) 732/1383 (52.9%) 732/4499 (16.3%) 15.7 22.0 16.0
2010/09/23 3614/4500 (80.3%) 1384/3614 (38.3%) 731/1384 (52.8%) 731/4500 (16.2%) 14.3 20.7 15.2

Tanaka Corpus Development Data

Changes Date HG Rev PPID Parse Coverage Transfer Coverage Generation Coverage End-to-End Coverage NEVA Oracle F1
:wait=7200, :quantum=1200 2009/09/22 604@27 21502 3722 / 4499 (82.73%) 1496 / 3722 (40.19%) 809 / 1496 (54.08%) 809 / 4499 (17.98%) 14.90 20.70 16.30
:wait=3600, :quantum=600 2009/09/22 604@26 29895 3722 / 4499 (82.73%) 1496 / 3722 (40.19%) 810 / 1496 (54.14%) 810 / 4499 (18.00%) 14.94 20.74 16.33
:wait=3600, :quantum=600 2009/09/22 604@26 15939 3723 / 4500 (82.73%) 1497 / 3723 (40.21%) 811 / 1497 (54.18%) 811 / 4500 (18.02%) 14.94 20.74 16.34
T20000 2009/09/21 604@24 2265 3722 / 4499 (82.73%) 1496 / 3722 (40.19%) 811 / 1496 (54.21%) 811 / 4499 (18.03%) 14.94 20.74 16.34
T15000 2009/09/21 604@23 20118 3721 / 4499 (82.71%) 1495 / 3721 (40.18%) 811 / 1495 (54.25%) 811 / 4499 (18.03%) 14.94 20.74 16.34
T10000 2009/09/21 604@22 29341 3723 / 4500 (82.73%) 1497 / 3723 (40.21%) 810 / 1497 (54.11%) 810 / 4500 (18.00%) 14.94 20.74 16.33
+MRS_MODEL 2009/09/21 604@21 30836 3477 / 4200 (82.79%) 1407 / 3477 (40.47%) 765 / 1407 (54.37%) 765 / 4200 (18.21%) 14.94 20.72 16.41
-MRS_MODEL 2009/09/20 604@20 21485 3722 / 4499 (82.73%) 1496 / 3722 (40.19%) 787 / 1496 (52.61%) 787 / 4499 (17.49%) 14.92 20.41 16.11
MAX3, FB_CLEAN 2009/09/20 604@19 10091 3723 / 4500 (82.73%) 1497 / 3723 (40.21%) 810 / 1497 (54.11%) 810 / 4500 (18.00%) 14.94 20.74 16.33
+CHANGE 2009/09/18 585@16 19596 3722 / 4499 (82.73%) 1488 / 3722 (39.98%) 795 / 1488 (53.43%) 795 / 4499 (17.67%) 14.78 20.65 16.10
+CHANGE 2009/09/18 585@14 29791 3720 / 4500 (82.67%) 1310 / 3720 (35.22%) 705 / 1310 (53.82%) 705 / 4500 (15.67%) 15.50 21.36 15.58
+CHANGE 2009/09/18 565@17 22899 3723 / 4500 (82.73%) 1324 / 3723 (35.56%) 709 / 1324 (53.55%) 709 / 4500 (15.76%) 15.38 21.11 15.56
+CHANGE 2009/09/17 585@9 10748 3722 / 4499 (82.73%) 1491 / 3722 (40.06%) 797 / 1491 (53.45%) 797 / 4499 (17.72%) 15.05 20.75 16.28
+CHANGE 2009/09/17 585@8 19904 3722 / 4499 (82.73%) 1490 / 3722 (40.03%) 795 / 1490 (53.36%) 795 / 4499 (17.67%) 14.74 20.62 16.07
+CHANGE 2009/09/17 585@7 5196 3722 / 4499 (82.73%) 1496 / 3722 (40.19%) 811 / 1496 (54.21%) 811 / 4499 (18.03%) 14.94 20.74 16.34
+CHANGE 2009/09/17 585@13 22095 3721 / 4498 (82.73%) 1477 / 3721 (39.69%) 798 / 1477 (54.03%) 798 / 4498 (17.74%) 14.99 20.60 16.25
+CHANGE 2009/09/17 585@12 11178 3719 / 4498 (82.68%) 1471 / 3719 (39.55%) 797 / 1471 (54.18%) 797 / 4498 (17.72%) 14.68 20.40 16.06
+CHANGE 2009/09/16 585@6 16496 3723 / 4500 (82.73%) 1514 / 3723 (40.67%) 817 / 1514 (53.96%) 817 / 4500 (18.16%) 14.68 20.59 16.23
HG585 2009/09/15 585 8772 3715 / 4492 (82.70%) 1450 / 3715 (39.03%) 781 / 1450 (53.86%) 781 / 4492 (17.39%) 14.76 20.60 15.96
+MAX5, +T10000, -LONG_WAIT, +MOSES_2.05, -FEEDBACK 2009/09/15 3687 / 4457 (82.72%) 1436 / 3687 (38.95%) 776 / 1436 (54.04%) 776 / 4457 (17.41%) 14.76 20.66 15.98
+MAX3, +T10000, -LONG_WAIT, +MOSES_2.05, +FEEDBACK 2009/09/14 3722 / 4499 (82.73%) 1496 / 3722 (40.19%) 809 / 1496 (54.08%) 809 / 4499 (17.98%) 14.94 20.73 16.32
+MAX3, +T5000, -LONG_WAIT, +MOSES_2.05, +FEEDBACK 2009/09/14 3723 / 4500 (82.73%) 1497 / 3723 (40.21%) 811 / 1497 (54.18%) 811 / 4500 (18.02%) 14.94 20.74 16.34
+FEEDBACK, +MAX10, +T10000, -LONG_WAIT, +MOSES_2.05 2009/09/14 3721 / 4498 (82.73%) 1470 / 3721 (39.51%) 795 / 1470 (54.08%) 795 / 4498 (17.67%) 15.02 20.64 16.24
+MAX10, +T10000, -LONG_WAIT, +MOSES_2.05 2009/09/11 3721 / 4498 (82.73%) 1457 / 3721 (39.16%) 792 / 1457 (54.36%) 792 / 4498 (17.61%) 14.69 20.52 16.01
+FEEDBACK_SORTING 2009/09/03 3721 / 4498 (82.73%) 1371 / 3721 (36.84%) 741 / 1371 (54.05%) 741 / 4498 (16.47%) 14.60 20.39 15.48
+BURN_IN 2009/08/31 3719 / 4495 (82.74%) 1508 / 3719 (40.55%) 814 / 1508 (53.98%) 814 / 4495 (18.11%) 14.74 20.61 16.25
+FEEDBACK, +MAX3, +T5000, -LONG_WAIT, +MOSES_2.05 2009/08/29 3721 / 4497 (82.74%) 1518 / 3721 (40.80%) 826 / 1518 (54.41%) 826 / 4497 (18.37%) 14.75 20.32 16.36
+FEEDBACK, +MAX10, +T10000, +LONG_WAIT, +MOSES_2.05 2009/08/27 3654 / 4416 (82.74%) 1461 / 3654 (39.98%) 793 / 1461 (54.28%) 793 / 4416 (17.96%) 14.72 20.44 16.18
+MAX10, +T10000, +LONG_WAIT, +MOSES_2.05 2009/08/26-2 3722 / 4499 (82.73%) 1445 / 3722 (38.82%) 788 / 1445 (54.53%) 788 / 4499 (17.52%) 14.71 20.55 15.99
+FEEDBACK, +MAX10, +T5000, +LONG_WAIT, +MOSES_2.05 2009/08/26 3527 / 4256 (82.87%) 1414 / 3527 (40.09%) 769 / 1414 (54.38%) 769 / 4256 (18.07%) 14.64 20.34 16.17
+JACY_BARCELONA 2009/08/24 3594 / 4349 (82.64%) 1460 / 3594 (40.62%) 790 / 1460 (54.11%) 790 / 4349 (18.17%) 14.55 20.35 16.16
-BLACKLIST 2009/08/23-2 3722 / 4499 (82.73%) 1477 / 3722 (39.68%) 794 / 1477 (53.76%) 794 / 4499 (17.65%) 14.86 20.74 16.13
+T5000, -LONG_WAIT 2009/08/23 3716 / 4492 (82.72%) 1348 / 3716 (36.28%) 744 / 1348 (55.19%) 744 / 4492 (16.56%) 14.83 21.07 15.65
+MAX3 2009/08/19 3722 / 4499 (82.73%) 1364 / 3722 (36.65%) 765 / 1364 (56.09%) 765 / 4499 (17.00%) 15.11 21.21 16.00
+LONG_WAIT 2009/08/18-3 3722 / 4499 (82.73%) 1345 / 3722 (36.14%) 748 / 1345 (55.61%) 748 / 4499 (16.63%) 15.11 21.06 15.83
+MOSES_2.5, +MAX5, -LONG_WAIT 2009/08/18-2 3723 / 4500 (82.73%) 1356 / 3723 (36.42%) 752 / 1356 (55.46%) 752 / 4500 (16.71%) 15.05 21.19 15.84
-MOSES, +BLACKLIST 2009/08/18 3721 / 4498 (82.73%) 1354 / 3721 (36.39%) 756 / 1354 (55.83%) 756 / 4498 (16.81%) 14.99 21.35 15.85
+LEMMAS 2009/08/17 3720 / 4497 (82.72%) 1441 / 3720 (38.74%) 808 / 1441 (56.07%) 808 / 4497 (17.97%) 14.92 21.22 16.30
+T5000, +MOSES_2.25 2009/08/16 3721 / 4498 (82.73%) 1438 / 3721 (38.65%) 806 / 1438 (56.05%) 806 / 4498 (17.92%) 15.03 21.16 16.35
+MAX3 2009/08/12 3721 / 4498 (82.73%) 1543 / 3721 (41.47%) 816 / 1543 (52.88%) 816 / 4498 (18.14%) 13.88 19.33 15.72
-MRS_MODEL 2009/08/09 3493 / 4270 (81.80%) 1536 / 3493 (43.97%) 826 / 1536 (53.78%) 826 / 4270 (19.34%) 15.10 20.77 16.96
+T10000, +COMBINED_MTRS, +LONG_WAIT 2009/08/03 3721 / 4497 (82.74%) 1172 / 3721 (31.50%) 601 / 1172 (51.28%) 601 / 4497 (13.36%) 14.55 20.97 13.93
+BARCELONA_TEST3 2009/07/13 3723 / 4500 (82.73%) 1735 / 3723 (46.60%) 943 / 1735 (54.35%) 943 / 4500 (20.96%) 14.81 20.53 17.35
+BARCELONA_TEST2 2009/07/07 3722 / 4499 (82.73%) 1734 / 3722 (46.59%) 942 / 1734 (54.33%) 942 / 4499 (20.94%) 14.85 20.63 17.38
+BARCELONA_TEST 2009/07/03 3723 / 4500 (82.73%) 1733 / 3723 (46.55%) 939 / 1733 (54.18%) 939 / 4500 (20.87%) 14.81 20.71 17.32
+TC0906, -FEEDBACK2 2009/06/06 3721 / 4500 (82.69%) 1705 / 3721 (45.82%) 939 / 1705 (55.07%) 939 / 4500 (20.87%) 14.90 20.61 17.38
+FEEDBACK2 2009/06/04 3730 / 4500 (82.89%) 1698 / 3730 (45.52%) 927 / 1698 (54.59%) 927 / 4500 (20.60%) 14.77 20.53 17.21
+JACY_EXP, -FEEDBACK 2009/06/03 3730 / 4500 (82.89%) 1693 / 3730 (45.39%) 926 / 1693 (54.70%) 926 / 4500 (20.58%) 14.86 20.63 17.26
+FEEDBACK 2009/06/02 3717 / 4500 (82.60%) 1696 / 3717 (45.63%) 918 / 1696 (54.13%) 918 / 4500 (20.40%) 14.79 20.08 17.15
+PN_FIX, +JACY_SVN 2009/05/28 3717 / 4500 (82.60%) 1716 / 3717 (46.17%) 941 / 1716 (54.84%) 941 / 4500 (20.91%) 14.97 20.64 17.45
+LIKE 2009/05/21 3721 / 4500 (82.69%) 1719 / 3721 (46.20%) 944 / 1719 (54.92%) 944 / 4500 (20.98%) 14.97 20.64 17.47
+ZERO_FIX 2009/05/19 3721 / 4500 (82.69%) 1638 / 3721 (44.02%) 936 / 1638 (57.14%) 936 / 4500 (20.80%) 14.97 19.97 17.41
-TERG, -TERGDICT, +0902, +0902DICT 2009/05/18 3721 / 4500 (82.69%) 1631 / 3721 (43.83%) 721 / 1631 (44.21%) 721 / 4500 (16.02%) 16.56 21.56 16.29
+TERG, +TERGDICT 2009/05/16 3721 / 4500 (82.69%) 1565 / 3721 (42.06%) 624 / 1565 (39.87%) 624 / 4500 (13.87%) 15.62 21.62 14.69
+SVN 2009/05/14 3721 / 4500 (82.69%) 1600 / 3721 (43.00%) 723 / 1600 (45.19%) 723 / 4500 (16.07%) 16.26 22.63 16.16
+RELATIONAL_N2, +TAME, +NAKEREBA, +MOSES, +CVS_HEAD 2008/11/11 3599 / 4500 (79.98%) 1711 / 3599 (47.54%) 937 / 1711 (54.76%) 937 / 4500 (20.82%) 14.67 20.67 17.21
+T5000, +BOOT 2008/11/08 3541 / 4409 (80.31%) 1667 / 3541 (47.08%) 903 / 1667 (54.17%) 903 / 4409 (20.48%) 14.32 23.00 16.86
+UNKNOWN 2008/11/04 3606 / 4499 (80.15%) 1492 / 3606 (41.38%) 852 / 1492 (57.10%) 852 / 4499 (18.94%) 14.33 23.64 16.31
+LMC, +SEMI, -GIZA 2008/11/02 3607 / 4500 (80.16%) 1351 / 3607 (37.45%) 801 / 1351 (59.29%) 801 / 4500 (17.80%) 14.33 23.65 15.88
-LMD 2008/11/01 3606 / 4499 (80.15%) 988 / 3606 (27.40%) 681 / 988 (68.93%) 681 / 4499 (15.14%) 13.02 24.64 14.00
-LMC, +GIZA, +LMD 2008/10/31 3607 / 4500 (80.16%) 989 / 3607 (27.42%) 682 / 989 (68.96%) 682 / 4500 (15.16%) 14.37 24.64 14.75
-T10000, +CVS, +LMC 2008/10/30 3606 / 4499 (80.15%) 988 / 3606 (27.40%) 682 / 988 (69.03%) 682 / 4499 (15.16%) 14.69 24.64 14.92
+T10000 2008/10/26 3605 / 4499 (80.13%) 1095 / 3605 (30.37%) 724 / 1095 (66.12%) 724 / 4499 (16.09%) 13.67 24.29 14.78
+NO_AMBIGUOUS_V3, +RELATIONAL_N 2008/10/24 3607 / 4500 (80.16%) 1010 / 3607 (28.00%) 692 / 1010 (68.51%) 692 / 4500 (15.38%) 13.66 24.64 14.47
+GEN2, +GEDICT2, +NO_AMBIGUOUS_V2 2008/10/21 3625 / 4500 (80.56%) 872 / 3625 (24.06%) 605 / 872 (69.38%) 605 / 4500 (13.44%) 13.67 23.63 13.56
+NEW_TANAKA 2008/10/20 3484 / 4500 (77.42%) 882 / 3484 (25.32%) 618 / 882 (70.07%) 618 / 4500 (13.73%) 13.66 23.64 13.70
-UNKNOWN, +PN, +NO_AMBIGUOUS_V 2008/10/13 3550 / 4500 (78.89%) 911 / 3550 (25.66%) 636 / 911 (69.81%) 636 / 4500 (14.13%) 13.67 23.63 13.90
+UNKNOWN, -IN_DOMAIN 2008/10/11 3564 / 4499 (79.22%) 1288 / 3564 (36.14%) 822 / 1288 (63.82%) 822 / 4499 (18.27%) 12.36 21.66 14.75
+DISCOURSE, CORRECT_GRAMMAR 2008/09/28 3566 / 4500 (79.24%) 972 / 3566 (27.26%) 651 / 972 (66.98%) 651 / 4500 (14.47%) 12.67 21.97 13.51
+DISCOURSE 2008/09/25 3567 / 4500 (79.27%) 967 / 3567 (27.11%) 647 / 967 (66.91%) 647 / 4500 (14.38%) 12.67 22.32 13.47
+IN_DOMAIN 2008/07/22 3551 / 4500 (78.91%) 861 / 3551 (24.25%) 617 / 861 (71.66%) 617 / 4500 (13.71%) 13.65 23.28 13.68
+WA 2008/06/26 3551 / 4500 (78.91%) 866 / 3551 (24.39%) 625 / 866 (72.17%) 625 / 4500 (13.89%) 13.65 22.94 13.77
+VN3 2008/06/19 3543 / 4500 (78.73%) 861 / 3543 (24.30%) 610 / 861 (70.85%) 610 / 4500 (13.56%) 13.00 22.97 13.27
-STRICT_N, -STRICT_V, -VN, +VN2 2008/06/18 3543 / 4500 (78.73%) 857 / 3543 (24.19%) 598 / 857 (69.78%) 598 / 4500 (13.29%) 12.69 22.66 12.98
+PET, +PMODEL 2008/06/17 3543 / 4500 (78.73%) 882 / 3543 (24.89%) 612 / 882 (69.39%) 612 / 4500 (13.60%) 12.69 22.66 13.13
-NO_SPURIOUS 2008/06/16 3013 / 4499 (66.97%) 730 / 3013 (24.23%) 528 / 730 (72.33%) 528 / 4499 (11.74%) 12.68 22.32 12.19
-CONJ 2008/06/16 3068 / 4500 (68.18%) 742 / 3068 (24.19%) 510 / 742 (68.73%) 510 / 4500 (11.33%) 14.04 22.64 12.54
+NO_SPURIOUS, +CONJ 2008/06/15 3068 / 4500 (68.18%) 777 / 3068 (25.33%) 528 / 777 (67.95%) 528 / 4500 (11.73%) 13.36 22.32 12.49
+GEDICT 2008/06/15 3014 / 4500 (66.98%) 722 / 3014 (23.95%) 520 / 722 (72.02%) 520 / 4500 (11.56%) 13.37 22.31 12.40
+VN, +STRICT_N, +STRICT_V 2008/06/14 3014 / 4500 (66.98%) 703 / 3014 (23.32%) 500 / 703 (71.12%) 500 / 4500 (11.11%) 13.69 22.64 12.27
+GEN 2008/06/14 3014 / 4500 (66.98%) 691 / 3014 (22.93%) 491 / 691 (71.06%) 491 / 4500 (10.91%) 14.04 23.00 12.28
+IF/THEN 2008/06/09 2779 / 4500 (61.76%) 679 / 2779 (24.43%) 487 / 679 (71.72%) 487 / 4500 (10.82%) 14.70 23.35 12.47
+HAND, +SYNC 2008/06/07 2779 / 4500 (61.76%) 662 / 2779 (23.82%) 478 / 662 (72.21%) 478 / 4500 (10.62%) 14.36 23.02 12.21
2008/06/05 2779 / 4500 (61.76%) 549 / 2779 (19.76%) 383 / 549 (69.76%) 383 / 4500 (8.51%) 14.33 22.34 10.68
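
The page does not define the F1 column. The figures are consistent with the harmonic mean of the End-to-End Coverage percentage and the NEVA score, so the sketch below works under that assumption. The function name `harmonic_f1` is hypothetical.

```python
def harmonic_f1(coverage_pct, neva):
    """Harmonic mean of end-to-end coverage (in percent) and NEVA.

    Assumption: this is how the F1 column is derived; the relationship is
    inferred from the table figures and not stated on the page.
    """
    if coverage_pct + neva == 0:
        return 0.0
    return 2.0 * coverage_pct * neva / (coverage_pct + neva)


# Oldest row of the table above: coverage 8.51%, NEVA 14.33, reported F1 10.68.
print(round(harmonic_f1(8.51, 14.33), 2))
# Newest row: coverage 17.98%, NEVA 14.90, reported F1 16.30.
print(round(harmonic_f1(17.98, 14.90), 2))
```

Because the inputs are themselves rounded, a recomputed value can differ from the reported one in the last digit.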

Tanaka Corpus Test Data

Changes Date HG Rev PPID Parse Coverage Transfer Coverage Generation Coverage End-to-End Coverage NEVA Oracle F1
+CHANGE 2009/09/22 604@27 21502 3195 / 3994 (79.99%) 1286 / 3195 (40.25%) 667 / 1286 (51.87%) 667 / 3994 (16.70%) 13.40 19.03 14.87
+CHANGE 2009/09/22 604@26 29895 3573 / 4450 (80.29%) 1441 / 3573 (40.33%) 742 / 1441 (51.49%) 742 / 4450 (16.67%) 13.23 18.83 14.75
+CHANGE 2009/09/22 604@26 15939 3609 / 4499 (80.22%) 1460 / 3609 (40.45%) 755 / 1460 (51.71%) 755 / 4499 (16.78%) 13.30 19.00 14.84
+CHANGE 2009/09/21 604@24 2265 3612 / 4499 (80.28%) 1462 / 3612 (40.48%) 756 / 1462 (51.71%) 756 / 4499 (16.80%) 13.30 19.01 14.85
+CHANGE 2009/09/21 604@23 20118 3614 / 4500 (80.31%) 1463 / 3614 (40.48%) 756 / 1463 (51.67%) 756 / 4500 (16.80%) 13.30 19.01 14.85
+CHANGE 2009/09/21 604@22 29341 3613 / 4499 (80.31%) 1461 / 3613 (40.44%) 755 / 1461 (51.68%) 755 / 4499 (16.78%) 13.30 19.00 14.84
+CHANGE 2009/09/20 604@20 21485 3612 / 4498 (80.30%) 1458 / 3612 (40.37%) 751 / 1458 (51.51%) 751 / 4498 (16.70%) 13.29 18.84 14.80
+CHANGE 2009/09/20 604@19 10091 3613 / 4500 (80.29%) 1459 / 3613 (40.38%) 755 / 1459 (51.75%) 755 / 4500 (16.78%) 13.30 19.00 14.84
+CHANGE 2009/09/18 585@16 19596 3613 / 4499 (80.31%) 1436 / 3613 (39.75%) 726 / 1436 (50.56%) 726 / 4499 (16.14%) 12.96 18.85 14.38
+CHANGE 2009/09/18 585@14 29791 3611 / 4498 (80.28%) 1249 / 3611 (34.59%) 650 / 1249 (52.04%) 650 / 4498 (14.45%) 14.00 19.71 14.22
+CHANGE 2009/09/18 565@17 22899 3613 / 4499 (80.31%) 1256 / 3613 (34.76%) 647 / 1256 (51.51%) 647 / 4499 (14.38%) 14.02 19.84 14.20
+CHANGE 2009/09/17 585@9 10748 3611 / 4499 (80.26%) 1448 / 3611 (40.10%) 750 / 1448 (51.80%) 750 / 4499 (16.67%) 13.24 19.08 14.76
+CHANGE 2009/09/17 585@8 19904 3614 / 4500 (80.31%) 1439 / 3614 (39.82%) 728 / 1439 (50.59%) 728 / 4500 (16.18%) 12.93 18.85 14.37
+CHANGE 2009/09/17 585@7 5196 3611 / 4498 (80.28%) 1461 / 3611 (40.46%) 755 / 1461 (51.68%) 755 / 4498 (16.79%) 13.30 19.00 14.84
+CHANGE 2009/09/17 585@13 22095 3612 / 4498 (80.30%) 1439 / 3612 (39.84%) 745 / 1439 (51.77%) 745 / 4498 (16.56%) 13.17 19.13 14.67
+CHANGE 2009/09/17 585@12 11178 3612 / 4498 (80.30%) 1425 / 3612 (39.45%) 725 / 1425 (50.88%) 725 / 4498 (16.12%) 12.89 18.69 14.33
+CHANGE 2009/09/16 585@6 16496 3611 / 4497 (80.30%) 1470 / 3611 (40.71%) 748 / 1470 (50.88%) 748 / 4497 (16.63%) 13.16 18.88 14.69
HG585 2009/09/15 3435 / 4287 (80.13%) 1328 / 3435 (38.66%) 683 / 1328 (51.43%) 683 / 4287 (15.93%) 12.88 19.00 14.24
HG584 2009/09/15 3613 / 4499 (80.31%) 1462 / 3613 (40.46%) 755 / 1462 (51.64%) 755 / 4499 (16.78%) 13.30 19.00 14.84
+MAX3, +T10000, -LONG_WAIT, +MOSES_2.05, +FEEDBACK 2009/09/14 3613 / 4499 (80.31%) 1462 / 3613 (40.46%) 755 / 1462 (51.64%) 755 / 4499 (16.78%) 13.30 19.00 14.84
+MAX3, +T5000, -LONG_WAIT, +MOSES_2.05, +FEEDBACK 2009/09/14 3612 / 4498 (80.30%) 1457 / 3612 (40.34%) 754 / 1457 (51.75%) 754 / 4498 (16.76%) 13.31 19.02 14.84
+MAX10, +T10000, -LONG_WAIT, +MOSES_2.05, +FEEDBACK 2009/09/14 3608 / 4495 (80.27%) 1411 / 3608 (39.11%) 736 / 1411 (52.16%) 736 / 4495 (16.37%) 13.19 19.07 14.61
+MAX10, +T10000, -LONG_WAIT, +MOSES_2.05 2009/09/11 3612 / 4498 (80.30%) 1415 / 3612 (39.17%) 721 / 1415 (50.95%) 721 / 4498 (16.03%) 12.88 18.75 14.28
+FEEDBACK_SORTING 2009/09/04 3613 / 4500 (80.29%) 1336 / 3613 (36.98%) 672 / 1336 (50.30%) 672 / 4500 (14.93%) 13.53 19.40 14.19
+BURN_IN 2009/08/31 3613 / 4499 (80.31%) 1467 / 3613 (40.60%) 748 / 1467 (50.99%) 748 / 4499 (16.63%) 13.31 18.94 14.78
+FEEDBACK, +MAX3, +T5000, -LONG_WAIT, +MOSES_2.05 2009/08/29 3609 / 4495 (80.29%) 1476 / 3609 (40.90%) 750 / 1476 (50.81%) 750 / 4495 (16.69%) 13.57 18.98 14.96
+FEEDBACK, +MAX10, +T10000, +LONG_WAIT, +MOSES_2.05 2009/08/27 3612 / 4497 (80.32%) 1453 / 3612 (40.23%) 735 / 1453 (50.58%) 735 / 4497 (16.34%) 13.24 19.06 14.63
+MAX10, +T10000, +LONG_WAIT, +MOSES_2.05 2009/08/26-2 3607 / 4493 (80.28%) 1383 / 3607 (38.34%) 712 / 1383 (51.48%) 712 / 4493 (15.85%) 12.89 18.79 14.22
+FEEDBACK, +MAX10, +T5000, +LONG_WAIT, +MOSES_2.05 2009/08/26 3290 / 4088 (80.48%) 1303 / 3290 (39.60%) 665 / 1303 (51.04%) 665 / 4088 (16.27%) 13.25 19.00 14.60
+JACY_BARCELONA 2009/08/24 3613 / 4499 (80.31%) 1458 / 3613 (40.35%) 746 / 1458 (51.17%) 746 / 4499 (16.58%) 13.39 18.94 14.81
-BLACKLIST 2009/08/23-2 3613 / 4499 (80.31%) 1426 / 3613 (39.47%) 724 / 1426 (50.77%) 724 / 4499 (16.09%) 13.27 19.15 14.54
+T5000, -LONG_WAIT 2009/08/23 3614 / 4500 (80.31%) 1317 / 3614 (36.44%) 667 / 1317 (50.65%) 667 / 4500 (14.82%) 14.00 19.52 14.40
+MAX3 2009/08/19 3373 / 4199 (80.33%) 1246 / 3373 (36.94%) 648 / 1246 (52.01%) 648 / 4199 (15.43%) 13.81 19.68 14.57
+LONG_WAIT 2009/08/18-3 3613 / 4499 (80.31%) 1311 / 3613 (36.29%) 672 / 1311 (51.26%) 672 / 4499 (14.94%) 13.45 19.42 14.15
+MOSES_2.5, +MAX5, -LONG_WAIT 2009/08/18-2 3614 / 4500 (80.31%) 1320 / 3614 (36.52%) 674 / 1320 (51.06%) 674 / 4500 (14.98%) 13.43 19.47 14.16
-MOSES, +BLACKLIST 2009/08/18 3613 / 4499 (80.31%) 1322 / 3613 (36.59%) 682 / 1322 (51.59%) 682 / 4499 (15.16%) 13.65 19.54 14.37
+LEMMAS 2009/08/17 3564 / 4440 (80.27%) 1364 / 3564 (38.27%) 710 / 1364 (52.05%) 710 / 4440 (15.99%) 13.31 19.48 14.53
+T5000, +MAX3, +COMBINED_MTRS, +MOSES_2.25 2009/08/16 3613 / 4499 (80.31%) 1389 / 3613 (38.44%) 724 / 1389 (52.12%) 724 / 4499 (16.09%) 13.66 19.81 14.78
+T10000, +MAX3, +COMBINED_MTRS 2009/08/12 3611 / 4497 (80.30%) 1455 / 3611 (40.29%) 746 / 1455 (51.27%) 746 / 4497 (16.59%) 11.99 18.21 13.92
+T10000, +COMBINED_MTRS, +LONG_WAIT 2009/08/03 3607 / 4493 (80.28%) 1066 / 3607 (29.55%) 548 / 1066 (51.41%) 548 / 4493 (12.20%) 13.97 20.18 13.02
+BARCELONA_TEST3 2009/07/13 3614 / 4500 (80.31%) 1679 / 3614 (46.46%) 863 / 1679 (51.40%) 863 / 4500 (19.18%) 14.12 20.11 16.27
+BARCELONA_TEST2 2009/07/07 3614 / 4500 (80.31%) 1680 / 3614 (46.49%) 863 / 1680 (51.37%) 863 / 4500 (19.18%) 14.25 20.04 16.35
+BARCELONA_TEST 2009/07/03 3613 / 4499 (80.31%) 1677 / 3613 (46.42%) 849 / 1677 (50.63%) 849 / 4499 (18.87%) 14.09 20.18 16.13
+TC0906, -FEEDBACK2 2009/06/06 3611 / 4499 (80.26%) 1654 / 3611 (45.80%) 832 / 1654 (50.30%) 832 / 4499 (18.49%) 14.50 20.64 16.25
+FEEDBACK2 2009/06/04 3626 / 4500 (80.58%) 1649 / 3626 (45.48%) 834 / 1649 (50.58%) 834 / 4500 (18.53%) 14.68 20.68 16.38
+JACY_EXP, -FEEDBACK 2009/06/03 3628 / 4500 (80.62%) 1642 / 3628 (45.26%) 827 / 1642 (50.37%) 827 / 4500 (18.38%) 14.51 20.75 16.21
+FEEDBACK 2009/06/02 3613 / 4500 (80.29%) 1638 / 3613 (45.34%) 838 / 1638 (51.16%) 838 / 4500 (18.62%) 14.97 20.87 16.60
+PN_FIX, +JACY_SVN 2009/05/28 3611 / 4498 (80.28%) 1653 / 3611 (45.78%) 841 / 1653 (50.88%) 841 / 4498 (18.70%) 14.69 20.37 16.46
+LIKE 2009/05/23 3616 / 4500 (80.36%) 1653 / 3616 (45.71%) 841 / 1653 (50.88%) 841 / 4500 (18.69%) 14.69 20.37 16.45
+ZERO_FIX 2009/05/19 3616 / 4500 (80.36%) 1602 / 3616 (44.30%) 837 / 1602 (52.25%) 837 / 4500 (18.60%) 14.70 20.37 16.42
-TERG, -TERGDICT, +0902, +0902DICT 2009/05/18 3616 / 4500 (80.36%) 1602 / 3616 (44.30%) 634 / 1602 (39.58%) 634 / 4500 (14.09%) 15.07 21.77 14.56
+TERG, +TERGDICT 2009/05/16 3616 / 4500 (80.36%) 1549 / 3616 (42.84%) 559 / 1549 (36.09%) 559 / 4500 (12.42%) 14.04 19.76 13.18
+SVN 2009/05/14 3616 / 4500 (80.36%) 1555 / 3616 (43.00%) 655 / 1555 (42.12%) 655 / 4500 (14.56%) 16.08 22.44 15.28
+SEMI, +T5000, +BOOT, +RELATIONAL_N2, +TAME, +NAKEREBA, +MOSES, +CVS_HEAD 2011/11 3500 / 4499 (77.80%) 1658 / 3500 (47.37%) 871 / 1658 (52.53%) 871 / 4499 (19.36%) 15.04 22.07 16.93
+CVS, +LMC 2008/10/30 3506 / 4500 (77.91%) 983 / 3506 (28.04%) 662 / 983 (67.34%) 662 / 4500 (14.71%) 13.66 24.01 14.17
+NO_AMBIGUOUS_V3, +GEN2, +GEDICT2, +RELATIONAL_N 2008/10/25 3505 / 4499 (77.91%) 1011 / 3505 (28.84%) 677 / 1011 (66.96%) 677 / 4499 (15.05%) 12.97 23.99 13.93
-UNKNOWN, +PN, +NO_AMBIGUOUS_V 2008/10/19 3491 / 4500 (77.58%) 921 / 3491 (26.38%) 623 / 921 (67.64%) 623 / 4500 (13.84%) 13.32 23.66 13.58
+UNKNOWN 2008/10/13 3509 / 4499 (78.00%) 1255 / 3509 (35.77%) 805 / 1255 (64.14%) 805 / 4499 (17.89%) 11.66 20.68 14.12
+WA 2008/06/27 3490 / 4500 (77.56%) 865 / 3490 (24.79%) 595 / 865 (68.79%) 595 / 4500 (13.22%) 13.03 22.70 13.12
+VN3 2008/06/21 3487 / 4500 (77.49%) 859 / 3487 (24.63%) 578 / 859 (67.29%) 578 / 4500 (12.84%) 12.01 22.67 12.41
+PET, +PMODEL 2008/06/18 3486 / 4499 (77.48%) 885 / 3486 (25.39%) 584 / 885 (65.99%) 584 / 4499 (12.98%) 12.01 22.01 12.48
-NO_SPURIOUS 2008/06/17 2939 / 4500 (65.31%) 757 / 2939 (25.76%) 507 / 757 (66.97%) 507 / 4500 (11.27%) 12.04 21.36 11.64
+GEN, +GEDICT, +VN, +STRICT_N, +STRICT_V, +NO_SPURIOUS 2008/06/16 3005 / 4500 (66.78%) 804 / 3005 (26.76%) 514 / 804 (63.93%) 514 / 4500 (11.42%) 12.03 21.35 11.72
+IF/THEN 2008/06/09 2764 / 4500 (61.42%) 720 / 2764 (26.05%) 494 / 720 (68.61%) 494 / 4500 (10.98%) 12.00 21.34 11.47
+HAND, +SYNC 2008/06/07 2764 / 4500 (61.42%) 698 / 2764 (25.25%) 488 / 698 (69.91%) 488 / 4500 (10.84%) 12.00 21.00 11.39
+PRO 2008/06/05 2764 / 4500 (61.42%) 572 / 2764 (20.69%) 398 / 572 (69.58%) 398 / 4500 (8.84%) 12.02 21.03 10.19

System Changes Legend

BARCELONA_TEST Test of jaen for Barcelona LOGON release
JACY_EXP Francis' experimental uncommitted Jacy fixes
FEEDBACK feedback cleaning round #1 (feedback clean won ;_;)
JACY_SVN re-checked out Jacy SVN
PN_FIX make pn-omtr inherit from pn-mtr instead of proper_noun-mtr
LIKE fixes for すること/のが好き/嫌い, some modification to idioms ("thank you", "ok")
ZERO_FIX FCB's fix to zero pronoun translation
0TGT allow rules where the target word doesn't appear in tc
0902DICT rebuilt EDICT rules with 0902 TERG mrs rels
0902 reverted to 0902 tip TERG
TERGDICT rebuilt EDICT rules with TERG mrs rels
TERG switched to trunk ERG in ja2en.lisp
SVN updated to the logon svn branch
CVS_HEAD updated the logon branch with cvs update -r HEAD
MOSES added rules acquired from Moses' phrase table
NAKEREBA added rules for nakereba/nai+to naranai/ikenai
TAME added rules for ため and its many variations
RELATIONAL_N2 fixed relational noun rules and added rules for embedding relational noun args
BOOT updated bootstrapped rules from Tanaka Corpus and SLT06 data
T5000 set transfer edges to 5,000
SEMI relaxed semi-test to (setf *semi-test* '(:predicates :properties))
LMD set language weights to 0.2/0.2/0.1/0.3/0.0/0.2
GIZA added giza++ alignment models for jaen
LMC set language model weights in .tsdbrc to 0.2/0.2/0.1/0.5
CVS updated LOGON CVS on 2008/10/28
T10000 increased transfer edges to 10,000
RELATIONAL_N added a clean-up rule to insert ARG1s into relational nouns (_n_of,_n_for,_n_to,_n_about)
NO_AMBIGUOUS_V3 added なう and にる to ambiguous verb blacklist
GEDICT2 updated mtrs for Tanaka corpus generic entries
NO_AMBIGUOUS_V2 updated ambiguous verb form blacklist and added to Jacy SVN
GEN2 generic entries updated for new Tanaka corpus
NEW_TANAKA cleaned up version of Tanaka corpus
NO_AMBIGUOUS_V removed ambiguous verb entries from the Tanaka corpus unknown lexical entries. This includes potential forms of verbs like 買える for 買う and kana verb entries that cause particle ambiguity like でる, にる, はる, etc.
PN Proper noun rules like シェクスピア→Shakespeare
UNKNOWN fixes to unknown word handling: reinstating common noun -> proper noun coercion, stripping off _rel, etc.
DISCOURSE changes to the grammar adding _d_ discourse rels for wa, mo, etc.
IN_DOMAIN include up to 3 translations where src and tgt are both in the training data
WA fixes to wa and topicalization in grammar
VN3 apply VN handling rules after dictionary rules
VN2 added a FLAG.SUBSUMES check for args to VN handling
PMODEL parsing model trained on Tanaka corpus
PET switched to PET for parsing Japanese
CONJ fixed conjunction_mtr definition
NO_SPURIOUS reduced spurious ambiguity by removing _ga_5_rel,_iru_6_rel,_iru_7_rel from Japanese grammar
STRICT_V added checks to make sure ARG0 is of type e for verb rules
STRICT_N added checks to make sure ARG0 is of type x for noun rules
VN convert verbal nouns to nouns by stripping nominalization_rel and converting ARG0 to x in preprocessing
GEDICT added translation rules from Edict for generic entries
GEN added generic entries to Japanese grammar for unknown words in Tanaka corpus
IF/THEN fixed handling of ~eba/~tara/~nara -> if/then
SYNC synchronized rel names in grammar and handcrafted rules
HAND added handcrafted lexical items
PRO fixed pronoun handling
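
In the 2008-2009 tables, the Changes column combines these legend keys with + and - prefixes, for example "+T5000, -LONG_WAIT". The Python sketch below splits such a cell into switched-on and switched-off keys. The function name `parse_changes` is hypothetical, and reading + as "enabled" and - as "disabled" is an assumption based on how the keys are used in the tables.

```python
def parse_changes(cell):
    """Split a Changes cell into (added, removed) lists of legend keys.

    Tokens without a + or - prefix (e.g. "HG585" or ":wait=7200") are ignored.
    A stray space after the prefix, as in "+ NO_AMBIGUOUS_V2", is tolerated.
    """
    added, removed = [], []
    for token in cell.split(","):
        token = token.strip()
        if token.startswith("+"):
            added.append(token[1:].strip())
        elif token.startswith("-"):
            removed.append(token[1:].strip())
    return added, removed


added, removed = parse_changes("-TERG, -TERGDICT, +0902, +0902DICT")
print(added, removed)
```

The free-text entries in the 2011 tables (such as "+ phrase table single rules (all MWEs)") are not legend keys, so this helper is only meant for the flag-style cells.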

Subgoal (2008-10)

  • 900 sentences (20%): 253 to get from

    • New lexicon (Edict+EDR, Tanaka corpus P(E|J))
    • Empathy verbs
    • Time expressions
    • Analyse by length
      • Reorder training set