Tanaka Corpus Development Data (rtc000, rtc001, rtc002)

| Changes | Jaen Rev. | Date | Parse Coverage | Transfer Coverage | Generation Coverage | End-to-End Coverage | NEVA | Oracle | F1 |
| settings as below, minus bad surface rules (aggressive version) | 9388 | 2011/08/11 | 3726/4500 (82.8%) | 2267/3726 (60.8%) | 1247/2267 (55.0%) | 1247/4500 (27.7%) | 19.0 | 25.0 | 22.5 |
| sg_relratio = 0.2, mwe_relratio = 1, sg_thresh = 0.15, mwe_thresh = 0.25 | 9388 | 2011/08/11 | 3726/4500 (82.8%) | 2281/3726 (61.2%) | 1252/2281 (54.9%) | 1252/4500 (27.8%) | 19.0 | 25.0 | 22.6 |
| Passive | 9388 | 2011/06/01 | 3726/4500 (82.8%) | 2259/3726 (60.6%) | 1240/2259 (54.9%) | 1240/4500 (27.6%) | 19.0 | 25.0 | 22.5 |
| Relation prob > 0.1 | 9336 | 2011/05/31 | 3725/4499 (82.8%) | 2295/3725 (61.6%) | 1276/2295 (55.6%) | 1276/4499 (28.4%) | 18.7 | 24.3 | 22.5 |
| MT version (more debugging + zero pronouns) | 9343 | 2011/05/24 | 3726/4500 (82.8%) | 2262/3726 (60.7%) | 1246/2262 (55.1%) | 1246/4500 (27.7%) | 18.3 | 24.0 | 22.1 |
| debugged alternation bug | 9336 | 2011/05/20 | 3726/4500 (82.8%) | 2079/3726 (55.8%) | 1121/2079 (53.9%) | 1121/4500 (24.9%) | 18.0 | 23.7 | 20.9 |
| + unknown words / lower threshold on omtr 0.4>0.2 | 9336 | 2011/05/18 | 3726/4500 (82.8%) | 1984/3726 (53.2%) | 1086/1984 (54.7%) | 1086/4500 (24.1%) | 18.3 | 24.0 | 20.8 |
| + phrase table single rules (all MWEs) | 9336 | 2011/05/16 | 3726/4500 (82.8%) | 1933/3726 (51.9%) | 1055/1933 (54.6%) | 1055/4500 (23.4%) | 18.3 | 23.7 | 20.6 |
| + phrase table single rules (some missing MWEs) | 9319 | 2011/05/12 | 3726/4500 (82.8%) | 1894/3726 (50.8%) | 1037/1894 (54.8%) | 1037/4500 (23.0%) | 18.0 | 24.0 | 20.2 |
| - phrase table single rules (some missing MWEs) | 9319 | 2011/05/12 | 3725/4499 (82.8%) | 1735/3725 (46.6%) | 960/1735 (55.3%) | 960/4499 (21.3%) | 18.7 | 24.0 | 19.9 |
| Selected rules for each profile | 9319 | 2011/04/29 | 3726/4500 (82.8%) | 1772/3726 (47.6%) | 976/1772 (55.1%) | 976/4500 (21.7%) | 18.3 | 24.0 | 19.9 |
| Selected transfer rules | 9308 | 2011/04/28 | 3726/4500 (82.8%) | 1772/3726 (47.6%) | 973/1772 (54.9%) | 973/4500 (21.6%) | 18.3 | 24.0 | 19.8 |
| All anymalign, small transfer model | 9296 | 2011/04/21 | 3726/4500 (82.8%) | 1788/3726 (48.0%) | 983/1788 (55.0%) | 983/4500 (21.8%) | 18.3 | 23.3 | 19.9 |
| New prepositions, - pcorp omtrs, + half anymalign | | 2011/04/20 | 3726/4500 (82.8%) | 1779/3726 (47.7%) | 955/1779 (53.7%) | 955/4500 (21.2%) | 18.3 | 23.7 | 19.7 |
| Updated Edict, + MWEs | 9140 | 2011/04/06 | 3725/4500 (82.8%) | 1644/3725 (44.1%) | 906/1644 (55.1%) | 906/4500 (20.1%) | 18.3 | 24.0 | 19.2 |
| Updated Edict, - MWEs | 9140 | 2011/04/04 | 3726/4500 (82.8%) | 1600/3726 (42.9%) | 874/1600 (54.6%) | 874/4500 (19.4%) | 18.0 | 23.7 | 18.7 |
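
The coverage columns chain together: each stage's denominator is the previous stage's numerator, so end-to-end coverage is generated items over all items. The F1 column appears to match the harmonic mean of end-to-end coverage (in %) and NEVA; for the top row, 2 × 27.7 × 19.0 / (27.7 + 19.0) ≈ 22.5, and the same relation holds for the other rows checked. That reading is inferred from the reported numbers rather than stated on this page, so treat the sketch below as an assumption; the `Row`, `coverage` and `f1` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One evaluation run, using the raw counts reported in the tables."""
    total: int        # items in the test profile(s)
    parsed: int       # items with at least one Japanese parse
    transferred: int  # parsed items with at least one transfer output
    generated: int    # transferred items with at least one English realization
    neva: float       # NEVA score as reported


def pct(num: int, den: int) -> float:
    """Percentage, guarding against an empty denominator."""
    return 100.0 * num / den if den else 0.0


def coverage(row: Row) -> dict:
    """Stage-wise coverage; each stage is relative to the previous stage's survivors."""
    return {
        "parse": pct(row.parsed, row.total),
        "transfer": pct(row.transferred, row.parsed),
        "generation": pct(row.generated, row.transferred),
        "end_to_end": pct(row.generated, row.total),
    }


def f1(end_to_end_pct: float, neva: float) -> float:
    """Assumed F1: harmonic mean of end-to-end coverage (%) and NEVA."""
    if end_to_end_pct + neva == 0:
        return 0.0
    return 2 * end_to_end_pct * neva / (end_to_end_pct + neva)


if __name__ == "__main__":
    # Top row of the 2011 development table above.
    row = Row(total=4500, parsed=3726, transferred=2267, generated=1247, neva=19.0)
    cov = coverage(row)
    for name, value in cov.items():
        print(f"{name:>10}: {value:.1f}%")
    print(f"F1: {f1(cov['end_to_end'], row.neva):.1f}")
```

Run on the top development row, this prints 82.8%, 60.8%, 55.0% and 27.7% for the four stages and an F1 of 22.5, matching the table. Rows with a NEVA of 0.0 (the two December 2010 test runs) also give an F1 of 0.0 under this reading.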

Tanaka Corpus Test Data (rtc003, rtc004, rtc005)

| Changes | Date | Parse Coverage | Transfer Coverage | Generation Coverage | End-to-End Coverage | NEVA | Oracle | F1 |
| ACL v. baseline | 2011/06/09 | 3614/4500 (80.3%) | 2117/3614 (58.6%) | 1163/2117 (54.9%) | 1163/4500 (25.8%) | 17.7 | 24.0 | 21.0 |
| MT v. + complex MWE rules (relratio = 1, threshold = 0.35) | 2011/06/07 | 3613/4499 (80.3%) | 2151/3613 (59.5%) | 1172/2151 (54.5%) | 1172/4499 (26.1%) | 17.7 | 23.7 | 21.1 |
| MT v. + complex MWE rules (relratio = 0.8, threshold = 0.25) | 2011/06/07 | 3614/4500 (80.3%) | 2154/3614 (59.6%) | 1169/2154 (54.3%) | 1169/4500 (26.0%) | 18.0 | 24.0 | 21.3 |
| MT v. + complex MWE rules (relratio = 0.2, threshold = 0.15) | 2011/06/04 | 3614/4500 (80.3%) | 2157/3614 (59.7%) | 1158/2157 (53.7%) | 1158/4500 (25.7%) | 17.7 | 23.7 | 21.0 |
| MT version (more debugging + zero pronouns) | 2011/05/24 | 3612/4498 (80.3%) | 2172/3612 (60.1%) | 1175/2172 (54.1%) | 1175/4498 (26.1%) | 17.7 | 24.0 | 21.1 |
| debugged alternation bug | 2011/05/20 | 3613/4499 (80.3%) | 1983/3613 (54.9%) | 1053/1983 (53.1%) | 1053/4499 (23.4%) | 17.7 | 24.0 | 20.1 |
| + unknown words / lower threshold on omtr 0.4>0.2 | 2011/05/18 | 3614/4500 (80.3%) | 1861/3614 (51.5%) | 1001/1861 (53.8%) | 1001/4500 (22.2%) | 18.0 | 24.3 | 19.9 |
| + phrase table single rules (all MWEs) | 2011/05/16 | 3614/4500 (80.3%) | 1854/3614 (51.3%) | 986/1854 (53.2%) | 986/4500 (21.9%) | 17.7 | 23.7 | 19.6 |
| + n and adj MWEs from Moses and Anymalign | 2011/05/08 | 3614/4500 (80.3%) | 1704/3614 (47.1%) | 900/1704 (52.8%) | 900/4500 (20.0%) | 18.0 | 24.0 | 18.9 |
| + PP MWEs from Moses and Anymalign | 2011/05/08 | 3614/4500 (80.3%) | 1659/3614 (45.9%) | 877/1659 (52.9%) | 877/4500 (19.5%) | 18.0 | 24.0 | 18.7 |
| + MWEs (All MWEs from Moses and Anymalign) | 2011/05/06 | 3614/4500 (80.3%) | 1729/3614 (47.8%) | 906/1729 (52.4%) | 906/4500 (20.1%) | 18.0 | 24.3 | 19.0 |
| + Verb MWEs from Moses and Anymalign | 2011/05/06 | 3612/4498 (80.3%) | 1688/3612 (46.7%) | 885/1688 (52.4%) | 885/4498 (19.7%) | 17.7 | 23.3 | 18.6 |
| - MWEs (Baseline for MWE paper) | 2011/05/04 | 3613/4499 (80.3%) | 1647/3613 (45.6%) | 870/1647 (52.8%) | 870/4499 (19.3%) | 17.7 | 23.7 | 18.5 |
| Corrected lexicon, + MWEs | 2011/04/06 | 3614/4500 (80.3%) | 1576/3614 (43.6%) | 852/1576 (54.1%) | 852/4500 (18.9%) | 18.3 | 23.7 | 18.6 |
| Corrected lexicon, - MWEs | 2011/03/31 | 3614/4500 (80.3%) | 1505/3614 (41.6%) | 824/1505 (54.8%) | 824/4500 (18.3%) | 17.7 | 23.3 | 18.0 |
| New Edict (with particle bug) + Auto MWEs | 2011/03/29 | 3614/4500 (80.3%) | 1526/3614 (42.2%) | 830/1526 (54.4%) | 830/4500 (18.4%) | 18.0 | 23.7 | 18.2 |
| Auto MWEs | 2011/03/28 | 3612/4498 (80.3%) | 1507/3612 (41.7%) | 816/1507 (54.1%) | 816/4498 (18.1%) | 18.3 | 23.7 | 18.2 |
| New proper and relative nouns | 2011/03/20 | 3614/4500 (80.3%) | 1503/3614 (41.6%) | 815/1503 (54.2%) | 815/4500 (18.1%) | 17.7 | 24.0 | 17.9 |
| Old cheap + tc + end2end | 2011/03/14 | 3614/4500 (80.3%) | 1487/3614 (41.1%) | 785/1487 (52.8%) | 785/4500 (17.4%) | 17.3 | 23.3 | 17.4 |
| Old cheap + rtc000 + end2end | 2011/03/14 | 3614/4500 (80.3%) | 1487/3614 (41.1%) | 785/1487 (52.8%) | 785/4500 (17.4%) | 16.0 | 22.3 | 16.7 |
| New cheap + end to end model | 2011/03/10 | 3612/4498 (80.3%) | 1252/3612 (34.7%) | 664/1252 (53.0%) | 664/4498 (14.8%) | 18.0 | 24.3 | 16.2 |
| New auto rules with parallel corpus rules | 2011/03/08 | 3614/4500 (80.3%) | 1487/3614 (41.1%) | 785/1487 (52.8%) | 785/4500 (17.4%) | 15.7 | 22.0 | 16.5 |
| New auto rules (no parallel corpus rules) | 2011/03/07 | 3614/4500 (80.3%) | 1374/3614 (38.0%) | 739/1374 (53.8%) | 739/4500 (16.4%) | 15.3 | 21.7 | 15.9 |
| New ERG | 2011/03/07 | 3613/4499 (80.3%) | 1374/3613 (38.0%) | 730/1374 (53.1%) | 730/4499 (16.2%) | 15.7 | 22.0 | 15.9 |
| Wikipedia rules (first batch) | 2011/02/17 | 3614/4500 (80.3%) | 1465/3614 (40.5%) | 775/1465 (52.9%) | 775/4500 (17.2%) | 16.3 | 23.0 | 16.8 |
| Debugging | 2011/02/17 | 3613/4499 (80.3%) | 1463/3613 (40.5%) | 776/1463 (53.0%) | 776/4499 (17.2%) | 16.3 | 23.0 | 16.8 |
| Modifications | 2011/01/27 | 3613/4499 (80.3%) | 1422/3613 (39.4%) | 754/1422 (53.0%) | 754/4499 (16.8%) | 16.3 | 22.7 | 16.5 |
| Multi-word expressions | 2011/01/06 | 3614/4500 (80.3%) | 1506/3614 (41.7%) | 787/1506 (52.3%) | 787/4500 (17.5%) | 16.0 | 22.3 | 16.7 |
| Parallel corpus rules (new batch) | 2010/12/28 | 3613/4499 (80.3%) | 1517/3613 (42.0%) | 793/1517 (52.3%) | 793/4499 (17.6%) | 16.3 | 22.7 | 17.0 |
| Parallel corpus rules (fixed possessives) | 2010/12/15 | 3614/4500 (80.3%) | 1499/3614 (41.5%) | 789/1499 (52.6%) | 789/4500 (17.5%) | 16.3 | 22.3 | 16.9 |
| New Auto Transfer (fixed possessives) | 2010/12/14 | 3614/4500 (80.3%) | 1375/3614 (38.0%) | 739/1375 (53.7%) | 739/4500 (16.4%) | 15.3 | 21.7 | 15.9 |
| Parallel corpus rules (wnjpn) | 2010/12/06 | 3614/4500 (80.3%) | 1518/3614 (42.0%) | 719/1518 (47.4%) | 719/4500 (16.0%) | 0.0 | 0.0 | 0.0 |
| New Auto Transfer | 2010/12/02 | 3614/4500 (80.3%) | 1370/3614 (37.9%) | 670/1370 (48.9%) | 670/4500 (14.9%) | 0.0 | 0.0 | 0.0 |
| TC Gen Model | 2010/10/26 | 3613/4499 (80.3%) | 1383/3613 (38.3%) | 732/1383 (52.9%) | 732/4499 (16.3%) | 15.7 | 22.0 | 16.0 |
| | 2010/09/23 | 3614/4500 (80.3%) | 1384/3614 (38.3%) | 731/1384 (52.8%) | 731/4500 (16.2%) | 14.3 | 20.7 | 15.2 |

Tanaka Corpus Development Data

| Changes | Date | HG Rev | PPID | Parse Coverage | Transfer Coverage | Generation Coverage | End-to-End Coverage | NEVA | Oracle | F1 |
| :wait=7200, :quantum=1200 | 2009/09/22 | 604@27 | 21502 | 3722 / 4499 (82.73%) | 1496 / 3722 (40.19%) | 809 / 1496 (54.08%) | 809 / 4499 (17.98%) | 14.90 | 20.70 | 16.30 |
| :wait=3600, :quantum=600 | 2009/09/22 | 604@26 | 29895 | 3722 / 4499 (82.73%) | 1496 / 3722 (40.19%) | 810 / 1496 (54.14%) | 810 / 4499 (18.00%) | 14.94 | 20.74 | 16.33 |
| :wait=3600, :quantum=600 | 2009/09/22 | 604@26 | 15939 | 3723 / 4500 (82.73%) | 1497 / 3723 (40.21%) | 811 / 1497 (54.18%) | 811 / 4500 (18.02%) | 14.94 | 20.74 | 16.34 |
| T20000 | 2009/09/21 | 604@24 | 2265 | 3722 / 4499 (82.73%) | 1496 / 3722 (40.19%) | 811 / 1496 (54.21%) | 811 / 4499 (18.03%) | 14.94 | 20.74 | 16.34 |
| T15000 | 2009/09/21 | 604@23 | 20118 | 3721 / 4499 (82.71%) | 1495 / 3721 (40.18%) | 811 / 1495 (54.25%) | 811 / 4499 (18.03%) | 14.94 | 20.74 | 16.34 |
| T10000 | 2009/09/21 | 604@22 | 29341 | 3723 / 4500 (82.73%) | 1497 / 3723 (40.21%) | 810 / 1497 (54.11%) | 810 / 4500 (18.00%) | 14.94 | 20.74 | 16.33 |
| +MRS_MODEL | 2009/09/21 | 604@21 | 30836 | 3477 / 4200 (82.79%) | 1407 / 3477 (40.47%) | 765 / 1407 (54.37%) | 765 / 4200 (18.21%) | 14.94 | 20.72 | 16.41 |
| -MRS_MODEL | 2009/09/20 | 604@20 | 21485 | 3722 / 4499 (82.73%) | 1496 / 3722 (40.19%) | 787 / 1496 (52.61%) | 787 / 4499 (17.49%) | 14.92 | 20.41 | 16.11 |
| MAX3, FB_CLEAN | 2009/09/20 | 604@19 | 10091 | 3723 / 4500 (82.73%) | 1497 / 3723 (40.21%) | 810 / 1497 (54.11%) | 810 / 4500 (18.00%) | 14.94 | 20.74 | 16.33 |
| +CHANGE | 2009/09/18 | 585@16 | 19596 | 3722 / 4499 (82.73%) | 1488 / 3722 (39.98%) | 795 / 1488 (53.43%) | 795 / 4499 (17.67%) | 14.78 | 20.65 | 16.10 |
| +CHANGE | 2009/09/18 | 585@14 | 29791 | 3720 / 4500 (82.67%) | 1310 / 3720 (35.22%) | 705 / 1310 (53.82%) | 705 / 4500 (15.67%) | 15.50 | 21.36 | 15.58 |
| +CHANGE | 2009/09/18 | 565@17 | 22899 | 3723 / 4500 (82.73%) | 1324 / 3723 (35.56%) | 709 / 1324 (53.55%) | 709 / 4500 (15.76%) | 15.38 | 21.11 | 15.56 |
| +CHANGE | 2009/09/17 | 585@9 | 10748 | 3722 / 4499 (82.73%) | 1491 / 3722 (40.06%) | 797 / 1491 (53.45%) | 797 / 4499 (17.72%) | 15.05 | 20.75 | 16.28 |
| +CHANGE | 2009/09/17 | 585@8 | 19904 | 3722 / 4499 (82.73%) | 1490 / 3722 (40.03%) | 795 / 1490 (53.36%) | 795 / 4499 (17.67%) | 14.74 | 20.62 | 16.07 |
| +CHANGE | 2009/09/17 | 585@7 | 5196 | 3722 / 4499 (82.73%) | 1496 / 3722 (40.19%) | 811 / 1496 (54.21%) | 811 / 4499 (18.03%) | 14.94 | 20.74 | 16.34 |
| +CHANGE | 2009/09/17 | 585@13 | 22095 | 3721 / 4498 (82.73%) | 1477 / 3721 (39.69%) | 798 / 1477 (54.03%) | 798 / 4498 (17.74%) | 14.99 | 20.60 | 16.25 |
| +CHANGE | 2009/09/17 | 585@12 | 11178 | 3719 / 4498 (82.68%) | 1471 / 3719 (39.55%) | 797 / 1471 (54.18%) | 797 / 4498 (17.72%) | 14.68 | 20.40 | 16.06 |
| +CHANGE | 2009/09/16 | 585@6 | 16496 | 3723 / 4500 (82.73%) | 1514 / 3723 (40.67%) | 817 / 1514 (53.96%) | 817 / 4500 (18.16%) | 14.68 | 20.59 | 16.23 |
| HG585 | 2009/09/15 | 585 | 8772 | 3715 / 4492 (82.70%) | 1450 / 3715 (39.03%) | 781 / 1450 (53.86%) | 781 / 4492 (17.39%) | 14.76 | 20.60 | 15.96 |
| +MAX5, +T10000, -LONG_WAIT, +MOSES_2.05, -FEEDBACK | 2009/09/15 | | | 3687 / 4457 (82.72%) | 1436 / 3687 (38.95%) | 776 / 1436 (54.04%) | 776 / 4457 (17.41%) | 14.76 | 20.66 | 15.98 |
| +MAX3, +T10000, -LONG_WAIT, +MOSES_2.05, +FEEDBACK | 2009/09/14 | | | 3722 / 4499 (82.73%) | 1496 / 3722 (40.19%) | 809 / 1496 (54.08%) | 809 / 4499 (17.98%) | 14.94 | 20.73 | 16.32 |
| +MAX3, +T5000, -LONG_WAIT, +MOSES_2.05, +FEEDBACK | 2009/09/14 | | | 3723 / 4500 (82.73%) | 1497 / 3723 (40.21%) | 811 / 1497 (54.18%) | 811 / 4500 (18.02%) | 14.94 | 20.74 | 16.34 |
| +FEEDBACK, +MAX10, +T10000, -LONG_WAIT, +MOSES_2.05 | 2009/09/14 | | | 3721 / 4498 (82.73%) | 1470 / 3721 (39.51%) | 795 / 1470 (54.08%) | 795 / 4498 (17.67%) | 15.02 | 20.64 | 16.24 |
| +MAX10, +T10000, -LONG_WAIT, +MOSES_2.05 | 2009/09/11 | | | 3721 / 4498 (82.73%) | 1457 / 3721 (39.16%) | 792 / 1457 (54.36%) | 792 / 4498 (17.61%) | 14.69 | 20.52 | 16.01 |
| +FEEDBACK_SORTING | 2009/09/03 | | | 3721 / 4498 (82.73%) | 1371 / 3721 (36.84%) | 741 / 1371 (54.05%) | 741 / 4498 (16.47%) | 14.60 | 20.39 | 15.48 |
| +BURN_IN | 2009/08/31 | | | 3719 / 4495 (82.74%) | 1508 / 3719 (40.55%) | 814 / 1508 (53.98%) | 814 / 4495 (18.11%) | 14.74 | 20.61 | 16.25 |
| +FEEDBACK, +MAX3, +T5000, -LONG_WAIT, +MOSES_2.05 | 2009/08/29 | | | 3721 / 4497 (82.74%) | 1518 / 3721 (40.80%) | 826 / 1518 (54.41%) | 826 / 4497 (18.37%) | 14.75 | 20.32 | 16.36 |
| +FEEDBACK, +MAX10, +T10000, +LONG_WAIT, +MOSES_2.05 | 2009/08/27 | | | 3654 / 4416 (82.74%) | 1461 / 3654 (39.98%) | 793 / 1461 (54.28%) | 793 / 4416 (17.96%) | 14.72 | 20.44 | 16.18 |
| +MAX10, +T10000, +LONG_WAIT, +MOSES_2.05 | 2009/08/26-2 | | | 3722 / 4499 (82.73%) | 1445 / 3722 (38.82%) | 788 / 1445 (54.53%) | 788 / 4499 (17.52%) | 14.71 | 20.55 | 15.99 |
| +FEEDBACK, +MAX10, +T5000, +LONG_WAIT, +MOSES_2.05 | 2009/08/26 | | | 3527 / 4256 (82.87%) | 1414 / 3527 (40.09%) | 769 / 1414 (54.38%) | 769 / 4256 (18.07%) | 14.64 | 20.34 | 16.17 |
| +JACY_BARCELONA | 2009/08/24 | | | 3594 / 4349 (82.64%) | 1460 / 3594 (40.62%) | 790 / 1460 (54.11%) | 790 / 4349 (18.17%) | 14.55 | 20.35 | 16.16 |
| -BLACKLIST | 2009/08/23-2 | | | 3722 / 4499 (82.73%) | 1477 / 3722 (39.68%) | 794 / 1477 (53.76%) | 794 / 4499 (17.65%) | 14.86 | 20.74 | 16.13 |
| +T5000, -LONG_WAIT | 2009/08/23 | | | 3716 / 4492 (82.72%) | 1348 / 3716 (36.28%) | 744 / 1348 (55.19%) | 744 / 4492 (16.56%) | 14.83 | 21.07 | 15.65 |
| +MAX3 | 2009/08/19 | | | 3722 / 4499 (82.73%) | 1364 / 3722 (36.65%) | 765 / 1364 (56.09%) | 765 / 4499 (17.00%) | 15.11 | 21.21 | 16.00 |
| +LONG_WAIT | 2009/08/18-3 | | | 3722 / 4499 (82.73%) | 1345 / 3722 (36.14%) | 748 / 1345 (55.61%) | 748 / 4499 (16.63%) | 15.11 | 21.06 | 15.83 |
| +MOSES_2.5, +MAX5, -LONG_WAIT | 2009/08/18-2 | | | 3723 / 4500 (82.73%) | 1356 / 3723 (36.42%) | 752 / 1356 (55.46%) | 752 / 4500 (16.71%) | 15.05 | 21.19 | 15.84 |
| -MOSES, +BLACKLIST | 2009/08/18 | | | 3721 / 4498 (82.73%) | 1354 / 3721 (36.39%) | 756 / 1354 (55.83%) | 756 / 4498 (16.81%) | 14.99 | 21.35 | 15.85 |
| +LEMMAS | 2009/08/17 | | | 3720 / 4497 (82.72%) | 1441 / 3720 (38.74%) | 808 / 1441 (56.07%) | 808 / 4497 (17.97%) | 14.92 | 21.22 | 16.30 |
| +T5000, +MOSES_2.25 | 2009/08/16 | | | 3721 / 4498 (82.73%) | 1438 / 3721 (38.65%) | 806 / 1438 (56.05%) | 806 / 4498 (17.92%) | 15.03 | 21.16 | 16.35 |
| +MAX3 | 2009/08/12 | | | 3721 / 4498 (82.73%) | 1543 / 3721 (41.47%) | 816 / 1543 (52.88%) | 816 / 4498 (18.14%) | 13.88 | 19.33 | 15.72 |
| -MRS_MODEL | 2009/08/09 | | | 3493 / 4270 (81.80%) | 1536 / 3493 (43.97%) | 826 / 1536 (53.78%) | 826 / 4270 (19.34%) | 15.10 | 20.77 | 16.96 |
| +T10000, +COMBINED_MTRS, +LONG_WAIT | 2009/08/03 | | | 3721 / 4497 (82.74%) | 1172 / 3721 (31.50%) | 601 / 1172 (51.28%) | 601 / 4497 (13.36%) | 14.55 | 20.97 | 13.93 |
| +BARCELONA_TEST3 | 2009/07/13 | | | 3723 / 4500 (82.73%) | 1735 / 3723 (46.60%) | 943 / 1735 (54.35%) | 943 / 4500 (20.96%) | 14.81 | 20.53 | 17.35 |
| +BARCELONA_TEST2 | 2009/07/07 | | | 3722 / 4499 (82.73%) | 1734 / 3722 (46.59%) | 942 / 1734 (54.33%) | 942 / 4499 (20.94%) | 14.85 | 20.63 | 17.38 |
| +BARCELONA_TEST | 2009/07/03 | | | 3723 / 4500 (82.73%) | 1733 / 3723 (46.55%) | 939 / 1733 (54.18%) | 939 / 4500 (20.87%) | 14.81 | 20.71 | 17.32 |
| +TC0906, -FEEDBACK2 | 2009/06/06 | | | 3721 / 4500 (82.69%) | 1705 / 3721 (45.82%) | 939 / 1705 (55.07%) | 939 / 4500 (20.87%) | 14.90 | 20.61 | 17.38 |
| +FEEDBACK2 | 2009/06/04 | | | 3730 / 4500 (82.89%) | 1698 / 3730 (45.52%) | 927 / 1698 (54.59%) | 927 / 4500 (20.60%) | 14.77 | 20.53 | 17.21 |
| +JACY_EXP, -FEEDBACK | 2009/06/03 | | | 3730 / 4500 (82.89%) | 1693 / 3730 (45.39%) | 926 / 1693 (54.70%) | 926 / 4500 (20.58%) | 14.86 | 20.63 | 17.26 |
| +FEEDBACK | 2009/06/02 | | | 3717 / 4500 (82.60%) | 1696 / 3717 (45.63%) | 918 / 1696 (54.13%) | 918 / 4500 (20.40%) | 14.79 | 20.08 | 17.15 |
| +PN_FIX, +JACY_SVN | 2009/05/28 | | | 3717 / 4500 (82.60%) | 1716 / 3717 (46.17%) | 941 / 1716 (54.84%) | 941 / 4500 (20.91%) | 14.97 | 20.64 | 17.45 |
| +LIKE | 2009/05/21 | | | 3721 / 4500 (82.69%) | 1719 / 3721 (46.20%) | 944 / 1719 (54.92%) | 944 / 4500 (20.98%) | 14.97 | 20.64 | 17.47 |
| +ZERO_FIX | 2009/05/19 | | | 3721 / 4500 (82.69%) | 1638 / 3721 (44.02%) | 936 / 1638 (57.14%) | 936 / 4500 (20.80%) | 14.97 | 19.97 | 17.41 |
| -TERG, -TERGDICT, +0902, +0902DICT | 2009/05/18 | | | 3721 / 4500 (82.69%) | 1631 / 3721 (43.83%) | 721 / 1631 (44.21%) | 721 / 4500 (16.02%) | 16.56 | 21.56 | 16.29 |
| +TERG, +TERGDICT | 2009/05/16 | | | 3721 / 4500 (82.69%) | 1565 / 3721 (42.06%) | 624 / 1565 (39.87%) | 624 / 4500 (13.87%) | 15.62 | 21.62 | 14.69 |
| +SVN | 2009/05/14 | | | 3721 / 4500 (82.69%) | 1600 / 3721 (43.00%) | 723 / 1600 (45.19%) | 723 / 4500 (16.07%) | 16.26 | 22.63 | 16.16 |
| +RELATIONAL_N2, +TAME, +NAKEREBA, +MOSES, +CVS_HEAD | 2008/11/11 | | | 3599 / 4500 (79.98%) | 1711 / 3599 (47.54%) | 937 / 1711 (54.76%) | 937 / 4500 (20.82%) | 14.67 | 20.67 | 17.21 |
| +T5000, +BOOT | 2008/11/08 | | | 3541 / 4409 (80.31%) | 1667 / 3541 (47.08%) | 903 / 1667 (54.17%) | 903 / 4409 (20.48%) | 14.32 | 23.00 | 16.86 |
| +UNKNOWN | 2008/11/04 | | | 3606 / 4499 (80.15%) | 1492 / 3606 (41.38%) | 852 / 1492 (57.10%) | 852 / 4499 (18.94%) | 14.33 | 23.64 | 16.31 |
| +LMC, +SEMI, -GIZA | 2008/11/02 | | | 3607 / 4500 (80.16%) | 1351 / 3607 (37.45%) | 801 / 1351 (59.29%) | 801 / 4500 (17.80%) | 14.33 | 23.65 | 15.88 |
| -LMD | 2008/11/01 | | | 3606 / 4499 (80.15%) | 988 / 3606 (27.40%) | 681 / 988 (68.93%) | 681 / 4499 (15.14%) | 13.02 | 24.64 | 14.00 |
| -LMC, +GIZA, +LMD | 2008/10/31 | | | 3607 / 4500 (80.16%) | 989 / 3607 (27.42%) | 682 / 989 (68.96%) | 682 / 4500 (15.16%) | 14.37 | 24.64 | 14.75 |
| -T10000, +CVS, +LMC | 2008/10/30 | | | 3606 / 4499 (80.15%) | 988 / 3606 (27.40%) | 682 / 988 (69.03%) | 682 / 4499 (15.16%) | 14.69 | 24.64 | 14.92 |
| +T10000 | 2008/10/26 | | | 3605 / 4499 (80.13%) | 1095 / 3605 (30.37%) | 724 / 1095 (66.12%) | 724 / 4499 (16.09%) | 13.67 | 24.29 | 14.78 |
| +NO_AMBIGUOUS_V3, +RELATIONAL_N | 2008/10/24 | | | 3607 / 4500 (80.16%) | 1010 / 3607 (28.00%) | 692 / 1010 (68.51%) | 692 / 4500 (15.38%) | 13.66 | 24.64 | 14.47 |
| +GEN2, +GEDICT2, +NO_AMBIGUOUS_V2 | 2008/10/21 | | | 3625 / 4500 (80.56%) | 872 / 3625 (24.06%) | 605 / 872 (69.38%) | 605 / 4500 (13.44%) | 13.67 | 23.63 | 13.56 |
| +NEW_TANAKA | 2008/10/20 | | | 3484 / 4500 (77.42%) | 882 / 3484 (25.32%) | 618 / 882 (70.07%) | 618 / 4500 (13.73%) | 13.66 | 23.64 | 13.70 |
| -UNKNOWN, +PN, +NO_AMBIGUOUS_V | 2008/10/13 | | | 3550 / 4500 (78.89%) | 911 / 3550 (25.66%) | 636 / 911 (69.81%) | 636 / 4500 (14.13%) | 13.67 | 23.63 | 13.90 |
| +UNKNOWN, -IN_DOMAIN | 2008/10/11 | | | 3564 / 4499 (79.22%) | 1288 / 3564 (36.14%) | 822 / 1288 (63.82%) | 822 / 4499 (18.27%) | 12.36 | 21.66 | 14.75 |
| +DISCOURSE, CORRECT_GRAMMAR | 2008/09/28 | | | 3566 / 4500 (79.24%) | 972 / 3566 (27.26%) | 651 / 972 (66.98%) | 651 / 4500 (14.47%) | 12.67 | 21.97 | 13.51 |
| +DISCOURSE | 2008/09/25 | | | 3567 / 4500 (79.27%) | 967 / 3567 (27.11%) | 647 / 967 (66.91%) | 647 / 4500 (14.38%) | 12.67 | 22.32 | 13.47 |
| +IN_DOMAIN | 2008/07/22 | | | 3551 / 4500 (78.91%) | 861 / 3551 (24.25%) | 617 / 861 (71.66%) | 617 / 4500 (13.71%) | 13.65 | 23.28 | 13.68 |
| +WA | 2008/06/26 | | | 3551 / 4500 (78.91%) | 866 / 3551 (24.39%) | 625 / 866 (72.17%) | 625 / 4500 (13.89%) | 13.65 | 22.94 | 13.77 |
| +VN3 | 2008/06/19 | | | 3543 / 4500 (78.73%) | 861 / 3543 (24.30%) | 610 / 861 (70.85%) | 610 / 4500 (13.56%) | 13.00 | 22.97 | 13.27 |
| -STRICT_N, -STRICT_V, -VN, +VN2 | 2008/06/18 | | | 3543 / 4500 (78.73%) | 857 / 3543 (24.19%) | 598 / 857 (69.78%) | 598 / 4500 (13.29%) | 12.69 | 22.66 | 12.98 |
| +PET, +PMODEL | 2008/06/17 | | | 3543 / 4500 (78.73%) | 882 / 3543 (24.89%) | 612 / 882 (69.39%) | 612 / 4500 (13.60%) | 12.69 | 22.66 | 13.13 |
| -NO_SPURIOUS | 2008/06/16 | | | 3013 / 4499 (66.97%) | 730 / 3013 (24.23%) | 528 / 730 (72.33%) | 528 / 4499 (11.74%) | 12.68 | 22.32 | 12.19 |
| -CONJ | 2008/06/16 | | | 3068 / 4500 (68.18%) | 742 / 3068 (24.19%) | 510 / 742 (68.73%) | 510 / 4500 (11.33%) | 14.04 | 22.64 | 12.54 |
| +NO_SPURIOUS, +CONJ | 2008/06/15 | | | 3068 / 4500 (68.18%) | 777 / 3068 (25.33%) | 528 / 777 (67.95%) | 528 / 4500 (11.73%) | 13.36 | 22.32 | 12.49 |
| +GEDICT | 2008/06/15 | | | 3014 / 4500 (66.98%) | 722 / 3014 (23.95%) | 520 / 722 (72.02%) | 520 / 4500 (11.56%) | 13.37 | 22.31 | 12.40 |
| +VN, +STRICT_N, +STRICT_V | 2008/06/14 | | | 3014 / 4500 (66.98%) | 703 / 3014 (23.32%) | 500 / 703 (71.12%) | 500 / 4500 (11.11%) | 13.69 | 22.64 | 12.27 |
| +GEN | 2008/06/14 | | | 3014 / 4500 (66.98%) | 691 / 3014 (22.93%) | 491 / 691 (71.06%) | 491 / 4500 (10.91%) | 14.04 | 23.00 | 12.28 |
| +IF/THEN | 2008/06/09 | | | 2779 / 4500 (61.76%) | 679 / 2779 (24.43%) | 487 / 679 (71.72%) | 487 / 4500 (10.82%) | 14.70 | 23.35 | 12.47 |
| +HAND, +SYNC | 2008/06/07 | | | 2779 / 4500 (61.76%) | 662 / 2779 (23.82%) | 478 / 662 (72.21%) | 478 / 4500 (10.62%) | 14.36 | 23.02 | 12.21 |
| | 2008/06/05 | | | 2779 / 4500 (61.76%) | 549 / 2779 (19.76%) | 383 / 549 (69.76%) | 383 / 4500 (8.51%) | 14.33 | 22.34 | 10.68 |

Tanaka Corpus Test Data

| Changes | Date | HG Rev | PPID | Parse Coverage | Transfer Coverage | Generation Coverage | End-to-End Coverage | NEVA | Oracle | F1 |
| +CHANGE | 2009/09/22 | 604@27 | 21502 | 3195 / 3994 (79.99%) | 1286 / 3195 (40.25%) | 667 / 1286 (51.87%) | 667 / 3994 (16.70%) | 13.40 | 19.03 | 14.87 |
| +CHANGE | 2009/09/22 | 604@26 | 29895 | 3573 / 4450 (80.29%) | 1441 / 3573 (40.33%) | 742 / 1441 (51.49%) | 742 / 4450 (16.67%) | 13.23 | 18.83 | 14.75 |
| +CHANGE | 2009/09/22 | 604@26 | 15939 | 3609 / 4499 (80.22%) | 1460 / 3609 (40.45%) | 755 / 1460 (51.71%) | 755 / 4499 (16.78%) | 13.30 | 19.00 | 14.84 |
| +CHANGE | 2009/09/21 | 604@24 | 2265 | 3612 / 4499 (80.28%) | 1462 / 3612 (40.48%) | 756 / 1462 (51.71%) | 756 / 4499 (16.80%) | 13.30 | 19.01 | 14.85 |
| +CHANGE | 2009/09/21 | 604@23 | 20118 | 3614 / 4500 (80.31%) | 1463 / 3614 (40.48%) | 756 / 1463 (51.67%) | 756 / 4500 (16.80%) | 13.30 | 19.01 | 14.85 |
| +CHANGE | 2009/09/21 | 604@22 | 29341 | 3613 / 4499 (80.31%) | 1461 / 3613 (40.44%) | 755 / 1461 (51.68%) | 755 / 4499 (16.78%) | 13.30 | 19.00 | 14.84 |
| +CHANGE | 2009/09/20 | 604@20 | 21485 | 3612 / 4498 (80.30%) | 1458 / 3612 (40.37%) | 751 / 1458 (51.51%) | 751 / 4498 (16.70%) | 13.29 | 18.84 | 14.80 |
| +CHANGE | 2009/09/20 | 604@19 | 10091 | 3613 / 4500 (80.29%) | 1459 / 3613 (40.38%) | 755 / 1459 (51.75%) | 755 / 4500 (16.78%) | 13.30 | 19.00 | 14.84 |
| +CHANGE | 2009/09/18 | 585@16 | 19596 | 3613 / 4499 (80.31%) | 1436 / 3613 (39.75%) | 726 / 1436 (50.56%) | 726 / 4499 (16.14%) | 12.96 | 18.85 | 14.38 |
| +CHANGE | 2009/09/18 | 585@14 | 29791 | 3611 / 4498 (80.28%) | 1249 / 3611 (34.59%) | 650 / 1249 (52.04%) | 650 / 4498 (14.45%) | 14.00 | 19.71 | 14.22 |
| +CHANGE | 2009/09/18 | 565@17 | 22899 | 3613 / 4499 (80.31%) | 1256 / 3613 (34.76%) | 647 / 1256 (51.51%) | 647 / 4499 (14.38%) | 14.02 | 19.84 | 14.20 |
| +CHANGE | 2009/09/17 | 585@9 | 10748 | 3611 / 4499 (80.26%) | 1448 / 3611 (40.10%) | 750 / 1448 (51.80%) | 750 / 4499 (16.67%) | 13.24 | 19.08 | 14.76 |
| +CHANGE | 2009/09/17 | 585@8 | 19904 | 3614 / 4500 (80.31%) | 1439 / 3614 (39.82%) | 728 / 1439 (50.59%) | 728 / 4500 (16.18%) | 12.93 | 18.85 | 14.37 |
| +CHANGE | 2009/09/17 | 585@7 | 5196 | 3611 / 4498 (80.28%) | 1461 / 3611 (40.46%) | 755 / 1461 (51.68%) | 755 / 4498 (16.79%) | 13.30 | 19.00 | 14.84 |
| +CHANGE | 2009/09/17 | 585@13 | 22095 | 3612 / 4498 (80.30%) | 1439 / 3612 (39.84%) | 745 / 1439 (51.77%) | 745 / 4498 (16.56%) | 13.17 | 19.13 | 14.67 |
| +CHANGE | 2009/09/17 | 585@12 | 11178 | 3612 / 4498 (80.30%) | 1425 / 3612 (39.45%) | 725 / 1425 (50.88%) | 725 / 4498 (16.12%) | 12.89 | 18.69 | 14.33 |
| +CHANGE | 2009/09/16 | 585@6 | 16496 | 3611 / 4497 (80.30%) | 1470 / 3611 (40.71%) | 748 / 1470 (50.88%) | 748 / 4497 (16.63%) | 13.16 | 18.88 | 14.69 |
| HG585 | 2009/09/15 | | | 3435 / 4287 (80.13%) | 1328 / 3435 (38.66%) | 683 / 1328 (51.43%) | 683 / 4287 (15.93%) | 12.88 | 19.00 | 14.24 |
| HG584 | 2009/09/15 | | | 3613 / 4499 (80.31%) | 1462 / 3613 (40.46%) | 755 / 1462 (51.64%) | 755 / 4499 (16.78%) | 13.30 | 19.00 | 14.84 |
| +MAX3, +T10000, -LONG_WAIT, +MOSES_2.05, +FEEDBACK | 2009/09/14 | | | 3613 / 4499 (80.31%) | 1462 / 3613 (40.46%) | 755 / 1462 (51.64%) | 755 / 4499 (16.78%) | 13.30 | 19.00 | 14.84 |
| +MAX3, +T5000, -LONG_WAIT, +MOSES_2.05, +FEEDBACK | 2009/09/14 | | | 3612 / 4498 (80.30%) | 1457 / 3612 (40.34%) | 754 / 1457 (51.75%) | 754 / 4498 (16.76%) | 13.31 | 19.02 | 14.84 |
| +MAX10, +T10000, -LONG_WAIT, +MOSES_2.05, +FEEDBACK | 2009/09/14 | | | 3608 / 4495 (80.27%) | 1411 / 3608 (39.11%) | 736 / 1411 (52.16%) | 736 / 4495 (16.37%) | 13.19 | 19.07 | 14.61 |
| +MAX10, +T10000, -LONG_WAIT, +MOSES_2.05 | 2009/09/11 | | | 3612 / 4498 (80.30%) | 1415 / 3612 (39.17%) | 721 / 1415 (50.95%) | 721 / 4498 (16.03%) | 12.88 | 18.75 | 14.28 |
| +FEEDBACK_SORTING | 2009/09/04 | | | 3613 / 4500 (80.29%) | 1336 / 3613 (36.98%) | 672 / 1336 (50.30%) | 672 / 4500 (14.93%) | 13.53 | 19.40 | 14.19 |
| +BURN_IN | 2009/08/31 | | | 3613 / 4499 (80.31%) | 1467 / 3613 (40.60%) | 748 / 1467 (50.99%) | 748 / 4499 (16.63%) | 13.31 | 18.94 | 14.78 |
| +FEEDBACK, +MAX3, +T5000, -LONG_WAIT, +MOSES_2.05 | 2009/08/29 | | | 3609 / 4495 (80.29%) | 1476 / 3609 (40.90%) | 750 / 1476 (50.81%) | 750 / 4495 (16.69%) | 13.57 | 18.98 | 14.96 |
| +FEEDBACK, +MAX10, +T10000, +LONG_WAIT, +MOSES_2.05 | 2009/08/27 | | | 3612 / 4497 (80.32%) | 1453 / 3612 (40.23%) | 735 / 1453 (50.58%) | 735 / 4497 (16.34%) | 13.24 | 19.06 | 14.63 |
| +MAX10, +T10000, +LONG_WAIT, +MOSES_2.05 | 2009/08/26-2 | | | 3607 / 4493 (80.28%) | 1383 / 3607 (38.34%) | 712 / 1383 (51.48%) | 712 / 4493 (15.85%) | 12.89 | 18.79 | 14.22 |
| +FEEDBACK, +MAX10, +T5000, +LONG_WAIT, +MOSES_2.05 | 2009/08/26 | | | 3290 / 4088 (80.48%) | 1303 / 3290 (39.60%) | 665 / 1303 (51.04%) | 665 / 4088 (16.27%) | 13.25 | 19.00 | 14.60 |
| +JACY_BARCELONA | 2009/08/24 | | | 3613 / 4499 (80.31%) | 1458 / 3613 (40.35%) | 746 / 1458 (51.17%) | 746 / 4499 (16.58%) | 13.39 | 18.94 | 14.81 |
| -BLACKLIST | 2009/08/23-2 | | | 3613 / 4499 (80.31%) | 1426 / 3613 (39.47%) | 724 / 1426 (50.77%) | 724 / 4499 (16.09%) | 13.27 | 19.15 | 14.54 |
| +T5000, -LONG_WAIT | 2009/08/23 | | | 3614 / 4500 (80.31%) | 1317 / 3614 (36.44%) | 667 / 1317 (50.65%) | 667 / 4500 (14.82%) | 14.00 | 19.52 | 14.40 |
| +MAX3 | 2009/08/19 | | | 3373 / 4199 (80.33%) | 1246 / 3373 (36.94%) | 648 / 1246 (52.01%) | 648 / 4199 (15.43%) | 13.81 | 19.68 | 14.57 |
| +LONG_WAIT | 2009/08/18-3 | | | 3613 / 4499 (80.31%) | 1311 / 3613 (36.29%) | 672 / 1311 (51.26%) | 672 / 4499 (14.94%) | 13.45 | 19.42 | 14.15 |
| +MOSES_2.5, +MAX5, -LONG_WAIT | 2009/08/18-2 | | | 3614 / 4500 (80.31%) | 1320 / 3614 (36.52%) | 674 / 1320 (51.06%) | 674 / 4500 (14.98%) | 13.43 | 19.47 | 14.16 |
| -MOSES, +BLACKLIST | 2009/08/18 | | | 3613 / 4499 (80.31%) | 1322 / 3613 (36.59%) | 682 / 1322 (51.59%) | 682 / 4499 (15.16%) | 13.65 | 19.54 | 14.37 |
| +LEMMAS | 2009/08/17 | | | 3564 / 4440 (80.27%) | 1364 / 3564 (38.27%) | 710 / 1364 (52.05%) | 710 / 4440 (15.99%) | 13.31 | 19.48 | 14.53 |
| +T5000, +MAX3, +COMBINED_MTRS, +MOSES_2.25 | 2009/08/16 | | | 3613 / 4499 (80.31%) | 1389 / 3613 (38.44%) | 724 / 1389 (52.12%) | 724 / 4499 (16.09%) | 13.66 | 19.81 | 14.78 |
| +T10000, +MAX3, +COMBINED_MTRS | 2009/08/12 | | | 3611 / 4497 (80.30%) | 1455 / 3611 (40.29%) | 746 / 1455 (51.27%) | 746 / 4497 (16.59%) | 11.99 | 18.21 | 13.92 |
| +T10000, +COMBINED_MTRS, +LONG_WAIT | 2009/08/03 | | | 3607 / 4493 (80.28%) | 1066 / 3607 (29.55%) | 548 / 1066 (51.41%) | 548 / 4493 (12.20%) | 13.97 | 20.18 | 13.02 |
| +BARCELONA_TEST3 | 2009/07/13 | | | 3614 / 4500 (80.31%) | 1679 / 3614 (46.46%) | 863 / 1679 (51.40%) | 863 / 4500 (19.18%) | 14.12 | 20.11 | 16.27 |
| +BARCELONA_TEST2 | 2009/07/07 | | | 3614 / 4500 (80.31%) | 1680 / 3614 (46.49%) | 863 / 1680 (51.37%) | 863 / 4500 (19.18%) | 14.25 | 20.04 | 16.35 |
| +BARCELONA_TEST | 2009/07/03 | | | 3613 / 4499 (80.31%) | 1677 / 3613 (46.42%) | 849 / 1677 (50.63%) | 849 / 4499 (18.87%) | 14.09 | 20.18 | 16.13 |
| +TC0906, -FEEDBACK2 | 2009/06/06 | | | 3611 / 4499 (80.26%) | 1654 / 3611 (45.80%) | 832 / 1654 (50.30%) | 832 / 4499 (18.49%) | 14.50 | 20.64 | 16.25 |
| +FEEDBACK2 | 2009/06/04 | | | 3626 / 4500 (80.58%) | 1649 / 3626 (45.48%) | 834 / 1649 (50.58%) | 834 / 4500 (18.53%) | 14.68 | 20.68 | 16.38 |
| +JACY_EXP, -FEEDBACK | 2009/06/03 | | | 3628 / 4500 (80.62%) | 1642 / 3628 (45.26%) | 827 / 1642 (50.37%) | 827 / 4500 (18.38%) | 14.51 | 20.75 | 16.21 |
| +FEEDBACK | 2009/06/02 | | | 3613 / 4500 (80.29%) | 1638 / 3613 (45.34%) | 838 / 1638 (51.16%) | 838 / 4500 (18.62%) | 14.97 | 20.87 | 16.60 |
| +PN_FIX, +JACY_SVN | 2009/05/28 | | | 3611 / 4498 (80.28%) | 1653 / 3611 (45.78%) | 841 / 1653 (50.88%) | 841 / 4498 (18.70%) | 14.69 | 20.37 | 16.46 |
| +LIKE | 2009/05/23 | | | 3616 / 4500 (80.36%) | 1653 / 3616 (45.71%) | 841 / 1653 (50.88%) | 841 / 4500 (18.69%) | 14.69 | 20.37 | 16.45 |
| +ZERO_FIX | 2009/05/19 | | | 3616 / 4500 (80.36%) | 1602 / 3616 (44.30%) | 837 / 1602 (52.25%) | 837 / 4500 (18.60%) | 14.70 | 20.37 | 16.42 |
| -TERG, -TERGDICT, +0902, +0902DICT | 2009/05/18 | | | 3616 / 4500 (80.36%) | 1602 / 3616 (44.30%) | 634 / 1602 (39.58%) | 634 / 4500 (14.09%) | 15.07 | 21.77 | 14.56 |
| +TERG, +TERGDICT | 2009/05/16 | | | 3616 / 4500 (80.36%) | 1549 / 3616 (42.84%) | 559 / 1549 (36.09%) | 559 / 4500 (12.42%) | 14.04 | 19.76 | 13.18 |
| +SVN | 2009/05/14 | | | 3616 / 4500 (80.36%) | 1555 / 3616 (43.00%) | 655 / 1555 (42.12%) | 655 / 4500 (14.56%) | 16.08 | 22.44 | 15.28 |
| +SEMI, +T5000, +BOOT, +RELATIONAL_N2, +TAME, +NAKEREBA, +MOSES, +CVS_HEAD | 2008/11 | | | 3500 / 4499 (77.80%) | 1658 / 3500 (47.37%) | 871 / 1658 (52.53%) | 871 / 4499 (19.36%) | 15.04 | 22.07 | 16.93 |
| +CVS, +LMC | 2008/10/30 | | | 3506 / 4500 (77.91%) | 983 / 3506 (28.04%) | 662 / 983 (67.34%) | 662 / 4500 (14.71%) | 13.66 | 24.01 | 14.17 |
| +NO_AMBIGUOUS_V3, +GEN2, +GEDICT2, +RELATIONAL_N | 2008/10/25 | | | 3505 / 4499 (77.91%) | 1011 / 3505 (28.84%) | 677 / 1011 (66.96%) | 677 / 4499 (15.05%) | 12.97 | 23.99 | 13.93 |
| -UNKNOWN, +PN, +NO_AMBIGUOUS_V | 2008/10/19 | | | 3491 / 4500 (77.58%) | 921 / 3491 (26.38%) | 623 / 921 (67.64%) | 623 / 4500 (13.84%) | 13.32 | 23.66 | 13.58 |
| +UNKNOWN | 2008/10/13 | | | 3509 / 4499 (78.00%) | 1255 / 3509 (35.77%) | 805 / 1255 (64.14%) | 805 / 4499 (17.89%) | 11.66 | 20.68 | 14.12 |
| +WA | 2008/06/27 | | | 3490 / 4500 (77.56%) | 865 / 3490 (24.79%) | 595 / 865 (68.79%) | 595 / 4500 (13.22%) | 13.03 | 22.70 | 13.12 |
| +VN3 | 2008/06/21 | | | 3487 / 4500 (77.49%) | 859 / 3487 (24.63%) | 578 / 859 (67.29%) | 578 / 4500 (12.84%) | 12.01 | 22.67 | 12.41 |
| +PET, +PMODEL | 2008/06/18 | | | 3486 / 4499 (77.48%) | 885 / 3486 (25.39%) | 584 / 885 (65.99%) | 584 / 4499 (12.98%) | 12.01 | 22.01 | 12.48 |
| -NO_SPURIOUS | 2008/06/17 | | | 2939 / 4500 (65.31%) | 757 / 2939 (25.76%) | 507 / 757 (66.97%) | 507 / 4500 (11.27%) | 12.04 | 21.36 | 11.64 |
| +GEN, +GEDICT, +VN, +STRICT_N, +STRICT_V, +NO_SPURIOUS | 2008/06/16 | | | 3005 / 4500 (66.78%) | 804 / 3005 (26.76%) | 514 / 804 (63.93%) | 514 / 4500 (11.42%) | 12.03 | 21.35 | 11.72 |
| +IF/THEN | 2008/06/09 | | | 2764 / 4500 (61.42%) | 720 / 2764 (26.05%) | 494 / 720 (68.61%) | 494 / 4500 (10.98%) | 12.00 | 21.34 | 11.47 |
| +HAND, +SYNC | 2008/06/07 | | | 2764 / 4500 (61.42%) | 698 / 2764 (25.25%) | 488 / 698 (69.91%) | 488 / 4500 (10.84%) | 12.00 | 21.00 | 11.39 |
| +PRO | 2008/06/05 | | | 2764 / 4500 (61.42%) | 572 / 2764 (20.69%) | 398 / 572 (69.58%) | 398 / 4500 (8.84%) | 12.02 | 21.03 | 10.19 |

System Changes Legend

BARCELONA_TEST

Test of jaen for Barcelona LOGON release

JACY_EXP

Francis' experimental uncommitted Jacy fixes

FEEDBACK

feedback cleaning round #1 (feedback clean won ;_;)

JACY_SVN

re-checked out Jacy SVN

PN_FIX

make pn-omtr inherit from pn-mtr instead of proper_noun-mtr

LIKE

fixes for すること/のが好き/嫌い, some modification to idioms ("thank you", "ok")

ZERO_FIX

FCB's fix to zero pronoun translation

0TGT

allow rules where the target word doesn't appear in tc

0902DICT

rebuilt EDICT rules with 0902 TERG mrs rels

0902

reverted to 0902 tip TERG

TERGDICT

rebuilt EDICT rules with TERG mrs rels

TERG

switched to trunk ERG in ja2en.lisp

SVN

updated to the logon svn branch

CVS_HEAD

updated the logon branch with cvs update -r HEAD

MOSES

added rules acquired from Moses' phrase table

NAKEREBA

added rules for nakereba/nai+to naranai/ikenai

TAME

added rules for ため and its many variations

RELATIONAL_N2

fixed relational noun rules and added rules for embedding relational noun args

BOOT

updated bootstrapped rules from Tanaka Corpus and SLT06 data

T5000

set transfer edges to 5,000

SEMI

relaxed semi-test to (setf *semi-test* '(:predicates :properties))

LMD

set language weights to 0.2/0.2/0.1/0.3/0.0/0.2

GIZA

added giza++ alignment models for jaen

LMC

set language model weights in .tsdbrc to 0.2/0.2/0.1/0.5

CVS

updated LOGON CVS on 2008/10/28

T10000

increased transfer edges to 10,000

RELATIONAL_N

added a clean-up rule to insert ARG1s into relational nouns (_n_of,_n_for,_n_to,_n_about)

NO_AMBIGUOUS_V3

added なう and にる to ambiguous verb blacklist

GEDICT2

updated mtrs for Tanaka corpus generic entries

NO_AMBIGUOUS_V2

updated ambiguous verb form blacklist and added to Jacy SVN

GEN2

generic entries updated for new Tanaka corpus

NEW_TANAKA

cleaned up version of Tanaka corpus

NO_AMBIGUOUS_V

removed ambiguous verb entries from the Tanaka Corpus unknown-word lexical entries. This includes potential forms of verbs, such as 買える for 買う, and kana verb entries that cause particle ambiguity, such as でる, にる, はる, etc.

PN

Proper noun rules like シェクスピア→Shakespeare

UNKNOWN

fixes to unknown word handling: reinstating common noun -> proper noun coercion, stripping off _rel, etc.

DISCOURSE

changes to the grammar adding _d_ discourse rels for wa, mo, etc.

IN_DOMAIN

include up to 3 translations where src and tgt are both in the training data

WA

fixes to wa and topicalization in grammar

VN3

apply VN handling rules after dictionary rules

VN2

added a FLAG.SUBSUMES check for args to VN handling

PMODEL

parsing model trained on Tanaka corpus

PET

switched to PET for parsing Japanese

CONJ

fixed conjunction_mtr definition

NO_SPURIOUS

reduced spurious ambiguity by removing _ga_5_rel,_iru_6_rel,_iru_7_rel from Japanese grammar

STRICT_V

added checks to make sure ARG0 is of type e for verb rules

STRICT_N

added checks to make sure ARG0 is of type x for noun rules

VN

convert verbal nouns to nouns by stripping nominalization_rel and converting ARG0 to x in preprocessing

GEDICT

added translation rules from Edict for generic entries

GEN

added generic entries to Japanese grammar for unknown words in Tanaka corpus

IF/THEN

fixed handling of ~eba/~tara/~nara -> if/then

SYNC

synchronized rel names in grammar and handcrafted rules

HAND

added handcrafted lexical items

PRO

fixed pronoun handling
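
The IN_DOMAIN entry above describes a filter over candidate translations. A minimal sketch of that filter, assuming each candidate is a (source, target) lemma pair already ranked best-first, and that "in the training data" means membership in the source- and target-side vocabularies of the training corpus; `filter_in_domain` and its parameters are hypothetical names, not taken from the jaen code.

```python
from collections import defaultdict


def filter_in_domain(candidates, src_vocab, tgt_vocab, max_per_src=3):
    """Hypothetical reading of IN_DOMAIN: keep at most max_per_src
    translations per source lemma, and only pairs whose source and
    target both occur in the training data.

    candidates: iterable of (source, target) pairs, assumed ranked best-first.
    """
    kept = []
    per_src = defaultdict(int)
    for src, tgt in candidates:
        # Drop pairs where either side is unseen in the training data.
        if src not in src_vocab or tgt not in tgt_vocab:
            continue
        # Respect the per-source cap (3 in the IN_DOMAIN description).
        if per_src[src] >= max_per_src:
            continue
        per_src[src] += 1
        kept.append((src, tgt))
    return kept


if __name__ == "__main__":
    cands = [("犬", "dog"), ("犬", "hound"), ("犬", "cur"),
             ("犬", "canine"), ("犬", "pooch"), ("猫", "cat")]
    src_vocab = {"犬", "猫"}
    tgt_vocab = {"dog", "hound", "canine", "pooch", "cat"}
    print(filter_in_domain(cands, src_vocab, tgt_vocab))
```

With these toy inputs, "cur" is dropped because it is outside the target vocabulary and "pooch" is dropped by the cap of three, leaving dog, hound and canine for 犬 plus cat for 猫. Processing candidates in ranked order means the cap keeps the best-scoring in-domain translations.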

Subgoal (2008-10)

MtJaenTanaka (last edited 2011-10-08 21:12:15 by localhost)
