
JacyHinoki

FrancisBond edited this page Jul 14, 2020 · 3 revisions

Jacy is being developed in cooperation with the Hinoki Treebank.

Corpora

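| Name | ID      | Full Name      | # Sentences | # Words   | Comments                                                        |
|------|---------|----------------|-------------|-----------|-----------------------------------------------------------------|
| mrs  | 0       | MRS Test Suite | 136         | ???       |                                                                 |
| tc   | 100,000 | Tanaka Corpus  | 150,341     | 1,756,825 | Includes English translations; 10 profiles (6-15) treebanked    |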

These treebanks are in the jacy/tsdb/gold directory. They may lag behind the most recent version of the grammar.

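If you want silver data (automatically produced analyses that have not been manually verified), parsing the rest of the Tanaka Corpus, i.e. the sentences outside the treebanked profiles, is a good place to start.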
