Discussion: Advanced Pre-Processing

Moderator: Ulrich Schäfer; Scribe: Peter Adolphs

Objective

Besides parsing 'clean' standard corpora for treebanking etc., there seems to be growing interest in, and need for, HPSG parsing with DELPH-IN tools for larger-scale processing of unseen real-world text such as the content of scientific papers, Wikipedia, and newspaper articles. There are various ways to improve parsing coverage by pre-processing, e.g.

Some of these methods have been implemented as part of LKB, PET, or Heart of Gold, or on a project-specific basis, and can be characterized as absolute prerequisites (though still not always solved optimally or well adapted to the domain); others have the status of 'good idea but never done' or lean towards application- or domain-oriented pre-processing.

The aim of this discussion is to collect and discuss the efforts recently made by members of the DELPH-IN community and, looking ahead, perhaps even to try to unify them. There seem to be more good-practice solutions than have been published or made available as downloadable tools. Participants are encouraged to report briefly on their needs and solutions (maybe even with a slide). The slot partly overlaps with the chart mapping tutorial by Peter Adolphs and the robustness techniques tutorial by Yi Zhang. Moreover, I will present some of the work done at our site in my presentation on HyLaP.

Notes

BarcelonaPreprocessing (last edited 2011-10-08 21:12:11 by localhost)
