agree : another grammar engineering environment

A project to develop a system for processing DELPH-IN style TDL grammars within the .NET and Mono managed runtime environments.

UW CLMA thesis work of GlennSlayden, supervised by EmilyBender. Thesis reader StephanOepen.
Spencer Rarrick helps with MaxEnt parse selection, generation, and testing with Jacy.

agree is open-source software released under the MIT license.

For SVN access and project information, please visit https://agree.codeplex.com.

Abstract

A new parser and generator for the DELPH-IN joint reference formalism

The Common Language Infrastructure (CLI, ECMA-335) is a modern standard for architecting extensible, platform-independent software. Well-known implementations include Mono and Microsoft's .NET Framework. These managed runtime environments enjoy the robust support of actively developed software, and incorporate decades of research and best practice experience in systems architecture and developer productivity. An aim of this project is to explore the suitability of this platform for a new suite of tools for processing DELPH-IN style TDL (Krieger and Schäfer 1994) grammars.

Single-core performance having reached physical limits, the focus in high-performance computing now concerns multi-core processors. Accordingly, another goal of this project is to examine opportunities for concurrent programming in the processing of precision analytical grammars. This effort has led to the development of a low-lock, concurrent parse/generate chart which exploits new deep operating system support for scalable, fine-grained concurrency.

TFS Representation

Informed by Pereira (1985), Wroblewski (1987), Tomabechi (1991), and more recent work by van Lohuizen, a research goal is to investigate the efficiency--under the intensive demands that are characteristic of unification grammars--of array TFS storage, a representation that departs from the traditional allocation-per-node DAG approach. This internal DAG representation may minimize garbage collector activity in managed programming environments.

Capitalizing on the observation that, in DELPH-IN grammars, the appropriate features for every type are invariant, this approach stores each TFS's nodes contiguously, indexed according to a hash that incorporates its hosting feature. Combined with careful application of C# value types, this approach has demonstrated excellent parsing and generation performance in the .NET managed runtime environment.


Project Status

The system supports both parsing and generation. It has been tested with the English Resource Grammar (Flickinger 2002), Jacy grammar of Japanese (Siegel and Bender 2002), and other Matrix grammars (Bender et al. 2002), notably a medium-small grammar of Thai (unpublished). Exact derivations have been validated versus PET for Redwoods (Oepen et al. 2000) corpora.

As the complexity of the agree tool suite has increased in support of diverse analysis tasks, application maintainability, configuration, and management have become key issues. In response, in 2012 the system was rearchitected as a set of loosely-coupled functors with which the linguist-user composes ad-hoc linguistic processing pipelines via XAML markup. Binding together functors yields a customized, reusable composite functor which, when applied to an input, produces a complex and/or nested sequence of live monads (non-reusable workers). Each monad represents a concurrent push-based processing stream implementing some part of the fanout or fork/join signature required by the originally composed task.

Type System

Unification Engine

Morpholexical Analysis

DELPH-IN Compliant Preprocessing

Concurrent Parse-Generate Chart

Maxent Parse Selection

MRS

Concurrent Tactical Realization (Generation)

Application Notes

WPF Client Application

http://www.agree-grammar.com/11-screenshots/galleria/20130514.png http://www.agree-grammar.com/11-screenshots/galleria/tfs-dag.png

The project is also investigating novel visualization technologies for grammar engineering and pedagogical use. Shown below is a three-dimensional view of an authored constraint definition superimposed over its expanded feature structure. Because the structures are aligned, there are spaces in the definition to account for constraints supplied by other structures. Multi-layer views can show the contributions from each definition which make a feature structure well-formed.

The view can be manipulated on any axis to freely explore relationships amongst the authored rules. Although the WPF environment makes such renderings easy to implement, an ongoing challenge is to find a visual presentation with the simplicity and elegance requisite for truly facilitating linguistic insight.

http://www.agree-grammar.com/webshare/20111206.png

References

Adolphs, Peter; Oepen, Stephan; Callmeier, Ulrich; Crysmann, Berthold; Flickinger, Dan & Kiefer, Bernd. 2008. Some Fine Points of Hybrid Natural Language Parsing. In: Proceedings of the Sixth International Language Resources and Evaluation (LREC-2008). Marrakech, Morocco.

Emily M. Bender, Dan Flickinger, and Stephan Oepen. 2002. The grammar matrix: an open-source starter-kit for the rapid development of cross-linguistically consistent broad-coverage precision grammars. in Proceedings of the 2002 workshop on Grammar engineering and evaluation - Volume 15 (COLING-GEE '02), Vol. 15. Association for Computational Linguistics, Stroudsburg, PA, USA, 1-7.

Ulrich Callmeier. 2001. Efficient Parsing with Large-Scale Unification Grammars. MA Thesis, Universität des Saarlandes - Fachrichtung Informatik.

Ulrich Callmeier. 2000. PET: a platform for experimentation with efficient HPSG processing techniques. Natural Language Engineering 6(1): 99-107.

Dan Flickinger. 2002. On building a more efficient grammar by exploiting types. Natural Language Engineering, 6, pp 15-28.

Hans-Ulrich Krieger and Ulrich Schäfer. 1994. TDL: a type description language for constraint-based grammars. in Proceedings of the 15th conference on Computational linguistics - Volume 2 (COLING '94), Vol. 2. Association for Computational Linguistics, Stroudsburg, PA, USA, 893-899.

Bernd Kiefer, Hans-Ulrich Krieger, John Carroll, and Rob Malouf. 1999. A Bag of Useful Techniques for Efficient and robust Parsing. In Proceedings of the 37th annual meeting of the Association for Computational Linguistics. 473-480

Robert Malouf, John Carroll, and Ann Copestake. 2000. Robert Malouf, John Carroll, and Ann Copestake. 2000. Efficient feature structure operations without compilation. Natural Language Engineering, 1(1):1-18.

Stephan Oepen and John Carroll. 2000. Parser engineering and performance profiling. Natural Language Engingeering 6, 1 (March 2000), 81-97.

Stephan Oepen, Kristina Toutanova, Stuart Shieber, Christopher Manning, Dan Flickinger, and Thorsten Brants. 2002. The LinGO Redwoods Treebank: Motivation and preliminary applications. in Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), pages 1253–7, Taipei, Taiwan.

Fernando C. N. Pereira. 1985. A structure-sharing representation for unification-based grammar formalisms. In Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics. Chicago, IL, 8-12 July 1985, pages 137-144.

Melanie Siegel and Emily M. Bender (2002): Efficient Deep Processing of Japanese. In Proceedings of the 3rd Workshop on Asian Language Resources and International Standardization. Coling 2002 Post-Conference Workshop. Taipei, Taiwan.

Hideto Tomabechi. 1991. Quasi-destructive graph unification. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, Berkeley, CA.

Hideto Tomabechi. 1992. Quasi-destructive graph unifications with structure-sharing. In Proceedings of the 15th International Conference on Computational Linguistics (COLING-92), Nantes, France.

Hideto Tomabechi. 1995. Design of efficient unification for natural language. Journal of Natural Language Processing, 2(2):23-58.

Marcel P. van Lohuizen. 1999. Parallel processing of natural language parsers. In PARCO '99.

Marcel P. van Lohuizen. 2000. Exploiting parallelism in unification-based parsing. In Proceedings of the Sixth International Workshop on Parsing Technologies (IWPT 2000), Trento, Italy.

Marcel P. van Lohuizen. 2000. Memory-efficient and Thread-safe Quasi-Destructive Graph Unification. In Proceedings of the 38th Meeting of the Association for Computational Linguistics, Hong Kong, China, 2000.

Marcel P. van Lohuizen. 2001. A generic approach to parallel chart parsing with an application to LinGO. In Proceedings of the 39th Meeting of the Association for Computational Linguistics, Toulouse, France.

David A. Wroblewski. 1987. Nondestructive graph unification. In Proceedings of the 6th National Conference on Artificial Intelligence (AAAI-87), 582-589. Morgan Kaufmann.

Past Images

http://www.agree-grammar.com/webshare/20111125-chart-debugger.png http://www.agree-grammar.com/webshare/20111125-chart-debugger-2up.png

http://www.agree-grammar.com/webshare/20110214.png

http://www.agree-grammar.com/webshare/20110131.png

http://www.agree-grammar.com/webshare/20101019.png

http://www.agree-grammar.com/webshare/20100918.png

AgreeTop (last edited 2014-04-21 18:53:31 by GlennSlayden)

(The DELPH-IN infrastructure is hosted at the University of Oslo)