<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1502"> <Title>The Grammar Matrix: An Open-Source Starter-Kit for the Rapid Development of Cross-Linguistically Consistent Broad-Coverage Precision Grammars</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The past decade has seen the development of wide-coverage implemented grammars that encode deep linguistic analyses of several languages in a range of frameworks, including Head-Driven Phrase Structure Grammar (HPSG), Lexical-Functional Grammar (LFG), and Lexicalized Tree Adjoining Grammar (LTAG). In HPSG, the most extensive grammars are those of English (Flickinger, 2000), German (Müller &amp; Kasper, 2000), and Japanese (Siegel, 2000; Siegel &amp; Bender, 2002).</Paragraph> <Paragraph position="1"> Although these grammars are couched in the same general framework, and some are even written in the same formalism and are therefore compatible with the same parsing and generation software, they were developed more or less independently of one another. Each represents between 5 and 15 person-years of research effort and comprises 35,000 to 70,000 lines of code. Unfortunately, most of that research is undocumented: the accumulated analyses, best practices for grammar engineering, and tricks of the trade are available only through painstaking inspection of the grammars and/or consultation with their authors. This lack of documentation holds across frameworks, with certain notable exceptions, including Alshawi (1992), Müller (1999), and Butt, King, Niño, &amp; Segond (1999).</Paragraph> <Paragraph position="2"> Grammars that have been under development for many years tend to be very difficult to mine for information, as they contain layer upon layer of interacting analyses and of decisions made in light of various intermediate stages of the grammar.
As a result, when embarking on the creation of a new grammar for another language, it can seem almost easier to start from scratch than to model the new grammar on an existing one. This is unfortunate: being able to leverage the knowledge and infrastructure embedded in existing grammars would greatly accelerate the development of new ones. At the same time, these grammars represent an untapped resource for the bottom-up exploration of language universals.</Paragraph> <Paragraph position="3"> As part of the LinGO consortium's multilingual grammar engineering effort, we are developing a 'grammar matrix', or starter-kit, which distills the wisdom of existing grammars and codifies and documents it in a form that can serve as the basis for new grammars.</Paragraph> <Paragraph position="4"> In the following sections, we outline the inventory of a first, preliminary version of the grammar matrix; discuss, by means of a detailed example, the interaction of basic construction types and semantic composition in unification grammars; and consider both the extensions to the core inventory that we foresee and an evaluation methodology for the matrix proper.</Paragraph> </Section> </Paper>