<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1029">
  <Title>Optimizing Typed Feature Structure Grammar Parsing through Non-Statistical Indexing</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Developing efficient all-paths parsers has been a long-standing goal of research in computational linguistics. One class of grammars still in need of parsing-time improvements is that of typed feature structure grammars (TFSGs). Simpler formalisms such as context-free grammars (CFGs) also suffer from slow all-paths parsing when the size of the grammar increases significantly. TFSGs, however, generally have fewer rules than large-scale CFGs; they become slow because of the complex structures used to describe their grammatical categories. In Head-driven Phrase Structure Grammars (HPSGs; Pollard and Sag, 1994), a single category description can contain hundreds of feature values. This has been a barrier to transferring techniques that are successful for CFGs to TFSG parsing.</Paragraph>
    <Paragraph position="1"> For TFSG chart parsers, one of the most time-consuming operations is the retrieval of categories from the chart during rule completion (closing of constituents in the chart under a grammar rule).</Paragraph>
    <Paragraph position="2"> Looking in the chart for an edge that matches a daughter is accomplished by attempting unification with the edges stored in the chart, which results in many failed unifications. The large and complex structure of TFS descriptions (Carpenter, 1992) makes unification slow, which in turn slows parsing. Thus, unifications that are bound to fail must be avoided during retrieval from the chart.</Paragraph>
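The cost of retrieval by trial unification can be illustrated with a minimal Python sketch. It rests on deliberately simplified, hypothetical assumptions: a small tree-shaped type hierarchy (`PARENT`), and feature structures flattened into dictionaries from path strings to types, with no reentrancy and no appropriateness conditions. It is not the parser evaluated in this paper; it only shows that every non-matching edge still costs a unification attempt.

```python
# Toy model (hypothetical): a tree-shaped type hierarchy and feature
# structures flattened to {path: type} dicts, without reentrancy.
PARENT = {
    "bot": None,
    "cat": "bot", "noun": "cat", "verb": "cat",
    "case": "bot", "nom": "case", "acc": "case",
}


def ancestors(t):
    """Return t together with all of its supertypes."""
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENT[t]
    return chain


def unify_types(a, b):
    """In a tree hierarchy, two types unify iff one subsumes the other."""
    if a in ancestors(b):
        return b
    if b in ancestors(a):
        return a
    return None


def unify(fs1, fs2):
    """Unify two flattened feature structures; return None on failure."""
    result = dict(fs1)
    for path, t in fs2.items():
        if path in result:
            unified = unify_types(result[path], t)
            if unified is None:
                return None
            result[path] = unified
        else:
            result[path] = t
    return result


def retrieve_naive(daughter, chart):
    """Attempt full unification of a daughter against every chart edge."""
    matches, failures = [], 0
    for edge in chart:
        result = unify(daughter, edge)
        if result is None:
            failures += 1
        else:
            matches.append(result)
    return matches, failures


chart = [
    {"HEAD": "noun", "HEAD.CASE": "nom"},
    {"HEAD": "noun", "HEAD.CASE": "acc"},
    {"HEAD": "verb"},
]
daughter = {"HEAD": "noun", "HEAD.CASE": "acc"}
matches, failures = retrieve_naive(daughter, chart)
print(len(matches), failures)  # 1 2
```

Two of the three edges are examined only to fail; in a real TFSG, each such attempt may traverse a category with hundreds of feature values.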
    <Paragraph position="3"> To our knowledge, only four methods have been proposed for improving the retrieval component of TFSG parsing. The first (Penn and Munteanu, 2003) addresses only the cost of copying large categories, and was found to reduce parsing times by an average of 25% on a large-scale TFSG (MERGE).</Paragraph>
    <Paragraph position="4"> The second, a statistical method known as quick-check (Malouf et al., 2000), determines the paths that are likely to cause unification failure by profiling a large sequence of parses over representative input, and then filters unifications at run-time by first testing these paths for type consistency.</Paragraph>
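The quick-check filter can be sketched in the same toy model (hypothetical `PARENT` hierarchy, flattened {path: type} structures). This is an illustrative reconstruction rather than Malouf et al.'s code: `QC_PATHS` stands in for the profiled list of failure-prone paths and is simply assumed here, and `quick_check` compares only the types at those paths before any full unification is attempted.

```python
# Toy model (hypothetical): tree-shaped types, flattened {path: type} dicts.
PARENT = {
    "bot": None,
    "cat": "bot", "noun": "cat", "verb": "cat",
    "case": "bot", "nom": "case", "acc": "case",
}

# Hypothetical stand-in for the paths that profiling found most failure-prone.
QC_PATHS = ["HEAD", "HEAD.CASE"]


def ancestors(t):
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENT[t]
    return chain


def unify_types(a, b):
    """Two types unify iff one subsumes the other (tree hierarchy)."""
    if a in ancestors(b):
        return b
    if b in ancestors(a):
        return a
    return None


def unify(fs1, fs2):
    """Full unification of flattened structures; None on failure."""
    result = dict(fs1)
    for path, t in fs2.items():
        unified = unify_types(result[path], t) if path in result else t
        if unified is None:
            return None
        result[path] = unified
    return result


def quick_check(fs1, fs2):
    """Cheap pre-filter: test type consistency only at QC_PATHS."""
    for path in QC_PATHS:
        if path in fs1 and path in fs2:
            if unify_types(fs1[path], fs2[path]) is None:
                return False
    return True


def retrieve_with_qc(daughter, chart):
    matches, full_unifications = [], 0
    for edge in chart:
        if not quick_check(daughter, edge):
            continue  # rejected without a full unification attempt
        full_unifications += 1
        result = unify(daughter, edge)
        if result is not None:
            matches.append(result)
    return matches, full_unifications


chart = [
    {"HEAD": "noun", "HEAD.CASE": "nom"},
    {"HEAD": "noun", "HEAD.CASE": "acc"},
    {"HEAD": "verb"},
]
daughter = {"HEAD": "noun", "HEAD.CASE": "acc"}
matches, full_unifications = retrieve_with_qc(daughter, chart)
print(len(matches), full_unifications)  # 1 1
```

A type clash at any path implies that full unification fails, so the filter never discards a successful match; it only saves work when the chosen paths are the ones that actually fail, which is what the profiling step is for.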
    <Paragraph position="5"> Quick-check was reported to improve parse times by up to 50% on the English Resource Grammar (ERG; Flickinger, 1999). The third method (Penn, 1999b) is a similar but more conservative approach that uses the profile to re-order sister feature values in the internal data structure. This was found to improve parse times on the ALE HPSG by up to 33%.</Paragraph>
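The principle behind feature re-ordering can be illustrated roughly as follows, again under toy assumptions: flat {feature: type} structures, a hypothetical `PARENT` hierarchy, and invented profile counts. Because a feature-by-feature check stops at the first clash, putting the features that failed most often during profiling first tends to detect failures after fewer comparisons. This is not Penn's implementation, which re-orders values inside the parser's internal data structure.

```python
# Toy model (hypothetical): flat {feature: type} structures.
PARENT = {
    "bot": None,
    "cat": "bot", "noun": "cat", "verb": "cat",
    "np": "bot",
    "agr": "bot", "sg": "agr", "pl": "agr",
}


def ancestors(t):
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENT[t]
    return chain


def compatible(a, b):
    """Two types unify iff one subsumes the other (tree hierarchy)."""
    return a in ancestors(b) or b in ancestors(a)


def check_in_order(fs1, fs2, order):
    """Compare shared features in `order`, stopping at the first clash.

    Returns (succeeded, number_of_comparisons).
    """
    comparisons = 0
    for feat in order:
        if feat in fs1 and feat in fs2:
            comparisons += 1
            if not compatible(fs1[feat], fs2[feat]):
                return False, comparisons
    return True, comparisons


# Invented profile: how often each feature caused a unification failure.
failure_counts = {"SUBJ": 3, "AGR": 40, "HEAD": 120}
default_order = ["SUBJ", "AGR", "HEAD"]
profiled_order = sorted(default_order, key=failure_counts.get, reverse=True)

fs1 = {"SUBJ": "np", "AGR": "sg", "HEAD": "noun"}
fs2 = {"SUBJ": "np", "AGR": "sg", "HEAD": "verb"}
print(check_in_order(fs1, fs2, default_order))   # (False, 3)
print(check_in_order(fs1, fs2, profiled_order))  # (False, 1)
```

With the profiled order, the clash on `HEAD` is found after one comparison instead of three; the outcome of unification is unchanged, only the amount of work before failure differs.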
    <Paragraph position="6"> The problem with these statistical methods is that the improvements in parsing times may not justify the time spent on profiling, particularly during grammar development. The static analysis method introduced here does not use profiling, although it does not preclude it either. Indeed, an evaluation of statistical methods would be more meaningful if measured on top of an adequate set of non-statistical optimizations. Although quick-check is thought to produce parsing-time improvements, its evaluation used a parser with only a superficial static analysis of chart indexing.</Paragraph>
    <Paragraph position="7"> That analysis, rule filtering (Kiefer et al., 1999), reduces parse times by filtering out mother-daughter unifications that can be determined at compile time to fail. True indexing, by contrast, organizes the data (in this case, chart edges) so as to avoid unnecessary retrievals altogether, does not require the operations it performs to be repeated once full unification is deemed necessary, and makes it easy to add information extracted from further static analysis of the grammar rules while keeping the same indexing strategy. This flexibility is one of the reasons for the successful use of indexing in databases (Elmasri and Navathe, 2000) and automated reasoning (Ramakrishnan et al., 2001).</Paragraph>
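Indexing, unlike filtering, reorganizes the chart itself. A minimal sketch of the general idea, assuming a single hypothetical index path (`INDEX_PATH`) and a toy tree-shaped type hierarchy: edges are stored in buckets keyed by their type at that path, and a daughter retrieves only from buckets whose type is compatible with its own. This is a generic illustration of type-based indexing, not the indexing strategy developed in this paper.

```python
from collections import defaultdict

# Toy model (hypothetical): tree-shaped types, flattened {path: type} edges.
PARENT = {
    "bot": None,
    "cat": "bot", "noun": "cat", "verb": "cat",
    "case": "bot", "nom": "case", "acc": "case",
}

INDEX_PATH = "HEAD"  # hypothetical path chosen by some static analysis


def ancestors(t):
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENT[t]
    return chain


def compatible(a, b):
    """Two types unify iff one subsumes the other (tree hierarchy)."""
    return a in ancestors(b) or b in ancestors(a)


class IndexedChart:
    """Chart whose edges are bucketed by their type at INDEX_PATH."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def add(self, edge):
        # An edge without a value at INDEX_PATH is filed under the most
        # general type, so it remains a candidate for every daughter.
        self.buckets[edge.get(INDEX_PATH, "bot")].append(edge)

    def candidates(self, daughter):
        """Yield edges from compatible buckets only; edges in incompatible
        buckets are never touched (only bucket keys are scanned)."""
        key = daughter.get(INDEX_PATH, "bot")
        for bucket_type, edges in self.buckets.items():
            if compatible(key, bucket_type):
                yield from edges


chart = IndexedChart()
for edge in [
    {"HEAD": "noun", "HEAD.CASE": "nom"},
    {"HEAD": "noun", "HEAD.CASE": "acc"},
    {"HEAD": "verb"},
    {"HEAD": "cat"},
]:
    chart.add(edge)

daughter = {"HEAD": "noun", "HEAD.CASE": "acc"}
cands = list(chart.candidates(daughter))
print(len(cands))  # 3: both noun edges and the cat edge
```

The `verb` edge is never retrieved. The surviving candidates still go through full unification, and in this toy setting a further index path (for example on `HEAD.CASE`) could prune the `nom` edge as well.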
    <Paragraph position="8"> In this paper, we present a general scheme for indexing TFS categories during parsing (Section 3).</Paragraph>
    <Paragraph position="9"> We then present a specific method for statically analyzing TFSGs based on the type signature and the structure of category descriptions in the grammar rules, and prove its soundness and completeness (Section 4.2.1). We describe a specific indexing strategy based on this analysis (Section 4), and evaluate it on two large-scale TFSGs (Section 5). The result is a purely non-statistical method whose improvements are competitive with those gained by statistical optimizations, and which remains compatible with further statistical improvements.</Paragraph>
  </Section>
</Paper>