<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1052">
  <Title>Charting the Depths of Robust Speech Parsing</Title>
  <Section position="4" start_page="405" end_page="406" type="intro">
    <SectionTitle>
2 Preliminaries
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="405" end_page="406" type="sub_section">
      <SectionTitle>
2.1 The Chart Parser
</SectionTitle>
      <Paragraph position="0"> The parser used in the system is a bottom-up chart parser. Since the grammar is a pure unification-based grammar, there is no context-free backbone, and the chart edges are labelled with typed feature structures. At the moment, there is no local ambiguity packing of chart edges. The worst-case complexity of parsing is therefore potentially exponential, but since the parser employs a best-first strategy, exponential behavior is rarely found in practice.</Paragraph>
      <Paragraph position="1"> The parser provides a flexible priority system for guiding the parsing process, using parsing tasks on an agenda. A parsing task represents the combination of a passive chart edge and an active chart edge or a rule. When such a combination succeeds, new tasks are generated and for each new task, a priority is assigned.</Paragraph>
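      The agenda mechanism described above can be pictured with a small priority-queue loop. This is only a minimal sketch, not the parser's actual implementation: the function name run_agenda, the execute callback, and the max_steps budget (a stand-in for a parsing time bound) are all hypothetical names introduced for illustration, and tasks are treated as opaque values rather than real edge combinations.

```python
import heapq
import itertools


def run_agenda(initial_tasks, execute, max_steps):
    """Process tasks best-first until the agenda is empty or the budget is spent.

    initial_tasks: iterable of (priority, task) pairs; higher priority runs first.
    execute: hypothetical callback; given a task, it returns the (priority, task)
             pairs generated when the combination succeeds (empty if it fails).
    max_steps: assumed stand-in for a parsing time bound.
    """
    counter = itertools.count()  # tie-breaker, so tasks themselves are never compared
    agenda = []
    for priority, task in initial_tasks:
        # heapq is a min-heap, so priorities are negated to pop the best task first.
        heapq.heappush(agenda, (-priority, next(counter), task))

    processed = []
    for _ in range(max_steps):
        if not agenda:
            break
        _, _, task = heapq.heappop(agenda)
        processed.append(task)
        # Every newly generated task gets its own priority on the agenda.
        for priority, new_task in execute(task):
            heapq.heappush(agenda, (-priority, next(counter), new_task))
    return processed
```

      With a small budget, only the most promising tasks are executed, which is the sense in which a priority system can still yield useful partial results when the search space is not exhausted.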
      <Paragraph position="2"> This priority system helps to obtain good partial results even when the search space cannot be fully explored because of parsing time restrictions. A higher time bound would allow either the processing of more paths of the word hypothesis graph (WHG) or a more elaborate analysis of the given input, both of which may lead to better results. The decision of when to switch to the next-best path of a given WHG depends on the length of the input and on the time already used. After the parsing of one path is finished, the passive edges of the chart form a directed acyclic graph, which is used directly as input to compute the best partial analyses.</Paragraph>
      <Paragraph position="3"> We note here that the parser processes the n-best paths of a WHG fully incrementally. That is, when the analysis of a new input path begins, only those input items that were not part of a previously treated path are added to the chart. Everything else that has been computed up to that point remains in the chart and can be used to process the new input without being recomputed.</Paragraph>
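      The incremental treatment of n-best paths can be illustrated as follows. This is a hedged sketch under simplifying assumptions: input items are modelled as hashable (start, end, word) tuples, and the function name new_items_per_path is hypothetical. It only shows which items would be new for each path; the chart itself and the reuse of existing edges are not modelled.

```python
def new_items_per_path(paths):
    """For each path, list the input items not seen in any earlier path.

    paths: list of paths, each a list of hashable items, here assumed to be
           (start, end, word) tuples taken from a word hypothesis graph.
    """
    seen = set()  # items already added to the (imagined) chart
    result = []
    for path in paths:
        fresh = []
        for item in path:
            if item not in seen:
                seen.add(item)
                fresh.append(item)
        result.append(fresh)
    return result
```

      For two paths that share their first word, only the differing second word would be added when the second path is processed; everything derived from the shared word stays available.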
    </Section>
    <Section position="2" start_page="406" end_page="406" type="sub_section">
      <SectionTitle>
2.2 The HPSG Grammars
</SectionTitle>
      <Paragraph position="0"> The grammars for English, German, and Japanese follow the paradigm of HPSG (Pollard and Sag, 1994), which is the most advanced unification-based grammatical theory based on typed feature structures. The fundamental concept is that of a sign, a structure incorporating information from all levels of linguistic analysis, such as phonology, morphology, syntax, and semantics. This structure makes all information simultaneously available and provides declarative interfaces between these levels. The grammars use Minimal Recursion Semantics (Copestake et al., 1996) as the semantic representation formalism, which allows us to deal with ambiguity by underspecification.</Paragraph>
      <Paragraph position="1"> To give an impression of the size of the grammars, we present the numbers for the German grammar. It consists of 2,389 types, 76 rule schemata, and 4,284 stems, with an average of six entries per stem. Morphological information is computed online, which further increases the lexical ambiguity.</Paragraph>
    </Section>
    <Section position="3" start_page="406" end_page="406" type="sub_section">
      <SectionTitle>
2.3 Partial Analyses and the Syntax-Semantics Interface
</SectionTitle>
      <Paragraph position="0"> Our architecture requires that the linguistic analysis module be capable of delivering analyses not just of complete utterances, but also of phrases and even of lexical items, in the special interface format of VITs (VERBMOBIL Interface Terms) (Bos et al., 1998). There are three considerations which the interface has to take into account: 1. Only maximal projections, i.e., complete phrases, are candidates for robust processing. Prepositional and noun phrases, for example, qualify. On the other hand, this approach leaves gaps in the coverage of the input string, as not every word needs to be dominated by a maximal projection. In particular, verbal projections below the sentential level usually are incomplete phrases. The use of intermediate, incomplete projections is avoided for two reasons: intermediate projections are highly grammar- and language-specific, and there are too many of them.</Paragraph>
      <Paragraph position="1"> 2. Phrases must be distinguished from elliptical utterances. A major difference is that elliptical utterances express a speech act. For example, a prepositional phrase can be a complete utterance expressing an answer to a question (On Monday.) or a question itself (On Monday?). If the phrase occurs in a sentence, it is not associated with a speech act of its own. This distinction is dealt with in the grammars by specifying special types for these complete utterances, phrases, and lexical items.</Paragraph>
      <Paragraph position="2"> 3. For robust processing, the interface must export a certain amount of information from syntax and morphology together with the semantics of the phrase. In addition, it is necessary to represent semantically empty parts of speech, e.g., separable verb prefixes in German.</Paragraph>
    </Section>
  </Section>
</Paper>