<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1055">
  <Title>Skeletons in the parser: Using a shallow parser to improve deep parsing</Title>
  <Section position="4" start_page="1" end_page="3" type="intro">
    <SectionTitle>
2 Background
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1" end_page="2" type="sub_section">
      <SectionTitle>
2.1 The Monroe Corpus
</SectionTitle>
      <Paragraph position="0"> Our data consists of transcribed dialogs between two humans engaged in carefully designed tasks in simulated emergency management situations in Monroe County, New York (Stent, 2001). The scenario was designed to encourage collaborative problem solving and mixed-initiative interaction involving complex planning and coordination between the participants, so the communication is very spontaneous and interactive. The corpus is split into utterances, and the speech repairs are marked and automatically removed for these tests. Utterances that are incomplete or uninterpretable (by humans) are also marked and eliminated from the corpus. The remaining utterances form the set on which we have been developing and testing the grammar. Figure 1 shows an excerpt from one of the dialogs.</Paragraph>
      <Paragraph position="1"> The entire Monroe corpus consists of 20 dialogs ranging in length from about 7 to 40 minutes. Our tests here focus on a subset of five dialogs that have been used to drive the grammar development: s2, s4, s12, s16 and s17 (henceforth dialogs 1, 2, 3, 4 and 5), constituting 1556 parseable utterances. (Parseable utterances exclude utterances that are incomplete or ungrammatical; see Tetreault et al., 2004.)</Paragraph>
    </Section>
    <Section position="2" start_page="2" end_page="3" type="sub_section">
      <SectionTitle>
2.2 The TRIPS Parser
</SectionTitle>
      <Paragraph position="0"> The deep parser we used is a robust parsing system developed within the TRIPS system over the past five years, with its development driven by five different domains.</Paragraph>
      <Paragraph position="1"> The grammatical formalism and parsing framework is essentially a lexicalized version of the formalism described in (Allen, 1995). It is a GPSG/HPSG (Pollard and Sag, 1994) inspired unification grammar of approximately 1300 rules with a rich model of semantic features (Dzikovska, 2004). The parser is an agenda-driven best-first chart parser that supports experimentation with different parsing strategies, although in practice we almost always use a straightforward bi-directional bottom-up algorithm.</Paragraph>
      <Paragraph position="2"> As an illustration of its flexibility, the modifications needed for this experiment amounted to adding a single function of ten lines of code. The grammar used for these experiments is the same TRIPS grammar used in all our applications, and the rules have hand-tuned weights. The weights of newly derived constituents are computed exactly as in a PCFG algorithm, the only difference being that the weights do not necessarily sum to 1 and so are not probabilities. The TRIPS parser does not use a maximum entropy model (cf. the XLE system (Kaplan et al., 2004)) because there is insufficient training data and it is as yet unclear how such a model would perform at the detailed level of semantic representation produced by the TRIPS parser (see Figure 2 and discussion below).</Paragraph>
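      <Paragraph> The weight computation and best-first ordering described above can be illustrated with a minimal sketch. This is not the TRIPS implementation: the rule table, the weight values, and the names RULE_WEIGHTS, constituent_weight and Agenda are hypothetical, and a binary heap is only one plausible way to realize a best-first agenda. The sketch assumes the usual PCFG-style combination, in which a derived constituent's weight is the rule weight multiplied by the weights of its children.

```python
import heapq
from math import prod

# Hypothetical hand-tuned rule weights. They are scores, not probabilities,
# so the weights for a given left-hand side need not sum to 1.
RULE_WEIGHTS = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DET", "N")): 0.98,
    ("VP", ("V", "NP")): 0.95,
}


def constituent_weight(lhs, child_categories, child_weights):
    """PCFG-style combination: rule weight times the product of child weights."""
    rule_weight = RULE_WEIGHTS[(lhs, tuple(child_categories))]
    return rule_weight * prod(child_weights)


class Agenda:
    """Best-first agenda: the highest-weight entry is popped first."""

    def __init__(self):
        self._heap = []
        self._count = 0  # tie-breaker: equal weights pop in insertion order

    def push(self, weight, label):
        # heapq is a min-heap, so the weight is stored negated.
        heapq.heappush(self._heap, (-weight, self._count, label))
        self._count += 1

    def pop(self):
        neg_weight, _, label = heapq.heappop(self._heap)
        return -neg_weight, label


# Build an NP from two lexical entries, then a VP that contains that NP.
np_weight = constituent_weight("NP", ["DET", "N"], [1.0, 0.9])
vp_weight = constituent_weight("VP", ["V", "NP"], [1.0, np_weight])

agenda = Agenda()
agenda.push(vp_weight, "VP")
agenda.push(np_weight, "NP")
best_weight, best_label = agenda.pop()  # the NP outranks the VP built on it
```

Because every factor in this toy table is at most 1, a constituent never outranks the constituents it is built from, so the agenda naturally explores smaller, higher-scoring constituents first.</Paragraph>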
      <Paragraph position="3"> The rules, lexicon, and semantic ontology are independent of any specific domain but tailored to human-computer practical dialog. The grammar is fairly extensive in coverage (and still growing), and has quite good coverage of a corpus of human-human dialogs in the Monroe domain, an emergency management domain (Swift et al., 2004). (Note: We have a version of the grammar that uses a non-lexicalized PCFG model, but it was not used here as it does not perform as well. Thus we are using our best model, which makes it the most challenging setting in which to show improvement.)</Paragraph>
      <Paragraph position="4"> The system is in active use in our spoken dialog understanding work in several different domains. It operates in close to real-time for short utterances, but degrades in performance as utterances become longer than 8 or 9 words. As one way to control ambiguity, the grammar makes use of selectional restrictions. Our semantic model utilizes two related mechanisms: first, an ontology of the predicates that are used to create the logical forms, and second, a vector of semantic features associated with these predicates that are used for selectional restrictions. The grammar computes a flattened and unscoped logical form using reified events (see also (Copestake et al., 1997) for a flat semantic representation), with many of its word senses derived from FrameNet frames (Johnson and Fillmore, 2000) and semantic roles (Fillmore, 1968). An example of the logical form representation produced by the parser is shown in Figure 2, as both a dependency graph (upper) and the actual parser output (lower). [Figure caption fragment: three hundred pounds of the oranges were put in the truck.]</Paragraph>
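      <Paragraph> The use of semantic feature vectors for selectional restrictions can be sketched as follows. Everything in the sketch is an assumption made for illustration: the feature names and values, the predicate and role names, and the functions satisfies and role_allowed are hypothetical and are not taken from the TRIPS ontology or lexicon. The sketch only shows the general idea of rejecting a role filler whose features conflict with what the predicate requires.

```python
# Hypothetical semantic feature vectors attached to word senses.
FEATURES = {
    "ORANGE": {"type": "phys-obj", "mobility": "movable"},
    "TRUCK": {"type": "phys-obj", "mobility": "self-moving"},
    "IDEA": {"type": "abstract"},
}

# Hypothetical restrictions that a predicate places on its role fillers.
RESTRICTIONS = {
    "PUT": {
        "theme": {"type": "phys-obj", "mobility": "movable"},
        "goal": {"type": "phys-obj"},
    },
}


def satisfies(features, restriction):
    """True if every restricted feature is present with the required value."""
    return all(features.get(name) == value for name, value in restriction.items())


def role_allowed(predicate, role, filler):
    """Check one candidate role filler against the predicate's restriction."""
    return satisfies(FEATURES[filler], RESTRICTIONS[predicate][role])


checks = [
    role_allowed("PUT", "theme", "ORANGE"),  # movable physical object: accepted
    role_allowed("PUT", "goal", "TRUCK"),    # physical object: accepted
    role_allowed("PUT", "theme", "IDEA"),    # abstract entity: rejected
]
```

In a parser, a failed check of this kind would let the corresponding analysis be discarded or down-weighted early, which is how selectional restrictions help control ambiguity.</Paragraph>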
      <Paragraph position="5"> Term constructors appearing at the leftmost edge of terms in the parser output are F (relation), A (indefinite entity), THE (definite entity) and QUANTITY-TERM (numeric expressions).</Paragraph>
    </Section>
  </Section>
</Paper>