<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0304">
  <Title>Incremental Parsing with Reference Interaction</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Instantiating the Architecture
</SectionTitle>
    <Paragraph position="0"> Working in the context of TRIPS, an existing task-oriented dialogue system, we have modified the existing parser and reference resolution modules so that they communicate incrementally with each other. This models the early incorporation of reference resolution information seen in humans (Chambers et al., 1999; Allopenna et al., 1998), and allows reference resolution information to affect parsing decisions.</Paragraph>
    <Paragraph position="1"> For example, in &amp;quot;Put the apple in the box in the corner&amp;quot; there is an attachment ambiguity. Reference resolution can determine the number of matches for the noun phrase &amp;quot;the apple&amp;quot; incrementally; if there is a single match, the parser would expect this to be a complete NP, and prefer the reading where the box is in the corner. If reference returns multiple matches for &amp;quot;the apple&amp;quot;, the parser would expect disambiguating information, and prefer a reading where additional information about the apple is provided: in this case, the NP &amp;quot;the apple in the box&amp;quot;. With solid feedback from reference, it should be possible to remove some of the ambiguity inherent in the search process within the parser. This will simultaneously guide the search to the most likely region of the search space, improving accuracy, and delay the search of unlikely regions, improving efficiency. Of course, this comes at the cost of some communication overhead and additional reference resolution. Ideally, the overall improvement in the parser's search space would be enough to cover the additional incremental operation costs of other modules.</Paragraph>
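The attachment preference described above can be sketched as a small decision rule. This is an illustrative sketch, not the TRIPS implementation; the function names and the reduction of the reference signal to a bare match count are assumptions for exposition.

```python
# Illustrative sketch (not the TRIPS code): how a count of referent
# matches for "the apple" could steer the PP-attachment decision in
# "Put the apple in the box in the corner".

def prefer_np_extension(num_matches: int) -> bool:
    """True if the parser should prefer extending the NP with the
    following PP (i.e., the referent is not yet unique)."""
    # A single match suggests the bare NP is already complete; multiple
    # matches suggest the speaker is supplying disambiguating material.
    return num_matches > 1

def preferred_reading(num_matches_for_the_apple: int) -> str:
    if prefer_np_extension(num_matches_for_the_apple):
        return "[the apple in the box] [in the corner]"   # PP restricts the NP
    return "[the apple] [in the box in the corner]"       # PP is the destination
```

In a real chart parser this preference would be expressed as a probability adjustment rather than a hard choice, as Section 3.3 describes.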
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 An Incremental Parser
</SectionTitle>
      <Paragraph position="0"> The pre-existing parser in the dialogue system was a pure bottom-up chart parser with a hand-built grammar suited for parsing task-oriented dialogue. The grammar consisted of a context-free backbone with a set of associated features and semantic restrictions, including agreement, hard subcategorization constraints, and soft selectional restriction preferences. The parser has been modified so that whenever a constituent is built, it can be sent forward to the Mediator, allowing for the possibility of feedback. The architecture and experiments described in this paper were performed in a synchronous mode, but the parser can also operate in an incrementally asynchronous mode, where it continues to build the chart in parallel with other modules' operations; probability adjustments to the chart then cascade to dependent constituents.</Paragraph>
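The modification described above, sending each newly built constituent forward for possible feedback, can be sketched as a callback hook on the chart. The class and method names here are assumptions, not the actual TRIPS parser code; the sketch shows only the synchronous mode.

```python
# Minimal sketch (assumed names) of the hook added to the bottom-up
# chart parser: every newly built constituent is passed to a Mediator
# callback, which may return an adjusted probability before the
# constituent enters the chart.

class Constituent:
    def __init__(self, label, span, prob):
        self.label, self.span, self.prob = label, span, prob

class ChartParser:
    def __init__(self, mediator_callback=None):
        self.chart = []
        self.mediator_callback = mediator_callback  # feedback channel

    def add_constituent(self, label, span, prob):
        c = Constituent(label, span, prob)
        if self.mediator_callback is not None:      # synchronous feedback
            c.prob = self.mediator_callback(c)
        self.chart.append(c)
        return c
```

In the asynchronous mode the paper mentions, the callback would instead run in parallel and its probability adjustments would cascade to constituents already built on top of the adjusted one.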
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Interaction with Reference
</SectionTitle>
      <Paragraph position="0"> When the parser builds a potential referring expression (e.g. any NP), it is immediately passed on to the Advisor, the reference resolution module described in Tetreault et al. (2004), modified for incremental interaction. This module then determines all possible discourse referents, providing the parser with a ranked classification based on the salience of the referents and the (incremental) syntactic environment. The reference module keeps a dynamically updated list of currently salient discourse entities against which incoming incrementally constructed NP constituents are matched. Before any utterances are processed, the module loads a static database of relevant place names in the domain; all other possible referents are discourse entities which have been spoken of during the course of the dialogue.</Paragraph>
      <Paragraph position="1"> For efficiency, the dynamic portion of the context list is limited to the ten most recent contentful utterances; human-annotated antecedent data for this corpus shows that 99% of all pronoun antecedents fall within this threshold. After each sentence is fully parsed, the context list is updated with new discourse entities introduced in the utterance; ideally, these context updates would also be incremental, but this feature was omitted in the current version for simplicity.</Paragraph>
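The context structure just described, a static place-name database plus a ten-utterance sliding window, can be sketched as follows. The class and attribute names are assumptions; the actual module's data structures are not specified in the paper.

```python
from collections import deque

# Sketch (assumed structure) of the discourse context: a static database
# of place names plus a window over the ten most recent contentful
# utterances, updated only after a sentence is fully parsed.

class DiscourseContext:
    WINDOW = 10  # covers 99% of pronoun antecedents in this corpus

    def __init__(self, place_names):
        self.place_names = set(place_names)        # static database
        self.recent = deque(maxlen=self.WINDOW)    # one entry per utterance

    def update(self, entities):
        """Called after a sentence is fully parsed (not yet incremental)."""
        if entities:                               # only contentful utterances
            self.recent.append(list(entities))

    def candidates(self):
        """All entities an incoming NP constituent is matched against."""
        dynamic = [e for utt in self.recent for e in utt]
        return dynamic + sorted(self.place_names)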
      <Paragraph position="2"> The matching process is based on that described by Byron (2000), and differs from that of many other reference modules in that every entity and NP-constituent has a (possibly underspecified) semantic feature vector, and it is both the logical and semantic forms which determine successful matchings. Adding semantic information increases the accuracy of the reference resolution from 44% to 58% (Tetreault and Allen, 2004), and consequently improves the feedback provided to the parser.</Paragraph>
      <Paragraph position="3"> The Mediator receives the set of all possible referents, including the semantic content of the referent and a classification of whether the referent is the single salient entity in focus, has previously been mentioned, or is a relevant place name.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Mediator
</SectionTitle>
      <Paragraph position="0"> The Mediator interprets the information received from reference and determines how the parser's chart should be modified. If the NP matches nothing in the discourse context, no match is returned; otherwise each referent is annotated with its type and discourse distance, and this set is run through a classifier to reduce it to a single tag. The resulting tag is the reference resolution tag, or R. The NP constituents are also classified by definiteness and number, giving an NP tag N.</Paragraph>
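The two classifications described above can be sketched as follows. The specific tag inventories are assumptions; the paper names the R tag (from the annotated referent set) and the N tag (from definiteness and number) but does not enumerate their values.

```python
# Sketch (assumed tag values) of the Mediator's two classifications:
# a reference tag R collapsed from the annotated referent set, and an
# NP tag N from definiteness and number.

def reference_tag(referents):
    """Collapse a set of annotated referents to a single R tag."""
    if not referents:
        return "no-match"
    if len(referents) == 1:
        return referents[0]["type"]   # e.g. "focus", "mentioned", "place"
    return "multiple"

def np_tag(definite: bool, singular: bool) -> str:
    """Classify the NP constituent by definiteness and number."""
    return ("def" if definite else "indef") + "-" + ("sg" if singular else "pl")
```

The pair (R, N) then conditions the probability model trained for each classifier, as the next paragraph describes.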
      <Paragraph position="1"> For each classifier, we trained a probability model which calculated Pr, the probability that a noun phrase constituent c would be in the final parse, conditioned on R and N: Pr = p(c in final parse | R, N). This probability was then linearly combined with the parser's constituent probability Pp according to the equation P = λ Pr + (1 - λ) Pp, for various values of λ. Evaluation using held-out data suggested that a value of λ = 0.2 would be optimal. This style of feedback is an example of chart subversion, as it is a direct modification of constituent probabilities by the Mediator, defining a new probability distribution.</Paragraph>
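The interpolation itself is simple enough to transcribe directly; the function name is an assumption, but the formula and the λ = 0.2 value come from the paper.

```python
# Direct transcription of the Mediator's interpolation: the reference
# model's probability Pr = p(c in final parse | R, N) is linearly
# combined with the parser's constituent probability Pp.

def combined_probability(p_r: float, p_p: float, lam: float = 0.2) -> float:
    """P = lam * Pr + (1 - lam) * Pp; lam = 0.2 was estimated on held-out data."""
    return lam * p_r + (1.0 - lam) * p_p
```

With λ = 0.2 the parser's own estimate dominates, so reference feedback reweights the chart rather than overriding it.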
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> The Monroe domain (Tetreault et al., 2004; Stent, 2001) is a series of task-oriented dialogues between human participants set in a simulated rescue operation domain, where participants collaboratively plan responses to emergency calls. Dialogues were recorded, broken up into utterances, and then transcribed by hand, removing speech repairs from the parser input. These transcriptions served as input for all experiments reported below.</Paragraph>
    <Paragraph position="1"> A probabilistic grammar was trained from supervised data, assigning PCFG probabilities for the rule expansions in the CFG backbone of the handcrafted, semantically constrained grammar. The parser was run using this grammar, but without any incremental interaction whatsoever, in order to establish baseline accuracy and efficiency numbers.</Paragraph>
    <Paragraph position="2"> The corpus consists of six task-oriented dialogues; four were used for the PCFG training, one was held out to establish appropriate parameter values, and one was selected for testing. The held-out and test dialogues contain hand-checked gold standard parses.</Paragraph>
    <Paragraph position="3"> [Table 1 caption: (a) the parser without reference feedback, (b) an Oracle Advisor correctly determining the status of all NPs, (c) an Oracle Advisor correctly determining the status of definite singular NPs.]</Paragraph>
    <Paragraph position="4"> Under normal operation of the sequential dialogue system, the parser is run in best-first mode, providing only a single analysis to higher-level modules, and has a constituent construction limit in an attempt to simulate the demands of a real-time system. When the parser reaches the constituent limit, appropriate partial analyses are collected and forwarded to higher-level modules. These constraints were kept in place during our experiments, because they would be necessary under normal operation of the system. Thus, the inability to parse a sentence does not necessarily indicate a lack of coverage of the grammar, but rather a lack of efficiency in the parsing process.</Paragraph>
    <Paragraph position="5"> As can be seen in Table 1, the parser achieves a 94.6% labelled bracket precision, and a 71.1% labelled bracket recall. Note that only constituents of complete parses were checked against the gold standard, to avoid any bias introduced by the partial parse evaluation metric. Of the 290 gold standard utterances in the test data, 270 could be parsed, and 224 were parsed perfectly.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Oracle Evaluation
</SectionTitle>
      <Paragraph position="0"> We began with a feasibility study to determine how significant the effects of incremental advice on noun phrases could be in principle. The feedback from the reference module is designed to determine whether particular NPs are good or bad from a reference standpoint. We constructed a simple feedback oracle from supervised data which determined, for each NP, whether or not the final parse of the sentence contained an NP constituent which spanned the same input. Those NPs marked &amp;quot;good&amp;quot;, which did appear in the parse, were added to the chart as new constituents. NPs marked &amp;quot;bad&amp;quot; were added to the chart with a probability of zero. (In some sense, this style of feedback is an example of heuristic subversion, as it has the effect of keeping &amp;quot;good&amp;quot; analyses around while removing &amp;quot;bad&amp;quot; analyses from the search space. Technically, it is also chart subversion, as each hypothesis has its score multiplied by 1 or 0, depending on whether it is &amp;quot;good&amp;quot; or &amp;quot;bad&amp;quot;; in this degenerate case of all-or-nothing feedback, chart subversion and heuristic subversion are equivalent.) A second oracle evaluation performed this same task, but only provided feedback on definite singular NPs.</Paragraph>
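The oracle's all-or-nothing feedback can be sketched in a few lines. The function name and span representation are assumptions; the 1-or-0 multiplier is exactly the scheme the paper describes.

```python
# Sketch of the feedback oracle: an NP is "good" iff the gold-standard
# parse contains an NP constituent over the same span. "Good" NPs keep
# their score (multiplied by 1); "bad" NPs get probability zero.

def oracle_adjust(np_span, np_prob, gold_np_spans):
    """Return the probability the oracle assigns to this NP constituent."""
    return np_prob if np_span in gold_np_spans else 0.0
```

Because dispreferred NPs are zeroed immediately rather than merely down-weighted, the oracle prunes the beam far more aggressively than the linear interpolation used in the real system, which is consistent with its larger efficiency gains.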
      <Paragraph position="1"> The results of both oracles are shown in Table 1. The first five rows give the precision, recall, f-statistic, the raw f-statistic improvement, and the f-statistic error reduction percentage, all determined in terms of labelled bracket accuracy. There is a marked increase in both precision and recall, with an overall error reduction of 42.4% with the full oracle and 27.2% with the definite singular oracle.</Paragraph>
      <Paragraph position="2"> Thus, in this domain over a quarter of all incorrectly labelled constituents are attributable to syntactically incorrect definite singular NPs. The number of constituents built during the parse is used as a measure of efficiency, and the work reduction is reported in the sixth row of the table, showing an efficiency improvement of 30.3% or 18.7%, depending on the oracle. The final two lines of the table show that both the number of sentences which can be parsed and the number of sentences which are perfectly parsed increase under both models.</Paragraph>
      <Paragraph position="3"> The nature of the oracle experiment ensures some reduction in error and complexity, but the magnitude of the improvement is surprising, and certainly encouraging for the prospects of incremental reference. Definite singular NPs typically have a unique referent, providing a locus for effective feedback, and we believe that incremental interaction with an accurate reference module might approach the oracle performance.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Dialogue Experiments
</SectionTitle>
      <Paragraph position="0"> For these experiments the parser interacted with the actual reference module, incorporating feedback according to the model discussed in Section 3.3. The first data column of Table 2 repeats the baseline results of the parser without reference feedback. The next two columns show statistics for a run of the parser with incremental feedback from reference, using a probability model based on a classification scheme which distinguished only whether or not the set of referent matches was empty. The second data column shows the results for the estimated interpolation parameter value of λ = 0.2, while the third data column shows results for the empirically determined optimal λ value of 0.1.</Paragraph>
      <Paragraph position="1"> The results are encouraging, with an error reduction of 8.2% or 9.3% on the test dialogue, although the amount of work the parser performed was reduced by only 4.0% and 3.6%. A further encouraging sign is that for every exploratory λ value we tried in either the held-out or the test data, both the accuracy and efficiency improved.</Paragraph>
      <Paragraph position="2"> Reference information also helped increase both the number of sentences that could be parsed and the number of sentences that were parsed perfectly, although the improvements were small.</Paragraph>
      <Paragraph position="3"> The estimated value of λ = 0.2 produced an error reduction that was approximately 20% of the oracle's, which is a very good start, especially considering that this experiment used only the information of whether there was a referent match or not. The efficiency gains were more modest, at just above 10% of the oracle results, although one would expect less radical efficiency improvements from this experiment: under linear interpolation, even extremely dispreferred analyses may be expanded, whereas the oracle simply drops all dispreferred NPs off the beam immediately.</Paragraph>
      <Paragraph position="4"> We performed a second experiment that made more complete use of the reference data, breaking down referent sets according to when and how often they were mentioned, whether they matched the focus, and whether they were in the set of relevant place names. We expected that this information would provide considerably better results than the simple match/no-match classification above. For example, consider a definite singular NP: if it matches a single referent, one would expect it to be in the parse with high probability, but multiple matches would indicate that the referent was not unique, and that the base noun probably requires additional discriminating information (e.g. a prepositional phrase or restrictive relative clause).</Paragraph>
      <Paragraph position="5"> Unfortunately, as the final column of Table 2 shows, the additional information did not provide much of an advantage. The amount of work done was reduced by 4.6%, the largest of any efficiency improvement, but error reduction was only 5.8%, and the number of sentences parsed perfectly actually decreased by one.</Paragraph>
      <Paragraph position="6"> We conjecture that co-reference chains may be a significant source of confusion in the reference data.</Paragraph>
      <Paragraph position="7"> Ideally, if several entities in the discourse context all refer to the same real-world entity, they should be counted as a single match. The current reference module does construct co-referential chains, but a single error in co-reference identification will cause all future NPs to match both the chain and the misidentified item, instead of producing the single match desired.</Paragraph>
      <Paragraph position="8"> The reference module has to rely on the parser to provide the correct context, so there is something of a bootstrapping problem at work, which indicates both a drawback and a potential of this type of incremental interaction. The positive feedback loop bodes well for the potential benefits of the incremental system, because as the incremental reference information begins to improve the performance of the parser, the context provided to the reference resolution module improves, which provides even more accurate reference information. Of course, in the early stages of such a system, this works against us; many of the reference resolution errors could be a result of the poor quality of the discourse context.</Paragraph>
      <Paragraph position="9"> Our current efforts aim to identify and correct these and other reference resolution issues. Not only will this improve the performance of the Reference Advisor from an incremental parsing standpoint, but it should also further our understanding of reference resolution itself.</Paragraph>
      <Paragraph position="10"> We have shown efficiency improvements in terms of the overall number of constituents constructed by the parser; however, one might ask whether this improvement in parsing speed comes at a large cost to the overall efficiency of the system. We suggest that this is in some sense the wrong question to ask, because for a real-time interactive system the primary concern is to keep up with the human interlocutor, and the incremental approach offers a far greater opportunity for parallelism between modules. In terms of time elapsed from speech to analysis, the system as a whole should benefit from the incremental architecture.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Semantic Replacement
</SectionTitle>
    <Paragraph position="0"> When the word &amp;quot;it&amp;quot; is parsed as a referential NP, it is given highly underspecified semantics. We have implemented a Mediator which, for each possible referent for &amp;quot;it&amp;quot;, adds a new item to the parser's chart with the underspecified semantics of &amp;quot;it&amp;quot; instantiated to the semantics of the referent.</Paragraph>
    <Paragraph position="1"> Consider the sentence sequence &amp;quot;Send the bus to the hospital&amp;quot;, &amp;quot;Send it to the mall&amp;quot;. At the point that the NP &amp;quot;it&amp;quot; is encountered in the second sentence, it has not yet been connected to the verb, so the incremental reference resolution determines that &amp;quot;the bus&amp;quot; and &amp;quot;the hospital&amp;quot; are both possible referents. We add two new constituents to the chart: &amp;quot;it&amp;quot;[the hospital] and &amp;quot;it&amp;quot;[the bus]. They are given probabilities infinitesimally higher than the &amp;quot;it&amp;quot;[underspecified] which already exists on the chart. Thus, if either of the new versions of &amp;quot;it&amp;quot; match the semantic restrictions inherent in the rest of the parse, they will be featured in parses with a higher probability than the underspecified version.</Paragraph>
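The chart operation just described, adding one instantiated copy of "it" per referent at a probability infinitesimally above the underspecified original, can be sketched directly. The function name and the concrete epsilon value are assumptions for illustration.

```python
# Sketch of the semantic-replacement Mediator: for each possible
# referent of "it", add a copy of the pronoun constituent with the
# referent's semantics, at a probability infinitesimally above the
# underspecified original's.

EPSILON = 1e-9  # "infinitesimally higher"; concrete value is illustrative

def expand_it(it_prob, referents):
    """Return (semantics, probability) pairs for the new chart entries."""
    return [(sem, it_prob + EPSILON) for sem in referents]
```

The tiny margin means an instantiated copy only wins if it also survives the grammar's semantic restrictions, e.g. the mobility required of the object of "send", so ill-fitting referents like "the hospital" fall back behind the underspecified reading.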
    <Paragraph position="2"> &amp;quot;It&amp;quot;[the bus] matches the mobility required of the object of &amp;quot;send&amp;quot;, while &amp;quot;it&amp;quot;[the hospital] does not. This results in a parse where the semantics of &amp;quot;it&amp;quot; are instantiated early and incrementally.</Paragraph>
    <Paragraph position="3"> This sort of capability is key for an end-to-end incremental system, because neither the reference module nor the parser is capable, by itself, of determining incrementally that the reference in question must be &amp;quot;the bus&amp;quot;. If we want an end-to-end system which can interact incrementally with the user, this type of decision-making must be made in an incremental fashion.</Paragraph>
    <Paragraph position="4"> This ability is also key in the presence of soft constraints or other Advisors which prefer one possible moveable referent to another; under incremental parsing, these constraints would have the chance to be applied during the parsing process, whereas a sequential system has no alternatives to the default, underspecified pronoun, and so cannot apply these restrictions to discriminate between referents.</Paragraph>
    <Paragraph position="5"> Our implementation performs the semantic vetting discussed above, but we have done no large-scale experiments in this area.</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Related Work
</SectionTitle>
    <Paragraph position="0"> There are instances in the literature of incremental parsers that pass forward information to higher-level modules, but none, to our knowledge, are designed as continuous understanding systems, where all levels of language analysis occur (virtually) simultaneously. For example, there are a number of robust semantic processing systems (Pinkal et al., 2000; Rose, 2000; Worm, 1998; Zechner, 1998) which contain incremental parsers that pass on partial results immediately to the robust semantic analysis component, which begins to work on combining these sentence fragments. If the parser cannot find a parse, then the semantic analysis program has already done at least part of its work. However, none of the above systems have a feedback loop between the semantic analysis component and the incremental parser. So, while all of these are in some sense examples of incremental parsing, they are not continuous understanding models.</Paragraph>
    <Paragraph position="1"> Schuler (2002) describes a parser which builds both a syntactic tree and a denotation-based semantic analysis as it parses. The denotations of constituents in the environment are used to inform parsing decisions, much as we use the static database of place names. However, the feedback in our system is richer, based on the context provided by the preceding discourse. Furthermore, as an instantiation of the general architecture presented in Section 2, our system is more easily extensible to other forms of feedback.</Paragraph>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
7 Future Work
</SectionTitle>
    <Paragraph position="0"> There is a catch-22 in that the accurate reference information necessary to improve parsing accuracy is dependent on an accurate discourse context which is reliant on accurate parsing. One way to cut this Gordian Knot is to use supervised data to ensure that the discourse context in the reference module is updated with the gold standard parse of the sentence rather than the parse chosen by the parser; a context oracle, if you will.</Paragraph>
    <Paragraph position="1"> A major undertaking necessary to advance this work is an error analysis of the reference module and of the parser's response to feedback; when does feedback lead to additional work or decreased accuracy on the part of the incremental parser, and is the feedback that leads to these errors correct from a reference standpoint? Currently, the accuracy of the parser is couched in syntactic terms. The precision of the baseline PCFG is fairly high at 94.6%, but that could conceal semantic errors, which could be corrected with reference information. Assessing semantic accuracy is one of a number of alternative evaluation metrics that we are exploring.</Paragraph>
    <Paragraph position="2"> We intend to gather timing data and investigate other efficiency metrics to determine to what extent the efficiency gains in the parser offset the communication overhead and the extra work performed by the reference module.</Paragraph>
    <Paragraph position="3"> We also plan to do experiments with different feedback regimes, experimenting both with the actual reference results and with the oracle data. Further experiments with this oracle data should enable us to appropriately parameterize the linear interpolation, and indeed, to investigate whether linear interpolation itself is a productive feedback scheme, or whether an integrated probability distribution over parser and reference judgments is more effective. The latter scheme is not only more elegant, but can also be shown to produce probabilities equivalent to those assigned parses in the parse re-ranking task (Stoness, 2004).</Paragraph>
    <Paragraph position="4"> We've shown (Stoness, 2004) that feedback which punishes constituents that are not in the final parse cannot result in reduced accuracy or efficiency; under certain restrictions, the same holds of rewarding constituents that will be in the final parse. However, it is not clear how quickly the efficiency and accuracy gains drop off as errors mount. By introducing random mistakes into the Oracle Advisor, we can artificially achieve any desired level of accuracy, which will enable us to explore the characteristics of this curve. The accuracy and efficiency response under error has drastic consequences on the types of Advisors that will be suitable under this architecture. Finally, it is clear that finding only the discourse context referents of a noun phrase is not sufficient; intuitively, and as shown by Schuler (2002), real-world referents can also aid in the parsing task. We intend to enhance the reference resolution component of the system to identify both discourse and real-world referents.</Paragraph>
  </Section>
</Paper>