<?xml version="1.0" standalone="yes"?> <Paper uid="W06-3502"> <Title>Backbone Extraction and Pruning for Speeding Up a Deep Parser for Dialogue Systems</Title> <Section position="4" start_page="9" end_page="10" type="metho"> <SectionTitle> 3 The TRIPS and LCFLEX algorithms </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="9" end_page="10" type="sub_section"> <SectionTitle> 3.1 The TRIPS parser </SectionTitle> <Paragraph position="0"> The TRIPS parser we use as a baseline is a bottom-up chart parser with lexical entries and rules represented as attribute-value structures. To achieve parsing ef ciency, TRIPS uses a best- rst beam search algorithm based on the scores from a parse selection model (Dzikovska et al., 2005; Elsner et al., 2005).</Paragraph> <Paragraph position="1"> The constituents on the parser's agenda are grouped into buckets based on their scores. At each step, the bucket with the highest scoring constituents is selected to build/extend chart edges. The parsing stops once N requested analyses are found. This guarantees that the parser returns the N-best list of analyses according to the parse selection model used, unless the parser reaches the chart size limit.</Paragraph> <Paragraph position="2"> 1Other enhancements used by LINGO depend on disallowing disjunctive features, and relying instead on the type system. The TRIPS grammar is untyped and uses disjunctive features, and converting it to a typed system would require as yet undetermined amount of additional work.</Paragraph> <Paragraph position="3"> In addition to best- rst parsing, the TRIPS parser uses a chart size limit, to prevent the parser from running too long on unparseable utterances, similar to (Frank et al., 2003). TRIPS is much slower processing utterances not covered in the grammar, because it continues its search until it reaches the chart limit. Thus, a lower chart limit improves parsing ef ciency. However, we show in our evaluation that the chart limit necessary to obtain good performance in most cases is too low to nd parses for utterances with 15 or more words, even if they are covered by the grammar.</Paragraph> <Paragraph position="4"> The integration of lexical semantics in the TRIPS lexicon has a major impact on parsing in TRIPS.</Paragraph> <Paragraph position="5"> Each word in the TRIPS lexicon is associated with a semantic type from a domain-independent ontology.</Paragraph> <Paragraph position="6"> This enables word sense disambiguation and semantic role labelling for the logical form produced by the grammar. Multiple word senses result in additional ambiguity on top of syntactic ambiguity, but it is controlled in part with the use of weak selectional restrictions, similar to the restrictions employed by the VerbNet lexicon (Kipper et al., 2000). Checking semantic restrictions is an integral part of TRIPS parsing, and removing them signi cantly decreases speed and increases ambiguity of the TRIPS parser (Dzikovska, 2004). We show that it also has an impact on parsing with a CFG backbone in Section 4.1.</Paragraph> </Section> <Section position="2" start_page="10" end_page="10" type="sub_section"> <SectionTitle> 3.2 LCFLEX </SectionTitle> <Paragraph position="0"> The LCFLEX parser (Ros*e and Lavie, 2001) is an all-paths robust left corner chart parser designed to incorporate various robustness techniques such as word skipping, exible uni cation, and constituent insertion. Its left corner chart parsing algorithm is similar to that described by Briscoe and Carroll (1994). 
</Section> <Section position="2" start_page="10" end_page="10" type="sub_section"> <SectionTitle> 3.2 LCFLEX </SectionTitle> <Paragraph position="0"> The LCFLEX parser (Rosé and Lavie, 2001) is an all-paths robust left corner chart parser designed to incorporate various robustness techniques such as word skipping, flexible unification, and constituent insertion. Its left corner chart parsing algorithm is similar to that described by Briscoe and Carroll (1994). The system supports grammatical specification in a unification framework that consists of context-free grammar rules augmented with feature bundles associated with the non-terminals of the rules. LCFLEX can be used in two parsing modes: either context-free parsing is done first, followed by application of the unification rules, or unification is interleaved with context-free parsing. The context-free backbone allows for efficient left corner predictions using a pre-compiled left corner prediction table, such as that described in (van Noord, 1997). To enhance its efficiency, it incorporates a provably optimal ambiguity packing algorithm (Lavie and Rosé, 2004).</Paragraph> <Paragraph position="1"> These efficiency techniques make all-paths parsing with the LCFLEX CARMEL grammar (Rosé, 2000) feasible. However, CARMEL was engineered with fast all-paths parsing in mind, resulting in certain compromises in coverage. For example, it has only very limited coverage of noun-noun compounding and headless noun phrases, which are a major source of ambiguity with the TRIPS grammar.</Paragraph> </Section> </Section> <Section position="5" start_page="10" end_page="12" type="metho"> <SectionTitle> 4 Combining LCFLEX and TRIPS </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="10" end_page="11" type="sub_section"> <SectionTitle> 4.1 Adding CFG Backbone </SectionTitle> <Paragraph position="0"> A simplified TRIPS grammar rule for verb phrases and a sample verb entry are shown in Figure 1. The features for building semantic representations are omitted for brevity. Each constituent has an assigned category that corresponds to its phrasal type, and a set of (complex-valued) features.</Paragraph> <Paragraph position="1"> The backbone extraction algorithm is reasonably straightforward, with CFG non-terminals corresponding directly to TRIPS constituent categories. To each CFG rule we attach a corresponding TRIPS unification rule. After parsing is complete, the parses found are scored and ordered with the parse selection model, and therefore parsing accuracy in all-paths mode is the same as or better than TRIPS accuracy for the same model.</Paragraph> <Paragraph position="2"> For constituents with subcategorized arguments (verbs, nouns, adverbial prepositions), our backbone generation algorithm takes the subcategorization frame into account. For example, the TRIPS VP rule is split into 27 CFG rules corresponding to different subcategorization frames: VP-V intr, VP-V NP NP, VP-V NP CP NP CP, etc. For each lexical entry, the appropriate CFG category is determined based on the subcategorization frame in its TRIPS lexical representation. This improves parsing efficiency when the prediction algorithms in TFLEX operate on the CFG backbone. The version of the TRIPS grammar used in testing contained 379 grammar rules with 21 parts of speech (terminal symbols) and 31 constituent types (non-terminal symbols), which were expanded into 1121 CFG rules with 85 terminals and 36 non-terminals during backbone extraction.</Paragraph>
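The sketch below illustrates the category splitting just described: each unification rule is expanded into CFG rules whose subcategorizing daughters are specialized by frame, and the original unification rule stays attached to every resulting CFG rule. The rule and frame representations are assumptions made for the example, and only the head daughter is split here, so this is a simplified stand-in for the TRIPS/TFLEX data structures rather than a reproduction of them.

# Illustrative sketch of backbone extraction with subcategorization splitting
# (simplified; rule and frame formats are assumptions, not the TRIPS data).

def extract_backbone(unification_rules, subcat_frames):
    """Map each unification rule to one or more CFG rules.

    unification_rules: list of (lhs_category, [rhs_categories], rule_object)
    subcat_frames: dict category -> list of frame names, e.g.
                   {"V": ["intr", "np", "np_np"]}
    Returns a list of (cfg_lhs, cfg_rhs, attached_unification_rule).
    """
    cfg_rules = []
    for lhs, rhs, rule in unification_rules:
        # Split any daughter that subcategorizes into one symbol per frame.
        expansions = [[]]
        for cat in rhs:
            frames = subcat_frames.get(cat)
            variants = [f"{cat}-{f}" for f in frames] if frames else [cat]
            expansions = [e + [v] for e in expansions for v in variants]
        for rhs_variant in expansions:
            # The original unification rule stays attached so that feature
            # checking can later be applied to (or interleaved with) CFG edges.
            cfg_rules.append((lhs, rhs_variant, rule))
    return cfg_rules

# Toy example: one VP rule splits into one CFG rule per verb subcat frame.
rules = [("VP", ["V", "NP"], "vp-rule")]
print(extract_backbone(rules, {"V": ["np", "np_np"]}))
# [('VP', ['V-np', 'NP'], 'vp-rule'), ('VP', ['V-np_np', 'NP'], 'vp-rule')]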
<Paragraph position="3"> We found, however, that the previously used technique could not be applied directly, because the TRIPS grammar contains 43 loops resulting from lexical coercion rules or elliptical constructions. A small number of loops from lexical coercion were both obvious and easy to avoid, because they are of the form N → N. However, there were longer loops, for example, NP → SPEC for sentences like "John's car" and SPEC → NP for headless noun phrases in sentences like "I want three". LCFLEX uses a re-unification algorithm that associates a set of unification rules with each CFG production, which are reapplied at a later stage. To be able to apply a unification rule corresponding to an N → N production, the production has to be explicitly present in the chart, leading to an infinite number of N constituents being produced. Applying the extra CFG rules expanding the loops during re-unification would complicate the algorithm significantly. Instead, we implemented loop detection during CFG parsing.</Paragraph> <Paragraph position="4"> The feature structures prevent loops in unification, and we considered including certain grammatical features in backbone extraction, as done in (Briscoe and Carroll, 1994). However, in the TRIPS grammar the feature values responsible for breaking loops belonged to multi-valued features (6-valued in the worst case), with values which may depend on other multiple-valued features in daughter constituents. Thus, adding the extra features resulted in major increases in backbone size because of category splitting. This can be remedied with additional pre-compilation (Kiefer and Krieger, 2004); however, this requires that all lexical entries be known in advance. One nice feature of the TRIPS lexicon is that it includes a mechanism for dynamically adding lexical entries for unknown words from wide-coverage lexicons such as VerbNet (Kipper et al., 2000), which would be impractical to use in precompilation. Therefore, to use CFG parsing before unification in our system, we implemented a loop detector that checked the CFG structure to disallow loops. However, the next problem that we encountered was massive ambiguity in the CFG structure. Even a very short phrase such as "a train" had over 700 possible CFG analyses, and took 910 msec to parse compared to 10 msec with interleaved unification. CFG ambiguity is so high because noun phrase fragments are allowed as top-level categories, and lexical ambiguity is compounded with semantic ambiguity and robust rules normally disallowed by features during unification. Thus, in our combined algorithm we had to use unification interleaved with parsing to filter out the CFG constituents.</Paragraph>
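As a side note on the loop detection mentioned above, the following sketch shows one standard way to find unary-rule cycles (such as N → N or NP → SPEC → NP) in an extracted CFG so that they can be disallowed during context-free parsing. It is an illustrative assumption, not the detector actually used in TFLEX.

# A rough sketch of detecting loops (unary-rule cycles) in an extracted CFG.

def find_unary_cycles(cfg_rules):
    """cfg_rules: iterable of (lhs, rhs) where rhs is a list of symbols.
    Returns the unary-rule cycles found, each as a list of categories."""
    graph = {}
    for lhs, rhs in cfg_rules:
        if len(rhs) == 1:                       # unary rule lhs -> rhs[0]
            graph.setdefault(lhs, set()).add(rhs[0])

    cycles, stack, on_stack = [], [], set()

    def dfs(node):
        stack.append(node)
        on_stack.add(node)
        for nxt in graph.get(node, ()):
            if nxt in on_stack:                 # found a cycle back to nxt
                cycles.append(stack[stack.index(nxt):])
            elif nxt in graph and nxt not in visited:
                dfs(nxt)
        visited.add(node)
        on_stack.discard(node)
        stack.pop()

    visited = set()
    for start in list(graph):
        if start not in visited:
            dfs(start)
    return cycles

# Toy example with the loops mentioned above: N -> N and NP <-> SPEC.
rules = [("N", ["N"]), ("NP", ["SPEC"]), ("SPEC", ["NP"]), ("NP", ["DET", "N"])]
print(find_unary_cycles(rules))   # [['N'], ['NP', 'SPEC']]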
</Section> <Section position="2" start_page="11" end_page="12" type="sub_section"> <SectionTitle> 4.2 Ambiguity Packing </SectionTitle> <Paragraph position="0"> For building semantic representations in parallel with parsing, ambiguity packing presents a set of known problems (Oepen and Carroll, 2000). One possible solution is to exclude semantic features during an initial unification stage, use ambiguity packing, and re-unify with semantic features in a post-processing stage. In our case, we found this strategy difficult to implement, since selectional restrictions are used to limit the ambiguity created by multiple word senses during syntactic parsing. Therefore, we chose to do ambiguity packing on the CFG structure only, keeping the multiple feature structures associated with each packed CFG constituent.</Paragraph> <Paragraph position="1"> To begin to evaluate the contribution of ambiguity packing to efficiency, we ran a test on the first 39 utterances in a hold-out set not used in the formal evaluation below. Sentences ranged from 1 to 17 words in length, 16 of which had 6 or more words. On this set, the average parse time without ambiguity packing was 10 seconds per utterance, and 30 seconds per utterance on utterances with 6 or more words. With ambiguity packing turned on, the average parse time decreased to 5 seconds per utterance, and 13.5 seconds per utterance on the utterances with more than 6 words. While this evaluation showed that ambiguity packing improves parsing efficiency, we determined that further enhancements were necessary.</Paragraph>
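To illustrate the packing scheme adopted here, the sketch below packs edges by CFG category and span while retaining every associated feature structure. The chart and edge classes are assumptions made for the example, not the LCFLEX data structures.

# A simplified sketch of ambiguity packing on the CFG structure only:
# edges with the same category and span are packed into one chart entry,
# but every unification-level feature structure is kept.

class PackedEdge:
    def __init__(self, category, span):
        self.category = category
        self.span = span          # (start, end)
        self.analyses = []        # list of (feature_structure, score) pairs

class Chart:
    def __init__(self):
        self.edges = {}           # (category, span) -> PackedEdge

    def add(self, category, span, feature_structure, score):
        """Pack by (category, span); keep all feature structures."""
        key = (category, span)
        edge = self.edges.setdefault(key, PackedEdge(category, span))
        edge.analyses.append((feature_structure, score))
        return edge

chart = Chart()
# Two NP analyses of "a heart attack victim" over the same span pack into
# one CFG edge carrying two feature structures.
chart.add("NP", (0, 4), {"bracketing": "[heart [attack victim]]"}, 0.7)
chart.add("NP", (0, 4), {"bracketing": "[[heart attack] victim]"}, 0.6)
print(len(chart.edges), len(chart.edges[("NP", (0, 4))].analyses))  # 1 2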
</Section> <Section position="3" start_page="12" end_page="12" type="sub_section"> <SectionTitle> 4.3 Pruning </SectionTitle> <Paragraph position="0"> We added a pruning technique based on the scoring model discussed above and on ambiguity packing to enhance system performance. As an illustration, consider an example from a corpus used in our evaluation where the TRIPS grammar generates a large number of analyses: "we have a heart attack victim at marketplace mall". The phrase "a heart attack victim" has at least two interpretations, "a [N1 heart [N1 attack [N1 victim]]]" and "a [N1 [N1 heart [N1 attack]] [N1 victim]]". The prepositional phrase "at marketplace mall" can attach either to the noun phrase or to the verb. Overall, this results in 4 basic interpretations, with additional ambiguity resulting from different possible senses of "have".</Paragraph> <Paragraph position="1"> The best-first parsing algorithm in TRIPS uses parse selection scores to suppress less likely interpretations. In our example, the TRIPS parser will choose the higher-scoring of the two interpretations for "a heart attack victim", and use it first. For this NP the features associated with both interpretations are identical with respect to further processing, so TRIPS will never come back to the other interpretation, effectively pruning it. The word "at" also has 2 possible interpretations due to word sense ambiguity: LF::TIME-LOC and LF::SPATIAL-LOC. The former has a slightly higher preference, and TRIPS will try it first. But then it will be unable to find an interpretation for "at Marketplace Mall", and backtrack to LF::SPATIAL-LOC to find a correct parse.</Paragraph> <Paragraph position="2"> Without chart size limits the parser is guaranteed to find a parse eventually through backtracking. However, this algorithm does not work quite as well with chart size limits. If there are many similarly scored constituents in the chart for different parts of the utterance, the best-first algorithm expands them first, and the chart size limit tends to interfere before TRIPS can backtrack to an appropriate lower-scoring analysis.</Paragraph> <Paragraph position="3"> Ambiguity packing offers an opportunity to make pruning more strategic by focusing specifically on competing interpretations for the same utterance span. The simplest pruning idea would be, for every ambiguity-packed constituent, to eliminate the interpretations with low TRIPS scores. However, we need to make sure that we don't prune constituents that are required higher up in the tree to make a parse. Consider our example again.</Paragraph> <Paragraph position="4"> The constituent for "at" will be ambiguity packed with its two meanings. But if we prune LF::SPATIAL-LOC at that point, the parse for "at Marketplace Mall" will fail later. Formally, the competing interpretations for "at" have non-local features, namely, the subcategorized complement (time versus location) is different for those interpretations and is checked higher up in the parse. But for "a heart attack victim" the ambiguity-packed interpretations differ only in local features. All features associated with this NP that are checked higher up come from the head noun "victim" and are identical in all interpretations. Therefore we can eliminate the low-scoring interpretations with little risk of discarding those essential for finding a complete parse. Thus, for any constituent where ambiguity-packed non-head daughters differ only in local features, we prune the interpretations coming from them to a specified prune beam width based on their TRIPS scores.</Paragraph> <Paragraph position="5"> This pruning heuristic based on local features can be generalised to different unification grammars.</Paragraph> <Paragraph position="6"> For example, in HPSG pruning would be safe at all points where a head is combined with ambiguity-packed non-head constituents, due to the locality principle. In the TRIPS grammar, if a TRIPS rule uses subcategorization features, the same locality principle holds. This heuristic has perfect precision though not complete recall, but, as our evaluation shows, it is sufficient to significantly improve performance in comparison with the TRIPS parser.</Paragraph>
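As a final illustration, the following sketch applies the pruning rule stated above: when the analyses of an ambiguity-packed non-head daughter differ only in local features, only the best-scoring analyses up to a prune beam width are kept. The dictionary-based feature structures and the explicit list of non-local feature names are assumptions made for the example, not the actual TRIPS feature system.

# A sketch of the pruning step for ambiguity-packed non-head daughters.

def differ_only_in_local_features(analyses, nonlocal_keys):
    """analyses: list of (features, score); nonlocal_keys: feature names
    visible higher up in the parse (e.g. from the head or subcat frame)."""
    projections = {tuple(sorted((k, f.get(k)) for k in nonlocal_keys))
                   for f, _ in analyses}
    return len(projections) == 1

def prune_packed_daughter(analyses, nonlocal_keys, beam_width):
    if not differ_only_in_local_features(analyses, nonlocal_keys):
        return analyses                      # unsafe to prune (e.g. the "at" PP)
    ranked = sorted(analyses, key=lambda a: a[1], reverse=True)
    return ranked[:beam_width]               # keep only the best-scoring ones

# "a heart attack victim": both bracketings project the same head features,
# so only the top-scoring analysis survives with beam_width = 1.
np_analyses = [({"head": "victim", "agr": "3sg", "bracketing": "right"}, 0.7),
               ({"head": "victim", "agr": "3sg", "bracketing": "left"}, 0.6)]
print(prune_packed_daughter(np_analyses, ["head", "agr"], beam_width=1))

</Section> </Section> </Paper>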