<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2038"> <Title>Speeding Up Full Syntactic Parsing by Leveraging Partial Parsing Decisions</Title> <Section position="5" start_page="296" end_page="299" type="metho"> <SectionTitle> 3 Experiments & Results </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="296" end_page="298" type="sub_section"> <SectionTitle> 3.1 Our parser with chunks </SectionTitle> <Paragraph position="0"> Our parser uses a simplified version of the model presented in (Collins, 1996). For this experiment, we tested four versions of our internal parser: our original parser, with no optimizations or chunking information; our original parser with chunking information; our optimized parser without chunking information; and our optimized parser with chunking information.</Paragraph> <Paragraph position="1"> For parsers that use chunking information, the runtime of the chunk parsing is included in the parser's runtime, to show that the total gains in runtime offset the cost of running the chunker. We trained both the chunk parser and our parser on all of the Treebank except for section 23, which was held out as the test set. [Figure: an example chunked sentence, "[signed]VP [by]PP [the Big Board]NP [and]NP [the Chicago Mercantile Exchange]NP , [trading]NP [was temporarily halted]VP [in]PP [Chicago]NP ." The color coding scheme is the same as in Figure 2.] Scoring of the parse trees from section 23 with chunking information was done using the EVALB package that was used to score the (Collins, 1999) parser. 
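The labeled-bracketing score that EVALB reports can be sketched as follows. This is an illustrative reconstruction of what the metric computes, not the EVALB implementation; the function name and tuple representation are our own:

```python
from collections import Counter

def bracket_prf(gold, test):
    """Labeled-bracket precision/recall over (label, start, end) tuples."""
    g, t = Counter(gold), Counter(test)
    # A bracket counts as correct if the same (label, start, end)
    # appears in both trees; duplicates match at most min-count times.
    matched = sum(min(g[b], t[b]) for b in g)
    precision = matched / len(test) if test else 0.0
    recall = matched / len(gold) if gold else 0.0
    return precision, recall
```

Here precision is the fraction of proposed brackets that are correct, and recall is the fraction of gold brackets that were recovered; the real EVALB additionally applies parameterized equivalences and length cutoffs that this sketch omits.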
The numbers represent the labeled bracketing of all sentences, not just those with 40 words or fewer.</Paragraph> <Paragraph position="2"> The experiment was run on a dual Pentium 4, 3.20 GHz machine with two gigabytes of memory.</Paragraph> <Paragraph position="3"> The results are presented in Table 1.</Paragraph> <Paragraph position="4"> The most notable result is the greatly reduced parsing time when chunking information was added. Both versions of our parser saw an average threefold increase in speed by leveraging chunking decisions. We also saw small increases in both precision and recall.</Paragraph> </Section> <Section position="2" start_page="298" end_page="298" type="sub_section"> <SectionTitle> 3.2 Collins Parsers with chunks </SectionTitle> <Paragraph position="0"> To show that this method is general and does not exploit weaknesses in the lexical model of our parser, we repeated the previous experiments with the three parser models presented in (Collins, 1999). We used the exact same chunk post-processing rules in the Collins parser code to ensure that the same chunk information was being used. We used Collins' training data. We did not retrain the parser in any way to optimize for chunked input; we only modified the parsing algorithm.</Paragraph> <Paragraph position="1"> Once again, the chunk parser was trained on all of the Treebank except for section 23, the trees were evaluated with EVALB, and these experiments were run on the same dual Pentium 4 machine.</Paragraph> <Paragraph position="2"> These results are presented in Table 2.</Paragraph> <Paragraph position="3"> [Table caption fragment: "... 23 with clause identification information. Data copied from the first experiment has been italicized for comparison."]</Paragraph> <Paragraph position="4"> Like our parser, each Collins parser saw a slightly under threefold increase in speed. 
But unlike our parser, all three models of the Collins parser saw slight decreases in accuracy, averaging -0.17% for both precision and recall. We theorize that this is because the errors in our lexical model are more severe than the errors in the chunks, while the Collins parser models make fewer errors in word grouping at the leaf node level than the chunker does. We theorize that a more accurate chunker would result in an increase in the precision and recall of the Collins parsers, while preserving the substantial speed gains.</Paragraph> </Section> <Section position="3" start_page="298" end_page="299" type="sub_section"> <SectionTitle> 3.3 Clause Identification </SectionTitle> <Paragraph position="0"> Encouraged by the improvement brought by using chunking as a source of restrictions, we used the data from our clause identifier.</Paragraph> <Paragraph position="1"> Again, our clause identifier was derived from (Carreras et al., 2002), using boosted C5.0 decision trees instead of their boosted binary decision tree method; it performs below their reported numbers, achieving 88.85% precision and 70.22% recall on the CoNLL 2001 shared task test set.</Paragraph> <Paragraph position="2"> These results are presented in Table 3.</Paragraph> <Paragraph position="3"> Adding clause detection information hurt performance in every category. The increases in runtime are caused by the clause identifier's runtime complexity of over O(n^3); the time spent identifying clauses exceeds the time saved by using its output as restrictions.</Paragraph> <Paragraph position="4"> As for the drop in precision and recall, we believe the clause detector's errors group together words that are not all constituents of the same parent node. 
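The chunk-based restriction that drives the speedups reported above can be sketched as follows. This is an illustrative reconstruction of the idea, not the authors' code: chunks and candidate constituent spans are represented as half-open token intervals, and a span is pruned whenever it partially overlaps a chunk (the function names are ours):

```python
def crosses_chunk_boundary(span, chunks):
    """True if `span` partially overlaps any chunk (half-open token intervals)."""
    s, e = span
    for cs, ce in chunks:
        overlap = s < ce and cs < e
        chunk_inside_span = s <= cs and ce <= e
        span_inside_chunk = cs <= s and e <= ce
        # A span is only valid if each overlapping chunk lies entirely
        # inside it, or the span lies entirely inside the chunk.
        if overlap and not chunk_inside_span and not span_inside_chunk:
            return True
    return False

def allowed_spans(n, chunks):
    """Spans of an n-token sentence a chunk-restricted CYK parser would keep."""
    return [(s, s + length)
            for length in range(1, n + 1)
            for s in range(n - length + 1)
            if not crosses_chunk_boundary((s, s + length), chunks)]
```

Filtering chart cells this way discards a large fraction of the O(n^2) candidate spans before any grammar rules are applied, which is consistent with the roughly threefold speedups observed; when the chunker misgroups words, however, the correct span is pruned along with the invalid ones, which matches the small accuracy drops seen with the Collins models.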
While errors in a chunk parse are relatively localized, errors in the hierarchical structure of clauses can affect the entire parse tree, preventing the parser from exploring the correct high-level structure of the sentence.</Paragraph> </Section> </Section> <Section position="6" start_page="299" end_page="299" type="metho"> <SectionTitle> 4 Future Work </SectionTitle> <Paragraph position="0"> While the modification given in section 2.2 is specific to CYK parsing, we believe that placing restrictions based on the output of a chunk parser is general enough to be applied to any generative, statistical parser, such as the Charniak parser (2000) or a Lexical Tree Adjoining Grammar based parser (Sarkar, 2000). Restrictions can be placed wherever the parser would explore possible trees that violate the boundaries determined by the chunk parser, pruning paths that cannot yield the correct parse tree.</Paragraph> </Section> </Paper>