<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1159"> <Title>Extending A Broad-Coverage Parser for a General NLP Toolkit</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 NLP Toolkit </SectionTitle> <Paragraph position="0"> In the previous section, we mentioned a</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Natural Language Processing Toolkit </SectionTitle> <Paragraph position="0"> (NLPTK) that allows programmers with no linguistic knowledge to rapidly develop natural language user interfaces for their applications.</Paragraph> <Paragraph position="1"> The toolkit should incorporate the major components of an NLP system, such as a spell checker, a parser and a semantic representation generator. Using the toolkit, a software engineer will be able to create a system that incorporates complex NLP techniques such as syntactic parsing and semantic understanding.</Paragraph> <Paragraph position="2"> In order to provide NL control to an application, the NLPTK needs to generate semantic representations for input sentences.</Paragraph> <Paragraph position="3"> We refer to each of these semantic forms as a frame, which is basically a predicate-argument representation of a sentence.</Paragraph> <Paragraph position="4"> The NLPTK operates in the following steps: 1. NLPTK begins to create an NLP front end by generating semantic representations of sample input sentences provided by the programmer.</Paragraph> <Paragraph position="5"> 2. These representations are expanded using synonym sets and stored in a Semantic Frame Table (SFT), which becomes a comprehensive database of all the possible commands a user could ask the system to perform.</Paragraph> <Paragraph position="6"> 3. The toolkit then creates methods for attaching the NLP front end to the back end applications.</Paragraph> <Paragraph position="7"> 4.
When the NLP front end is released, a user may enter an NL sentence, which is translated into a semantic frame by the system. The SFT is then searched for an equivalent frame. If a match is found, the action or command linked to this frame is executed.</Paragraph> <Paragraph position="8"> In order to generate semantic representations in Step 1, the parser has to parse the input sentences into syntactic trees. During the process of building an NLP system, the programmer needs to customize the parser of the toolkit for their specific domain. For example, the toolkit provides an interface to highlight the domain specific words that are not in the lexicon. The toolkit then asks the programmer for information that helps the system insert the correct lexical item into the lexicon. The NLPTK development team must handle complicated customizations for the programmer. For example, we might need to change the rules behind the domain specific parser to handle certain natural language input. In Step 4, when the programmer finishes building an NLP application, the system will implement a domain specific parser. The toolkit has been completely implemented and tested.</Paragraph> <Paragraph position="9"> We use a corpus of email messages from our customers for developing the system.</Paragraph> <Paragraph position="10"> These emails contain questions, comments and general inquiries regarding our document-conversion products. We modified the raw email programmatically to delete the attachments, HTML tags, headers and sender information. In addition, we manually deleted salutations, greetings and any information not directly related to customer support. The corpus contains around 34,640 lines and 170,000 words.
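As a rough illustration of the programmatic cleanup pass described above, the following sketch drops an email's header block and strips leftover HTML tags. The blank-line header convention and the regular expressions are our illustrative assumptions, not the code actually used for the corpus.

```python
import re

def clean_email(raw):
    """Illustrative cleanup: drop the header block (everything up to
    the first blank line) and strip leftover HTML tags."""
    body_lines = []
    in_headers = True
    for line in raw.splitlines():
        if in_headers:
            if line.strip() == "":  # a blank line ends RFC-style headers
                in_headers = False
            continue
        body_lines.append(line)
    body = "\n".join(body_lines)
    body = re.sub(r"<[^>]+>", " ", body)      # remove HTML tags
    return re.sub(r"\s+", " ", body).strip()  # normalize whitespace
```

The manual deletion of salutations and greetings described above would follow a pass of this kind.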
We constantly update it with new emails from our customers.</Paragraph> <Paragraph position="11"> From this corpus, we created a test corpus of 1000 inquiries to test existing broad-coverage parsers and the parser of the toolkit.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Minipar in NLPTK </SectionTitle> <Paragraph position="0"> We choose to use Minipar (Lin, 2001), a widely known parser in commercial domains, as the general parser of NLPTK. It is worth pointing out that our methodology does not depend on any individual parser, and we can use any other available parser.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Introduction to Minipar </SectionTitle> <Paragraph position="0"> Minipar is a principle-based, broad-coverage parser for English (Lin, 2001). It represents its grammar as a network of nodes and links, where the nodes represent grammatical categories and the links represent types of dependency relationships. The grammar is manually constructed, based on the Minimalist Program (Chomsky, 1995).</Paragraph> <Paragraph position="1"> Minipar constructs all possible parses of an input sentence. It makes use of the frequency counts of the grammatical dependency relationships extracted by a collocation extractor (Lin, 1998b) from a 1GB corpus parsed with Minipar to resolve syntactic ambiguities and rank candidate parse trees.</Paragraph> <Paragraph position="2"> The dependency tree with the highest ranking is returned as the parse of the sentence.</Paragraph> <Paragraph position="3"> The Minipar lexicon contains about 130,000 entries, derived from WordNet (Fellbaum, 1998) with additional proper names. 
The lexicon entry of a word lists all possible parts of speech of the word and its subcategorization frames (if any).</Paragraph> <Paragraph position="4"> Minipar achieves about 88% precision and 80% recall with respect to dependency relationships (Lin, 1998a), evaluated on the SUSANNE corpus (Sampson, 1995), a subset of the Brown Corpus of American English.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Disadvantages of Minipar </SectionTitle> <Paragraph position="0"> In order to see how well Minipar performs in our domain, we tested it on 584 sentences from our corpus. Instead of checking the parse trees, we checked the frames corresponding to the sentences, since the accuracy of the frames is what we are most concerned with. If any part of a frame was wrong, we treated it as an error of the module that contributed to the error. We counted all the errors caused by Minipar and its accuracy in terms of correctly parsed sentences is 77.6%. Note that the accuracy is actually lower because later processes fix some errors in order to generate correct frames.</Paragraph> <Paragraph position="1"> The majority of Minipar errors fall in the following categories: 1. Tagging errors: some nouns are mis-tagged as verbs. For example, in Can I get a copy of the batch product guide?, guide is tagged as a verb.</Paragraph> <Paragraph position="2"> 2. Attachment errors: some prepositional phrases (PP) that should be attached to their immediate preceding nouns are attached to the verbs. For example, in Can Drake convert the PDF documents in Japanese?, in Japanese is attached to convert.</Paragraph> <Paragraph position="3"> 3. Missing lexical entries: some domain specific words such as download and their usages are not in the Minipar lexicon.</Paragraph> <Paragraph position="4"> This introduces parsing errors because such words are tagged as nouns by default.</Paragraph> <Paragraph position="5"> 4. 
Inability to handle ungrammatical sentences: in a real-world application, it is unrealistic to expect the user to enter only grammatical sentences. Although Minipar still produces a syntactic tree for an ungrammatical sentence, the tree is ill-formed and cannot be used to extract the semantic information being expressed.</Paragraph> <Paragraph position="6"> In addition, Minipar, like other broad-coverage parsers, cannot be adapted to specific applications. Its accuracy does not satisfy the needs of our toolkit. We have to build another parser on top of Minipar to enable domain specific customizations that increase the parsing accuracy.</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 The Shallow Parser </SectionTitle> <Paragraph position="0"> Our NLPTK maps input sentences to action requests. In order to perform an accurate mapping, the toolkit needs to get information such as the sentence type, the main predicate, the arguments of the predicate, and the modifiers of the predicate and arguments from a sentence. In other words, it mostly needs local dependency relationships.</Paragraph> <Paragraph position="1"> Therefore we decided to build a shallow parser instead of a full parser. A parser that captures the most frequent verb argument structures in a domain can be built relatively fast. It takes less space, which can be an important issue for certain applications. For example, when building an NLP system for a handheld platform, a light parser is needed because the memory cannot accommodate a full parser.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Introduction </SectionTitle> <Paragraph position="0"> We built a KWIC (keyword in context) verb shallow parser. It captures only verb predicates with their arguments, verb argument modifiers and verb adjuncts in a sentence.
The resulting trees contain local and subjacent dependencies between these elements.</Paragraph> <Paragraph position="1"> The shallow parser depends on three levels of information processing: the verb list, subcategorization (in short, subcat) and syntactic rules. The verb subcat system is derived from Levin's taxonomy of verbs and their classes (Levin, 1993). We have 24 verb files containing 3200 verbs, which include all the Levin verbs and the most frequent verbs in our corpus. A verb is indexed to one or more subcat files, and each file represents a particular alternation, i.e., a semantico-syntactic sense. We have 272 syntactic subcat files derived from the Levin verb semantic classes. The syntactic rules are marked for argument types and constituency, using the Penn Treebank tagset (Marcus, 1993). They contain both generalized rules, e.g., .../NN, and specified rules, e.g., purchase/VBP. An example subcat rule for the verb purchase looks like this: .../DT .../JJ .../NN, .../DT .../NN from/RP .../NN for/RP .../NN. The first element says that purchase takes an NP argument, and the second says that it takes an NP argument and two PP adjuncts.</Paragraph> <Paragraph position="2"> We also encoded specific PP head class information based on the WordNet concepts in the rules for some attachment disambiguation.</Paragraph> <Paragraph position="3"> The shallow parser works like this: it first tags an incoming sentence with the Brill tagger (Brill, 1995) and matches verbs in the tagged sentence with the verb list. If a match is found, the parser will open the subcat files indexed to that verb and gather all the syntactic rules in these specific subcat files. It then matches the verb arguments with these syntactic rules and outputs the results into a tree.
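The rule-matching step can be sketched as follows. The pattern syntax mirrors the .../DT-style rules above, but the function names and the greedy left-to-right matching strategy are our own illustrative assumptions, not the parser's actual implementation.

```python
def token_matches(pattern_tok, word, tag):
    """'.../NN' matches any word tagged NN; 'from/RP' matches that
    literal word with that tag (mirrors the subcat rule syntax)."""
    p_word, p_tag = pattern_tok.rsplit("/", 1)
    return p_tag == tag and (p_word == "..." or p_word == word.lower())

def match_subcat(rule, tagged_tokens):
    """Greedy left-to-right match of one subcat rule against the
    (word, POS-tag) pairs following the verb; returns the words
    filling the rule's slots, or None if the rule does not apply."""
    matched, i = [], 0
    for ptok in rule.split():
        # advance until this pattern slot is filled
        while i < len(tagged_tokens) and not token_matches(ptok, *tagged_tokens[i]):
            i += 1
        if i == len(tagged_tokens):
            return None  # rule fails: a slot went unfilled
        matched.append(tagged_tokens[i][0])
        i += 1
    return matched
```

For example, the second purchase rule above would bind "the documents from vendor for conversion" to the NP argument and the two PP adjuncts, while a rule whose slots cannot all be filled is rejected, which is how over-generation is kept in check.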
The parser can control over-generation for any verb because the syntactic structures are limited to that particular verb's syntactic structure set from the Levin classes.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Disadvantages of Shallow Parser </SectionTitle> <Paragraph position="0"> The disadvantages of the shallow parser are mainly due to its simplified design, including: 1. It cannot handle sentences whose main verb is be or phrasal sentences without a verb because the shallow parser mainly targets command-and-control verb argument structures.</Paragraph> <Paragraph position="1"> 2. It cannot handle structures that appear before the verb. Subjects will not appear in the parse tree even though they might contain important information.</Paragraph> <Paragraph position="2"> 3. It cannot detect sentence type, for example, whether a sentence is a question or a request.</Paragraph> <Paragraph position="3"> 4. It cannot handle negative or passive sentences.</Paragraph> <Paragraph position="4"> We tested the shallow parser on 500 sentences from our corpus and compared the results with the output of Minipar. We separated the sentences into five sets of 100 sentences. After running the parser on each set, we fixed the problems that we could identify. This was our process of training the parser. Table 1 shows the data obtained from one such cycle. Since the shallow parser cannot handle sentences with the main verb be, these sentences are excluded from the statistics. So the test set actually contains 85 sentences. In Table 1, the first column and the first row show the statistics for the shallow parser and Minipar respectively. The upper half of the table is for the unseen data, where 55.3% of the sentences are parsed correctly and 11.8% incorrectly (judged by humans) by both parsers. 18.9% of the sentences are parsed correctly by Minipar, but incorrectly by the shallow parser, and 14.1% vice versa.
The lower half of the table shows the result after fixing some shallow parser problems, for example, adding a new syntactic rule. The accuracy of the parser is significantly improved, from 69.4% to 81.2%. This shows the importance of adaptation to specific domain needs, and that in our domain, the shallow parser outperforms Minipar.</Paragraph> <Paragraph position="5"> [Table 1: comparison of the shallow parser with Minipar on 85 sentences.] The parsers do not perform equally well on all sets of sentences. For some sets, the accuracies of Minipar and the shallow parser drop to 60.9% and 67.8% respectively.</Paragraph> </Section> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Extending Minipar with the </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Shallow Parser </SectionTitle> <Paragraph position="0"> Each parser has pros and cons. The advantage of Minipar is that it is a broad-coverage parser with relatively high accuracy, and the advantage of the shallow parser is that it is adaptable. For this reason, we intend to use Minipar as our primary parser and the shallow parser as a backup. Table 1 shows only a small percentage of sentences parsed incorrectly by both parsers (about 7%). If we always choose the correct tree between the two outputs, we will have a parser with much higher accuracy.</Paragraph> <Paragraph position="1"> Therefore, combining the advantages of the two parsers will achieve better performance in both coverage and accuracy. Now the question is how to decide if a tree is correct or not.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.1 Detecting Parsing Errors </SectionTitle> <Paragraph position="0"> In an ideal situation, each parser would provide confidence levels for its trees on a scale comparable across the two parsers. We would choose the tree with the higher confidence.
However, this is not possible in our case because weightings of the Minipar trees are not publicly available, and the shallow parser is a rule-based system without confidence information.</Paragraph> <Paragraph position="1"> Instead, we use a few simple heuristics to decide if a tree is right or wrong, based on an analysis of the trees generated for our test sentences. For example, given a sentence, the Minipar tree is incorrect if it has more than one subtree connected by a top-level node whose syntactic category is U (unknown). A shallow parser tree is wrong if there are unparsed words at the end of the sentence after the main verb (except for interjections). We have three heuristics identifying a wrong Minipar tree and two identifying a wrong shallow parser tree. If a tree passes these heuristics, we label it as a good parse. This may not be true, but we will compensate for this simplification later. The module implementing these heuristics is called the error detector.</Paragraph> <Paragraph position="2"> We tested the three heuristics for Minipar trees on a combination of 84 requestive, interrogative and declarative sentences.
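The two example heuristics just described can be sketched as follows. The tree and output representations here are hypothetical stand-ins for the parsers' actual data structures, and the interjection list is illustrative.

```python
def minipar_tree_suspect(category, subtrees):
    """Heuristic from the text: flag a Minipar parse as wrong when the
    top-level node has the unknown category 'U' and merely glues
    together more than one subtree."""
    return category == "U" and len(subtrees) > 1

def shallow_tree_suspect(parsed_words, sentence_words,
                         interjections=("please", "thanks")):
    """Heuristic from the text: flag a shallow parse as wrong when
    words at the end of the sentence were left unparsed, ignoring
    interjections."""
    trailing = sentence_words[len(parsed_words):]
    return any(w.lower() not in interjections for w in trailing)
```

In this sketch, a tree flagged by neither check would be labeled a good parse, matching the simplification noted above.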
The results are given in the upper part of Table 2.</Paragraph> <Paragraph position="3"> The table shows that 45 correct Minipar trees (judged by humans) are identified as correct by the error detector and 18 wrong trees are identified as wrong, so the accuracy is 75%.</Paragraph> <Paragraph position="4"> Tagging errors and some attachment errors cannot be detected.</Paragraph> <Paragraph position="5"> [Table 2: results of the tree error detector.] We tested the two heuristics for shallow parser trees on 100 sentences from our corpus and the result is given in the lower part of Table 2. We did not use the same set of sentences to test the two sets of heuristics because the coverage of the two parsers is different.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.2 Choosing the Better Parse Trees </SectionTitle> <Paragraph position="0"> We run the two parsers in parallel to generate two parse trees for an input sentence, but we cannot depend only on the error detector to decide which tree to choose because it is not accurate enough. Table 2 shows that the error detector mistakenly judges some wrong trees as correct, but not the other way round. In other words, when the detector says a tree is wrong, we have high confidence that it is indeed wrong, but when it says a tree is correct, there is some chance that the tree is actually wrong. This motivates us to distinguish three cases: 1. When only one of the two parse trees is detected as wrong, we choose the other tree: whatever its actual quality, the rejected tree is definitely wrong, so we cannot choose it.</Paragraph> <Paragraph position="1"> 2. When both trees are detected as wrong, we choose the Minipar tree because it handles more syntactic structures.</Paragraph> <Paragraph position="2"> 3.
When both trees are detected as correct, we need more analysis because either might be wrong.</Paragraph> <Paragraph position="3"> We have mentioned in the previous sections the problems with both parsers. By comparing their pros and cons, we come up with heuristics for determining which tree is better for the third case above.</Paragraph> <Paragraph position="4"> The decision flow for selecting the better parse is given in Figure 1. Since the shallow parser cannot handle negative and passive sentences as well as sentences with the main verb be, we choose the Minipar trees for such sentences. The shallow parser outperforms Minipar on tagging and some PP attachment because it checks the WordNet concepts. So, when we detect differences concerning part-of-speech tags and PP attachment in the parse trees, we choose the shallow parser tree as the output. In addition, we prefer the parse with larger NP chunks.</Paragraph> <Paragraph position="5"> We tested these heuristics on 200 sentences and the result is shown in Table 3. The first row specifies whether a Minipar tree or a shallow parser tree is chosen as the final output. The first column indicates whether the final tree is correct or incorrect according to human judgment. 88% of the time, Minipar trees are chosen and they are 82.5% accurate. The overall contribution of Minipar to the accuracy is 73.5%. The improvement from just using Minipar is about 7%, from about 75.5% to 82.5%. This is a significant improvement.</Paragraph> <Paragraph position="6"> The main computational expense of running two parsers in parallel is time. Since our shallow parser has not been optimized, the extended parser is about 2.5 times slower than Minipar alone. We hope that with some optimization, the speed of the system will increase considerably. Even so, it currently takes less than 0.6 second to parse a sentence.</Paragraph> </Section> </Section> </Paper>