File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/90/p90-1031_evalu.xml
Size: 3,042 bytes
Last Modified: 2025-10-06 14:00:03
<?xml version="1.0" standalone="yes"?> <Paper uid="P90-1031"> <Title>PARSING THE LOB CORPUS</Title> <Section position="10" start_page="248" end_page="250" type="evalu"> <SectionTitle> RESULTS </SectionTitle> <Paragraph position="0"> We have used the parser, both with and without the lexical disambiguator, to analyze large portions of the LOB corpus. Our grammar is small: the three primary tables have a total of 134 actions, and, apart from building tree structure, the transducer functions are restricted to projecting categories upward from daughter phrases, checking agreement and case, and handling verb subcategorization features. Verb subcategorization information is obtained from the Oxford Advanced Learner's Dictionary of Contemporary English (Hornby et al. 1973), which often includes unusual verb aspects; consequently, the parser tends to accept too many verb arguments.</Paragraph> <Paragraph position="1"> The parser identifies phrase boundaries surprisingly well, and usually builds structures up to the point of major sentence breaks such as commas or conjunctions. Disambiguation failure is almost nonexistent. A sequence of parses of sentences from the corpus appears at the end of this paper. The parses illustrate the need for a better subcategorization system and for some method of dealing with conjunctions and parentheticals, which tend to break up sentences.</Paragraph> <Paragraph position="2"> Figure 5 presents plots of parser speed on a random 624-sentence subset of the LOB, and compares parser performance with and without lowering, and with and without disambiguation. Graphs 1 and 2 (Graph 2 is a magnified view of Graph 1) illustrate the speed of the parser. Graph 3 plots the number of phrases the parser returns for a sentence of a given length, which is a measure of how much coverage the grammar has and how much the parser accomplishes. Graph 4 plots the number of phrases the parser builds during an entire parse, a good measure of the work it performs.
Not surprisingly, there is a very smooth curve relating the number of phrases built and parse time. Graphs 5 and 6 are included to show the necessity of disambiguation and lowering, and indicate a substantial reduction in speed if either is absent. There is also a substantial reduction in accuracy. In the no-disambiguation case, the parser is passed all categories every word can take, in random order.</Paragraph> <Paragraph position="3"> [Figure caption fragment: "... subset of LOB. See text for explanations."]</Paragraph> <Paragraph position="4"> Parser accuracy is a difficult statistic to measure. We have carefully analyzed the parses assigned to many hundreds of LOB sentences, and are quite pleased with the results. Although there are many sentences where the parser is unable to build substantial structure, it rarely builds incorrect phrases. A pointed exception is the propensity for verbs to take too many arguments. To get a feel for the parser's accuracy, examine the Appendix, which contains unedited parses from the LOB.</Paragraph> </Section> </Paper>