<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2142"> <Title>Integrated Control of Chart Items for Error Repair</Title> <Section position="3" start_page="864" end_page="865" type="intro"> <SectionTitle> 2 Experimental Results </SectionTitle> <Paragraph position="0"> The test data included syntactic errors introduced by substitution of an unknown or known word, addition of an unknown or known word, deletion of a word, segmentation and punctuation problems, and semantic errors. Data sets we used are identified as: NED (a mix of errors from Novels, Electronic mail, and an (electronic) Diary); Applingl, and Peters2 (the Birkbeck data from Oxford Text Archive (Mitton, 1987)); and Thesprev.</Paragraph> <Paragraph position="1"> Thesprev was a scanned version of an anonymous humorous article titled &quot;Thesis Prevention: Advice to PhD Supervisors: The</Paragraph> <Section position="1" start_page="864" end_page="865" type="sub_section"> <SectionTitle> Siblings of Perpetual Prototyping&quot;. </SectionTitle> <Paragraph position="0"> In all, 258 ill-formed sentences were tested: 153 from the NED data, 13 from Thesprev, 74 from Applingl, and 18 from Peters2. The syntactic grammar covered 166 (64.3%) of the manually corrected versions of the 258 sentences. The average parsing time was 3.2 seconds. Syntactic processing produced on average 1.7 parse trees 2, of which 0.4 syntactic parse trees were filtered out by semantic processing. Semantic processing produced 9.3 concepts on average per S node, and 7.3 of them on average were ill-formed. So many were produced because CHAPTER generated a semantic concept whether it was semantically ill-formed or not, to assist with the repair of ill-formed sentences (Fass and Wilks, 1983).</Paragraph> <Paragraph position="1"> Across the 4 data sets, about one-third of the (manually-corrected) sentences were outside the coverage of the grammar and lexicon. The most common reasons were that the sentences included a conjunction (&quot;He places them face down so that they are a surprise&quot;), a phrasal verb CI called out to Fred and went inside&quot;), or a compound noun (&quot;P C complex sentences in NED were split into simple sentences to collect 13 more ill-formed sentences for testing.</Paragraph> <Paragraph position="2"> 2There are so few parse trees because of the use of subcategorisation and the augmented context-free grammar (the number of parse trees ranges from 1 to 7).</Paragraph> <Paragraph position="3"> Table 1 shows that 89.9% of these ill-formed sentences were repaired. Among these, CHAPTER ranked the correct repair first or second in 79.3% of cases (see 'best repair' column in Table 1). The ranking was based on penalty schemes at three levels: lexical, syntactic, and semantic. If the correct repair was ranked lower than second among the repairs suggested, then it is counted under 'other repairs' in Table 1. In the case of the NED data, the 'other repairs' include 11 cases of incorrect repairs introduced by: segmentation errors, apostrophe errors, semantic errors, and phrasal verbs. Thus for about 71% of all ill-formed sentences tested, the correct repair ranked first or second among the repairs suggested. For 19% of the sentences tested, incorrect repairs were ranked as the best repairs. A sentence was considered to be &quot;correctly repaired&quot; if any of the suggested corrections was the same as the one obtained by manual correction Table 2 shows further statistics on CHAPTER's performance. 
<Paragraph position="7"> CHAPTER took 18.8 seconds on average [3] to repair an ill-formed sentence and suggested an average of 6.4 repaired parse trees, of which an average of 3 were filtered out by semantic processing.</Paragraph>
<Paragraph position="8"> During semantic processing, an average of 40.3 semantic concepts were suggested for each S node. An average of 34.3 concepts per S node were classified as ill-formed. Twenty-seven percent of the 'best' parse trees suggested by CHAPTER's ranking strategy at the syntactic level were filtered out by semantic processing; the remaining 73% of the 'best' parse trees were judged semantically well-formed.</Paragraph>
<Paragraph position="9"> In the case of the NED data set, 90 ill-formed sentences were repaired. On average: recovery time per sentence was 23.9 seconds; 9.8 repaired S trees per sentence were produced; 4.5 of the 9.8 repaired S trees were semantically well-formed; 95.1 repaired concepts (ill-formed and well-formed) were produced; and 8.5 of the 95.1 repaired concepts were well-formed. Semantic processing filtered out the syntactically best repair for 22% of the repaired sentences. The number of repaired concepts for S is very large because semantic processing at present supports interpretation of only a single verbal (or verb phrasal) adjunct. For example, the template of the verb GO allows either a temporal or a destination adjunct at present and ignores any second or later adjunct. Thus a GO sentence would be interpreted using both [THING GO DEST] and [THING GO TIME].</Paragraph>
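To illustrate the single-adjunct template behaviour just described, the following is a small Python sketch. The template representation and function names are hypothetical; only the [THING GO DEST] and [THING GO TIME] interpretations come from the text.

```python
# Hypothetical sketch of single-adjunct template interpretation: the GO
# template accepts either a destination or a temporal adjunct, so a sentence
# yields one concept per matching template slot, and any second or later
# adjunct of the same type is ignored. Names are illustrative assumptions.
GO_TEMPLATES = [
    ("THING", "GO", "DEST"),  # e.g. "Fred went [to the shop]"
    ("THING", "GO", "TIME"),  # e.g. "Fred went [yesterday]"
]


def interpret_go(subject: str, adjuncts: dict[str, str]) -> list[tuple]:
    """Build one concept per matching template, using only the first
    adjunct of each supported type; extra adjuncts are dropped."""
    concepts = []
    for _thing_slot, verb, adjunct_slot in GO_TEMPLATES:
        if adjunct_slot in adjuncts:
            concepts.append((subject, verb, adjuncts[adjunct_slot]))
    return concepts


# "Fred went to the shop yesterday" produces two concepts, mirroring the
# [THING GO DEST] and [THING GO TIME] interpretations in the text:
print(interpret_go("Fred", {"DEST": "to the shop", "TIME": "yesterday"}))
# [('Fred', 'GO', 'to the shop'), ('Fred', 'GO', 'yesterday')]
```
</Section> </Paper>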