<?xml version="1.0" standalone="yes"?> <Paper uid="P93-1006"> <Title>USING BRACKETED PARSES TO EVALUATE A GRAMMAR CHECKING APPLICATION</Title>
<Section position="4" start_page="38" end_page="38" type="metho"> <SectionTitle> OVERVIEW OF THE APPLICATION </SectionTitle>
<Paragraph position="0"> The Boeing Simplified English Checker (a.k.a. the BSEC, cf. Hoard, Wojcik, and Holzhauser 1992) is a type of grammar and style checker, but it is more accurately described as a 'controlled English checker' (cf. Adriaens 1992). That is, it reports to users on where a text fails to comply with the aerospace standard for maintenance documentation known as Simplified English (AECMA 1989). If the system cannot produce a parse, it prints the message &quot;Can't do SE check.&quot; At present, the Checker achieves parses for about 90 percent of the input strings submitted to it. 2 The accuracy of the error critiques over that 90 percent varies, but our subjective experience suggests that most sentence reports contain critiques that are useful in that they flag some bona fide failure to comply with Simplified English.</Paragraph>
<Paragraph position="1"> 2. The 90 percent figure is based on random samplings taken from maintenance documents submitted to the BSEC over the past two years. This figure has remained relatively consistent for maintenance documentation, although it varies with other text domains.</Paragraph>
<Paragraph position="2"> The NLP methodology underlying the BSEC does not rely on the type of pattern-matching techniques used to flag errors in more conventional checkers. It cannot afford simply to ignore sentences that are too complex to handle. As a controlled sublanguage, Simplified English requires that every word conform to specified usage. That is, each word must be marked as 'allowed' in the lexicon, or it will trigger an error critique. Since the standard generally requires that words be used in only one part of speech, the BSEC produces a parse tree on which to judge vocabulary usage as well as other types of grammatical violations. 3</Paragraph>
<Paragraph position="3"> As one would expect, the BSEC often has to choose among quite a few alternative parse trees, sometimes even hundreds or thousands of them. Given its reliance on full-ambiguity parse forests and relatively little semantic analysis, we have been somewhat surprised that it works as well as it does.</Paragraph>
<Paragraph position="4"> We know of few grammar and style checkers that rely on grammatical analysis as complex as the BSEC's, but IBM's Critique is certainly one of the best known. In discussing the accuracy of Critique, Richardson and Braden-Harder (1993:86) define it as &quot;the actual 'under the covers' natural language processing involved, and the user's perception.&quot; In other words, there are really two levels on which to gauge accuracy--that of the internal parser and that of the reports generated. They add: &quot;Given the state of the art, we may consider it a blessing that it is possible for the latter to be somewhat better than the former.&quot; The BSEC, like Critique, appears to be smarter than it really is at guessing what the writer had in mind for a sentence structure. Most error critiques are not affected by incorrect phrasal attachment, although grossly incorrect parses lie behind most sentence reports that go sour. What we have not fully understood in the past is the extent to which parsing accuracy affects error critiques. What if we could eliminate all the bad parses?
Would that make our system more accurate by reducing incorrect critiques, or would it degrade performance by reducing the overall number of correct critiques reported? We knew that the system was capable of producing good error reports from relatively bad parses, but how many of those error reports even had a reasonably correct parse available to them?</Paragraph> </Section>
<Section position="5" start_page="38" end_page="39" type="metho"> <SectionTitle> Footnote 3 </SectionTitle>
<Paragraph position="0"> 3. The Simplified English (SE) standard allows some exceptions to the 'single part of speech' rule in its core vocabulary of about a thousand words. The BSEC currently does little to guarantee that writers have used a word in the 'Simplified English' meaning, only that they have selected the correct part of speech.</Paragraph> </Section>
<Section position="6" start_page="39" end_page="39" type="metho"> <SectionTitle> OVERVIEW OF SIMPLIFIED ENGLISH </SectionTitle>
<Paragraph position="0"> The SE standard consists of a set of grammar, style, format, and vocabulary restrictions, not all of which lend themselves to computational analysis. A computer program cannot yet support those aspects of the standard that require deep understanding, e.g. the stricture against using a word in any sense other than the approved one, or the requirement to begin paragraphs with the topic sentence. What a program can do is count the number of words in sentences and compound nouns, detect part-of-speech violations, and flag the omission of required words (such as articles) or the presence of banned words (such as auxiliary have and be). The overall function of such a program is to present the writer with an independent check on a fair range of Simplified English requirements. For further details on Simplified English and the BSEC, see Hoard et al. (1992) and Wojcik et al. (1990).</Paragraph>
<Paragraph position="1"> Although the BSEC detects a wide variety of Simplified English and general writing violations, only the error categories in Table 1 are relevant to this study:</Paragraph>
<Paragraph position="2"> Table 1.
POS: A known word is used in an incorrect part of speech.
NON-SE: An unapproved word is used.
MISSING ARTICLE: Articles must be used wherever possible in SE.
PASSIVE: Passives are usually illegal.
TWO-COMMAND: Commands may not be conjoined when they represent sequential activities. Simultaneous commands may be conjoined.
ING: Progressive participles may not be used in SE.
WARNING/CAUTION: Warnings and cautions must appear in a special format. Usually, an error arises when a declarative sentence has been used where an imperative one is required.</Paragraph>
<Paragraph position="3"> Except for illegal comma usage, which is rather uncommon, the above errors are among the most frequent types of errors detected by the BSEC.</Paragraph>
<Paragraph position="4"> To date, The Boeing Company is the only aerospace manufacturer to produce a program that detects such a wide range of Simplified English violations. In the past, Boeing and other companies have created checkers that report on all words that are potential violations of SE, but such 'word checkers' have no way of avoiding critiques for word usage that is correct. For example, if the word test is used legally as a noun, the word-checking program will still flag the word as a potential verb-usage error. The BSEC is the only Simplified English checker in existence that manages to avoid this. 4</Paragraph>
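<Paragraph position="5"> The contrast is easy to sketch. The following illustration is not the BSEC's implementation--the lexicon and function names are invented--but it shows why a parse is needed before a part-of-speech critique can be suppressed for legal uses:
;; Assume TEST is approved only as a noun in the SE lexicon.
(defparameter *allowed-pos* '((test noun) (close verb)))
;; A pure word checker must flag every occurrence of a restricted
;; word such as TEST, because without a parse it cannot tell a
;; legal noun use from an illegal verb use.
(defun word-checker-flags-p (word)
  (not (null (assoc word *allowed-pos*))))
;; A parse-based check flags WORD only when the fronted parse
;; assigns it a part of speech outside its approved list.
(defun parse-based-flags-p (word pos-in-parse)
  (let ((entry (assoc word *allowed-pos*)))
    (and entry (not (member pos-in-parse (rest entry))))))
;; (parse-based-flags-p 'test 'noun) => NIL ; legal use, no critique
;; (parse-based-flags-p 'test 'verb) => T   ; part-of-speech critique
</Paragraph>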
<Paragraph position="6"> 4. Oracle's recently released CoAuthor product, which is designed to be used with the Interleaf word processor, has the potential to produce grammatical analyses of sentences, but it only works as a Simplified English word checker at present.</Paragraph>
<Paragraph position="7"> As Richardson and Braden-Harder (p. 88) pointed out: &quot;We have found...that professionals seem much more forgiving of wrong critiques, as long as the time required to disregard them is minimal.&quot; In fact, the chief complaint of Boeing technical writers who use the BSEC is that it produces too many nuisance errors. So word-checking programs, while inexpensive and easy to produce, do not address the needs of Simplified English writers.</Paragraph> </Section>
<Section position="8" start_page="39" end_page="40" type="metho"> <SectionTitle> THE PARSER UNDERLYING THE BSEC </SectionTitle>
<Paragraph position="0"> The parser underlying the Checker (cf. Harrison 1988) is loosely based on GPSG. The grammar contains over 350 rules, and it has been implemented in Lucid Common Lisp running on Sun workstations. 5 Our approach to error critiquing differs from that used by Critique (Jensen, Heidorn, Miller, and Ravin 1993). Critique uses a two-pass approach that assigns an initial canonical parse in so-called 'Chomsky-normal' form. The second pass produces an altered tree that is annotated for style violations. No-parses cause the system to attempt a 'fitted parse' as a means of producing some information on more serious grammar violations. As mentioned earlier, the BSEC generates parse forests that represent all possible ambiguities vis-a-vis the grammar.</Paragraph>
<Paragraph position="1"> There is no 'canonical' parse, nor have we yet implemented a 'fitted parse' strategy to reclaim information available in no-parses. 6 Our problem has been the classic one of selecting the best parse from a number of alternatives. Before the SE Checker was implemented, Boeing's parser had been designed to arrive at a preferred or 'fronted' parse tree by weighting grammatical rules and word entries according to whether we deemed them more or less desirable. This strategy is quite similar to the one described in Heidorn 1993 and other works that he cites.</Paragraph>
<Paragraph position="2"> 6. The BSEC has the capability to report on potential word usage violations in no-parses, but the end-users seem to prefer not to use it. It is often difficult to say whether information will be viewed as help or as clutter in error reports.</Paragraph>
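<Paragraph position="3"> The weighting scheme can be sketched as follows; the data structures are invented for illustration, and the real weights live in the BSEC's grammar and lexicon. Each tree is scored by summing the weights of the rules and lexical readings it contains, and the lowest-weight tree in the forest is fronted:
(defparameter *weights* (make-hash-table))
(defun tree-weight (tree)
  ;; Sum the weight of this node's rule or lexical reading, plus
  ;; the weights of all of its subtrees.  Unlisted entries count 0.
  (if (atom tree)
      (gethash tree *weights* 0)
      (+ (gethash (first tree) *weights* 0)
         (reduce #'+ (rest tree) :key #'tree-weight))))
(defun front-parse (forest)
  ;; Pick the lowest-weight tree from a nonempty parse forest.
  (reduce (lambda (a b) (if (> (tree-weight a) (tree-weight b)) b a))
          forest))
;; Preferring an adjectival reading over a verb reading, as with
;; 'closed' in the example below:
;; (setf (gethash 'closed-adj *weights*) -1)
;; (setf (gethash 'closed-verb *weights*) 1)
</Paragraph>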
<Paragraph position="4"> In the maintenance manual domain, we simply observed the behavior of the BSEC over many sentences and adjusted the weights of rules and words as needed.</Paragraph>
<Paragraph position="5"> To get a better idea of how our approach to fronting works, consider the ambiguity in the following two sentences:
(1) The door was closed.
(2) The damage was repaired.</Paragraph>
<Paragraph position="6"> In the Simplified English domain, it is more likely that (2) will be an example of passive usage, thus calling for an error report. To parse (1) as a passive would likely be incorrect in most cases. We therefore assigned the adjective reading of closed a low weight in order to prefer an adjectival over a verb reading. Sentence (2) reports a likely event rather than a state, and we therefore weight repaired to be preferred as a passive verb. Although this method for selecting fronted parse trees sometimes leads to false error critiques, it works well for most cases in our domain.</Paragraph> </Section>
<Section position="9" start_page="40" end_page="41" type="metho"> <SectionTitle> BRACKETED INPUT STRINGS </SectionTitle>
<Paragraph position="0"> In order to coerce our system into accepting only the desired parse tree, we modified it to accept only parses that satisfied bracketed forms. For example, the following sentence produces five separate parses because our grammar attaches prepositional phrases to preceding noun phrases and verb phrases in several ways. The structural ambiguity corresponds to five different interpretations, depending on whether the boy uses a telescope, the hill has a telescope on it, the girl on the hill has a telescope, and so on.
(3) The boy saw the girl on the hill with a telescope.</Paragraph>
<Paragraph position="1"> We created a lisp operation called spe, for &quot;string, parse, and evaluate,&quot; which takes an input string and a template. It returns all possible parse trees that fit the template. Here is an example of an spe form for (3):
(SPE &quot;The boy saw the girl on the hill with a telescope&quot;
     (S (NP The boy)
        (VP saw
            (NP the girl
                (PP on (NP the hill (PP with (NP a telescope))))))))</Paragraph>
<Paragraph position="2"> The above bracketing restricts the parses to just the parse tree that corresponds to the sense in which the boy saw the girl who is identified as being on the hill that has a telescope. If run through the BSEC, this tree will produce an error message that is identical to the unbracketed report--viz. that boy, girl, hill, and telescope are NON-SE words. In this case, it does not matter which tree is fronted. As with many sentences checked, the inherent ambiguity in the input string does not affect the error critique. Recall that some types of ambiguity do affect the error reports--e.g., passive vs. adjectival participial forms.</Paragraph>
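<Paragraph position="3"> The template check itself can be sketched as follows. This is an illustration rather than the actual spe code--spe parses the input string itself, whereas this version assumes the parse forest has already been computed--and it treats trees and templates as nested lists whose first element is a category symbol:
(defun yield (tree)
  ;; The list of words under TREE, left to right.
  (if (atom tree) (list tree) (mapcan #'yield (rest tree))))
(defun satisfies-template-p (tree template)
  ;; Categories must agree at each template node.  Where the template
  ;; bottoms out in plain words, only the words under the tree node
  ;; must match, so deeper structure can be left unconstrained.
  (cond ((atom template) (equal (yield tree) (list template)))
        ((every #'atom (rest template))
         (and (consp tree)
              (eq (first tree) (first template))
              (equal (yield tree) (rest template))))
        (t (and (consp tree)
                (eq (first tree) (first template))
                (= (length tree) (length template))
                (every #'satisfies-template-p (rest tree) (rest template))))))
(defun filter-by-template (forest template)
  ;; Keep only the parses whose bracketing is compatible with TEMPLATE.
  (remove-if-not (lambda (tree) (satisfies-template-p tree template))
                 forest))
</Paragraph>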
<Paragraph position="4"> Here is how the spe operation was used to disambiguate a sentence from our data:
(SPE &quot;Cracks in the impeller blades are not permitted&quot;
     (S (NP Cracks in the impeller blades)
        (VP are not (A permitted))))</Paragraph>
<Paragraph position="5"> We judged the word permitted to have roughly the same meaning as stative 'permissible' here, and that led us to coerce an adjectival reading in the bracketed input. If the unbracketed input had resulted in the verb reading, then it would have flagged the sentence as an illegal passive. It turned out that the BSEC selected the adjective reading in the unbracketed sentence, and there was no difference between the bracketed and unbracketed error critiques in this instance.</Paragraph> </Section>
<Section position="10" start_page="41" end_page="41" type="metho"> <SectionTitle> METHODOLOGY </SectionTitle>
<Paragraph position="0"> We followed this procedure in gathering and analyzing our data: First, we collected a set of data from nightly BSEC batch runs extending over a three-month period from August through October 1991. The data set consisted of approximately 20,000 sentences from 183 documents. Not all of the documents were intended to be in Simplified English when they were originally written. We wrote a shell program to extract a percentage-stratified sample from this data. After extracting a test set, we ended up culling the data for duplicates, tables, and other spurious data that had made it past our initial filter. 7 We ended up with 297 sentences in our data set.</Paragraph>
<Paragraph position="1"> We submitted the 297 sentences to the current system and obtained an error report, which we call the unbracketed report. We then created spe forms for each sentence. By observing the parse trees with our graphical interface, we verified that the parse tree we wanted was the one produced by the spe operation. For 49 sentences, our system could not produce the desired tree. We ran the current system, using the bracketed sentences, to produce the unmodified bracketed report. Next we examined the 24 sentences which did not have parses satisfying their bracketings but did, nevertheless, have parses in the unbracketed report. We added the lexical information and new grammar rules needed to enable the system to parse these sentences. Running the resulting system produced the modified bracketed report. These new parses produced critiques that we used to evaluate the critiques previously produced from the unbracketed corpus. The comparison of the unbracketed report and the modified bracketed report produced the estimates of Precision and Recall for this sample.</Paragraph>
<Paragraph position="2"> 7. The BSEC filters out tables and certain other types of input, but the success rate varies with the type of text.</Paragraph>
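<Paragraph position="3"> The evaluation arithmetic can be sketched as follows; this is an illustration rather than the original tooling, and it assumes each critique has been reduced to a comparable item such as a (sentence-id error-type) pair, with the modified bracketed report serving as the reference set:
(defun precision-recall (test reference)
  ;; Precision: the fraction of TEST critiques confirmed by REFERENCE.
  ;; Recall: the fraction of REFERENCE critiques recovered in TEST.
  (let ((correct (length (intersection test reference :test #'equal))))
    (values (if (plusp (length test)) (/ correct (length test)) 0)
            (if (plusp (length reference)) (/ correct (length reference)) 0))))
;; (precision-recall unbracketed-critiques modified-bracketed-critiques)
</Paragraph> </Section> </Paper>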