<?xml version="1.0" standalone="yes"?> <Paper uid="C90-2063"> <Title>Parsing for Grammar and Style Checking</Title> <Section position="1" start_page="0" end_page="367" type="abstr"> <SectionTitle> 1. Abstract </SectionTitle> <Paragraph position="0"> The following paper describes some basic problems which have to be tackled if a morphosyntactic parser is to be configured in a grammm&quot; and style checking environment.</Paragraph> <Paragraph position="1"> Whereas grammar checking has to deal with ill-formed input which by definition is outside the scope of a grammar, style checking has problems in grammar coverage and intentionality of style.</Paragraph> <Paragraph position="2"> To overcome these problems, a method is presented based on the METAL grammar formalism which allows for fallback rules, levelling and scoring mechanisms, and other features which can be used. It will be de~ scribed what kinds of information and processing are needed to implement such checkers.</Paragraph> <Paragraph position="3"> Finally, some examples are given which illustrate the mode of operation of the method described.</Paragraph> <Paragraph position="4"> 2. Tile problem domain There is a fundamental difference between grammar and style checking: Grammar checking tries to find ill-formedness which by definition is considered to be a mistake and MUST be corrected; style checking has to do with well-formed but somehow marked text. As a result, style checking has to be much more &quot;liberal&quot; as it has to do with &quot;deviations&quot; which might have been intended by the author, but CAN be corrected. This results in two different sets of requirements for a pars~ el'.</Paragraph> <Paragraph position="5"> Concerning a grammar checker, its task is outside the scope of a grammar by definition: A grammar tries to describe (only and exactly) the grammatical structures of a language. Ew ery ungrammatical sentence should cause a parse failure.</Paragraph> <Paragraph position="6"> Moreover, to detect a grammar error, the parser has to successfully pm'se a given sentence. In order to parse it, however, information must be used which could have been violated. E.g. in (1) (example from German), agreement is the only way to decide which NP is subject (namely the second) and which is object; (2) is ambiguous as both NPs are plural: 1. Die Tiger t6tet der Mann (the tigers kills the man) 2. Die Tiger t6ten die M~inner (the tigers kill the men) If agreement is violated it is hard to find out what the subject should be; and therefore it is hard to detect that agreement is violated. ~Iqae &quot;circulus vitiosus&quot; is that the parser should detect errors the correct interpretation of which is needed to obtain an overall parse on the basis of which the error can be detected. There is an additional problem with grammar checking: If the grammar becomes more corn- null plex, several competing parses for a given sentence might be found. Diagnosis then depends on what parse has been chosen. The application (checking of larger texts) does not allow for asking the user which interpretation to pick; the parser has to find the &quot;best path&quot; and interpret it. This might lead to the result that sentences are flagged which are correct (from the user's point of view) but did not result in the &quot;best path&quot; parse. E.g. 
<Paragraph position="7"> Style checking has a different set of problems to solve. First, it has to be found out what &quot;style&quot; is, i.e. what has to be checked. The present paper will not contribute to this debate; we take as input guidelines which are used in the process of technical writing and in the production of technical documents (cf. Schmitt 89).</Paragraph> <Paragraph position="8"> These guidelines have to be &quot;translated&quot; into an operational form; e.g. what should be checked if the user is asked not to write &quot;too complex&quot; sentences? In Section 4 below, some examples are given of phenomena which should be marked.</Paragraph> <Paragraph position="9"> As style amounts to producing non-standard structures (i.e. structures which are not covered by standard grammars), we need a powerful parser and a grammar with large coverage to interpret style phenomena; i.e. the linguistic structures which have to be interpreted for style phenomena can and will be very complex.</Paragraph> <Paragraph position="10"> Also, the risk of parse failure will increase, and we need a kind of &quot;post mortem&quot; diagnosis for cases which could not be handled. We need a parser which allows for that.</Paragraph> <Paragraph position="11"> As far as diagnosis is concerned, the checker should be cautious and formulate questions rather than correct things, as a stylistic variant could be intended by the text author. It also should not mark too many things; e.g. if the rule is &quot;avoid passives&quot;, it should certainly not flag every passive sentence. I.e. the diagnostics require practical tuning to be really useful.</Paragraph> <Paragraph position="12"> 3. Properties of a parser for style and grammar checking purposes</Paragraph> <Section position="1" start_page="365" end_page="366" type="sub_section"> <SectionTitle> 3.1 Grammar checking </SectionTitle> <Paragraph position="0"> A parser for grammar checking should have the following features: It should allow for the analysis of parse failures. Compared to an ATN (cf. Weischedel 1982), where a failure ends in the starting state, a chart keeps all the intermediate results and is well suited for diagnostics. However, diagnostics need specific information: The diagnosis must know &quot;what to look for&quot; (e.g. wrong agreement, wrong punctuation, etc.). It will therefore cover only a part of the potential grammar errors.</Paragraph> <Paragraph position="1"> Such a &quot;two step approach&quot; has been implemented in the CRITIQUE system (cf. Ravin 1988), where a parse failure is looked at more closely. However, one could think of special &quot;fallback rules&quot; which implement these diagnostics already in the grammar. This means enlarging the coverage of the grammar to include explicitly ungrammatical structures, which during parsing can be marked as ungrammatical.</Paragraph> <Paragraph position="2"> This would be just a different way of representing the diagnosis knowledge, but it would be computationally more effective as it could be integrated into the parse itself, leading to a &quot;one step approach&quot;.</Paragraph>
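The following is a hedged sketch, written in Python rather than in the METAL rule formalism, of how such fallback rules could be integrated into the grammar: a regular rule requires agreement, a fallback variant of the same rule accepts the structure anyway but flags the node it builds, and the fallback set is only tried when the regular set yields nothing (the control of such rule sets is discussed in the next paragraph). The rule names, feature representations, and control loop are all invented for illustration.

```python
# Minimal sketch of grammar rules with fallback variants (illustrative only;
# METAL's actual rule formalism is different). A regular NP -> Det + N rule
# requires agreement; a fallback rule accepts the same structure without
# agreement but marks the resulting node as ill-formed.

def np_rule_strict(det, noun):
    if det["num"] == noun["num"] and det["gender"] == noun["gender"]:
        return {"cat": "NP", "num": noun["num"], "flags": []}
    return None                     # rule does not fire

def np_rule_fallback(det, noun):
    # fires on the same configuration, but records the violation
    flags = []
    if det["num"] != noun["num"]:
        flags.append("NP-AGREEMENT:number")
    if det["gender"] != noun["gender"]:
        flags.append("NP-AGREEMENT:gender")
    return {"cat": "NP", "num": noun["num"], "flags": flags}

RULE_SETS = [                       # regular rules are tried first, fallbacks later
    [np_rule_strict],
    [np_rule_fallback],
]

def build_np(det, noun):
    for rule_set in RULE_SETS:
        results = [r(det, noun) for r in rule_set]
        results = [r for r in results if r is not None]
        if results:                 # stop as soon as some set produced an analysis
            return results[0]
    return None

node = build_np({"num": "sg", "gender": "f"}, {"num": "pl", "gender": "f"})
print(node["flags"])                # ['NP-AGREEMENT:number']
```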
<Paragraph position="3"> In this approach, we do not want the fallback rules to fire unless all other rules have failed; i.e. we have to avoid a situation where rules which build grammatical structures are not selected, while rules which are meant as fallback rules fire in &quot;regular&quot; parses. Therefore we must be able to build SETs of rules which can be controlled by the grammar writer. We can then fire the sets which build grammatical structures first, and the fallback rules later on. Then we only need to mark the nodes built by the fallback rules with a flag indicating that a fallback rule was used (and of what kind it was).</Paragraph> <Paragraph position="4"> Moreover, as only a &quot;best first&quot; strategy can be applied in this application area, we must be able to tune the parser in such a way that the most plausible reading comes out first. This can be done by a proper scoring mechanism which should be accessible to the grammar writer. This cannot always prevent the &quot;intended&quot; parse from differing from the &quot;best&quot; one, but it at least makes the parsing process more stable and independent of system-internal determinants (like rule order, parsing strategy, etc.).</Paragraph> <Paragraph position="5"> Finally, we must be able to correct the detected error by local operations. These operations consist in changing, adding, or deleting feature-value pairs or nodes, etc. The alternatives here are: overwrite the respective piece of information with the correct one and re-generate the whole morphosyntactic surface structure, or exchange just a partial structure. This will depend on the kind of error detected.</Paragraph> <Paragraph position="6"> 3.2 Style checking Instead of discussing what style might be, we concentrate on &quot;bad style&quot; phenomena mentioned in texts on technical writing (cf. Schmitt 89). Examples of bad style are: ... etc. (these are, of course, language-specific).</Paragraph> <Paragraph position="7"> These criteria have to be reformulated in formal terms of linguistic descriptions; e.g. the complexity of sentences could be defined by: ... (e.g. subclauses) etc.</Paragraph> <Paragraph position="8"> These formal specifications then have to be used in the diagnosis part.</Paragraph> <Paragraph position="9"> Here again we have the choice between a &quot;two step&quot; approach which first parses and then does diagnostics, or a &quot;one step&quot; approach which does everything during parsing. We could do diagnosis on partial structures and mark the nodes which have been built. If these nodes are used by the parser to build higher non-terminal nodes, the flags are valid; if the nodes are rejected by the parser, they are just ignored.</Paragraph> <Paragraph position="10"> As using bad style does not lead to ungrammatical sentences, we should not need additional grammar rules for style checking. What we do need is a set of flags which are attached to the nodes in question as soon as some diagnosis succeeds. This could be an additional feature set which sits on top of the features used in the regular grammar. It is used to INTERPRET the rules which have fired according to stylistic criteria.</Paragraph> <Paragraph position="11"> These features have to be kept local to allow for error localization: If the user is told &quot;too complex word&quot;, then the system should be able to localize this word in the tree. On the other hand, we also need some global information which relates either to the sentence as a whole or even to the whole text. (If we want, we can even compute overall stylistic scores out of them as soon as we know what that means.)</Paragraph>
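As a rough, hypothetical sketch of this bookkeeping (the class and function names are invented and do not correspond to METAL's feature operators): local style flags are stored on the nodes they apply to, so the flagged word or phrase can be localized in the tree, while sentence-level counts are collected separately and could later feed an overall score.

```python
# Illustrative sketch only: local style flags on tree nodes plus global,
# sentence-level statistics collected from them.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    cat: str
    span: tuple                              # (start, end) token positions for localization
    children: List["Node"] = field(default_factory=list)
    style_flags: List[str] = field(default_factory=list)

def flag_long_words(node: Node, tokens: List[str], max_len: int = 20) -> None:
    """Attach a local flag to every terminal node covering an overlong word."""
    if not node.children:
        start, end = node.span
        if end - start == 1 and len(tokens[start]) > max_len:
            node.style_flags.append("TOO-COMPLEX-WORD")
    for child in node.children:
        flag_long_words(child, tokens, max_len)

def sentence_statistics(root: Node) -> dict:
    """Collect global, sentence-level information from the local flags."""
    stats = {"flags": 0, "subclauses": 0}
    def walk(n: Node) -> None:
        stats["flags"] += len(n.style_flags)
        if n.cat == "SUBCLAUSE":
            stats["subclauses"] += 1
        for c in n.children:
            walk(c)
    walk(root)
    return stats

# Usage with a made-up mini-tree for "Der ... funktioniert":
tokens = ["Der", "Hochgeschwindigkeitsdatenübertragungsstandard", "funktioniert"]
word = Node("N", (1, 2))
root = Node("S", (0, 3), children=[Node("NP", (0, 2), children=[Node("DET", (0, 1)), word])])
flag_long_words(root, tokens)
print(word.style_flags)                      # ['TOO-COMPLEX-WORD'] -> localizable in the tree
print(sentence_statistics(root))             # {'flags': 1, 'subclauses': 0}
```

Because the flags stay on the individual nodes, any sentence- or text-level score computed from the statistics does not lose the ability to point back at the offending word or phrase.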
<Paragraph position="12"> These features should also be easy to add to or remove from the grammar, i.e. they should be kept as an independent module which is simply not added if the grammar is used for other purposes. Therefore, we need flexible feature maintenance facilities.</Paragraph> </Section> <Section position="2" start_page="366" end_page="367" type="sub_section"> <SectionTitle> 3.3 The METAL grammar as basic tool </SectionTitle> <Paragraph position="0"> Although originally developed for machine translation, the METAL system can fulfill all the requirements mentioned above:
* it is language independent, i.e. it has a common software kernel which interprets the different language knowledge sources. It also takes care of problems like the separation of text and layout information in a given text, the treatment of editor-specific information, etc.</Paragraph> <Paragraph position="1">
* it uses an active chart as control structure and already does some parse failure diagnosis (for MT purposes); it stores those tests which did not succeed and prevented a rule from firing, to enable later diagnosis
* it has large grammars and lexica for several languages, so considerable coverage is available. Also, some fallback rules already exist. Moreover, the rule structure is such that the analysis parts can easily be separated from the translation parts and enriched by components for other purposes (like grammar and style checking) (cf. Thurmair 1990)
* it has a special levelling and preferencing mechanism which allows rules to be grouped into levels, and these levels can be used together with explicit scores for good or bad partial parses to control the overall behavior of the parser according to linguistic needs
* it treats nodes as complex bundles of features and values, and it allows for easy feature manipulation (e.g. percolating, unifying, adding, etc.) using a set of grammar operators
* it allows not only for simple tests (e.g. the presence of a feature) but also for complex tests, e.g. on structural descriptions of tree structures
* it has to be modified, however, by adding a component which at the end of a parse collects the grammatical and stylistic flags and evaluates them if necessary</Paragraph> </Section> </Section> </Paper>