<?xml version="1.0" standalone="yes"?> <Paper uid="A00-1029"> <Title>A Tool for Automated Revision of Grammars for NLP Systems</Title> <Section position="2" start_page="0" end_page="210" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Natural language processing systems often constrain the set of &quot;utterances&quot; from a user (spoken, typed in, etc.) to narrow down the possible syntactic and semantic resolutions of the utterance and to reduce the number of misrecognitions and/or misunderstandings by the system. Such constraints on the allowed syntax and the inferred semantics are often expressed in the form of a &quot;grammar&quot; (see Footnote 1), a set of rules specifying the set of allowed utterances and possibly also specifying the semantics associated with these utterances. For instance, grammars are commonly used in speech understanding systems both to specify the set of allowed sentences and to specify &quot;tags&quot; that extract semantic entities (e.g. the &quot;amount&quot; of money). Footnote 1: Throughout this document, by using the word &quot;grammar&quot;, we refer to a Context-Free Grammar that consists of a finite set of non-terminals, a finite set of terminals, a unique non-terminal called the start symbol, and a set of production rules of the form A -&gt; a, where A is a non-terminal and a is a string of terminal or non-terminal symbols. The &quot;language&quot; accepted by a grammar is the set of all terminal strings that can be generated from the start symbol by successive application of the production rules. The grammar may optionally have semantic interpretation rules associated with each production rule (e.g. see (Allen 95)).</Paragraph> <Paragraph position="1"> Constraining the number of sentences accepted by a grammar is essential for reducing misinterpretations of user queries by an NLP system. For instance, for speech understanding systems, if the grammar accepts a large number of sentences, then the likelihood of recognizing uttered sentences as random, irrelevant, or undesirable sentences is increased. For transaction processing systems, misrecognized words can lead to unintended transactions being processed. 
An effective constraining grammar can reduce transactional errors by limiting the number of sentence-level errors. The problem of over-generalization of speech grammars and related issues is discussed in detail by Seneff (1992). Thus, speech grammars must often balance the conflicting requirements of (i) accepting a wide variety of sentences to increase flexibility, and (ii) accepting a small number of sentences to increase system accuracy and robustness.</Paragraph> <Paragraph position="2"> Developing tight grammars that trade off these conflicting constraints is a tedious and difficult process.</Paragraph> <Paragraph position="3"> Typically, grammars overgeneralize and accept too many sentences that are irrelevant or undesirable for a given application. We call such sentences &quot;counter-examples&quot;. The problem is usually handled by revising the grammar manually to disallow such counter-examples. For instance, the sentence &quot;give me my last eighteen transactions&quot; may need to be excluded from a grammar for a speech understanding system, since the words &quot;eighteen&quot; and &quot;ATM&quot; are easily confused by the speech recognizer. However, &quot;five&quot; and &quot;ten&quot; should remain as possible modifiers of &quot;transactions&quot;. Counter-examples can also be sets of sentences that need to be excluded from a grammar (specified by allowing the inclusion of non-terminals in counter-examples). 
For example, for a banking application that disallows money transfers to online accounts, we might wish to exclude the set of sentences &quot;transfer &lt;AMOUNT&gt; dollars to my online account&quot; from the grammar, where &lt;AMOUNT&gt; is a non-terminal in the grammar that maps to all possible ways of specifying amounts.</Paragraph> <Paragraph position="4"> In this paper, we propose techniques for automatically revising grammars using counter-examples. The grammar developer identifies counter-examples from among sentences (or sets of sentences) mis-recognized by the speech recognizer or from sentences randomly generated by a sentence generator using the original grammar. The grammar reviser then modifies the original grammar to invalidate the counter-examples. The revised grammar can be fed back to the grammar reviser, and the whole process can be iterated several times until the resulting grammar is deemed satisfactory.</Paragraph> <Paragraph position="5"> In the next sections, we first describe our algorithm for revising grammars to disallow counter-examples. We also discuss algorithms to make the revised grammar compact using minimum description length (MDL) based grammar compaction techniques, as well as extensions to our basic algorithm to handle grammars with recursion. We then present some results of applying our grammar reviser tool to constrain speech grammars of speech understanding systems. Finally, we present an approach for revising attribute-value grammars using our technique and present our conclusions.</Paragraph> </Section> </Paper>