<?xml version="1.0" standalone="yes"?>
<Paper uid="C82-1007">
  <Title>A METRIC SPACE DEFINED ON ENGLISH AND ITS RELATION TO ERROR CORRECTION</Title>
  <Section position="1" start_page="0" end_page="43" type="metho">
    <SectionTitle>
A METRIC SPACE DEFINED ON ENGLISH
AND ITS RELATION TO ERROR CORRECTION
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="43" type="sub_section">
      <SectionTitle>
Canada
</SectionTitle>
      <Paragraph position="0"> A distance function is proposed that maps pairs of strings to the real numbers. It has been shown that given suitable constraints the function is a metric over the free monoid generated from a set of grammatical symbols. The necessary constraints modify the metric so that it maps pairs of strings to a lattice of real numbers. Thus for each string the metric defines a countable set of nested neighbourhoods. This aspect of the space has proved useful for the correction of certain kinds of grammatical errors that occur in English sentences. An English parser was written that used the metric to propose corrections to a variety of ungrammatical sentences.</Paragraph>
      <Paragraph position="1"> Experience with the program suggests that in many cases the intuitive notion of grammatical similarity corresponds closely to the mathematical definition of nearest neighbour in the space.</Paragraph>
      <Paragraph position="2"> I. Introduction Consider a string of grammatical symbols which has been produced by lexical analysis. Each symbol in the string corresponds to a word in the original sentence. The string will be analysed by a parser which compares the sequence of symbols to sequences specified by some grammar, G. If the comparison succeeds then the original sentence is accepted as grammatical. Otherwise, it is rejected and error correction is required.</Paragraph>
      <Paragraph position="3"> Definition: Given a grammar G and a string S composed of grammatical symbols from some alphabet A then S is ungrammatical if it is not contained in L(G), the language generated by G.</Paragraph>
      <Paragraph position="4"> Ungrammatical in this sense refers to any sentence that was not anticipated by the grammar. In many systems it is possible for a user to produce a proper English sentence within the appropriate domain of discourse and still have the sentence rejected by the parser. This is usually attributed to "holes in the grammar." This paper will describe a technique for correcting ungrammatical input. The class of errors treated includes both genuine grammatical errors and those resulting from "holes." One of the assumptions tested by this work is that a significant class of errors can be resolved by examination of syntactic structure alone.</Paragraph>
      <Paragraph position="5">  An ungrammatical sentence is viewed as a grammatical sentence that has been transformed by one or more error operations.</Paragraph>
      <Paragraph position="6"> Definition: An error operation involves either (a) an insertion of a word, (b) a deletion of a word, or (c) an alteration of the word sequence.</Paragraph>
      <Paragraph position="7"> In general, the damage done by a single error operation is local and does not significantly alter the global structure.</Paragraph>
      <Paragraph position="8"> Thus a comparison of the respective structures of the two sentences is used as the basis for a measure of their similarity. This approach is based on earlier work by Fischer and Wagner. The error correction strategy rests on a measure which expresses structural similarity as a numerical distance. If the parser's analysis of a given sentence fails then a search is made for its nearest grammatical neighbour. As various alternatives are found they are presented to the user. The user may elect to continue the search, accept the corrector's proposal or abandon the search and rephrase the input.</Paragraph>
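      <Paragraph> The nearest-neighbour search described above can be illustrated with a minimal sketch. It assumes a unit-cost edit distance computed by the familiar dynamic-programming recurrence, which is a stand-in and not the metric defined in this paper, and it assumes that a small finite list of candidate strings stands in for L(G). The names edit_distance and nearest_neighbours and the symbols det, adj, noun and verb are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Sequence, Tuple


def edit_distance(source: Sequence[str], target: Sequence[str]) -> int:
    """Unit-cost edit distance between two sequences of grammatical symbols.

    Assumption: insertion, deletion and substitution each cost 1. This is a
    stand-in measure, not the metric proposed in the paper.
    """
    # previous[j] is the distance between the source prefix processed so far
    # and the first j symbols of target.
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]
        for j, t in enumerate(target, start=1):
            substitute = previous[j - 1] + (0 if s == t else 1)
            delete = previous[j] + 1
            insert = current[j - 1] + 1
            current.append(min(substitute, delete, insert))
        previous = current
    return previous[-1]


def nearest_neighbours(
    observed: Sequence[str], candidates: Iterable[Sequence[str]]
) -> List[Tuple[int, Tuple[str, ...]]]:
    """Rank candidate grammatical strings by their distance from the input."""
    return sorted((edit_distance(observed, c), tuple(c)) for c in candidates)


# Hypothetical candidates standing in for strings of L(G).
CANDIDATES = [
    ("det", "noun", "verb"),
    ("det", "adj", "noun", "verb"),
    ("noun", "verb", "det", "noun"),
]

if __name__ == "__main__":
    # Closest proposals are listed first, as they would be shown to the user.
    for distance, proposal in nearest_neighbours(
        ("det", "noun", "noun", "verb"), CANDIDATES
    ):
        print(distance, " ".join(proposal))
```

In this sketch the ranked list plays the role of the successive proposals offered to the user, who could accept the first one or ask for the next. A real corrector would search the grammar itself, since L(G) is generally infinite and cannot be enumerated.</Paragraph>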
      <Paragraph position="9"> The class of errors that can be corrected by a measure of structural similarity are those related to word arrangement. Word arrangement is described by an augmented transition network in which the conditions on the arcs are totally relaxed. Such a net is called a recursive transition network and it defines a context free language. Thus the class of errors treated by this technique are called context free errors.</Paragraph>
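      <Paragraph> To make the notion of a recursive transition network concrete, the following minimal sketch recognises strings of grammatical symbols with a toy network. Everything in it is an assumption made for illustration: the networks S, NP and VP, the symbols det, adj, noun and verb, and the function names traverse and is_grammatical do not come from the paper. An arc labelled with a network name is followed by recursively traversing that network, and any other label must match the next input symbol. No conditions are attached to the arcs.

```python
from typing import Dict, List, Sequence, Set, Tuple

Arc = Tuple[str, int]  # (label, next state)

# Hypothetical toy networks: name -> (arcs leaving each state, final states).
# State 0 is the start state of every network.
NETWORKS: Dict[str, Tuple[Dict[int, List[Arc]], Set[int]]] = {
    "S": ({0: [("NP", 1)], 1: [("VP", 2)]}, {2}),
    "NP": ({0: [("det", 1), ("noun", 2)], 1: [("adj", 1), ("noun", 2)]}, {2}),
    "VP": ({0: [("verb", 1)], 1: [("NP", 2)]}, {1, 2}),
}


def traverse(name: str, symbols: Sequence[str], start: int) -> Set[int]:
    """Return every input position at which network `name` can finish.

    Assumption: the networks contain no left recursion, otherwise this
    naive recursive traversal would not terminate.
    """
    arcs, finals = NETWORKS[name]
    ends: Set[int] = set()
    seen: Set[Tuple[int, int]] = set()
    agenda: List[Tuple[int, int]] = [(0, start)]
    while agenda:
        state, pos = agenda.pop()
        if (state, pos) in seen:
            continue
        seen.add((state, pos))
        if state in finals:
            ends.add(pos)
        for label, next_state in arcs.get(state, []):
            if label in NETWORKS:
                # Recursive arc: traverse the named sub-network from here.
                for end in traverse(label, symbols, pos):
                    agenda.append((next_state, end))
            elif len(symbols) > pos and symbols[pos] == label:
                # Terminal arc: consume one matching input symbol.
                agenda.append((next_state, pos + 1))
    return ends


def is_grammatical(symbols: Sequence[str]) -> bool:
    """A string is accepted if network S can consume all of it."""
    return len(symbols) in traverse("S", symbols, 0)


if __name__ == "__main__":
    print(is_grammatical(("det", "adj", "noun", "verb", "noun")))
    print(is_grammatical(("det", "verb")))
```

A string rejected by is_grammatical is ungrammatical in the sense defined earlier, that is, it is not contained in the language of the toy grammar, and it would be handed to the error corrector.</Paragraph>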
    </Section>
  </Section>
</Paper>