<?xml version="1.0" standalone="yes"?>
<Paper uid="C69-0701">
  <Title>Automatic error-correction in natural languages</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
FJVWMBPHIOQUEARLNXGSCKTDYZ
</SectionTitle>
    <Paragraph position="0"> Next, for lack of a proper dictionary, the general-content check procedure was used to compile lists of words occurring in selected stretches of English (parts of three articles on physics, linguistics and socio-politics, containing about 3,000 text words in all).</Paragraph>
    <Paragraph position="1"> This limit has been accepted.</Paragraph>
    <Paragraph position="2"> Several hundred distorted words (based on words in the same articles) were matched against these vocabularies. After all the corrections and adjustments whose need naturally arose during the tests had been made, the final results can be summarized as follows: (i) The retrievals were both exact and complete, in the sense that no misspelled words (within the proper error limit) were left unretrieved and no wrong retrievals were produced; (ii) The number of multiple equivalents increased rapidly as the lower limit of the number of letters (four) in a word was approached (in some cases up to five equivalents); (iii) The number of multiple equivalents was generally insignificant for 'content' words (in most cases only one word was retrieved), whereas 'function' words often produced many equivalents, e.g. WTHE → THEY, OTHER, THEN, THEM, THERE, THEIR. All these observations confirmed the results anticipated in previous sections.</Paragraph>
    <Paragraph position="3"> The latest stage of the experiment is being carried out at the time of writing this paper (May, 1969). The author is now able to use the English side of the Palantype-English dictionary (explained below, in Section 5) of about 80,000 entries. For the sake of economy in programming and machine time, only one section of the dictionary, namely the entries starting with the letter S (about I~% of the whole dictionary), is being used. The linearization and organization of this section are now in progress. This will enable the author to test a more complete dictionary look-up than before, together with the general-content check and later with syntactic analysis as well.</Paragraph>
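    <Paragraph position="4"> The retrieval test summarized in points (i) to (iii) above can be illustrated by a minimal sketch. It is not the elastic matching procedure of the earlier sections, which is not reproduced here: as an assumption, ordinary Levenshtein edit distance stands in for the error measure, the error limit is set to one, and the names `edit_distance`, `retrieve` and `SAMPLE_VOCABULARY` are hypothetical.

```python
MIN_LENGTH = 4  # lower limit on word length mentioned in the text

# Hypothetical miniature vocabulary, for illustration only.
SAMPLE_VOCABULARY = ["THEY", "OTHER", "THEN", "THEM", "THERE", "THEIR",
                     "PHYSICS", "LINGUISTICS"]


def edit_distance(a, b):
    """Levenshtein distance, used here as a stand-in error measure."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def retrieve(distorted, vocabulary, error_limit=1):
    """Return every vocabulary word within error_limit of the distorted word."""
    if MIN_LENGTH > len(distorted):
        return []  # words shorter than the lower limit are not attempted
    return [word for word in vocabulary
            if error_limit >= edit_distance(distorted, word)]


print(retrieve("THEX", SAMPLE_VOCABULARY))        # several short function words
print(retrieve("LINGUISTCS", SAMPLE_VOCABULARY))  # a single content word
```

Under these assumptions a short distorted function word retrieves several equivalents, whereas a long distorted content word retrieves only one, which is the pattern reported in points (ii) and (iii).</Paragraph>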
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5. Other applications
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1
</SectionTitle>
      <Paragraph position="0"> Apart from the general use for natural English texts, an application of the elastic matching technique has been proposed in the automatic transcription of machine-shorthand of the Palantype system.</Paragraph>
      <Paragraph position="1"> This system uses a special machine with a keyboard enabling the simultaneous striking of several keys, each 'stroke' corresponding to a phonetically-based group of consonants and vowels, roughly equivalent to a syllable. In normal operation all the characters of each stroke are printed together on a continuous paper band, shifting after each stroke. The recording is later read and transcribed by a human operator. Since the latter part of the operation is naturally much slower (taking about four times as long as the recording), a project now in progress at the NPL aims at securing automatic transcription, in which the character levers, in addition to the ordinary printing action, activate electric contacts. These create impulses, which are fed into a computer and result, after a series of operations, in printing out a text as near to ordinary English as possible. One of the problems encountered in this process is caused by the flexibility of the recording convention, which enables the human operator to record phonetic combinations in more than one way. Generally, this is provided for by inserting in the automatic Palantype-English dictionary all versions of each word that can reasonably be foreseen. In practice the unforeseen sometimes happens and the word is output untranslated (but 'transliterated' phonetically), which is at best annoying and may even be unreadable. An analysis has shown that most of the deviations from the standard versions stored in the dictionary are caused by a few convention rules, such as 'vowel elision': any unaccented vowel in a word can be omitted. Now, if the matching is done not on Palantype strokes but on their linearized versions, the elastic matching rules can easily be adjusted to include the versions produced.</Paragraph>
      <Paragraph position="2"> Incidentally, the Palantype sequence is already partly linearized, and reads: SCPTH + M~LYOEAUI . NLCF~RPT + SH (the &quot;+&quot; and &quot;.&quot; signs have special phonetic functions). For linearization purposes all that is needed is to exclude the repeated consonants (from the second &quot;N&quot; to the end); the number of lines will therefore exceed the number of 'strokes'.</Paragraph>
      <Paragraph position="3"> The relevant procedures have been fully tested on sample lists of standard and non-standard versions (containing up to 300 words) and were found satisfactory. The implementation for use with the full dictionary, however, remains to be done. It is still not clear whether it would pay to linearize the complete dictionary of eighty-odd thousand entries and store it in this form, or whether it would be more practical to linearize while checking, stroke by stroke, which would, of course, be a much slower procedure. At present it does not look likely that either solution would make standardization possible in 'real-time', but there remains the possibility of an 'errata' sheet being produced almost immediately after the normal output. More particulars about this application can be found in the paper [4].</Paragraph>
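      <Paragraph position="4"> The 'vowel elision' rule mentioned above (a vowel of the standard version may be omitted in the recording) can be illustrated by a minimal sketch. Several simplifications are assumptions made here and not part of the procedures tested: ordinary English spellings stand in for linearized Palantype strokes, every vowel is treated as omissible because no accent information is available, and the names `matches_with_vowel_elision` and `standard_versions` are hypothetical.

```python
VOWELS = set("AEIOU")


def matches_with_vowel_elision(recorded, standard):
    """True if `recorded` equals `standard` with zero or more vowels omitted."""
    i = 0  # position reached in the recorded version
    for ch in standard:
        if i != len(recorded) and recorded[i] == ch:
            i += 1        # letter recorded exactly as in the standard version
        elif ch not in VOWELS:
            return False  # a consonant is missing: not a vowel elision
        # otherwise a vowel of the standard version was omitted; skip it
    return i == len(recorded)  # no unexplained letters may remain


def standard_versions(recorded, dictionary):
    """Return every dictionary entry that the recorded version could stand for."""
    return [entry for entry in dictionary
            if matches_with_vowel_elision(recorded, entry)]


print(standard_versions("SPRT", ["SEPARATE", "SUPPORT", "SPORT"]))
```

A recorded version with many vowels omitted may correspond to more than one dictionary entry, so a rule of this kind can produce multiple equivalents in the same way as the matching of misspelled words.</Paragraph>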
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2
</SectionTitle>
      <Paragraph position="0"> Another application, now under consideration, is the retrieval of misspelled proper names from lists used in a fact-retrieval project, which is also in progress at the NPL.</Paragraph>
    </Section>
  </Section>
</Paper>