<?xml version="1.0" standalone="yes"?>
<Paper uid="J98-4003">
  <Title>Machine Transliteration</Title>
  <Section position="8" start_page="609" end_page="611" type="concl">
    <SectionTitle>
6. Discussion
</SectionTitle>
    <Paragraph position="0"> In a 1947 memorandum, Weaver (1955) wrote: &quot;One naturally wonders if the problem of translation could conceivably be treated as a problem of cryptography. When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.'&quot; (p. 18) Whether this is a useful perspective for machine translation is debatable (Brown et al. 1993; Knoblock 1996); however, it is a dead-on description of transliteration. Most katakana phrases really are English, ready to be decoded.</Paragraph>
    <Paragraph position="1"> We have presented a method for automatic back-transliteration which, while far from perfect, is highly competitive. It also achieves the objectives outlined in Section 1. It ports easily to new language pairs; the P(w) and P(e|w) models are entirely reusable, while other models are learned automatically. It is robust against OCR noise, in a rare example of high-level language processing being useful (necessary, even) in improving low-level OCR.</Paragraph>
    <Paragraph position="2"> There are several directions for improving accuracy. The biggest problem is that raw English frequency counts are not the best indication of whether a word is a possible source for transliteration. Alternative data collection methods must be considered. <!-- Computational Linguistics Volume 24, Number 4 --> We may also consider changes to the model sequence itself. As we have presented it, our hypothetical human transliterator produces Japanese sounds from English sounds only, without regard for the original English spelling. This means that English homonyms will produce exactly the same katakana strings. In reality, though, transliterators will sometimes key off spelling, so that tonya and tanya produce toonya and taanya. It might pay to carry along some spelling information in the English pronunciation lattices.</Paragraph>
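The effect of ignoring spelling can be illustrated with a small deterministic sketch. It is not the weighted model described in the paper: the letter/sound alignments, the sound-to-Japanese table, and the single spelling override below are all toy assumptions, chosen only so that the output reproduces the toonya/taanya contrast mentioned above.

```python
# Hypothetical alignments: each pair is (English letter, English sound).
# Both names are assumed here to share one pronunciation.
ALIGNED = {
    "tonya": [("t", "T"), ("o", "AA"), ("n", "N"), ("y", "Y"), ("a", "AH")],
    "tanya": [("t", "T"), ("a", "AA"), ("n", "N"), ("y", "Y"), ("a", "AH")],
}

# Toy sound-only table from English sounds to romanized Japanese sounds.
SOUND_TO_JAPANESE = {"T": "t", "AA": "aa", "N": "n", "Y": "y", "AH": "a"}

# Toy override that lets the English spelling influence the output.
SPELLING_OVERRIDES = {("o", "AA"): "oo"}


def sound_only(word):
    """Map each English sound to Japanese, ignoring the spelling."""
    return "".join(SOUND_TO_JAPANESE[sound] for _, sound in ALIGNED[word])


def spelling_aware(word):
    """Prefer a (letter, sound) override; otherwise fall back to the sound."""
    return "".join(
        SPELLING_OVERRIDES.get((letter, sound), SOUND_TO_JAPANESE[sound])
        for letter, sound in ALIGNED[word]
    )


for name in ("tonya", "tanya"):
    print(name, sound_only(name), spelling_aware(name))
```

Under the sound-only table both names collapse to one string, while the override keeps them apart. In a probabilistic version, the override would presumably become an extra conditioning variable carried through the pronunciation lattice.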
    <Paragraph position="3"> Sentential context should be useful for determining correct translations. It is often clear from a Japanese sentence whether a katakana phrase is a person, an institution, or a place. In many cases it is possible to narrow things further: given the phrase &quot;such-and-such, Arizona,&quot; we can restrict our P(w) model to include only those cities and towns in Arizona.</Paragraph>
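Restricting P(w) in this way amounts to dropping every word outside the allowed set and renormalizing what remains. The following is a minimal sketch, assuming P(w) is a unigram model estimated from raw counts; the function name, the counts, and the place list are hypothetical and serve only as illustration.

```python
from collections import Counter


def restrict_unigram_model(counts, allowed):
    """Return P(w) renormalized over the words in `allowed`."""
    kept = {word: count for word, count in counts.items() if word in allowed}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no allowed word has a nonzero count")
    return {word: count / total for word, count in kept.items()}


# Hypothetical raw frequency counts for place names.
counts = Counter({"phoenix": 6, "tucson": 3, "boston": 9, "flagstaff": 1})

# Hypothetical gazetteer of Arizona cities and towns.
arizona = {"phoenix", "tucson", "flagstaff"}

p_w = restrict_unigram_model(counts, arizona)
print(p_w)
```

A hard restriction like this assigns zero probability to anything missing from the gazetteer, so a softer variant that only boosts the listed names may be preferable when the place list is incomplete.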
    <Paragraph position="4"> It is also interesting to consider transliteration for other languages. In Arabic, for example, it is more difficult to identify candidates for transliteration because there is no distinct, explicit alphabet that marks them. Furthermore, Arabic is usually written without vowels, so we must generate vowel sounds from scratch in order to produce correct English.</Paragraph>
    <Paragraph position="5"> Finally, it may be possible to embed phonetic-shift models inside speech recognizers, to explicitly adjust for heavy foreign accents.</Paragraph>
  </Section>
</Paper>