<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-3003">
  <Title>Limited-Domain Speech-to-Speech Translation between English and Pashto</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Challenges
</SectionTitle>
    <Paragraph position="0"> Three main challenges face this project. First, commercially nonviable languages such as Pashto often offer very few linguistic resources (linguistic descriptions, acoustic data, texts, language processing tools). The lack of resources makes development more difficult and severely constrains the viable approaches: those that rely on large corpora cannot be used. In addition, literate speakers available to act as consultants are in short supply, and basic knowledge about the language is scarce. This impedes progress and makes it difficult to pursue approaches that rely on a large body of hand-crafted translation rules. A further substantial difficulty is that no single person has a deep understanding of both the language and the technology and can serve as a bridge between them. As a result, development is more iterative than is ordinarily the case when bilingual technologists are available, because newly discovered phenomena or new understanding force revisions of previous work.</Paragraph>
    <Paragraph position="1"> A second major challenge is due to the nature of the domain and the underlying concept of operations. Real speech occurs in noisy environments, has disfluencies, and is highly variable (e.g., in phrasing and dialect). In addition, the output of a speech recognizer contains more errors, and qualitatively different ones, than the text typically input to automatic translation systems. While the problems of real speech are not unique to this project, they are magnified by the fact that the non-English speakers will largely be unsophisticated users of technology, who will often be using the system for the first and only time. The system must work well from the very first utterance; there cannot be much of a learning curve. This applies to the translation quality and to other aspects of the system, such as the synthetic speech, as speakers are not familiar with synthesized Pashto speech. These speakers are also not expected to be literate, so their understanding will not be bolstered by the extra redundancy and capabilities that the display offers to the English speaker.</Paragraph>
    <Paragraph position="2"> A third major challenge is posed by the handheld device platform, with its computational and storage limitations, and by the near-real-time requirement of the envisioned usage. The restriction to integer-only computation is most serious for speech recognition: nearly all medium- or large-vocabulary speech recognizers perform extensive floating-point computations, and converting a speech recognizer to use only integer computation required considerable effort. The severe memory limitations perhaps most affect the parsing/generation components of the system.</Paragraph>
  </Section>
</Paper>