File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/96/p96-1020_intro.xml

Size: 3,390 bytes

Last Modified: 2025-10-06 14:06:04

<?xml version="1.0" standalone="yes"?>
<Paper uid="P96-1020">
  <Title>Pattern-Based Context-Free Grammars for Machine Translation</Title>
  <Section position="3" start_page="0" end_page="144" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> With the explosive growth of the World-Wide Web (WWW) as an information source, it has become routine for Internet users to access textual data written in foreign languages. In Japan, for example, a dozen or so inexpensive MT tools have recently been put on the market to help PC users understand English text in WWW home pages. The MT techniques employed in the tools, however, are fairly conventional. For reasons of affordability, their designers appear to have made no attempt to tackle the well-known problems in MT, such as how to ensure the learnability of correct translations and facilitate customization. As a result, users are forced to see the same kinds of translation errors over and over again, except in cases where the fix merely involves adding a missing word or compound to a user dictionary, or specifying one of several word-to-word translations as the correct choice.</Paragraph>
    <Paragraph position="1"> There are several alternative approaches that might eventually liberate us from this limitation on the usability of MT systems: Unification-based grammar formalisms and lexical-semantics formalisms (see LFG (Kaplan and Bresnan, 1982), HPSG (Pollard and Sag, 1987), and Generative Lexicon (Pustejovsky, 1991), for example) have been proposed to facilitate computationally precise description of natural-language syntax and semantics. It is possible that, with the descriptive power of these grammars and lexicons, individual usages of words and phrases may be defined specifically enough to give correct translations. Practical implementation of MT systems based on these formalisms, on the other hand, would not be possible without much more efficient parsing and disambiguation algorithms for these formalisms and a method for building a lexicon that is easy even for novices to use.</Paragraph>
    <Paragraph position="2"> Corpus-based or example-based MT (Sato and Nagao, 1990; Sumita and Iida, 1991) and statistical MT (Brown et al., 1993) systems provide the easiest customizability, since users have only to supply a collection of source and target sentence pairs (a bilingual corpus). Two open questions, however, have yet to be satisfactorily answered before we can confidently build commercial MT systems based on [...] parsing algorithm for TAGs has O(|G|n^6) worst-case time complexity (Vijay-Shanker, 1987), and that the &quot;patterns&quot; in Maruyama's approach are merely context-free grammar (CFG) rules. Thus, it has been a challenge to find a framework in which we can enjoy both a grammar formalism with better descriptive power than CFG and more efficient parsing/generation algorithms than those of TAGs. In this paper, we will show that there exists a class of &quot;pattern-based&quot; grammars that is weakly equivalent to CFG (thus allowing the CFG parsing algorithms to be used for our grammars), but that facilitates description of the domain of locality.</Paragraph>
    <Paragraph position="3"> Furthermore, we will show that our framework can be extended to incorporate example-based MT and a powerful learning mechanism.</Paragraph>
  </Section>
</Paper>