<?xml version="1.0" standalone="yes"?> <Paper uid="W04-3207"> <Title>Bilingual Parsing with Factored Estimation: Using English to Parse Korean</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Consider the problem of parsing a language L for which annotated resources like treebanks are scarce. Suppose we have a small amount of text data with syntactic annotations and a fairly large corpus of parallel text in which the other language (e.g., English) is not resource-impoverished. How might we exploit English parsers to improve syntactic analysis tools for this language? One idea (Yarowsky and Ngai, 2001; Hwa et al., 2002) is to project English analysis onto L data, &quot;through&quot; word-aligned parallel text. To do this, we might use an English parser to analyze the English side of the parallel text and a word-alignment algorithm to induce word correspondences. By positing a coupling of English syntax with L syntax, we can induce structure on the L side of the parallel text that is in some sense isomorphic to the English parse.</Paragraph> <Paragraph position="1"> We might take the projection idea a step further. A statistical English parser can tell us much more than the hypothesized best parse. It can be used to find every parse admitted by a grammar, along with the scores of those parses. Similarly, translation models, which yield word alignments, can be used in principle to score competing alignments and offer alternatives to a single-best alignment. 
It might also be beneficial to include the predictions of an L parser, trained on any available annotated L data, however little.</Paragraph> <Paragraph position="2"> This paper describes how simple, commonly understood statistical models--statistical dependency parsers, probabilistic context-free grammars (PCFGs), and word translation models (TMs)--can be effectively combined into a unified framework that jointly searches for the best English parse, L parse, and word alignment, where these hidden structures are all constrained to be consistent.</Paragraph> <Paragraph position="3"> This inference task is carried out by a bilingual parser.</Paragraph> <Paragraph position="4"> At present, the model used for parsing is completely factored into the two parsers and the TM, allowing separate parameter estimation.</Paragraph> <Paragraph position="5"> First, we discuss bilingual parsing (§2) and show how it can solve the problem of joint English-parse, L-parse, and word-alignment inference. In §3 we describe parameter estimation for each of the factored models, including novel applications of log-linear models to English dependency parsing and Korean morphological analysis.</Paragraph> <Paragraph position="6"> §4 presents Korean parsing results with various monolingual and bilingual algorithms, including our bilingual parsing algorithm. We close by reviewing prior work in areas related to this paper (§5).</Paragraph> </Section> </Paper>