<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2039"> <Title>Parsing Aligned Parallel Corpus by Projecting Syntactic Relations from Annotated Source Corpus</Title> <Section position="3" start_page="0" end_page="301" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Example-based approaches for developing parsers have already been proposed in the literature. These approaches either use examples from the same language, e.g., (Bod et al., 2003; Streiter, 2002), or they try to imitate the parse of a given sentence using the parse of the corresponding sentence in some other language (Hwa et al., 2005; Yarowsky and Ngai, 2001). In particular, Hwa et al. (2005) have proposed a scheme called the direct projection algorithm (DPA), which assumes that the relation between two words in the source language sentence is preserved between the corresponding words in the parallel target language sentence. This assumption is called the Direct Correspondence Assumption (DCA).</Paragraph> <Paragraph position="1"> However, we observed that the DCA does not always hold for Indian languages. To overcome this difficulty, we propose an algorithm based on a variation of the DCA, which we call the pseudo Direct Correspondence Assumption (pDCA). Under the pDCA, syntactic knowledge can be transferred even when not all syntactic relations can be projected directly from the source language to the target language. Further, the proposed algorithm projects relations between phrases rather than between words. In keeping with Hwa et al. (2005), we call this algorithm the pseudo Direct Projection Algorithm (pDPA).</Paragraph> <Paragraph position="2"> The present work discusses the proposed parsing scheme for a new (target) language, which relies on a parser already available for another (source) language and on a word-aligned parallel corpus of the two languages under consideration.
We propose that the syntactic relationships between the chunks of the input sentence T (of the target language) be determined from the relationships between the corresponding chunks in S, the translation of T. Along with the parsed structure of the input, the system also outputs the constituent structure (phrases) of the given input sentence.</Paragraph> <Paragraph position="3"> In this work, we first discuss the proposed scheme in a general framework. We then illustrate the scheme by parsing Hindi sentences using the Link Grammar (LG) based parser for English, and discuss the experimental results.</Paragraph> <Paragraph position="4"> Before that, the following section briefly discusses Link Grammar.</Paragraph> </Section> </Paper>