<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0303"> <Title>Word Alignment Based on Bilingual Bracketing</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Word alignment based on bilingual parsing is promising but still difficult. The goal is to extract structural information from parallel sentences and thereby improve word and phrase alignment through bilingual constraint transfer.</Paragraph> <Paragraph position="1"> This approach can be generalized to the automatic acquisition of translation lexicons and phrase translations, especially for languages whose resources are relatively scarce compared with English.</Paragraph> <Paragraph position="2"> The parallel sentences used to build Statistical Machine Translation (SMT) systems are mostly unrestricted text on which full parsing often fails, so robustness to the inherent noise of parallel data is important.</Paragraph> <Paragraph position="3"> Bilingual Bracketing [Wu 1997] is one of the bilingual shallow parsing approaches that have been studied for Chinese-English word alignment. It uses a translation lexicon within a probabilistic context-free grammar (PCFG) as a generative model that analyzes parallel sentences under weak ordering constraints. This provides a framework for incorporating knowledge from the English side, such as part-of-speech (POS) tags, phrase structure, and potentially more detailed parsing results.</Paragraph> <Paragraph position="4"> In this paper, we use a simplified bilingual bracketing grammar together with a statistical translation lexicon, such as the Model-1 lexicon [Brown 1993], to perform the bilingual bracketing. A boosting strategy is studied and applied to training the statistical lexicon. English POS tagging and base noun phrase (NP) detection are used to further improve alignment performance. Word alignments and phrase alignments are extracted from the parsing results in a post-processing step. 
Different translation lexicon settings within the bilingual bracketing framework are studied, and word alignment experiments are carried out on the Chinese-English, French-English, and Romanian-English language pairs.</Paragraph> <Paragraph position="5"> The paper is structured as follows: Section 2 describes the simplified bilingual bracketing used in our system; Section 3 introduces the boosting strategy, based on importance sampling, for the IBM Model-1 lexicon; Section 4 shows how English POS tags and English base noun phrases are used to constrain the alignments; Section 5 presents the experimental results; and Section 6 gives the summary and conclusions.</Paragraph> </Section> </Paper>