File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/04/n04-4015_intro.xml
Size: 2,276 bytes
Last Modified: 2025-10-06 14:02:18
<?xml version="1.0" standalone="yes"?> <Paper uid="N04-4015"> <Title>Morphological Analysis for Statistical Machine Translation</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Translation between two languages with highly different morphological structures, as exemplified by Arabic and English, poses a challenge to the successful implementation of statistical machine translation models (Brown et al. 1993). Rarely occurring inflected forms of an Arabic stem are often translated inaccurately because of the frequency imbalance with the corresponding English translation word. What is called a word in Arabic (a string delimited by white space) often corresponds to more than one independent word in English, which poses a technical problem for source channel models. In the English-Arabic sentence alignment shown in Figure 1, the Arabic word AlAHmr (written in Buckwalter transliteration) is aligned to the two English words 'the red', and llmEArDp to the three English words 'of the opposition'. In this paper, we present a technique that induces morphological and syntactic symmetry between two languages with different morphological structures in order to improve statistical translation quality.</Paragraph> <Paragraph position="1"> The technique is implemented as two-step morphological processing for word-based translation models. We first apply word segmentation to Arabic, splitting each word into prefix(es)-stem-suffix(es). Arabic-English sentence alignment after Arabic word segmentation is illustrated in Figure 2, where each Arabic morpheme is aligned to one or zero English words. We then apply the proposed technique to the word-segmented Arabic corpus to identify the prefixes and suffixes that should be merged into their stems or deleted, so as to induce a symmetrical morphological structure. 
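The two-step processing described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the affix inventories (PREFIXES, SUFFIXES), the greedy affix-stripping segmenter, the minimum stem length MIN_STEM, the function names segment and symmetrize, and the '#'/'+' morpheme markers are all hypothetical, since this section does not specify how segmentation is performed. The sketch also does not decide which affixes to merge or delete; it simply takes those choices as input.

```python
"""Toy two-step Arabic morphological processing (hypothetical sketch).

Step 1 segments a Buckwalter-transliterated token into prefix(es)-stem-suffix(es);
step 2 merges chosen affixes back into the stem or deletes them.
"""

# Hypothetical affix inventories; a real segmenter would be far richer.
PREFIXES = ["Al", "l", "w", "b"]
SUFFIXES = ["p", "At", "hm"]
MIN_STEM = 3  # assumed: never strip an affix if the stem would become shorter than this


def segment(word: str) -> tuple[list[str], str, list[str]]:
    """Step 1: greedily strip known prefixes from the left and suffixes from the right."""
    prefixes: list[str] = []
    suffixes: list[str] = []
    stem = word
    stripped = True
    while stripped:
        stripped = False
        for p in PREFIXES:
            if stem.startswith(p) and len(stem) - len(p) >= MIN_STEM:
                prefixes.append(p)
                stem = stem[len(p):]
                stripped = True
                break
    stripped = True
    while stripped:
        stripped = False
        for s in SUFFIXES:
            if stem.endswith(s) and len(stem) - len(s) >= MIN_STEM:
                suffixes.insert(0, s)  # suffixes are found outermost first
                stem = stem[: -len(s)]
                stripped = True
                break
    return prefixes, stem, suffixes


def symmetrize(
    prefixes: list[str],
    stem: str,
    suffixes: list[str],
    merge: frozenset[str] = frozenset(),
    delete: frozenset[str] = frozenset(),
) -> list[str]:
    """Step 2: drop affixes in `delete`, glue stem-adjacent affixes in `merge`
    onto the stem, and mark the rest with '#' (prefix) or '+' (suffix)."""
    kept_pre = [p for p in prefixes if p not in delete]
    kept_suf = [s for s in suffixes if s not in delete]
    # Merge outward from the stem so the glued string keeps its surface order.
    while kept_pre and kept_pre[-1] in merge:
        stem = kept_pre.pop() + stem
    while kept_suf and kept_suf[0] in merge:
        stem = stem + kept_suf.pop(0)
    return [p + "#" for p in kept_pre] + [stem] + ["+" + s for s in kept_suf]


if __name__ == "__main__":
    for word in ["AlAHmr", "llmEArDp", "mwAjhp"]:
        parts = segment(word)
        print(word, "segmented:", symmetrize(*parts))
        print(word, "p merged:", symmetrize(*parts, merge=frozenset({"p"})))
```

Segmentation alone turns llmEArDp into l# l# mEArD +p; passing merge={"p"} re-attaches the suffix and yields l# l# mEArDp, and mwAjhp likewise stays whole as mwAjhp. In the toy segmenter the second l really stands for the definite article, which a real system would recover more carefully.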
Arabic-English sentence alignment after Arabic morphological analysis is shown in Figure 3, where the suffix p is merged into the stems mwAjh and mEArD.</Paragraph> <Paragraph position="2"> For phrase translation models, we apply additional morphological analysis induced from noun phrase parsing of Arabic to achieve syntactic as well as morphological symmetry between the two languages.</Paragraph> </Section> </Paper>