File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/05/h05-1023_intro.xml
Size: 2,534 bytes
Last Modified: 2025-10-06 14:02:52
<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1023">
<Title>Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 177-184, Vancouver, October 2005. ©2005 Association for Computational Linguistics Inner-Outer Bracket Models for Word Alignment using Hidden Blocks</Title>
<Section position="4" start_page="177" end_page="177" type="intro">
<SectionTitle> 6 contains discussion and conclusions. 2 Segmentation by a Block </SectionTitle>
<Paragraph position="0"> We use the following notation in the remainder of this paper: $e$ and $f$ denote the English and foreign sentences, with sentence lengths $I$ and $J$, respectively. $e_i$ is the English word at position $i$ in $e$; $f_j$ is the foreign word at position $j$ in $f$. $a$ is the alignment vector, with $a_j$ giving the position of the English word $e_{a_j}$ to which $f_j$ connects. We therefore have the standard limitation that one foreign word cannot be connected to more than one English word. A block $d_{[]}$ is defined as a pair of brackets as follows: $d_{[]} = (d_e, d_f)$,</Paragraph>
<Paragraph position="2"> where $d_e = [i_l, i_r]$ is a bracket in the English sentence defined by a pair of indices: the left position $i_l$ and the right position $i_r$, corresponding to an English phrase $e_{i_l}^{i_r}$. Similar notation applies to $d_f = [j_l, j_r]$, which is one possible projection of $d_e$ in $f$. The subscripts $l$ and $r$ are abbreviations of left and right, respectively.</Paragraph>
<Paragraph position="3"> $d_e$ segments $e$ into two parts: $(d_e, e) = (d_{e,\in}, d_{e,\notin})$.</Paragraph>
<Paragraph position="4"> The inner part is $d_{e,\in} = \{e_i, i \in [i_l, i_r]\}$ and the outer part is $d_{e,\notin} = \{e_i, i \notin [i_l, i_r]\}$; $d_f$ segments $f$ similarly.</Paragraph>
<Paragraph position="5"> Thus, the block $d_{[]}$ splits the parallel sentence pair into two non-overlapping regions: the inner part $d_{[],\in}$ and the outer part $d_{[],\notin}$ (see Figure 1). With this segmentation, we assume that words in the inner part are aligned to the inner part only: $d_{[],\in} = d_{e,\in} \leftrightarrow d_{f,\in} : \{e_i, i \in [i_l, i_r]\} \leftrightarrow \{f_j, j \in [j_l, j_r]\}$; and that words in the outer part are aligned to the outer part only: $d_{[],\notin} = d_{e,\notin} \leftrightarrow d_{f,\notin} : \{e_i, i \notin [i_l, i_r]\} \leftrightarrow \{f_j, j \notin [j_l, j_r]\}$. We do not allow alignments to cross block boundaries. Words inside a block $d_{[]}$ can be aligned using a variety of models (IBM Models 1-5, HMM, etc.). We choose Model 1 for simplicity. If the block boundaries are accurate, we can expect high-quality word alignment. This is our proposed new localization method.</Paragraph>
<Paragraph position="7"/>
</Section>
</Paper>