<?xml version="1.0" standalone="yes"?> <Paper uid="P04-1064"> <Title>Aligning words using matrix factorisation</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Word alignments </SectionTitle> <Paragraph position="0"> We address the following problem: given a source sentence f = f1 ... fi ... fI and a target sentence e = e1 ... ej ... eJ, we wish to find words fi and ej on either side which are aligned, i.e. in mutual correspondence. Note that words may be aligned without being direct &quot;dictionary translations&quot; of each other. In order to have proper alignments, we want to enforce the following constraints: Coverage: Every word on either side must be aligned to at least one word on the other side (possibly taking &quot;null&quot; words into account). Transitive closure: If fi is aligned to ej and e', any fk aligned to e' must also be aligned to ej.</Paragraph> <Paragraph position="1"> Under these constraints, there are only four types of alignments: 1-1, 1-N, M-1 and M-N (fig. 1).</Paragraph> <Paragraph position="2"> Although the first three are particular cases where N=1 and/or M=1, the distinction is relevant, because most word-based translation models (e.g. the IBM models (Brown et al., 1993)) typically cannot accommodate general M-N alignments.</Paragraph> <Paragraph position="3"> We formalise this using the notion of cepts: a cept is a central pivot through which a subset of e-words is aligned to a subset of f-words. General M-N alignments then correspond to M-1-N alignments from e-words, to a cept, to f-words (fig. 2). Cepts naturally guarantee transitive closure as long as each word is connected to a single cept. In addition, coverage is ensured by requiring that each word is connected to a cept.</Paragraph> <Paragraph position="5"> fig. 1, 2. Alignment matrices for the sentence pair &quot;le droit de permis ne augmente pas&quot; / &quot;the licence fee does not increase&quot;. Black squares represent alignments.</Paragraph> <Paragraph position="6">
A single constraint therefore guarantees proper alignments: Propriety: Each word is associated with exactly one cept, and each cept is associated with at least one word on each side.</Paragraph> <Paragraph position="7"> Note that our use of cepts differs slightly from that of (Brown et al., 1993, sec.3), inasmuch as cepts may not overlap, according to our definition.</Paragraph> <Paragraph position="8"> The motivation for our work is that better word alignments will lead to better translation models. For example, we may extract better chunks for phrase-based translation models. In addition, proper alignments ensure that cept-based phrases will cover the entire source and target sentences.</Paragraph> </Section> </Paper>