<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0801"> <Title>Identifying Word Correspondences in Parallel Texts. In Proceedings of the Speech and Natural</Title> <Section position="9" start_page="6" end_page="7" type="relat"> <SectionTitle> 8 Related Work </SectionTitle> <Paragraph position="0"> The literature on measures of bilingual word association is too large to review thoroughly, and most of it concerns extracting bilingual lexicons rather than word alignment. We discuss three previous research efforts that seem particularly relevant here.</Paragraph> <Paragraph position="1"> Gale and Church (1991) made what may be the first application of word association to word alignment. Their method seems somewhat like our Method 1B. They use a word association score directly, although they use the φ² statistic instead of LLR, and they consider forward jumps as well as backward jumps in a probability model in place of our nonmonotonicity measure. They report 61% recall at 95% precision on Canadian Hansards data. We build directly on the work of Melamed (2000), sharing his use of the LLR statistic and adopting his competitive linking algorithm. We diverge in other details, however. Moreover, Melamed makes no provision for alignments other than one-to-one, and he does not deal with the problem of turning a word type alignment into a word token alignment. As Table 4 shows, this is crucial to obtaining high-accuracy alignments.</Paragraph> <Paragraph position="2"> Finally, our work is similar to that of Cherry and Lin (2003) in our use of the conditional probability of a link given the co-occurrence of the linked words. Cherry and Lin generalize this idea to incorporate additional features of the aligned sentence pair into the conditioning information. The chief difference between their work and ours, however, is their dependence on having parses for the sentences in one of the languages being aligned. 
They use these parses to enforce a phrasal coherence constraint, which says that word alignments cannot cross constituent boundaries. They report excellent alignment accuracy using this approach, and one way of comparing our results to theirs is to say that we show it is also possible to get good results (at least for English and French) by using nonmonotonicity information in place of constituency information.</Paragraph> </Section> </Paper>