File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/01/w01-1412_concl.xml
Size: 1,056 bytes
Last Modified: 2025-10-06 13:53:09
<?xml version="1.0" standalone="yes"?> <Paper uid="W01-1412"> <Title>A Comparative Study on Translation Units for Bilingual Lexicon Extraction</Title> <Section position="6" start_page="0" end_page="0" type="concl"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> This paper reports ongoing research on extracting bilingual lexicons from English-Japanese parallel corpora. Three models are compared, including one previously proposed by Kitamura and Matsumoto (1996). In preliminary experiments with 10,000 bilingual sentences, our new models (Chunk-bound N-gram and Dependency-linked N-gram) improve accuracy by approximately 13% and coverage by 5-9% over the baseline model (Bound-length N-gram). We present quantitative and qualitative analyses of the results for the three models. We conclude that chunk boundaries are useful for building an initial bilingual lexicon, and that idiomatic expressions may be partially handled by dependency links.</Paragraph> </Section> </Paper>