<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0808"> <Title>A hybrid approach to align sentences and words in English-Hindi parallel corpora</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Text alignment is used not only for tasks such as bilingual lexicography and machine translation but also in other language processing applications such as multilingual information retrieval and word sense disambiguation. Whilst resources like bilingual dictionaries and parallel grammars help to improve Machine Translation (MT) quality, text alignment, by aligning two texts at various levels (i.e. documents, sections, paragraphs, sentences and words), helps in the creation of such lexical resources (Manning &amp; Schutze, 2003).</Paragraph> <Paragraph position="1"> In this paper, we describe a system that aligns English-Hindi texts at the sentence and word levels.</Paragraph> <Paragraph position="2"> Our system is motivated by the desire to provide the research community with an alignment system for English and Hindi. Building on this, alignment results can be used in the creation of other Hindi language processing resources (e.g. part-of-speech taggers).</Paragraph> <Paragraph position="3"> We present a simple sentence-length approach to align English-Hindi sentences and a hybrid approach, with local word grouping and dictionary lookup as the primary techniques, to align words.</Paragraph> </Section> </Paper>