File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/90/c90-3031_abstr.xml
Size: 3,721 bytes
Last Modified: 2025-10-06 13:46:59
<?xml version="1.0" standalone="yes"?> <Paper uid="C90-3031"> <Title>Corpus-based Lexical Acquisition for Translation</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> ABSTRACT </SectionTitle> <Paragraph position="0"> Our goal is to explore methods for combining structured but incomplete information from dictionaries with the unstructured but more complete information available in corpora for the creation of a bilingual lexical data base. This paper concentrates on the class of action verbs of movement, and builds on earlier work on lexical correspondences between languages and specific to this verb class.</Paragraph> <Paragraph position="1"> The languages we explore here are English and French. We first examine the way prototypical verbs of movement are translated in the Collins-Robert (Collins 1978, henceforth CR) bilingual dictionary. We then analyze the behavior of some of these verbs in a large bilingual corpus. We take advantage of the results of linguistic research on verb types (e.g. Levin, to appear) coupled with data from machine readable dictionaries to motivate corpus-based text analysis for the purpose of establishing lexical correspondences with the full range of associated translations and then attach frequencies to translations.</Paragraph> <Paragraph position="2"> 1. Background. As NLP systems become more robust, large lexicons are required, providing a wide range of information including syntactic, semantic, pragmatic, morphological and phonological. There are difficulties in constructing these large lexicons, first in their design, and then in providing them with the necessary and sufficient data. These problems have recently been the topic of intense research (Klavans 1988, Boguraev and Briscoe 1989, Boguraev et al. 1989, Zernik 1990).
Moreover, an important sub-area of computational lexicon building that has barely been approached is that of bilingual lexicon construction (Calzolari and Picchi 1986, Rizk 1989).</Paragraph> <Paragraph position="3"> 2. Motion Verbs. In this paper, we report on data for movement verbs (or motion verbs). The class of English motion verbs and their translations into Romance languages have been widely discussed from various points of view including theoretical, structural (Talmy 1985), and applied (Atkins et al. 1990, in preparation). English generally incorporates movement and cause or manner into a single lexical item whereas languages like French do not. For example, in CR stroll is translated as 'se promener nonchalamment', 'flâner' and stroll in/out etc. as 'entrer/sortir/s'éloigner sans se presser' or 'nonchalamment'. Notice that in French, the translation typically consists of a general motion verb 'entrer/sortir/aller/avancer' with an adverbial or prepositional modifier showing manner, e.g.</Paragraph> <Paragraph position="4"> 'nonchalamment' or 'sans se presser'. Similarly, in English, causation in movement is often incorporated, e.g., the English verb march as in to march the troops is translated in CR as 'faire marcher (au pas) les troupes'. These multi-word correspondences often cause problems in the lexical transfer component of machine translation systems.</Paragraph> <Paragraph position="5"> 3. Bilingual Corpus-based Analysis. In earlier work (Klavans and Tzoukermann 1989), we reported on a study of a selected subset of movement verbs in a bilingual corpus. The corpus consists of 85 million English and 95 million French words from the Canadian Parliamentary Proceedings (the Hansard corpus). Of this, 75 million French and 70 million English words are aligned by sentence (Brown et al. 1988). For example:</Paragraph> </Section></Paper>