File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/98/p98-1053_intro.xml
Size: 6,760 bytes
Last Modified: 2025-10-06 14:06:32
<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1053"> <Title>Accumulation of Lexical Sets: Acquisition of Dictionary Resources and Production of New Lexical Sets</Title> <Section position="3" start_page="331" end_page="332" type="intro"> <SectionTitle> 2 Production </SectionTitle> <Paragraph position="0"> From available LSs it is both interesting and possible to produce new ones: e.g., one can invert a bilingual dictionary A-B to obtain a B-A dictionary, or chain two dictionaries A-B and B-C to make an A-B-C dictionary, or only an A-C one (A, B, C being three languages). The produced LSs will certainly need further correction, but they can at least serve as partially prepared material, e.g., dictionary drafts. Acquisition and production make the notion of lexical accumulation complete: the former obtains lexical data with (almost) the same linguistic structure as the source, while the latter creates data with entirely new linguistic structures.</Paragraph> <Paragraph position="1"> Viewed as a computational linguistic problem, production has two aspects. The linguistic aspect consists in defining what to produce, i.e., the mapping from the source LSs to the target LSs.</Paragraph> <Paragraph position="2"> The quality of the result depends on these linguistic decisions. Several experiments have studied specific issues, such as sense mapping or attribute transfer (Byrd et al. (1987), Dorr et al. (1995)). This aspect seems to pose many difficult lexicographic problems and is not dealt with here.</Paragraph> <Paragraph position="3"> The computational aspect, in which we are interested, is how to carry out production. In the general case, production requires the computational power of a Turing machine. From this perspective, a framework that makes production processes easy to specify would be highly desirable.
To build such a framework, we will examine several common categories of production, identify the basic operations they frequently use, and finally establish and implement a formalism for specifying and carrying out production.</Paragraph> <Section position="1" start_page="331" end_page="332" type="sub_section"> <SectionTitle> 2.1 Categories of production </SectionTitle> <Paragraph position="0"> Production can proceed in one of two directions, &quot;extraction&quot; or &quot;synthesis&quot;, or by combining both. Some common categories of production are listed below.</Paragraph> <Paragraph position="1"> (1) Selection of a subset by some criterion, e.g., selecting all verbs from a dictionary.</Paragraph> <Paragraph position="2"> (2) Extraction of a substructure, e.g., extracting a bilingual dictionary from a trilingual one.</Paragraph> <Paragraph position="3"> (3) Inversion, e.g., of an English-French dictionary to obtain a French-English one.</Paragraph> <Paragraph position="4"> (4) Regrouping some elements to make a &quot;bigger&quot; structure, e.g., regrouping homograph entries into polysemous ones.</Paragraph> <Paragraph position="5"> (5) Chaining, e.g., of two bilingual dictionaries A-B and B-C to obtain a trilingual A-B-C.</Paragraph> <Paragraph position="6"> (6) Paralleling, e.g., of an English-French dictionary with another English-French one, to make an English-[French(1), French(2)] dictionary (for comparison, enrichment, etc.).</Paragraph> <Paragraph position="7"> (7) Starring combination, e.g., of several bilingual dictionaries A-B, B-A, A-C, C-A, A-D, D-A, to make a multilingual one with A as the pivot language: (B, C, D)-A-(B, C, D).</Paragraph> <Paragraph position="8"> Numeric evaluations can be included in production. For example, when paralleling several English-French dictionaries, one can introduce a fuzzy-logic value showing how well a French word translates an English one: the more dictionaries the French word occurs in (as a translation of that English word), the larger the value.</Paragraph> </Section> <Section position="2" start_page="332" end_page="332" type="sub_section"> <SectionTitle> 2.2 Implementation of production </SectionTitle> <Paragraph position="0"> Studying the algorithms for the categories above shows that they can make use of many common basic operations. For example, the operation regroup set by function1 into function2 partitions set into groups of elements that give the same value under function1, and applies function2 to each group to make a new element.</Paragraph> <Paragraph position="1"> It can be used to regroup the homograph entries (i.e., those having the same headword form) of a dictionary into polysemous ones, as follows: regroup dictionary by headword into polysem (polysem is some function combining the bodies of the homograph entries into a polysemous entry). It can also be used to invert an English-French dictionary EF-dict whose entries have the structure &lt;English-word, French-translations&gt; (e.g., &lt;love, {aimer, amour}&gt;): for-all EF-entry in EF-dict do split EF-entry into &lt;French, English&gt; pairs, e.g., split &lt;love, {aimer, amour}&gt; into {&lt;aimer, love&gt;, &lt;amour, love&gt;}. Call the result FE-pairs.</Paragraph> <Paragraph position="2"> regroup FE-pairs by French into FE-entry (FE-entry is a function making French-English entries, e.g., making &lt;aimer, {love, like}&gt; from &lt;aimer, like&gt; and &lt;aimer, love&gt;).
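The split-and-regroup steps above can be sketched in Python as follows. This is a minimal illustration under assumed conventions, not the authors' LISP/CLOS implementation. Entries are assumed to be (headword, set of translations) tuples, and the names regroup, split, fe_entry, invert and parallel are hypothetical. The parallel function also illustrates the numeric evaluation mentioned in Section 2.1. The paper gives no formula for that value, so scoring each French word by the fraction of dictionaries that list it is an assumption.

```python
from collections import Counter, defaultdict
from collections.abc import Callable, Iterable

# Hypothetical data model (assumption): an entry is (headword, set_of_translations).
Entry = tuple[str, set[str]]


def regroup(items: Iterable, key: Callable, combine: Callable) -> list:
    """Partition items into groups sharing the same key(item) value, then
    apply combine to each group (a list) to build one new element.
    Groups are returned in first-seen order."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return [combine(group) for group in groups.values()]


def split(entry: Entry) -> list[tuple[str, str]]:
    """Split (english, {french, ...}) into (french, english) pairs."""
    english, french_words = entry
    return [(french, english) for french in sorted(french_words)]


def fe_entry(pairs: list[tuple[str, str]]) -> Entry:
    """Build one French-English entry from pairs sharing a French word."""
    return (pairs[0][0], {english for _, english in pairs})


def invert(ef_dict: list[Entry]) -> list[Entry]:
    # for-all EF-entry in EF-dict: split it; the collected pairs are FE-pairs
    fe_pairs = [pair for entry in ef_dict for pair in split(entry)]
    # regroup FE-pairs by French into FE-entry
    return regroup(fe_pairs, key=lambda pair: pair[0], combine=fe_entry)


def parallel(dicts: list[list[Entry]]) -> list[tuple[str, dict[str, float]]]:
    """Parallel several EF dictionaries, scoring each French word by the
    fraction of dictionaries listing it for that English headword.
    Assumes each dictionary has at most one entry per headword."""
    def merge(group: list[Entry]) -> tuple[str, dict[str, float]]:
        counts = Counter(fr for _, french_words in group for fr in french_words)
        return (group[0][0], {fr: c / len(dicts) for fr, c in counts.items()})

    every_entry = [entry for d in dicts for entry in d]
    return regroup(every_entry, key=lambda entry: entry[0], combine=merge)


ef_dict = [("love", {"aimer", "amour"}), ("like", {"aimer"})]
print(invert(ef_dict))
```

Here invert maps aimer to {love, like} and amour to {love}. Keeping regroup generic over its key and combine functions reflects the observation that one basic operation serves several production categories: inversion, homograph regrouping, and paralleling.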
Our formalism for production was built with four groups of operations (see Doan-Nguyen (1996) for more details): (1) Low-level operations: assignments, conditionals, and (rarely used) iterations.</Paragraph> <Paragraph position="3"> (2) Data manipulation functions, e.g., string functions.</Paragraph> <Paragraph position="4"> (3) Set and first-order predicate calculus operations, e.g., the for-all above.</Paragraph> <Paragraph position="5"> (4) Advanced operations, which perform complex transformations on objects and sets, e.g., regroup and split above.</Paragraph> <Paragraph position="6"> Finally, LSs were implemented as LISP lists for &quot;small&quot; sets, and as CLOS object databases and LISP sequential files for large ones.</Paragraph> </Section> <Section position="3" start_page="332" end_page="332" type="sub_section"> <SectionTitle> 2.3 Result and example </SectionTitle> <Paragraph position="0"> Within the framework presented above, about 10 dictionary drafts of about 200,000 entries were produced. As an example, an English-French-UNL (EFU) dictionary draft was produced from an English-UNL (EU) dictionary, a French-English-Malay (FEM) dictionary, and a French-English (FE) dictionary. The French-English part of the FEM is extracted and inverted to give an English-French dictionary (EF-1), and the FE is inverted to give another (EF-2). The EFU is then produced by paralleling the EU, EF-1, and EF-2. This draft was used as the base for compiling a French-UNL dictionary at GETA (Boitet et al. 1998). The draft has not yet been evaluated.</Paragraph> </Section> </Section></Paper>