<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2212"> <Title>Automatically Creating Bilingual Lexicons for Machine Translation from Bilingual Text</Title> <Section position="7" start_page="1302" end_page="1302" type="evalu"> <SectionTitle> 5 Shortcomings and future work </SectionTitle> <Paragraph position="0"> In matching a pair of bags, two kinds of ambiguity could lead to multiple results, some of which are incorrect. Firstly, as already mentioned, a bag could contain two lexical items with unifiable descriptions (e.g. two adjectives modifying the same noun), possibly causing an incorrect match. Secondly, as the bilingual template database grows, the chance of overlaps between templates also grows. Two different templates or combinations of templates might cover the same input and output. A case in point is that of a phrasal verb or an idiom covered by both a single multi-word template and a compositional combination of simpler templates.</Paragraph> <Paragraph position="1"> As both potential sources of error can be automatically detected, a first step in tackling the problem would be to block the automatic generation of the entries involved when a problematic case occurs, or to have a user select the correct candidate. In this way the correctness of the output is guaranteed. The possible cost is a lack of completeness, when no user intervention is foreseen.</Paragraph> <Paragraph position="2"> Furthermore, techniques for the automatic resolution of template overlaps are under investigation. Such techniques assume the presence of a bilingual lexicon. The information contained therein is used to assign preferences to competing candidate entries, in two ways.</Paragraph> <Paragraph position="3"> Firstly, templates are probabilistically ranked, using the existing bilingual lexicon to estimate probabilities. When the choice is between single entries, the ranking can be performed by counting the frequency of each competing template in the lexicon. 
The entry with the most frequent template is chosen.</Paragraph> <Paragraph position="4"> Secondly, heuristics are used to assign preferences, based on the presence of pre-existing entries related in some way to the candidate entries. This technique is suited to resolving ambiguities where multiple entries are involved. For instance, given the equivalence between 'kick the bucket' and 'estirar la pata', and the competing candidates (8) a. {kick &amp; bucket ↔ estirar &amp; pata} b. {kick ↔ estirar, bucket ↔ pata}, the presence of an entry 'bucket ↔ balde' in the bilingual lexicon might be a clue for preferring the idiomatic interpretation. Conversely, if the hypothetical entry 'bucket ↔ pata' were already in the lexicon, the compositional interpretation might be preferred.</Paragraph> <Paragraph position="5"> Finally, efficiency is also dependent on the restrictiveness of grammars. The more grammars overgenerate, the more the combinatoric indeterminacy in the matching process increases.</Paragraph> <Paragraph position="6"> However, overgeneration is as much a problem for translation as for bilingual generation. In other words, no additional requirement is placed on the MT system that is not independently motivated by translation alone.</Paragraph> </Section> </Paper>