<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2006">
  <Title>Automatic Generation of Translation Dictionaries Using Intermediary Languages</Title>
  <Section position="3" start_page="0" end_page="43" type="intro">
    <SectionTitle>
3 The Experiment
</SectionTitle>
    <Paragraph position="0"> We applied the method described in section 2 to automatically generate a Spanish-to-German dictionary using Spanish-to-English, English-to-German, German-to-English and English-to-Spanish dictionaries. We chose Spanish and German because we were able to find an online Spanish-to-German dictionary which could be used to evaluate our automatically-generated dictionary.</Paragraph>
    <Section position="1" start_page="0" end_page="42" type="sub_section">
      <SectionTitle>
3.1 Obtaining The Data
</SectionTitle>
      <Paragraph position="0"> We first collected large lists of German and English lemmas from the Celex Database (Baayen and Gulikers, 1995). We also gathered a short list of Spanish lemmas, all starting with the letter 'a', from the Wiktionary website (Wiktionary) to use as our starting terms. We created our own dictionaries by making use of online dictionaries. In order to obtain the English translations for the German lemmas and vice versa, we queried 'The New English-German Dictionary' site of The Technical University of Dresden. Finally, we wanted to compare the performance of our automatically-generated Spanish-to-German dictionary with that of a manually-generated Spanish-to-German dictionary, and for this we used a website called 'DIX: Deutsch-Spanisch Woerterbuch'. Table 1 gives information about the four dictionaries which we created in order to automatically generate our Spanish-to-German dictionary. The fifth is the manually-generated dictionary used for evaluation.</Paragraph>
      <Paragraph position="2"/>
    </Section>
    <Section position="2" start_page="42" end_page="42" type="sub_section">
      <SectionTitle>
3.2 Automatically Generating The Dictionary
</SectionTitle>
      <Paragraph position="0"> For our experiment, we used the method described in section 2 to automatically construct a scaled-down version of a Spanish-to-German dictionary. It contained a14 a14 a15 Spanish terms, all starting with the letter 'a'. To store and operate on the data, we used the open source database program PostgreSQL, version</Paragraph>
      <Paragraph position="2"> Starting with the Spanish-to-English dictionary, at each of stages 1-3 we produced a new dictionary table with an additional column to the right for the new language. We did this by using the appropriate dictionary to look up the translations for the terms in the old rightmost column, before inserting these translations into a new rightmost column. For example, to create the Spanish-to-English-to-German (SEG) table, we used the English-to-German dictionary to find the translations for the English terms in the Spanish-to-English (SE) table, and then inserted these translations into a new rightmost column. We kept producing new tables in this fashion until we had generated a Spanish-to-English-to-German-to-English-to-Spanish (SEGES) table. In stage 4, the final stage, we selected only those rows in which the starting and ending Spanish terms were the same. Important characteristics of these dic- [...] shows that the number of translations-per-term grew steadily from a18 a30 a18 translations in the starting Spanish-to-English dictionary to an enormous a14</Paragraph>
      <Paragraph position="4"> translations per term in the SEGES table after stage 3.</Paragraph>
      <Paragraph position="5"> However, after stage 4, having selected only those rows with matching first and last entries for Spanish, we reduced the number of translations back to a18 a30 a18 per term.</Paragraph>
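      <Paragraph position="6"> The table-chaining procedure above can be illustrated with a minimal sketch. It is not the PostgreSQL implementation used in the experiment: it uses in-memory Python dictionaries instead of database tables, and the four toy dictionaries (SE, EG, GE, ES) and the helper names extend and round_trip are hypothetical, invented purely for illustration.

```python
from collections import defaultdict

# Toy dictionaries invented for illustration (not data from the experiment).
# Each maps a term to the set of its translations.
SE = {"abeja": {"bee"}, "abrigo": {"coat", "shelter"}}
EG = {"bee": {"Biene"}, "coat": {"Mantel", "Anstrich"}, "shelter": {"Schutz"}}
GE = {"Biene": {"bee"}, "Mantel": {"coat"}, "Anstrich": {"paint"},
      "Schutz": {"protection"}}
ES = {"bee": {"abeja"}, "coat": {"abrigo"}, "paint": {"pintura"},
      "protection": {"amparo"}}


def extend(rows, dictionary):
    """Add a new rightmost column by translating each row's rightmost term."""
    return [row + (t,)
            for row in rows
            for t in sorted(dictionary.get(row[-1], ()))]


def round_trip(se, eg, ge, es):
    """Build the SEGES rows, then keep only rows that return to their start."""
    # The starting SE table: one row per (Spanish, English) pair.
    rows = [(s, e) for s in sorted(se) for e in sorted(se[s])]
    # Stages 1-3: produce the SEG, SEGE and SEGES tables in turn.
    for dictionary in (eg, ge, es):
        rows = extend(rows, dictionary)
    # Final stage: keep rows whose first and last Spanish terms match,
    # and collect the German term from the third column.
    result = defaultdict(set)
    for row in rows:
        if row[0] == row[-1]:
            result[row[0]].add(row[2])
    return dict(result)


spanish_to_german = round_trip(SE, EG, GE, ES)
print(spanish_to_german)
```

 In this toy data the candidates "Anstrich" and "Schutz" are discarded because their chains end in a different Spanish term from the one they started with, which is the filtering effect of the final stage.</Paragraph>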
    </Section>
    <Section position="3" start_page="42" end_page="42" type="sub_section">
      <SectionTitle>
3.3 Evaluation
</SectionTitle>
      <Paragraph position="0"> Having automatically generated the Spanish-to-German dictionary containing a20 a22 a22 unique Spanish terms, we then compared it to the manually-generated Spanish-to-German dictionary (see section 3.1).</Paragraph>
      <Paragraph position="1"> We gave the same initial a14 a14 a15 Spanish terms to the manually-generated dictionary but received translations for only a14 a18 .</Paragraph>
      <Paragraph position="2"> The results are summarised in table 3. We observe [...] ) for which there was a corresponding entry in our dictionary. In fact, our dictionary produced more translations-per-term than the manually-generated one. An extra translation may be an error, or it may not appear in the manually-generated dictionary because the manually-generated dictionary is too sparse. Further evaluation is required in order to assess how many of the extra translations were errors. In conclusion, we find that our automatically-generated dictionary has adequate but not perfect coverage and very good recall for each term covered within our dictionary. As for the precision of the translations found, we need more investigation and perhaps a more complete manually-generated comparison dictionary. The results might have been even better had it not been for several problems with the four starting dictionaries. For example, a translation for a particular word could sometimes not be found as an entry in the next dictionary. This might be because the entry simply wasn't present, or because of different conventions, e.g. one dictionary listing verbs as "to Z" when another simply gives "Z". Another cause was differences in font encoding, e.g. with German umlauts. Results might also have improved had the starting dictionaries provided more translations per entry term, and had we used part-of-speech information; this was impossible since not all of the dictionaries listed part-of-speech. All in all, given that the quality of the data with which we started was far from ideal, we believe that our method shows great promise for saving human labour in the construction of translation dictionaries.</Paragraph>
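      <Paragraph position="3"> The comparison described above can be sketched as follows. The exact formulas behind the reported figures are not given in this section, so the definitions here are assumptions: coverage is taken as the fraction of starting terms that receive at least one translation, and per-term recall as the share of the manual dictionary's translations that the generated dictionary also contains. The function names and the toy data are hypothetical.

```python
def coverage(generated, start_terms):
    """Fraction of starting terms with at least one generated translation."""
    covered = [t for t in start_terms if generated.get(t)]
    return len(covered) / len(start_terms)


def per_term_recall(generated, reference):
    """Recall for every term that has translations in both dictionaries."""
    scores = {}
    for term, gold in reference.items():
        if gold and generated.get(term):
            found = gold.intersection(generated[term])
            scores[term] = len(found) / len(gold)
    return scores


def extra_translations(generated, reference):
    """Generated translations absent from the reference (errors or gaps)."""
    extras = {}
    for term in set(generated).intersection(reference):
        unmatched = generated[term] - reference[term]
        if unmatched:
            extras[term] = unmatched
    return extras


# Toy data invented for illustration (not results from the experiment).
start_terms = ["abeja", "abrigo", "abuelo", "acero"]
generated = {"abeja": {"Biene"}, "abrigo": {"Mantel", "Schutz"}}
reference = {"abeja": {"Biene"}, "abrigo": {"Mantel"},
             "abuelo": {"Grossvater"}}

print(coverage(generated, start_terms))
print(per_term_recall(generated, reference))
print(extra_translations(generated, reference))
```

 The extra_translations output is the set that would still need manual checking, since an unmatched translation may be either an error or a correct translation missing from a sparse reference dictionary.</Paragraph>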
    </Section>
    <Section position="4" start_page="42" end_page="43" type="sub_section">
      <SectionTitle>
4 Conclusion
</SectionTitle>
      <Paragraph position="0"> In this paper we have described a method using one or more intermediary languages to automatically generate a dictionary to translate from one language to another. The method relies on using dictionaries that can connect the source language to the target language and back to the source language via the intermediary language(s). We applied the method to automatically generate a Spanish-to-German dictionary, and despite the limitations of our starting dictionaries, the result seems to be reasonably good. As was stated in section 3.3, we did not evaluate whether translations we generated that were not in the gold-standard manual dictionary were errors or good translations. This is essential future work. We also intend to empirically test what happens when further intermediary dictionaries are introduced into the chain. We believe that our method can make a great contribution to the construction of translation dictionaries. Even if a dictionary produced by our method is not considered quite complete or accurate enough for general use, it can serve as a very good starting point, thereby saving a great deal of human labour that requires a large amount of linguistic expertise. Our method could be used to produce translation dictionaries for relatively unconnected language groups, most likely by using English as an intermediary language. Such translation dictionaries could be important in promoting communication between these language groups and an ever more globalised and interconnected world.</Paragraph>
      <Paragraph position="1"> A final point to make regards applying our method more generally outside of the domain of translation dictionary construction. We believe that our method, which makes use of link structures, could be applied in different areas involving graphs.</Paragraph>
    </Section>
  </Section>
</Paper>