<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2206"> <Title>A Method of Creating New Bilingual Valency Entries using Alternations</Title> <Section position="6" start_page="0" end_page="0" type="evalu"> <SectionTitle> 5 Evaluation </SectionTitle> <Paragraph position="0"> A total of 196 new entries were created for 62 verbs (25 Vi + 37 Vt) using the method outlined in § 4. We evaluated their quality by using the new entries in a machine translation system.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.1 Translation-Based Evaluation </SectionTitle> <Paragraph position="0"> We evaluated the quality of the created entries in a translation-based regression test. We collected two example sentences for each verb from Japanese newspapers and web pages, giving a total of 124 test sentences. We translated the test sentences using ALT-J/E, both with (with) and without (w/out) the new entries.</Paragraph> <Paragraph position="1"> Translations that were identical were marked no change (the system translates with a simple word dictionary if it has no valency entry). Translations that changed were evaluated by people fluent in both languages (two thirds by Japanese native speakers and one third by an English native speaker, none of them the authors). The translations were randomly presented to the evaluators, labeled A and B, so the evaluators did not know whether a translation was with or w/out. The translations were placed into three categories: (i) A is better than B, (ii) A and B are equivalent in quality, and (iii) A is worse than B.</Paragraph> <Paragraph position="2"> For example in (2), the evaluation was (iii). In this case A is w/out and B is with, so the new entry has improved the translation.</Paragraph> <Paragraph position="3"> in a blanket.</Paragraph> <Paragraph position="4"> Table 3 shows the evaluation results, split into those for transitive and intransitive verbs. 
The most common result was that the new translation was better (46.0%). The quality was equivalent for 13.7% and worse for 14.5%. The overall improvement was 31.5% (46.0 − 14.5).</Paragraph> <Paragraph position="5"> Extending the dictionary to include the missing alternations gave a measurable improvement in translation quality.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.2 Lexicographer's Evaluation </SectionTitle> <Paragraph position="0"> A manual analysis of a subset of the created entries was carried out by expert lexicographers familiar with the seed lexicon (not the authors).</Paragraph> <Paragraph position="1"> They found three major sources of errors. The first was that alternation is a sense-based phenomenon. As we built alternations for all patterns in the seed dictionary, this resulted in the creation of some spurious patterns. An example of an impossible entry is torawareru &quot;be caught&quot;, translated as be picked up with the inappropriate semantic restriction ⟨concrete, material-phenomenon⟩ on the subject. However, another good entry was cre-Creating Intransitive entries: if the original subcat has a control verb else (original head is Vt) if Vt undergoes the S = O alternation</Paragraph> <Paragraph position="3"> (A injure O in X ⇒ S be injured in X) We made a special rule for the English Vt have. In this case the intransitive alternation will be There is: for example, A have O on X ⇒ There be S ⟨people, animal, artifact⟩, and this was judged to be good.</Paragraph> <Paragraph position="4"> The second source of errors was in the selectional restrictions. In around 10% of the entries, the lexicographers wanted to change the SRs. The most common change was to make the SR for A more specific than the default of agent. 
The third source of errors was in the English translation, where the lexicographers sometimes preferred a different verb as a translation, rather than a regular alternation.</Paragraph> </Section> </Section> <Section position="7" start_page="0" end_page="0" type="evalu"> <SectionTitle> 6 Discussion and Future Work </SectionTitle> <Paragraph position="0"> The above results show that alternations can be used to create rich and useful bilingual entries.</Paragraph> <Paragraph position="1"> In this section we discuss some of the reasons for errors, and suggest ways to improve and expand our method.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.1 Rejecting Inappropriate Candidates </SectionTitle> <Paragraph position="0"> To make the construction fully automatic, a test for whether the Japanese side of the entry is appropriate or not is required.</Paragraph> <Paragraph position="1"> One possibility is to add a corpus-based filter: if no examples can be found that match the selectional restrictions for an entry, then it should be rejected. This could be done for each language individually. The problem with this approach is that many of the entries we created were for infrequent verbs. The average frequency in 16 years of Japanese newspaper text was only 173, and 22 verbs never appeared, although all were familiar to native speakers. We can, of course, use the web to alleviate the data sparseness problem.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.2 Improving the English Translations </SectionTitle> <Paragraph position="0"> In this section we compare the distribution of the different types of translations for the reference data (§ 3.1) and the entries created by our method (§ 3.2). The breakdown is shown in Table 4. 
The first three rows show entries with the same English main verb.</Paragraph> <Paragraph position="1"> One major discrepancy is in the frequency of the control verb construction. In Vi, no original transitive entry used control verbs. In general, when lexicographers create an entry, they prefer a simple entry to a synthetic one. Looking at the linguists' reference data, about 6.5% of the examples used control verbs. In the constructed data, 66.1% (77 entries) use the control verb make, more than any other category. For example, when the original intransitive entry is N1 be exhausted, exhausted is defined as an adjective in the existing dictionary. So we create a new entry N1 make N2 exhausted(adj). However, there is a transitive verb exhaust, and it was preferred by the lexicographers: N1 exhaust N2. The algorithm needs to optionally convert adjectives to verbs in cases where there is overlap between the adjective and past participle.</Paragraph> <Paragraph position="2"> Finally, we consider those Japanese alternations where the transitive and intransitive alternatives need translations with different English main verbs. A good example of this is Vi a16 a11 a17 using our method. Even with reliable English syntactic data, it would be hard to rule out pass away as a possible transitive verb or lose as an intransitive. They can only be ruled out by using data linking the subcat with the meaning, and this would need to be linked to the Japanese verbs' meanings. 
This may become possible with larger linked multi-lingual dictionaries, such as those under construction in the Papillon project, but is not now within our reach.</Paragraph> <Paragraph position="3"> In summary, we could improve the construction of the English translations by using richer English information, especially about past participles or verb senses.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.3 Usage as a Lexical/Translation Rule </SectionTitle> <Paragraph position="0"> Although we have investigated the use of alternations in lexicon construction, the algorithms could also be used directly, either as lexical/translation rules or to generate transitive and intransitive entries from a common underlying representation. For example, Shirai et al. (1999) use the existing entries, together with lexical rules that deploy them, to translate causatives and passives (including adversative passives) from Japanese to English. Trujillo (1995) showed a method of applying lexical rules to word translation: the vocabulary is expanded using prepared lexical rules for each language, and translation links are created between the lexical rules of a pair of languages. Dorr (1997) and Baldwin et al. (1999) generate both alternates from a single underlying representation.</Paragraph> <Paragraph position="1"> Our proposed method could partially be implemented as a lexical or a translation rule. But not all the word senses alternate (§ 4.2), and not all the target language entries are regularly translated by the same head (§ 3). Further, many of the rules mix lexical and syntactic information, making them quite complicated. 
Because of that, it is easier to expand out the rules beforehand and enter them into the system.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.4 Further Work </SectionTitle> <Paragraph position="0"> In this paper, we targeted native Japanese verbs only. ALT-J/E already has a very high coverage of native Japanese verbs. However, even in this case, we could increase the coverage of this alternation from 85% to 98% (442 out of 449 alternation pairs now in the dictionary). Most valency dictionaries or new language pairs have lower coverage, and so should benefit more. It is also possible to use this method to create only half the entries by hand and then generate the alternating halves automatically (although not all the created entries will be perfect).</Paragraph> <Paragraph position="1"> In addition to the native Japanese verbs, there are many Sino-Japanese verbal nouns that un- The products are sold out.</Paragraph> <Paragraph position="2"> ALT-J/E's Japanese dictionary has about 2,400 verbal nouns that can be used both transitively and intransitively. Of these only 536 are in the valency dictionary. Our next plan is to add them all to the valency dictionary, using alternations to make the process more efficient and consistent.</Paragraph> <Paragraph position="3"> Another extension is to apply the method to other alternations, using either linguists' data or automatically acquired alternations (Oishi and Matsumoto, 1997; Furumaki and Tanaka, 2003; McCarthy, 2000). In particular, S = O alternations make up only 34% of those discovered by Bond et al. (2002); we intend to investigate the alternations that make up the remainder.</Paragraph> </Section> </Section> </Paper>