<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1608"> <Title>Extending the Coverage of a Valency Dictionary</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Experimental Method </SectionTitle> <Paragraph position="0"> The approach is based on that of Fujita and Bond (2002). For the explanation we assume 1We use the following abbreviations: top: topic postposition; acc: accusative postposition; dat: dative postposition; quot: quotative postposition; NP: noun phrase; Cl: clause; V: verb.</Paragraph> <Paragraph position="1"> 2The subordinate clause is incorrectly translated as a that-clause. This is a bug in the English generation; the Japanese parse and semantic structure are correct.</Paragraph> <Paragraph position="2"> that the source language is Japanese and the target language is English, although nothing depends on this.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Method of Making New Patterns </SectionTitle> <Paragraph position="0"> Our method is based on two facts: (1) verbs with similar meanings typically have similar valency structures; (2) verbs with identical translations typically have similar meanings. We use three resources: (1) a seed valency dictionary (in this case the verbs from ALT-J/E's valency dictionary, ignoring all idiomatic and adjectival entries: this gave 5,062 verbs and 11,214 valency patterns3); (2) a plain bilingual dictionary which contains word pairs without valency information (in our case a combination of ALT-J/E's Japanese-English word transfer dictionary and EDICT (Breen, 1995)); and (3) a source language corpus (mainly newspapers).</Paragraph> <Paragraph position="1"> Our method creates valency patterns for words in the bilingual dictionary whose English translations can be found in the valency dictionary. We cannot create patterns for words with unknown translations. 
Each combination consists of JU, an unknown word for which we have no valency information, and E, its English translation (or translations), which is linked to one or more valency patterns JV in the valency dictionary. Figure 1 shows the overall flow of creating candidate patterns.</Paragraph> <Paragraph position="2">
For each entry in the plain J-E dictionary:
  If no entries with the same Japanese (JU) exist in the valency dictionary:
    For each valency entry (JV) with the same English (E):
      Create a candidate pattern consisting of JV's pattern with JV replaced by JU
The definition of &quot;similar meaning&quot; used to generate new patterns is that the two verbs have the same English translation. We had to make this quite loose: any entry with the same English head is counted as a match. (3We call an entry in the valency dictionary (consisting of source and target language subcategorization information and selectional restrictions on the source side) a valency pattern.)</Paragraph> <Paragraph position="3"> Therefore give up and give back are counted as the same entry. This allows for minor inconsistencies in the target language dictionaries. In particular, the valency dictionary is likely to include commonly appearing adjuncts and complements that do not normally appear in bilingual dictionaries. For example, iku &quot;go&quot; is translated as to go in EDICT, go in the ALT-J/E word transfer dictionary, and NP go from NP to NP in the ALT-J/E valency dictionary (among other translations). To match these entries it is necessary to have some flexibility in the English matching.</Paragraph> <Paragraph position="4"> In order to filter out bad candidates, we compare the usage of JV with JU using examples from a corpus. Two judgments are made for each paraphrase pair: is the paraphrase grammatical, and if it is grammatical, are the meanings similar? 
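The candidate-generation loop above, together with the loose English head matching, can be sketched roughly as follows. The data structures and the `english_head` heuristic are our own illustrative assumptions, not ALT-J/E's actual implementation:

```python
# Hypothetical sketch of candidate-pattern generation with loose English matching.

def english_head(translation):
    """Reduce an English translation to its head verb, so that
    'to go', 'go', and 'NP go from NP to NP' all match on 'go'."""
    stop = {"to", "NP", "Cl"}
    words = [w for w in translation.split() if w not in stop]
    return words[0] if words else ""

def make_candidates(bilingual, valency):
    """bilingual: (japanese_verb, english_translation) pairs without valency info.
    valency: (japanese_verb, english_translation, pattern) seed entries.
    Returns candidate patterns for Japanese verbs unknown to the valency dictionary."""
    known = {jv for jv, _, _ in valency}
    candidates = []
    for ju, e in bilingual:
        if ju in known:
            continue  # JU already has valency information
        for jv, e2, pattern in valency:
            if english_head(e) == english_head(e2):
                # copy JV's pattern, substituting the unknown verb JU
                candidates.append((ju, e, pattern))
    return candidates
```

Note that under this head-only matching, give up and give back collapse to the same head give, exactly the looseness the text describes.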
This judgment can be done by monolingual speakers of the source language.</Paragraph> <Paragraph position="5"> This is done in both directions: first we find example sentences using JU, replace JU with JV, and compare the paraphrased sentences; then we find sentences matching the valency patterns using JV, replace JV with JU, and judge the similarity. Figure 2 shows the comparison using paraphrases.</Paragraph> <Paragraph position="6"> In the implementation, we added a pre-filter that rejects verb pairs that obviously differed in meaning. This allowed the analysts to immediately reject verb pairs (JU-JV) that were obviously not similar, and sped things up.</Paragraph> <Paragraph position="8"> ochitsuku &quot;calm down&quot; and JV2 teijuu-suru &quot;settle in&quot; are candidates, but JV2 appears only because of the polysemy of E.4 The three grammaticality classes are: grammatical, ungrammatical, and grammatical in some context.5 Semantic similarity was divided into the following classes: same: JU</Paragraph> <Paragraph position="10"> [...] sentences as irrelevant. These were sentences where the verb did not actually appear, but that had been selected due to errors in the morphological analysis.</Paragraph> <Paragraph position="11"> For each candidate pattern JU-E (from JV-E): If JU is obviously different to JV, reject. [...] &quot;argue against&quot; (their meanings overlap, so they are classified into other classes in some context). Next, we give an example of the paraphrasing; for the unknown Japanese word JU</Paragraph> <Paragraph position="13"> exists in the valency dictionary, with the same English translation.</Paragraph> <Paragraph position="14"> We extract 5 sentences from our corpus which use JU, for example (2; slightly simplified here), and replace JU with JV (3).</Paragraph> <Paragraph position="15"> jyoushin-shita.</Paragraph> <Paragraph position="16"> The paraphrase (3) is grammatical and the pair (2, 3) have close meanings. 
This is done for all five sentences containing JU and then done in reverse for all five sentences matching the pattern for JV.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Experiment </SectionTitle> <Paragraph position="0"> To test the useful range of our algorithm, we considered the coverage of ALT-J/E's valency dictionary on 9 years of Japanese newspaper text (6 years of Mainichi and 3 years of Nikkei) (see Table 1). The valency dictionary had Japanese entries for 4,997 verb types (37.5%), which covered most of the actual words (92.5%).</Paragraph> <Paragraph position="1"> There were 8,304 verbs with no Japanese entry in the valency dictionary. Of those, 4,129 (49.7%) verbs appear in ALT-J/E's Japanese-English transfer dictionary or EDICT and have a pattern with the same translation in the valency dictionary.6 Most of these, 3,753 (90.9%), have some examples that can be used to check the paraphrases. We made candidate patterns for these verbs.</Paragraph> <Paragraph position="2"> [...] as nouns, without giving a separate verb entry: e.g., kyōdō &quot;cooperation&quot;. We used ALT-J/E's English morphological dictionary and the EDICT part-of-speech codes to create 10,395 new verb entries such as kyōdō-suru &quot;cooperate&quot;.</Paragraph> <Paragraph position="3"> For the 3,753 target verbs, we did the check using the pre-filter and paraphrasing. The original number of candidates was enormous: 108,733 pairs of JU and JV. Most of these were removed in the pre-filtering stage, leaving 2,570 unknown verbs matching 6,888 verbs in the valency dictionary (in fact, as the pre-filter check doesn't need the valency patterns, they can be made after this stage). 
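The pre-filter plus bidirectional paraphrase check described above can be sketched as follows. The `judge` callback stands in for the human analysts' grammaticality/similarity decisions; all names here are invented for illustration:

```python
# Illustrative sketch of the pre-filter and bidirectional paraphrase check.

def paraphrase_check(ju, jv, corpus, judge, n=5):
    """Swap JU and JV in up to n example sentences in each direction
    and collect (grammaticality, similarity) judgments from the judge."""
    judgments = []
    ju_sents = [s for s in corpus if ju in s][:n]
    jv_sents = [s for s in corpus if jv in s][:n]
    for sent in ju_sents:  # JU sentences paraphrased with JV
        judgments.append(judge(sent, sent.replace(ju, jv)))
    for sent in jv_sents:  # and the reverse direction
        judgments.append(judge(sent, sent.replace(jv, ju)))
    return judgments

def filter_pairs(pairs, obviously_different, corpus, judge):
    """Pre-filter obviously different pairs, then paraphrase-check the rest."""
    results = {}
    for ju, jv in pairs:
        if (ju, jv) in obviously_different:  # analyst's quick rejection
            continue
        results[(ju, jv)] = paraphrase_check(ju, jv, corpus, judge)
    return results
```

The cheap set-membership pre-filter runs before any paraphrase generation, which is why most of the 108,733 pairs never reach the (human) judging stage.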
When these were expanded into patterns, they made a total of 8,553 candidate patterns (3.3 patterns/verb) whose semantic similarity was then checked using paraphrases.</Paragraph> <Paragraph position="4"> It took the lexicographer about 7 minutes per verb to judge the fitness of the paraphrases; all the rest of the construction was automatic. This is a significant speed-up over the 30 minutes normally taken by an expert lexicographer to construct a valency entry.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Evaluation and Results </SectionTitle> <Paragraph position="0"> We evaluated the effect on translation quality of each new pattern that had at least one paraphrase that was grammatical. There were 6,893 new patterns, for 2,305 kinds of verbs (3.0 patterns/verb). For each verb (JU) we picked two shortish sentences (on average 81.8 characters/sentence: 40 words) from a corpus of 11 years of newspaper text (4 years of Mainichi and 5 years of Nikkei). This corpus had not been used in the paraphrasing stage, i.e., all the sentences were unseen. We tried to get two sentences for each verb, but could only find one sentence for some verbs: this gave a total of 4,367 test sentences.</Paragraph> <Paragraph position="1"> Translations that were identical were marked no change. Translations that changed were evaluated by Japanese native speakers who are fluent in English. The new translations were placed into three categories: improved, equivalent and degraded. All the judgments were based on the change in translation quality, not the absolute quality of the entire sentence. We compared the translations with and without the new valency patterns. There were two setups. In the first, we added one pattern at a time to the valency dictionary, so we could get a score for each pattern. Thus verbs with more than one pattern would be tested multiple times. In this case we tested 13,140 different sentence/pattern combinations. 
In the second, we added all the patterns together, and let the system select the most appropriate pattern using the valency information and selectional restrictions. The results of the evaluation are given in Table 2.</Paragraph> <Paragraph position="2"> As can be seen in Table 2, most sentences improved, followed by equivalent or no change; degraded translations were a minority. There was a clear improvement in the overall translation quality.</Paragraph> <Paragraph position="3"> In particular, the result using all patterns (which is the way the dictionary would normally be used) is better than using one pattern at a time. There are two reasons: (1) Many verbs have different alternations. When we used all patterns, we covered more alternations, so the system could select the right entry based on its subcategorization. (2) The entries also have different selectional restrictions for different translations. When we used all patterns, the system could select the best valency entry based on its selectional restrictions. Even without using the full paraphrase data, only the pre-filter and the grammaticality judgments, 37.5% of translations improved and only 12.6% degraded, an overall improvement of 24.9%.</Paragraph> <Paragraph position="4"> Now we analyze the reasons for the improved and degraded translations. The reasons for improvement: (1) The system was able to translate previously unknown words. The translation may not be the best, but it is better than an unknown word. (2) A new pattern with a better translation was selected. (3) The sentence was translated using the correct subcategorization, which allowed a zero pronoun to be supplied or some other improvement. The reasons for degradation: (1) A detailed nuance was lost. For example, nade-ageru &quot;brush up&quot; became simply brush. 
(2) A new pattern was selected whose translation was less appropriate.</Paragraph> <Paragraph position="5"> Further Refinements We then examined the paraphrase data in an attempt to improve the quality even further by filtering out the bad entries. To do this, we defined scores for the evaluation categories: improved is +1, no change and equivalent are +0.5, and degraded is -1. Because we used up to two evaluation sentences for each JU, the evaluation score varies between -2 and 2.</Paragraph> <Paragraph position="6"> We expected that restricting the patterns to those with grammatical paraphrases and same or close meaning would improve translation quality. However, as can be seen in Table 3, the distribution of improvements in translation quality did not change significantly according to the percentage of paraphrases that were either same or close.</Paragraph> <Paragraph position="7"> One reason for the lack of correlation is that the change in translation quality of an example sentence is a very blunt instrument for evaluating the fitness of a lexical entry. Particularly for complicated sentences, the parse may improve locally, but the translation may degrade overall. In particular, a translation that was incomprehensible could become comprehensible but with the wrong meaning: the worst possible result for a translation system. However, even allowing for these imperfections, the result still holds: a test for paraphrasability on a small set of sentences is not a useful indicator of whether two verbs have the same valency. One reason for the lack of utility of the paraphrase tests is that the example sentences were chosen randomly: there is no guarantee that they show either typical usage patterns or the full range of use.</Paragraph> <Paragraph position="8"> We were actually surprised by these results, so we tried various other combinations of grammaticality and semantic similarity, and found the same lack of effect. 
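The per-verb evaluation score defined above (improved +1, no change/equivalent +0.5, degraded -1, summed over up to two test sentences) is simple enough to state as code; this is a minimal sketch of that scheme, not the authors' actual tooling:

```python
# Minimal sketch of the per-verb evaluation score described above.

SCORES = {"improved": 1.0, "no change": 0.5, "equivalent": 0.5, "degraded": -1.0}

def verb_score(judgments):
    """judgments: evaluation categories for the (up to two) test
    sentences of one unknown verb JU. Returns a score in [-2, 2]."""
    assert 1 <= len(judgments) <= 2
    return sum(SCORES[j] for j in judgments)
```

With two sentences per verb the score is bounded by -2 and 2, as the text states; a verb with only one test sentence ranges over [-1, 1].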
We also tried a monolingual similarity measure based on word-vector distances taken from word definitions and corpora (Kasahara et al., 1997). This measure was also not effective. We then looked at our initial grammaticality filter (at least one paraphrase that was grammatical). Evaluating a small sample of verbs that failed this test (evaluated on 50 sentences), we found that only 16% improved and 24% were degraded, so this test was useful. However, it only removed 265 verbs (less than 10%). If we had left them in, the extrapolated result (testing each pattern individually) is an improvement of 32% versus a degradation of 16%, which should improve further if all patterns are tested together.</Paragraph> <Paragraph position="9"> Next, we looked at the target language translation (used to judge that the meanings are similar). We made the conditions used to match the English stricter (e.g., head match plus n words different, for various n), and found no useful difference.</Paragraph> <Paragraph position="10"> Finally, we looked at the source of the English used to find the candidate patterns. Translations with a very low preference value in our system dictionary (i.e., the 12th best or lower translation for a word with 12 translations or more) were significantly worse. However, there were only 4 such verbs, so this is not a very useful filter in practice. An interesting trend we did find was that translations found in EDICT were significantly better than those found in ALT-J/E's transfer dictionary (see Table 4).</Paragraph> <Paragraph position="11"> The main reason that EDICT gave such good results was that words with no entry in ALT-J/E's transfer dictionary could not be translated at all by the default system: the translated sentence included the Japanese verb as is. Building new patterns from EDICT allowed the system to find a translation in these cases.</Paragraph> </Section> </Paper>