<?xml version="1.0" standalone="yes"?> <Paper uid="P04-1022"> <Title>Collocation Translation Acquisition Using Monolingual Corpora</Title> <Section position="5" start_page="22" end_page="22" type="evalu"> <SectionTitle> 5 Experiments and evaluation </SectionTitle> <Paragraph position="0"> To evaluate the effectiveness of our methods, we conducted two experiments. The first compares our method with three other monolingual-corpus-based methods on triple translation. The second evaluates the accuracy of the acquired collocation translations.</Paragraph> <Section position="1" start_page="22" end_page="22" type="sub_section"> <SectionTitle> 5.1 Dependency triple translation </SectionTitle> <Paragraph position="0"> Triple translation experiments were conducted from Chinese to English. We randomly selected 2000 Chinese triples (each with frequency larger than 2) from the dependency triple database. The standard translation answer sets were built manually by three linguistic experts: for each Chinese triple, its English translation set contains the English triples provided by any of the three linguists. Among the 2000 candidate triples, 101 cannot be translated into English triples with the same relation. For example, the Chinese triple (Jiang, VO, Jie Qian) should be translated as &quot;bargain&quot;; the two words in the triple cannot be translated separately. We call this kind of collocation translation a non-compositional translation. Our current model cannot deal with such translations. In addition, there are 157 erroneous dependency triples, which result from parsing mistakes. We filtered out these two kinds of triples, obtaining a standard test set of 1,742 Chinese triples with 4,645 translations in total.</Paragraph> <Paragraph position="1"> We compare our triple translation model with three other models on the same standard test set with the same translation dictionary. 
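The test-set arithmetic above can be checked directly (a trivial sketch; all numbers are taken from the text):

```python
total = 2000                 # randomly selected Chinese triples
non_compositional = 101      # no same-relation English translation exists
parse_errors = 157           # erroneous triples caused by parsing mistakes
test_set = total - non_compositional - parse_errors
print(test_set)                        # 1742 triples in the standard test set
print(round(4645 / test_set, 2))       # average reference translations per triple
```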
These two dictionaries were built by Harbin Institute of Technology and Microsoft Research, respectively.</Paragraph> <Paragraph position="3"> As the baseline experiment, Model A selects the highest-frequency translation for each word in the triple; Model B selects the translation with the maximal target-triple probability, as proposed in (Dagan, 1994); Model C selects the translation using both a language model and a translation model, but approximates the translation probability with a similarity score estimated from a monolingual corpus using a mutual information measure (Zhou et al., 2001); Model D is our triple translation model. Suppose c_tri = (c_1, r_c, c_2) is the Chinese triple to be translated; the four compared models differ in the score they use to rank its candidate translations. The evaluation results on the standard test set are shown in Table 4, where coverage is the percentage of triples that can be translated. Some triples cannot be translated by Models B, C, and D because of missing dictionary translations or data sparseness. The coverage of Model A is in fact 100%; it was restricted to the same triples as the other models so that accuracy could be compared on the same test set. The oracle score is the upper-bound accuracy achievable with the current translation dictionary and standard test set. Top-N accuracy is defined as the percentage of triples whose selected top N translations include a correct translation. Both Model C and Model D achieve better results than Model B, which shows that a translation model trained from monolingual corpora really helps to improve translation performance. Our model also outperforms Model C, demonstrating that the probabilities trained by our EM algorithm perform better than heuristic similarity scores. In fact, our evaluation method is very rigorous: to avoid bias, we take human translations as the standard. 
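The four selection strategies described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the probability tables, similarity scores, pinyin triple, and toy dictionary are all invented.

```python
# Toy tables standing in for corpus statistics; every value here is invented.
word_freq = {"buy": 120, "purchase": 40}                       # target word frequencies (Model A)
p_etri = {("buy", "VO", "ticket"): 1e-4,                       # target-triple language model P(e_tri)
          ("purchase", "VO", "ticket"): 2e-5}
sim = {("mai", "buy"): 0.8, ("mai", "purchase"): 0.5,          # MI-based similarity scores (Model C)
       ("piao", "ticket"): 0.9}
p_trans = {("mai", "buy"): 0.7, ("mai", "purchase"): 0.3,      # EM-trained translation probabilities (Model D)
           ("piao", "ticket"): 1.0}
dictionary = {"mai": ["buy", "purchase"], "piao": ["ticket"]}  # toy translation dictionary

def candidates(c_tri):
    """Enumerate target triples by translating each word via the dictionary."""
    c1, r, c2 = c_tri
    return [(e1, r, e2) for e1 in dictionary[c1] for e2 in dictionary[c2]]

def model_a(c_tri):
    # Pick the highest-frequency translation of each word independently.
    c1, r, c2 = c_tri
    pick = lambda c: max(dictionary[c], key=lambda e: word_freq.get(e, 0))
    return (pick(c1), r, pick(c2))

def model_b(c_tri):
    # Pick the candidate with the maximal target-triple probability.
    return max(candidates(c_tri), key=lambda e: p_etri.get(e, 0.0))

def model_c(c_tri):
    # Language model weighted by similarity scores standing in for the translation model.
    c1, r, c2 = c_tri
    return max(candidates(c_tri),
               key=lambda e: p_etri.get(e, 0.0) * sim.get((c1, e[0]), 0) * sim.get((c2, e[2]), 0))

def model_d(c_tri):
    # Language model times EM-trained word translation probabilities.
    c1, r, c2 = c_tri
    return max(candidates(c_tri),
               key=lambda e: p_etri.get(e, 0.0) * p_trans.get((c1, e[0]), 0) * p_trans.get((c2, e[2]), 0))

c_tri = ("mai", "VO", "piao")    # hypothetical pinyin triple, "buy a ticket"
print(model_a(c_tri), model_b(c_tri), model_c(c_tri), model_d(c_tri))
```

The models agree on this easy toy example; they diverge only when frequency, language-model probability, and translation probability favor different candidates, which is exactly what the comparison in Table 4 measures.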
The real translation accuracy is therefore likely higher than these evaluation results suggest. Still, compared to the oracle score, the current models leave much room for improvement, and coverage is limited by the translation dictionary and the data sparseness problem.</Paragraph> </Section> <Section position="2" start_page="22" end_page="22" type="sub_section"> <SectionTitle> 5.2 Collocation translation extraction </SectionTitle> <Paragraph position="0"> 47,632 Chinese collocation translations were extracted with the method proposed in Section 4.</Paragraph> <Paragraph position="1"> We randomly selected 1,000 translations for evaluation. Three linguistic experts tagged the acceptability of each translation; translations tagged as acceptable by at least two experts were judged correct. The evaluation results are shown in Table 5.</Paragraph> <Paragraph position="2"> The extracted collocation translations achieve much better results than triple translation. The average accuracy is 63.20%, and collocations with the AN relation achieve the highest accuracy, 68.15%. If we consider only those Chinese collocations whose translations are also English collocations, we obtain an even higher accuracy of 72.16%, as shown in the last row of Table 5. These results support our idea that reliable collocation translations can be acquired by applying the triple translation model in two directions.</Paragraph> <Paragraph position="3"> These acquired collocation translations are valuable for building translation knowledge.</Paragraph> <Paragraph position="4"> Manually crafting collocation translations is time-consuming and cannot ensure consistently high quality. 
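The acceptability criterion used in the collocation evaluation above (a translation counts as correct when at least two of the three experts accept it) can be sketched as follows; the tag data is invented for illustration:

```python
def is_correct(tags):
    """A translation is correct if at least two of the three experts accept it."""
    return sum(tags) >= 2

# Each row: acceptability tags (1 = acceptable) from the three experts for one translation.
tags = [
    (1, 1, 0),   # accepted by majority
    (1, 0, 0),   # rejected
    (1, 1, 1),   # accepted
    (0, 0, 1),   # rejected
]
accuracy = sum(is_correct(t) for t in tags) / len(tags)
print(f"accuracy = {accuracy:.2%}")   # accuracy = 50.00%
```

Majority voting over three annotators smooths out individual disagreement, which is why the paper reports a single accuracy figure per relation type in Table 5.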
Our work should improve both the quality and the efficiency of collocation translation acquisition.</Paragraph> </Section> <Section position="3" start_page="22" end_page="22" type="sub_section"> <SectionTitle> 5.3 Discussion </SectionTitle> <Paragraph position="0"> Although our approach achieves promising results, it still has some limitations to be remedied in future work.</Paragraph> <Paragraph position="1"> (1) Translation dictionary extension. Due to the limited coverage of the dictionary, a correct translation may simply be absent from it, which naturally limits the coverage of triple translations. Some research has been done on expanding translation dictionaries using non-parallel corpora (Rapp, 1999; Koehn and Knight, 2002); it could be used to improve our work.</Paragraph> <Paragraph position="2"> (2) Noise filtering of parsers. Since we use parsers to generate the dependency triple databases, some parsing mistakes are inevitably introduced. In our triple translation test data, 7.85% (157/2000) of the triples are erroneous. These errors influence the translation probability estimation in the training process. We need an effective way to filter out mistakes and perform automatic correction where possible.</Paragraph> <Paragraph position="3"> (3) Non-compositional collocation translation.</Paragraph> <Paragraph position="4"> Our model is based on the dependency correspondence assumption, which assumes that a triple's translation is also a triple. But some collocations cannot be translated word by word. For example, the Chinese triple (Fu You, VO, Cheng Xiao) is usually translated as &quot;be effective&quot;, and the English triple (take, VO, place) is usually translated as &quot;Fa Sheng&quot;. The words in these triples cannot be translated separately, and our current model cannot deal with this kind of non-compositional collocation translation. 
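One way to cope with such cases is sketched below. This is a hypothetical extension, not part of the paper's model: the phrase lexicon, its entries, and the fallback logic are our own assumptions, reusing the two examples from the text.

```python
# Hypothetical non-compositional phrase lexicon; the two entries follow the paper's examples.
NON_COMPOSITIONAL = {
    ("Fu You", "VO", "Cheng Xiao"): "be effective",
    ("take", "VO", "place"): "Fa Sheng",
}

def translate_triple(tri, compositional):
    """Use the phrase lexicon when available; otherwise fall back to word-by-word translation."""
    if tri in NON_COMPOSITIONAL:
        return NON_COMPOSITIONAL[tri]
    return compositional(tri)

# With a toy compositional translator that just returns the triple unchanged:
print(translate_triple(("take", "VO", "place"), lambda t: t))   # Fa Sheng
```

Such a lexicon could be populated automatically by the non-compositional phrase discovery methods cited below.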
Melamed (1997) and Lin (1999) have done research on non-compositional phrase discovery. We will consider using their work as a complement to our model.</Paragraph> </Section> </Section></Paper>