<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1072">
  <Title>A Comparative Evaluation of Data-driven Models in Translation Selection of Machine Translation</Title>
  <Section position="5" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> This paper presents a comparative evaluation of the accuracy of data-driven models in translation selection. The models were implemented with LSA and PLSA, which are mainly used to estimate similarity between words. A manually built grammatical relation dictionary was used to select an appropriate translation for a word. To alleviate the data sparseness problem that arises when the dictionary is used, we utilized similarity measures derived from the models. When an argument word is not included in the dictionary, the k words most similar to it are found in the dictionary, and the meaning of the grammatically related class shared by the majority of those k words is selected as the translation of the input word.</Paragraph>
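The fallback step described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: it assumes cosine similarity over latent-space vectors as the similarity measure, and the names select_translation, vectors, and dictionary, as well as the toy data and sense labels, are hypothetical.

```python
from collections import Counter
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_translation(word, vectors, dictionary, k=3):
    """Pick a translation class for `word` from its k nearest dictionary words.

    vectors    : hypothetical mapping from word to latent-space vector
    dictionary : hypothetical mapping from argument word to translation class
    """
    # Direct match: the argument word is already in the dictionary.
    if word in dictionary:
        return dictionary[word]
    # Without a vector for the word there is nothing to compare against.
    if word not in vectors:
        return None
    # Rank the dictionary words by similarity to the unseen word.
    candidates = [w for w in dictionary if w in vectors]
    ranked = sorted(
        candidates,
        key=lambda w: cosine(vectors[word], vectors[w]),
        reverse=True,
    )
    nearest = ranked[:k]
    if not nearest:
        return None
    # Majority vote over the translation classes of the k neighbours.
    votes = Counter(dictionary[w] for w in nearest)
    return votes.most_common(1)[0][0]


# Toy data (assumed, for illustration only).
vectors = {
    "beer": [0.9, 0.1],
    "wine": [0.8, 0.2],
    "juice": [0.85, 0.15],
    "pill": [0.1, 0.9],
    "vodka": [0.88, 0.12],
}
dictionary = {
    "beer": "SENSE_A",
    "wine": "SENSE_A",
    "juice": "SENSE_A",
    "pill": "SENSE_B",
}

print(select_translation("vodka", vectors, dictionary, k=3))
```

In this sketch, ties in the vote are broken by the order in which classes are first counted, which is an arbitrary choice; a real system might prefer the class of the single most similar neighbour.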
    <Paragraph position="1"> We compared the accuracy of LSA and PLSA, organizing the experiments by the value of k and by grammatical relation. We obtained up to a 20% improvement in accuracy over direct matching against a collocation dictionary. PLSA selected translations better than LSA, by up to 3%. The value of k is strongly related to translation accuracy for PLSA, but not so much for LSA. This means that the latent semantic space of PLSA has a sounder distribution of latent semantics than that of LSA. Even though it requires a longer learning time than LSA, PLSA is beneficial in terms of translation accuracy and distributional soundness. A sound distribution is expected to yield better performance as the number of examples grows.</Paragraph>
    <Paragraph position="2"> However, several problems raised during the experiments remain to be resolved. First, a robust stemming tool should be used for more accurate morphological analysis. Second, the optimal value of k should be determined according to the number of examples. Finally, we should discover more specific contextual information suited to this type of problem. While simple text can serve well in IR, MT is likely to require another type of information.</Paragraph>
    <Paragraph position="3"> The data-driven models could be applied to other sub-fields related to semantics in machine translation. For example, the models could also be applied to to-infinitive phrase and prepositional phrase attachment disambiguation. A syntactic parser could also use the models to improve the accuracy of its analysis by exploiting the semantic information they generate.</Paragraph>
  </Section>
</Paper>