<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1079"> <Title>Representation and Recognition Method for Multi-Word Translation Units in Korean-to-Japanese MT System</Title> <Section position="2" start_page="0" end_page="544" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Lexical and structural differences between source and target languages pose a transfer problem in machine translation (MT): they require 1-n, m-n, or n-1 mapping strategies in an MT system. For such mapping strategies, we need to treat several (n or m) words (or morphemes) as a single translation unit. Although some studies (D. Santos, 1990; Linden E., 1990; Yoon Sung Hee, 1992; Ha Gyu Lee, 1994; D. Arnold, 1994) employ the term &quot;idiom&quot; for these units, we prefer MWTU (Multi-Word Translation Unit) because it is a more general and broader term in the MT environment.</Paragraph> <Paragraph position="1"> Up to now, some research has focused on the recognition and transfer of MWTUs, although very little has been undertaken for Korean-to-Japanese machine translation systems (Seen-He Kim, 1997). Some previous studies simplified the problem by treating only special types of MWTUs, while others suffered from recognition errors and long recognition times because they did not restrict the recognition scope (D. Santos, 1990; Yoon Sung Hee, 1992; Ha Gyu Lee, 1994; Seen-He Kim, 1997).</Paragraph> <Paragraph position="2"> For Korean-to-English MT, Lee and Kim (Ha Gyu Lee, 1994) use only weak restrictions, such as adjacency information, on the recognition scope. However, their method needs stronger restrictions to resolve recognition errors and to speed up the process. Although there are some differences depending on the source and target languages involved, the component words of MWTUs in Korean-to-Japanese MT are frequently close together, so that one can predict the location of separated component words. 
For this reason, we can effectively improve recognition accuracy and reduce recognition time by restricting the recognition scope according to the characteristics of an MWTU rather than taking the whole sentence as the scope.</Paragraph> <Paragraph position="3"> Moreover, the method of Lee and Kim (Ha Gyu Lee, 1994) checks only surface-level consistency without considering word order, because Korean has almost free word order. The method can obviously deal with variable word-order MWTUs, but incorrect recognition results are possible when the meaning changes according to word order. Because the MWTUs to be treated in Korean-to-Japanese MT have an almost fixed word order, their meaning may vary if the word order is changed. In (1), both sentences have the same lexical words (or morphemes), but while the first sentence must be treated as an MWTU, the second, whose word sequence differs from the first, does not have the meaning of an MWTU. In (1), the words surrounded by a box are the essential component morphemes of an MWTU.</Paragraph> <Paragraph position="4"> (big) (nose) (get hurt) /* (I) had a bitter experience */ (nose) (get hurt) (big) /* It is serious (that I) got hurt in my nose */ In this paper, to solve the word order problem and thus improve the recognition accuracy and time for MWTUs, we fix the word order in an MWTU and define the recognition scope of its component words according to their characteristics. On this basis, we propose a representation and recognition method of MWTUs for a Korean-to-Japanese MT system.</Paragraph> <Paragraph position="5"> The rest of this paper presents the details of these proposed ideas, together with some evaluation results. For representing Korean and Japanese expressions, the 1994-SK (ROK Ministry of Education) and the Kunrei Romanization systems are used, respectively.</Paragraph> </Section> </Paper>