File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/98/p98-1039_intro.xml
Size: 2,501 bytes
Last Modified: 2025-10-06 14:06:34
<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1039"> <Title>Hybrid Approaches to Improvement of Translation Quality in Web-based English-Korean Machine Translation</Title> <Section position="3" start_page="251" end_page="252" type="intro"> <SectionTitle> 2 Domain Recognizer and Korean Sentence Style </SectionTitle> <Paragraph position="0"> In order to identify the domain of a text and link it to the appropriate English terminology lexicon and to the Korean sentence style used in Korean generation, we have developed a domain recognizer.</Paragraph> <Paragraph position="1"> Among diverse approaches to text categorization, such as decision tree induction (Lewis et al., 1994) and neural networks (Ng et al., 1997), we adopted semi-automated decision tree induction using C4.5 (Quinlan, 1993), because a semi-automated approach showed perhaps the best performance in domain identification according to Ng et al. (1997). Twenty-five domains were manually chosen from the categories of awarded Web sites. We collected 0.4 million Web pages using a Web search robot and counted word frequencies to extract features for domain recognition. Words that appeared more than 200 times were used as features. In addition, we added some manually chosen words to the features because the automatically extracted features alone did not yield high accuracy.</Paragraph> <Paragraph position="2"> Given an input text, our domain recognizer assigns one or more domains to it.</Paragraph> <Paragraph position="3"> The assigned domains can raise translation quality by activating the corresponding domain-specific terminology and selecting the correct Korean sentence style. For example, the word &quot;driver&quot; may refer to a screwdriver, a taxi driver, or a device driver program. Once the domain recognizer has determined the domain of the input text, &quot;driver&quot; can be translated into its appropriate Korean equivalent.
The domain selected by the domain recognizer also contributes to generating a better Korean sentence style, because Korean sentence style can be expressed in various ways through the verbal endings appropriate to the domain. For example, formal domains such as technology and law use the plain verbal ending 'ta' because they are characterized by formality, while informal domains such as weather, food, and fashion use the polite verbal ending 'supnita' because they are characterized by politeness.</Paragraph> </Section> </Paper>
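
The feature selection and domain-dependent choices described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the C4.5 classifier itself is omitted, and the function names, whitespace tokenization, the domain labels used for "driver" (tools, transport, computing), and the English glosses standing in for Korean equivalents are all assumptions made for illustration. Only the frequency threshold of 200 and the mapping of technology and law to 'ta' and of weather, food, and fashion to 'supnita' come from the text.

```python
from collections import Counter

# From the text: words appearing MORE than 200 times are used as features.
MIN_COUNT = 200


def select_features(pages, min_count=MIN_COUNT, manual_words=()):
    """Return frequent words plus manually chosen words as the feature set.

    Tokenization by lowercasing and whitespace splitting is an assumption.
    """
    counts = Counter()
    for page in pages:
        counts.update(page.lower().split())
    frequent = {word for word, count in counts.items() if count > min_count}
    return frequent | set(manual_words)


# From the text: formal domains take the plain ending 'ta',
# informal domains take the polite ending 'supnita'.
ENDING_BY_DOMAIN = {
    "technology": "ta",
    "law": "ta",
    "weather": "supnita",
    "food": "supnita",
    "fashion": "supnita",
}

# Hypothetical domain-specific lexicon. The domain labels are assumed, and
# the values are English glosses used as placeholders for Korean equivalents.
SENSE_BY_DOMAIN = {
    "driver": {
        "tools": "screwdriver",
        "transport": "taxi driver",
        "computing": "device driver program",
    },
}


def verbal_ending(domain):
    """Return the verbal ending for a domain, or None if it is not listed."""
    return ENDING_BY_DOMAIN.get(domain)


def select_sense(word, domains):
    """Return the sense of `word` for the first matching domain, else None."""
    senses = SENSE_BY_DOMAIN.get(word, {})
    for domain in domains:
        if domain in senses:
            return senses[domain]
    return None
```

For instance, `select_sense("driver", ["computing"])` picks the device driver sense, and `verbal_ending("weather")` selects the polite ending. In the described system, the list of domains would come from the domain recognizer rather than being supplied by hand.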