File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/94/a94-1005_evalu.xml
Size: 2,680 bytes
Last Modified: 2025-10-06 14:00:16
<?xml version="1.0" standalone="yes"?> <Paper uid="A94-1005"> <Title>Machine Translation of Sentences with Fixed Expressions</Title> <Section position="7" start_page="30" end_page="32" type="evalu"> <SectionTitle> 5 Experiments </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="30" end_page="30" type="sub_section"> <SectionTitle> 5.1 Extracting fixed sentences </SectionTitle> <Paragraph position="0"> The parameters for EXTRA were set as follows: P1) 3 to 6 words in fixed patterns P2) occurring more than 10 times</Paragraph> <Paragraph position="2"> About 92,000 fixed patterns satisfying conditions P1 and P2 were collected from AP wire-service news stories covering a two-year period, which comprise about 1.6 million sentences. Using these fixed patterns, about 21,000 fixed sentences were extracted under condition P3. The experiment was not limited to economic news stories. Examples of the extracted results are shown in the Appendix.</Paragraph> <Paragraph position="3"> Since most of the extracted sentences are economic ones containing many idiomatic expressions, EXTRA appears to be an adequate method for extracting fixed sentences.</Paragraph> </Section> <Section position="2" start_page="30" end_page="30" type="sub_section"> <SectionTitle> 5.2 Production of STRA data </SectionTitle> <Paragraph position="0"> The 388 most frequently occurring economics-related fixed sentences were manually sampled from the 21,000 fixed sentences. 
After they were manually translated into Japanese, the STRA data was produced by DTRA.</Paragraph> <Paragraph position="1"> While most of the CFG rules in the STRA data include variables, a few do not, such as the rule for &quot;Gold prices were mixed.&quot;</Paragraph> </Section> <Section position="3" start_page="30" end_page="32" type="sub_section"> <SectionTitle> 5.3 Experiment for ENTS </SectionTitle> <Paragraph position="0"> A series of experiments was conducted using the STRA data discussed in Section 5.2 to evaluate the accuracy of ENTS.</Paragraph> <Paragraph position="1"> Tables 1 and 2 show each process's volume and translation accuracy, respectively, for two data sets: data1 includes 193 economic sentences used to tune the CFG rules of Process 2, and data2 includes 167 sentences that were not used in the tuning.</Paragraph> <Paragraph position="2"> About 30% of each data set is translated in Process 1, with a translation accuracy of 100% in both cases. The translation accuracy of Process 2 for data2 is as high as for data1, even though Process 2 was not tuned to data2. The overall translation accuracy increases from about 20% with our conventional MT system to about 70%.</Paragraph> </Section> </Section> </Paper>