<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1002">
  <Title>Translating Hong Kong News</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3. EXPERIMENTAL DESIGN
</SectionTitle>
    <Paragraph position="0"> The primary purpose of this experiment was to determine the effect of each enhancement by operating with various subsets of the enhancements. Since it rapidly becomes impractical to test all possible combinations, we opted for the following test conditions:  1. baseline: parallel corpus segmented with the LDC segmenter and LDC dictionary/glossary 2. baseline plus improved segmenter 3. baseline plus improved segmenter and term finder 4. baseline plus improved segmenter and statistical dictionary 5. baseline plus improved segmenter, term finder, and statistical dictionary  For training, we had available two parallel Chinese-English corpora distributed by the LDC: the complete Hong Kong legal code (after cleaning: 47.86 megabytes, 5.5 million English words, 9 million Chinese characters) where 85% of the content (by sentence) is unique, and a collection of Hong Kong news articles (after cleaning: 24.58 megabytes, 2.67 million English words, 4.5 million Chinese characters). In addition, LDC distributes a bilingual dictionary/phrasebook, which we also used.</Paragraph>
    <Paragraph position="1"> To determine the effects of varying amounts of training data on overall performance, we divided the bilingual training corpus into ten nearly equal slices. Each test condition was then run ten times, each time increasing the number of slices used for training the system. After each training pass, the test sentences were translated and the system's performance evaluated automatically; selected points were then manually evaluated for translation quality.</Paragraph>
    <Paragraph position="2"> The automatic performance evaluation measured coverage of the input and average phrase length. Coverage is the percentage of the input text for which a translation is produced by a particular translation method (since the EBMT engine does not generally produce hypotheses that cover every word of input), while average phrase length is a crude indication of translation quality - the longer the phrase that is translated, the more context is incorporated and the less likely it is that the wrong sense will be used in the translation or that (for EBMT) the alignment will be incorrect. Since the dictionary and glossary remain constant for a given test condition, only the EBMT coverage will be presented.</Paragraph>
    <Paragraph position="3"> Manual grading of the output was performed using a web-based system with which the graders could assign one of three scores (&amp;quot;Good&amp;quot;, &amp;quot;OK&amp;quot;, &amp;quot;Bad&amp;quot;) in each of two dimensions: grammatical correctness and meaning preservation. This type of quality scoring is commonly used in assessing translation quality, and is used by other TIDES participants. Fifty-two test sentences were translated for each of four points from the automated evaluation and these sets of four alternatives presented to the graders. The four points chosen were the baseline system with 100% of the training corpus, the full system with 20% and 100% training, and the full system trained on a corpus of Hong Kong news text (cross-domain); only four points were selected due to the difficulty and expense of obtaining large numbers of manual quality judgements.</Paragraph>
    <Paragraph position="4"> To assess the performance of the system in a different domain, as well as the effect of the trigram language model on the selection of translated fragments for the final translation, we obtained manual judgements for 44 sentences on an additional four test conditions, each trained with the entire available parallel text and tested on Hong Kong news text rather than legal sentences. These points were the cross-domain case (trained on the legal corpus) and three different language models for within-domain training: an English language model derived from the legal corpus, one derived from the news corpus, and a pre-existing model generated from two gigabytes of newswire and broadcast news transcriptions.</Paragraph>
  </Section>
class="xml-element"></Paper>