<?xml version="1.0" standalone="yes"?>
<Paper uid="P96-1023">
  <Title>INVITED TALK Head Automata and Bilingual Tiling: Translation with Minimal Representations</Title>
  <Section position="9" start_page="173" end_page="174" type="evalu">
    <SectionTitle>
6 Experimental System
</SectionTitle>
    <Paragraph position="0"> We have built an experimental translation system using the monolingual and translation models described in this paper. The system translates sentences in the ATIS domain (Hirschman et al. 1993) between English and Mandarin Chinese. The translator is in fact a subsystem of a speech translation prototype, though the experiments we describe here are for transcribed spoken utterances. (We informally refer to the transcribed utterances as sentences.) The average time taken for translation of sentences (of unrestricted length) from the ATIS corpus was around 1.7 seconds with approximately 0.4 seconds being taken by the analysis algorithm and 0.7 seconds by the transfer algorithm.</Paragraph>
    <Paragraph position="1"> English and Chinese lexicons of around 1200 and 1000 words respectively were constructed. Altogether, the entries in these lexicons made reference to around 200 structurally distinct head automata.</Paragraph>
    <Paragraph position="2"> The transfer lexicon contained around 3500 paired graph fragments, most of which were used in both transfer directions. With this model structure, we tried a number of methods for assigning cost functions. The nature of the training methods and their corresponding cost functions meant that different amounts of training data could be used, as discussed further below.</Paragraph>
    <Paragraph position="3"> The methods make use of a supervised training set and an unsupervised training set, both chosen at random from the 20,000 or so ATIS sentences available to us. The supervised training set comprised around 1950 sentences. A subset of 1150 of these sentences was translated by the system, and the resulting translations were manually classified as 'good' (800 translations) or 'bad' (350 translations). The remaining 800 supervised training sentences were hand-tagged for prepositional attachment points. (Prepositional phrase attachment is a major cause of ambiguity in the ATIS corpus, and moreover can affect English-Chinese translation; see Chen and Chen 1992.) The attachment information was used to generate additional positive and negative counts for dependency choices. The unsupervised training set consisted of approximately 13,000 sentences; it was used for automatic training (as described under 'Reflexive Training' above) by translating the sentences into Chinese and back to English.</Paragraph>
    <Paragraph position="4"> A. Qualitative Baseline: In this model, all choices were assigned the same cost, except for irregular events (such as unknown words or partial analyses), which were assigned a high penalty cost.</Paragraph>
    <Paragraph position="5"> This model gives an indication of performance based solely on model structure.</Paragraph>
    <Paragraph position="6"> B. Probabilistic: Counts for choices leading to good translations for sentences of the supervised training corpus, together with counts from the manually assigned attachment points, were used to compute negated log probability costs.</Paragraph>
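A minimal sketch of the probabilistic cost assignment, assuming costs are negated log relative frequencies of the counted choices; the function name and the example counts are hypothetical, not figures from the paper.

```python
import math
from collections import Counter

def neg_log_prob_costs(choice_counts):
    """Turn counts of competing choices (at one decision point) into
    negated log probability costs: cost(c) = -log(count(c) / total).
    More frequent choices receive lower (better) costs."""
    total = sum(choice_counts.values())
    return {c: -math.log(n / total) for c, n in choice_counts.items()}

# Hypothetical attachment counts gathered from 'good' translations
# and from the manually assigned attachment points.
counts = Counter({"attach_high": 30, "attach_low": 10})
costs = neg_log_prob_costs(counts)
assert costs["attach_high"] < costs["attach_low"]
```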
    <Paragraph position="7"> C. Discriminative: The positive counts as in the probabilistic method, together with corresponding negative counts from bad translations or incorrect attachment choices, were used to compute log likelihood ratio costs.</Paragraph>
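One plausible form of the log likelihood ratio cost is sketched below; the add-alpha smoothing constant is an assumption (the paper does not specify how zero counts were handled), and the example counts are hypothetical.

```python
import math

def llr_cost(pos, neg, alpha=0.5):
    """Log likelihood ratio cost for one choice:
    cost = -log((pos + alpha) / (neg + alpha)).
    Choices with more positive than negative evidence receive negative
    (favourable) costs; alpha is an assumed smoothing constant."""
    return -math.log((pos + alpha) / (neg + alpha))

# A choice seen mostly in good translations is rewarded...
assert llr_cost(20, 2) < 0
# ...while one seen mostly in bad translations is penalised.
assert llr_cost(2, 20) > 0
```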
    <Paragraph position="8"> D. Normalized Distance: In this fully automatic method, normalized distance costs were computed from reflexive translation of the sentences in the unsupervised training corpus. The translation runs were carried out with parameters from method A.</Paragraph>
    <Paragraph position="9"> E. Bootstrapped Normalized Distance: The same as method D except that the system used to carry out the reflexive translation was running with parameters from method C.</Paragraph>
    <Paragraph position="10"> Table 1 shows the results of evaluating the performance of these models for translating 200 unrestricted-length ATIS sentences into Chinese. This was a previously unseen test set not included in any of the training sets. Two measures of translation acceptability are shown, as judged by a Chinese speaker. (In separate experiments, we verified that the judgments of this speaker were near the average of five Chinese speakers.) The first measure, &quot;meaning and grammar&quot;, gives the percentage of sentence translations judged to preserve meaning without the introduction of grammatical errors. For the second measure, &quot;meaning preservation&quot;, grammatical errors were allowed if they did not interfere with meaning (in the sense of misleading the hearer). In the table, we have grouped together methods A and D, for which the parameters were derived without human supervision effort, and methods B, C, and E, which depended on the same amount of human supervision effort. This means that side-by-side comparison of these methods has practical relevance, even though the methods exploited different amounts of data. In the case of E, the supervision effort was used only as an oracle during training, not directly in the cost computations.</Paragraph>
    <Paragraph position="11"> We can see from Table 1 that the choice of method affected translation quality (meaning and grammar) more than it affected preservation of meaning. A possible explanation is that the model structure was adequate for most lexical choice decisions because of the relatively low degree of polysemy in the ATIS corpus. For the stricter measure, the differences were statistically significant, according to the sign test at the 5% significance level, for the following comparisons: C and E each outperformed B and D, and B and D each outperformed A.</Paragraph>
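The sign test used for these comparisons can be sketched as follows; the win/loss counts in the usage example are hypothetical, not the paper's figures.

```python
from math import comb

def sign_test_p(wins, losses):
    """Two-sided sign test: under the null hypothesis each non-tied
    test sentence is equally likely to favour either method, so the
    number of wins follows Binomial(wins + losses, 0.5). Ties are
    discarded before calling this function."""
    n = wins + losses
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical outcome: one method better on 15 sentences, worse on 4
# (ties discarded) -- significant at the 5% level.
assert sign_test_p(15, 4) < 0.05
```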
  </Section>
</Paper>