<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2193">
  <Title>Learning Tense Translation from Bilingual Corpora</Title>
  <Section position="4" start_page="0" end_page="1183" type="metho">
    <SectionTitle>
2 Words Are Not Enough
</SectionTitle>
    <Paragraph position="0"> Often, sentence meaning is not compositional but arises from combinations of words (1).</Paragraph>
    <Paragraph position="1">  (1) a. Ich habe ihn gestern gesehen. I have him yesterday seen I saw him yesterday.</Paragraph>
    <Paragraph position="2"> b. Ich schlage Montag vor. I beat Monday forward I suggest Monday.</Paragraph>
    <Paragraph position="3"> c. Ich möchte mich beschweren. I 'd like to myself weigh down  I'd like to make a complaint.</Paragraph>
    <Paragraph position="4"> For translation, the discontinuous words must be amalgamated into single semantic items. Single words or pairs of lemma and part-of-speech tag (L-POS pairs) are not appropriate. To verify this claim, we aligned the L-POS pairs of the Verbmobil corpus using the completely language-independent method of Dagan et al. (1993). Below are the results for sehen (see) in order of frequency and some frequent alignments for reflexive pronouns.</Paragraph>
    <Paragraph position="6"/>
  </Section>
  <Section position="5" start_page="1183" end_page="1184" type="metho">
    <SectionTitle>
3 Partial Parsing
</SectionTitle>
    <Paragraph position="0"> A full syntactic analysis of the sort of unrestricted spoken language text found in the Verbmobil corpus is still beyond reach. Hence, we took a partial parsing approach.</Paragraph>
    <Section position="1" start_page="1183" end_page="1183" type="sub_section">
      <SectionTitle>
3.1 Complex Verb Predicates
</SectionTitle>
      <Paragraph position="0"> Both German and English exhibit complex verb predicates (CVPs), see (2). Every verb and verb particle belongs to such a CVP and there is only one CVP per clause.</Paragraph>
      <Paragraph position="1"> (2) He would not have called me up.</Paragraph>
      <Paragraph position="2"> The following two grammar fragments describe the relevant CVP syntax for English and German. Every auxiliary verb governs only one verb, so the CVP grammar is basically regular and implementable with finite-state devices.</Paragraph>
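The grammar fragments themselves did not survive extraction, but the regularity claim can be sketched as follows. This is a minimal illustration, not the paper's grammar: the verbs and form labels in SUB are hypothetical examples of the hand-specified subcategorization sets sub(v) of Section 3.2. Because each auxiliary governs exactly one verb form, a scanner only needs to remember the single form expected next, which is finite state.

```python
# Sketch: each auxiliary subcategorizes for exactly one verb form, so an
# English CVP recognizer needs only one piece of state -- the form the
# previous auxiliary expects. SUB is an illustrative stand-in for sub(v).
SUB = {
    "will": "base", "would": "base",
    "have": "perfect-participle", "has": "perfect-participle",
    "had": "perfect-participle",
    "be": "progressive-participle", "is": "progressive-participle",
}

def is_cvp(chain):
    """Accept a verb chain (verb, form) if every auxiliary's expectation is met."""
    expected = None  # the sole piece of state
    for verb, form in chain:
        if expected is not None and form != expected:
            return False
        expected = SUB.get(verb)  # None for full verbs, ending the chain
    return expected is None  # no dangling expectation

# "would have called": would expects base, have expects perfect participle
print(is_cvp([("would", "finite"), ("have", "base"),
              ("called", "perfect-participle")]))
```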
      <Paragraph position="4"> English CVPs are left-headed, while German CVPs are partly left-, partly right-headed.</Paragraph>
      <Paragraph position="6"> Er wird es getan haben müssen he will it done have must He will have to have done it.</Paragraph>
      <Paragraph position="7">  other CVPs and partially fronted verb complexes (3). (3) Versuchen hätte ich es schon gerne wollen.  try 'd have I it liked to I'd have liked to try it.</Paragraph>
    </Section>
    <Section position="2" start_page="1183" end_page="1183" type="sub_section">
      <SectionTitle>
3.2 Verb Form Subcategorization
</SectionTitle>
      <Paragraph position="0"> Auxiliary verbs form a closed class. Thus, the set sub(v) of infinite verb forms for which an auxiliary verb v subcategorizes can be specified by hand. English and German auxiliary verbs govern the following verb forms.</Paragraph>
    </Section>
    <Section position="3" start_page="1183" end_page="1184" type="sub_section">
      <SectionTitle>
3.3 Transducers
</SectionTitle>
      <Paragraph position="0"> Two partial parsers (rather: transducers) are used to detect English and German CVPs and to translate them into predicate argument structures (verb chains). The parsers presuppose POS tagging and lemmatization. A data base associates verbs v with sets mor(v) of possible tenses or infinite verb forms.</Paragraph>
      <Paragraph position="1"> Let m = |{mor(v) : Verb v}| and n = |{sub(v) : Verb v}|. Then the English CVP parser needs n + 1 states to encode which verb forms, if any, are expected by a preceding auxiliary verb.</Paragraph>
      <Paragraph position="2"> Verb particles are attached to the preceding verb. The German CVP parser is more complicated, but also more restrictive as all verbs in a verb complex (VC) must be adjacent. It operates in left-headed (S) or right-headed mode (VC). In VC-mode (i.e. inside VCs) the order of the verbs put on the output tape is reversed.</Paragraph>
      <Paragraph position="3"> In S-mode, n + 1 states again record the verb form expected by a preceding finite verb Vfin. VC-mode is entered when an infinite verb form is encountered. A state in VC-mode records the verb form expected by Vfin (n + 1), the infinite verb form of the last verb encountered (m), and the verb form expected by the VC verb, if the VC consists of only one verb (n + 1). So there are m * (n + 1)^2 states. As soon as a non-verb is encountered in VC-mode or the verb form of the previous verb does not fit the subcategorization requirements of the current verb, a test is performed to see if the verb form of the last verb</Paragraph>
      <Paragraph position="5"> in VC fits the verb form required by Vfin. If it does or there is no such finite verb, one CVP has been detected. Else Vfin forms a separate CVP.</Paragraph>
      <Paragraph position="6"> In case the VC consists of only one verb that can be interpreted as finite, the expected verb form is recorded in a new S-mode state. Separated verb prefixes are attached to the finite verb, which comes first in the chain.</Paragraph>
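The key VC-mode behaviour, reversing the output order of verbs inside a German verb complex and re-attaching a separated prefix to the finite verb, can be sketched as below. This is a deliberately simplified illustration, far below the m * (n + 1)^2-state transducer described above; the function name and inputs are hypothetical.

```python
# Sketch of VC-mode output: inside a German verb complex the governing
# order runs right-to-left, so the verbs written to the output tape are
# reversed to yield a head-first chain; a separated prefix is glued back
# onto the finite verb.

def german_chain(finite_verb, verb_complex, separated_prefix=None):
    """Turn 'wird ... getan haben muessen' into a head-first verb chain."""
    head = (separated_prefix or "") + finite_verb
    return [head] + list(reversed(verb_complex))

# "Er wird es getan haben muessen": wird governs muessen governs haben ...
print(german_chain("wird", ["getan", "haben", "muessen"]))
# -> ['wird', 'muessen', 'haben', 'getan']
```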
    </Section>
    <Section position="4" start_page="1184" end_page="1184" type="sub_section">
      <SectionTitle>
3.4 Alignment
</SectionTitle>
      <Paragraph position="0"> In the CVP alignment, only 78 % of the turns proved to have CVPs on both sides; only 19 % had more than one CVP on some side. CVPs were further aligned by maximizing the translation probability of the full verbs (yielding 16,575 CVP pairs). To ensure correctness, turns with multiple CVPs were inspected by hand.</Paragraph>
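The alignment step for multi-CVP turns can be sketched as follows. The probabilities and verb pairs are toy values, not the paper's lexicon, and the 1:1 assumption is a simplification: the best one-to-one pairing of CVPs is the one maximizing the summed full-verb translation probabilities.

```python
# Sketch: align the CVPs of a turn by maximizing the translation
# probability of their full verbs (toy probabilities, equal CVP counts
# per side assumed for simplicity).
from itertools import permutations

def align_cvps(src_verbs, tgt_verbs, p):
    """p[(s, t)] is the full-verb translation probability."""
    best = max(permutations(range(len(tgt_verbs))),
               key=lambda perm: sum(p.get((s, tgt_verbs[j]), 0.0)
                                    for s, j in zip(src_verbs, perm)))
    return [(s, tgt_verbs[j]) for s, j in zip(src_verbs, best)]

p = {("sehen", "see"): 0.9, ("vorschlagen", "suggest"): 0.8,
     ("sehen", "suggest"): 0.1, ("vorschlagen", "see"): 0.05}
print(align_cvps(["sehen", "vorschlagen"], ["suggest", "see"], p))
```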
      <Paragraph position="1"> In word alignment inside CVPs, surplus tensebearing auxiliary verbs were aligned with a tense-marked NULL auxiliary (similar to the English auxiliary do).</Paragraph>
    </Section>
    <Section position="5" start_page="1184" end_page="1184" type="sub_section">
      <SectionTitle>
3.5 Alignment Results
</SectionTitle>
      <Paragraph position="0"> The domain biases the corpus towards the future. So only 5 out of 6 German tenses and 12 out of 16 English tenses occurred in the corpus. Both will and be going to were analysed as future, while would was taken to indicate conditional mood, hence present.</Paragraph>
      <Paragraph position="1">  In some cases, tense was ambiguous when considered in isolation, and had to be resolved in tandem with tense translation. Ambiguous tenses on the target side were disambiguated to fit the particular disambiguation strategy.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="1184" end_page="1186" type="metho">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> Formally, we define source tense and target tense as two random variables S and T. Disambiguation strategies are modeled as functions tr from source to target tense. Precision figures give the proportion of source tense tokens ts that the strategy correctly translates to target tense tt; recall gives the proportion of source-target tense pairs that the strategy finds.</Paragraph>
    <Paragraph position="2"> Combined precision and recall values are formed by taking the sum of the frequencies in numerator and denominator for all source and target tenses. Performance was cross-validated with test sets of 10 % of all CVP pairs.</Paragraph>
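The token-level precision and pair-level recall just defined can be sketched on toy counts; the helper name combined_scores and the data are illustrative, not the paper's corpus figures.

```python
# Sketch of the evaluation measures: precision counts correctly
# translated source tense tokens; recall counts the source-target tense
# pair types the strategy finds. Combined values sum the frequencies in
# numerator and denominator over all tenses.
from collections import Counter

def combined_scores(pairs, tr):
    """pairs: list of (source_tense, target_tense) tokens; tr: strategy."""
    counts = Counter(pairs)
    correct = sum(n for (ts, tt), n in counts.items() if tr(ts) == tt)
    precision = correct / sum(counts.values())
    recall = sum(1 for (ts, tt) in counts if tr(ts) == tt) / len(counts)
    return precision, recall

# Toy data: 7 present->present tokens, 3 present->future tokens.
pairs = [("pres", "pres")] * 7 + [("pres", "fut")] * 3
print(combined_scores(pairs, lambda ts: "pres"))  # precision 0.7, recall 0.5
```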
    <Section position="1" start_page="1184" end_page="1184" type="sub_section">
      <SectionTitle>
4.1 Baseline
</SectionTitle>
      <Paragraph position="0"> A baseline strategy assigns to every source tense the most likely target tense (tr(ts) = arg max_tt P(tt|ts), strategy t). The most likely target tenses can be read off Figures 1 and 2.</Paragraph>
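The baseline strategy t can be sketched as a table lookup trained on aligned tense pairs; the tense labels below are toy data, not the Verbmobil distributions of Figures 1 and 2.

```python
# Sketch of baseline strategy t: map each source tense to the target
# tense it is most frequently aligned with (arg max over counts).
from collections import Counter

def train_baseline(pairs):
    counts = Counter(pairs)
    table = {}
    for (ts, tt), n in counts.items():
        if ts not in table or n > counts[(ts, table[ts])]:
            table[ts] = tt
    return lambda ts: table[ts]

# Domain biased towards the future: present mostly aligns with future.
tr = train_baseline([("pres", "fut")] * 5 + [("pres", "pres")] * 3)
print(tr("pres"))
```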
      <Paragraph position="1"> Past tenses rarely denote logical past; since the discussion circles around a future meeting event, they are rather used for politeness.</Paragraph>
      <Paragraph position="2"> (5) a. Ich wollte Sie fragen, wie das aussieht.</Paragraph>
      <Paragraph position="3"> I wanted to ask you what is on.</Paragraph>
      <Paragraph position="4"> b. Übermorgen war ich ja auf diesem Kongreß in Zürich.</Paragraph>
      <Paragraph position="5"> the day after tomorrow, I'll be (lit: was) at this conference in Zurich.</Paragraph>
    </Section>
    <Section position="2" start_page="1184" end_page="1184" type="sub_section">
      <SectionTitle>
4.2 Full Verb Information
</SectionTitle>
      <Paragraph position="0"> Three more disambiguation strategies condition the choice of tense on the full verb in a CVP, viz. the source verb (tr(ts, vs) = arg max_tt P(tt|ts, vs), strategy vs), the target verb (tr(ts, vt), strategy vt), and the combination of source and target verb (tr(ts, (vs, vt)), strategy vst). The table below gives precision and recall values for these strategies and for the strategies obtained by smoothing (e.g.</Paragraph>
      <Paragraph position="1"> Vst, Vs, Vt, t is Vst smoothed first with vs, then with vt, and finally with t). Smoothing with t results in identical precision and recall figures.</Paragraph>
      <Paragraph position="2">  We see that inclusion of verb information improves performance. Translation pairs approximate the verb semantics better than single source or target verbs. The full verb contexts of tenses can also be used for verb classifications. Aspectual classification: The aspect of a verb often depends on its reading and thus can be better extrapolated from an aligned corpus (e.g. I am having a drink (trinken)). German allows punctual events in the present, while English prefers the present perfect (e.g. sehen, finden, feststellen (discover, find, see), einfallen (occur, remember); treffen, erwischen, sehen (meet)).</Paragraph>
      <Paragraph position="3"> World knowledge: In many cases perfect maps an event to its result state.</Paragraph>
      <Paragraph position="4"> finish ⇒ fertig sein; forget ⇒ nicht mehr wissen; denken an ⇒ have in mind; sich verabreden ⇒ have an appointment; sich vertun ⇒ be wrong; settle a question ⇒ (the question) is settled</Paragraph>
    </Section>
    <Section position="3" start_page="1184" end_page="1186" type="sub_section">
      <SectionTitle>
4.3 Subordinating Conjunctions
</SectionTitle>
      <Paragraph position="0"> Conjunctions often engender different mood. * In conditional clauses English past tenses usually denote present tenses. Interpreting hypothetical past as present increases performance by about 0.3 %.</Paragraph>
      <Paragraph position="1">  * In subjunctive environments logical future is expressed by English simple present. The verbs vorschlagen (suggest) (in 11 out of 14 cases) and sagen (say) (2/5) force simple present on verbs that normally prefer a translation to future. (6) I suggest that we meet on the tenth. * Certain matrix verbs (e.g. ausgehen von, denken, meinen (think), hoffen (hope), schade sein (be a pity)) trigger translation of German present to English future.</Paragraph>
    </Section>
    <Section position="4" start_page="1186" end_page="1186" type="sub_section">
      <SectionTitle>
4.4 Representation of Tense
</SectionTitle>
      <Paragraph position="0"> Tense can not only be viewed as a single item (as sketched above, representation rt). In compositional analyses of tense, source tense S and target tense T are decomposed into components (S1, ..., Sn) and (T1, ..., Tn). A disambiguation strategy tr is correct iff for all i: tr(Si) = Ti.</Paragraph>
      <Paragraph position="1"> One decomposition is suggested by the encoding of tense on the surface ((present/past, ∅/will/be going to/werden, ∅/have/haben/sein, ∅/be), representation rs). Another widely used framework in tense analysis (Reichenbach, 1947) ((E&lt;/~/&gt;R, R&lt;/~/&gt;S, +/-progr), representation rr) analyses English tenses as follows:</Paragraph>
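The componentwise correctness criterion can be sketched directly. The mapping TR below is a hypothetical component translation for the surface representation rs (e.g. German werden corresponds to English will); only the criterion itself, tr correct iff every component translates correctly, is taken from the text.

```python
# Sketch: under a compositional representation a strategy is counted
# correct only if it maps every source tense component to the
# corresponding target component.

def componentwise_correct(tr, source, target):
    """tr maps one source component to one target component."""
    return all(tr(s) == t for s, t in zip(source, target))

# Illustrative component mapping for representation rs ("-" = empty slot).
TR = {"present": "present", "past": "past", "werden": "will",
      "haben": "have", "-": "-"}

# German 'wird ... haben' chain vs. English 'will have' chain:
print(componentwise_correct(TR.get,
                            ("present", "werden", "haben", "-"),
                            ("present", "will", "have", "-")))
```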
    </Section>
  </Section>
  <Section position="7" start_page="1186" end_page="1186" type="metho">
    <SectionTitle>
R~S R&lt;S R&gt;S
</SectionTitle>
    <Paragraph position="0">           R~S             R&lt;S            R&gt;S
E~R        present         past
E&lt;R        present perf.   past perf.     fut. perf.
E&gt;R        future          future past
A similar classification can be used for German except that present and perfect are analysed as ambiguous between present and future (E&gt;=R~S and E&lt;R&gt;=S).</Paragraph>
  </Section>
  <Section position="8" start_page="1186" end_page="1186" type="metho">
    <SectionTitle>
G→E E→G
</SectionTitle>
    <Paragraph position="0"> (Table: repr. / strat. / prec. / recall, for G→E and for E→G.) The poor performance of strategy rs corroborates the expectation that tense disambiguation is helped by recognition of analytic tenses. Strategy rr performs slightly worse than rt. The really hard step with Reichenbach seems to be the mapping from surface tense to abstract representation (e.g. deciding if (polite) past is mapped to logical present or past). rr performs slightly better in E→G, since the burden of choosing surface tense is shifted to generation.</Paragraph>
  </Section>
</Paper>