<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0205">
  <Title>Telicity as a Cue to Temporal and Discourse Structure in Chinese-English Machine Translation</Title>
  <Section position="4" start_page="37" end_page="37" type="metho">
    <SectionTitle>
4 Predictions
</SectionTitle>
    <Paragraph position="0"> Based on (Dowty, 1986) and others, as discussed above, we predict that sentences with a telic LCS will translate better into English in the past tense, and that those lacking telic identifiers will translate better in the present tense. Moreover, we predict that telic verbs in the main clause will be past with respect to their subordinates (X then Y), whereas atelic verbs in the main clause will temporally overlap with them (X while Y).</Paragraph>
  </Section>
  <Section position="5" start_page="37" end_page="37" type="metho">
    <SectionTitle>
5 Implementation
</SectionTitle>
    <Paragraph position="0"> LCSes are used as the interlingua for our machine translation efforts. Following the principles in (Dorr, 1993), lexical information and constraints on well-formed LCSes are used to compose an LCS for a complete sentence from a sentence parse in a source language. This composed LCS (CLCS) is then used as the starting point for generation into the target language, using lexical information and constraints for the target language.</Paragraph>
    <Paragraph position="1"> The generation component consists of the following subcomponents: Decomposition and lexical selection. First, primitive LCSes for words in the target language are matched against CLCSes, and tree structures of covering words are selected. Ambiguity in the input and analysis represented in the CLCS is maintained (insofar as it is possible to realize particular readings using the target language lexicon), and new ambiguities are introduced when there are different ways of realizing a CLCS in the target language.</Paragraph>
    <Paragraph position="2"> AMR Construction. This tree structure is then translated into a representation using the Augmented Meaning Representation (AMR) syntax of instances and hierarchical relations (Langkilde and Knight, 1998a); however, the relations include information present in the CLCS and LCSes for target language words, including theta roles, LCS type, and associated features.</Paragraph>
    <Paragraph position="3"> Realization. The AMR structure is then linearized, as described in (Dorr et al., 1998), and morphological realization is performed. The result is a lattice of possible realizations, representing both the preserved ambiguity from previous processing phases and multiple ways of linearizing the sentence.</Paragraph>
    <Paragraph position="4"> Extraction. The final stage uses a statistical bigram extractor to pick an approximation of the most fluent realization (Langkilde and Knight, 1998b).</Paragraph>
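    <Paragraph> The extraction stage can be illustrated with a minimal sketch of bigram-based selection. It is not the extractor of (Langkilde and Knight, 1998b): it assumes the lattice has already been expanded into a small list of candidate token sequences, it uses add-one smoothing over a toy training sample, and the names train_bigrams, log_score, and most_fluent are hypothetical.

```python
import math
from collections import Counter


def train_bigrams(sentences):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["BOS"] + tokens + ["EOS"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_score(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of one candidate realization."""
    vocab = len(unigrams)
    padded = ["BOS"] + tokens + ["EOS"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
        for prev, cur in zip(padded, padded[1:])
    )


def most_fluent(candidates, unigrams, bigrams):
    """Pick the candidate with the highest bigram score (exhaustive, toy-sized)."""
    return max(candidates, key=lambda c: log_score(c, unigrams, bigrams))


# Toy usage: the training sample and candidates are invented for illustration.
sample = [["prices", "rose", "sharply"], ["prices", "rose"]]
uni, bi = train_bigrams(sample)
best = most_fluent([["prices", "rose"], ["rose", "prices"]], uni, bi)
print(" ".join(best))
```

A real lattice would normally be searched without enumerating every path, which is why the text speaks of an approximation of the most fluent realization; the exhaustive maximum here is only practical for a handful of candidates.</Paragraph>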
    <Paragraph position="5"> While there are several possible ways to address the tense and discourse connective issues mentioned above, such as modifying the LCS primitive elements and/or the composition of the LCS from the source language, we have instead been experimenting, for the moment, with solutions implemented within the generation component. The only extensions to the LCS language have been a loosening of the constraint against direct modification of states and events by other states and events (thus allowing composed LCSes to be formed from Chinese with these structures, but creating a challenge for fluent generation into English), and a few added features to cover some of the discourse markers that are present. We are able to calculate the telicity of a CLCS, using the algorithm in Figure 1, and encode this information as a binary telic feature in the Augmented Meaning Representation (AMR).</Paragraph>
    <Paragraph position="6"> The realization algorithm has been augmented with the rules in (6):
(6) a. If there is no tense feature, use telicity to determine the tense:
       :telic + → :tense past
       :telic - → :tense present
    b. In an event or state directly modifying another event or state, if there is no other clausal connective (coming from a subordinating conjunction or post-position in the original), then use telicity to pick a connective expressing the assumed temporal relation:
       :telic + → :sconj then
       :telic - → :sconj while</Paragraph>
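    <Paragraph> The two rules in (6) amount to defaults that apply only when no explicit feature is available, and can be sketched as follows. This is an illustration under stated assumptions, not the system's realization code: the function names and the use of None for a missing feature are hypothetical, and the caller is assumed to supply the telicity value of whichever clause the rule consults.

```python
from typing import Optional


def choose_tense(telic: bool, tense: Optional[str] = None) -> str:
    """Rule (6a): use telicity only when no tense feature is present."""
    if tense is not None:
        return tense  # an explicit tense feature always wins
    return "past" if telic else "present"


def choose_connective(telic: bool, sconj: Optional[str] = None) -> str:
    """Rule (6b): use telicity only when no clausal connective is present."""
    if sconj is not None:
        return sconj  # connective carried over from the source sentence
    return "then" if telic else "while"


# Toy usage: feature values are invented for illustration.
print(choose_tense(telic=True))                      # telic, no tense feature
print(choose_tense(telic=False, tense="future"))     # explicit tense is kept
print(choose_connective(telic=False))                # atelic, no connective
print(choose_connective(telic=True, sconj="after"))  # explicit connective is kept
```

Keeping the explicit-feature check first mirrors the conditions "if there is no tense feature" and "if there is no other clausal connective" in (6): telicity fills a gap and never overrides information from the source sentence.</Paragraph>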
  </Section>
  <Section position="6" start_page="37" end_page="38" type="metho">
    <SectionTitle>
6 The Corpus
</SectionTitle>
    <Paragraph position="0"> We have applied this machine translation system to a corpus of Chinese newspaper text from Xinhua and other sources, primarily in the economics domain.</Paragraph>
    <Paragraph position="1"> The genre is roughly comparable to the American Wall Street Journal. Chinese newspaper genre differs from other Chinese textual sources in a number of ways, including:  However, the presence of multiple events and states in a single sentence, without explicit modification, is characteristic of written Chinese in general. In the 80-sentence corpus under consideration, the sentence structure is complex and stylized, with an average of 20 words per sentence. Many sentences, such as (1) and (2), have multiple clauses that are neither in a direct complement relationship nor marked with explicit connective words.</Paragraph>
  </Section>
  <Section position="7" start_page="38" end_page="38" type="metho">
    <SectionTitle>
7 Ground Truth
</SectionTitle>
    <Paragraph position="0"> To evaluate the extent to which our predictions result in an improvement in translation, we have used a database of human translations of the sentences in our corpus as the ground truth, or gold standard.</Paragraph>
    <Paragraph position="1"> One of the translators is included among our authors. The ground truth data was created to provide a fluid human translation of the text early in our system development. It therefore includes many complex tenses and multiple sentences combined, both currently beyond the state of our system. Thus, two of the authors and an additional researcher also created a database of temporal relations among the clauses in the sentences that produced illegal event/state modifications. This was used to test predictions of temporal relationships indicated by telicity. In evaluating our results, we concentrate on how well the system did at matching past and present, and on the appropriateness of the temporal connectives generated.</Paragraph>
  </Section>
</Paper>