<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-2012">
  <Title>Phrase Linguistic Classification and Generalization for Improving Statistical Machine Translation</Title>
  <Section position="6" start_page="70" end_page="71" type="evalu">
    <SectionTitle>
5 Ongoing and future research
</SectionTitle>
    <Paragraph position="0"> Ongoing research focuses mainly on developing an appropriate generalization technique for unseen instances and evaluating its impact on translation quality.</Paragraph>
    <Paragraph position="1"> Later, we expect to run experiments with a much larger parallel corpus, such as the European Parliament corpus, in order to evaluate the improvement due to morphological information for different training data sizes. More advanced methods for computing Pr(\tilde{e}_i | T, \tilde{f}_j), based on source and target contextual features, should also be tested.</Paragraph>
    <Paragraph position="2"> The next step will be to extend the approach to other potential classes, such as: * Nouns and adjectives. A straightforward strategy would map all nouns and adjectives to their base form, reducing data sparseness.</Paragraph>
    <Paragraph position="3"> * Simple noun phrases. In a subsequent stage, noun phrases with or without an article (determiner), and with or without a preposition, could also be mapped to the base form of the head noun, further reducing data sparseness. In this case, expressions like 'at night', 'the night', 'nights' or 'during the night' would all be mapped to the class 'night'.</Paragraph>
    <Paragraph position="4"> * Temporal and numeric expressions. As they are usually tackled in a preprocessing stage in current SMT systems, we did not deal with them here.</Paragraph>
    <Paragraph position="5"> In the longer term, ambiguous linguistic classification could also be allowed and included in the translation model. To this end, incorporating statistical classification tools (chunkers, shallow parsers, phrase detectors, etc.) should be considered and evaluated against the current implementation.</Paragraph>
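    <Paragraph position="6"> The simple noun phrase class described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the word lists, the lemma table and the function name noun_phrase_class are hypothetical stand-ins for a real tagger and lemmatizer, and the head noun is assumed to be the last token of the phrase.

```python
# Toy illustration only: the word lists and the lemma table are hypothetical
# stand-ins for a real tagger and lemmatizer, not part of the original work.
DETERMINERS = {"the", "a", "an"}
PREPOSITIONS = {"at", "during", "in", "on"}
LEMMAS = {"nights": "night"}  # assumed base-form lookup


def noun_phrase_class(phrase):
    """Map a simple noun phrase to the base form of its head noun."""
    tokens = phrase.lower().split()
    # Drop an optional leading preposition, then an optional determiner.
    if tokens and tokens[0] in PREPOSITIONS:
        tokens = tokens[1:]
    if tokens and tokens[0] in DETERMINERS:
        tokens = tokens[1:]
    if not tokens:
        return None
    # Assumption: the head noun is the last remaining token.
    head = tokens[-1]
    return LEMMAS.get(head, head)


examples = ["at night", "the night", "nights", "during the night"]
classes = {p: noun_phrase_class(p) for p in examples}
print(classes)
```

Under these assumptions, all four example expressions fall into the single class 'night', which is the sparseness reduction the noun phrase class is meant to achieve.</Paragraph>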
  </Section>
</Paper>