<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1011">
  <Title>Extracting Parallel Sub-Sentential Fragments from Non-Parallel Corpora</Title>
  <Section position="7" start_page="86" end_page="87" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> We have presented a simple and effective method for extracting sub-sentential fragments from comparable corpora. We also presented a method for computing a probabilistic lexicon based on the log-likelihood-ratio (LLR) statistic, which yields a higher-quality lexicon. We showed that using this lexicon helps improve the precision of our extraction method.</Paragraph>
    <Paragraph position="1"> Our approach can be improved in several respects. The signal filtering function is very simple; more advanced filters might work better and eliminate the need for additional heuristics (such as our requirement that the extracted fragments have at least 3 words). Filtering the source and target signals separately is also a weakness; a joint analysis should produce better results. Despite the better lexicon, the greatest source of errors is still false word correspondences, generally involving punctuation and very common, closed-class words. Giving special attention to such cases should help eliminate these errors and improve the precision of the method.</Paragraph>
  </Section>
</Paper>
