<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3216">
  <Title>A Phrase-Based HMM Approach to Document/Abstract Alignment</Title>
  <Section position="5" start_page="0" end_page="0" type="concl">
    <SectionTitle>
4 Conclusions and Future Work
</SectionTitle>
    <Paragraph position="0"> Despite the success of our model, its performance still falls short of human performance: we achieve an F-score of 0.548, while humans achieve 0.736.</Paragraph>
    <Paragraph position="1"> Moreover, this figure for human performance is a lower bound, since it is calculated against only one reference rather than two.</Paragraph>
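The F-scores quoted above combine precision and recall of the word alignments. As a quick illustration of the arithmetic (a generic sketch, not the paper's evaluation code):

```python
def f_score(precision, recall, beta=1.0):
    # Weighted harmonic mean of precision and recall; beta=1 gives the
    # balanced F1 measure commonly reported for alignment quality.
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)
```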
    <Paragraph position="2"> We have begun a rigorous error analysis of the model to identify its deficiencies: currently, the errors appear to stem primarily from the model's zeal for aligning identical words.</Paragraph>
    <Paragraph position="3"> This happens for one of two reasons: either a summary word should be null-aligned but is not, or a summary word should be aligned to a different, non-identical document word. We can see the PBHMMO model as giving us an upper bound on performance if we were to fix the first problem. The second problem arises either from synonyms that do not appear frequently enough for the system to learn reliable rewrite probabilities, or from coreference issues, in which the system chooses to align, for instance, "Microsoft" to "Microsoft" rather than "Microsoft" to "the company," as might be correct in context. Clearly more work needs to be done to fix these problems: we are investigating addressing the synonym issue by automatically building a list of synonyms from larger corpora and using it in the mixture model, and the coreference issue by including some (perhaps weak) coreference knowledge in the model.</Paragraph>
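One way an automatically mined synonym list could enter a mixture model is as an extra component interpolated with the learned rewrite probabilities. The sketch below is purely illustrative: the function names, the interpolation weight, and the binary synonym-match component are all assumptions, not the paper's formulation.

```python
def rewrite_prob(summary_word, doc_word, p_rewrite, synonyms, lam=0.9):
    # p_rewrite: dict mapping (summary_word, doc_word) to a learned
    #            rewrite probability (hypothetical data layout).
    # synonyms:  dict mapping a word to a set of synonyms, e.g. mined
    #            from larger corpora as the paper proposes.
    # lam:       interpolation weight between the two components.
    base = p_rewrite.get((summary_word, doc_word), 0.0)
    is_match = (doc_word == summary_word
                or doc_word in synonyms.get(summary_word, set()))
    syn = 1.0 if is_match else 0.0
    return lam * base + (1.0 - lam) * syn
```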
    <Paragraph position="4"> Finally, we are looking to incorporate the results of this model into a real system. This can be done either by using the word-for-word alignments to automatically build sentence-to-sentence alignments for training a sentence extraction system (in which case the precision/recall numbers over full sentences are likely to be much higher), or by building a system that exploits the word-for-word alignments explicitly.</Paragraph>
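Projecting word-for-word alignments up to sentence pairs for training a sentence extractor could look roughly like the following. This is a minimal sketch under stated assumptions: the link-count threshold and the index-to-sentence maps are illustrative choices, not details from the paper.

```python
from collections import Counter

def sentence_alignments(word_aligns, sum_sent_of, doc_sent_of, min_links=2):
    # word_aligns: iterable of (summary_word_index, document_word_index) links.
    # sum_sent_of / doc_sent_of: map a word index to its sentence index.
    # Keep a sentence pair only if at least `min_links` word links support
    # it, filtering out spurious single-word matches.
    counts = Counter((sum_sent_of[s], doc_sent_of[d]) for s, d in word_aligns)
    return {pair for pair, c in counts.items() if c >= min_links}
```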
  </Section>
</Paper>