File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/01/w01-1008_concl.xml
Size: 1,818 bytes
Last Modified: 2025-10-06 13:53:07
<?xml version="1.0" standalone="yes"?> <Paper uid="W01-1008"> <Title>Document Fusion for Comprehensive Event Description</Title> <Section position="5" start_page="0" end_page="0" type="concl"> <SectionTitle> 4 Conclusions </SectionTitle> <Paragraph position="0"> The document fusion system described here is only a prototype, and there is much room for improvement. Although detecting redundancies by using a shallow notion of entailment works reasonably well, it is still far from perfect.</Paragraph> <Paragraph position="1"> In the current implementation, text analysis is very shallow. Pattern matching is used to avoid dangling anaphora, and lemmatization is used to make the entailment and similarity scores insensitive to morphological variations such as number and tense. A question for future research is to what extent shallow parsing techniques can improve the entailment scores. In particular, does considering the relational structure of a sentence improve the computation of entailment relations? This has been shown to be successful in inference-based approaches to question answering, see (Harabagiu et al., 2000), and document fusion might also benefit from representations that are somewhat deeper than the one discussed in this paper.</Paragraph> <Paragraph position="2"> Another open issue at this point is the need for standards for evaluating the quality of document fusion. We think that this can be done by using standard IR measures like Miss and False Alarm.</Paragraph> <Paragraph position="3"> Although Miss can be approximated extrinsically, it is unclear whether this is also possible for False Alarm. Obviously, intrinsic evaluation is more reliable, but it remains an extremely laborious process in which inter-judge disagreement is still an issue, see (Radev et al., 2000).</Paragraph> </Section> </Paper>