<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1621">
<Title>Using a Corpus of Sentence Orderings Defined by Many Experts to Evaluate Metrics of Coherence for Text Structuring</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle> 2 An additional evaluation test </SectionTitle>
<Paragraph position="0"> As [Barzilay et al., 2002] report, different humans often order sentences in distinct ways. Thus, there might exist more than one equally good solution for text structuring (TS), a view shared by almost all TS researchers, but one which has not been accounted for in the evaluation methodologies of [Karamanis et al., 2004] and [Barzilay and Lee, 2004]. (A more detailed discussion of existing corpus-based methods for evaluating TS appears in [Karamanis and Mellish, 2005].) Collecting sentence orderings defined by many experts in our domain enables us to investigate the possibility that there might exist many good solutions for TS. Then, the measure of [Lapata, 2003], which estimates how close two orderings are to each other, can be employed not only to verify the reliability of E0 but also to compare the orderings preferred by the assumed TS approach with the orderings of the experts.</Paragraph>
<Paragraph position="1"> However, this evaluation methodology has its limitations as well. Being engaged in other obligations, the experts normally have just a limited amount of time to devote to the NLG researcher.</Paragraph>
<Paragraph position="2"> As with standard psycholinguistic experiments, consulting these informants is difficult to extend to a larger corpus like the one used, e.g., by [Karamanis et al., 2004] (122 sets of facts).</Paragraph>
<Paragraph position="3"> In this paper, we reach a reasonable compromise by showing how the methodology of [Lapata, 2003] supplements the evaluation efforts of [Karamanis et al., 2004] using a similar (yet by necessity smaller) dataset.
Clearly, a metric of coherence that has already done well in the previous study gains further credit by passing this additional test.</Paragraph> </Section> </Paper>