<?xml version="1.0" standalone="yes"?>
<Paper uid="J02-1002">
  <Title>A Critique and Improvement of an Evaluation Metric for Text Segmentation</Title>
  <Section position="5" start_page="32" end_page="33" type="concl">
    <SectionTitle>
5. Conclusions
</SectionTitle>
    <Paragraph position="0"> We have found that the P k error metric for text segmentation algorithms is affected by the variation of segment size distribution, becoming slightly more lenient as the variance increases. It penalizes false positives significantly less than false negatives, particularly if the false positives are uniformly distributed throughout the document. It penalizes near-miss errors more than pure false positives of equal magnitude. Finally, it fails to take into account situations in which multiple boundaries occur between the two sides of the probe, and it often misses or underpenalizes mistakes in small segments.</Paragraph>
    <Paragraph position="1"> We proposed two modifications to tackle these problems. The first, which we call</Paragraph>
    <Paragraph position="3"> , simply doubles the false positive penalty. This solves the problem of overpenalizing false negatives, but it is not effective at dealing with the other problems. The second, which we call WindowDiff (WD), counts the number of boundaries between the two ends of a fixed-length probe, and compares this number with the number of boundaries found in the same window of text for the reference segmentation. This modification addresses all of the problems listed above. WD is only slightly affected by variation of segment size distribution, gives equal weight to the false positive penalty and the false negative penalty, is able to catch mistakes in small segments just as well as mistakes in  Computational Linguistics Volume 28, Number 1 large segments, and penalizes near-miss errors less than pure false positives of equal magnitude. However, it has some problems of its own. WD penalizes all pure false positives the same amount regardless of how close they are to an actual boundary.</Paragraph>
    <Paragraph position="4"> It is not clear whether this is a good thing or not, but it seems to be preferable to overpenalizing near misses.</Paragraph>
    <Paragraph position="5"> The discussion above addresses Problems 1 through 4 but does not address Problem 5: how does one interpret the values produced by the metric? From the tests we have run, it appears that the WD metric grows in a roughly linear fashion with the difference between the reference and the experimental segmentations. In addition, we feel that WD is a more meaningful metric than P k . Comparing two stretches of text to see how many discrepancies occur between the reference and the algorithm's result seems more intuitive than determining how often two text units are incorrectly labeled as being in different segments.</Paragraph>
  </Section>
class="xml-element"></Paper>