<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4018">
  <Title>Improving Automatic Sentence Boundary Detection with Confusion Networks</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"> Table 1 shows the results in terms of slot error rate on the four test sets. The middle column indicates the performance on a single hypothesis, with the words derived from the pruned set of N-best hypotheses. The right column indicates the performance of the system using multiple hypotheses merged with confusion networks.</Paragraph>
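The metric reported in Table 1 is slot error rate, which charges every slot-level mistake (inserted, deleted, or substituted SU boundaries) against the number of reference slots. As a minimal sketch (the function name and counts are illustrative, not from the paper):

```python
def slot_error_rate(insertions, deletions, substitutions, num_reference_slots):
    """Slot error rate (SER): all slot-level errors normalized by the
    number of reference slots (here, reference SU boundaries)."""
    return (insertions + deletions + substitutions) / num_reference_slots

# Hypothetical counts: 30 spurious SUs, 50 missed SUs, no substitutions,
# scored against 1000 reference SUs.
ser = slot_error_rate(30, 50, 0, 1000)
```

Unlike accuracy-style metrics, SER can exceed 1.0 when a system inserts many spurious slots, which is why it is the conventional score for sparse-event detection tasks like SU boundary detection.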
    <Paragraph position="1"> Multiple hypotheses provide a reduction in error for both CTS test sets (significant at p &lt; .02 using the McNemar test), but give insignificant (and mixed) results for BN.</Paragraph>
    <Paragraph position="2"> The small increase in error for the BN evaluation set may be due to the fact that the 1-best parameters were tuned on different news shows than those represented in the evaluation data.</Paragraph>
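The McNemar test used for the significance claim above compares two systems on the same test items, using only the discordant pairs (items one system gets right and the other wrong). A minimal sketch of the exact two-sided version, using only the standard library (the function name and counts are illustrative):

```python
from math import comb

def mcnemar_exact_p(b, c):
    """Exact two-sided McNemar p-value.

    b: items system A scores correctly and system B scores incorrectly.
    c: the reverse. Concordant pairs do not enter the test.
    Under the null hypothesis, each discordant item is a fair coin flip,
    so the tail probability is binomial with p = 0.5.
    """
    n, k = b + c, min(b, c)
    p_one_sided = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * p_one_sided)
```

With, say, 1 versus 9 discordant items, the two-sided p-value is about 0.021, which illustrates the scale of evidence behind a "significant at p &lt; .02" claim on paired outputs.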
    <Paragraph position="3"> We expected a greater gain from the use of confusion networks in CTS than BN, given the previously shown impact of WER on 1-best SU detection. Additionally, incorporating a larger number of N-best hypotheses has improved results in all experiments so far, so we would expect this trend to continue for additional increases, but time constraints limited our ability to run these larger experiments. One possible explanation for the relatively small performance gains is that we constrained the confusion network topology so that there was no change in the word recognition results. We imposed this constraint in our initial investigations to allow us to compare performance using the same words. It is possible that better performance could be obtained by using confusion network topologies that link words and metadata.</Paragraph>
    <Paragraph position="4"> A more detailed breakdown of error improvement for the CTS development set is given in Table 2, showing that both recall and precision benefit from using the N-best framework. Including multiple hypotheses reduces the number of SU deletions (improves recall), but the primary gain is in reducing insertion errors (higher precision). The same effect holds for the CTS evaluation set.</Paragraph>
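The recall/precision decomposition described above follows directly from the deletion and insertion counts. A minimal sketch of the bookkeeping (the function name and counts are illustrative, not the paper's actual figures):

```python
def su_precision_recall(num_reference_sus, deletions, insertions):
    """Derive SU precision and recall from error counts.

    A deletion is a reference SU boundary the system missed;
    an insertion is a spurious boundary the system hypothesized.
    """
    correct = num_reference_sus - deletions          # boundaries found
    hypothesized = correct + insertions              # boundaries output
    precision = correct / hypothesized
    recall = correct / num_reference_sus
    return precision, recall

# Hypothetical: 1000 reference SUs, 50 missed, 30 spurious.
p, r = su_precision_recall(1000, 50, 30)
```

Under this accounting, fewer deletions raise recall and fewer insertions raise precision, which is exactly the pattern the paragraph above attributes to the N-best framework.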
  </Section>
</Paper>