<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2145">
  <Title>Text Segmentation with Multiple Surface Linguistic Cues</Title>
  <Section position="8" start_page="884" end_page="884" type="concl">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> In this paper, we described a method for identifying segment boundaries in Japanese text with the aid of multiple surface linguistic cues. We claimed that automatically training the weights used to combine multiple linguistic cues is an effective method for text segmentation. Furthermore, we presented multiple regression analysis with the stepwise method as a way of training the weights automatically without causing the overfitting problem. Although our experiments may be small-scale, they showed that our claims and our approach are promising. We think that we should experiment with larger datasets.</Paragraph>
    <Paragraph position="1"> As future work, we plan to calculate the weights for subsets of the texts by clustering the training texts. Since real texts may differ in ways that reflect their author, style, genre, etc., we think that clustering the training texts and calculating the weights for each cluster, rather than for the entire set of texts, might improve accuracy. In the area of speech recognition, clustering the training data is considered a promising method of automatic training for improving the accuracy of language models (Carter, 1994; Iyer et al., 1994). Carter presents a method for automatically clustering the sentences in a training corpus into subcorpora on the criterion of entropy reduction and calculating separate language model parameters for each cluster. He asserts that this kind of clustering offers a way to improve the performance of a model significantly.</Paragraph>
  </Section>
</Paper>