<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2914"> <Title>Word Distributions for Thematic Segmentation in a Support Vector Machine Approach</Title> <Section position="10" start_page="106" end_page="106" type="concl"> <SectionTitle> 7 Conclusions </SectionTitle> <Paragraph position="0"> We have introduced a new approach based on word distributions for performing thematic segmentation.</Paragraph> <Paragraph position="1"> The thematic segmentation task is modeled here as a binary classification problem, and support vector machine learning is adopted. In our experiments, we compare our approach with existing linear thematic segmentation systems reported in the literature by running them over three different data sets. When evaluated on real data, our approach either outperformed the other existing methods or performed comparably to the best. We view this as strong evidence that our approach provides a unified and robust framework for the thematic segmentation task. The results also suggest that word distributions themselves might be a good candidate for capturing the thematic shifts of text and that SVM learning can play an important role in building an adaptable correlation. Our experiments also show the sensitivity of a segmentation method to the type of corpus on which it is tested. For instance, the C99 algorithm, which achieves superior performance on a synthetic collection, performs quite poorly on the real-life data sets.</Paragraph> <Paragraph position="2"> While we have shown empirically that our technique can provide considerable gains by using single word distribution features, future work will investigate whether the system can be improved by exploiting other features derived, for instance, from syntactic, lexical and, when available, prosodic information.
If further annotated meeting data becomes available, it would also be interesting to replicate our experiments on a bigger data set in order to verify whether our system performance improves.</Paragraph> <Paragraph position="3"> Acknowledgments This work is partially supported by the Interactive Multimodal Information Management project (http://www.im2.ch/). Many thanks to the reviewers for their insightful suggestions. We are grateful to the International Computer Science Institute (ICSI), University of California for sharing the data with us. The authors also thank Michael Galley, who kindly provided us with the thematic annotations of the ICSI data.</Paragraph> </Section> </Paper>