<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1116"> <Title>Term Aggregation: Mining Synonymous Expressions using Personal Stylistic Variations</Title> <Section position="7" start_page="6" end_page="6" type="concl"> <SectionTitle> 6 Conclusion and Future Work </SectionTitle> <Paragraph position="0"> This paper describes how to use a coherent corpus for term aggregation. We exploited personal stylistic variations, based on the idea that one person mostly uses one expression for one meaning.</Paragraph> <Paragraph position="1"> Although variations in personal writing style are, in general, a cause of synonymous expressions, we managed to take advantage of such personal writing styles to reduce noise in the term aggregation system.</Paragraph> <Paragraph position="2"> Although we mainly discussed synonymous expressions in this paper, we can also extract abbreviations and frequently misspelled words, and these should be considered as terms in term aggregation. We have to consider not only role-based word similarities but also string-based similarities.</Paragraph> <Paragraph position="3"> In general, a wide range of variations in expressions for the same meaning is a problematic feature of noisy data. However, our method exploits these problematic variations as useful information for improving the accuracy of the system. This noise removal approach is effective when the data contains various expressions coming from various authors. Gasperin (Gasperin, 2001) indicated that specific prepositions are relevant for characterizing the significant syntactic contexts used in measuring word similarity. Considering which prepositions do and do not depend on personal writing style remains future work.</Paragraph> <Paragraph position="4"> Our work in this paper is based on call center logs, but the method is also suitable for data from other domains.
For example, we anticipate that patent application data will be a suitable resource, because such data includes various expressions and those expressions are based on each company's terminology. On the other hand, e-mail data does not seem suitable for our approach, because other authors influence the expressions used. While we restricted ourselves in this work to call center logs, our future work will include an investigation of the character of the data and how it influences our method.</Paragraph> </Section> </Paper>