<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2808"> <Title>Anomaly Detecting within Dynamic Chinese Chat Text</Title> <Section position="3" start_page="48" end_page="48" type="intro"> <SectionTitle> 2 Related Works </SectionTitle> <Paragraph position="0"> Earlier work (Xia et al., 2005a) implemented an SVM classifier to recognize anomalous chat text terms.</Paragraph> <Paragraph position="1"> A within-domain open test was conducted on chat text posted in March 2005. The SVM classifier was trained on five training sets containing chat text posted from December 2004 to February 2005. The experiments showed that the performance of the SVM classifier increases as the training period gets closer to the test period. This reveals that chat text is written in a style that changes quickly over time. Many anomalous chat terms that were popular last year are forgotten today, and new ones have replaced them. This makes the SVM-based pattern learning technique ineffective at reflecting the changes.</Paragraph> <Paragraph position="2"> The solution proposed in (Xia et al., 2005b) is to re-train the SVM classifier periodically. This requires considerable manual effort to produce timely chat text corpora, in which each piece of anomalous chat text must be manually annotated with several attributes.</Paragraph> <Paragraph position="3"> We argue that anomalous chat text can be identified using negative training samples drawn from static Chinese corpora. Our proposal is to model the standard natural language using standard Chinese corpora. We also incorporate a static chat text corpus to provide positive training samples that reflect the fundamental characteristics of anomalous chat text. 
We then apply the models to detect anomalous chat text by calculating confidence and entropy.</Paragraph> <Paragraph position="4"> Regarding the approaches proposed in this paper, we make two claims: 1) the approaches can achieve performance equivalent to the best produced by existing approaches; and 2) this performance can be achieved stably.</Paragraph> <Paragraph position="5"> We support these claims in the following sections.</Paragraph> </Section> </Paper>