<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1068"> <Title>Filtering Speaker-Specific Words from Electronic Discussions</Title> <Section position="5" start_page="0" end_page="0" type="evalu"> <SectionTitle> 4 Evaluation </SectionTitle> <Paragraph position="0"> The example presented in the last section pertains to a specific run of the clustering procedure. We now evaluate our system more generally by looking at clustering performance for a range of values of a0 (the number of centroids), and inspecting the implications of this performance with respect to a document retrieval task.

hp (1 cluster, 1825 documents, P=0.58, R=0.97, F=0.73): unable, connected, hat, entry, fix, configure, lpd, configuration, parallel, psc, kernel, configured, kurt, de, taylor, report, local, asnd@triumf.ca, grant, plain, debian, linuxprinting.org, officejet, instructions, letter, appears, update, called, extra, compile
tex (2 clusters, 375 documents, P=1.00, R=0.34, F=0.50): luecking, arkansas, http://www.tex.ac.uk. . . , herbert, piet, oostrum, university, heiko, lars, mathematical, department, voss, van, http://people.ee.eth. . . , sciences, madsen, rtfsignature, http://www.ctan.org/. . . , http://www.ams.org/t. . . , wilson, oberdiek, http://www.ctan.org/. . . , apr, examples, english, asnd@triumf.ca, chapter, rf@cl.cam.ac.uk, sincerely, private
photoshop (2 clusters, 1143 documents, P=0.95, R=0.95, F=0.95): gifford, million, jgifford@surewest.ne. . . , heinlein, www.nitrosyncretic.c. . . , john@stafford.net, america, urban, dragon, fey, imperial, created, hard, pictures, rgb, edjh, folder, face=3darial, tutorials, professional, comic, graphic, sketches, http://www.sover.net. . . , move, drive, wdflannery@aol.com, colors, buy, posted

hp (2 clusters, 1162 documents, P=0.97, R=0.93, F=0.95): lprng, connected, linuxprinting.org, kernel, red, psc, hat, configure, unable, configuration, configured, parallel, ljet, printtool, series, database, jobs, gimp-print, debian, entry, suse, cupsomatic, officejet, cat, perfectly, jetdirect, duplex, devices, kde, happens
tex (1 cluster, 1040 documents, P=0.98, R=0.91, F=0.95): arseneau, ctan, fairbairns, style, miktex, pdflatex, faq, chapter, apr, symbols, dvips, figures, title, include, math, bibtex, kastrup, university, examples, english, dvi, peter, plain, documents, contents, written, e.g, macro, robin, donald
photoshop (2 clusters, 1287 documents, P=0.88, R=0.98, F=0.93): tacit, james, gifford, folder, rgb, pictures, created, colors, tutorials, illustrator, window, tom, mask, money, whatever, newsgroup, drive, brush, plugin, professional, stafford, view, menu, palette, channel, graphic, pixel, ram
</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Coarse-Level Clustering </SectionTitle> <Paragraph position="0"> Figure 1 shows the overall clustering performance obtained without filtering (solid line) and with filtering (dashed line). The left-hand-side of the figure shows the average number of newsgroups matched to clusters, while the right-hand-side shows the overall performance (F-score) obtained. The error bars in the plots are averages of 100 repetitions of the clustering procedure described in Section 3.2 (with random initialisation of the centroids at the start of each run). 
The widths of the error bars indicate 95% confidence intervals for these averages.</Paragraph> <Paragraph position="1"> Hence, non-overlapping intervals correspond to a difference with p-value lower than 0.05.</Paragraph> <Paragraph position="2"> In Marom and Zukerman (2004), we show that the effect of the filtering mechanism on clustering performance depends on three factors: (1) the presence of signature words from dominant contributors; (2) the 'natural', topical overlap between the newsgroups; and (3) the level of granularity in the clustering, i.e. the number of centroids.</Paragraph> <Paragraph position="3"> The main conclusions with respect to the dataset presented here are as follows.</Paragraph> <Paragraph position="4"> • Firstly, there is a heavy presence of signature words in two of the newsgroups ('tex' and 'photoshop'; see Table 1), and therefore the filtering mechanism has a significant effect on this dataset as a whole. As can be seen in Figure 1, the performance (F-score) without filtering is poorer for all values of a0, and substantially more so for low values of a0. Although the clustering procedure without filtering is able to find three distinct newsgroups with a0 a4 a5, it requires a higher value of a0 to achieve satisfactory performance. This suggests that the signature words create undesirable overlaps between the clusters. In contrast, when filtering is used, the clustering procedure reaches its best performance with a0 a4 a1, where the performance is extremely good.</Paragraph> <Paragraph position="5"> • Secondly, the fact that the performance with filtering converges for such a low value of a0 suggests that there is little true topical overlap between the newsgroups. At the same time, the fact that the performance is significantly better with four centroids than with three suggests that there is some overlap, possibly created by a sub-topic of one of the newsgroups. 
That is, although there are only three newsgroups, four centroids are better at finding them than three centroids, because the fourth centroid may correspond to an overlap region between two clusters, which then gets assigned to the correct newsgroup.</Paragraph> <Paragraph position="6"> We can gain better insight into these results by inspecting the individual performance of the pooled clusters, particularly their precision and recall. Figure 2 shows the average performance of the pooled clusters separately for each of the three newsgroups. This figure confirms that the 'hp' newsgroup is the least affected by signature words: for low values of a0, without filtering, the average performance (F-score) of the pooled clusters corresponding to the 'hp' newsgroup is generally better than that of the clusters corresponding to the other newsgroups (and it even matches the performance achieved with filtering for a0 a4a3a2 ). This is particularly evident when we compare recall curves: recall for the 'hp' newsgroup without filtering reaches the recall obtained with filtering when a0 a4 a5. In contrast, precision achieves this level of performance only for higher values of a0, because some of the documents in the 'hp' newsgroup are confused with documents in the other two newsgroups.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Simple Information Retrieval </SectionTitle> <Paragraph position="0"> A desirable outcome for retrieval systems that perform document clustering prior to retrieval is that the returned clusters contain as much useful information as possible regarding a user's query. 
If the clustering is performed well, the words in the query should appear in many documents in the best-matching cluster(s).</Paragraph> <Paragraph position="1"> Our retrieval experiments consist of retrieving documents that match three simple queries, each comprising a word pair that occurs frequently in the newsgroups. As before, for each experiment we repeated the clustering procedure 100 times and averaged the results. Retrieval performance was measured as follows: (number of correct documents in the selected cluster) / (total number of correct documents in the dataset), where a correct document is one that contains all the words in a query, and the selected cluster is the one whose centroid has the highest average value for the query terms. That is, if a query comprises the words a40a5a4 a9a5a6 a4 a12 a6 a42 a42 a42 a6 a4a8a7 a44 , and cluster a37 has a cen- Our measure for retrieval performance considers only recall (i.e. how many correct documents were found for a particular query). It does not have a precision component, because the system retrieves only documents that contain all the words in the query.</Paragraph> <Paragraph position="2"> That is, precision is always perfect.</Paragraph> <Paragraph position="3"> According to Figure 2, the recall for the 'hp' newsgroup is equally high with and without filtering when a0 a46 a5, as opposed to the other newsgroups, where the recall is significantly better with filtering for all values of a0. We therefore chose a0 a4 a5 to evaluate retrieval, in order to expose the differences between the newsgroups.</Paragraph> <Paragraph position="4"> Table 4 shows the retrieval performance obtained for the three queries, when clustering is performed with and without filtering, and with a0 a4 a5. The table shows the average performance of the pooled clusters separately for each of the three newsgroups. Also shown for each query is the total number of documents in the dataset that contain all the words in the query. 
The average performance of the best-matching cluster is displayed in bold font, and the standard deviation appears in brackets next to the performance.</Paragraph> <Paragraph position="5"> The first query is related to the 'hp' newsgroup.</Paragraph> <Paragraph position="6"> The retrieval performance of the matching cluster for this query is high with and without the filtering mechanism (the difference in performance is not statistically significant). As discussed above, this result is expected, because the pooled cluster for this newsgroup obtains a similar recall score with and without filtering.</Paragraph> <Paragraph position="7"> Filtering has a more significant effect for the queries relating to the other newsgroups. Query 2 is very specific to the 'tex' newsgroup: when filtering is used, almost all the relevant documents are retrieved by the corresponding cluster. The benefit of filtering is clear from the poor retrieval performance obtained without it: 33% of the documents are missed (the p-value for the difference in retrieval score is a0 a3 a42 a3 a21 ). The third query has more ambiguity (the word 'colour' also appears in the 'hp' newsgroup), and therefore the overall retrieval performance is worse than for the other queries. About 17% of the documents were missed when filtering was used, most of which were allocated to the 'hp' newsgroup. Nevertheless, the filtering mechanism has a significant effect even for this ambiguous query (p-value = 0.03).</Paragraph> </Section> </Section> </Paper>