<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1049">
  <Title>Building Semantic Perceptron Net for Topic Spotting</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
6. Experiment and Discussion
</SectionTitle>
    <Paragraph position="0"> We employ the ModApte Split version of Reuters21578 corpus to test our method. In order to ensure that the training is meaningful, we select only those classes that have at least one document in each of the training and test sets. This results in 90 classes in both the training and test sets. After eliminating documents that do not belong to any of these 90 classes, we obtain a training set of 7,770 documents and a test set of 3,019 documents.</Paragraph>
    <Paragraph position="1"> From the set of training documents, we derive the set of semantic nodes for each topic using the procedures outlined in Section 4. From the training set, we found that the average number of semantic nodes for each topic is 132, and the average number of terms in each node is 2.4. For illustration, Table 1 lists some examples of the semantic nodes that we found. From table 1, we can draw the following general observations.</Paragraph>
    <Paragraph position="2">  a) Under the topic &amp;quot;wheat &amp;quot;, we list four semantic nodes. Node 1 contains the common attribute set of the topic. Node 2 is related to the &amp;quot;buying and selling of wheat&amp;quot;. Node 3 is related to &amp;quot;wheat production&amp;quot;; and node 4 is related to &amp;quot;the effects of insect on wheat production&amp;quot;. The results show that the automatically extracted basic semantic nodes are meaningful and are able to capture most semantics of a topic.</Paragraph>
    <Paragraph position="3"> b) Node 1 originally contains two terms &amp;quot;wheat&amp;quot; and &amp;quot;corn&amp;quot; that belong to the same synset found by looking up WordNet. However, in the training stage, the weight of the word &amp;quot;corn&amp;quot; was found to be very small in topic &amp;quot;wheat&amp;quot;, and hence it was removed from the semantic group. This is similar to the discourse based word sense disambiguation.</Paragraph>
    <Paragraph position="4"> c) The granularity of information expressed by the semantic nodes may not be the same as what human expert produces. For example, it is possible that a human expert may divide node 2 into two nodes {import} and {export, output}.</Paragraph>
    <Paragraph position="5"> d) Node 5 contains four words and is formed by analyzing context. Each context vector of the four words has the same two components: &amp;quot;price&amp;quot; and &amp;quot;digital number&amp;quot;. Meanwhile, &amp;quot;rise&amp;quot; and &amp;quot;fall&amp;quot; can also be grouped together by &amp;quot;antonym&amp;quot; relation. &amp;quot;fell&amp;quot; is actually the past tense of &amp;quot;fall&amp;quot;. This means that by comparing context, it is possible to group together those words with grammatical variations without performing grammatical analysis.</Paragraph>
    <Paragraph position="6"> Table 2 summarizes the results of SPN in terms of macro and micro F1 values (see Yang &amp; Liu (1999) for definitions of the macro and micro F1 values). For comparison purpose, the Table also lists the results of other TC methods as reported in Yang &amp; Liu (1999). From the table, it can be seen that the SPN method achieves the best macF1 value. This indicates that the method performs well on classes with a small number of training samples.</Paragraph>
    <Paragraph position="7"> In terms of the micro F1 measures, SPN out-performs NB, NNet, LSF and KNN, while posting a slightly lower performance than that of SVM.</Paragraph>
    <Paragraph position="8"> The results are encouraging as they are rather preliminary. We expect the results to improve further by tuning the system ranging from the initial values of various parameters, to the choice of error functions, context, grouping algorithm, and the structures of topic tree and SPN.</Paragraph>
  </Section>
class="xml-element"></Paper>