<?xml version="1.0" standalone="yes"?> <Paper uid="P05-1017"> <Title>Extracting Semantic Orientations of Words using Spin Model</Title> <Section position="5" start_page="136" end_page="138" type="evalu"> <SectionTitle> 5 Experiments </SectionTitle> <Paragraph position="0"> We used glosses, synonyms, antonyms and hypernyms of WordNet (Fellbaum, 1998) to construct an English lexical network. For part-of-speech tagging and lemmatization of glosses, we used TreeTagger (Schmid, 1994). Thirty-five stopwords (quite frequent words such as &quot;be&quot; and &quot;have&quot;) were removed from the lexical network. We used a list of 33 negation words and phrases. In addition to usual negation words such as &quot;not&quot; and &quot;never&quot;, we included words and phrases that express negation in a general sense, such as &quot;free from&quot; and &quot;lack of&quot;. The whole network consists of approximately 88,000 words. We collected 804 conjunctive expressions from the Wall Street Journal and Brown corpora as described in Section 4.2.</Paragraph> <Paragraph position="1"> The labeled dataset used as a gold standard is the General Inquirer lexicon (Stone et al., 1966), as in the work by Turney and Littman (2003). We extracted the words tagged with &quot;Positiv&quot; or &quot;Negativ&quot;, and reduced multiple-entry words to single entries. As a result, we obtained 3596 words (1616 positive words and 1980 negative words). (Footnote 1: Although we preprocessed in the same way as Turney and Littman, there is a slight difference between their dataset and ours. However, we believe this difference is insignificant.) In the computation of accuracy, seed words are eliminated from these 3596 words.</Paragraph> <Paragraph position="2"> [Table 1 caption: classification accuracies with various networks and four different sets of seed words. In the parentheses, the predicted value of β is given. For cv, no value is given for β, since 10 different values are obtained.]</Paragraph> <Paragraph position="3"> We conducted experiments with different values of β from 0.1 to 2.0, at intervals of 0.1, and predicted the best value as explained in Section 4.3. The threshold of the magnetization for hyper-parameter estimation is set to 1.0 × 10^-5. That is, the predicted optimal value of β is the largest β whose corresponding magnetization does not exceed the threshold value.</Paragraph>
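The hyper-parameter prediction just described can be made concrete with a short sketch. The Python code below is a minimal illustration, assuming a mean-field style update in which each node's average spin is repeatedly set to tanh(β Σ_j w_ij x_j) with seed nodes clamped to +1 or -1; the update rule, the random toy network, and all function and variable names are illustrative assumptions for this sketch, not the authors' exact formulation.

```python
import numpy as np

def mean_field_magnetization(W, seeds, beta, n_iter=200):
    """Approximate average spins on a weighted lexical network.

    W     : symmetric (n x n) weight matrix of the lexical network
    seeds : dict {node_index: +1 or -1} of seed orientations (kept fixed)
    beta  : inverse-temperature hyper-parameter
    Returns |magnetization|, the absolute mean of the average spins.
    """
    n = W.shape[0]
    x = np.zeros(n)
    for i, s in seeds.items():
        x[i] = s
    for _ in range(n_iter):
        x = np.tanh(beta * (W @ x))      # mean-field style update
        for i, s in seeds.items():       # clamp seed words
            x[i] = s
    return abs(x.mean())

def predict_beta(W, seeds, betas, threshold=1.0e-5):
    """Pick the largest beta whose magnetization does not exceed the threshold."""
    best = None
    for beta in betas:                   # betas assumed in increasing order
        if mean_field_magnetization(W, seeds, beta) <= threshold:
            best = beta
    return best

# Toy usage on a small random network (purely illustrative; far too small
# to exhibit the phase transition discussed in the paper).
rng = np.random.default_rng(0)
A = (rng.random((50, 50)) < 0.1).astype(float)
W = np.triu(A, 1)
W = W + W.T                              # symmetric, unweighted links
seeds = {0: +1.0, 1: -1.0}               # e.g. "good" = +1, "bad" = -1
betas = [round(0.1 * k, 1) for k in range(1, 21)]   # 0.1, 0.2, ..., 2.0
print(predict_beta(W, seeds, betas))
```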
<Paragraph position="4"> We performed 10-fold cross validation as well as experiments with fixed seed words. The fixed seed words are the ones used by Turney and Littman: 14 seed words {good, nice, excellent, positive, fortunate, correct, superior, bad, nasty, poor, negative, unfortunate, wrong, inferior}; 4 seed words {good, superior, bad, inferior}; 2 seed words {good, bad}.</Paragraph> <Section position="1" start_page="136" end_page="137" type="sub_section"> <SectionTitle> 5.1 Classification Accuracy </SectionTitle> <Paragraph position="0"> Table 1 shows the accuracy values of semantic orientation classification for four different sets of seed words and various networks. In the table, cv corresponds to the result of 10-fold cross validation, in which case we use the pseudo leave-one-out error for hyper-parameter estimation, while in the other cases we use magnetization.</Paragraph> <Paragraph position="1"> In most cases, the synonyms and the cooccurrence information from the corpus improve accuracy. The only exception is the case of 2 seed words, in which G performs better than GT. One possible reason for this inversion is that the computation is trapped in a local optimum, since a small number of seed words leaves a relatively large degree of freedom in the solution space, resulting in more local optima.</Paragraph> <Paragraph position="2"> [Table 2 caption: results with various networks and four different sets of seed words. In the parentheses, the actual best value of β is given, except for cv.]</Paragraph> <Paragraph position="3"> We compare our results with Turney and Littman's results. With 14 seed words, they achieved 61.26% for a small corpus (approx. 1 × 10^7 words), 76.06% for a medium-sized corpus (approx. 2 × 10^9 words), and 82.84% for a large corpus (approx. 1 × 10^11 words).</Paragraph> <Paragraph position="4"> Without a corpus or a thesaurus (but with glosses in a dictionary), we obtained accuracy comparable to Turney and Littman's with a medium-sized corpus. When we enhance the lexical network with the corpus and the thesaurus, our result is comparable to Turney and Littman's with a large corpus.</Paragraph> </Section> <Section position="2" start_page="137" end_page="137" type="sub_section"> <SectionTitle> 5.2 Prediction of β </SectionTitle> <Paragraph position="0"> We examine how accurately our prediction method for β works by comparing Table 1 above and Table 2 below. Our method predicts a good β quite well, especially for 14 seed words. For small numbers of seed words, our method using magnetization tends to predict a slightly larger value.</Paragraph> <Paragraph position="1"> We also plot magnetization and accuracy in Figure 1. We can see that a sharp change of magnetization occurs at around β = 1.0 (phase transition). At almost the same point, the classification accuracy reaches its peak.</Paragraph> </Section> <Section position="3" start_page="137" end_page="138" type="sub_section"> <SectionTitle> 5.3 Precision for the Words with High Confidence </SectionTitle> <Paragraph position="0"> We next evaluate the proposed method in terms of precision for the words that are classified with high confidence. We regard the absolute value of each average as a confidence measure and evaluate the top words with the highest absolute values of averages.</Paragraph> <Paragraph position="1"> The result of this experiment is shown in Figure 2, for 14 seed words as an example. The top 1000 words achieved more than 92% accuracy. This result shows that the absolute value of each average can work as a confidence measure of classification.</Paragraph> <Paragraph position="2"> [Table 3 caption: comparison between the proposed method and the shortest-path method. Column headers: seeds, proposed, shortest path.]</Paragraph> </Section> <Section position="4" start_page="138" end_page="138" type="sub_section"> <SectionTitle> 5.4 Comparison with other methods </SectionTitle> <Paragraph position="0"> In order to further investigate the model, we conduct experiments in restricted settings.</Paragraph> <Paragraph position="1"> We first construct a lexical network using only synonyms. We compare the spin model with the shortest-path method proposed by Kamps et al. (2004) on this network, because the shortest-path method cannot incorporate the negative links of antonyms. We also restrict the test data to 697 adjectives, which is the number of examples to which the shortest-path method can assign a non-zero orientation value. Since the shortest-path method is designed for 2 seed words, the method is extended to use the average shortest-path lengths for 4 seed words and 14 seed words. Table 3 shows the result.</Paragraph>
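For reference, the following is a minimal sketch of how a shortest-path baseline can be extended from 2 seed words to larger seed sets by averaging path lengths, as described above. The unweighted synonym graph, the breadth-first distance, the nearest-average-seed decision rule, and all names here are illustrative assumptions for this sketch, not necessarily the exact formulation of Kamps et al. (2004).

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source in an unweighted synonym graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def classify_by_avg_path(graph, word, pos_seeds, neg_seeds):
    """Assign +1/-1 by comparing average path length to positive vs. negative seeds.

    Returns 0 when the word cannot reach every seed (or the averages tie),
    mirroring the restriction to words that receive a non-zero orientation value.
    """
    dist = bfs_distances(graph, word)
    pos = [dist[s] for s in pos_seeds if s in dist]
    neg = [dist[s] for s in neg_seeds if s in dist]
    if len(pos) < len(pos_seeds) or len(neg) < len(neg_seeds):
        return 0
    avg_pos = sum(pos) / len(pos)
    avg_neg = sum(neg) / len(neg)
    if avg_pos == avg_neg:
        return 0
    return 1 if avg_pos < avg_neg else -1

# Toy synonym graph (adjacency lists); words and links are purely illustrative.
graph = {
    "good": ["nice", "superior"],
    "nice": ["good", "pleasant"],
    "pleasant": ["nice"],
    "superior": ["good", "poor"],   # cross link keeps the toy graph connected
    "poor": ["superior", "inferior"],
    "inferior": ["poor", "bad"],
    "bad": ["inferior", "nasty"],
    "nasty": ["bad"],
}
print(classify_by_avg_path(graph, "pleasant",
                           ["good", "superior"], ["bad", "inferior"]))  # -> 1 (positive)
```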
<Paragraph position="2"> Since the two methods differ only in their algorithms, we can conclude that the global optimization of the spin model works well for semantic orientation extraction.</Paragraph> <Paragraph position="3"> We next compare the proposed method with the simple bootstrapping method proposed by Hu and Liu (2004). We construct a lexical network using synonyms and antonyms. We restrict the test data to 1470 adjectives for the comparison of methods. The result in Table 4 also shows that the global optimization of the spin model works well for semantic orientation extraction.</Paragraph> <Paragraph position="4"> We also tested the shortest-path method and the bootstrapping method on GTC and GT, and obtained low accuracies, as expected from the discussion in Section 4.</Paragraph> </Section> <Section position="5" start_page="138" end_page="138" type="sub_section"> <SectionTitle> 5.5 Error Analysis </SectionTitle> <Paragraph position="0"> We investigated a number of errors and found that they fall mainly into three types.</Paragraph> <Paragraph position="1"> One is the ambiguity of word senses. For example, one of the glosses of &quot;costly&quot; is &quot;entailing great loss or sacrifice&quot;. The word &quot;great&quot; here means &quot;large&quot;, although it usually means &quot;outstanding&quot; and is positively oriented.</Paragraph> <Paragraph position="2"> Another is the lack of structural information. For example, &quot;arrogance&quot; means &quot;overbearing pride evidenced by a superior manner toward the weak&quot;. Although &quot;arrogance&quot; is mistakenly predicted as positive due to the word &quot;superior&quot;, what is superior here is the &quot;manner&quot;.</Paragraph> <Paragraph position="3"> The last one is idiomatic expressions. For example, although &quot;brag&quot; means &quot;show off&quot;, neither &quot;show&quot; nor &quot;off&quot; has a negative orientation. Idiomatic expressions often do not inherit the semantic orientation from or to the words in the gloss.</Paragraph> <Paragraph position="4"> The current model cannot deal with these types of errors. We leave their solutions as future work.</Paragraph> </Section> </Section> </Paper>