<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1642">
  <Title>Fully Automatic Lexicon Expansion for Domain-oriented Sentiment Analysis</Title>
  <Section position="8" start_page="360" end_page="362" type="evalu">
    <SectionTitle>
6 Evaluation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="360" end_page="361" type="sub_section">
      <SectionTitle>
6.1 Evaluation by Polar Atoms
</SectionTitle>
      <Paragraph position="0"> First, we propose a method for evaluating the lexical learning.</Paragraph>
      <Paragraph position="1"> [Table 7 caption, truncated in the source: "...ments of 200 polar atoms. κ=0.83."] It is costly to build large, consistent 'gold standards' in multiple domains, especially for identification tasks such as clause-level SA (cf. classification tasks). Therefore, we evaluated the learning results by asking human annotators to classify the acquired polar atoms as positive, negative, or neutral, rather than judging the instances of polar clauses detected with the new lexicon. This is possible because the polar atoms themselves are informative enough for humans to tell whether the expressions carry positive or negative meanings in the domain.</Paragraph>
      <Paragraph position="2"> To justify the reliability of this evaluation method, two annotators (footnote 9) evaluated 200 randomly selected candidate polar atoms in the digital camera domain. The agreement results are shown in Table 7. The manual classification was agreed upon in 89% of the cases and the Kappa value was 0.83, which is high enough to be considered consistent.</Paragraph>
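The agreement figures above can be reproduced in outline with a short calculation. The following is a minimal sketch that assumes the reported Kappa is Cohen's kappa for two annotators; the function name cohen_kappa and the toy labels are hypothetical and are not the paper's data.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical toy judgments (assumption, not taken from Table 7).
ann1 = ["pos", "pos", "neg", "neg", "neu", "pos", "neg", "neu"]
ann2 = ["pos", "pos", "neg", "neu", "neu", "pos", "neg", "neg"]
print(round(cohen_kappa(ann1, ann2), 3))
```

Kappa discounts the agreement expected by chance, which is why it is lower than the raw agreement rate (0.83 against 89% in the text).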
      <Paragraph position="3"> Using manual judgment of the polar atoms, we evaluated the performance with the following three metrics.</Paragraph>
      <Paragraph position="4"> Type Precision. The rate at which the polarity of an acquired polar atom coincides with the human evaluators' judgment. An atom is always counted as incorrect if the evaluators judged it as 'neutral.'
 Token Precision. The same coincidence rate, weighted by each atom's frequency in the corpus. This metric emulates the precision of polar clause detection with the newly acquired polar atoms in the runtime SA system.</Paragraph>
      <Paragraph position="5"> Relative Recall. The estimated ratio of the number of polar clauses detected with the expanded lexicon to the number of polar clauses detected with the initial lexicon. Relative recall is 1 when no new polar atom is acquired. Since the precision was high enough, this metric can be used to approximate the recall, which is hard to evaluate in extraction tasks such as clause-/phrase-level SA.</Paragraph>
      <Paragraph position="6"> [Table caption fragment: "The column '#' denotes the number of polar atoms acquired in each domain."]</Paragraph>
      <Paragraph position="7"> [Footnote 9: For each domain, we asked different annotators who are familiar with the domain. They are not the authors of this paper.]</Paragraph>
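The three metrics defined above can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the dictionary keys acquired, human, and freq, the function names, and the sample values are all hypothetical.

```python
def type_precision(atoms):
    """Share of acquired atoms whose polarity matches the human judgment."""
    # An acquired polarity is "pos" or "neg", so a human judgment of "neu"
    # never matches and the atom is counted as incorrect.
    correct = sum(1 for a in atoms if a["acquired"] == a["human"])
    return correct / len(atoms)

def token_precision(atoms):
    """Same comparison, with each atom weighted by its corpus frequency."""
    total = sum(a["freq"] for a in atoms)
    correct = sum(a["freq"] for a in atoms if a["acquired"] == a["human"])
    return correct / total

def relative_recall(n_expanded, n_initial):
    """Clauses detected with the expanded lexicon over those with the initial one."""
    return n_expanded / n_initial

# Hypothetical judged atoms (assumption, for illustration only).
atoms = [
    {"acquired": "pos", "human": "pos", "freq": 40},
    {"acquired": "neg", "human": "neg", "freq": 30},
    {"acquired": "neg", "human": "neu", "freq": 5},
    {"acquired": "pos", "human": "neg", "freq": 5},
]
print(type_precision(atoms), token_precision(atoms), relative_recall(128, 100))
```

In this toy sample the frequent atoms are the correct ones, so token precision exceeds type precision, which is the situation the token-weighted metric is meant to capture.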
    </Section>
    <Section position="2" start_page="361" end_page="362" type="sub_section">
      <SectionTitle>
6.2 Robustness for Different Conditions
6.2.1 Diversity of Corpora
</SectionTitle>
      <Paragraph position="0"> For each of the four domain corpora, the annotators evaluated 100 randomly selected polar atoms that were newly acquired by our method, to measure the precision. Relative recall was estimated by comparing the numbers of polar clauses detected in 2,000 randomly selected sentences, with and without the acquired polar atoms. Table 8 shows the results. The token precision is higher than 90% in all of the corpora, including the movie domain, which is considered to be difficult for SA (Turney, 2002). This is extremely high precision for this task, because the correctness of both the extraction and the polarity assignment was evaluated simultaneously. The relative recall of 1.28 in the digital camera domain means that the recall increased from 43% (footnote 10) to 55%. The difference was smaller in other domains, but domain-dependent polar clauses are much more informative than general ones, and thus the high-precision detection significantly enhances the system.</Paragraph>
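The step from a relative recall of 1.28 to a recall of 55% can be checked with one multiplication. The sketch below assumes that the set of true polar clauses is fixed and that precision is high enough for detected clauses to approximate correct ones; the variable names are hypothetical.

```python
initial_recall = 0.43   # recall with the initial lexicon, as reported in the text
relative_recall = 1.28  # reported for the digital camera domain

# With a fixed set of true polar clauses, recall scales with the number of
# (correctly) detected clauses, so the two figures multiply.
expanded_recall = initial_recall * relative_recall
print(f"{expanded_recall:.0%}")
```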
      <Paragraph position="1"> To see the effects of our method, we conducted a control experiment that used pre-set criteria. For a candidate atom a to be adopted, the frequency of polarity, max(p(a), n(a)), was required to be 3 or more, and the ratio of polarity, max(p(a), n(a)) / f(a), was required to be higher than the threshold th. Varying th from 0.05 to 0.8, we evaluated the token precision and the relative recall in the domains of digital cameras and movies. Figure 4 shows the results.</Paragraph>
      <Paragraph position="2"> [Footnote 10, truncated in the source: "The human evaluation result for digital camera do-"] [Figure 4 caption, truncated in the source: "... with various preset threshold values th for the digital camera and movie domains. The right-most star and circle denote the performance of our method."]</Paragraph>
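The pre-set criteria of the control experiment can be sketched as a simple filter. This is a minimal illustration that reads the ratio of polarity as max(p(a), n(a)) divided by f(a) and assumes f(a) is positive; the function name, the candidate atoms, and their counts are hypothetical.

```python
def adopt_by_threshold(p, n, f, th, min_count=3):
    """Pre-set criterion: enough polar occurrences and a high enough polarity ratio."""
    polar = max(p, n)  # frequency of the dominant polarity
    return polar >= min_count and polar / f > th

# Hypothetical candidates as (p, n, f): positive count, negative count,
# and total frequency of the atom (assumed values, not from the paper).
candidates = {
    "atom_a": (12, 1, 30),
    "atom_b": (2, 0, 3),
    "atom_c": (0, 6, 100),
}

for th in (0.05, 0.1, 0.3, 0.8):
    adopted = [name for name, (p, n, f) in candidates.items()
               if adopt_by_threshold(p, n, f, th)]
    print(th, adopted)
```

A frequent atom with a low polarity ratio, such as atom_c here, is dropped as soon as th rises, which illustrates why a fixed threshold needs tuning per domain.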
      <Paragraph position="3"> The results showed both relative recall and token precision were lower than in our method for every th, in both corpora. The optimum th was 0.3 in the movie domain and 0.1 in the digital camera domain. Therefore, in this pre-set approach, a tuning process is necessary for each domain. Our method does not require this tuning, and thus fully automatic learning was possible.</Paragraph>
      <Paragraph position="4"> Unlike in the usual precision-recall tradeoff, the token precision in the movie domain decreased when th was strict. This is due to frequent polar atoms that can be acquired at low polarity ratios. Our method does not discard these important polar atoms.</Paragraph>
      <Paragraph position="5">  We also tested the performance while varying the size of the initial lexicon L. We prepared three subsets of the initial lexicon, L0.8, L0.5, and L0.2, by removing polar atoms randomly. These lexicons had 0.8, 0.5, and 0.2 times as many polar atoms as L, respectively. Table 9 shows the precision and recall obtained using these lexicons for the learning process.</Paragraph>
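The construction of the reduced lexicons can be sketched as random subsampling. The text does not say how the random removal was carried out, so the sampling procedure, the fixed seed, the function name, and the placeholder atoms below are all assumptions.

```python
import random

def subsample_lexicon(lexicon, ratio, seed=0):
    """Keep a random `ratio` share of the atoms; the rest are removed."""
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    k = round(len(lexicon) * ratio)
    # Sort first so the result does not depend on set iteration order.
    return set(rng.sample(sorted(lexicon), k))

# Hypothetical stand-in for the initial lexicon L.
L = {f"atom_{i}" for i in range(100)}
subsets = {ratio: subsample_lexicon(L, ratio) for ratio in (0.8, 0.5, 0.2)}
for ratio, subset in subsets.items():
    print(ratio, len(subset))
```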
      <Paragraph position="6"> Though the cd values vary, the precision was stable, which means that our method was robust even for different sizes of the lexicon. The smaller the initial lexicon, the higher the relative recall, because the polar atoms that were removed from L were recovered in the learning process. This result suggests the possibility of the bootstrapping method from a small initial lexicon. [Table caption, truncated in the source: "... the initial lexicon (the digital camera domain)."]</Paragraph>
    </Section>
    <Section position="3" start_page="362" end_page="362" type="sub_section">
      <SectionTitle>
6.3 Qualitative Evaluation
</SectionTitle>
      <Paragraph position="0"> As seen in the agreement study, the polar atoms used in our study were intrinsically meaningful to humans. This is because the atoms are predicate-argument structures derived from predicative clauses, and thus humans could imagine the meaning of a polar atom by generating the corresponding sentence in its predicative form.</Paragraph>
      <Paragraph position="1"> In the evaluation process, some interesting results were observed. For example, a negative atom nai - kerare-ga ('to be free from vignetting') was acquired in the digital camera domain. Even the evaluator who was familiar with digital cameras did not know the term kerare ('vignetting'), but after looking it up in a dictionary she labeled it as negative. Our learning method could pick up such technical terms and label them appropriately.</Paragraph>
      <Paragraph position="2"> Also, there were discoveries in the error analysis. An evaluator assigned positive to aru - kamera-ga ('to have camera') in the mobile phone domain, but the acquired polar atom had the negative polarity. This was actually an insight from the recent opinions that many users want phones without camera functions (footnote 11).</Paragraph>
    </Section>
  </Section>
</Paper>