<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1071"> <Title>Deeper Sentiment Analysis Using Machine Translation Technology</Title> <Section position="6" start_page="0" end_page="0" type="evalu"> <SectionTitle> 5 Evaluation </SectionTitle> <Paragraph position="0"> We conducted two experiments on the extraction of sentiment units from bulletin boards on the WWW that discuss digital cameras. A total of 200 randomly selected sentences were analyzed by our system. The resources were created by looking at other parts of the same domain texts, and therefore this experiment is an open test.</Paragraph> <Paragraph position="1"> Experiment 1 measured the precision of the sentiment polarity, and Experiment 2 evaluated the informativeness of the sentiment units. In this section we handle only the [fav] and [unf] sentiments; the other two, [qst] and [req], were not evaluated.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.1 Experiment 1: Precision and Recall </SectionTitle> <Paragraph position="0"> To assess the reliability of the extracted sentiment polarities, we evaluated the following three metrics: Weak precision: The coincidence rate of the sentiment polarity between the system's output and the manual output when both the system and the human evaluators assigned either a favorable or unfavorable sentiment.</Paragraph> <Paragraph position="1"> Strong precision: The coincidence rate of the sentiment polarity between the system's output and the manual output when the system assigned either a favorable or unfavorable sentiment.</Paragraph> <Paragraph position="2"> Recall: The detection rate of sentiment units within the manual output.</Paragraph> <Paragraph position="3"> These metrics were measured using two methods: (A) our proposed method based on the machine translation engine, and (B) the lexicon-only method, which emulates the shallow parsing approach. 
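The three metrics above can be sketched in code as follows (a minimal illustration, not the authors' implementation; the labels and the reading of the recall numerator as "units the system also detected" are assumptions):

```python
# A sketch of weak precision, strong precision, and recall as defined above.
# Each evaluated expression is a (human, system) label pair:
# 'f' = favorable, 'u' = unfavorable, 'n' = non-sentiment.

def evaluate(pairs):
    sys_polar = [(h, s) for h, s in pairs if s in ('f', 'u')]     # system assigned a polarity
    both_polar = [(h, s) for h, s in sys_polar if h in ('f', 'u')] # human did too
    agree = [(h, s) for h, s in both_polar if h == s]              # polarities coincide
    human_polar = [(h, s) for h, s in pairs if h in ('f', 'u')]    # manual sentiment units

    weak_p = len(agree) / len(both_polar) if both_polar else 0.0
    strong_p = len(agree) / len(sys_polar) if sys_polar else 0.0
    # "Detection rate within the manual output", read here as the fraction of
    # human-annotated units for which the system also output a polarity.
    recall = len(both_polar) / len(human_polar) if human_polar else 0.0
    return weak_p, strong_p, recall

pairs = [('f', 'f'), ('f', 'u'), ('u', 'u'), ('n', 'f'), ('f', 'n'), ('n', 'n')]
weak_p, strong_p, recall = evaluate(pairs)
```

Note that weak precision conditions on agreement of *detection* by both sides, so it is never lower than strong precision, which also penalizes units the human judged non-sentiment.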
The latter method used a simple polarity lexicon of adjectives and verbs, in which each adjective or verb had only one sentiment polarity, so no disambiguation was done. [Table 2 caption fragment: "... of the manual output and the system output, respectively (f: favorable, n: non-sentiment, u: unfavorable). The sum of the bold numbers equals the numerators of the precision and recall."]</Paragraph> <Paragraph position="4"> Except for the direct negation of an adjective or a verb5, no translation patterns were used. Instead of the top-down pattern matching, sentiment units were extracted from any part of the tree structures (the results of full parsing were also used here).</Paragraph> <Paragraph position="5"> Table 1 shows the results. With the MT framework, the weak precision was perfect and the strong precision was much higher, while the recall was lower than for the lexicon-only method. The breakdowns in the two parts of Table 2 indicate that most of the errors in which the system wrongly assigned a sentiment (i.e., the human evaluator regarded the expression as non-sentiment) were eliminated by the MT framework.</Paragraph> <Paragraph position="6"> All of the above results are consistent with intuition. The MT method outputs a sentiment unit only when the expression is reachable from the root node of the syntactic tree through the combination of sentiment fragments, while the lexicon-only method picks up sentiment units from any node in the syntactic tree. Sentence (6) is an example where the lexicon-only method output the wrong sentiment unit (6a). The MT method did not output this sentiment unit, so its precision did not suffer from this example.</Paragraph> <Paragraph position="7"> ... gashitsu-ga kirei-da-to iu hyouka-ha uke-masen-deshi-ta. (6) 'There was no opinion that the picture was sharp.' / [fav] clear ⟨picture⟩ (6a) In the lexicon-only method, some errors occurred due to the ambiguity in the sentiment polarity of an adjective or a verb, e.g. &quot;Kanousei-ga takai. 
(Capabilities are high.)&quot;, since 'takai (high/expensive)' is always assigned the [unf] feature.</Paragraph> <Paragraph position="8"> 5 &quot;He doesn't like it.&quot; is regarded as direct negation, but &quot;I don't think it is good.&quot; is not.</Paragraph> <Paragraph position="9"> Figure 10: A naive predicate-argument structure used by the system (C). Nouns preceding the three major postpositional particles 'ga', 'wo', and 'ni' are supported as the argument slots. In the system (A), on the other hand, there are over 3,000 principal patterns that carry information on the appropriate combinations for each verb and adjective.</Paragraph> <Paragraph position="10"> [Table 3 caption: The numbers are the counts of the better outputs for each system among the 35 sentiment units. The remainder are outputs that were the same in both systems.]</Paragraph> <Paragraph position="11"> The recall was not so high, especially for the MT method, but according to our error analysis it can be increased by adding auxiliary patterns. On the other hand, it is almost impossible to increase the precision without our deep analysis techniques. 
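The single-polarity lookup of the lexicon-only baseline, and the 'takai' error discussed above, can be sketched as follows (the lexicon entries and function name are illustrative assumptions, not the authors' resource):

```python
# Illustrative sketch of the lexicon-only baseline (B): each adjective or verb
# carries exactly one fixed polarity, so no disambiguation is possible.
# The entries below are assumptions for illustration, not the paper's lexicon.
POLARITY_LEXICON = {
    'takai': 'u',       # 'high/expensive' -- fixed as unfavorable
    'kirei': 'f',       # 'clear/pretty'
    'nozomashii': 'f',  # 'desirable'
}

def lexicon_only_polarity(word, negated=False):
    """Look up the fixed polarity; flip it only under direct negation."""
    p = POLARITY_LEXICON.get(word)
    if p is None:
        return None  # not in the lexicon: treated as non-sentiment
    return {'f': 'u', 'u': 'f'}[p] if negated else p

# 'Kanousei-ga takai.' (Capabilities are high.) reads as favorable to a
# human, but the fixed entry forces an unfavorable label:
label = lexicon_only_polarity('takai')
```

Because the lookup ignores the argument ('kanousei'), no context can rescue the wrong fixed polarity; this is exactly the disambiguation the translation patterns of method (A) provide.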
Consequently, our proposed method outperforms the shallow (lexicon-only) approach.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.2 Experiment 2: Scope of Sentiment Unit </SectionTitle> <Paragraph position="0"> We also compared the appropriateness of the scope of the extracted sentiment units between (A) the proposed method with the MT framework and (C) a method that supports only naive predicate-argument structures as in Figure 10 and does not use any nominal patterns.</Paragraph> <Paragraph position="1"> According to the results shown in Table 3, the MT method produced less redundant or more informative sentiment units than the naive predicate-argument structures in about half of the 35 extracted sentiment units.</Paragraph> <Paragraph position="2"> Example (7) is a case where the sentiment unit output by the MT method (7a) was less redundant than that output by the naive method (7b). The translation engine understood that the phrase 'kyonen-no 5-gatsu-ni (last May)' held temporal information, so it was excluded from the arguments of the predicate 'enhance', while both 'function' and 'May' were arguments of 'enhance' in (7b). Apparently the argument 'May' is not necessary here.</Paragraph> <Paragraph position="3"> ... kyonen-no 5-gatsu-ni kinou-ga kairyou-sare-ta you-desu. (7) 'It seems the function was enhanced last May.' [fav] enhance ⟨function⟩ (7a) ? [fav] enhance ⟨function, May⟩ (7b) Example (8) is another case where the sentiment unit output by the MT method (8a) was more informative than that output by the naive method (8b). The modifier 'zoom' was more informative than the Japanese functional noun 'hou' it modifies. The MT method successfully selected the noun 'zoom' as the argument of 'desirable'.</Paragraph> <Paragraph position="4"> ... zuum-no hou-ga nozomashii. (8) 'A zoom is more desirable.' [fav] desirable ⟨zoom⟩ (8a) ? 
[fav] desirable ⟨hou⟩ (8b) The only case we encountered where the MT method extracted a less informative sentiment unit was the sentence &quot;Botan-ga satsuei-ni pittari-desu (The shutter is suitable for taking photos)&quot;. The naive method could produce the sentiment unit &quot;[fav] suitable ⟨shutter, photo⟩&quot;, but the MT method created &quot;[fav] suitable ⟨shutter⟩&quot;. This is due to the lack of a noun phrase preceding the postpositional particle 'ni' in the principal pattern. Such problems can be avoided by modifying the patterns; thus the effectiveness of the combination of patterns for SA has been shown here.</Paragraph> </Section> </Section> </Paper>