<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-2011">
<Title>Automatic Detection of Opinion Bearing Words and Sentences</Title>
<Section position="6" start_page="64" end_page="65" type="evalu">
<SectionTitle> 5 Results </SectionTitle>
<Paragraph position="0"> We tested our system on three different data sets.</Paragraph>
<Paragraph position="1"> First, we ran the system on the MPQA data provided by ARDA. Second, we participated in the novelty track of TREC 2003. Third, we ran it on our own test data described in Section 4.2.</Paragraph>
<Section position="1" start_page="64" end_page="64" type="sub_section">
<SectionTitle> 5.1 MPQA Test </SectionTitle>
<Paragraph position="0"> The MPQA corpus contains news articles manually annotated using an annotation scheme for subjectivity (opinions and other private states that cannot be directly observed or verified (Quirk et al., 1985), such as beliefs, emotions, sentiment, speculation, etc.). This corpus was collected and annotated as part of the summer 2002 NRRC Workshop on Multi-Perspective Question Answering (MPQA) (Wiebe et al., 2003), sponsored by ARDA. It contains 535 documents and 10,657 sentences.</Paragraph>
<Paragraph position="1"> The annotation scheme contains two main components: a type of explicit private state and speech event, and a type of expressive subjective element. Several detailed attributes and strengths are annotated as well. More details are provided in (Riloff et al., 2003).</Paragraph>
<Paragraph position="2"> Subjective sentences are defined according to their attributes and strength. In order to apply our system at the sentence level, we followed the MPQA definition of subjective sentences. The GATE_on annotation is used to mark speech events and direct expressions of private states.</Paragraph>
<Paragraph position="3"> The onlyfactive attribute indicates whether the source of the private state or speech event is indeed expressing an emotion, opinion, or other private state. The GATE_expressivesubjectivity annotation marks words and phrases that indirectly express a private state.</Paragraph>
<Paragraph position="4"> In our experiments, our system performed relatively well in both precision and recall. We interpret our opinion markers as coinciding with (enough of) the &quot;subjective&quot; words of MPQA. In order to see the relationship between the number of opinion-bearing words in a sentence and its classification by MPQA as subjective, we varied the threshold number of opinion-bearing words required for subjectivity. Table 4 shows accuracy, precision, and recall according to the list used and the threshold value t.</Paragraph>
<Paragraph position="5"> The random row shows the average of ten runs of randomly assigning sentences as either subjective or objective. As we can see from Table 4, our word list, which is the combination of Collection1 and Collection2, achieved higher accuracy and precision than the Columbia list. However, the Columbia list achieved higher recall than ours. For a fair comparison, we took the top 10,682 opinion-bearing words from each list and ran the same sentence classifier system.</Paragraph>
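<Paragraph position="6"> To make the thresholding concrete, the short Python sketch below labels a sentence subjective when it contains at least t opinion-bearing words. It is an illustration under assumptions, not the paper's implementation: the function name, the whitespace tokenizer, and the toy word list are hypothetical stand-ins for the real lists (Collection1 + Collection2, or the Columbia list).</Paragraph>

def classify_sentence(sentence, opinion_words, t=1):
    """Label a sentence subjective if it contains at least t opinion-bearing words."""
    tokens = sentence.lower().split()  # naive whitespace tokenization (an assumption)
    hits = sum(1 for tok in tokens if tok.strip(".,!?") in opinion_words)
    return "subjective" if hits >= t else "objective"

# Hypothetical usage with a toy word list:
opinion_words = {"terrible", "wonderful", "believe", "unfortunately"}
print(classify_sentence("Unfortunately, the ban was a terrible idea.", opinion_words, t=2))
# prints "subjective": two opinion-bearing words meet the threshold t=2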
</Section>
<Section position="2" start_page="64" end_page="65" type="sub_section">
<SectionTitle> 5.2 TREC data </SectionTitle>
<Paragraph position="0"> Opinion sentence recognition was a part of the novelty track of TREC 2003 (Soboroff and Harman, 2003). The task was as follows: given a TREC topic and an ordered list of 25 documents relevant to the topic, find all the opinion-bearing sentences. No definition of opinion was provided by TREC; the assessors' intuitions were considered final. In 2003, there were 22 opinion topics containing 21,115 sentences in total. The opinion topics generally related to the pros and cons of some controversial subject, such as &quot;partial birth abortion ban&quot;, &quot;Microsoft antitrust charges&quot;, &quot;Cuban child refugee Elian Gonzalez&quot;, &quot;marijuana legalization&quot;, &quot;Clinton relationship with Lewinsky&quot;, &quot;death penalty&quot;, &quot;adoption same-sex partners&quot;, etc. For the opinion topics, a sentence is relevant if it contains an opinion about that subject, as decided by the assessor.</Paragraph>
<Paragraph position="1"> There was no categorizing of the polarity of opinion or ranking of sentences by the likelihood that they contain an opinion. F-score was used to measure system performance.</Paragraph>
<Paragraph position="2"> We submitted five separate runs, using different models. Our best model among the five was Model 2. It performed the second best of the 55 runs in the task, submitted by 14 participating institutions. (Interestingly, and perhaps disturbingly, RUN3, which simply returned every sentence as opinion-bearing, fared extremely well, coming in 11th. This model now provides a baseline for future research.)</Paragraph>
<Paragraph position="3"> In comparison, the HP-Subj (high-precision subjectivity classifier) (Riloff, 2003) produced recall 40.1 and precision 90.2 on test data using text patterns, and recall 32.9 and precision 91.3 without patterns. These figures are comparable with ours.</Paragraph>
<Paragraph position="4"> After the TREC evaluation data was made available, we tested Model 1 and Model 2 further. Table 5 shows the performance of each model with the two best-performing cutoff values.</Paragraph>
</Section>
<Section position="3" start_page="65" end_page="65" type="sub_section">
<SectionTitle> 5.3 Test with Our Data </SectionTitle>
<Paragraph position="0"> Section 4.2 described our manual annotation of data by three humans. We used one annotator's work as development test data for parameter tuning, and the other set, 62 sentences on the topic of gun control, as blind test data.</Paragraph>
<Paragraph position="1"> Although the TREC and MPQA data sets are larger and provide comparisons with others' work, and despite the low kappa agreement values, we decided to obtain cutoff values on this data too.</Paragraph>
<Paragraph position="2"> The graphs in Figure 3 show the performance of Models 1 and 2 with different cutoff values.</Paragraph>
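<Paragraph position="3"> The cutoff sweep behind these graphs can be sketched as follows. This is a minimal illustration under assumptions, not our actual tuning code: it reuses the hypothetical classify_sentence from the sketch in Section 5.1 and an invented data format, and scores each candidate cutoff with the F-score used throughout this section.</Paragraph>

def f_score(precision, recall):
    # Balanced F-score, the measure used in the TREC novelty track evaluation.
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

def tune_cutoff(dev_sentences, gold_labels, opinion_words, cutoffs=range(1, 6)):
    # Try each candidate cutoff t on development data and keep the best F-score.
    best = None
    for t in cutoffs:
        preds = [classify_sentence(s, opinion_words, t) for s in dev_sentences]
        tp = sum(p == g == "subjective" for p, g in zip(preds, gold_labels))
        fp = sum(p == "subjective" and g == "objective" for p, g in zip(preds, gold_labels))
        fn = sum(p == "objective" and g == "subjective" for p, g in zip(preds, gold_labels))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        if best is None or f_score(prec, rec) > best[1]:
            best = (t, f_score(prec, rec))
    return best  # (best cutoff, its development F-score)

# Note: a degenerate run that labels every sentence subjective (like RUN3 above)
# has recall 1.0, so its F-score reduces to 2P / (P + 1) for precision P.

</Section>
</Section>
</Paper>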