<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1640"> <Title>Sydney, July 2006. (c) 2006 Association for Computational Linguistics. Partially Supervised Coreference Resolution for Opinion Summarization through Structured Rule Learning</Title> <Section position="3" start_page="0" end_page="336" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Sentiment analysis is concerned with extracting attitudes, opinions, evaluations, and sentiment from text. Work in this area has been motivated by the desire to provide information analysis applications in the arenas of government, business, and politics (e.g. Coglianese (2004)). Additionally, sentiment analysis can augment existing NLP applications such as question answering, information retrieval, summarization, and clustering by providing information about sentiment (e.g. Stoyanov et al. (2005), Riloff et al. (2005)). To date, research in the area (see Related Work section) has focused on the problem of extracting sentiment both at the document level (coarse-grained sentiment information) and at the level of sentences, clauses, or individual expressions (fine-grained sentiment information).</Paragraph> <Paragraph position="1"> In contrast, our work concerns the summarization of fine-grained information about opinions. In particular, while recent research efforts have shown that fine-grained opinions (e.g.</Paragraph> <Paragraph position="2"> Riloff and Wiebe (2003), Bethard et al. (2004), Wiebe and Riloff (2005)) as well as their sources (e.g. Bethard et al. (2004), Choi et al. (2005), Kim and Hovy (2005)) can be extracted automatically, little has been done to create opinion summaries, where opinions from the same source/target are combined, statistics are computed for each source/target, and multiple opinions from the same source to the same target are aggregated.
A simple opinion summary is shown in figure 1. We expect that this type of opinion summary, based on fine-grained opinion information, will be important for information analysis applications in any domain where the analysis of opinions is critical.</Paragraph> <Paragraph position="3"> This paper addresses the problem of opinion summarization by considering the creation of simple opinion summaries like those of figure 1. We propose source coreference resolution -- the task of determining which mentions of opinion sources refer to the same entity -- as the primary mechanism for identifying the set of opinions attributed to each real-world source. For this type of summary, source coreference resolution constitutes an integral step in the process of generating full opinion summaries. For example, given the opinion expressions of figure 1, their polarity, and the associated opinion sources and targets, the bulk of the resulting summary can be produced by recognizing that source mentions &quot;Zacarias Moussaoui&quot;, &quot;he&quot;, &quot;my&quot;, and &quot;Mr. Moussaoui&quot; all refer to the same person, and that source mentions &quot;Mr.</Paragraph> <Paragraph position="4"> Zerkin&quot; and &quot;Zerkin&quot; refer to the same person. At first glance, source coreference resolution appears equivalent to the task of noun phrase coreference resolution and therefore amenable to traditional coreference resolution techniques (e.g.</Paragraph> <Paragraph position="5"> Ng and Cardie (2002), Morton (2000)). We hypothesize in Section 3, however, that the task is likely to be better solved by treating it in the context of a new machine learning setting that we refer to as partially supervised clustering.
In particular, due to high coreference annotation costs, data sets that are annotated with opinion information (like ours) do not typically include supervisory coreference information for all noun phrases in a document (as would be required for the application of traditional coreference resolution techniques), but only for noun phrases that act as opinion sources (or targets).</Paragraph> <Paragraph position="6"> As a result, we define the task of partially supervised clustering, the goal of which is to learn a clustering function from a set of partially specified clustering examples (Section 4). We are not aware of prior work on the problem of partially supervised clustering and argue that it differs substantially from that of semi-supervised clustering.</Paragraph> <Paragraph position="7"> We propose an algorithm for partially supervised clustering that extends a rule learner with structure information and is generally applicable to problems that fit the partially supervised clustering definition (Section 5). We apply the algorithm to the source coreference resolution task and evaluate its performance on a standard sentiment analysis data set that includes source coreference chains (Section 6). We find that our algorithm outperforms highly competitive baselines by a considerable margin: a B3 score of 83.2 vs. 81.8, and an F1 score of 67.1 vs. 60.9 for the identification of positive source coreference links.</Paragraph> </Section></Paper>