<?xml version="1.0" standalone="yes"?> <Paper uid="N06-1025"> <Title>Exploiting Semantic Role Labeling, WordNet and Wikipedia for Coreference Resolution</Title> <Section position="4" start_page="192" end_page="195" type="metho"> <SectionTitle> 3 Coreference Resolution Using Semantic Knowledge Sources </SectionTitle>
<Section position="1" start_page="192" end_page="192" type="sub_section"> <SectionTitle> 3.1 Corpora Used </SectionTitle>
<Paragraph position="0"> To establish a competitive coreference resolver, the system was initially prototyped using the MUC-6 and MUC-7 data sets (Chinchor & Sundheim, 2003; Chinchor, 2001), using the standard partitioning of 30 texts for training and 20-30 texts for testing. We then developed and tested the system on the ACE 2003 Training Data corpus (Mitchell et al., 2003)1. Both the Newswire (NWIRE) and Broadcast News (BNEWS) sections were split into 60-20-20% document-based partitions for training, development, and testing; the corresponding partitions of the two sections were later merged (MERGED) for the overall system evaluation. The distribution of coreference chains and referring expressions is given in Table 1.</Paragraph>
<Paragraph position="1"> [Table 1, recovered header only: BNEWS (147 docs - 33,479 tokens) and NWIRE (105 docs - 57,205 tokens), each reporting #coref chains, #pronouns, #common nouns and #proper names.]</Paragraph>
<Paragraph position="2"> 1The availability of the test data is restricted to ACE participants. Therefore, the results we report cannot be compared directly with those using the official test data.</Paragraph> </Section>
<Section position="2" start_page="192" end_page="193" type="sub_section"> <SectionTitle> 3.2 Learning Algorithm </SectionTitle>
<Paragraph position="0"> For learning coreference decisions, we used a Maximum Entropy (Berger et al., 1996) model, implemented using the MALLET library (McCallum, 2002). To prevent the model from overfitting, we employed a tunable Gaussian prior as a smoothing method. The best value for the variance parameter is found by searching the [0,10] interval in steps of 0.5 for the value yielding the highest MUC F-measure on the development data.</Paragraph>
<Paragraph position="1"> Coreference resolution is viewed as a binary classification task: given a pair of REs, the classifier has to decide whether they are coreferent or not. The MaxEnt model produces a probability for each category y (coreferent or not) of a candidate pair, conditioned on the context x in which the candidate occurs. The conditional probability is calculated by:</Paragraph>
<Paragraph position="2"> p(y|x) = \frac{\exp\left(\sum_i \lambda_i f_i(x,y)\right)}{Z_x} </Paragraph>
<Paragraph position="3"> where f_i(x,y) is the value of feature i on outcome y in context x, and \lambda_i is the weight associated with i in the model. Z_x is a normalization constant. The features used in our model are all binary-valued feature functions (or indicator functions), e.g. a feature that takes the value 1 only if REj is a pronoun and the outcome is coreferent, and 0 otherwise.</Paragraph>
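<Paragraph> As a concrete illustration of the model above, the following sketch computes p(y|x) for a candidate RE pair from a set of active binary features and their learned weights. It is a simplified stand-in for the MALLET-based implementation used in the paper, and the feature names and weight values are invented for the example.

import math

def maxent_probability(active_features, weights, outcomes=("coreferent", "not_coreferent")):
    """Conditional MaxEnt: p(y|x) = exp(sum_i lambda_i * f_i(x, y)) / Z_x.

    active_features: names of the binary features firing in context x
                     (hypothetical names, e.g. {"J_PRONOUN", "NUMBER"}).
    weights: dict mapping (feature name, outcome) to its learned weight lambda_i.
    """
    # f_i(x, y) is 1 exactly when feature i fires in x together with outcome y,
    # so the inner sum only collects the weights of the active features for y.
    scores = {y: sum(weights.get((f, y), 0.0) for f in active_features) for y in outcomes}
    z_x = sum(math.exp(s) for s in scores.values())  # normalization constant Z_x
    return {y: math.exp(scores[y]) / z_x for y in outcomes}

# Toy usage: the pair is classified as coreferent when that outcome gets the larger probability.
toy_weights = {("J_PRONOUN", "coreferent"): 0.7,
               ("NUMBER", "coreferent"): 1.2,
               ("J_PRONOUN", "not_coreferent"): -0.3}
print(maxent_probability({"J_PRONOUN", "NUMBER"}, toy_weights))
</Paragraph>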
<Paragraph position="4"> In our system, a set of pre-processing components including a POS tagger (Giménez & Màrquez, 2004), NP chunker (Kudoh & Matsumoto, 2000) and the Alias-I LingPipe Named Entity Recognizer is applied to the text in order to identify the noun phrases, which are further taken as referring expressions (REs) to be used for instance generation.</Paragraph>
<Paragraph position="5"> Therefore, we use automatically extracted noun phrases, rather than assuming perfect NP chunking. This is in contrast to other related work in coreference resolution (e.g. Luo et al. (2004), Kehler et al. (2004)).</Paragraph>
<Paragraph position="6"> Instances are created following Soon et al. (2001). We create a positive training instance from each pair of adjacent coreferent REs. Negative instances are obtained by pairing the anaphoric REs with any RE occurring between the anaphor and the antecedent.</Paragraph>
<Paragraph position="7"> During testing, each text is processed from left to right: each RE is paired with any preceding RE from right to left, until a pair labeled as coreferent is output or the beginning of the document is reached.</Paragraph>
<Paragraph position="8"> The classifier imposes a partitioning on the available REs by clustering each set of expressions labeled as coreferent into the same coreference chain.</Paragraph> </Section>
<Section position="3" start_page="193" end_page="194" type="sub_section"> <SectionTitle> 3.3 Baseline System Features </SectionTitle>
<Paragraph position="0"> Following Ng & Cardie (2002), our baseline system reimplements the Soon et al. (2001) system.</Paragraph>
<Paragraph position="1"> The system uses 12 features. Given a potential antecedent REi and a potential anaphor REj, the features are computed as follows3.</Paragraph>
<Paragraph position="2"> (a) Lexical features STRING MATCH T if REi and REj have the same spelling; else F.</Paragraph>
<Paragraph position="3"> ALIAS T if one RE is an alias of the other; else F. (b) Grammatical features I PRONOUN T if REi is a pronoun; else F.</Paragraph>
<Paragraph position="4"> J PRONOUN T if REj is a pronoun; else F.</Paragraph>
<Paragraph position="5"> J DEF T if REj starts with the; else F.</Paragraph>
<Paragraph position="6"> J DEM T if REj starts with this, that, these, or those; else F.</Paragraph>
<Paragraph position="7"> NUMBER T if REi and REj agree in number; else F.</Paragraph>
<Paragraph position="8"> GENDER U if either REi or REj has an undefined gender. Else T if both are defined and agree; else F.</Paragraph>
<Paragraph position="9"> PROPER NAME T if both REi and REj are proper names; else F.</Paragraph>
<Paragraph position="10"> APPOSITIVE T if REj is in apposition with REi; else F.</Paragraph>
<Paragraph position="11"> (c) Semantic features WN CLASS U if either REi or REj has an undefined WordNet semantic class. Else T if they both have a defined one and it is the same; else F. (d) Distance features DISTANCE the number of sentences REi and REj are apart.</Paragraph>
<Paragraph position="12"> 3Possible values are U(nknown), T(rue) and F(alse). Note that in contrast to Ng & Cardie (2002) we interpret ALIAS as a lexical feature, as it solely relies on string comparison and acronym string matching.</Paragraph> </Section>
<Section position="4" start_page="194" end_page="194" type="sub_section"> <SectionTitle> 3.4 WordNet Features </SectionTitle>
<Paragraph position="0"> In the baseline system semantic information is limited to WordNet semantic class matching. Unfortunately, a WordNet semantic class lookup exhibits problems such as coverage, sense proliferation and ambiguity, which make the WN CLASS feature very noisy. We enrich the semantic information available to the classifier by using semantic similarity measures based on the WordNet taxonomy (Pedersen et al., 2004). The measures we use include path length based measures (Rada et al., 1989; Wu & Palmer, 1994; Leacock & Chodorow, 1998), as well as ones based on information content (Resnik, 1995; Jiang & Conrath, 1997; Lin, 1998).</Paragraph>
<Paragraph position="1"> In our case, the measures are obtained by computing the similarity scores between the head lemmata of each potential antecedent-anaphor pair. In order to overcome the sense disambiguation problem, we factorise over all possible sense pairs: given a candidate pair, we take the cross product of each antecedent and anaphor sense to form pairs of synsets.</Paragraph>
<Paragraph position="2"> For each measure WN SIMILARITY, we compute the similarity score for all synset pairs, and create the following features.</Paragraph>
<Paragraph position="3"> WN SIMILARITY BEST the highest similarity score from all <SENSEREi,n, SENSEREj,m> synset pairs.</Paragraph>
<Paragraph position="4"> WN SIMILARITY AVG the average similarity score from all <SENSEREi,n, SENSEREj,m> synset pairs.</Paragraph>
<Paragraph position="5"> Pairs containing REs which cannot be mapped to WordNet synsets are assumed to have a null similarity measure.</Paragraph>
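<Paragraph> A minimal sketch of these two features, using NLTK's WordNet interface and the Wu & Palmer measure as a stand-in for the full set of similarity measures (the paper relies on the WordNet::Similarity package of Pedersen et al. (2004)). Head lemmata that cannot be mapped to synsets receive a null score, as described above.

from itertools import product
from nltk.corpus import wordnet as wn

def wn_similarity_features(head_i, head_j, measure=lambda a, b: a.wup_similarity(b)):
    """WN_SIMILARITY_BEST and WN_SIMILARITY_AVG for an antecedent/anaphor head pair.

    Scores are taken over the cross product of all noun senses of the two head
    lemmata; unmappable heads yield a null (0.0) similarity.
    """
    senses_i = wn.synsets(head_i, pos=wn.NOUN)
    senses_j = wn.synsets(head_j, pos=wn.NOUN)
    if not senses_i or not senses_j:
        return {"WN_SIMILARITY_BEST": 0.0, "WN_SIMILARITY_AVG": 0.0}
    scores = [measure(s_i, s_j) or 0.0 for s_i, s_j in product(senses_i, senses_j)]
    return {"WN_SIMILARITY_BEST": max(scores),
            "WN_SIMILARITY_AVG": sum(scores) / len(scores)}

# e.g. the feature values for the candidate pair of heads ("president", "leader")
print(wn_similarity_features("president", "leader"))
</Paragraph>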
</Section>
<Section position="5" start_page="194" end_page="195" type="sub_section"> <SectionTitle> 3.5 Wikipedia Features </SectionTitle>
<Paragraph position="0"> Wikipedia is a multilingual Web-based free-content encyclopedia5. The English version, as of 14 February 2006, contains 971,518 articles with 16.8 million internal hyperlinks, thus providing a knowledge resource with very large coverage. In addition, since May 2004 it also provides a taxonomy by means of the category feature: articles can be placed in one or more categories, which are further categorized to provide a category tree.</Paragraph>
<Paragraph position="1"> 5wikimedia.org/. In our experiments we use the English Wikipedia database dump from 19 February 2006.</Paragraph>
<Paragraph position="2"> In practice, the taxonomy is not designed as a strict hierarchy or tree of categories, but allows multiple categorisation schemes to co-exist simultaneously. Because each article can appear in more than one category, and each category can appear in more than one parent category, the categories do not form a tree structure, but a more general directed graph. As of December 2005, 78% of the articles have been categorized into 87,000 different categories.</Paragraph>
<Paragraph position="3"> Wikipedia mining works as follows (for an in-depth description of the methods for computing semantic relatedness in Wikipedia see Strube & Ponzetto (2006)): given the candidate referring expressions REi and REj we first pull the pages they refer to. This is accomplished by querying the page titled as the head lemma or, in the case of NEs, the full NP. We follow all redirects and check for disambiguation pages, i.e. pages for ambiguous entries which contain only links (e.g. Lincoln). If a disambiguation page is hit, we first get all the hyperlinks in the page. If a link containing the other queried RE is found (i.e. a link containing president in the Lincoln page), the linked page (President of the United States) is returned; otherwise we return the first article linked in the disambiguation page.</Paragraph>
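<Paragraph> A sketch of this page-retrieval step. The wiki object is a hypothetical interface to a local copy of the Wikipedia dump (exposing titles, redirect targets, a disambiguation flag and outgoing links in page order); it does not correspond to an actual library API.

def retrieve_wiki_page(query_re, other_re, wiki):
    """Return the Wikipedia page for query_re, disambiguating with other_re if needed.

    Hypothetical interface assumed for `wiki`:
      wiki.get(title)  returns a page object or None
      page.is_redirect, page.redirect_target
      page.is_disambiguation
      page.links       list of linked article titles, in page order
    """
    page = wiki.get(query_re)
    # follow redirect chains, e.g. a title that simply points to its canonical spelling
    while page is not None and page.is_redirect:
        page = wiki.get(page.redirect_target)
    if page is None:
        return None
    if page.is_disambiguation:
        # prefer a link mentioning the other queried RE
        # (e.g. a link containing "president" on the "Lincoln" disambiguation page)
        for title in page.links:
            if other_re.lower() in title.lower():
                return wiki.get(title)
        # otherwise fall back to the first article linked on the disambiguation page
        return wiki.get(page.links[0]) if page.links else None
    return page
</Paragraph>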
<Paragraph position="4"> Given a candidate coreference pair REi/j and the Wikipedia pages PREi/j they point to, obtained by querying pages titled as TREi/j, we extract the following features:</Paragraph>
<Paragraph position="5"> I/J GLOSS CONTAINS U if no Wikipedia page titled TREi/j is available. Else T if the first paragraph of text of PREi/j contains TREj/i; else F.</Paragraph>
<Paragraph position="6"> I/J RELATED CONTAINS U if no Wikipedia page titled as TREi/j is available. Else T if at least one Wikipedia hyperlink of PREi/j contains TREj/i; else F.</Paragraph>
<Paragraph position="7"> I/J CATEGORIES CONTAINS U if no Wikipedia page titled as TREi/j is available. Else T if the list of categories PREi/j belongs to contains TREj/i; else F.</Paragraph>
<Paragraph position="8"> GLOSS OVERLAP the overlap score between the first paragraph of text of PREi and PREj. Following Banerjee & Pedersen (2003) we compute the score as \sum_n m^2 for n phrasal m-word overlaps.</Paragraph>
<Paragraph position="9"> Additionally, we use the Wikipedia category graph. We ported the WordNet similarity path length based measures to the Wikipedia category graph. However, the category relations in Wikipedia cannot be interpreted as corresponding only to is-a links in a taxonomy, since they denote meronymic relations as well. Therefore, the Wikipedia-based measures are to be taken as semantic relatedness measures. The measures from Rada et al. (1989), Leacock & Chodorow (1998) and Wu & Palmer (1994) are computed in the same way as for WordNet. Path search is performed as a depth-limited search, with a maximum depth of 4, for a least common subsumer. We noticed that limiting the search improves the results, as it yields a better correlation of the relatedness scores with human judgements (Strube & Ponzetto, 2006).</Paragraph>
<Paragraph position="10"> This is due to the upper regions of the Wikipedia category tree being too strongly connected.</Paragraph>
<Paragraph position="11"> In addition, we use the measure from Resnik (1995), which is computed using an intrinsic information content measure relying on the hierarchical structure of the category tree (Seco et al., 2004). Given PREi/j and the lists of categories CREi/j they belong to, we factorise over all possible category pairs. That is, we take the cross product of each antecedent and anaphor category to form pairs of 'Wikipedia synsets'. For each measure WIKI RELATEDNESS, we compute the relatedness score for all category pairs, and create the following features.</Paragraph>
<Paragraph position="12"> WIKI RELATEDNESS BEST the highest relatedness score from all <CREi,n,CREj,m> category pairs.</Paragraph>
<Paragraph position="13"> WIKI RELATEDNESS AVG the average relatedness score from all <CREi,n,CREj,m> category pairs.</Paragraph>
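<Paragraph> A minimal sketch of such a relatedness computation, assuming the category graph is given as a plain dict mapping each category name to the set of its parent categories. The inverse-path-length score used here is only a stand-in for the Rada et al., Leacock & Chodorow and Wu & Palmer measures of the paper; the depth-limited search for a least common subsumer mirrors the maximum depth of 4 described above.

from itertools import product

def _subsumers_within(cat, parents, max_depth=4):
    """Map every category reachable from cat by upward links, up to max_depth,
    to its distance from cat (a depth-limited search for candidate subsumers)."""
    dist = {cat: 0}
    frontier = [cat]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for c in frontier:
            for p in parents.get(c, ()):
                if p not in dist:
                    dist[p] = depth
                    next_frontier.append(p)
        frontier = next_frontier
    return dist

def path_relatedness(cat_i, cat_j, parents, max_depth=4):
    """Inverse-path-length relatedness through the closest common subsumer
    found within the depth limit; 0.0 if no common subsumer is found."""
    anc_i = _subsumers_within(cat_i, parents, max_depth)
    anc_j = _subsumers_within(cat_j, parents, max_depth)
    common = set(anc_i) & set(anc_j)
    if not common:
        return 0.0
    path_length = min(anc_i[c] + anc_j[c] for c in common)
    return 1.0 / (1.0 + path_length)

def wiki_relatedness_features(cats_i, cats_j, parents):
    """WIKI_RELATEDNESS_BEST / _AVG over the cross product of the two category lists."""
    scores = [path_relatedness(ci, cj, parents) for ci, cj in product(cats_i, cats_j)] or [0.0]
    return {"WIKI_RELATEDNESS_BEST": max(scores),
            "WIKI_RELATEDNESS_AVG": sum(scores) / len(scores)}
</Paragraph>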
</Section>
<Section position="6" start_page="195" end_page="195" type="sub_section"> <SectionTitle> 3.6 Semantic Role Features </SectionTitle>
<Paragraph position="0"> The last semantic knowledge enhancement for the baseline system uses SRL information. In our experiments we use the ASSERT parser (Pradhan et al., 2004), an SVM-based semantic role tagger which uses a full syntactic analysis to automatically identify all verb predicates in a sentence together with their semantic arguments, which are output as PropBank arguments (Palmer et al., 2005). It is often the case that the semantic arguments output by the parser do not align with any of the previously identified noun phrases. In this case, we pass a semantic role label to an RE only when the two phrases share the same head. Labels have the form "ARG1 pred1 ... ARGn predn" for n semantic roles filled by a constituent, where each semantic argument label is always defined with respect to a predicate. Given this level of semantic information available at the RE level, we introduce two new features.</Paragraph>
<Paragraph position="1"> I SEMROLE the semantic role argument-predicate pairs of REi.</Paragraph>
<Paragraph position="2"> J SEMROLE the semantic role argument-predicate pairs of REj.</Paragraph>
<Paragraph position="3"> For the ACE 2003 data, 11,406 of 32,502 automatically extracted noun phrases were tagged with 2,801 different argument-predicate pairs.</Paragraph> </Section> </Section> </Paper>