<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2906"> <Title>Resolving and Generating Definite Anaphora by Modeling Hypernymy using Unlabeled Corpora</Title> <Section position="5" start_page="37" end_page="40" type="metho"> <SectionTitle> 3 Models for Lexical Acquisition </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="37" end_page="39" type="sub_section"> <SectionTitle> 3.1 TheY-Model </SectionTitle> <Paragraph position="0"> Our algorithm is motivated by the observation that in a discourse, the use of the definite article (&quot;the&quot;) in a non-deictic context is primarily licensed if the concept has already been mentioned in the text. Hence a sentence such as &quot;The drug is very expensive&quot; generally implies that either the word drug itself was previously mentioned (e.g. &quot;He is taking a new drug for his high cholesterol.&quot;) or a hyponym of drug was previously mentioned (e.g. &quot;He is taking Lipitor for his high cholesterol.&quot;). Because it is straightforward to filter out the former case by string matching, the residual instances of the phrase &quot;the drug&quot; (without previous mentions of the word &quot;drug&quot; in the discourse) are likely to be instances of hypernymic definite anaphora. We can then determine which nouns earlier in the discourse (e.g. Lipitor) are likely antecedents by unsupervised statistical co-occurrence modeling aggregated over the entire corpus. All we need is a large corpus without any anaphora annotation and a basic tool for noun tagging and NP head annotation. The detailed algorithm is as follows: 1. Find each sentence in the training corpus that contains a definite NP ('the Y') and does not contain 'a Y', 'an Y' or other instantiations of Y2 appearing before the definite NP within a fixed window.3 2. 
In the sentences that pass the above definite NP and a/an test, regard all the head words (X) occurring in the current sentence before the definite NP and the ones occurring in the previous two sentences as potential antecedents.</Paragraph> <Paragraph position="1"> 3. Count the frequency c(X,Y) for each pair obtained in the above two steps and pre-store it in a table.4 The frequency table can be modified to give other scores for pair(X,Y) such as standard TF-IDF and Mutual Information scores.</Paragraph> <Paragraph position="2"> 4. Given a test sentence having an anaphoric definite NP Y, consider the nouns appearing before Y within a fixed window as potential antecedents. Rank the candidates by their pre-computed co-occurrence measures as computed in Step 3.</Paragraph> <Paragraph position="3"> Since we consider all head words preceding the definite NP as potential correct antecedents, the raw frequency of the pair (X,Y) can be very noisy. This can be seen clearly in Table 1, where the first column shows the top potential antecedents of the definite NP &quot;the drug&quot; as given by raw frequency. We normalize the raw frequency using standard TF-IDF and Mutual Information scores to filter the noisy pairs.5 In Table 2 (60 million word corpus), we report our results for antecedent selection using Raw frequency c(X,Y), TF-IDF 6 and MI in isolation. Accuracy is the fraction of total examples that were assigned the correct antecedent and Accuracytag is the same excluding the examples that had POS tagging errors for the correct antecedent.7 Av Rank is the rank of the true antecedent averaged over the number of test examples.8 Based on the above experiment, the rest of this paper assumes the Mutual Information scoring technique for TheY-Model. 5 Note that MI(X,Y) = log( P(X,Y) / (P(X) P(Y)) ) and this is directly proportional to P(Y|X) = c(X,Y) / c(X) for a fixed Y. 
Thus, we can simply use this conditional probability during implementation since the definite NP Y is fixed for the task of antecedent selection.</Paragraph> <Paragraph position="4"> ble for any model to miss the correct antecedent because it was not tagged correctly as a noun in the first place. There were 14 such examples in the test set and none of the model variants can find the correct antecedent in these instances.</Paragraph> <Paragraph position="5"> 8 Knowing the average rank can be useful when an n-best ranked list from a coreference task is used as an input to other downstream tasks such as information extraction.</Paragraph> <Paragraph position="6"> bined model performance on the antecedent selection task. Corpus Size: 60 million words.</Paragraph> </Section> <Section position="2" start_page="39" end_page="39" type="sub_section"> <SectionTitle> 3.2 WordNet-Model (WN) </SectionTitle> <Paragraph position="0"> Because WordNet is considered a standard resource of lexical knowledge and is often used in coreference tasks, it is useful to know how well corpus-based approaches perform as compared to a standard model based on WordNet (version 2.0).9 The algorithm for the WordNet-Model is as follows: Given a definite NP Y and its potential antecedent X, choose X if it occurs as a hyponym (either direct or indirect inheritance) of Y. If multiple potential antecedents occur in the hierarchy of Y, choose the one that is closest in the hierarchy.</Paragraph> </Section> <Section position="3" start_page="39" end_page="39" type="sub_section"> <SectionTitle> 3.3 Combination: TheY+WordNet Model </SectionTitle> <Paragraph position="0"> Most of the literature on using lexical resources for definite anaphora has focused on using individual models (either corpus-based or manually built resources such as WordNet) for antecedent selection. Some of the difficulties with using WordNet are its limited coverage and its lack of an empirical ranking model. 
We propose a combination of TheY-Model and WordNet-Model to overcome these problems. Essentially, we rerank the hypotheses found in WordNet-Model based on the ranks of TheY-Model, or use a backoff scheme if WordNet-Model does not return an answer due to its limited coverage. Given a definite NP Y and a set of potential antecedents Xs, the detailed algorithm is specified as follows: 1. Rerank with TheY-Model: Rerank the potential antecedents found in the WordNet-Model table by assigning them the ranks given by TheY-Model. If TheY-Model does not return a rank for a potential antecedent, use the rank given by the WordNet-Model. Now pick the top ranked antecedent after reranking.</Paragraph> <Paragraph position="1"> 9 We also computed the accuracy using a weaker baseline, namely, selecting the closest previous headword as the correct antecedent. This recency-based baseline obtained a low accuracy of 15% and hence we used the stronger WordNet-based model for comparison purposes.</Paragraph> <Paragraph position="2"> 2. Backoff: If none of the potential antecedents were found in the WordNet-Model, then pick the top-ranked antecedent from the ranked list of TheY-Model. If neither model returns an answer, then assign ranks uniformly at random.</Paragraph> <Paragraph position="3"> The above algorithm harnesses the strength of WordNet-Model to identify good hyponyms and the strength of TheY-Model to identify which are more likely to be used as an antecedent. Note that this combination algorithm can be applied using any corpus-based technique to account for the poor-ranking and low-coverage problems of WordNet, and Sections 3.4, 3.5 and 3.6 will show the results for backing off to a Hearst-style hypernym model. Table 4 shows the decisions made by TheY-Model, WordNet-Model and the combined model for a sample of test examples. It is interesting to see how the two models complement each other in these decisions. 
Table 3 shows the results for the models presented so far using a 60 million word training text from the Gigaword corpus. The combined model results in a substantially better accuracy than the individual WordNet-Model and TheY-Model, indicating its strong merit for the antecedent selection task.10</Paragraph> </Section> <Section position="4" start_page="39" end_page="39" type="sub_section"> <SectionTitle> 3.4 OtherY-Modelfreq </SectionTitle> <Paragraph position="0"> This model is a reimplementation of the corpus-based algorithm proposed by Markert and Nissim (2005) for the equivalent task of antecedent selection for definite NP coreference. We implement their approach of using the lexico-syntactic pattern X and A* other B* Y{pl} for extracting (X,Y) pairs. The A* and B* allow for adjectives or other modifiers to be placed in between the pattern. The model presented in their article uses the raw frequency as the criterion for selecting the antecedent.</Paragraph> </Section> <Section position="5" start_page="39" end_page="40" type="sub_section"> <SectionTitle> 3.5 OtherY-ModelMI(normalized) </SectionTitle> <Paragraph position="0"> We normalize the OtherY-Model using the Mutual Information scoring method. Although Markert and Nissim (2005) report that using Mutual Information performs similarly to using raw frequency, Table 5 shows that using Mutual Information makes a substantial impact on results using large training corpora relative to using raw frequency.</Paragraph> <Paragraph position="1">
Summary | Keyword (Def. Ana) | True Antecedent | TheY Choice | Truth Rank | WordNet Choice | Truth Rank | TheY+WN Choice | Truth Rank
Both correct | metal | gold | gold | 1 | gold | 1 | gold | 1
Both correct | sport | soccer | soccer | 1 | soccer | 1 | soccer | 1
TheY-Model helps | drug | steroid | steroid | 1 | NA | NA | steroid | 1
TheY-Model helps | drug | azt | azt | 1 | medication | 2 | azt | 1
WN-Model helps | instrument | trumpet | king | 10 | trumpet | 1 | trumpet | 1
WN-Model helps | drug | naltrexone | alcohol | 14 | naltrexone | 1 | naltrexone | 1
Both incorrect | weapon | bomb | artillery | 3 | NA | NA | artillery | 3
Both incorrect | instrument | voice | music | 9 | NA | NA | music | 9
</Paragraph> </Section> <Section position="6" start_page="40" end_page="40" type="sub_section"> <SectionTitle> 3.6 Combination: TheY+OtherYMI Model </SectionTitle> <Paragraph position="0"> Our two corpus-based approaches (TheY and OtherY) make use of different linguistic phenomena and it would be interesting to see whether they are complementary in nature. We used a combination algorithm similar to that in Section 3.3, with the WordNet-Model replaced by the OtherY-Model for hypernym filtering, and we used the noisy TheY-Model for reranking and backoff. The results for this approach are shown as the entry TheY+OtherYMI in Table 5. We also implemented a combination (OtherY+WN) of OtherY-Model and WordNet-Model by replacing TheY-Model with OtherY-Model in the algorithm described in Section 3.3. The respective results are indicated as the OtherY+WN entry in Table 5.</Paragraph> </Section> </Section> <Section position="6" start_page="40" end_page="41" type="metho"> <SectionTitle> 4 Further Anaphora Resolution Results </SectionTitle> <Paragraph position="0"> Table 5 summarizes results obtained from all the models defined in Section 3 on three different sizes of unlabeled training corpora (from the Gigaword corpus). The models are listed in order from high accuracy to low accuracy. 
The OtherY-Model performs particularly poorly on smaller data sizes, where coverage of the Hearst-style patterns may be limited, as also observed by Berland and Charniak (1999).</Paragraph> <Paragraph position="1"> We further find that the Markert and Nissim (2005) OtherY-Model and our MI-based improvement do show substantial relative performance growth at increased corpus sizes, although they still underperform our basic TheY-Model at all tested corpus sizes. Also, the combination of corpus-based models (TheY-Model+OtherY-Model) does indeed perform better than either of them in isolation. Finally, note that the basic TheY-algorithm still does relatively well by itself on smaller corpus sizes, suggesting its merit for resource-limited languages with smaller available online text collections and no available WordNet. The combined models of WordNet-Model with the two corpus-based approaches still significantly (p &lt; 0.01) outperform any of the other individual models.11</Paragraph> </Section> <Section position="7" start_page="41" end_page="43" type="metho"> <SectionTitle> 5 Generation Task </SectionTitle> <Paragraph position="0"> Having shown positive results for the task of antecedent selection, we turn to a more difficult task, namely generating an anaphoric definite NP given a nominal antecedent. In Example (1), this would correspond to generating &quot;the drug&quot; as an anaphor knowing that the antecedent is pseudoephedrine.</Paragraph> <Paragraph position="1"> This task clearly has many applications: current generation systems often limit their anaphoric usage to pronouns and thus an automatic system that does well on hypernymic definite NP generation can directly be helpful. 
It also has strong potential application in abstractive summarization where rewriting a fluent passage requires a good model of anaphoric usage.</Paragraph> <Paragraph position="2"> There are many interesting challenges in this problem: first of all, there may be multiple acceptable choices of definite anaphor for a particular antecedent, complicating automatic evaluation. Second, when a system generates a definite anaphor, the space of potential candidates is essentially unbounded, unlike in antecedent selection, where it is limited to the number of potential antecedents in the prior context. In spite of the complex nature of this problem, our experiments with human judgements, WordNet and corpus-based approaches show a simple feasible solution. We evaluate our automatic approaches based on exact-match agreement with the definite anaphora actually used in the corpus (accuracy) and also by agreement with the definite anaphora predicted independently by a human judge in the absence of context.</Paragraph> <Paragraph position="3"> 11 Note that syntactic co-reference candidate filters such as the Hobbs algorithm were not utilized in this study. To assess the performance implications, the Hobbs algorithm was applied to a randomly selected 100-instance subset of the test data. Although the Hobbs algorithm frequently pruned at least one of the coreference candidates, in only 2% of the data did such candidate filtering change system output. However, since both of these changes were improvements, it could be worthwhile to utilize Hobbs filtering in future work, although the gains would likely be modest.</Paragraph> <Section position="1" start_page="41" end_page="41" type="sub_section"> <SectionTitle> 5.1 Human experiment </SectionTitle> <Paragraph position="0"> We extracted a total of 103 &lt;true antecedent, definite NP&gt; pairs from the set of test instances used in the resolution task. 
Then we asked a human judge (a native speaker of English) to predict a parent class of the antecedent that could act as a good definite anaphora choice in general, independent of a particular context. Thus, the actual corpus sentence containing the antecedent and definite NP and its context was not provided to the judge. We took the predictions provided by the judge and matched them with the actual definite NPs used in the corpus.</Paragraph> <Paragraph position="1"> The agreement between the corpus and the human judge was 79%, which can thus be considered an upper bound on algorithm performance. Table 7 shows a sample of decisions made by the human and how they agree with the definite NPs observed in the corpus. It is interesting to note the challenge of sense variation and figurative usage. For example, &quot;corruption&quot; is referred to as a &quot;tool&quot; in the actual corpus anaphora, a metaphoric usage that would be difficult to predict unless given the usage sentence and its context. However, a human agreement of 79% indicates that such instances are relatively rare and that the task of predicting a definite anaphor without its context is viable. In general, it appears from our experiments that humans tend to select from a relatively small set of parent classes when generating hypernymic definite anaphora. Furthermore, there appears to be a relatively context-independent concept of the &quot;natural&quot; level in the hypernym hierarchy for generating anaphors. For example, although &lt;&quot;alkaloid&quot;, &quot;organic compound&quot;, &quot;compound&quot;, &quot;substance&quot;, &quot;entity&quot;&gt; are all hypernyms of &quot;pseudoephedrine&quot; in WordNet, &quot;the drug&quot; appears to be the preferred hypernym for definite anaphora in the data, with the other alternatives being either too specific or too general to be natural.</Paragraph> <Paragraph position="2"> This natural level appears to be difficult to define by rule. 
For example, using just the immediate parent hypernym in the WordNet hierarchy achieves only a 4% match with the corpus data for definite anaphor generation.</Paragraph> </Section> <Section position="2" start_page="41" end_page="42" type="sub_section"> <SectionTitle> 5.2 Algorithms </SectionTitle> <Paragraph position="0"> The following sections present our corpus-based algorithms as more effective alternatives. with human judge and with definite NP used in the corpus.</Paragraph> <Paragraph position="1"> For the corpus-based approaches, the TheY-Model and OtherY-Model were trained in the same manner as for the antecedent selection task. The only difference was that in the generation case, the frequency statistics were reversed to provide a hypernym given a hyponym. Additionally, we found that raw frequency outperformed both TF-IDF and Mutual Information, so it was used for all results in Table 6. The stand-alone WordNet model is also very simple: Given an antecedent, we look up its direct hypernym (using the first sense) in WordNet and use it as the definite NP, for lack of a better rule for preferred hypernym location.</Paragraph> <Paragraph position="2"> WordNet Each of the corpus-based approaches was combined with WordNet, resulting in two different models as follows: Given an antecedent X, the corpus-based approach looks up in its table the hypernym of X, for example Y, and only produces Y as the output if Y also occurs in WordNet as a hypernym of X. Thus WordNet is used as a filtering tool for detecting viable hypernyms. This combination resulted in two models: 'TheY+WN' and 'OtherY+WN'.</Paragraph> <Paragraph position="3"> We also combined all three approaches, 'TheY', 'OtherY' and WordNet, resulting in a single model 'TheY+OtherY+WN'. This was done as follows: We first combine the models 'TheY' and 'OtherY' using a backoff model. 
The first priority is to use the hypernym from the model 'OtherY'; if it is not found, then use the hypernym from the model 'TheY'.</Paragraph> <Paragraph position="4"> Given a definite NP from the backoff model, apply the WordNet filtering technique: specifically, choose it as the correct definite NP if it also occurs as a hypernym in the WordNet hierarchy of the antecedent.</Paragraph> </Section> <Section position="3" start_page="42" end_page="43" type="sub_section"> <SectionTitle> 5.3 Evaluation of Anaphor Generation </SectionTitle> <Paragraph position="0"> We evaluated the resulting algorithms from Section 5.2 on the definite NP prediction task as described earlier. Table 6 shows the agreement of the algorithm predictions with the human judge as well as with the definite NP actually observed in the corpus.</Paragraph> <Paragraph position="1"> It is interesting to see that WordNet by itself performs very poorly on this task since it does not have any word-specific mechanism to choose the correct level in the hierarchy and the correct word sense for selecting the hypernym. However, when combined with our corpus-based approaches, the agreement increases substantially, indicating that the corpus-based approaches are effectively filtering the space of hypernyms that can be used as natural classes.</Paragraph> <Paragraph position="2"> Likewise, WordNet helps to filter the noisy hypernyms from the corpus predictions. Thus, this interplay between the corpus-based and WordNet algorithms works out nicely, resulting in the best model being a combination of all three individual models and achieving a substantially better agreement with both the corpus and the human judge than any of the individual models. Table 7 shows decisions made by the human judge and by this best performing model (TheY+OtherY+WN) on a sample of test data for the generation task.</Paragraph> </Section> </Section> </Paper>
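<!--
The counting and ranking steps of the TheY-Model (Section 3.1, steps 3 and 4) can be sketched roughly as below. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names toy_instances, train_they_model and rank_antecedents are hypothetical, the sentence filtering of steps 1 and 2 is assumed to have been done already, and c(X) is assumed to be the number of training instances in which X appears as a candidate. Candidates are scored by P(Y|X) = c(X,Y) / c(X), the quantity that footnote 5 says can stand in for Mutual Information when Y is fixed.

```python
from collections import Counter

# Hypothetical toy input: each instance pairs the head nouns seen before a
# definite NP (candidates X) with the head noun of that definite NP (Y).
toy_instances = [
    (["lipitor", "cholesterol", "doctor"], "drug"),
    (["lipitor", "patient"], "drug"),
    (["doctor", "patient"], "hospital"),
    (["cholesterol", "doctor"], "diet"),
]


def train_they_model(instances):
    """Step 3: count c(X, Y) and c(X) over already-filtered instances."""
    pair_counts = Counter()
    head_counts = Counter()
    for candidates, definite_head in instances:
        # Assumption: each candidate is counted once per instance.
        for x in set(candidates):
            pair_counts[(x, definite_head)] += 1
            head_counts[x] += 1
    return pair_counts, head_counts


def rank_antecedents(candidates, definite_head, pair_counts, head_counts):
    """Step 4: rank candidates by P(Y|X) = c(X,Y) / c(X), best first."""
    def score(x):
        c_x = head_counts[x]
        # Unseen candidates get a score of zero instead of a division error.
        return pair_counts[(x, definite_head)] / c_x if c_x else 0.0

    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(set(candidates), key=lambda x: (-score(x), x))


pair_counts, head_counts = train_they_model(toy_instances)
ranking = rank_antecedents(
    ["doctor", "cholesterol", "lipitor"], "drug", pair_counts, head_counts
)
print(ranking)
```

On this toy data the frequent but unspecific head "doctor" is pushed below "lipitor", which illustrates why the paper normalizes raw frequency; a real system would replace the toy list with pairs extracted from a tagged corpus.
-->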