<?xml version="1.0" standalone="yes"?> <Paper uid="W97-1306"> <Title>CogNIAC: High Precision Coreference with Limited Knowledge and Linguistic Resources</Title> <Section position="4" start_page="38" end_page="39" type="metho"> <SectionTitle> 2) Reflexive: Pick nearest possible </SectionTitle> <Paragraph position="0"> antecedent in read-in portion of current sentence if the anaphor is a reflexive pronoun: 16 correct, and 1 incorrect.</Paragraph> <Paragraph position="1"> Mariana motioned for Sarah to seat herself on a two-seater lounge.</Paragraph> <Paragraph position="2"> 3) Unique in Current + Prior: If there is a single possible antecedent i in the prior sentence and the read-in portion of the current sentence, then pick i as the antecedent: 114 correct, and 2 incorrect.</Paragraph> <Paragraph position="3"> Rupert Murdoch's News Corp. confirmed his interest in buying back the ailing New York Post. But analysts said that if he winds up bidding for the paper ...</Paragraph> </Section> <Section position="5" start_page="39" end_page="39" type="metho"> <SectionTitle> 4) Possessive Pro: If the anaphor is a </SectionTitle> <Paragraph position="0"> possessive pronoun and there is a single exact string match i of the possessive in the prior sentence, then pick i as the antecedent: 4 correct, and 1 incorrect.</Paragraph> <Paragraph position="1"> After he was dry, Joe carefully laid out the damp towel in front of his locker. Travis went over to his locker, took out a towel and started to dry off.</Paragraph> <Paragraph position="2"> 5) Unique Current Sentence: If there is a single possible antecedent i in the read-in portion of the current sentence, then pick i as the antecedent: 21 correct, and 1 incorrect.</Paragraph> <Paragraph position="3"> Like a large bear, he sat motionlessly in the lounge in one of the faded armchairs, watching Constantin. After a week Constantin tired of reading the old novels in the bottom shelf of the bookcase--somewhere among the gray, well-thumbed pages he had hoped to find a message from one of his predecessors ...</Paragraph> </Section> <Section position="6" start_page="39" end_page="39" type="metho"> <SectionTitle> 6) Unique Subject / Subject Pronoun: </SectionTitle> <Paragraph position="0"> If the subject of the prior sentence contains a single possible antecedent i, and the anaphor is the subject of the current sentence, then pick i as the antecedent: 11 correct, and 0 incorrect.</Paragraph> <Paragraph position="1"> Besides, if he provoked Malek, uncertainties were introduced, of which there were already far too many. He noticed the supervisor enter the lounge ...</Paragraph> <Paragraph position="2"> The method of resolving pronouns within CogNIAC works as follows: Pronouns are resolved left-to-right in the text. For each pronoun, the rules are applied in the presented order. For a given rule, if an antecedent is found, then the appropriate annotations are made to the text and no more rules are tried for that pronoun; otherwise the next rule is tried. If no rule resolves the pronoun, then it is left unresolved. These rules are individually high precision rules, and collectively they add up to reasonable recall. 
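As an illustration, this control structure can be sketched as a simple rule cascade. The sketch below is illustrative only: the mention representation, the helper names, and the rule bodies (rules 1-3 are shown) are assumptions for exposition, not the actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical mention representation; the real system works over
# part-of-speech tagged text with basal noun phrases marked.
@dataclass
class NP:
    text: str
    gender: str   # 'm', 'f', or 'n'
    number: str   # 'sg' or 'pl'

@dataclass
class Pronoun(NP):
    reflexive: bool = False
    all_prior: List[NP] = field(default_factory=list)        # every NP in the prior discourse
    prior_sentence: List[NP] = field(default_factory=list)   # NPs in the prior sentence
    current_read_in: List[NP] = field(default_factory=list)  # NPs already read in the current sentence

def compatible(p: Pronoun, np: NP) -> bool:
    # Basic selectional restrictions: gender and number agreement.
    return p.gender == np.gender and p.number == np.number

# Rules 1-3 of the cascade; rules 4-6 would follow the same pattern.
def rule1_unique_in_discourse(p: Pronoun) -> Optional[NP]:
    cands = [np for np in p.all_prior if compatible(p, np)]
    return cands[0] if len(cands) == 1 else None

def rule2_reflexive(p: Pronoun) -> Optional[NP]:
    if not p.reflexive:
        return None
    cands = [np for np in p.current_read_in if compatible(p, np)]
    return cands[-1] if cands else None   # nearest possible antecedent

def rule3_unique_current_plus_prior(p: Pronoun) -> Optional[NP]:
    cands = [np for np in p.prior_sentence + p.current_read_in if compatible(p, np)]
    return cands[0] if len(cands) == 1 else None

RULES = [rule1_unique_in_discourse, rule2_reflexive, rule3_unique_current_plus_prior]

def resolve(p: Pronoun) -> Optional[NP]:
    # Rules are tried in the presented order; the first antecedent found wins,
    # and a pronoun that no rule covers is simply left unresolved.
    for rule in RULES:
        antecedent = rule(p)
        if antecedent is not None:
            return antecedent
    return None
```

The essential property of the cascade is that each rule either commits with high confidence or abstains, so uncovered pronouns fall through rather than being forced onto a best guess.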
The precision is 97% (121/125) and the recall is 60% (121/201) for 198 pronouns of training data.</Paragraph> </Section> <Section position="7" start_page="39" end_page="41" type="metho"> <SectionTitle> 3 Evaluation: </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="39" end_page="40" type="sub_section"> <SectionTitle> 3.1 Comparison to Hobbs' Naive Algorithm </SectionTitle> <Paragraph position="0"> The Naive Algorithm [Hobbs 1976] works by specifying a total order on noun phrases in the prior discourse, comparing each noun phrase against the selectional restrictions (i.e. gender, number) of the anaphor, and taking the antecedent to be the first one to satisfy them. The specification of the ordering constitutes a traversal order of the syntax tree of the anaphor's clause and from there to embedding clauses and prior clauses.</Paragraph> <Paragraph position="1"> The Winograd sentences, with either verb, would yield the following ordering of possible antecedents: The city council > the women. The algorithm would resolve 'they' to 'The city council'. This is incorrect on one choice of verb, but the algorithm does not integrate the verb information into the salience ranking.</Paragraph> <Paragraph position="2"> In comparison, none of the six rules of CogNIAC would resolve the pronoun. Rules have been tried that resolved a subject pronoun of a nested clause with the subject of the dominating clause, but no configuration has been found that yielded sufficient precision 2. Consequently, 'they' is not resolved.</Paragraph> <Paragraph position="3"> The Naive Algorithm has some interesting properties. First, it models relative salience as relative depth in a search space. For two candidate antecedents a and b, if a is encountered before b in the search space, then a is more salient than b. Second, the relative saliency of all candidate antecedents is totally ordered, that is, for any two candidate antecedents a and b, a is more salient than b xor b is more salient than a.</Paragraph> <Paragraph position="4"> 2 In experiment 2, discussed below, the rule 'subject same clause' would resolve 'they' to the city council, but it was added to the MUC-6 system without testing, and has shown itself not to be a high precision rule.</Paragraph> <Paragraph position="5"> CogNIAC shares several features with the Naive Algorithm: * Both use basic selectional restrictions to find semantically acceptable potential antecedents.</Paragraph> <Paragraph position="6"> * Both use highly syntactic generalizations to resolve anaphors, and do not attempt to do more sophisticated semantic processing.</Paragraph> <Paragraph position="7"> But they also differ in significant ways: * CogNIAC is not committed to totally ordering all potential antecedents, whereas the Naive Algorithm is.</Paragraph> <Paragraph position="8"> * CogNIAC is sensitive to ambiguity, i.e. circumstances where there are many possible antecedents, and will not resolve pronouns in such cases.</Paragraph> <Paragraph position="10"> The Naive Algorithm has no means of noting ambiguity and will resolve a pronoun as long as there is at least one possible antecedent.</Paragraph> <Paragraph position="11"> Perhaps the most convincing reason to endorse partially ordered salience rankings is that salience distinctions fade as the discourse moves on.</Paragraph> <Paragraph position="12"> Earl was working with Ted the other day. 
He fell into the threshing machine.</Paragraph> <Paragraph position="13"> Earl was working with Ted the other day.</Paragraph> <Paragraph position="14"> All of a sudden, the cows started making a ruckus. The noise was unbelievable. He fell into the threshing machine.</Paragraph> <Paragraph position="15"> In the first example 'He' takes 'Earl' as antecedent, which is what rule 6, Unique Subject/Subject Pronoun, would resolve the pronoun to. However, in the second example, the use of 'He' is ambiguous--a distinction that existed before is now gone. The Naive Algorithm would still maintain a salience distinction between 'Earl' and 'Ted', whereas CogNIAC has no rule that makes a salience distinction between the subject and object of a sentence which has two intervening sentences. The closest rule would be Unique in Discourse, rule 1, which does not yield a unique antecedent.</Paragraph> </Section> <Section position="2" start_page="40" end_page="41" type="sub_section"> <SectionTitle> 3.2 Performance: </SectionTitle> <Paragraph position="0"> CogNIAC has been evaluated in two different contexts. The goal of the first experiment was to establish the performance of CogNIAC relative to Hobbs' Naive Algorithm--a convenient benchmark that allows indirect comparison to other algorithms. The second experiment reports results on Wall Street Journal data.</Paragraph> <Paragraph position="1"> The chosen domain for comparison with Hobbs' Naive Algorithm was narrative texts about two persons of the same gender told from a third person perspective. The motivation for this data was that we wanted to maximize the ambiguity of resolving pronouns. Only singular third person pronouns were considered. The text was pre-processed with a part-of-speech tagger, over which basal noun phrases were delimited and finite clauses and their relative nesting were identified by machine. This pre-processing was subjected to hand correction in order to make the comparison with Hobbs' as fair as possible, since that was an entirely hand-executed algorithm, but CogNIAC was otherwise machine run and scored. Errors were not chained, i.e. in left-to-right processing of the text, earlier mistakes were corrected before processing the next noun phrase.</Paragraph> <Paragraph position="2"> Since the Naive Algorithm resolves all pronouns, two lower precision rules were added to rules 1-6 for comparison's sake. The rules are: 7) Cb-Picking 3: If there is a Cb i in the current finite clause that is also a candidate antecedent, then pick i as the antecedent.</Paragraph> <Paragraph position="3"> 8) Pick Most Recent: Pick the most recent potential antecedent in the text.</Paragraph> <Paragraph position="4"> The last two rules are lower precision than the first six, but perform well enough to merit their inclusion in a 'resolve all pronouns' configuration. Rule 7 performed reasonably well with 77% precision in training (10/13 correct for 201 pronouns), and rule 8 performed with 65% precision in training (44/63 correct). The first six rules each had a precision of greater than 90% for the training data, with the exception of rule 4, which had a precision of 80% over 5 resolutions. The summary performance of the Naive Algorithm and of CogNIAC is summarized below. Since both the Naive Algorithm and the resolve-all-pronouns configuration of CogNIAC are required to resolve all pronouns, precision and recall figures are not appropriate. Instead, % correct figures are given. 
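Spelling out the measures used here and in the next paragraph (this is only a restatement of the definitions given in the text, not additional results):

```latex
\mathrm{precision} = \frac{\#\ \text{correctly resolved}}{\#\ \text{pronouns resolved (guesses)}}
\qquad
\mathrm{recall} = \frac{\#\ \text{correctly resolved}}{\#\ \text{instances of coreference}}
\qquad
\%\ \mathrm{correct} = \frac{\#\ \text{correctly resolved}}{\#\ \text{all pronouns}}
```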
The high precision version of CogNIAC is reported with recall (number correct/number of instances of coreference) and precision (number correct/number of guesses) measures.</Paragraph> <Paragraph position="5"> The conclusion to draw from these results is: if forced to commit to all anaphors, CogNIAC performs comparably to the Naive Algorithm. 3 Rule 7 is based on the primitives of Centering Theory (Grosz, Joshi and Weinstein '86). The Cb of an utterance is the highest ranked NP (the ranking being: Subject > all other NPs) from the prior finite clause realized anaphorically in the current finite clause. Please see Baldwin '95 for a full discussion of the details of the rule.</Paragraph> <Paragraph position="6"> Lappin and Leass 1994 correctly resolved 86% of 360 pronouns in computer manuals. Lappin and Leass ran Hobbs' algorithm on their data, and the Naive Algorithm is correct 82% of the time--4% worse. This allows indirect comparison with CogNIAC, with the suggestive conclusion that the resolve-all-pronouns configuration of CogNIAC, like the Naive Algorithm, is at least in the ballpark of more modern approaches 4. The breakdown of the individual rules is as follows: Far more interesting to consider is the performance of the high precision rules 1 through 6. The first four rules perform quite well at 96% precision (148/154) and 50% recall (148/298). Adding in rules 5 and 6 resolves a total of 190 pronouns correctly, with only 16 mistakes, a precision of 92% and recall of 64%. This contrasts strongly with the resolve-all-pronouns results of 78%. The last two rules, 7 and 8, performed quite badly on the test data. Despite their poor performance, CogNIAC still remained comparable to the Naive Algorithm.</Paragraph> <Paragraph position="7"> Experiment 2, resolving all pronouns in the MUC-6 evaluation: CogNIAC was used as the pronoun component in the University of Pennsylvania's coreference entry 5 in the MUC-6 evaluation. Pronominal anaphora constitutes 17% of the coreference annotations in the evaluation data used. The remaining instances of anaphora included common noun anaphora and coreferent instances of proper nouns. As a result of being part of a larger system, changes were made to CogNIAC to make it fit in better with the other components of the overall system, in addition to adding rules that were specialized for the new kinds of pronominal anaphora. These changes include: 4 This is not to say that RAP was not an advancement of the state of the art. A significant aspect of that research is that both RAP and the Naive Algorithm were machine executed--the Naive Algorithm was not machine executed in either the Hobbs '76 paper or in the evaluation in this work. 5 Please see Baldwin et al '96 for performance statistics and a bit more detail about the entire system. 
* Processing quoted speech in a limited fashion (Quoted Speech).</Paragraph> <Paragraph position="8"> * Addition of a rule that searched back for a unique antecedent through the text, first 3 sentences back, then 8 sentences back, then 12 sentences back, and so on (Search Back).</Paragraph> <Paragraph position="9"> * Addition of a partial parser [Collins 1996] to determine what a finite clause is.</Paragraph> <Paragraph position="10"> * A new pattern was added which selected the subject of the immediately surrounding clause (Subject Same Clause).</Paragraph> <Paragraph position="11"> * Addition of a pleonastic-it detector which filtered uses of 'it' that were not pronominal.</Paragraph> <Paragraph position="12"> * Disabling of several rules (4, 7 and 8) because they did not appear to be appropriate for the domain.</Paragraph> <Paragraph position="13"> A total of thirty articles were used in the formal evaluation, of which I chose the first fifteen for closer analysis. The remaining fifteen were retained for future evaluations. The performance of CogNIAC was as follows:</Paragraph> <Paragraph position="15"> The precision (73%) is quite a bit worse than that encountered in the narrative texts. The performance of the individual rules was quite different from the narrative texts, as shown in the table below: The results for CogNIAC for all pronouns in the first 15 articles of the MUC-6 evaluation.</Paragraph> <Paragraph position="16"> Upon closer examination, approximately 75% of the errors were due to factors outside the scope of the CogNIAC pronominal resolution component. Software problems accounted for 20% of the incorrect cases, and another 30% were due to semantic errors like misclassification of a noun phrase as a person or company, singular/plural confusion, etc. The remaining errors were due to incorrect noun phrase identification, failure to recognize pleonastic-it, or other cases where there is no instance of an antecedent. However, 25% of the errors were due directly to the rules of CogNIAC being plain wrong.</Paragraph> </Section> </Section> <Section position="8" start_page="41" end_page="43" type="metho"> <SectionTitle> 4 Discussion: </SectionTitle> <Paragraph position="0"> CogNIAC is both an engineering effort and a different approach to information processing in variable knowledge contexts. Each point is addressed in turn.</Paragraph> <Section position="1" start_page="42" end_page="42" type="sub_section"> <SectionTitle> 4.1 The utility of high precision coreference </SectionTitle> <Paragraph position="0"> A question raised by a reviewer was whether there is any use for high precision coreference, given that it does not resolve as much coreference as other methods. In the first experiment, the high precision version of CogNIAC correctly resolved 62% of the pronouns, as compared to the resolve-all-pronouns version, which resolved 79% of them--a 27% loss of overall recall.</Paragraph> <Paragraph position="1"> The answer to this question quite naturally depends on the application in which coreference is being used. Some examples follow.</Paragraph> </Section> <Section position="2" start_page="42" end_page="42" type="sub_section"> <SectionTitle> Information Retrieval </SectionTitle> <Paragraph position="0"> Information retrieval is characterized as a process by which a query is used to retrieve relevant documents from a text database. Queries are typically natural language based or Boolean expressions. 
Documents are retrieved and ranked for relevance using various string matching techniques over query terms in a document, and the highest scoring documents are presented to the user first.</Paragraph> <Paragraph position="1"> The role that coreference resolution might play in information retrieval is that retrieval algorithms that a) count the number of matches to a query term in a document, or b) count the proximity of matches to query terms, would benefit from noticing alternative realizations of the terms, like 'he' in place of 'George Bush'.</Paragraph> <Paragraph position="2"> In such an application, high precision coreference would be more useful than high recall coreference if the information retrieval engine was returning too many irrelevant documents but getting a reasonable number of relevant documents. The coreference would only help the scores of presumably relevant documents, but at the expense of missing some relevant documents. A higher recall, lower precision algorithm would potentially add more irrelevant documents.</Paragraph> </Section> <Section position="3" start_page="42" end_page="42" type="sub_section"> <SectionTitle> Coherence Checking </SectionTitle> <Paragraph position="0"> A direct application of the &quot;ambiguity noticing&quot; ability of CogNIAC is in checking the coherence of pronoun use in text by children and English-as-a-second-language learners. Ambiguous pronoun use is a substantial problem for beginning writers and language learners. CogNIAC could scan texts as they are being written and evaluate whether there is sufficient syntactic support from the context to resolve a pronoun--if not, then the user could be notified of a potentially ambiguous use. It is not clear that CogNIAC's current levels of performance could support such an application, but it is a promising one.</Paragraph> </Section> <Section position="4" start_page="42" end_page="42" type="sub_section"> <SectionTitle> Information Extraction </SectionTitle> <Paragraph position="0"> Information extraction amounts to filling in template-like data structures from free text. Typically, the patterns which are used to fill the templates are hand built. The latest MUC-6 evaluation involved management changes at companies. A major problem in information extraction is the fact that the desired information can be spread over many sentences in the text, and coreference resolution is essential to relate the relevant sentences to the correct individuals, companies, etc. The MUC-6 coreference task was developed with the idea that it would aid information extraction technologies.</Paragraph> <Paragraph position="1"> The consequences of an incorrectly resolved pronoun can be devastating to the final template filling task--one runs the risk of conflating information about one individual with another. High precision coreference appears to be a natural candidate for such applications.</Paragraph> </Section> <Section position="5" start_page="42" end_page="43" type="sub_section"> <SectionTitle> 4.2 The methodology behind CogNIAC </SectionTitle> <Paragraph position="0"> CogNIAC effectively circumscribes those cases where coreference can be done with high confidence and those cases that require greater world knowledge, but how might CogNIAC be part of a more knowledge-rich coreference application? CogNIAC, as a set of seven or so high precision rules, would act as an effective filter on what a more knowledge-rich application would have to resolve. 
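To make the filtering idea concrete, here is a hypothetical sketch of that division of labor; the knowledge-rich component and its interface are assumptions, and `resolve` is the rule cascade sketched earlier, not the system's actual code.

```python
# Hypothetical sketch of CogNIAC as a filter in front of a knowledge-rich
# resolver. The high precision rules commit wherever they can; only the
# pronouns they leave unresolved are handed to the expensive component.
# `knowledge_rich_resolver` is an assumed callable returning
# (pronoun, antecedent) pairs for the pronouns it is given.

def resolve_document(pronouns, knowledge_rich_resolver):
    links, leftover = [], []
    for p in pronouns:
        antecedent = resolve(p)            # CogNIAC's high precision rules
        if antecedent is not None:
            links.append((p, antecedent))  # resolved cheaply and confidently
        else:
            leftover.append(p)             # needs world knowledge / inference
    # Only the hard residue reaches the knowledge-rich resolver.
    links.extend(knowledge_rich_resolver(leftover))
    return links
```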
But the essential component behind CogNIAC is not the rules themselves, but the control structure behind its coreference resolution algorithm. This control structure could control general inference techniques as well.</Paragraph> <Paragraph position="1"> An interesting way to look at CogNIAC is as a search procedure. The Naive Algorithm can be oversimplified as depth-first search over parse trees. Depth-first search is also a perfectly reasonable control structure for an inference engine--as it is in PROLOG. The search structure of CogNIAC could be characterized as parallel iterative deepening, with solutions being accepted only if a unique solution is found at the depth of the parallel search. But there is not enough room in this paper to explore the general properties of CogNIAC's search and evaluation strategy.</Paragraph> <Paragraph position="2"> Another angle on CogNIAC's role with more robust knowledge sources is to note that the recall limitations of CogNIAC for the class of pronouns/data considered are due to insufficient filtering mechanisms on candidate antecedents. There is not a need to expand the space of candidate antecedents with additional knowledge, but rather to eliminate semantically plausible antecedents with constraints from verb knowledge and other sources of constraints currently not available to the system.</Paragraph> <Paragraph position="3"> However, there are classes of coreference that require strong knowledge representation to assemble the initial set of candidate antecedents. This includes the realm of inferred definites, as in &quot;I went to the house and opened the door&quot;, and synonymy between definite common nouns, as in 'the tax' and 'the levy'.</Paragraph> </Section> <Section position="6" start_page="43" end_page="43" type="sub_section"> <SectionTitle> 4.3 The possibility of perfect coreference </SectionTitle> <Paragraph position="0"> Hobbs 1976 ultimately rejects the Naive Algorithm as a stand-alone solution to the pronoun resolution problem. In that rejection he states: &quot;The naive algorithm does not work. Anyone can think of examples where it fails. In these cases it not only fails; it gives no indication that it has failed and offers no help in finding the real antecedent.&quot;</Paragraph> <Paragraph position="1"> Hobbs then articulates a vision of what the appropriate technology is, which entails inference over an encoding of world knowledge. But is world knowledge inherent in resolving all pronouns, as Hobbs' skepticism seems to convey? It has not been clear up to this point whether any anaphora can be resolved with high confidence, given that there are clear examples which can only be resolved with sophisticated world knowledge, e.g. the Winograd city council sentences. But the results from the first and second experiments demonstrate that it is possible to have respectable recall with very high precision (greater than 90%) for some kinds of pronominal resolution. However, good performance does not necessarily falsify Hobbs' skepticism.</Paragraph> <Paragraph position="2"> The high precision component of CogNIAC still makes mistakes, 8-9% error for the first experiment--it is harder to evaluate the second experiment. If it were the case that integration of world knowledge would have prevented those errors, then Hobbs' skepticism still holds, since CogNIAC has only minimized the role of world knowledge, not eliminated it. 
In looking at the mistakes made in the second experiment, there were no examples that appeared to be beyond the scope of further improving the syntactic rules or expanding the basic categorization of noun phrases into person, company or place. For the data considered so far, there does appear to be a class of anaphors that can be reliably recognized and resolved with non-knowledge-intensive techniques. Whether this holds in general remains an open question, but it is a central design assumption behind the system.</Paragraph> <Paragraph position="3"> A more satisfying answer to Hobbs' skepticism is contained in the earlier suggestive conjecture that world knowledge facilitates anaphora by eliminating ambiguity. This claim can be advanced to say that world knowledge comes into play in those cases of anaphora that do not fall under the purview of rules 1 through 7 and their refinements. If this is correct, then the introduction of better world knowledge sources will help the recall of the system rather than the precision.</Paragraph> <Paragraph position="4"> Ultimately, the utility of CogNIAC is a function of how it performs. The high precision rules of CogNIAC performed very well, with greater than 90% precision and good recall in the first experiment. In the second experiment, components other than the rules of CogNIAC began to degrade the performance of the system unduly. But there is promise in the high precision core of CogNIAC across varied domains.</Paragraph> </Section> </Section> <Section position="9" start_page="43" end_page="43" type="metho"> <SectionTitle> 5 The future of CogNIAC: </SectionTitle> <Paragraph position="0"> CogNIAC is currently the common noun and pronoun resolution component of the University of Pennsylvania's coreference resolution software and general NLP software (Camp). This paper does not address the common noun coreference aspects of the system, but there are some interesting parallels with pronominal coreference. Some planned changes include the following sorts of coreference. The processing of split antecedents, as in: John called Mary. They went to a movie.</Paragraph> <Paragraph position="1"> This class of coreference is quite challenging because the plural anaphor 'they' must be able to collect a set of antecedents from the prior discourse--but how far should it look back, and once it has found two antecedents, should it continue to look for more? Event reference is a class of coreference that will also prove to be quite challenging. For example: The computer won the match. It was a great triumph.</Paragraph> <Paragraph position="2"> The antecedent to 'It' could be any of 'The computer', 'the match' or the event of winning. The space of ambiguity will certainly grow substantially when events are considered as candidate antecedents.</Paragraph> <Paragraph position="3"> Currently the system uses no verb semantics to try to constrain possible coreference. While the Winograd sentences are too difficult for current robust lexical semantic systems, simpler generalizations about what can fill an argument are possible; consider: The price of aluminum rose today due to large purchases by ALCOA Inc. It claimed that it was not trying to corner the market.</Paragraph> <Paragraph position="4"> Since 'It' is an argument of 'claimed', a verb that requires that its subject be animate, we can eliminate 'The price of aluminum' and 'today' from consideration, leaving 'ALCOA Inc.' as the sole singular antecedent from the prior sentence. 
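A minimal sketch of this kind of argument-based filtering, with a hypothetical verb lexicon and animacy labels standing in for verb semantics the system does not yet have:

```python
# Hypothetical sketch: prune candidate antecedents of a pronoun using the
# selectional constraint its governing verb places on that argument slot.
# The verb list and the animacy labels below are illustrative assumptions.

SUBJECT_MUST_BE_ANIMATE = {"claim", "say", "deny"}   # assumed verb lexicon

def filter_by_verb_constraint(candidates, governing_verb, role):
    if role == "subject" and governing_verb in SUBJECT_MUST_BE_ANIMATE:
        return [c for c in candidates if c["animate"]]
    return candidates

# Candidates from the prior sentence of the ALCOA example.
candidates = [
    {"text": "The price of aluminum", "animate": False},
    {"text": "today", "animate": False},
    {"text": "ALCOA Inc.", "animate": True},   # a company counts as animate for 'claim'
]

# 'It' is the subject of 'claimed', so only animate candidates survive,
# leaving 'ALCOA Inc.' as the sole antecedent.
print(filter_by_verb_constraint(candidates, "claim", "subject"))
```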
Work has been done along these lines by Dagan '90.</Paragraph> </Section> </Paper>