<?xml version="1.0" standalone="yes"?> <Paper uid="J98-2001"> <Title>A Corpus-based Investigation of Definite Description Use</Title> <Section position="3" start_page="190" end_page="1192" type="metho"> <SectionTitle> 3. A First Experiment in Classification </SectionTitle>
<Paragraph position="0"> For our first experiment evaluating subjects' performance at the classification task, we developed a taxonomy of definite description uses based on the schemes discussed in the previous section, preliminarily tested the taxonomy by annotating the corpus ourselves, and then asked two annotators to do the same task. This first experiment is described in the rest of this section. We explain first the classification we developed for this experiment, then the experimental conditions, and, finally, discuss the results.</Paragraph>
<Section position="1" start_page="190" end_page="1192" type="sub_section"> <SectionTitle> 3.1 The First Classification Scheme </SectionTitle>
<Paragraph position="0"> The annotation schemes for noun phrases proposed in the literature fall into one of two categories. On the one hand, we have what we might call labeling schemes, most typically used by corpus linguists, which involve assigning to each noun phrase a class such as those discussed in the previous section; the schemes used by Fraurud and Prince fall into this category. On the other hand, there are what we might call linking schemes, concerned with identifying the links between the discourse entity or entities introduced by a noun phrase and other entities in the discourse; the scheme used in MUC-6 is of this type.</Paragraph>
<Paragraph position="1"> In our experiments, we tried both a pure labeling scheme and a mixed labeling and linking scheme. We also tried two slightly different taxonomies of definite descriptions, and we varied the way membership in a class was defined to the subjects. Both taxonomies were based on the schemes proposed by Hawkins and Prince, but we introduced some changes in order, first, to find a scheme that would be easily understood by individuals without previous linguistic training and would lead to maximum agreement among the classifiers; and second, to make the classification more useful for our goal of feeding the results into an implementation.</Paragraph>
<Paragraph position="2"> In the first experiment, we used a labeling scheme, and the classes were introduced to the subjects with reference to the surface characteristics of the definite descriptions. (See below and Appendix A.) The taxonomy we used in this experiment is a simplification of Hawkins's scheme, to which we made three main changes. First of all, we separated those anaphoric descriptions that have the same descriptive content as their antecedent (which we will call anaphoric (same head)) from other cases of anaphoric descriptions in which the association is based on more complex forms of lexical or commonsense knowledge (synonyms, hypernyms, information about events, etc.). We grouped these latter definite descriptions with Hawkins's associative descriptions in a class that we called associative.
This was done in order to see how much need there is for complex lexical inferences in resolving anaphoric definite descriptions, as opposed to simple head matching.</Paragraph>
<Paragraph position="3"> Secondly, we grouped together all the definite descriptions that introduce a novel discourse entity not associated with some previously established object in the text, i.e., that were discourse-new in Prince's sense. This class, which we will call larger situation/unfamiliar, includes both definite descriptions that exploit situational information (Hawkins's larger situation uses) and discourse-new definite descriptions introduced together with their links or referents (unfamiliar). This was done because of Fraurud's observation that distinguishing the two classes is generally difficult (Fraurud 1990).</Paragraph>
<Paragraph position="4"> Third, we did not include a class for immediate situation uses, since we assumed they would be rare in written text.10 We also introduced a separate class of idioms including indirect references, idiomatic expressions and metaphorical uses, and we allowed our subjects to mark definite descriptions as doubts.</Paragraph>
<Paragraph position="5"> To summarize, the classes used in this experiment were as follows: I. Anaphoric same head. This class includes uses of definite descriptions that refer back to an antecedent introduced in discourse; it differs from Hawkins's anaphoric use or Prince's textually evoked classes because it only includes definite-antecedent pairs with the same head noun.</Paragraph>
<Paragraph position="6"> (16) Grace Energy just two weeks ago hauled a rig here 500 miles from Caspar, Wyo., to drill the Bilbrey well, a 15,000-foot, $1-million-plus natural gas well. The rig was built around 1980, but has drilled only two wells, the last in 1982.</Paragraph>
<Paragraph position="7"> 10 This was indeed the case, but we did observe a few instances of an interesting kind of immediate situation use. In these cases, the text is describing the immediate situation in which the writer is, and the writer apparently expects the reader to reconstruct this situation: (i) &quot;And you didn't want me to buy earthquake insurance&quot;, says Mrs. Hammack, reaching across the table and gently tapping his hand. (ii) &quot;I will sit down and talk some of the problems out, but take on the political system? Uh-uh,&quot; he says with a shake of the head.</Paragraph>
<Paragraph position="8"> II. Associative. We assigned to this class those definite descriptions that stand in an anaphoric or associative anaphoric relation with an antecedent explicitly mentioned in the text, but that are not identified by the same head noun as their antecedent. This class includes Hawkins's associative anaphoric definite descriptions and Prince's inferrables, as well as some definite descriptions that would be classified as anaphoric by Hawkins and as textually evoked in Prince (1981). Recognizing the antecedent of these definite descriptions involves at least knowledge of lexical associations, and possibly general commonsense knowledge.11</Paragraph>
<Paragraph position="9"> (17) a. With all this, even the most wary oil men agree something has changed.</Paragraph>
<Paragraph position="10"> &quot;It doesn't appear to be getting worse. That in itself has got to cause people to feel a little more optimistic,&quot; says Glenn Cox, the president of Phillips Petroleum Co.
Though modest, the change reaches beyond the oil patch, too.</Paragraph>
<Paragraph position="11"> b. Toni Johnson pulls a tape measure across the front of what was once a stately Victorian home. A deep trench now runs along its north wall, exposed when the house lurched two feet off its foundation during last week's earthquake.</Paragraph>
<Paragraph position="12"> c. Once inside, she spends nearly four hours measuring and diagramming each room in the 80-year-old house, gathering enough information to estimate what it would cost to rebuild it. While she works inside, a tenant returns with several friends to collect furniture and clothing. One of the friends sweeps broken dishes and shattered glass from a countertop and starts to pack what can be salvaged from the kitchen.</Paragraph>
<Paragraph position="13"> III. Larger situation/unfamiliar. This class includes Hawkins's larger situation uses of definite descriptions based on specific and general knowledge (discourse-new, hearer-old in Prince's terms) as well as his unfamiliar uses (many of which correspond to Prince's containing inferrables).</Paragraph>
<Paragraph position="14"> (18) a. Out here on the Querecho Plains of New Mexico, however, the mood is more upbeat--trucks rumble along the dusty roads and burly men in hard hats sweat and swear through the afternoon sun.</Paragraph>
<Paragraph position="15"> b. Norton Co. said net income for the third quarter fell 6% to $20.6 million, or 98 cents a share, from $22 million, or $1.03 a share.</Paragraph>
<Paragraph position="16"> c. For the Parks and millions of other young Koreans, the long-cherished dream of home ownership has become a cruel illusion. For the government, it has become a highly volatile political issue.</Paragraph>
<Paragraph position="17"> d. About the same time, the Iran-Iraq war, which was roiling oil markets, ended.</Paragraph>
<Paragraph position="18"> 11 See Löbner (1985), Barker (1991), and Poesio (1994) for discussions of lexical conditions on bridging references. IV. Idiom. This class includes indirect references, idiomatic expressions, and metaphorical uses.</Paragraph>
<Paragraph position="19"> (19) A recession or new OPEC blowup could put oil markets right back in the soup.</Paragraph> </Section>
<Section position="2" start_page="1192" end_page="1192" type="sub_section"> <SectionTitle> 3.2 Experimental Conditions </SectionTitle>
<Paragraph position="0"> First of all, we classified the definite descriptions included in 20 randomly chosen articles from the Wall Street Journal contained in the subset of the Penn Treebank corpus included in the ACL/DCI CD-ROM.12 All together, these articles contain 1,040 instances of definite description use. The results of our analysis are summarized in Table 1. Next, we asked two subjects to perform the same task. Our two subjects in this first experiment were graduate students in linguistics. The two subjects were given the instructions in Appendix A. They had to assign each definite description to one of the classes described in Section 3.1: I. anaphoric (same head), II. associative, III. larger situation/unfamiliar, and IV. idiom. The subjects could also express V. doubt about the classification of the definite description. Since the classes I-III are not mutually exclusive, we instructed the subjects to resolve conflicts according to a preference ranking, i.e., to choose a class with higher preference when two classes seemed equally applicable.
The ranking was (from most preferred to least preferred): 1) anaphoric (same head), 2) larger situation/unfamiliar, and 3) associative. The annotators were given one text to familiarize themselves with the task before starting with the annotation proper.</Paragraph> </Section>
<Section position="3" start_page="1192" end_page="1192" type="sub_section"> <SectionTitle> 3.3 Results </SectionTitle>
<Paragraph position="0"> 3.3.1 The Distribution of Definite Descriptions in Classes. The results of the first annotator (henceforth, Annotator A) are shown in Table 2, and those of the second annotator (henceforth, Annotator B) in Table 3.</Paragraph>
<Paragraph position="1"> As the tables indicate, the annotators assigned approximately the same percentage of definite descriptions to each of the five classes as we did; however, the classes do not always include the same elements. This can be gathered from the confusion matrix in Table 4, where an entry m_x,y indicates the number of definite descriptions assigned to class x by subject A and to class y by subject B.</Paragraph>
<Paragraph position="2"> 12 The texts in question are w0203, w0207, w0209, w0301, w0305, w0725, w0760, w0761, w0765, w0766, w0767, w0800, w0803, w0804, w0808, w0820, w1108, w1122, w1124, and w1137.</Paragraph>
<Paragraph position="3"> In order to measure the agreement in a more precise way, we used the Kappa statistic (Siegel and Castellan 1988), recently proposed by Carletta as a measure of agreement for discourse analysis (Carletta 1996). We also used a measure of per-class agreement that we introduced ourselves. We discuss these results below, after reviewing briefly how K is computed.</Paragraph>
<Paragraph position="4"> The Kappa test applies when the task is to assign items to one of a set of nonordered classes. The test computes a coefficient K of agreement among coders, which takes into account the possibility of chance agreement. It is dependent on the number of coders, number of items being classified, and number of choices of classes to be ascribed to items.</Paragraph>
<Paragraph position="5"> The kappa coefficient of agreement between c annotators is defined as: K = (P(A) - P(E)) / (1 - P(E)), where P(A) is the proportion of times the annotators agree and P(E) is the proportion of times that we would expect the annotators to agree by chance. When there is complete agreement among the raters, K = 1; if there is no agreement other than that expected by chance, K = 0.</Paragraph>
<Paragraph position="6"> Table 5: An example of the Kappa test.
Definite Description | ASH | ASS | LSU | S
1. the third quarter | 0 | 0 | 3 | 1
2. the abrasives, engineering materials and petroleum services concern | 0 | 2 | 1 | 0.33
3. The company | 0 | 3 | 0 | 1
4. the year-earlier quarter | 0 | 2 | 1 | 0.33
5. the tax credit | 3 | 0 | 0 | 1
6. the engineering materials segment | 1 | 1 | 1 | 0
7. the possible sale of all or part of Eastman Christensen | 0 | 0 | 3 | 1
8. the nine months | 0 | 0 | 3 | 1
9. the year-earlier period | 0 | 2 | 1 | 0.33
10. the company | 3 | 0 | 0 | 1
11. the company | 3 | 0 | 0 | 1
12. the company | 3 | 0 | 0 | 1
13. the company | 3 | 0 | 0 | 1</Paragraph>
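<Paragraph position="7"> For concreteness, the computation can be reproduced with a short program. The following Python sketch is our own illustration, not part of the original study; the function and variable names are ours. It computes the per-description agreements S_i, P(A), P(E), and K for the data in Table 5, following the Siegel and Castellan formulas described above:

from math import comb

def kappa(counts, c):
    # counts: one row per description, one column per class;
    # each row sums to c, the number of coders.
    n = len(counts)                                  # N: number of descriptions
    pairs = comb(c, 2)                               # pairwise comparisons per item
    s = [sum(comb(nij, 2) for nij in row) / pairs for row in counts]
    p_a = sum(s) / n                                 # observed agreement P(A)
    totals = [sum(col) for col in zip(*counts)]      # judgments per class
    t = n * c                                        # T: total judgments
    p_e = sum((cj / t) ** 2 for cj in totals)        # chance agreement P(E)
    return s, p_a, p_e, (p_a - p_e) / (1 - p_e)

# The thirteen rows of Table 5 (columns ASH, ASS, LSU):
table5 = [[0, 0, 3], [0, 2, 1], [0, 3, 0], [0, 2, 1], [3, 0, 0], [1, 1, 1],
          [0, 0, 3], [0, 0, 3], [0, 2, 1], [3, 0, 0], [3, 0, 0], [3, 0, 0],
          [3, 0, 0]]
s, p_a, p_e, k = kappa(table5, c=3)
print(round(p_a, 2), round(p_e, 2), round(k, 2))    # prints 0.77 0.35 0.65

The same function applies unchanged to the two-coder comparisons reported elsewhere in this section, with c = 2.</Paragraph>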
According to Carletta, in the field of content analysis--where the Kappa statistic originated--K > 0.8 is generally taken to indicate good reliability, whereas 0.68 < K < 0.8 allows tentative conclusions to be drawn.</Paragraph>
<Paragraph position="11"> We will illustrate the method for computing K proposed in Siegel and Castellan (1988) by means of an example from one of our texts, shown in Table 5.</Paragraph>
<Paragraph position="12"> The first column in Table 5 (Definite description) shows the definite description being classified. The columns ASH, ASS, and LSU stand for the classification options presented to the subjects (anaphoric (same head), associative, and larger situation/unfamiliar, respectively). The numbers in each n_ij entry of the matrix indicate the number of classifiers that assigned the description in row i to the class in column j. The final column (labeled S) represents the percentage agreement for each definite description; we explain below how this percentage agreement is calculated. The last row in the table shows the total number of descriptions (N), the total number of descriptions assigned to each class and, finally, the total percentage agreement for all descriptions (Z).</Paragraph>
<Paragraph position="13"> The equations for computing Si, PE, PA, and K are shown in Table 6. In these formulas, c is the number of coders; Si the percentage agreement for description i (we show S1 and S2 as examples); m the number of categories; T the total number of classification judgments; PE the percentage agreement expected by chance; PA the total agreement; and K the Kappa coefficient.</Paragraph>
<Paragraph position="14"> 3.3.3 Value of K for the First Experiment. For the first experiment, K = 0.68 if we count idioms as a class, K = 0.73 if we take them out. The overall coefficient of agreement between the two annotators and our own analysis is K = 0.68 if we count idioms, K = 0.72 if we ignore them.</Paragraph>
<Paragraph position="15"> We also wanted to measure the agreement per class, i.e., to understand where annotators agreed the most and where they disagreed the most. The confusion matrix does this to some extent, but only works for two annotators--and therefore, for example, we couldn't use it to measure agreement on classes between the two annotators and ourselves.</Paragraph>
<Paragraph position="16"> We computed what we called per-class percentage of agreement for three coders (the two annotators and ourselves) by taking the proportion of pairwise agreements relative to the number of pairwise comparisons, as follows: whenever all three coders ascribe a description to the same class, we count six pairwise agreements out of six pairwise comparisons for that class (100%). If two coders ascribe a description to class x and the third coder to class y, we count two pairwise agreements out of six pairwise comparisons for class x (33%) and none for class y. The figures obtained are presented in Table 7. The figures indicate better agreement on anaphoric same head and larger situation/unfamiliar definite descriptions, worse agreement on the other classes. (In fact, the percentages for idioms and doubts are very low; but these classes are also too small to allow us to draw any conclusions.)</Paragraph> </Section>
<Section position="4" start_page="1192" end_page="1192" type="sub_section"> <SectionTitle> 3.4 Discussion of the Results </SectionTitle>
<Paragraph position="0"> 3.4.1 Distribution.
One of the most interesting results of this first experiment is that a large proportion of the definite descriptions in our corpus (48.37%, according to our own annotation; more, according to our two annotators) are not related to an antecedent previously introduced in the text. Surprising as it may seem, this finding is in fact just a confirmation of the results of other researchers. Fraurud (1990) reports that 60.9% of definite descriptions in her corpus of 11 Swedish texts are first-mention, i.e., do not corefer with an entity already evoked in the text;13 Gallaway (1996) found a distribution similar to ours in (English) spoken child language.</Paragraph>
<Paragraph position="1"> 13 As mentioned above, Fraurud's first-mention class consists of Prince's discourse-new, inferrables, and containing inferrables.</Paragraph>
<Paragraph position="2"> The first experiment resulted in a relatively low agreement among annotators. The reason for this disagreement was not so much annotators' errors as the fact, already mentioned, that the classes are not mutually exclusive. The confusion matrix in Table 4 indicates that the major classes of disagreements were definite descriptions classified by annotator A as larger situation and by annotator B as associative, and vice versa. One such example is the government in (20). This definite description could be classified as larger situation because it refers to the government of Korea, and presumably the fact that Korea has a government is shared knowledge; but it could also be classified as being associative on the predicate Koreans.14 (20) For the Parks and millions of other young Koreans, the long-cherished dream of home ownership has become a cruel illusion. For the government, it has become a highly volatile political issue.</Paragraph>
<Paragraph position="3"> 14 As discussed above, this problem with Hawkins's and Prince's classification schemes had already been</Paragraph>
<Paragraph position="4"> We will analyze the reasons for the disagreement in more detail in relation to our second experiment, in which we also asked the annotators to indicate the antecedent of definite descriptions.</Paragraph>
<Paragraph position="5"> In this experiment, we were able to confirm the correlation observed by Hawkins between the syntactic structure of certain definite descriptions and their classification as discourse-new. Factors that strongly suggest that a definite description is discourse-new (and in fact, presumably hearer-new as well) include the presence of modifiers such as first or best, and of a complement for NPs of the form the fact that ... or the conclusion that ....15 Postnominal modification of any type is also a strong indicator of discourse novelty, suggesting that most postnominal clauses serve to establish a referent in the sense discussed in the previous section. In addition, we observed a previously unreported (to our knowledge) correlation between discourse novelty and syntactic constructions such as appositions, copular constructions, and comparatives. The following examples from our corpus illustrate the correlations just mentioned: (21) a. Mr. Ramirez, who arrived late at the Sharpshooter with his crew because he had started early in the morning setting up tanks at another site, just got the first raise he can remember in eight years, to $8.50 an hour from $8.</Paragraph>
<Paragraph position="6"> b. Mr. Dinkins also has failed to allay Jewish voters' fears about his association with the Rev. Jesse Jackson, despite the fact that few local
non-Jewish politicians have been as vocal for Jewish causes in the past 20 years as Mr. Dinkins has.</Paragraph>
<Paragraph position="7"> c. They wonder whether he has the economic know-how to steer the city through a possible fiscal crisis, and they wonder who will be advising him.</Paragraph>
<Paragraph position="8"> d. The appetite for oil-service stocks has been especially strong, although some got hit yesterday when Shearson Lehman Hutton cut its short-term investment ratings on them.</Paragraph>
<Paragraph position="9"> e. After his decisive primary victory over Mayor Edward I. Koch in September, Mr. Dinkins coasted, until recently, on a quite comfortable lead over his Republican opponent, Rudolph Giuliani, the former crime buster who has proved a something \[sic\] of a bust as a candidate.</Paragraph>
<Paragraph position="10"> f. &quot;The bottom line is that he is a very genuine and decent guy&quot;, says Malcolm Hoenlein, a Jewish community leader.</Paragraph>
<Paragraph position="11"> In addition, we observed a correlation between larger situation uses of definite descriptions (discourse-new, and often hearer-old) and certain syntactic expressions and lexical items. For example, we noticed that a large number of uses of definite descriptions in the corpus used for this first experiment referred to temporal entities such as the year or the month, or included proper names in place of the head noun or in pre-modifier position, as in the Querecho Plains of New Mexico and the Iran-Iraq war. Although these definite descriptions would have been classified by Hawkins as larger situation uses, in many cases they couldn't really be considered hearer-old or unused: what seems to be happening in these cases is that the writer assumed the reader would use information about the visual form of words, or perhaps lexical knowledge, to infer that an object of that name existed in the world.</Paragraph>
<Paragraph position="12"> We evaluated the strength of these correlations by means of a computer simulation (Vieira and Poesio 1997). The system attempts to classify the definite descriptions found in texts syntactically annotated according to the Penn Treebank format. The system classifies a definite description as unfamiliar using heuristics based on the syntactic and lexical correlations just observed, i.e., if either (i) it includes an unexplanatory modifier, (ii) it occurs in an apposition or a copular construction, or (iii) it is modified by a relative clause or prepositional phrase. A definite description is classified as larger situation if its head noun is a temporal expression such as year or month, or if its head or premodifiers are proper names. The implementation revealed that some of the correlations are very strong: for example, the agreement between the system's classification and the annotators' on definite descriptions with a nominal complement, such as the fact that ...
varied between 93% and 100% depending on the annotator; and on average, 70% of temporal expressions such as the year were interpreted as larger situation by the annotators.</Paragraph>
<Paragraph position="13"> All of this suggests that in using definite descriptions, writers may not just make assumptions about their readers' knowledge; they may also rely on their readers' ability to use lexical or syntactic cues to classify a definite description as discourse-new even when these readers don't know about the particular object referred to already. This observation is consistent with Fraurud's hypothesis that interpreting definite descriptions involves two processes--deciding whether a definite description relates to some entity in the discourse or not, and searching for the antecedent--and that the two processes are fairly independent. Our findings also suggest that the classification process may rely on more than just lexical cues, as Fraurud seems to assume (taking up a suggestion in Löbner \[1985\]; see below).</Paragraph> </Section> </Section>
<Section position="4" start_page="1192" end_page="1192" type="metho"> <SectionTitle> 4. The Second Experiment </SectionTitle>
<Paragraph position="0"> In order to address some of the questions raised by Experiment 1, we set up a second experiment. In this second experiment we modified both the classification scheme and what we asked the annotators to do.</Paragraph>
<Section position="1" start_page="1192" end_page="1192" type="sub_section"> <SectionTitle> 4.1 Revisions to the Annotators' Task </SectionTitle>
<Paragraph position="0"> One concern we had in designing this second experiment was to understand better the reasons for the disagreement among annotators observed in the first experiment. In particular, we wanted to understand whether the classification disagreements reflected disagreements about the final semantic interpretation. Another difference between this new experiment and the first one is that we structured the task of deciding on a classification for a definite description around a series of questions forming a decision tree, rather than giving our subjects an explicit preference ranking. A third aspect of the first experiment we wanted to study more carefully was the distribution of definite descriptions, in particular, the characteristics of the large number of definite descriptions in the larger situation/unfamiliar class. Finally, we chose truly naive subjects to perform the classification task.</Paragraph>
<Paragraph position="1"> In order to get a better idea of the extent of agreement among annotators about the semantic interpretation of definite descriptions, we asked our subjects to indicate the antecedent in the text for the definite descriptions they classified as anaphoric or associative. This would also allow us to test how well subjects did with a linking type of classification like the one used in MUC-6.
We also replaced the anaphoric (same head) class we had in the first experiment with a broader coreferent class including all cases in which a definite description is coreferential with its antecedent, whether or not the head noun was the same: e.g., we asked the subjects to classify as coreferent a definite like the house referring back to an antecedent introduced as a Victorian home, which would not have counted as anaphoric (same head) in our first experiment.</Paragraph>
<Paragraph position="2"> This resulted in a taxonomy that was at the same time more semantically oriented and closer to Hawkins's and Prince's classification schemes: our broadened coreferent class coincides with Hawkins's anaphoric and Prince's textually evoked classes, whereas the resulting, narrower associative class (that we called bridging references) coincides with Hawkins's associative anaphoric and Prince's class of inferrables. Our intention was to see whether the distinctions proposed by Hawkins and Prince would result in a better agreement among annotators than the taxonomy used in our first experiment, i.e., whether the subjects would be more in agreement about the semantic relation between a definite description and its antecedent than they were about the relation between the head noun of the definite description and the head noun of its antecedent.</Paragraph>
<Paragraph position="3"> The larger situation/unfamiliar class we had in the first experiment was split back into two classes, as in Hawkins's and Prince's schemes. We did this to see whether indeed these two classes were difficult to distinguish; we also wanted to get a clearer idea of the relative importance of the two kinds of definites that we had grouped together in the first annotation. The two classes were called Larger situation and Unfamiliar.</Paragraph> </Section>
<Section position="2" start_page="1192" end_page="1192" type="sub_section"> <SectionTitle> 4.2 Experimental Conditions </SectionTitle>
<Paragraph position="0"> We used three subjects for Experiment 2. Our subjects were English native speakers, graduate students of mathematics, geography, and mechanical engineering at the University of Edinburgh; we will refer to them as C, D, and E below. They were asked to annotate 14 randomly selected Wall Street Journal articles, all but one of them different from those used in Experiment 1, and containing 464 definite descriptions in total.16 Unlike in our first experiment, we did not suggest any relation between the classes and the syntactic form of the definite descriptions in the instructions. The subjects were asked to indicate whether the entity referred to by a definite description (i) had been mentioned previously in the text, else if (ii) it was new but related to an entity already mentioned in the text, else (iii) it was new but presumably known to the average reader, or, finally, (iv) it was new in the text and presumably new to the average reader.</Paragraph>
<Paragraph position="1"> When the description was indicated as discourse-old (i) or related to some other entity (ii), the subjects were asked to locate the previous mention of the related entity in the text. Unlike the first experiment, the subjects did not have the option of classifying a definite description as Idiom; we instructed them to make a choice and write down their doubts. The written instructions and the script given to the subjects can be found in Appendix B.
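<Paragraph position="2"> The four questions just described define a small decision tree. The following Python sketch is our own rendering of the procedure, given here only to make the order of the questions explicit; the function and argument names are ours, not part of the instructions given to the subjects:

def classify_definite(mentioned_before, related_to_prior, known_to_reader):
    # Question (i): was the entity itself mentioned previously in the text?
    if mentioned_before:
        return "coreferential"   # the subject must also locate the previous mention
    # Question (ii): is it new, but related to an entity already mentioned?
    if related_to_prior:
        return "bridging"        # the subject must locate the related entity
    # Question (iii): is it new, but presumably known to the average reader?
    if known_to_reader:
        return "larger situation"
    # Question (iv): new in the text and presumably new to the reader.
    return "unfamiliar"

</Paragraph>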
As in Experiment 1, the subjects were given one text to practice before starting with the analysis of the corpus. They took, on average, eight hours to complete the task.</Paragraph> </Section>
<Section position="3" start_page="1192" end_page="1192" type="sub_section"> <SectionTitle> 4.3 Results </SectionTitle>
<Paragraph position="0"> The distribution of definite descriptions in the four classes according to the three coders is shown in Table 8.</Paragraph>
<Paragraph position="1"> We had 283 cases of complete agreement among annotators on the classification (61%): 164 cases of complete agreement on coreferential definite descriptions, 7 cases of complete agreement on bridging, 65 cases of complete agreement on larger situation, and 47 cases of complete agreement on the unfamiliar class.</Paragraph>
<Paragraph position="2"> As in Experiment 1, we measured the K coefficient of agreement among annotators; the result for annotators C, D, and E is K = 0.58 if we consider the definite descriptions marked as doubts (in which case we have 464 descriptions and five classes), K = 0.63 if we leave them out (430 descriptions and the four classes I-IV).</Paragraph>
<Paragraph position="3"> We also measured the extent of agreement among subjects on the antecedents for coreferential and bridging definite descriptions. A total of 164 descriptions were classified as coreferential by all three coders; of these, 155 (95%) were taken by all coders to refer to the same entity (although not necessarily to the same mention of that entity).</Paragraph>
<Paragraph position="4"> 16 The texts are w0766, wsj_0003, wsj_0013, wsj_0015, wsj_0018, wsj_0020, wsj_0021, wsj_0022, wsj_0024, wsj_0026, wsj_0029, wsj_0034, wsj_0037, and wsj_0039.</Paragraph>
<Paragraph position="5"> There were only 7 definite descriptions classified by all three annotators as bridging references; in 5 of these cases (71%) the three annotators also agreed on a textual antecedent (i.e., on the discourse entity to which the bridging reference was related).</Paragraph> </Section>
<Section position="4" start_page="1192" end_page="1192" type="sub_section"> <SectionTitle> 4.4 Discussion </SectionTitle>
<Paragraph position="0"> The distribution of definite descriptions between discourse-new, on the one side, and coreferential together with bridging references, on the other, is roughly the same in Experiment 2 as in Experiment 1, and roughly the same among annotators. The average percentage of discourse-new descriptions (larger situation and unfamiliar together) is 46%, against an average of 50% in the first experiment. Having split the discourse-new class in two in this experiment, we got an indication of the relative importance of the hearer-old and hearer-new subclasses--about half of the discourse-new uses fall in each of these classes--but this indication is only very approximate, since the first two annotators classified the majority of these definite descriptions as larger situation, whereas the last annotator classified the majority as unfamiliar.</Paragraph>
<Paragraph position="1"> As expected, the broader definition of the coreferent class resulted in a larger percentage of definite descriptions being included in this class (an average of 45%), and a smaller percentage being included in the bridging reference class.
Considering the difference between the relative importance of the same-head anaphora class in the first experiment and of the coreferent class in the second experiment, we can estimate that approximately 15% of definite descriptions are coreferential and have a different head from their antecedents.</Paragraph>
<Paragraph position="2"> The agreement among annotators in Experiment 2 was not very high: 61% total agreement, which gives K = 0.58 or K = 0.63, depending on whether we consider doubts as a class.17 This value is worse than the one we obtained in Experiment 1 (K = 0.68 or K = 0.73); in fact, this value of K goes below the level at which we can tentatively assume agreement among the annotators.</Paragraph>
<Paragraph position="3"> 17 It is difficult to decide what is the best way to treat cases marked as doubts--whether to take them out or to include them as a separate class--so we give both figures below.</Paragraph>
<Paragraph position="4"> There could be several reasons for the fact that agreement got worse in this second experiment. Perhaps the simplest explanation is that we were just using more classes.</Paragraph>
<Paragraph position="5"> In order to check whether this was the case, we merged the classes larger situation and unfamiliar back into one class, as we had in Experiment 1: that is, we recomputed K after counting all definite descriptions classified as either larger situation or unfamiliar as members of the same class. And indeed, the agreement figures went up from K = 0.63 to K = 0.68 (ignoring doubts) when we did so, i.e., within the &quot;tentative&quot; margins of agreement according to Carletta (1996) (0.68 ≤ K < 0.8).</Paragraph>
<Paragraph position="6"> The remaining difference between the level of agreement obtained in this experiment and that obtained in the first one (K = 0.73, ignoring doubts) might have to do with the annotators, with the difficulty of the texts, or with using a syntactic (same head) as opposed to a semantic notion of what counts as coreferential; we are inclined to think that the last two explanations are more likely. For one thing, we found very few examples of true mistakes in the annotation, as discussed below. Secondly, we observed that the coefficient of agreement changes dramatically from text to text: in this second experiment, it varies from K = 0.42 to K = 0.92 depending on the text, and if we do not count the three worst texts in the second experiment, we get again K = 0.73. Third, going from a syntactic to a semantic definition of anaphoric definite description resulted in worse agreement both for coreferential and for bridging references: looking at the per-class figures, one can see that we went from a per-class agreement on anaphoric definite descriptions in Experiment 1 of 88% to a per-class agreement on coreferential definites of 86% in Experiment 2; and the per-class agreement for associative definite descriptions of 59% went down rather dramatically to a per-class agreement of 31% on bridging descriptions.</Paragraph>
<Paragraph position="7"> The good result obtained by reducing the number of classes led us to try to find a way of grouping definite descriptions into classes that would result in an even better agreement. An obvious idea was to try with still fewer classes, i.e., just two. We first tried the binary division suggested by Fraurud: all coreferential definite descriptions on one side (subsequent-mention), and all other definite descriptions on the other (first-mention).
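<Paragraph position="8"> Each such regrouping amounts to relabeling the annotations and recomputing K. A minimal Python sketch, reusing the kappa function given earlier; the helper and the label names are ours, for illustration only:

# Collapse the columns of a per-item counts matrix according to a relabeling,
# then recompute K on the merged classes.
def regroup(counts, classes, mapping):
    merged = sorted(set(mapping.values()))
    index = {name: i for i, name in enumerate(merged)}
    out = []
    for row in counts:
        new_row = [0] * len(merged)
        for cls, n in zip(classes, row):
            new_row[index[mapping[cls]]] += n
        out.append(new_row)
    return out, merged

# Fraurud's binary division: coreferential vs. everything else.
fraurud = {"coreferential": "subsequent-mention", "bridging": "first-mention",
           "larger situation": "first-mention", "unfamiliar": "first-mention"}
# counts2, _ = regroup(counts4, ["coreferential", "bridging",
#                                "larger situation", "unfamiliar"], fraurud)
# _, p_a, p_e, k = kappa(counts2, c=3)

</Paragraph>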
Splitting things this way did result in an agreement of K = 0.76, i.e., almost a good reliability, although not quite as strong an agreement as we would have expected. The alternative of putting in one class all discourse-related definite descriptions--coreferential and bridging references--and putting larger situation and unfamiliar definite descriptions in a second class resulted in a worse agreement, although not by much (K = 0.73).</Paragraph>
<Paragraph position="9"> This suggests that our subjects did reasonably well at distinguishing first-mention from subsequent-mention entities, but not at drawing more complex distinctions. They were particularly bad at distinguishing bridging references from other definite descriptions: dividing the classifications into bridging definites, on the one hand, and all other definite descriptions, on the other, resulted in a very low agreement (K = 0.24).</Paragraph>
<Paragraph position="10"> We obtained about the same results by computing the per-class percentage of agreement discussed in Section 3. The rates of agreement for each class thus obtained are presented in Table 9. Again, we find that the annotators found it easier to agree on coreferential definite descriptions, harder to agree on bridging references; the percentage agreement on the classes larger situation and unfamiliar taken individually is much lower than the agreement on the class larger situation/unfamiliar taken as a whole.</Paragraph>
<Paragraph position="11"> The results in Table 9 confirm the indications obtained by computing agreement for a smaller number of classes: our subjects agree pretty much on coreferential definite descriptions, but bridging references are not a natural class. We discuss the cases of disagreement in more detail next.</Paragraph>
<Paragraph position="12"> Two types of disagreement were observed among annotators: about classification, and about the identification of an antecedent.</Paragraph>
<Paragraph position="13"> There were 29 cases of complete classification disagreement among annotators, i.e., cases in which no two annotators classified a definite description in the same way, and 144 cases of partial disagreement. All four of the possible combinations of total disagreement were observed, but the two most common combinations were BCU (bridging, coreferential, and unfamiliar) and BLU (bridging, larger situation, and unfamiliar); all six combinations of partial disagreements were also observed. As we do not have the space to discuss each case in detail, we will concentrate on pointing out what we take to be the most interesting observations, especially from the perspective of designing a corpus annotation scheme for anaphoric expressions.</Paragraph>
<Paragraph position="14"> We found very few true mistakes. We had some problems due to the presence of idioms such as they had to pick up the slack or on the whole. But in general, most of the disagreements were due to genuine problems in assigning a unique classification to definite descriptions.</Paragraph>
<Paragraph position="15"> The mistakes that our annotators did make were of the form exemplified by (22).</Paragraph>
<Paragraph position="16"> In this case, all three annotators indicate the same antecedent (the potential payoff) for the definite description the rewards, but whereas two of them classify the rewards as coreferential, one of them classifies it as bridging.
What seems to be happening here and in similar cases is that even though we asked the subjects to classify semantically, they ended up using a notion of relatedness that is more like the notion of associative in Experiment 1. (We found 10 such cases of partial disagreement between bridging and coreferential in which all three subjects indicated the same antecedent for the definite description.) (22) New England Electric System bowed out of the bidding for Public Service Co. of New Hampshire, saying that the risks were too high and the potential payoff too far in the future to justify a higher offer.</Paragraph>
<Paragraph position="17"> &quot;When we evaluated raising our bid, the risks seemed substantial and persistent over the next five years, and the rewards seemed a long way out.&quot; A particularly interesting version of this problem appears in the following example, where two annotators took the verb to refund as antecedent of the definite description the refund, but one of them interpreted the definite as coreferential with the eventuality, the other as bridging.</Paragraph>
<Paragraph position="18"> (23) Commonwealth Edison Co. was ordered to refund about $250 million to its current and former ratepayers for illegal rates collected for cost overruns on a nuclear power plant.</Paragraph>
<Paragraph position="19"> The refund was about $55 million more than previously ordered by the Illinois Commerce Commission and trade groups said it may be the largest ever required of a state or local utility.</Paragraph>
<Paragraph position="20"> As could be expected from the discussion of the K results above, the most common disagreements (35 cases of partial disagreement out of 144) were between the classes larger situation and unfamiliar. One typical source of disagreement was the introductory use of definite descriptions, common in newspapers: thus, for example, some of our annotators would classify the Illinois Commerce Commission as larger situation, others as unfamiliar. In many cases in which this form of ambiguity was encountered, the definite description worked effectively as a proper name: the world-wide supercomputer law, the new US trade law, or the face of personal computing.</Paragraph>
<Paragraph position="21"> Rather surprisingly, from a semantic perspective, the second most common form of disagreement was between the coreferential and bridging classes. In this case, the problem typically was that different subjects would choose different antecedents for a certain definite description. Thus, in example (23), the third annotator indicated $250 million as the antecedent for the refund, and classified the definite description as coreferential. A similar example is (24), in which two of the annotators classified the spinoff as bridging on spinoff Cray Computer Corp., whereas the third classified it as coreferential with the pending spinoff.</Paragraph>
<Paragraph position="22"> (24) The survival of spinoff Cray Computer Corp. as a fledgling in the supercomputer business appears to depend heavily on the creativity--and longevity--of its chairman and chief designer, Seymour Cray.</Paragraph>
<Paragraph position="23"> Documents filed with the Securities and Exchange Commission on the pending spinoff disclosed that Cray Research Inc. will withdraw the almost $100 million in financing it is providing the new firm if Mr.
Cray leaves or if the product-design project he heads is scrapped.</Paragraph>
<Paragraph position="24"> While many of the risks were anticipated when Minneapolis-based Cray Research first announced the spinoff in May, the strings it attached to the financing hadn't been made public until yesterday.</Paragraph>
<Paragraph position="25"> An example of total (BLU) disagreement is the following: (25) Mr. Rapanelli recently has said the government of President Carlos Menem, who took office July 8, feels a significant reduction of principal and interest is the only way the debt problem may be solved.</Paragraph>
<Paragraph position="26"> In this case, we can see that all three interpretations are acceptable: we may take the definite description the government of President Carlos Menem, who took office July 8, either as a case of bridging reference on the previously mentioned Argentina, or as a larger situation use, or as a case of unfamiliar definite description, especially if we assume that this latter class coincides with Prince's containing inferrables.</Paragraph>
<Paragraph position="27"> In conclusion, our figures can be seen as an empirical verification of Fraurud's and Prince's hypothesis that the classification disagreements among annotators depend to a large extent on the task they are asked to do, rather than reflecting true differences in semantic intuitions.</Paragraph>
<Paragraph position="28"> There were also disagreements about the antecedent of a definite description.</Paragraph>
<Paragraph position="29"> We have already discussed the most common case of antecedent disagreement: the case in which a definite description could equally well be taken as coreferential with one discourse entity or as bridging to another. For example, in an article in which the writer starts discussing Aetna Life & Casualty, and then goes on mentioning major insurers, either discourse entity could then serve as antecedent for the subsequent definite description the insurer, depending on whether the definite description is classified as coreferential or bridging.</Paragraph>
<Paragraph position="30"> Perhaps the most interesting cases of disagreement about the antecedent are examples such as (26). One subject indicated parts of the factory as the antecedent; another indicated the factory; and the third indicated areas of the factory.</Paragraph>
<Paragraph position="31"> (26) About 160 workers at a factory that made paper for the Kent filters were exposed to asbestos in the 1950s. Areas of the factory were particularly dusty where the crocidolite was used. Workers dumped large burlap sacks of the imported material into a huge bin, poured in cotton and acetate fibers and mechanically mixed the dry fibers in a process used to make filters. Workers described &quot;clouds of blue dust&quot; that hung over parts of the factory, even though exhaust fans ventilated the area.</Paragraph>
<Paragraph position="32"> What's interesting about this example is that the text does not provide us with enough information to decide about the correct interpretation; it is as if the writer didn't think it necessary for the reader to assign an unambiguous interpretation to the definite description.
Similar cases of underspecified definite descriptions have been observed before (e.g., Nunberg's John shot himself in the foot \[1978\] or I'm going to the store mentioned in Clark and Marshall \[1981\]), but no real account has been given of the conditions under which they are possible.</Paragraph>
<Paragraph position="33"> A first question raised by our experiments is how feasible it is to annotate corpora for anaphoric information. We observed two problems about the task of classifying definite descriptions: first, neither of the more complex classification schemes we tested resulted in a very good agreement among annotators; and second, even the task of identifying the antecedent of discourse-related definite descriptions (i.e., coreferential and bridging) is problematic--we only obtained an acceptable agreement in the case of coreferential definite descriptions, and it was difficult for our annotators to choose a single antecedent for a definite description when both bridging and coreference were allowed. These results indicate that annotating corpora for anaphoric information may be more difficult than expected. The task of indicating a unique antecedent for bridging definite descriptions appears to be especially challenging, for the reasons discussed above (multiple equally good antecedents and referential underspecification, for example).</Paragraph>
<Paragraph position="34"> On the positive side, we have two observations: our subjects did reasonably well at distinguishing first-mention from subsequent-mention antecedents, and at identifying the antecedent of a subsequent-mention definite description. A classification scheme based on this distinction (such as Fraurud's) that just asked subjects to indicate an antecedent for subsequent-mention definite descriptions may have a chance of resulting in a standardized annotation. Even in this case, however, the agreement we observed was not very high, but better results may be obtained with more training.</Paragraph>
<Paragraph position="35"> The possibility we are exploring is that these results might get better if annotators are given computer support in the form of a semiautomatic classifier--i.e., a system capable of suggesting to annotators a classification for definite descriptions, including possibly an indication of how reliable the classification might be. We briefly discuss below our progress in this direction so far.</Paragraph>
<Paragraph position="36"> Our data confirm the observation made in previous work (e.g., Fraurud \[1990\]) that a great number of definite descriptions in texts are discourse-new: in our second experiment we found an equal number of discourse-new and discourse-related definite descriptions, although many of the definite descriptions classified as discourse-new could be seen as associative in a loose sense. Interestingly, this suggests that each of the competing hypotheses about the licensing conditions for definite descriptions--the uniqueness and the familiarity theory accounts--accounts satisfactorily for about half of the data.</Paragraph>
<Paragraph position="37"> Of the existing theories of definite descriptions, the one that comes closest to accounting for all of the uses of definite descriptions that we observed is Löbner's (1985). Löbner proposes that the defining property of definite descriptions, from a semantic point of view, is that they indicate that the head noun complex denotes a functional concept, i.e., a function which, according to Löbner, can take one, two, or three arguments.
He argues that some head noun complexes denote such a function on purely lexical semantic grounds: this is the case, for example, of the head noun complexes in the father of Mr. Smith, the first man to sail to America, and the fact that life started on Earth; he calls these definite descriptions semantic definites. In other cases, such as the dog, the head noun by itself would not denote a function, but a sort: in these cases, according to Löbner, the use of a definite description is only felicitous if context indicates the function to be used. This latter class of pragmatic definites includes the best-known cases of familiar definites--anaphoric, immediate and visible situation, and larger situation--as well as some cases classified by Hawkins as unfamiliar and by Prince as containing inferrables. Löbner does not discuss the conditions under which a writer can assume that the reader can recognize that context creates a functional concept out of a sortal one, but his account could be supplemented by Clark and Marshall's theory of what may count as a basis for a mutual knowledge induction schema (Clark and Marshall 1981).18</Paragraph>
<Paragraph position="38"> 18 Löbner's theory still does not account for generic uses of definite descriptions.</Paragraph>
<Paragraph position="39"> 5.1.3 Consequences for Processing Theories. Given that first-mention definite descriptions are so numerous, and that recognizing them does not depend on common-sense knowledge alone, we conclude that any general theory of definite description interpretation should include methods for recognizing such definites. The architecture of our own classifier (see below) is also consistent with Fraurud's hypothesis that these methods are not just used when no suitable antecedent can be found, but more extensive investigations will be needed before we can conclude that this architecture significantly outperforms other ones.</Paragraph>
<Paragraph position="40"> The presence of such a large number of discourse-new definite descriptions is also problematic for the idea that definite descriptions are interpreted with respect to the global focus (Grosz 1977; Grosz and Sidner 1986). A significant percentage of the larger situation definite descriptions encountered in our corpus cannot be said to be in the global focus in any significant sense: as we observed above, in many of these cases the writer seems to rely on the reader's capability to add a new object such as the Illinois Commerce Commission to her or his model of the world, rather than expecting that object to be already present.</Paragraph> </Section>
<Section position="5" start_page="1192" end_page="1192" type="sub_section"> <SectionTitle> 5.2 A (Semi)Automatic Classifier </SectionTitle>
<Paragraph position="0"> As already mentioned, we are in the course of implementing a system capable of performing the classification task semiautomatically (Vieira 1998). This system would help the human classifiers by suggesting possible classifications, and possible antecedents in the case of discourse-related definite descriptions.</Paragraph>
<Paragraph position="1"> Our system implements the dual-processing strategy discussed above. On the one hand, it attempts to resolve anaphoric same head definite descriptions by maintaining a simple discourse model and searching back into this model to find all possible antecedents of a definite description (using special matching heuristics to deal with pre- and postmodification).
On the other, it uses heuristics to identify unfamiliar and larger situation definite descriptions on the basis of syntactic information and very little lexical information about nouns that take complements. The current order of application of the resolution and classification steps has been determined by empirical testing, and has been compared with that suggested by decision-tree learning techniques.</Paragraph>
<Paragraph position="2"> We trained a version of the system on the corpus used for the first experiment, and then compared its classification of the corpus used for the second experiment with that of our three subjects.19 We developed two versions of the system: one that only attempts to classify subsequent-mention and discourse-new definite descriptions (Vieira and Poesio 1997), and one that also attempts to classify bridging references (Poesio, Vieira, and Teufel 1997).</Paragraph>
<Paragraph position="3"> The first version of the system finds a classification for 318 definite descriptions out of the 464 in our test data (the articles used in the second experiment). The agreement between the system and the three annotators on the two classes first-mention and subsequent-mention is K = 0.70 overall (K = 0.77 for the three annotators on the converted annotation), if all definite descriptions to which the system cannot assign a classification are treated as first-mention; the coefficient of agreement is K = 0.78 if we do not count the definite descriptions that the system cannot classify (K = 0.81 for the annotators on just those definite descriptions).</Paragraph>
<Paragraph position="4"> The version of the system that also attempts to recognize bridging references has a worse performance, which is not surprising given the problems our subjects had in classifying bridging descriptions. This version of the system finds a classification for 355 descriptions out of 464, and its agreement with the three annotators is K = 0.63 if the cases that the system cannot classify are not counted (K = 0.70 for the three annotators on three categories with just these definites); K = 0.57 if we count the cases that the system does not classify as discourse-new (for 447 descriptions); and K = 0.63 again if we count the cases that the system does not classify as bridging (again, 447 descriptions).</Paragraph> </Section>
<Section position="6" start_page="1192" end_page="1192" type="sub_section"> <SectionTitle> 5.3 Future Work </SectionTitle>
<Paragraph position="0"> We collected plenty of data about definite descriptions that we are still in the process of analyzing. One issue we are studying at the moment is what to do with bridging references: how to classify them, if at all, and how to process them. We also intend to study Löbner's hypothesis about the role played by the distinction between sortal and relational head nouns in determining the type of process involved in the resolution of a definite description, possibly by finding a way to ask our subjects to recognize these distinctions. We also plan to study the issue of generic definites.</Paragraph>
<Paragraph position="1"> An obvious direction in which to extend this study is by looking at other kinds of anaphoric expressions such as pronouns and demonstratives.
We are doing preliminary studies in this direction.</Paragraph>
<Paragraph position="2"> Finally, we would like to emphasize that although this study is the most extensive investigation of definite description use in a corpus that we know of (we looked at a total of more than 1,400 definite descriptions in 33 texts, i.e., almost three times as many as in Fraurud's study), in practice we still got very little data on many of the uses of definite descriptions, so some caution is necessary in interpreting these results. The problem is that the kind of analysis we performed is extremely time consuming: it will be crucial in the future to find ways of performing this task that will allow us to analyze more data, possibly with the help of computer simulations.</Paragraph>
<Paragraph position="3"> 19 As the two classification schemes were different, the comparison involved a conversion of the annotations produced in the second experiment into ones using the scheme used in the first experiment.</Paragraph>
<Paragraph position="4"> Appendix A: Instructions to the Annotators (First Experiment)</Paragraph>
<Paragraph position="5"> Classification of uses of &quot;the&quot;-phrases. You will receive a set of texts to read and annotate. From the texts, the system will extract and present you with &quot;the&quot;-phrases and will ask you for a classification. You must choose one of the following classes:
1. ANAPHORIC (same noun): For anaphoric &quot;the&quot;-phrases the text presents an antecedent noun phrase which has the same noun as the given &quot;the&quot;-phrase. The interpretation of the given &quot;the&quot;-phrase is based on this previous noun phrase.
2. ASSOCIATIVE: For associative &quot;the&quot;-phrases the text presents an antecedent noun phrase which has a different noun for the interpretation of the given &quot;the&quot;-phrase. The antecedent for the &quot;the&quot;-phrase in this case may:
a) allow an inference towards the interpretation of the &quot;the&quot;-phrase,
b) be a synonym,
c) be an associate such as part-of, is-a, etc.,
d) be a proper name.
3. LARGER SITUATION/UNFAMILIAR: For larger situation uses of &quot;the&quot;-phrases you do not find an explicit antecedent in the text, because the reference is based on basic common knowledge:
a) first occurrences of proper names (subsequent occurrences must be considered as anaphoric),
b) reference to times,
c) community common knowledge,
d) proper names in premodifier position.
Also for unfamiliar uses of &quot;the&quot;-phrases the text does not provide an antecedent. The &quot;the&quot;-phrase refers to something new to the text. The help for the interpretation may be given together with the &quot;the&quot;-phrase as in:
e) restrictive relative clauses (the ... that ... - RC in general),
f) associative clauses (the ... of ... - PP in general),
g) NP complements (the fact that ..., the conclusion that ...),
h) unexplanatory modifiers (the first ..., the best ...),
i) appositive structures (James Dean, the actor),
j) copulas (the actor is James Dean).
4. IDIOM: &quot;The&quot;-phrases can be used just as idiomatic expressions, indirect references or metaphorical uses.
5. DOUBT: When you are in doubt about the classification, a comment on your doubt is requested.</Paragraph>
</Section> </Section> <Section position="5" start_page="1192" end_page="1192" type="metho"> <SectionTitle> PREFERENCE ORDER FOR THE CLASSIFICATION </SectionTitle> <Paragraph position="0"> In spite of the fact that definites often fall into more than one class of use, the identification of a unique class is required. In order to make the choices uniform, priority is to be given to anaphoric uses. According to this ordering, cases like &quot;the White House&quot; or &quot;the government&quot; are anaphoric rather than larger situation when the description has already occurred once in the text. When a &quot;the&quot;-phrase seems to belong to both the larger sit./unfamiliar and the associative classes, preference is given to larger sit./unfamiliar. (A sketch of this ordering is given at the end of this appendix.)</Paragraph> <Section position="1" start_page="1192" end_page="1192" type="sub_section"> <SectionTitle> Examples </SectionTitle> <Paragraph position="0"> [Examples from the corpus were given as in Section 3.]</Paragraph> </Section>

Summary

WHEN THE ANTECEDENT FOR THE DESCRIPTION IS IN THE TEXT: (1, 2)
1.: ANAPHORIC: There is an antecedent in the text which has the same descriptive noun as the &quot;the&quot;-phrase.
2.: ASSOCIATIVE: There is an antecedent in the text which has a different noun, but is a synonym of or an associate to the description.

WHEN THE REFERENT FOR THE DESCRIPTION IS KNOWN OR NEW: (3, 4)
3.: LARGER SIT./UNFAMILIAR: The &quot;the&quot;-phrase is novel in the text, uniquely identifiable, based on common knowledge, or given together with its referent.
4.: IDIOM: The &quot;the&quot;-phrase is an idiomatic expression.

Examples for each class:
1. (a) a house: the house
2. (a) something has changed: the change (b) a home: the house (c) a house: the door (d) Kadane Co.: the company
3. (a) the White House (first occurrence) (b) the third quarter (c) the nation (d) the Iran-Iraq war (e) the woman he likes (f) the door of the house (g) the fact that (h) the first, the best, the highest, the tallest ... (i) James Dean, the actor (j) the actor is James Dean
4. (a) back into the soup</Paragraph>
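As announced above, the preference order can be summarized in a few lines of Python. This is a sketch of our reading of the instructions, not the authors' code; in particular, placing IDIOM last is our assumption, since the instructions do not rank it explicitly.

# Priority order from the instructions: anaphoric first, then larger
# sit./unfamiliar before associative. Ranking IDIOM last is our assumption.
PRIORITY = ["anaphoric", "larger situation/unfamiliar", "associative", "idiom"]

def choose_class(applicable):
    """Pick a unique class when a description seems to fit several."""
    for cls in PRIORITY:
        if cls in applicable:
            return cls
    return "doubt"  # nothing fits confidently: leave a comment instead

# "the White House", second occurrence: both anaphoric and larger
# situation apply, so it is classified as anaphoric.
print(choose_class({"anaphoric", "larger situation/unfamiliar"}))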
Appendix B: Instructions to the Subjects (Second Experiment)

<Section position="1" start_page="1192" end_page="1192" type="sub_section"> <SectionTitle> Text Annotation of Definite Descriptions </SectionTitle> <Paragraph position="0"> This material provides you with instructions, examples and some training for the text-annotation task. The task consists of reading newspaper articles and analyzing occurrences of DEFINITE DESCRIPTIONS, which are expressions starting with the definite article THE. We will call these expressions DDs or DD. DDs describe things, ideas or entities which are talked about in the text. The things, ideas or entities being described by DDs will be called ENTITIES. You should look at the text carefully in order to indicate whether the ENTITY was mentioned before in the text and, if so, to indicate where. You will receive a set of texts and their corresponding tables to fill in. There are basically four cases to be considered:

1. Usually DDs pick up an entity introduced before in the text. For instance, in the sequence &quot;Mrs. Park is saving to buy an apartment. The housewife is saving harder than ever.&quot;, the ENTITY described by the DD &quot;the housewife&quot; was mentioned before as &quot;Mrs. Park&quot;.

2. If the ENTITY itself was not mentioned before but its interpretation is based on, dependent on, or related to some other idea or thing in the text, you should indicate this. For instance, in the sequence &quot;The Parks wanted to buy an apartment but the price was very high.&quot;, the ENTITY described by the DD &quot;the price&quot; is related to the idea expressed by &quot;an apartment&quot; in the text.

3. It may also be the case that the DD was not mentioned before and is not related to something in the text, but refers to something which is part of the common knowledge of the writer and readers in general. (The texts to be analyzed are Wall Street Journal articles; location and time, for instance, are usually known to the general reader from sources outside the text.) Example: in &quot;During the past 15 years housing prices increased nearly fivefold&quot;, the ENTITY described by the DD &quot;the past 15 years&quot; is known to the general reader of the Wall Street Journal and was not mentioned before in the text.

4. Or it may be the case that the DD is self-explanatory or is given together with its own identification. In these cases it becomes clear to the general reader what is being talked about even without previous mention in the text or previous common knowledge of it. For instance, in &quot;The proposed legislation is aimed at rectifying some of the inequities in the current land-ownership system.&quot;, the ENTITY described is new in the text and is not part of the readers' knowledge, but the DD &quot;the inequities in the current land-ownership system&quot; is self-explanatory.

The texts will be presented to you in the following format: on the left, the text with its DDs in evidence; on the right, the keys (number of the sentence / number of the DD) and the DD to be analyzed. The key is for internal control only, but it may help you to find DDs in the table you have to fill in.

Text 0
1 Y. J. Park and her family scrimped for four years to buy a tiny apartment here, but found that the closer they got to saving the $40,000 they originally needed, the more the price rose.
3 Now the 33-year-old housewife, whose husband earns a modest salary as an assistant professor of economics, is saving harder than ever.
9 During the past 15 years, the report showed, housing prices increased nearly fivefold.
22 The proposed legislation is aimed at rectifying some of the inequities in the current land-ownership system.

(1/1) the price
(3/2) the 33-year-old housewife
(9/3) the past 15 years
(22/4) the inequities in the current land-ownership system
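The keys above pair a sentence number with a running DD number. As a toy sketch of how such keys could be generated (ours, not the authors' extraction tool; the crude pattern below is only for illustration, since the real system identified noun phrases with a parser):

import re

def extract_dds(sentences):
    """Pair each rough 'the'-phrase match with a sentence/DD key."""
    keys, n_dd = [], 0
    for i, sentence in enumerate(sentences, start=1):
        for match in re.finditer(r"\bthe\s+\w+", sentence, re.IGNORECASE):
            n_dd += 1
            keys.append((f"{i}/{n_dd}", match.group(0)))
    return keys

print(extract_dds(["The price rose.", "He blamed the speculators."]))
# [('1/1', 'The price'), ('2/2', 'the speculators')]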
Each case (1 to 4, above) is to be indicated on the table as follows (see the examples in the table below). Whenever you find a previous mention in the text of the DD, you should mark the column LINK:

1. Mark &quot;=&quot; if the ENTITY described was mentioned before.
2. Mark &quot;R&quot; if the ENTITY described is new but is related to, based on, or dependent on something mentioned before.

In the case of both 1 and 2 you should provide the number of the sentence where the previous/related mention is, and write down that previous/related mention (see the example in the table below). If the entity was not previously mentioned in the text and is not related to something mentioned before, then mark the column NO LINK:

3. Mark &quot;K&quot; if it is something of the writer's and readers' common knowledge.
4. Mark &quot;D&quot; if it is new in the text and the readers have no previous knowledge of it, but the description is enough to make readers identify it.

DD                                                   LINK  NO LINK  previous/related mention
the 33-year-old housewife                            =              1 / Y. J. Park
the past 15 years                                          K
the inequities in the current land-ownership system        D

In case of doubt, just leave the line blank and comment at the back of the page, using the key number to identify the DD you are commenting on.</Paragraph>
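For concreteness, one row of such a table can be represented as a small record. This is a hypothetical data structure of our own, reconstructed from the instructions above; it is not a format used in the experiments.

from dataclasses import dataclass
from typing import Optional

@dataclass
class DDRow:
    key: str                                   # sentence/DD key, e.g. "3/2"
    dd: str                                    # the definite description
    mark: str                                  # "=", "R", "K", or "D"
    antecedent_sentence: Optional[int] = None  # only for "=" and "R"
    antecedent: Optional[str] = None           # only for "=" and "R"

# The filled-in rows above, as records:
rows = [
    DDRow("3/2", "the 33-year-old housewife", "=", 1, "Y. J. Park"),
    DDRow("9/3", "the past 15 years", "K"),
    DDRow("22/4", "the inequities in the current land-ownership system", "D"),
]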
<Section position="1" start_page="1192" end_page="1192" type="sub_section"> <SectionTitle> Examples </SectionTitle> <Paragraph position="0"> Next we present some examples and further explanation for each of the four cases being considered.</Paragraph> <Paragraph position="1"> Case 1 - LINK (=) For case no. 1 you may find a previous mention that is equal to or different from the DD (for instance, the government - the government, a report - the report, and three bills - the proposed legislation in the examples below); the distances between previous mentions and DDs may also vary.

* Meanwhile, the government's Land Bureau reports that only about a third of Korean families own their own homes. Last week, the government took three bills to the National Assembly.
* Last May, a government panel released a report on the extent and causes of the problem. During the past 15 years, the report showed, housing prices increased nearly fivefold.
* Last week, the government took three bills to the National Assembly. The proposed legislation is aimed at rectifying some of the inequities in the current land-ownership system.

Case 2 - LINK (R) Here are cases of DDs which are related to something that was present in the text. If you ask, for the examples below, &quot;Which government, population, nation is that?&quot; or &quot;Which blame is that?&quot;, the answer is given by something previously mentioned in the text (Koreans, and the increase of housing prices, respectively). 20

* For the Parks and millions of other young Koreans, the long-cherished dream of home ownership has become a cruel illusion. For the government, it has become a highly volatile political issue. In 1987, a quarter of the population owned 91% of the nation's 71,895 square kilometers of private land.
* During the past 15 years, the report showed, housing prices increased nearly fivefold. The report laid the blame on speculators, who it said had pushed land prices up ninefold.

Case 3 - NO LINK (K) These cases of DDs are based on the common reader's knowledge. The texts to be analyzed are Wall Street Journal articles; location and time, for instance, are usually known to the general reader from sources outside the text. 21

* For example, officials at Walnut Creek office learned that the Amfac Hotel near the San Francisco airport, which is insured by Aetna, was badly damaged when they saw it on network television news.
* Adjusters who had been working on the East Coast say the insurer will still be processing claims from that storm through December.

Case 4 - NO LINK (D) These cases of DDs are self-explanatory or accompanied by their identification. For instance, if you ask &quot;Which difficulty is that?&quot;, &quot;Which fact is that?&quot;, &quot;Which know-how is that?&quot;, etc. for the examples below, the answer is given by the DD itself. In the last example the DD is accompanied by its explanation.

* Because of the difficulty of assessing the damages caused by the earthquake, Aetna pulled together a team of its most experienced claims adjusters from around the country.
* They wonder whether he has the economic know-how to steer the city through a possible fiscal crisis.
* Mr. Dinkins also has failed to allay Jewish voters' fears about his association with the Rev. Jesse Jackson, despite the fact that few local non-Jewish politicians have been as vocal for Jewish causes in the past 20 years as Mr. Dinkins has.
* But racial gerrymandering is not the best way to accomplish that essential goal.
* The first hybrid corn seeds produced using this mechanical approach were introduced in the 1930s and they yielded as much as 20% more corn than naturally pollinated plants.
* The Citizens Coalition for Economic Justice, a public-interest group leading the charge for radical reform, wants restrictions on landholdings, high taxation of capital gains, and drastic revamping of the value-assessment system on which property taxes are based.

20 Note that DDs like the blame, the government, the population, which are case 2 in their first occurrence, are to be considered case 1 in possible later occurrences. 21 Note that a DD like &quot;the government&quot; may belong to case 2 as exemplified, but it may refer to the U.S.A. in another text, without any explicit mention of the U.S.A. in the text, since it is the country where the newspaper is produced. In such a situation the DD &quot;the government&quot; belongs to case 3. It may also be the case that the entity is part of the readers' knowledge but was mentioned before; in this situation it belongs to case 1.</Paragraph> </Section> </Section></Paper>