<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0209">
  <Title>Orthographic Co-Reference Resolution Between Proper Nouns Through the Calculation of the Relation of &quot;Replicancia&quot;</Title>
  <Section position="7" start_page="65" end_page="65" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> The algorithm presented does not calculate co-reference, but rather the replicancia between proper names. Replicancia and co-reference coincide neither extensionally nor intensionally. As a consequence, the referents of nouns linked by the replicancia relation are not bound by an identity relation, among other reasons because replicancia is not an equivalence relation, whereas co-reference is. The result is that one class of replicancia may contain names which refer to different entities and yet fail to contain names which do refer to the entity associated with the class.</Paragraph>
    <Paragraph position="1"> Despite the above, because the calculation of replicancia is much simpler than the calculation of co-reference, it is interesting, and even convenient, to calculate replicancia between nouns in limited contexts.</Paragraph>
    <Paragraph position="2"> Table I-1 in Annex I shows the results of the experiment. Under the line showing the total figures, we have added three lines with the average, median and mean deviation of the data obtained for each of the 100 documents analyzed. The median takes values between eight and nine points above the average, which means that in most documents the co-reference between proper nouns is decided successfully. The negative aspect is the variance, close to 25% of the total recall and precision values, which, together with the difference between average and median, leads us to conclude that most of the mistakes are concentrated in a few documents, in which precision and recall can be much lower than expected.</Paragraph>
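The reasoning above, that a median well above the average points to a few badly scored documents, can be illustrated with a minimal sketch. The scores below are hypothetical values invented for the illustration and are not taken from Table I-1; the function name summarize is likewise an assumption.

```python
from statistics import mean, median


def summarize(scores):
    """Return (average, median, mean absolute deviation) of per-document scores."""
    avg = mean(scores)
    med = median(scores)
    # Mean deviation: average absolute distance of each score from the average.
    mad = mean(abs(s - avg) for s in scores)
    return avg, med, mad


# Hypothetical per-document recall values, invented for illustration only:
# most documents score high, while two score very low.
scores = [0.95, 1.0, 0.9, 1.0, 0.92, 0.3, 0.97, 0.2, 1.0, 0.94]

avg, med, mad = summarize(scores)
print(f"average={avg:.3f} median={med:.3f} mean deviation={mad:.3f}")
print("median above average:", med > avg)
```

In such a sample the two low-scoring documents pull the average down while leaving the median almost untouched, which is the pattern described for the experimental data.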
    <Paragraph position="3"> Figure 5.1 shows the histogram of the values obtained for the F-measure, which are listed in the right-hand column of Table I-1. More than half of the documents evaluated obtain an F-measure above 0.9, and more than two thirds are above 0.8.</Paragraph>
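As a rough sketch of how such per-document figures could be derived, the following assumes the balanced F-measure (the harmonic mean of precision and recall); the text does not state which weighting was used. The precision and recall pairs are hypothetical, as are the names f_measure and share_above.

```python
def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean); returns 0.0 when both inputs are 0."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


def share_above(values, threshold):
    """Fraction of values strictly greater than the threshold."""
    return sum(1 for v in values if v > threshold) / len(values)


# Hypothetical (precision, recall) pairs, one per document, for illustration only.
pairs = [(1.0, 1.0), (0.9, 1.0), (0.8, 0.9), (0.5, 0.4)]
f_values = [f_measure(p, r) for p, r in pairs]

print("share of documents with F above 0.9:", share_above(f_values, 0.9))
print("share of documents with F above 0.8:", share_above(f_values, 0.8))
```

Counting the share of documents above each threshold in this way corresponds to reading the cumulative mass of the right-hand bars of the histogram.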
    <Paragraph position="4"> The statistical analysis of the system's quality measures is not enough to guarantee the representativeness of the sample. It is also necessary to analyze its composition. The 100 documents of the sample were selected at random during January 1998. The five sources used are different, but all of them are major newspapers. Moreover, we selected news items from two different sections: domestic and international. The diversity of the documents in the sample and the homogeneity of the measures obtained support the hypothesis that the results are valid.</Paragraph>
    <Paragraph position="5"> The last five lines of Table I-1, Annex I, report the total results broken down by the source of the documents.</Paragraph>
    <Paragraph position="6"> The mistakes detected are mainly due to two causes: the presence of co-referents which are not replicantes, and the presence of replicantes which are not co-referents, as sometimes happens with initials. Other errors, not attributable to the algorithm, are due to faults in the automatic system we used to extract proper nouns.</Paragraph>
  </Section>
</Paper>