File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/04/w04-0701_concl.xml

Size: 2,205 bytes

Last Modified: 2025-10-06 13:54:14

<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0701">
  <Title>Overlap Features</Title>
  <Section position="6" start_page="7" end_page="7" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> The problem of cross-document person name disambiguation is of growing concern in many areas of natural language processing. We have presented a two-step methodology for the disambiguation of automatically extracted concept-instance pairs.</Paragraph>
    <Paragraph position="1"> Our approach first applies a Maximum Entropy model to all concept-instance pairs that share the same instance name. The output probabilities of this model are then passed as input to a modified agglomerative clustering algorithm, which partitions the pairs according to the individuals to which they refer.</Paragraph>
    <Paragraph position="2"> This algorithm not only allows the number of referents to be set dynamically but also outperforms two baseline methods.</Paragraph>
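The two-step pipeline of pairwise probabilities followed by clustering can be illustrated with a minimal sketch. This is not the authors' implementation. It makes two assumptions. First, the Maximum Entropy model's output is already available as a dictionary of pairwise same-referent probabilities. Second, it replaces the paper's modified algorithm, whose exact modification is not described in this section, with plain average-link agglomerative clustering and a stopping threshold. The function name `cluster_pairs`, the `threshold` parameter and its default of 0.5, and the `pair_N` ids are all hypothetical. The stopping threshold is what lets the number of referents vary instead of being fixed in advance.

```python
from itertools import combinations


def cluster_pairs(items, prob, threshold=0.5):
    """Group items into clusters that are assumed to share a referent.

    items: hypothetical ids for the concept-instance pairs of one name.
    prob: dict mapping frozenset((a, b)) to the assumed model probability
          that a and b refer to the same individual (missing pairs count as 0).
    threshold: hypothetical stopping point; merging stops once the best
               average-link score falls below it.
    """
    clusters = [[item] for item in items]  # every pair starts alone
    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        # Average-link: score two clusters by their mean pairwise probability.
        for i, j in combinations(range(len(clusters)), 2):
            scores = [prob.get(frozenset((a, b)), 0.0)
                      for a in clusters[i] for b in clusters[j]]
            score = sum(scores) / len(scores)
            if score > best_score:
                best_score, best_pair = score, (i, j)
        if threshold > best_score:
            break  # no remaining merge is likely enough; keep clusters apart
        i, j = best_pair
        clusters[i].extend(clusters[j])
        del clusters[j]  # combinations gives i before j, so index i stays valid
    return clusters


if __name__ == "__main__":
    ids = ["pair_1", "pair_2", "pair_3"]
    probs = {
        frozenset(("pair_1", "pair_2")): 0.9,
        frozenset(("pair_1", "pair_3")): 0.2,
        frozenset(("pair_2", "pair_3")): 0.1,
    }
    print(cluster_pairs(ids, probs))  # [['pair_1', 'pair_2'], ['pair_3']]
```

In this toy run, the first two pairs merge because their probability of 0.9 clears the threshold. The third pair stays separate because its average probability against the merged cluster is only 0.15. The result is therefore two referents, although no cluster count was specified.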
    <Paragraph position="3"> The system's output for the Michael Jackson instance set (Appendix A, Table 2) clearly illustrates the algorithm's success: a name that refers to many individuals is partitioned fairly well into appropriate clusters.</Paragraph>
    <Paragraph position="4"> The instance set for Sonny Bono (Appendix A, Table 1), however, shows why this task is so challenging. Although the name Sonny Bono refers to only one individual, the system concludes (as many of us would) that the likelihood of a singer also being a politician is so low that the name must refer to two different people. While this assumption often holds (as it does for Paul Simon), we had hoped that information from our web and fame features would override the system's bias in this case.</Paragraph>
    <Paragraph position="5"> In future work, we will examine how other features may help in handling such hard cases.</Paragraph>
    <Paragraph position="6"> We will also examine how this technique can be applied more generally to names that are similar but not identical (e.g., Bill Clinton vs. William Jefferson Clinton).</Paragraph>
  </Section>
</Paper>