<?xml version="1.0" standalone="yes"?> <Paper uid="P05-2025"> <Title>Unsupervised Discrimination and Labeling of Ambiguous Names</Title> <Section position="3" start_page="0" end_page="145" type="intro"> <SectionTitle> 2 Related Work </SectionTitle> <Paragraph position="0"> A number of previous approaches to name discrimination have employed ideas related to context vectors. (Bagga and Baldwin, 1998) proposed a method using the vector space model to disambiguate references to a person, place, or event across multiple documents. Their approach starts by using the CAMP system to find related references within a single document. For example, it might determine that he and the President both refer to Bill Clinton. CAMP creates co-reference chains for each entity in a single document, which are then extracted and represented in the vector space model. This model is used to find the similarity among referents, and thereby to identify the same referent when it occurs in multiple documents.</Paragraph> <Paragraph position="1"> (Mann and Yarowsky, 2003) take an approach to name discrimination that incorporates information from the World Wide Web. They propose to use various contextual characteristics that are typically found near and within an ambiguous proper noun for the purpose of disambiguation. They utilize categorical features (e.g., age, date of birth), familial relationships (e.g., wife, son, daughter) and associations that the entity frequently shows (e.g., country, company, organization). Such biographical information about the entities to be disambiguated is mined from the Web using a bootstrapping method.</Paragraph> <Paragraph position="2"> The Web pages containing the ambiguous name are assigned a vector depending upon the extracted features, and these vectors are then grouped using agglomerative clustering.</Paragraph> <Paragraph position="3"> (Pantel and Ravichandran, 2004) have proposed an algorithm for labeling semantic classes, which can be viewed as a form of cluster. 
For example, a semantic class may be formed by the words grapes, mango, pineapple, orange and peach. Ideally this cluster would be labeled as the semantic class of fruit. Each word of the semantic class is represented by a feature vector. Each feature corresponds to a syntactic pattern (such as verb-object) in which the word occurs. The similarity between a few features from each cluster is found using point-wise mutual information (PMI), and their average is used to group and rank the clusters to form a grammatical template or signature for the class. Then syntactic relationships such as Noun like Noun or Noun such as Noun are searched for in the templates to give the cluster an appropriate name label. The output is in the form of a ranked list of concept names for each semantic class.</Paragraph> </Section></Paper>