File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/05/w05-0401_metho.xml
Size: 21,197 bytes
Last Modified: 2025-10-06 14:09:53
<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0401"> <Title>A Novel Machine Learning Approach for the Identification of Named Entity Relations</Title> <Section position="5" start_page="2" end_page="2" type="metho"> <SectionTitle> NEOF, NEVPF </SectionTitle> <Paragraph position="0"> NEOF: The order of the named entities of a relevant relation. NEVPF: The relative position between the verbs and the named entities of a relevant relation. The verbs of a relevant relation are those that occur in a sentence in which the relation is embedded.</Paragraph> </Section> <Section position="6" start_page="2" end_page="2" type="metho"> <SectionTitle> NECF </SectionTitle> <Paragraph position="0"> The context of named entities. The context consists only of a word or a character preceding or following the current named entity.</Paragraph> </Section> <Section position="7" start_page="2" end_page="2" type="metho"> <SectionTitle> VSPF </SectionTitle> <Paragraph position="0"> Whether the verbs are located in the same sentence or in different sentences in which there is a relevant relation.</Paragraph> </Section> <Section position="8" start_page="2" end_page="2" type="metho"> <SectionTitle> NEPPOF </SectionTitle> <Paragraph position="0"> The relative order between parts-of-speech of particles and named entities. 
The particles occur within the sentences where the relation is embedded.</Paragraph> </Section> <Section position="9" start_page="2" end_page="2" type="metho"> <SectionTitle> NEPF </SectionTitle> <Paragraph position="0"> The parts-of-speech of the named entities of a relevant relation.</Paragraph> </Section> <Section position="10" start_page="2" end_page="2" type="metho"> <SectionTitle> NECPF </SectionTitle> <Paragraph position="0"> The parts-of-speech of the context for the named entities associated with a relation.</Paragraph> </Section> <Section position="11" start_page="2" end_page="2" type="metho"> <SectionTitle> SPF </SectionTitle> <Paragraph position="0"> The sequence of parts-of-speech for all sentence constituents within a relation range.</Paragraph> </Section> <Section position="12" start_page="2" end_page="2" type="metho"> <SectionTitle> VVF </SectionTitle> <Paragraph position="0"> The valence expression of verbs in the sentence(s) where there is a relation embedded.</Paragraph> </Section> <Section position="13" start_page="2" end_page="2" type="metho"> <SectionTitle> NECTF </SectionTitle> <Paragraph position="0"> The concepts of the named entities of a relevant relation from HowNet (Dong and Dong, 2000).</Paragraph> </Section> <Section position="14" start_page="2" end_page="3" type="metho"> <SectionTitle> VCTF </SectionTitle> <Paragraph position="0"> The concepts of the verbs of a relevant relation from HowNet.</Paragraph> <Paragraph position="1"> Table 2. Feature Category. Of the 13 features, three (NECF, NECPF and NEPF) are morphological features, three (NEOF, SPF and SGTF) are grammatical features, four (NEPPOF, NESPF, NEVPF and VSPF) are associated with both morphology and grammar, and three (NECTF, VCTF and VVF) are semantic features.</Paragraph> <Paragraph position="2"> Every feature describes one or more properties of a relation. 
Through feature similarity calculation, a quantitative similarity between two relations can be obtained, so that we can determine whether a candidate relation is a real relation. The feature definitions therefore play an important role in relation identification. For instance, NECF can capture the noun Ke Chang (the guest field, meaning that the guest team plays a competition at the host team's residence) and also determine that the NE closest to this noun is Guang Dong Hong Yuan Dui (the Guangdong Hongyuan Team). On the other hand, NEOF can fix the sequence of the two relation-related NEs. Thus, the other NE, Guang Zhou Tai Yang Shen Dui (the Guangzhou Taiyangshen Team), is determined.</Paragraph> <Paragraph position="3"> Therefore, these two features reflect the properties of the relation HT_VT.</Paragraph> <Section position="1" start_page="2" end_page="3" type="sub_section"> <SectionTitle> 3.2 Relation and Non-Relation Patterns </SectionTitle> <Paragraph position="0"> A relation pattern describes the relationships between an NER and its features. 
In other words, it depicts the linguistic environment in which NERs exist.</Paragraph> <Paragraph position="1"> Definition 2 (Relation Pattern): A relation pattern (RP) is defined as a 14-tuple: RP = (NO, RE, SC, SGT, NE, NEC, VERB, PAR, NEP, NECP, SP, VV, NECT, VCT), where NO represents the number of an RP; RE is a finite set of relation expressions; SC is a finite set of the words in the sentence group, except for the words related to named entities; SGT is a sentence group type; NE is a finite set of the named entities in the sentence group; NEC is a finite set that contains the context of named entities; VERB is a finite set that includes the sequence numbers of verbs and the corresponding verbs; NEP is a finite set of named entities and their POS tags; NECP is a finite set that contains the POS tags of the context of named entities; SP is a finite set containing the sequence numbers together with the corresponding POS tags and named entity numbers in a sentence group; VV is a finite set comprising the position of verbs in a sentence and their valence constraints from the Lexical Sports Ontology, which we developed; NECT is a finite set that holds the concepts of the named entities in a sentence group; and VCT is a finite set that gives the concepts of the verbs in a sentence group.</Paragraph> <Paragraph position="2"> According to the news from Xinhua News Agency Beijing on March 26th: National Football Tournament (the First B League) today held five competitions of the second round. The Guangdong Hongyuan Team defeats the Guangzhou Taiyangshen Team by 3:0 in the guest field, becoming the only team to win both matches, and temporarily occupying the first place of the entire competition.</Paragraph> <Paragraph position="4"> (2, Yue), (3, 26), (4, Ri)}), ..., (NE2-2, 26, TN, {(1, Guang Zhou), (2, Tai Yang Shen), (3, Dui)})};</Paragraph> <Paragraph position="6"> Analogous to the definition of the relation pattern, a non-relation pattern is defined as follows: 
Definition 3 (Non-Relation Pattern): A non-relation pattern (NRP) is also defined as a 14-tuple: NRP = (NO, NRE, SC, SGT, NE, NEC, VERB, PAR, NEP, NECP, SP, VV, NECT, VCT), where NRE is a finite set of non-relation expressions which specify the nonexistent relations in a sentence group. The definitions of the other elements are the same as those in the relation pattern. For example, if we build an NRP for the above sentence group in Example 2, the NRE is as follows:</Paragraph> <Paragraph position="8"> In this sentence group, the named entity (CT) Quan Guo Zu Qiu Jia B Lian Sai (National Football Tournament (the First B League)) does not bear the relation CP_LOC to the named entity (LN) Bei Jing (Beijing).</Paragraph> <Paragraph position="9"> This LN only indicates the release location of the news from Xinhua News Agency.</Paragraph> <Paragraph position="10"> As a supporting means, non-NER patterns also play an important role, because the NER pattern library collects only sentence groups in which an NER exists. If a sentence group includes only non-NERs, it is excluded from the NER pattern library. Thus the impact of positive cases cannot replace the impact of negative cases. With the help of non-NER patterns, we can remove misidentified non-NERs and enhance the precision of NER identification.</Paragraph> </Section> <Section position="2" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 3.3 Similarity Calculation </SectionTitle> <Paragraph position="0"> In learning, the similarity calculation is the core measure for feature selection.</Paragraph> <Paragraph position="1"> Definition 4 (Self-Similarity): The self-similarity of a kind of NER or non-NER in the corresponding library can be used to measure the concentrative degree of this kind of relation or non-relation. 
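A minimal sketch of this measure, under stated assumptions: the self-similarity of one relation type is taken as the mean similarity over all unordered pairs of its instances in the pattern library, and the similarity of a pair is the mean of its per-feature similarities. The function names, the toy library, and the exact-match per-feature similarity are hypothetical and serve only as illustration; the per-feature measures used in the paper are feature-specific.

```python
from itertools import combinations

def pair_similarity(a, b, features):
    """Mean per-feature similarity of two relation instances (dicts)."""
    # Assumption: a feature scores 1.0 on an exact match, otherwise 0.0.
    scores = [1.0 if a.get(f) == b.get(f) else 0.0 for f in features]
    return sum(scores) / len(scores)

def self_similarity(instances, features):
    """Average pair similarity over all unordered pairs of instances."""
    pairs = list(combinations(instances, 2))
    if not pairs:
        raise ValueError("need at least two instances of the relation")
    total = sum(pair_similarity(a, b, features) for a, b in pairs)
    return total / len(pairs)

# Hypothetical library holding three instances of one relation type.
library = [
    {"NEOF": "HT-VT", "NEPF": "TN-TN", "SGTF": "single"},
    {"NEOF": "HT-VT", "NEPF": "TN-TN", "SGTF": "multi"},
    {"NEOF": "VT-HT", "NEPF": "TN-TN", "SGTF": "single"},
]
print(self_similarity(library, ["NEOF", "NEPF", "SGTF"]))
```

Because every per-feature score lies in the unit interval, the averaged result does as well.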
The value of the self-similarity is between 0 and 1.</Paragraph> <Paragraph position="2"> If the self-similarity value of a kind of relation or non-relation is close to 1, we can say that the concentrative degree of this kind of relation or non-relation is very &quot;tight&quot;. Conversely, if the value is close to 0, the concentrative degree is very &quot;loose&quot;.</Paragraph> <Paragraph position="3"> The self-similarity of a kind of NER is calculated as the average similarity of the corresponding relation features. Suppose R(i) is a defined NER in the NER set ($1 \le i \le 14$). The average similarity for this kind of NER is defined as follows: $$\mathrm{Sim}_{average}(R(i)) = \frac{\sum_{1 \le j, k \le m;\ j \ne k} \mathrm{Sim}(R(i)_j, R(i)_k)}{\mathrm{Sum}_{relation\_pair}(R(i)_j, R(i)_k)} \quad (1)$$</Paragraph> <Paragraph position="5"> where $\mathrm{Sim}(R(i)_j, R(i)_k)$ denotes the relation similarity between two relations of the same kind, $R(i)_j$ and $R(i)_k$</Paragraph> <Paragraph position="7"> ($1 \le j, k \le m$, $j \ne k$, each pair being counted once); m is the total number of instances of the relation R(i) in the NER pattern library.</Paragraph> <Paragraph position="9"> $\mathrm{Sum}_{relation\_pair}(R(i)_j, R(i)_k)$ is the number of relation pairs calculated. They can be calculated using the following formulas: $$\mathrm{Sim}(R(i)_j, R(i)_k) = \frac{\sum_{t=1}^{\mathrm{Sum}_f} \mathrm{Sim}(R(i)_j, R(i)_k)(f_t)}{\mathrm{Sum}_f} \quad (2)$$ $$\mathrm{Sum}_{relation\_pair}(R(i)_j, R(i)_k) = \binom{m}{2} = \frac{m!}{(m-2)!\,2!} \quad (3)$$</Paragraph> <Paragraph position="11"> In formula (2), $f_t$ is a feature in the feature set ($1 \le t \le 13$) and $\mathrm{Sum}_f$ is the total number of features. The calculation formula of $\mathrm{Sim}(R(i)_j, R(i)_k)(f_t)$</Paragraph> <Paragraph position="13"> is shown as follows: Notice that the similarity calculation for non-NERs is the same as the above calculations. Before describing the learning algorithm, we define some fundamental concepts related to the algorithm: Definition 5 (General-Character Feature): If the average similarity value of a feature in a relation is greater than or equal to the self-similarity of this relation, it is called a General-Character Feature (GCF). 
This feature reflects a common characteristic of this kind of relation.</Paragraph> <Paragraph position="14"> Definition 6 (Individual-Character Feature): An Individual-Character Feature (ICF) is a feature whose average similarity value in a relation is less than or equal to the self-similarity of this relation. This feature depicts an individual property of this kind of relation.</Paragraph> <Paragraph position="15"> Definition 7 (Feature Weight): The weight of a selected feature (GCF or ICF) denotes the degree of importance of the feature in the GCF or ICF set. It is used for the similarity calculation of relations or non-relations during relation identification.</Paragraph> <Paragraph position="17"> Here $1 \le j, k \le m$, $j \ne k$; m is the total number of instances of the relation R(i) in the NER pattern library.</Paragraph> <Paragraph position="19"> $\mathrm{Sum}_{relation\_pair}(R(i)_j, R(i)_k)$ is the number of relation pairs calculated, which can be obtained by formula (3).</Paragraph> <Paragraph position="20"> Definition 8 (Identification Threshold): If a candidate relation is regarded as a relation in the relation pattern library, the identification threshold of this relation indicates the minimal similarity value between them. It is calculated as the average of the average similarity values of the selected features:</Paragraph> <Paragraph position="22"> where n is the number of selected features, $1 \le t \le$ 
n.</Paragraph> <Paragraph position="23"> Finally, the PNCBL algorithm is described as follows: 1) Input annotated texts; 2) Transform the XML format of the texts into an internal data format; 3) Build NER and non-NER patterns; 4) Store both types of patterns in hash tables and construct indexes for them; 5) Compute the average similarity for features and the self-similarity for NERs and non-NERs; 6) Select GCFs and ICFs for NERs and non-NERs respectively; 7) Calculate weights for the selected features; 8) Decide identification thresholds for every NER and non-NER; 9) Store the above learning results.</Paragraph> </Section> </Section> <Section position="15" start_page="3" end_page="6" type="metho"> <SectionTitle> 4 Relation Identification </SectionTitle> <Paragraph position="0"> Our approach to NER identification is based on PNCBL; it uses the outcome of learning to identify NERs and to remove non-NERs.</Paragraph> <Section position="1" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 4.1 Optimal Identification Tradeoff </SectionTitle> <Paragraph position="0"> During NER identification, the GCFs of NER candidates are matched against those of all NERs of the same kind in the NER pattern library. Likewise, the ICFs of NER candidates are compared with those of non-NERs in the non-NER pattern library. The computing formulas in this procedure are listed as follows:</Paragraph> <Paragraph position="2"> in the non-NER pattern library. $1 \le j_1 \le \mathrm{Sum}(R(i))$ and $1 \le j_2 \le \mathrm{Sum}(NR(i))$. Sum (R(i)) and Sum (NR(i)) are the total number of R(i) in the NER pattern library and that of NR(i) in the non-NER pattern library, respectively. w</Paragraph> <Paragraph position="4"> ) mean the weight of the k1-th GCF for the</Paragraph> <Paragraph position="6"> and that of the k2-th ICF for the non-NER</Paragraph> <Paragraph position="8"> separately. 
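The matching step can be sketched roughly as follows, assuming that the similarity between a candidate and a stored pattern is a weighted sum of per-feature similarities that is then compared with the learned identification threshold. The weighted-sum form, the function names, and all numeric values are assumptions made for illustration; they are not taken from the paper's formulas.

```python
def weighted_similarity(feature_sims, weights):
    """Weighted sum of per-feature similarities for one candidate/pattern pair."""
    # Only the selected features (the keys of `weights`) contribute.
    return sum(weights[f] * feature_sims[f] for f in weights)

def matches(feature_sims, weights, threshold):
    """True if the weighted similarity reaches the identification threshold."""
    return weighted_similarity(feature_sims, weights) >= threshold

# Hypothetical weights of three selected GCFs (assumed to sum to 1).
gcf_weights = {"NEOF": 0.5, "NEPF": 0.3, "NECF": 0.2}
# Hypothetical per-feature similarities between a candidate and one NER pattern.
sims = {"NEOF": 1.0, "NEPF": 1.0, "NECF": 0.5}

print(weighted_similarity(sims, gcf_weights))
print(matches(sims, gcf_weights, threshold=0.8))
```

The same two functions could be applied to the ICFs of a candidate against non-NER patterns, with the weights and threshold learned for the corresponding non-NER.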
In the matching results, we find that sometimes the similarity values of a number of NERs or non-NERs matched with NER candidates all exceed the identification threshold. Thus, we have to use a voting method to achieve an identification tradeoff in our approach. For an optimal tradeoff, we consider the final identification performance in two aspects, i.e., recall and precision. In order to enhance recall, as many correct NERs as possible should be captured; on the other hand, in order to increase precision, misidentified non-NERs should be removed as accurately as possible.</Paragraph> <Paragraph position="9"> The voting refers to the similarity calculation results between an NER candidate and NER / non-NER patterns. It pays special attention to circumstances in which both results are very close. If this happens, it exploits multiple calculation results to measure and arrive at a final decision. Additionally, notice that the role of non-NER patterns is to restrict possible misidentified non-NERs. On the other hand, the voting assigns different thresholds to different NER candidates (e.g. HT_VT, WT_LT, and DT_DT or other NERs). Because the former three NERs have the same kind of NEs, the identification of these NERs is more difficult than that of others. Thus, when voting, the corresponding threshold should be set more strictly.</Paragraph> </Section> <Section position="2" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 4.2 Resolving NER Conflicts </SectionTitle> <Paragraph position="0"> Although the voting is able to use similarity computing results to yield an optimal tradeoff, there still remain some problems to be resolved.</Paragraph> <Paragraph position="1"> The relation conflict is one of these problems; it means that contradictory NERs occur in identification results. 
For example: (i) The same kind of relation with different argument positions: e.g., the relations HT_VT, HT_VT(ne1, no1; ne2, no2) and HT_VT(ne2, no2; ne1, no1) occur in an identification result at the same time.</Paragraph> <Paragraph position="2"> (ii) Different kinds of relations with the same or different argument positions: e.g., the relations WT_LT and DT_DT, WT_LT(ne1, no1; ne2, no2) and DT_DT(ne1, no1; ne2, no2) appear simultaneously in an identification result.</Paragraph> <Paragraph position="3"> The reason for a relation conflict lies in the simultaneous and successful matching of a pair of NER candidates whose NEs are of the same kind.</Paragraph> <Paragraph position="4"> The candidates are not compared and distinguished from each other any further. Considering the impact of NER and non-NER patterns, we set conditions to remove the relation that has the lower average similarity value with NER patterns or the higher average similarity value with non-NER patterns.</Paragraph> </Section> <Section position="3" start_page="3" end_page="6" type="sub_section"> <SectionTitle> 4.3 Inferring Missing NERs </SectionTitle> <Paragraph position="0"> Due to a variety of reasons, some relations that should appear in an identification result may be missing. However, we can use some of the identified NERs to infer them. Of course, the prerequisite of the inference is that we suppose the identified NERs are correct and non-contradictory. For all identified NERs, we should first examine whether they imply missing NERs. After determining the type of the missing NERs, we may infer them, including the relation name and its arguments. 
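One such inference, deriving an identity-team relation from a person-identity relation and a person-team relation that share the same person, can be sketched as follows. This is a minimal illustration under simplifying assumptions: each relation is a tuple of a relation name and two named entity identifiers (the sequence numbers carried in the relation expressions are omitted), and the function name infer_id_tm is hypothetical.

```python
def infer_id_tm(relations):
    """Infer ID_TM(identity, team) from PS_ID and PS_TM sharing a person."""
    identities = {}  # person -> identities found in PS_ID relations
    teams = {}       # person -> teams found in PS_TM relations
    for name, first, second in relations:
        if name == "PS_ID":
            identities.setdefault(first, []).append(second)
        elif name == "PS_TM":
            teams.setdefault(first, []).append(second)
    inferred = []
    for person, ids in identities.items():
        for identity in ids:
            for team in teams.get(person, []):
                candidate = ("ID_TM", identity, team)
                # Skip relations that are already identified or already inferred.
                if candidate not in relations and candidate not in inferred:
                    inferred.append(candidate)
    return inferred

# Two identified NERs that share the person ne1.
found = [("PS_ID", "ne1", "ne2"), ("PS_TM", "ne1", "ne3")]
print(infer_id_tm(found))
```

The sketch presupposes, as the text does, that the identified NERs passed in are correct and non-contradictory.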
For instance, in an identification result, two NERs are: PS_ID (ne1, no1; ne2, no2) and PS_TM (ne1, no1; ne3, no3). In the above NER expressions, ne1 is a personal name, ne2 is a personal identity, and ne3 is a team name. If a person occupies a position, i.e., he / she has a corresponding identity in a sports team, then the position or identity belongs to this sports team. Accordingly, we can infer the following NER: ID_TM (ne2, no2; ne3, no3)</Paragraph> </Section> </Section> <Section position="16" start_page="6" end_page="7" type="metho"> <SectionTitle> 5 Experimental Results and Evaluation </SectionTitle> <Paragraph position="0"> The main resources used for learning and identification are NER and non-NER patterns. Before learning, texts from the Jie Fang Daily (a local newspaper in Shanghai, China) in 2001 were annotated on the basis of NE identification. During learning, both pattern libraries are established in terms of the annotated texts and the Lexical Sports Ontology. They have 142 (534 NERs) and 98 (572 non-NERs) sentence groups, respectively. To test the performance of our approach, we randomly chose 32 sentence groups from the Jie Fang Daily in 2002, which contain 117 different NER candidates.</Paragraph> <Paragraph position="1"> To evaluate the effects of negative cases, we conducted two experiments. Table 3 shows the average and total average recall, precision, and F-measure for the identification of 14 NERs by positive case-based learning only. Table 4 shows those obtained by PNCBL. Comparing the experimental results, among the 14 NERs, the F-measure values of seven NERs (PS_ID, ID_TM, CP_TI, WT_LT, PS_CP, CP_DA, and DT_DT) in Table 4 are higher than those of the corresponding NERs in Table 3; the F-measure values of three NERs (LOC_CPC, TM_CP, and PS_CP) show no variation; but the F-measure values of the other four NERs (PS_TM, CP_LOC, TM_CPC, and HT_VT) in Table 4 are lower than those of the corresponding NERs in Table 3. 
This shows that the performance for half of the NERs is improved by the adoption of both positive and negative cases. Moreover, the total average F-measure rises from 63.61% to 70.46% as a whole.</Paragraph> <Paragraph position="2"> Finally, we have to acknowledge that it is difficult to compare the performance of our method with that of others, because the experimental conditions and corpus domains of other NER identification efforts are quite different from ours. Nevertheless, we would like to use the performance of Chinese NER identification using memory-based learning (MBL) (Zhang and Zhou, 2000) for a comparison with our approach in Table 5. In the table, we select similar NERs in our domain to correspond to the three types of relations (employee-of, product-of, and location-of). From the table we can deduce that the identification performance of relations for PNCBL is roughly comparable to that of MBL.</Paragraph> </Section> <Section position="17" start_page="7" end_page="7" type="metho"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> In this paper, we propose a novel machine learning and identification approach, PNCBL&amp;I. This approach exhibits the following advantages: (i) The defined negative cases are used to improve the NER identification performance as compared to only using positive cases; (ii) All of the tasks, namely the building of NER and non-NER patterns, feature selection, feature weighting and identification threshold determination, are automatically completed. 
It is able to adapt to variation in the NER and non-NER pattern libraries; (iii) The information provided by the relation features covers multiple linguistic levels, depicts both NER and non-NER patterns, and satisfies the requirements of Chinese language processing; (iv) Self-similarity is a reasonable measure for the concentrative degree of the same kind of NERs or non-NERs, which can be used to select general-character and individual-character features for NERs and non-NERs respectively; (v) The strategies used for achieving an optimal NER identification tradeoff, resolving NER conflicts, and inferring missing NERs can further improve the performance of NER identification; (vi) It can be applied to sentence groups containing multiple sentences. Thus identified NERs are allowed to cross sentence boundaries.</Paragraph> <Paragraph position="1"> The experimental results have shown that the method is appropriate and effective for improving the identification performance of NERs in Chinese.</Paragraph> </Section> </Paper>