<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0401">
  <Title>A Novel Machine Learning Approach for the Identification of Named Entity Relations</Title>
  <Section position="4" start_page="1" end_page="2" type="intro">
    <SectionTitle>
2 Definition of Relations
</SectionTitle>
    <Paragraph position="0"> An NER may be a modifying / modified, dominating / dominated, combination, collocation or even cross-sentence constituent relationship between NEs. Considering the distribution of different kinds of NERs, we define 14 different NERs based on six identified NEs in the sports domain shown in Table 1.</Paragraph>
    <Paragraph position="1"> To further indicate the positions of the NEs in an NER, we define a general frame for the above NERs and give the following example using this description: Definition 1 (General Frame of NERs): Ke Chang Yi 3 Bi 0 Ji Bai Guang Zhou Tai Yang Shen Dui. The Guangdong Hongyuan Team defeated the Guangzhou Taiyangshen Team 3:0 in the guest field. This sentence contains two NERs. According to the general frame, the first NER description is HT_VT(Guang Zhou Tai Yang Shen Dui (Guangzhou Taiyangshen Team), 1-1-2; Guang Dong Hong Yuan Dui (Guangdong Hongyuan Team), 1-1-1) and the second is WT_LT(Guang Dong Hong Yuan Dui (Guangdong Hongyuan Team), 1-1-1; Guang Zhou Tai Yang Shen (Guangzhou Taiyangshen Team), 1-1-2). The underlining of Chinese words means that an NE consists of these words.</Paragraph>
    <Paragraph position="2"> In this example, the two NERs represent a collocation relationship and a dominating / dominated relationship, respectively. The first relation, HT_VT, captures the collocation relationship between the NE &quot;Guangdong Hongyuan Team&quot; and the noun &quot;guest field&quot;, which implies that &quot;Guangdong Hongyuan Team&quot; is the guest team and, conversely, that &quot;Guangzhou Taiyangshen Team&quot; is the host team. The second relation, WT_LT, captures the dominating / dominated relationship that the verb &quot;defeat&quot; establishes between &quot;Guangdong Hongyuan Team&quot; and &quot;Guangzhou Taiyangshen Team&quot;. Therefore, &quot;Guangdong Hongyuan Team&quot; and &quot;Guangzhou Taiyangshen Team&quot; are the winning and losing teams, respectively.</Paragraph>
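    <Paragraph> The relation descriptions in the example can be read mechanically. The following is a minimal sketch, not the authors' implementation: the class and function names are hypothetical, and reading a position such as 1-1-2 as (paragraph, sentence, NE index) is an assumption that the text above does not confirm.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityMention:
    """One NE argument of a relation description (hypothetical type)."""
    text: str
    position: tuple  # ASSUMED layout: (paragraph, sentence, NE index)


@dataclass(frozen=True)
class RelationDescription:
    """A relation label plus its ordered NE arguments (hypothetical type)."""
    label: str
    arguments: tuple


def parse_relation(description: str) -> RelationDescription:
    """Parse strings such as 'HT_VT(Team A, 1-1-2; Team B, 1-1-1)'."""
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", description)
    if match is None:
        raise ValueError(f"not a relation description: {description!r}")
    label, body = match.groups()
    arguments = []
    for part in body.split(";"):
        # The position is whatever follows the last comma of the argument.
        text, _, position = part.rpartition(",")
        numbers = tuple(int(n) for n in position.strip().split("-"))
        arguments.append(EntityMention(text.strip(), numbers))
    return RelationDescription(label, tuple(arguments))


def same_sentence(a: EntityMention, b: EntityMention) -> bool:
    """True when both NEs share the first two position numbers.

    This relies on the ASSUMED (paragraph, sentence, NE index) layout.
    """
    return a.position[:2] == b.position[:2]
```

    Under that assumption, parsing the HT_VT description from the example yields two arguments located in the same sentence, which is how a cross-sentence relation could be told apart from a sentence-internal one.</Paragraph>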
    <Paragraph position="3">  other occasions.</Paragraph>
    <Paragraph position="4"> HT_VT: The home and visiting teams in a sports competition.
      WT_LT: The winning and losing teams in a sports match.
      DT_DT: The names of two teams that draw a match.
      TM_CP: A team participates in a sports competition.
      TM_CPC: Indicates where a sports team comes from.
      ID_TM: The position of a person employed by a sports team.
      CP_DA: The date on which a sports competition is staged.</Paragraph>
    <Paragraph position="5"> CP_TI: The time at which a sports competition is staged.</Paragraph>
    <Paragraph position="6"> CP_LOC: The location where a sports match is held.
      LOC_CPC: The location ownership (LOC belongs to CPC).</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
3 Positive and Negative Case-Based Learning
</SectionTitle>
      <Paragraph position="0"> Positive and negative case-based learning (PNCBL) is a supervised statistical learning method (Nilsson, 1996). More precisely, it is a variant of memory-based learning (Stanfill and Waltz, 1986; Daelemans, 1995; Daelemans et al., 2000). Unlike memory-based learning, PNCBL does not simply store cases in memory but transforms them into NER and non-NER patterns. In addition, it stores not only positive cases but also negative ones. Here, a negative case is a case in which two or more NEs do not stand in any relationship with each other, i.e., they bear non-relationships, which are also objects of our investigation.</Paragraph>
      <Paragraph position="1"> During learning, depending on the average similarity of features and the self-similarity of NERs (and of non-NERs), the system automatically selects general-character or individual-character features (GCFs or ICFs) to construct a feature set. It also determines different feature weights and identification thresholds for different NERs or non-NERs. The learning results thus provide identification references for subsequent NER identification.</Paragraph>
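      <Paragraph> To make the roles of feature weights and identification thresholds concrete, the following is a minimal sketch under stated assumptions. The text above gives no similarity formula, so a generic weighted feature-overlap score is swapped in here; the names Pattern, similarity and identify, the feature names, and all numeric values are hypothetical and are not taken from PNCBL itself.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    """A stored case: a relation label, or 'non-NER' for a negative case."""
    label: str
    features: dict  # hypothetical feature name -> value


def similarity(case: dict, pattern: Pattern, weights: dict) -> float:
    """Weighted share of the pattern's features that the case matches.

    ASSUMPTION: a plain weighted overlap stands in for the paper's measure.
    Features without an explicit weight count as 1.0.
    """
    total = sum(weights.get(name, 1.0) for name in pattern.features)
    if total == 0:
        return 0.0
    matched = sum(
        weights.get(name, 1.0)
        for name, value in pattern.features.items()
        if case.get(name) == value
    )
    return matched / total


def identify(case: dict, patterns: list, weights: dict, thresholds: dict):
    """Return the best label whose score reaches that label's threshold.

    weights maps a label to its own feature weights, and thresholds maps a
    label to its own threshold, mirroring the per-relation settings
    described above. Returns None when no pattern qualifies.
    """
    best_label, best_score = None, 0.0
    for pattern in patterns:
        score = similarity(case, pattern, weights.get(pattern.label, {}))
        if score >= thresholds.get(pattern.label, 1.0) and score > best_score:
            best_label, best_score = pattern.label, score
    return best_label
```

      Because negative cases are stored as patterns with their own label, a candidate that resembles a known non-relationship is identified as a non-NER rather than being forced into one of the relation types.</Paragraph>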
    </Section>
    <Section position="2" start_page="1" end_page="2" type="sub_section">
      <SectionTitle>
3.1 Relation Features
</SectionTitle>
      <Paragraph position="0"> Relation features, by which we can effectively identify different NERs, are defined to capture critical information of the Chinese language. Based on these features, we can define NER / non-NER patterns. The following essential factors motivate our definition of relation features:
        * The relation features should be selected from multiple linguistic levels, i.e., morphology, grammar and semantics (Cardie, 1996);
        * They can help us to identify NERs using positive and negative case-based machine learning, as the information they carry concerns not only NERs but also non-NERs; and
        * They should embody the crucial information of Chinese language processing (Dang et al., 2002), such as word order, the context of words, and particles.</Paragraph>
      <Paragraph position="1"> A total of 13 relation features, shown in Table 2, are defined empirically according to the above motivations. To distinguish feature names from element names of the NER / non-NER patterns, we append a capital letter &quot;F&quot; to each feature name. In addition, a sentence group in the following definitions can contain one or multiple sentences; a sentence group must end with a full stop, semicolon, colon, exclamation mark, or question mark.</Paragraph>
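      <Paragraph> The terminator rule for sentence groups can be sketched as follows. This is an illustration only: the function names are hypothetical, the full-width Chinese punctuation set is an assumption about how the five marks appear in the corpus, and the splitter returns the smallest pieces that end with a terminator, whereas a sentence group as defined above may span several sentences.

```python
import re

# ASSUMED character set: full-width Chinese forms of the five terminators
# (full stop, semicolon, colon, exclamation mark, question mark).
TERMINATORS = "。；：！？"

# One run of non-terminator characters, optionally closed by a terminator.
_GROUP = re.compile(f"[^{TERMINATORS}]+[{TERMINATORS}]?")


def split_sentence_groups(text: str) -> list:
    """Cut the text after every terminator, dropping empty pieces."""
    return [piece.strip() for piece in _GROUP.findall(text) if piece.strip()]


def is_closed(group: str) -> bool:
    """True if the group ends with one of the terminators."""
    return group.rstrip().endswith(tuple(TERMINATORS))
```

      A trailing piece for which is_closed returns False lacks its terminator and so does not yet satisfy the rule.</Paragraph>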
      <Paragraph position="2">  The named entities of a relevant relation are located in the same sentence or different sentences.</Paragraph>
    </Section>
  </Section>
</Paper>