<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2102">
  <Title>Named Entity Chunking Techniques in Supervised Learning for Japanese Named Entity Recognition</Title>
  <Section position="5" start_page="1510" end_page="1510" type="metho">
    <SectionTitle>
2 Japanese Named Entity
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1510" end_page="1510" type="sub_section">
      <SectionTitle>
Recognition
2.1 Task of the IREX Workshop
</SectionTitle>
      <Paragraph position="0"> The task of named entity recognition of the IREX workshop is to recognize eight named entity types in Table 1 (IREX Conmfittee, 1999).</Paragraph>
      <Paragraph position="1"> The organizer of the IREX workshop provided 1,174 newspaper articles which include 18,677 named entities as tire training data. In the formal run (general domain) of the workshop, the participating systems were requested to recognize 1,510 nanmd entities included in the held-out 71 newspaper articles.</Paragraph>
    </Section>
    <Section position="2" start_page="1510" end_page="1510" type="sub_section">
      <SectionTitle>
2.2 Segmentation Boundaries of
Morphemes and Named Entities
</SectionTitle>
      <Paragraph position="0"> In the work presented here, we compare the segmentation boundaries of named entities in tire IREX workshop's training corpus with those of supervised learning technique mainly because it is easy to implement and quite straightibrward to extend a supervised lem'ning version to a milfimally supervised version (Collins and Singer, 1999; Cucerzan and Yarowsky, 1999). We also reported in (Utsuro and Sassano, 2000) the experimental results of a minimally supervised version of Japanese named entity recognition.</Paragraph>
      <Paragraph position="1">  morphemes which were obtained through morphological analysis by a Japanese morphological attalyzer BREAKFAST (Sassano et al., 1997). 2 Detailed statistics of the comparison are provided in 'Fable 2. Nearly half of the named entities have bmmdary mismatches against the morI)hemes and also almost 90% of the named entities with boundary mismatches can be tiecomposed into more than one morpheme. Fig-ure 1 shows some examples of such cases, a</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="1510" end_page="1510" type="metho">
    <SectionTitle>
3 Chunking and Tagging Named
Entities
</SectionTitle>
    <Paragraph position="0"> In this section, we formalize the problem of named entity chunking in Japanese named entity recognition. We describe ~t novel technique as well as those proposed in the previous works on nan ted entity recognition. The novel technique incorporates richer contextual information as well as p~tterns of constituent morphemes within ~ named entity, compared with the techniques proposed in previous research on named entity recognition and base noun phrase chunking.</Paragraph>
    <Section position="1" start_page="1510" end_page="1510" type="sub_section">
      <SectionTitle>
3.1 Task Definition
</SectionTitle>
      <Paragraph position="0"> First, we will provide out&amp;quot; definition of the task of Japanese named entity chunking. Suppose '~The set of part-of-speech tags of lllU.~AKFAST consists of about 300 tags. mmAKFaST achieves 99.6% part-of-speech accuracy against newspaper articles.</Paragraph>
      <Paragraph position="1"> aIn most cases of the &amp;quot;other boundary mismatch&amp;quot; in Table 2, one or more named entities have to be recognized as a part of a correctly analyzed morpheme and those cases are not caused by errors of morphological analysis. One frequent example of this type is a Japanese verbal noun &amp;quot;hou-bei (visiting United States)&amp;quot; which consists of two characters &amp;quot;hou (visitin.q)&amp;quot; and &amp;quot;bet (United States)&amp;quot;, where &amp;quot;bet (United States)&amp;quot; has to be recognized as &lt;LOCATION&gt;. \Ve believe that 1)ouudary mismatches of this type can be easily solved by employink a supervised learning technique such as the decision list learning method.</Paragraph>
      <Paragraph position="2">  that a sequen('e of morl)hemes is given as 1)elow: null Left; l{,ight ( Context ) (Named Entity) ( Context; )</Paragraph>
      <Paragraph position="4"> Then, given tht~t the current t)osition is at the morpheme M .N1': the task of tanned elltity ehllllkillg is to assign a, C\]luukillg state (to })e described in Section 3.2) as well ~rs a nmned entity type to the morl)helne Mi NE at tim current position, considering the patterns of surrounding morl)hemes. Note that in the SUl)ervised learning phase we can use the (:lmnking iuibnnation on which morphemes constitute a ngune(l entity, and whi(-h morphemes are in the lefl;/right contexts of tit(; named entity.</Paragraph>
    </Section>
    <Section position="2" start_page="1510" end_page="1510" type="sub_section">
      <SectionTitle>
3.2 Encoding Schemes of Named
Entity Chunking States
</SectionTitle>
      <Paragraph position="0"> In this t)at)er, we evalu~te the following two s('hemes of encoding ctmnking states of nalned entities. EXalnples of these encoding s(:hemes are shown in Table 3.</Paragraph>
      <Paragraph position="1">  The Inside/Outside scheme of encoding chunking states of base noun phrases was studied in Ibmlshaw and Marcus (1995). This scheme distinguishes the tbllowing three states: 0 the word at the current position is outside any base holm phrase. I the word at the current position is inside some base holm phrase. B the word at the current position marks the beginning of ~ base noml t)hrase that immediately fop lows another base noun phrase. We extend this scheme to named entity chunking by further distinguishing each of the states I and B into eight named entity types. 4 Thus, this scheme distinguishes 2 x 8 + 1 = 17 states.</Paragraph>
      <Paragraph position="2">  The Start/End scheme of encoding clmnking states of nmned entities was employed in Sekine e,t al. (1998) and Borthwick (1999). This scheme distinguishes the, following four states for each named entity type: S the lllOlTt)\]lellle at the (:urreld; position nmrks the l)eginldng of a lUl.in(xt (;lltity consisting of more than one mor1)\]mme. C l;he lnOrl)heme ~I; the cm'r(mt t)osi tion marks the middle of a mmmd entity (:onsisting of more tlmn one lilOrt)hellle. E -- the illOf t)heme, at the current position ram:ks the ending of a n~mmd entity consisting of more than one morl)heme. U - the morpheme at the current t)osition is a named entity consisting of only one, mort)heine. The scheme ;dso considers one ad(litional state for the position outside any named entity: 0 t;he mort)heine at the current position is outside any named entity. Thus, in our setting, this scheme distinguishes 4 x 8 + 1 = 33 states.</Paragraph>
    </Section>
    <Section position="3" start_page="1510" end_page="1510" type="sub_section">
      <SectionTitle>
3.3 Preceding/Subsequent Morphemes
as Contextual Clues
</SectionTitle>
      <Paragraph position="0"> In this l)aper, we ewfluate the following two lllodels of considering preceding/subsequent 4\Ve allow the, state :c_B for a named entity tyt)e x only when the, morl)hcme at t, he current 1)osition marks the 1)egimdng of a named entity of the type a&amp;quot; that immediately follows a nmned entity of the same type x.</Paragraph>
      <Paragraph position="1">  morphemes as contextual clues to named entity clmnking/tagging. Here we provide a basic outline of these models, and the details of how to incorporate them into the decision list learning framework will be described in Section 4.2.2.</Paragraph>
      <Paragraph position="2"> 3.a.1 3-gram Model In this paper, we refer to the model used in Sekine et al. (1998) and Borthwick (1999) as a 3-gram model. Suppose that the current position is at the morpheme M0, as illustrated below. Then, when assigning a chunking state as well as a named entity type to the morpheme M0, the 3-gram model considers the preceding single morpheme M-1 as well as the subsequent single morpheme M1 as the contextual clue.</Paragraph>
    </Section>
    <Section position="4" start_page="1510" end_page="1510" type="sub_section">
      <SectionTitle>
Left Current Right
</SectionTitle>
      <Paragraph position="0"/>
      <Paragraph position="2"> The major disadvantage of the 3-gram model is that in the training phase it does not take into account whether or not the l)receding/subsequent morphemes constitute one named entity together with the mort)heine at the current position.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="1510" end_page="1510" type="metho">
    <SectionTitle>
3.3.2 Variable Length Model
</SectionTitle>
    <Paragraph position="0"> In order to overcome this disadvantage of the 3-gram model, we propose a novel model, namely the &amp;quot;Variable Length Model&amp;quot;, which incorporates richer contextual intbrmation as well as patterns of constituent morl)hemes within a named entity. In principle, as part of the training phase this model considers which of the preceding/subsequent morphenms constitute one named entity together with the morpheme at the current position. It also considers several morphemes in the lefl;/right contexts of the named entity. Here we restrict this model to explicitly considering the cases of named entities of the length up to three morphenms and only implicitly considering those longer than three morphemes. We also restrict it to considering two morphemes in both left and right contexts of the named entity.</Paragraph>
    <Section position="1" start_page="1510" end_page="1510" type="sub_section">
      <SectionTitle>
4.1 Decision List Learning
</SectionTitle>
      <Paragraph position="0"> A decision list (Rivest, 1987; Yarowsky, 1994) is a sorted list of decision rules, each of which decides the wflue of a decision D given some evidence E. Each decision rule in a decision list is sorted in descending order with respect to some preference value, and rules with higher preference values are applied first when applying the decision list to some new test; data.</Paragraph>
      <Paragraph position="1"> First, the random variable D representing a decision w, ries over several possible values, and the random w~riable E representing some evidence varies over '1' and '0' (where '1' denotes the presence of the corresponding piece of evidence, '0' its absence). Then, given some training data in which the correct value of the decision D is annotated to each instance, the conditional probabilities P(D = x I E = 1) of observing the decision D = x under the condition of the presence of the evidence E (E = 1) are calculated and the decision list is constructed by the tbllowing procedure.</Paragraph>
      <Paragraph position="2"> 1. For each piece of evidence, we calculate the Iw of likelihood ratio of the largest; conditional probability of the decision D = :rl (given the presence of that piece of evidence) to the second largest conditional probability of the decision D =x2:</Paragraph>
      <Paragraph position="4"> Then~ a decision list is constructed with pieces of evidence sorted in descending order with respect to their log of likelihood ratios, where the decision of the rule at each line is D = xl with the largest conditional probability) '~Yarowsky (1994) discusses several techniques for avoiding the problems which arise when an observed count is 0. lq-om among those techniques, we employ tlm simplest ram, i.e., adding a small constant c~ (0.1 &lt; &lt; 0.25) to the numerator and denominator. With this inodification, more frcquent evidence is preferred when several evidence candidates exist with the same  2. The final line of a decision list; ix defined as % default', where the log of likelihood ratio is calculated D&lt;)m the ratio of the largest; marginal l)robability of the decision D = x t to the second largest marginal l)rol)at)ility of the decision D =x2:</Paragraph>
      <Paragraph position="6"> ity.</Paragraph>
    </Section>
    <Section position="2" start_page="1510" end_page="1510" type="sub_section">
      <SectionTitle>
4.2 Decision List Learning for
Chunking/Tagging Named Entities
4.2.1 Decision
</SectionTitle>
      <Paragraph position="0"> For each of the two schemes of enco(li1~g chunking states of nalned entities descrit)ed in Section 3.2, as the l)ossible values of the &lt;teeision D, we consider exactly the same categories of chunking states as those described in Section 3.2.</Paragraph>
      <Paragraph position="1">  The evidence E used in the decision list learning is a combination of the tbatures of preceding/subsequent inorphemes as well as the morpheme at; the current position. The following describes how to form the evidence E fi)r 1)oth the a-gram nlodel and varial)le length model.</Paragraph>
      <Paragraph position="2"> 3-,gram Model The evidence E ret)resents a tut)le (F-l, F0, F1 ), where F-1 and F1 denote the features of immediately t)receding/subsequent morphemes M_~ and M1, respectively, F0 the featm:e of the morpheme 54o at the current position (see Fonnuta (1) in Section 3.3.1). The definition of the possible values of those tbatures F_l, F0, and 1'~ are given below, where Mi denotes the roof 1)\]mnm itself (i.e., including its lexicM tbrm as well as part-of-sl)eech), C,i the character type (i.e., JaI)anese (hiragana or katakana), Chinese (kanji), numbers, English alphabets, symbols, and all possible combinations of these) of Mi, Ti the part-of-st)eech of Mi: F_ 1 ::m_ \]~//--1 I (C-1, T-l) I T-t Inull mlsmoothed conditional probability P(D = x \[ E = 1). Yarowsky's training Mgoritl,m also ditfcrs somewhat in his use of the ratio *'(~D=,d*~-j)' which is equivalent in the case of binary classifications, and also by the interpolation between the global probalfilities (used here) and tl,e residual prol)abilities further conditional on higher-ranked patterns failing to match in the list.</Paragraph>
      <Paragraph position="4"> As the evidence E, we consider each possible coml)ination of the values of those three f'ealures. null Variable Length Model The evidence E rel&gt;resents a tuple (FL,FNu, FIt), where FL and Fl~ denote the features of the morphemes ML_2ML1 and Mff'M~ ~ in the left/right contexts of the current named entity, respectively, FNE the features of the morphemes MN~&amp;quot;&amp;quot; &amp;quot; MNE &amp;quot; &amp;quot;&amp;quot; MNEm(_&lt;3) constituting the current named entity (see Formula (2) in Section 3.3.2). The definition of the possible values of those features 1 L, FNI,:, and FI~ arc given below, where F NI~ denotes the feature of the j-th constituent morpheme M .NJ~ within the current nalne(1 entity, and a k/l NI~ is the morl)heme at the cm'ren~ i)osition:  As the evidence E, we consider each possit)le (:oml)ination of the wfiues of those three fbatures, except that the tbllowing three restrictions are applied.</Paragraph>
      <Paragraph position="5"> 1. In the cases where the current named entity consists of up to three mort)heroes , as the possible values of the feature FNIi in the definition (3), we consider only those which are consistent with the requirement that each nlort)heme M NE is a constituent of the cun'ent named entity. For exainple, suppose that the cun'ent named entity consists of three morphemes, where the current position is at the middle of those constituent morphemes as below:  Then, as the possible values of the feature FN\],;, we consider only the tbllowing ibm': rN. ::= \[ F.g U.g  2. II1 the cases where the eurrellt ilalned entity consists of more than three morphemes, only the three constituent morphemes are regarded as within the current named entity and the rest are treated as if they were outside the named entity. For exampie, suppose that the current named entity consists of four morphemes as below:  Iit this case, the fourth constitnent morpheme M N1c is treated as if it were in the right context of the current named entity  3. As the evidence E, among the possible  combination of the values of three t'eatures /~,, ENId, and F/t, we only accept those in which the positions of the morphemes are continuous, and reject those discontimmus combinations. For example, in the case of Formula (4:) above, as the evidence E, we accel)t the combination (Mq, M 'My , ull), while we r( iect (ML1, M~EM~ 1':, 1,ull).</Paragraph>
    </Section>
    <Section position="3" start_page="1510" end_page="1510" type="sub_section">
      <SectionTitle>
4.3 Procedures for Training and
Testing
</SectionTitle>
      <Paragraph position="0"> Next we will briefly describe the entire processes of learning the decision list tbr etmnking/tagging named entities as well as applying it to chunking/tagging unseen named entities.</Paragraph>
      <Paragraph position="1">  In the training phase, at the positions where the corresponding morpheme is a constitnent of a named entity, as described in Section 4.2, each allowable combination of features is considered as the evidence E. On the other hand, at the positions where the corresponding morpheme is outside any named entity, the way the combination of t~at;ures is considered is diflbrent in the variable length model, in that the exception \]. in the previous section is no longer applied. Theretbre, all the possible wflues of the feature FNB in Definition (3) are accepted. Finally, the frequency of each decision D and evidence E is counted and the decision list is learned as described in Section 4.1.</Paragraph>
      <Paragraph position="2">  When applying the decision list to chunking/tagging nnseen named entities, first, at each morpheme position, the combination of features is considered as in the case of the non-entity position in the training phase. Then, the decision list is consulted and all the decisions of the rules with a log of likelihood ratio above a certain threshold are recorded. Finally, as in the case of previous research (Sekine et al., 1998; Berthwick, 1999), the most appropriate sequence of the decisions that are consistent throughout the whole sequence is searched for. By consistency of the decisions, we mean requirements such as that the decision representing the beginning of some named entity type has to be followed by that representing the middle of the same entity type (in the case of Start/End encoding). Also, in our case, the appropriateness of the sequence of the decisions is measured by the stun of the log of likelihood ratios of 1;t1(; corresponding decision rules.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML