<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2167">
  <Title>Japanese Named Entity Extraction Evaluation - Analysis of Results -</Title>
  <Section position="5" start_page="0" end_page="1106" type="metho">
    <SectionTitle>
2 IREX NE
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Task
</SectionTitle>
      <Paragraph position="0"> Named Entity extraction involves finding Named Entities, such as names of organizations, persons, locations, and artifacts, time expressions, and numeric expressions, such as money and percentage expressions. It is one of the basic techniques used in IR and IE. At the evaluation, participants were asked to identify NE expressions as correctly as possible. In order to avoid a copyright problem, we made a tool to convert a tagged text to a set of tag offset information and we only exchanged tag offset information.</Paragraph>
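The conversion from tagged text to tag offsets described above can be sketched as follows. This is a minimal illustration, not the IREX tool itself: the bracket markup (for example [PERSON]...[/PERSON]), the function name to_offsets, and the choice of end-exclusive character offsets are all assumptions, since the section does not specify the actual tag syntax or offset format.

```python
import re

# Hypothetical bracket markup, e.g. "[PERSON]Sekine[/PERSON]"; the real
# IREX tag syntax and offset format are not specified in this section.
TAG = re.compile(r"\[(/?)([A-Z]+)\]")


def to_offsets(tagged):
    """Return (plain_text, [(ne_type, start, end), ...]), end exclusive."""
    plain = []       # pieces of untagged text
    spans = []       # finished (type, start, end) triples
    open_tags = []   # stack of (type, start) for tags not yet closed
    pos = 0          # current offset in the untagged text
    last = 0         # index in the tagged string just after the previous tag
    for m in TAG.finditer(tagged):
        chunk = tagged[last:m.start()]
        plain.append(chunk)
        pos += len(chunk)
        last = m.end()
        closing, ne_type = m.group(1), m.group(2)
        if closing:
            if not open_tags or open_tags[-1][0] != ne_type:
                raise ValueError("mismatched closing tag: " + ne_type)
            _, start = open_tags.pop()
            spans.append((ne_type, start, pos))
        else:
            open_tags.append((ne_type, pos))
    plain.append(tagged[last:])
    if open_tags:
        raise ValueError("unclosed tag: " + open_tags[-1][0])
    return "".join(plain), spans


text, spans = to_offsets(
    "[PERSON]Sekine[/PERSON] visited [LOCATION]Tokyo[/LOCATION]."
)
print(spans)
```

Only the list of spans needs to be exchanged; anyone holding the untagged article can recover the tagged strings by slicing the text with the offsets.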
    </Section>
    <Section position="2" start_page="0" end_page="1106" type="sub_section">
      <SectionTitle>
2.2 Definition
</SectionTitle>
      <Paragraph position="0"> The definition of NE's is given in an 18-page document, which is available through the IREX homepage (IREX Homepage, 1999). There are 8 kinds of NE's shown in Table 1. In order to avoid requiring a unique decision for ambiguous cases where even a human could not tag unambiguously, we introduced a tag</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="1106" end_page="1107" type="metho">
    <SectionTitle>
NE Example
ORGANIZATION The Diet, IREX Committee
PERSON Sekine, Wakanohana
LOCATION Japan, Tokyo, Mt. Fuji
ARTIFACT Pentium II, Nobel Prize
</SectionTitle>
    <Paragraph position="0"> DATE March 5, 1965; Yesterday
TIME 11 PM, midnight
MONEY 100 yen, $12,345
PERCENT 10%, a half
Table 1: NE Classes
within the OPTIONAL tag, it is just ignored for the scoring. The definition was created based on the MUC/MET definition; however, the process of making the definition was not easy. In particular, the definition of the newly introduced NE type &quot;artifact&quot; was controversial. We admit that more consideration is needed to make a clearer definition of the NE types.</Paragraph>
    <Paragraph position="1"> Comparing the NE task in Japanese to that in English, one of the difficulties comes from the fact that there is no word delimiter in Japanese. Systems have to identify the boundaries of expressions. This becomes complicated when we want to tag a substring of what is generally considered a Japanese word. For example, in Japanese there is a word &quot;Rainichi&quot; which means &quot;Visit Japan&quot; and consists of two Chinese characters, &quot;Rai&quot; (Visit) and &quot;Nichi&quot; (abbreviation of Japan). Although many word segmenters identify it as a single word, we expect to extract only &quot;Nichi&quot; as a location. This is a tricky problem, as opposed to the case in English where a word is the unit of NE candidates.</Paragraph>
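The boundary problem in the "Rainichi" example can be made concrete with a small sketch. The helper names word_spans and aligns_with_words are hypothetical, and character offsets are assumed as the unit of annotation; the point is only that a system restricted to word-level candidates cannot produce the expected span when the segmenter keeps the two characters together.

```python
def word_spans(words):
    """Turn a segmenter's word list into (start, end) character spans."""
    spans, pos = [], 0
    for w in words:
        spans.append((pos, pos + len(w)))
        pos += len(w)
    return spans


def aligns_with_words(ne_span, words):
    """True if the NE span starts and ends on word boundaries."""
    boundaries = {0}
    for _, end in word_spans(words):
        boundaries.add(end)
    return ne_span[0] in boundaries and ne_span[1] in boundaries


rainichi = "\u6765\u65e5"  # "Rainichi": Rai (visit) + Nichi (Japan)
gold = (1, 2)              # only "Nichi" is expected as a LOCATION

# Segmenter output that keeps the word whole: the span is unreachable.
print(aligns_with_words(gold, [rainichi]))
# Character-level units: the span falls on unit boundaries.
print(aligns_with_words(gold, list(rainichi)))
```

The first call reports that the expected span cuts through a word, so a word-based system would have to either split the word or tag at the character level.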
    <Section position="1" start_page="1106" end_page="1106" type="sub_section">
      <SectionTitle>
2.3 Runs and Data
</SectionTitle>
      <Paragraph position="0"> There were three kinds of NE exercises: the dry run, a restricted domain formal run, and a general domain formal run, which will be explained later. We also created three kinds of training data: the dry run training data, the CRL_NE data and the formal run domain restricted training data. Table 2 shows the size of each data set.</Paragraph>
      <Paragraph position="1"> Note that CRL_NE data belongs to the Communication Research Laboratory (CRL), but it is included in the table, because the data was created by IREX participants, using the definition of IREX-NE, and distributed through IREX.</Paragraph>
      <Paragraph position="2"> ...rences in the general domain evaluation and 2.1% in the restricted domain evaluation (the types of the evaluation will be explained later).</Paragraph>
      <Paragraph position="3"> In order to ensure the fairness of the exercise in the formal run, we used newspaper articles which no one had ever seen. We set the date to freeze the system development (April 13, 1999).</Paragraph>
      <Paragraph position="4"> The date for the evaluation was set one month after that date (May 13 to 17, 1999) so that we could select the test articles from the period between those dates. We thank the Mainichi Newspaper Corporation for providing this data for us free of charge.</Paragraph>
    </Section>
    <Section position="2" start_page="1106" end_page="1107" type="sub_section">
      <SectionTitle>
2.4 Restricted domain
</SectionTitle>
      <Paragraph position="0"> In the formal run, in order to study system portability and the effect of domains on NE performance, we had two kinds of evaluation: restricted domain and general domain. In the general domain evaluation, we selected articles regardless of domain. The domain of the restricted domain evaluation was announced one month before the development freeze date. It was an &quot;arrest&quot; domain defined as follows, and all the articles in the restricted domain were selected based on the definition.</Paragraph>
      <Paragraph position="1"> The articles are related to an event &quot;arrest&quot;. The event is defined as the arrest of a suspect or suspects by police, National Police, State police or other police forces including the ones of foreign countries. It includes articles mentioning an arrest event in the past. It excludes articles which have only information about requesting an arrest warrant, an accusation or sending the papers pertaining to a case to an Attorney's Office.</Paragraph>
    </Section>
  </Section>
</Paper>