File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/01/p01-1041_intro.xml

Size: 1,297 bytes

Last Modified: 2025-10-06 14:01:14

<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1041">
  <Title>Japanese Named Entity Recognition based on a Simple Rule Generator and Decision Tree Learning</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Methodology
</SectionTitle>
    <Paragraph position="0"> Our RG+DT system (Fig. 1) generates a recognition rule from each NE in the training data and then refines the rule by decision tree learning. Applying the refined recognition rules to a new document yields NE candidates, from which non-overlapping candidates are selected by a kind of longest match method.</Paragraph>
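    <Paragraph position="1"> The final selection step can be illustrated with a minimal sketch. The text only says "a kind of longest match method", so the greedy policy below (prefer longer candidates, break ties by earlier start) is an assumption, and the function name, the span layout and the class labels are hypothetical.

```python
from typing import List, Tuple

# (start, end exclusive, NE class); this span layout is an assumption for illustration.
Span = Tuple[int, int, str]


def select_longest_non_overlapping(candidates: List[Span]) -> List[Span]:
    """Greedily keep the longest candidates that do not overlap each other."""
    # Longer spans first; ties are broken by the earlier start position.
    ordered = sorted(candidates, key=lambda s: (-(s[1] - s[0]), s[0]))
    chosen: List[Span] = []
    for start, end, label in ordered:
        # Two half-open spans overlap when each one ends after the other starts.
        overlaps = any(end > c_start and c_end > start for c_start, c_end, _ in chosen)
        if not overlaps:
            chosen.append((start, end, label))
    # Return the surviving candidates in document order.
    return sorted(chosen)


# Hypothetical candidates over token positions of a new document.
candidates = [
    (0, 2, "ORGANIZATION"),
    (0, 1, "LOCATION"),
    (1, 3, "PERSON"),
    (3, 4, "PERSON"),
]
selected = select_longest_non_overlapping(candidates)
```
</Paragraph>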
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Generation of recognition rules
</SectionTitle>
      <Paragraph position="0"> In our method, each tokenized NE is converted to a recognition rule that is essentially a sequence of part-of-speech (POS) tags in the NE. For instance, OO-SAKA-GIN-KOU (= Osaka Bank) is tokenized into two words: OO-SAKA:all-</Paragraph>
      <Paragraph position="2"> where location-name and common-noun are POS tags. In this case, we get the following recognition rule. Here, '*' matches anything.</Paragraph>
      <Paragraph position="3"> *:*:location-name, *:*:common-noun</Paragraph>
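      <Paragraph position="4"> The conversion from a tokenized NE to the rule above can be sketched as follows. This is an illustration under assumptions, not the system's actual code: each token is taken to be a triple whose last field is the POS tag, the middle-field values "ctype-1" and "ctype-2" are placeholders, and all function names are hypothetical.

```python
from typing import List, Tuple

# (word, middle field, POS tag); the three-field layout is assumed from the rule format.
Token = Tuple[str, str, str]

WILDCARD = "*"


def generate_rule(ne_tokens: List[Token]) -> List[Token]:
    """Keep only the POS tag of each token and wildcard the other two fields."""
    return [(WILDCARD, WILDCARD, pos) for _, _, pos in ne_tokens]


def format_rule(rule: List[Token]) -> str:
    """Render a rule in the colon-separated notation used in the text."""
    return ", ".join(":".join(fields) for fields in rule)


def token_matches(pattern: Token, token: Token) -> bool:
    """A pattern field matches when it is the wildcard or equals the token field."""
    return all(p == WILDCARD or p == t for p, t in zip(pattern, token))


def find_candidates(rule: List[Token], tokens: List[Token]) -> List[Tuple[int, int]]:
    """Return (start, end exclusive) spans where the whole rule matches."""
    n = len(rule)
    return [
        (i, i + n)
        for i in range(len(tokens) - n + 1)
        if all(token_matches(p, t) for p, t in zip(rule, tokens[i:i + n]))
    ]


# The NE from the text; the middle-field values are placeholders, not from the paper.
ne = [("OO-SAKA", "ctype-1", "location-name"), ("GIN-KOU", "ctype-2", "common-noun")]
rule = generate_rule(ne)
rule_text = format_rule(rule)
spans = find_candidates(rule, [("X", "ctype-3", "particle")] + ne)
```

Applied to the Osaka Bank example, rule_text reproduces the rule shown above, and the rule matches the two NE tokens at positions 1 and 2 of the extended token list.</Paragraph>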
    </Section>
  </Section>
</Paper>
