<?xml version="1.0" standalone="yes"?> <Paper uid="P01-1041"> <Title>Japanese Named Entity Recognition based on a Simple Rule Generator and Decision Tree Learning</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Methodology </SectionTitle> <Paragraph position="0"> Our RG+DT system (Fig. 1) generates a recognition rule from each NE in the training data. Each rule is then refined by decision tree learning. Applying the refined recognition rules to a new document yields NE candidates, from which non-overlapping candidates are selected by a variant of the longest-match method.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Generation of recognition rules </SectionTitle> <Paragraph position="0"> In our method, each tokenized NE is converted to a recognition rule that is essentially the sequence of part-of-speech (POS) tags in the NE. For instance, OO-SAKA-GIN-KOU (= Osaka Bank) is tokenized into two words: OO-SAKA:all-</Paragraph> <Paragraph position="2"> where location-name and common-noun are POS tags. In this case, we obtain the following recognition rule, in which '*' matches anything.</Paragraph> <Paragraph position="3"> *:*:location-name, *:*:common-noun</Paragraph> </Section> </Section> </Paper>
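<!--
Editorial sketch, not part of the original paper. The rule generation described in Section 2.1 might be illustrated as follows. The three-field token layout (word, character type, POS tag) is an assumption inferred from the three-field rule shown above; the character-type values are elided in the source, so the placeholder "?" stands in for them. The names generate_rule and find_candidates are hypothetical, and the decision tree refinement and longest-match selection steps are not modelled here.

```python
from typing import List, Tuple

# Assumed token layout: (word, character type, POS tag).
Token = Tuple[str, str, str]


def generate_rule(ne_tokens: List[Token]) -> str:
    """Keep only the POS tag of each token and wildcard the other two fields."""
    return ", ".join(f"*:*:{pos}" for _word, _ctype, pos in ne_tokens)


def field_matches(pattern: str, value: str) -> bool:
    """A '*' pattern matches anything; otherwise require equality."""
    return pattern == "*" or pattern == value


def find_candidates(rule: str, tokens: List[Token]) -> List[Tuple[int, int]]:
    """Return (start, end) token spans, end exclusive, where the rule matches."""
    patterns = [tuple(part.strip().split(":")) for part in rule.split(",")]
    width = len(patterns)
    spans = []
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(
            field_matches(p, v)
            for pattern, token in zip(patterns, window)
            for p, v in zip(pattern, token)
        ):
            spans.append((start, start + width))
    return spans


# Character type "?" is a placeholder; the real value is not given in the source.
osaka_bank = [("OO-SAKA", "?", "location-name"), ("GIN-KOU", "?", "common-noun")]
rule = generate_rule(osaka_bank)
```

For the OO-SAKA-GIN-KOU example, generate_rule produces the rule string quoted in the text, and find_candidates can then locate every span in a new token sequence whose POS tags follow the same pattern.
-->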