File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/03/n03-2025_intro.xml
Size: 2,567 bytes
Last Modified: 2025-10-06 14:01:43
<?xml version="1.0" standalone="yes"?> <Paper uid="N03-2025"> <Title>Decision List NE Learning HMM NE Learning Concept-based Seeds</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> ORGANIZATION (ORG), and LOCATION </SectionTitle> <Paragraph position="0"> (LOC). [MUC-7 1998] There is considerable research on NE tagging using supervised machine learning [e.g. Bikel et al. 1997; Borthwick 1998]. To overcome the knowledge bottleneck of supervised learning, unsupervised machine learning has been applied to NE tagging. [Cucchiarelli & Velardi 2001] discussed boosting the performance of an existing NE tagger by unsupervised learning based on parsing structures. [Cucerzan & Yarowsky 1999], [Collins & Singer 1999] and [Kim et al. 2002] presented various techniques using co-training schemes for NE extraction seeded by a small list of proper names or hand-crafted NE rules. NE tagging consists of two tasks: (i) NE chunking; (ii) NE classification. Parsing-supported unsupervised NE learning systems, including ours, only need to focus on NE classification, assuming the NE chunks have been constructed by the parser.</Paragraph> <Paragraph position="1"> This paper presents a new bootstrapping approach using successive learning and concept-based seeds. The successive learning proceeds as follows. First, parsing-based NE rules are learned with high precision but limited recall. Then, these rules are applied to a large raw corpus to automatically generate a tagged corpus. Finally, a high-performance HMM-based NE tagger is trained using this corpus. Unlike co-training, our bootstrapping does not involve iterative learning between the two learners; hence it suffers little from the error propagation that is commonly associated with iterative learning.</Paragraph> <Paragraph position="2"> To derive the parsing-based learner, the system only requires a few common noun or pronoun seeds that correspond to the concept for the targeted NE, e.g. he/she/man/woman for PERSON NE. 
Such concept-based seeds share grammatical structures with the corresponding NEs, hence a parser is utilized to support bootstrapping. Since pronouns and common nouns occur more often than NE instances, the parsing-based NE rules can be learned in a single pass, avoiding iterative learning. Benchmarking shows that this system approaches the performance of supervised NE taggers for two of the three proper name NE types in MUC, namely, PER NE and LOC NE. This approach also supports tagging user-defined NE types.</Paragraph> </Section></Paper>
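
The first two stages of the successive learning described in the introduction (learning parsing-based rules from concept-based seeds, then applying them to raw text to produce a tagged corpus) can be illustrated with a minimal sketch. This is not the paper's implementation. The tuple format standing in for parser output, the LOC seeds, the `min_count` threshold, the "unambiguous context" criterion, and all function names are assumptions made for illustration. Only the PERSON seeds (he/she/man/woman) come from the text. The third stage, training an HMM tagger on the generated corpus, is omitted.

```python
from collections import Counter, defaultdict

# Concept-based seeds. The PER seeds come from the text; the LOC seeds are assumed.
SEEDS = {"PER": {"he", "she", "man", "woman"}, "LOC": {"city", "country"}}

# Toy stand-in for parser output: (phrase, relation, governing word, is_ne_chunk).
# A real system would obtain these from a parser; these tuples are invented.
PARSED = [
    ("he", "subj_of", "said", False),
    ("she", "subj_of", "said", False),
    ("man", "subj_of", "said", False),
    ("city", "obj_of", "visited", False),
    ("country", "obj_of", "visited", False),
    ("John Smith", "subj_of", "said", True),
    ("Paris", "obj_of", "visited", True),
    ("Acme", "subj_of", "announced", True),
]


def learn_rules(parsed, seeds, min_count=2):
    """Stage 1 (assumed form): a single pass over seed occurrences.

    A context (relation, governing word) becomes a rule only if it was seen
    at least min_count times and always with seeds of the same NE type.
    """
    counts = defaultdict(Counter)
    for phrase, rel, head, _ in parsed:
        for ne_type, words in seeds.items():
            if phrase in words:
                counts[(rel, head)][ne_type] += 1
    rules = {}
    for context, type_counts in counts.items():
        ne_type, n = type_counts.most_common(1)[0]
        if n >= min_count and n == sum(type_counts.values()):
            rules[context] = ne_type
    return rules


def tag_corpus(parsed, rules):
    """Stage 2: classify parser-built NE chunks whose context matches a rule."""
    return [
        (phrase, rules[(rel, head)])
        for phrase, rel, head, is_ne in parsed
        if is_ne and (rel, head) in rules
    ]


rules = learn_rules(PARSED, SEEDS)
tagged = tag_corpus(PARSED, rules)
print(rules)
print(tagged)
```

In this toy data, "Acme" is left untagged because no seed occurred in its context, which mirrors the high-precision, limited-recall behavior the text attributes to the parsing-based rules. The resulting `tagged` list plays the role of the automatically generated corpus on which a sequence tagger would then be trained.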