<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-4010">
  <Title>Chinese Named Entity and Relation Identification System</Title>
  <Section position="3" start_page="0" end_page="37" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Research on Chinese information extraction is one of the topics of the COLLATE project (DFKI, 2002), which is dedicated to building up the German Competence Center for Language Technology. The work presented here investigates the automatic identification of Chinese named entities (NEs) and their relations in a specific domain. Information Extraction (IE) is an innovative language technology for accurately acquiring crucial information from documents. NE recognition is a fundamental IE task that detects named constituents in sentences, for instance names of persons, places, and organizations, as well as dates and times. Building on NE recognition, the identification of Named Entity Relations (NERs) indicates the types of semantic relationships between identified NEs, e.g., between a person and the employing organization, a person and a place of residence, a person and a birthday, or an organization and its seat. The identified NEs and NERs can be provided as a resource for other applications such as question-answering systems. We therefore selected these two IE tasks as the focus of our investigation.</Paragraph>
    <Paragraph position="1"> Chinese differs considerably in structure from Western languages. For example, it has a large character set of more than 48,000 characters, written texts contain no spaces between words, and Chinese words have fewer inflections. Over the past twenty years there have been significant achievements in IE for Western languages such as English. By comparison, research on the properties of Chinese that are relevant to IE, especially to NER, is still insufficient.</Paragraph>
    <Paragraph position="2"> Our research focuses on domain-specific IE.</Paragraph>
    <Paragraph position="3"> We picked the sports domain, in particular texts on soccer matches, because the number and types of entities, relations, and linguistic structures are representative of many applications.</Paragraph>
    <Paragraph position="4"> Based on the motivations mentioned above, our goals for the design and implementation of the prototype system CHINERIS (Chinese Named Entity and Relation Identification System) are: * Establishing an IE computational model for Chinese web texts using hybrid technologies, which should meet the requirements of IE for Chinese web texts to a great extent; * Implementing a prototype system based on this IE computational model, which extracts information from Chinese web texts as accurately and quickly as possible; * Evaluating the performance of this system in a specific domain.</Paragraph>
  </Section>
</Paper>