<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1818">
  <Title>Chinese Base-Phrases Chunking</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Recognizing simple, non-recursive base phrases is an important subtask for many natural language processing applications, such as information retrieval. Gee and Grosjean (1983) presented psychological evidence that chunks like base phrases play an important role in human language understanding. The CoNLL-2000 shared task identified many kinds of English base phrases, which are syntactically related, non-overlapping groups of words (Tjong and Buchholz, 2000). The shared task has significantly accelerated progress in English partial parsing techniques. For Chinese processing, Zhao (1998) put forward a definition of Chinese baseNP as a combination of determinative modifier and head noun. Building on that research, Zhao et al. (2000) extended the concept of baseNP to seven types of Chinese base phrases. These base phrases may consist of words or other base phrases, but their constituents, in turn, should not contain any base phrases.</Paragraph>
    <Paragraph position="1"> In this paper, we put forward a new definition of Chinese base phrases, which are simple and non-recursive, similar to those of the CoNLL-2000 shared task. The definition enables us to resolve most local ambiguities and is very useful for NLP tasks such as named entity recognition and information extraction.</Paragraph>
    <Paragraph position="2"> We construct a hybrid model to recognize nine types of Chinese base phrases. Many studies of Chinese partial parsing (Zhou, 1996; Zhao, 1998; Sun, 2001) have shown that statistical learning is of great use for Chinese chunking, especially with large corpora. However, the lack of morphological hints in Chinese makes it necessary to use semantic and syntactic information, such as context-free grammar rules, in Chinese processing. In our approach, we view chunking as a tagging problem by encoding the chunk structure in new tags attached to each word. We use the Memory-Based Learning (MBL) method to assign each word a tag indicating its phrase type and its position within a base phrase. Grammar rules are then used to disambiguate the tags.</Paragraph>
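To illustrate the view of chunking as tagging described above, here is a minimal sketch of how a chunk structure can be encoded as one tag per word and decoded back. The B-/I-/O tag scheme, the function names `encode` and `decode`, and the example phrase labels are assumptions made for illustration; this section does not specify the actual tag set used in the paper, and the sketch does not cover the MBL classifier or the grammar rules.

```python
from typing import List, Optional, Tuple

# A chunk is (phrase_type, words); phrase_type None marks words outside any base phrase.
Chunk = Tuple[Optional[str], List[str]]


def encode(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Flatten chunks into (word, tag) pairs.

    Each tag combines position and type (assumed scheme):
    B-X = first word of a phrase of type X, I-X = later word, O = outside.
    """
    tagged = []
    for phrase_type, words in chunks:
        for i, word in enumerate(words):
            if phrase_type is None:
                tag = "O"
            elif i == 0:
                tag = "B-" + phrase_type
            else:
                tag = "I-" + phrase_type
            tagged.append((word, tag))
    return tagged


def decode(tagged: List[Tuple[str, str]]) -> List[Chunk]:
    """Rebuild the chunk list from per-word tags."""
    chunks: List[Chunk] = []
    for word, tag in tagged:
        if tag == "O":
            # Merge consecutive outside words into one untyped group.
            if chunks and chunks[-1][0] is None:
                chunks[-1][1].append(word)
            else:
                chunks.append((None, [word]))
        elif tag.startswith("B-") or not chunks or chunks[-1][0] != tag[2:]:
            # A B- tag, or an I- tag with no matching open phrase, starts a new phrase.
            chunks.append((tag[2:], [word]))
        else:
            chunks[-1][1].append(word)
    return chunks


# Hypothetical example; the segmentation and labels are illustrative only.
example = [("NP", ["中国", "经济"]), (None, ["的"]), ("NP", ["发展"])]
print(encode(example))
```

Because the tags carry both type and position, a per-word classifier only has to predict one label per word, and the phrase boundaries follow from the decoded tag sequence.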
    <Paragraph position="3"> In a test on a corpus of about 2 MB, our system achieved 94.4% precision and 92.5% recall.</Paragraph>
  </Section>
</Paper>