File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/03/p03-2039_abstr.xml

Size: 801 bytes

Last Modified: 2025-10-06 13:43:02

<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-2039">
  <Title>Chinese Unknown Word Identification Using Character-based Tagging and Chunking</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Since written Chinese has no spaces to delimit words, segmenting Chinese text is an essential task. During segmentation, the problem of unknown words arises: it is impossible to register all words in a dictionary, as new words can always be created by combining characters. We propose a unified solution for detecting unknown words in Chinese texts. First, morphological analysis is performed to obtain an initial segmentation and POS tags; then a chunker is used to detect unknown words.</Paragraph>
  </Section>
</Paper>