File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/00/w00-1207_abstr.xml

Size: 774 bytes

Last Modified: 2025-10-06 13:41:56

<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1207">
  <Title>Statistically-Enhanced New Word Identification in a Rule-Based Chinese System</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
jiangz@microsoft.com
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper presents a mechanism for new word identification in Chinese text in which probabilities are used to filter candidate character strings and to assign parts of speech (POS) to the selected strings in a rule-based system. This mechanism avoids the sparse data problem of purely statistical approaches and the over-generation problem of rule-based approaches. It improves parser coverage and provides a tool for the lexical acquisition of new words.</Paragraph>
  </Section>
</Paper>