File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/00/w00-1207_abstr.xml
Size: 774 bytes
Last Modified: 2025-10-06 13:41:56
<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1207"> <Title>Statistically-Enhanced New Word Identification in a Rule-Based Chinese System</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> jiangz@microsoft.com Abstract </SectionTitle> <Paragraph position="0"> This paper presents a mechanism for new word identification in Chinese text in which probabilities are used to filter candidate character strings and to assign parts of speech (POS) to the selected strings in a rule-based system. This mechanism avoids the sparse-data problem of purely statistical approaches and the over-generation problem of rule-based approaches. It improves parser coverage and provides a tool for the lexical acquisition of new words.</Paragraph> </Section> </Paper>