<?xml version="1.0" standalone="yes"?>
<Paper uid="J96-3004">
  <Title>A Stochastic Finite-State Word-Segmentation Algorithm for Chinese</Title>
  <Section position="8" start_page="400" end_page="401" type="concl">
    <SectionTitle>
7. Conclusions
</SectionTitle>
    <Paragraph position="0"> Despite these limitations, a purely finite-state approach to Chinese word segmentation enjoys a number of strong advantages. The model we use provides a simple framework in which to incorporate a wide variety of lexical information in a uniform way. The use of weighted transducers in particular has the attractive property that the model, as it stands, can be straightforwardly interfaced to other modules of a larger speech or natural language system: presumably one does not want to segment Chinese text for its own sake but instead with a larger purpose in mind. As described in Sproat (1995), the Chinese segmenter presented here fits directly into the context of a broader finite-state model of text analysis for speech synthesis. Furthermore, by inverting the transducer so that it maps from phonemic transcriptions to hanzi sequences, one can apply the segmenter to other problems, such as speech recognition (Pereira, Riley, and Sproat 1994). Since the transducers are built from human-readable descriptions using a lexical toolkit (Sproat 1995), the system is easily maintained and extended.</Paragraph>
    <Paragraph position="1"> While the size of the resulting transducers may seem daunting--the segmenter described here, as it is used in the Bell Labs Mandarin TTS system, has about 32,000 states and 209,000 arcs--recent work on minimization of weighted machines and transducers (cf. Mohri [1995]) shows promise for improving this situation. The model described here thus demonstrates great potential for use in widespread applications. This flexibility, along with the simplicity of implementation and expansion, makes this framework an attractive base for continued research.</Paragraph>
    <Paragraph position="2"> 21 In Chinese, numerals and demonstratives cannot modify nouns directly, and must be accompanied by a classifier. The particular classifier used depends upon the noun.</Paragraph>
    <Paragraph position="3"> Computational Linguistics Volume 22, Number 3</Paragraph>
  </Section>
</Paper>