<?xml version="1.0" standalone="yes"?>
<Paper uid="P96-1029">
  <Title>Compilation of Weighted Finite-State Transducers from Decision Trees</Title>
  <Section position="2" start_page="0" end_page="215" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Much attention has been devoted recently to methods for inferring linguistic models from data.</Paragraph>
    <Paragraph position="1"> One powerful inference method that has been used in various applications is the decision tree, and in particular the classification and regression tree (Breiman et al., 1984).</Paragraph>
    <Paragraph position="2"> An increasing amount of attention has also been focussed on finite-state methods for implementing linguistic models, in particular finite-state transducers and weighted finite-state transducers; see (Kaplan and Kay, 1994; Pereira et al., 1994, inter alia). The reason for the renewed interest in finite-state mechanisms is clear. Finite-state machines provide a mathematically well-understood computational framework for representing a wide variety of information, both in NLP and speech processing. Lexicons, phonological rules, Hidden Markov Models, and (regular) grammars are all representable as finite-state machines, and finite-state operations such as union, intersection and composition mean that information from these various sources can be combined in useful and computationally attractive ways. The reader is referred to the above-cited papers (among others) for more extensive justification.</Paragraph>
    <Paragraph position="3"> This paper reports on a marriage of these two strands of research in the form of an algorithm for compiling the information in decision trees into weighted finite-state transducers. Given this algorithm, information inferred from data and represented in a tree can be used directly in a system that represents other information, such as lexicons or grammars, in the form of finite-state machines.</Paragraph>
  </Section>
</Paper>