<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1065">
<Title>FSA: An Efficient and Flexible C++ Toolkit for Finite State Automata Using On-Demand Computation</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Finite-state automata (FSA) methods have proved to be elegant solutions to many difficult problems in the field of natural language processing. Among the most recent applications are full and lazy compilation of the search network for speech recognition (Mohri et al., 2000a), integrated speech translation (Vidal, 1997; Bangalore and Riccardi, 2000), speech summarization (Hori et al., 2003), language modelling (Allauzen et al., 2003) and parameter estimation through EM (Eisner, 2001), to mention only a few.</Paragraph>
<Paragraph position="1"> This list of diverse applications makes clear that there is high demand for generic tools to create and manipulate FSAs.</Paragraph>
<Paragraph position="2"> In the past, a number of toolkits have been published, each following different design principles. Here, we give a short overview of the toolkits that offer an almost complete set of algorithms: + The FSM Library™ from AT&amp;T (Mohri et al., 2000b) is considered the most efficient implementation and offers various semirings, on-demand computation and many algorithms, but it is available only in binary form under a proprietary, non-commercial license.</Paragraph>
<Paragraph position="3"> + FSA6.1 (van Noord, 2000) is implemented in Prolog and is licensed under the terms of the GPL (GPL, 1991).</Paragraph>
<Paragraph position="4"> + The WFST toolkit (Adant, 2000) is built on top of the Automaton Standard Template Library (LeMaout, 1998) and uses C++ template mechanisms for efficiency and flexibility, but it lacks on-demand computation.
It is also licensed under the terms of the GPL (GPL, 1991).</Paragraph>
<Paragraph position="5"> This paper describes a new, highly efficient implementation of a finite-state automata toolkit that uses on-demand computation. It is currently used at the Lehrstuhl für Informatik VI, RWTH Aachen, in several speech recognition and translation research applications. The toolkit will be available under an open source license (GPL, 1991) and can be obtained from our website http://www-i6.informatik.rwth-aachen.de.</Paragraph>
<Paragraph position="6"> The remainder of the paper is organized as follows: Section 2 gives a short introduction to the theory of finite-state automata to recall part of the terminology and notation. It also briefly explains composition, which we use as an exemplary object of study in the following sections. In Section 2.3 we discuss the locality of algorithms defined on finite-state automata, which forms the basis for implementations using on-demand computation. The RWTH FSA toolkit implementation is then detailed in Section 3. In Section 4.1 we compare the efficiency of different toolkits. As a showcase for its flexibility, we show in Section 4.2 how to use the toolkit to build a statistical machine translation system. We conclude the paper with a short summary in Section 5 and discuss some possible future extensions in Section 6.</Paragraph>
</Section>
</Paper>