<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1022">
<Title>Practical Issues in Compiling Typed Unification Grammars for Speech Recognition</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle>1 Introduction</SectionTitle>
<Paragraph position="0">Language models to constrain speech recognition are a crucial component of interactive spoken language systems. The more varied the language that must be recognized, the more critical good language modeling becomes. Research in language modeling has heavily favored statistical approaches (Cohen 1995, Ward 1995, Hu et al. 1996, Iyer and Ostendorf 1997, Bellegarda 1999, Stolcke and Shriberg 1996), while hand-coded finite-state or context-free language models dominate the commercial sector (Nuance 2001, SpeechWorks 2001, TellMe 2001, BeVocal 2001, HeyAnita 2001, W3C 2001). The difference revolves around the availability of data. Research systems can achieve impressive performance using statistical language models trained on large amounts of domain-targeted data, but for many domains sufficient data is not available. Data may be unavailable because the domain has not been explored before, because the relevant data are confidential, or because the system is designed to perform new functions for which there is no analogous human-human interaction. In such cases the statistical approach is unworkable, both for commercial developers and for some research systems (Moore et al. 1997, Rayner et al. 2000, Lemon et al. 2001, Gauthron and Colineau 1999).</Paragraph>
<Paragraph position="1">Even in cases for which there is no impediment to collecting data, the expense and time required to collect a corpus can be prohibitive. For exactly this reason, the existence of the ATIS database (Dahl et al. 1994) is no doubt a factor in the popularity of the travel domain among the research community.</Paragraph>
<Paragraph position="2">A major problem with grammar-based finite-state or context-free language models is that they can be tedious to build and difficult to maintain, since they grow large quickly as the scope of the grammar increases. One way to address this problem is to write the grammar in a more expressive formalism and generate an approximation of that grammar in the format needed by the recognizer. This approach has been used in several systems, including CommandTalk (Moore et al. 1997), the RIALIST PSA simulator (Rayner et al. 2000), WITAS (Lemon et al. 2001), and SETHIVoice (Gauthron and Colineau 1999).</Paragraph>
<Paragraph position="3">While theoretically straightforward, this approach is more demanding in practice, as each of the compilation stages carries the potential for a combinatorial explosion that can exceed memory and time bounds. There is also no guarantee that the resulting language model will lead to accurate and efficient speech recognition.</Paragraph>
<Paragraph position="4">In this paper we are interested in sound approximations (Pereira and Wright 1991), in which the language accepted by the approximation is a superset of the language accepted by the original grammar. While we concede that alternative techniques that are not sound (Black 1989, Johnson 1998, Rayner and Carter 1996) may still be useful for many purposes, we prefer sound approximations because there is no chance that the correct hypothesis will be eliminated. Thus, further processing techniques (for instance, N-best search) will still have an opportunity to find the optimal solution.</Paragraph>
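<!-- A minimal formal restatement of the soundness criterion above, with G the original
     typed unification grammar and A its context-free approximation (notation chosen here
     for illustration, not taken from the paper):

         L(G) \subseteq L(A)

     Because every string accepted by G is also accepted by A, a recognizer constrained
     by A cannot prune a hypothesis that G licenses, so later stages such as N-best
     search can still recover the optimal analysis. -->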
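<!-- A minimal sketch (not the compilation procedure evaluated in this paper) of one way
     such an approximation can be obtained: when a feature ranges over a finite domain,
     a unification-style rule is specialized into plain CFG productions by enumerating
     the feature values. For this toy rule the result is sound, in fact exact. The names
     NUM_VALUES and compile_rule are illustrative assumptions.

     # Python sketch, assuming the toy rule  S -> NP[NUM=n] VP[NUM=n]
     NUM_VALUES = ("sg", "pl")          # finite domain of the NUM feature

     def compile_rule():
         """Emit one CFG production per value of the shared feature n."""
         productions = []
         for n in NUM_VALUES:
             productions.append(("S", ["NP_" + n, "VP_" + n]))
         return productions

     for lhs, rhs in compile_rule():
         print(lhs, "->", " ".join(rhs))
     # prints:
     #   S -> NP_sg VP_sg
     #   S -> NP_pl VP_pl
  -->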
<Paragraph position="5">We will describe and evaluate two compilation approaches to approximating a typed unification grammar with a context-free grammar. We will also describe and evaluate additional techniques to reduce the size and structural ambiguity of the language model.</Paragraph>
</Section>
</Paper>