<?xml version="1.0" standalone="yes"?> <Paper uid="E06-1008"> <Title>Generating statistical language models from interpretation grammars in dialogue systems</Title> <Section position="3" start_page="0" end_page="57" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Ideally, when building a spoken dialogue system, we would like to use a corpus of transcribed dialogues corresponding to the specific task of the system in order to build a statistical language model (SLM). However, such a corpus rarely exists in the early stages of development. Collecting and transcribing one is very time-consuming and delays the building of the actual system.</Paragraph> <Paragraph position="1"> An approach taken both in dialogue systems and dictation applications is to first write an interpretation grammar and from it generate an artificial corpus, which is then used as the training corpus for the SLM (Raux et al., 2003; Pakhomov et al., 2001; Fosler-Lussier & Kuo, 2001). Models obtained from grammars are not as good as those built from real data, as the estimates are artificial and lack a realistic distribution. However, this is a quick way to get a dialogue system working with an SLM, and once the system is up and running, real data can be collected to improve the model. We will explore this idea by generating a corpus from an interpretation grammar from one of our applications.</Paragraph> <Paragraph position="2"> A different approach is to compile the interpretation grammar into a speech recognition grammar, as the Gemini and REGULUS compilers do (Rayner et al., 2000; Rayner et al., 2003). This ensures that the linguistic coverage of speech recognition and interpretation is kept in sync: everything that can be recognized can also be interpreted, and vice versa. 
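The grammar-to-corpus approach can be illustrated with a minimal sketch. The toy context-free grammar below is hypothetical (it stands in for the paper's GF interpretation grammar and is not the Nuance toolchain): it is exhaustively expanded into an artificial corpus, from which maximum-likelihood bigram estimates are computed. Note that every sentence occurs exactly once, which is why such estimates are "artificial" and lack a realistic distribution.

```python
import itertools
from collections import Counter

# Toy interpretation grammar for an MP3-player domain (hypothetical rules):
# each non-terminal maps to a list of alternative right-hand sides.
GRAMMAR = {
    "S": [["play", "NP"], ["pause"], ["what", "is", "playing"]],
    "NP": [["the", "next", "song"], ["some", "ARTIST"]],
    "ARTIST": [["madonna"], ["abba"]],
}

def expand(symbol):
    """Recursively enumerate every terminal string a symbol can derive."""
    if symbol not in GRAMMAR:          # terminal symbol
        return [[symbol]]
    sentences = []
    for rule in GRAMMAR[symbol]:
        # Cartesian product of the expansions of each right-hand-side symbol
        parts = [expand(sym) for sym in rule]
        for combo in itertools.product(*parts):
            sentences.append([w for part in combo for w in part])
    return sentences

# The artificial training corpus: each in-grammar sentence exactly once.
corpus = expand("S")

# Maximum-likelihood bigram estimates with sentence boundary markers.
bigrams, unigrams = Counter(), Counter()
for sent in corpus:
    tokens = ["<s>"] + sent + ["</s>"]
    unigrams.update(tokens[:-1])
    bigrams.update(zip(tokens[:-1], tokens[1:]))

def p_bigram(w1, w2):
    """P(w2 | w1) estimated from the generated corpus."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0
```

A real system would hand such a generated corpus to a language-modelling toolkit rather than estimate probabilities by hand, but the pipeline (enumerate, count, normalize) is the same.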
In the European-funded project TALK, the Grammatical Framework (Ranta, 2005) has been extended with a facility that compiles GF grammars into speech recognition grammars in Nuance GSL format (www.nuance.com).</Paragraph> <Paragraph position="3"> Speech recognition for commercial dialogue systems has focused on grammar-based approaches, despite the fact that statistical language models seem to have better overall performance (Gorrell et al., 2002). This is probably due to the time-consuming work of collecting corpora for training SLMs, compared with the more rapid and straightforward development of speech recognition grammars. However, SLMs are more robust, can handle out-of-coverage utterances, perform better in difficult conditions, and seem to work better for naive users (see (Knight et al., 2001)), while speech recognition grammars are limited in coverage by how well grammar writers succeed in predicting what users may say (Huang et al., 2001).</Paragraph> <Paragraph position="4"> Nevertheless, as grammars only output phrases that can be interpreted, they make the subsequent interpretation task easier than the unpredictable output of an SLM (especially if the speech recognition grammar has been compiled from the interpretation grammar and the two are kept in sync). In addition, the grammar-based approach in the experiments reported in (Knight et al., 2001) outperforms the SLM approach on semantic error rate for in-coverage data. This has led to the idea of combining both approaches, as shown in (Rayner & Hockey, 2003), which is also something we are aiming for.</Paragraph> <Paragraph position="5"> Domain adaptation of SLMs is another issue in dialogue system recognition, which involves re-using a successful language model by adapting it to a new domain, i.e., a new application (Janiszek et al., 1998). 
If a large corpus is not available for the specific domain but a corpus covering a collection of topics is, we could use that corpus and adapt the resulting SLM to the domain. One may assume that an SLM based on a large corpus with a good mixture of topics should capture at least the part of general language use that does not vary from one domain to another. We will explore this idea by adapting SLMs built from the Gothenburg Spoken Language Corpus (GSLC) (Allwood, 1999) and from a newspaper corpus to our MP3 domain. We will consider several different SLMs based on the corpus generated from the GF interpretation grammar and compare their recognition performance with the baseline: a speech recognition grammar in Nuance format compiled from the same interpretation grammar. Based on earlier research, we expect a very low word error rate for the speech recognition grammar on in-grammar utterances but considerably worse performance on out-of-grammar utterances. The SLMs we are considering should handle out-of-grammar utterances better, and it will be interesting to see how well these grammar-derived models perform on in-grammar utterances.</Paragraph> <Paragraph position="6"> This study is organized as follows. Section 2 introduces the domain for which we are doing language modelling and the corpora we have at our disposal. Section 3 describes the different SLMs we have generated. Section 4 describes their evaluation and the results. Finally, we review the main conclusions of the work and discuss future work.</Paragraph> </Section> </Paper>