<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-3008">
  <Title>A Voice Enabled Procedure Browser for the International Space Station</Title>
  <Section position="4" start_page="29" end_page="30" type="metho">
    <SectionTitle>
3 Grammar-based speech understanding
</SectionTitle>
    <Paragraph position="0"> Clarissa uses a grammar-based recognition architecture. At the start of the project, we had two main reasons for choosing this approach over the more popular statistical one. First, we had no available training data. Second, the system was to be designed for experts who would have time to learn its coverage, and who moreover, as former military pilots, were comfortable with the idea of using controlled language.</Paragraph>
    <Paragraph position="1"> Although there is not much to be found in the literature, an earlier study in which we had been involved (Knight et al., 2001) suggested that grammar-based systems outperformed statistical ones for this kind of user. Given that neither of the above arguments is very strong, we wanted to implement a framework which would allow us to compare grammar-based methods with statistical ones, and retain the option of switching from a grammar-based framework to a statistical one if that later appeared justified. The Regulus and Alterf platforms, which we have developed under Clarissa and other earlier projects, are designed to meet these requirements.</Paragraph>
    <Paragraph position="2"> The basic idea behind Regulus (Regulus, 2005; Rayner et al., 2003) is to extract grammar-based language models from a single large unification grammar, using example-based methods driven by small corpora. Since grammar construction is now a corpus-driven process, the same corpora can be used to build statistical language models, facilitating a direct comparison between the two methodologies.</Paragraph>
    <Paragraph position="3"> On its own, however, Regulus only permits comparison at the level of recognition strings. Alterf (Rayner and Hockey, 2003) extends the paradigm to  the semantic level, by providing a trainable semantic interpretation framework. Interpretation uses a set of user-specified patterns, which can match either the surface strings produced by both the statistical and grammar-based architectures, or the logical forms produced by the grammar-based architecture.</Paragraph>
    <Paragraph position="4"> Table 1 presents the result of an evaluation, carried out on a set of 8158 recorded speech utterances, where we compared the performance of a statistical/robust architecture (SLM) and a grammar-based architecture (GLM). Both versions were trained off the same corpus of 3297 utterances. We also show results for text input simulating perfect recognition.</Paragraph>
    <Paragraph position="5"> For the SLM version, semantic representations are constructed using only surface Alterf patterns; for the GLM and text versions, we can use either surface patterns, logical form (LF) patterns, or both.</Paragraph>
    <Paragraph position="6"> The &amp;quot;Error&amp;quot; columns show the proportion of utterances which produce no semantic interpretation (&amp;quot;Reject&amp;quot;), the proportion with an incorrect semantic interpretation (&amp;quot;Bad&amp;quot;), and the total. Although the WER for the GLM recogniser is only slightly better than that for the SLM recogniser (6.27% versus 7.42%, 15% relative), the difference at the level of semantic interpretation is considerable (6.3% versus 10.2%, 39% relative). This is most likely accounted for by the fact that the GLM version is able to use logical-form based patterns, which are not accessible to the SLM version. Logical-form based patterns do not appear to be intrinsically more accurate than surface (contrast the first two &amp;quot;Text&amp;quot; rows), but the fact that they allow tighter integration between semantic understanding and language modelling is intuitively advantageous.</Paragraph>
  </Section>
  <Section position="5" start_page="30" end_page="31" type="metho">
    <SectionTitle>
4 Open microphone speech processing
</SectionTitle>
    <Paragraph position="0"> The previous section described speech understanding performance in terms of correct semantic interpretation of in-domain input. However, open microphone speech processing implies that some of the input will not be in-domain. The intended behaviour for the system is to reject this input. We would also like it, when possible, to reject in-domain input which has not been correctly recognised.</Paragraph>
    <Paragraph position="1"> Surface output from the Nuance speech recogniser is a list of words, each tagged with a confidence score; the usual way to make the accept/reject decision is by using a simple threshold on the average confidence score. Intuitively, however, we should be able to improve the decision quality by also taking account of the information in the recognised words.</Paragraph>
    <Paragraph position="2"> By thinking of the confidence scores as weights, we can model the problem as one of classifying documents using a weighted bag of words model. It is well known (Joachims, 1998) that Support Vector Machine methods are very suitable for this task.</Paragraph>
    <Paragraph position="3"> We have implemented a version of the method described by Joachims, which significantly improves on the naive confidence score threshold method.</Paragraph>
    <Paragraph position="4"> Performance on the accept/reject task can be evaluated directly in terms of the classification error. We can also define a metric for the overall speech understanding task which includes the accept/reject decision, as a weighted loss function over the different types of error. We assign weights of 1 to a false reject of a correct interpretation, 2 to a false accept of an incorrectly interpreted in-domain utterance, and 3 to a false accept of an out-of-domain utterance. This  captures the intuition that correcting false accepts is considerably harder than correcting false rejects, and that false accepts of utterances not directed at the system are worse than false accepts of incorrectly interpreted utterances.</Paragraph>
    <Paragraph position="5"> Table 2 summarises the results of experiments comparing performance of different recognisers and accept/reject classifiers on a set of 10409 recorded utterances. &amp;quot;GLM&amp;quot; and &amp;quot;SLM&amp;quot; refer respectively to the best GLM and SLM recogniser configurations from Table 1. &amp;quot;Av&amp;quot; refers to the average classifier error, and &amp;quot;Task&amp;quot; to a normalised version of the weighted task metric. The best SVM-based method (line 6) outperforms the best naive threshold method (line 2) by 5.4% to 7.0% on the task metric, a relative improvement of 23%. The best GLM-based method (line 6) and the best SLM-based method (line 5) are equally good in terms of accept/reject classification accuracy, but the GLM's better speech understanding performance means that it scores 22% better on the task metric. The best quadratic kernel (line 6) outscores the best linear kernel (line 4) by 13%. All these differences are significant at the 5% level according to the Wilcoxon matched-pairs test.</Paragraph>
  </Section>
  <Section position="6" start_page="31" end_page="31" type="metho">
    <SectionTitle>
5 Side-effect free dialogue management
</SectionTitle>
    <Paragraph position="0"> In an open microphone spoken dialogue application like Clarissa, it is particularly important to be able to undo or correct a bad system response. This suggests the idea of representing discourse states as objects: if the complete dialogue state is an object, a move can be undone straightforwardly by restoring the old object. We have realised this idea within a version of the standard &amp;quot;update semantics&amp;quot; approach to dialogue management (Larsson and Traum, 2000); the whole dialogue management functionality is represented as a declarative &amp;quot;update function&amp;quot; relating the old dialogue state, the input dialogue move, the new dialogue state and the output dialogue actions.</Paragraph>
    <Paragraph position="1"> In contrast to earlier work, however, we include task information as well as discourse information in the dialogue state. Each state also contains a back-pointer to the previous state. As explained in detail in (Rayner and Hockey, 2004), our approach permits a very clean and robust treatment of undos, corrections and confirmations, and also makes it much simpler to carry out systematic regression testing of the dialogue manager component.</Paragraph>
  </Section>
class="xml-element"></Paper>