<?xml version="1.0" standalone="yes"?> <Paper uid="W04-3007"> <Title>Robustness Issues in a Data-Driven Spoken Language Understanding System</Title> <Section position="7" start_page="0" end_page="0" type="concl"> <SectionTitle> 5 Conclusions </SectionTitle> <Paragraph position="0"> The spoken language understanding (SLU) system discussed in this paper is entirely statistically based. The recogniser uses an HMM-based acoustic model and an n-gram language model, the semantic parser uses a hidden vector state model, and the dialogue act decoder uses Bayesian networks. The system is trained entirely from data and there are no heuristic rules. One of the major claims motivating the design of this type of system is that its fully statistical framework makes it intrinsically robust and readily adaptable to new applications. The aim of this paper has been to investigate this claim experimentally via two sets of experiments using a system trained on the ATIS corpus.</Paragraph> <Paragraph position="1"> In the first set of experiments, the acoustic test data was corrupted with varying levels of additive car noise.</Paragraph> <Paragraph position="2"> The end-to-end system performance was then measured along with the individual component performances. It was found that although the addition of noise had a substantial effect on the word error rate, its relative influence on both the semantic parser slot/value retrieval rate and the dialogue act detection accuracy was somewhat less. Overall, the end-to-end error rate degraded relatively more slowly than the word error rate and, perhaps most importantly of all, there was no catastrophic failure point at which the system effectively stopped working, a situation not uncommon in current rule-based systems.</Paragraph> <Paragraph position="3"> In the second set of experiments, the ability of the semantic decoder component to be adapted to another application was investigated. 
In order to limit the issues to parameter mismatch problems, the new application chosen (Communicator) covered essentially the same set of concepts but was a rather different corpus with different user speaking styles and different syntactic forms. Overall, we found that moving a system trained on ATIS to this new application resulted in a 6% absolute drop in F-measure on concept accuracy (i.e. a 62% relative increase in parser error) and, by extrapolation from the results in the ATIS domain, we infer that this would make the non-adapted system essentially unusable in the new application. However, when adaptation was applied using only 50 adaptation sentences, the loss of concept accuracy was mostly restored. Specifically, using log-linear adaptation, the out-of-domain F-measure of 83.3% was restored to 89.2%, which is close to the in-domain F-measure of 89.5%.</Paragraph> <Paragraph position="4"> Although these tests are preliminary and are based on off-line corpora, the results do give positive support to the initial claim made for statistically-based spoken language systems, i.e. that they are robust and readily adaptable to new or changing applications.</Paragraph> </Section> </Paper>