<?xml version="1.0" standalone="yes"?> <Paper uid="N04-3011"> <Title>Use and Acquisition of Semantic Language Model</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 SSU MiPad </SectionTitle> <Paragraph position="0"> MiPad is a Web-based PIM application that facilitates multimodal access to personal email, calendar, and contact information. MiPad users can combine speech commands with pen gestures to query the PIM database and to compose or modify email messages or appointments. We recently implemented a version of MiPad in HTML and SALT, taking advantage of the native support for SSU in SALT (Wang, 2002). A video demonstration of SSU MiPad is available.</Paragraph> <Paragraph position="2"> Whenever a semantic object is detected, the PIM logic based on the current semantic parse is executed and the screen is updated accordingly. The nature of SSU ensures that the user receives immediate feedback on the progress of SLU, and can therefore rephrase rejected speech segments and correct misrecognized ones. Studies (Wang, 2003) that contrast SSU with a conventional turn-taking system show that, because SSU copes better with spontaneous speech, it elicits longer user utterances, so fewer sentences are needed to complete a task. The highly interactive nature of SSU also lends itself to more effective dynamic visual prompting, leading to fewer out-of-domain utterances. SSU further simplifies the confirmation strategy, as every semantic object can be implicitly confirmed. Users have no trouble with this strategy; in fact, they naturally correct and rephrase based on the immediate feedback, making their speech even more spontaneous. All these results are statistically significant.
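The incremental feedback loop described above can be sketched as follows. This is a minimal, hypothetical illustration of the idea, not code from the MiPad system: each detected semantic object immediately triggers application logic and a display update, so a misrecognized slot is visible to the user right away. All class, slot, and function names here are invented for the example.

```python
# Hypothetical sketch of an SSU-style feedback loop: each semantic object
# detected mid-utterance updates the display immediately, giving the user
# implicit confirmation of every filled slot.
from dataclasses import dataclass, field

@dataclass
class SemanticObject:
    slot: str    # e.g. "recipient" or "date"
    value: str   # the recognized filler

@dataclass
class PimDisplay:
    form: dict = field(default_factory=dict)

    def update(self, obj: SemanticObject) -> None:
        # Showing the filled slot back to the user lets them rephrase
        # or correct it as soon as a misrecognition appears.
        self.form[obj.slot] = obj.value

def run_ssu(stream, display: PimDisplay) -> dict:
    """Consume semantic objects as they are detected, updating the screen."""
    for obj in stream:
        display.update(obj)
    return display.form

# Simulated partial parses of "new email to Alice about the Friday meeting".
stream = [SemanticObject("command", "new_email"),
          SemanticObject("recipient", "Alice"),
          SemanticObject("subject", "Friday meeting")]
print(run_ssu(stream, PimDisplay()))
```

In a real system the stream would be fed by the recognizer as speech arrives; the point of the sketch is only that understanding and display update are interleaved with recognition rather than deferred to the end of the utterance.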
Finally, and most intriguingly, users feel they accomplish tasks faster in the SSU system even though the throughputs of the two systems are statistically tied.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 SLM Learning </SectionTitle> <Paragraph position="0"> SLU uses an SLM to infer the user's intention from speech.</Paragraph> <Paragraph position="1"> Before sufficient data make it practical to use machine learning techniques, an SLM often has to be developed manually. The manual development process is labor-intensive, requires expertise in linguistics and speech understanding, and often lacks good coverage because it is hard for a developer to anticipate all the language constructions that different users may choose to express themselves. The manually developed model is therefore not robust to the extra-grammaticality commonly found in spontaneous speech. One approach to this problem is to employ a robust parser to loosen the constraints specified in the SLM, which sometimes results in unpredictable system behavior (Wang, 2001).</Paragraph> <Paragraph position="2"> The robust-parser approach also mandates an understanding pass separate from speech recognition. The results tend to be suboptimal, since the first pass, which optimizes ASR word accuracy, does not necessarily lead to higher overall SLU accuracy (Wang and Acero, 2003b).</Paragraph> <Paragraph position="3"> We have developed example-based grammar learning algorithms to acquire an SLM for speech understanding. It has been shown (Wang and Acero, 2002) that a grammar learning algorithm can produce a semantic context-free grammar with better coverage than a manually authored grammar. It has also been demonstrated (Wang and Acero, 2003a) that the learning algorithm can produce a statistical model that is itself robust to extra-grammaticality in spontaneous speech, so a robust parser is no longer necessary.
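To make the robustness claim concrete, here is a deliberately tiny, hypothetical illustration, not the learning algorithm of Wang and Acero: slot fillers are generalized from annotated example utterances, and the resulting matcher tolerates extra-grammatical filler words by skipping anything it does not recognize. All data and function names are invented for the example.

```python
# Toy illustration (not the cited algorithm) of example-based slot learning
# and why the learned model tolerates extra-grammatical spontaneous speech.
from collections import defaultdict

def learn_slots(examples):
    """examples: list of (utterance, {slot: filler_phrase}) training pairs."""
    slots = defaultdict(set)
    for _, annotation in examples:
        for slot, phrase in annotation.items():
            slots[slot].add(phrase)
    return slots

def robust_parse(utterance, slots):
    """Match learned fillers anywhere in the utterance, ignoring other words."""
    parse = {}
    for slot, phrases in slots.items():
        for phrase in phrases:
            if phrase in utterance:
                parse[slot] = phrase
    return parse

examples = [("email to Alice", {"command": "email", "recipient": "Alice"}),
            ("email to Bob",   {"command": "email", "recipient": "Bob"})]
slots = learn_slots(examples)
# Disfluent speech with extra words still yields the intended parse:
print(robust_parse("uh send an email um to Alice please", slots))
```

A hand-written grammar covering only the training utterances would reject the disfluent input outright; a model that matches learned fillers and skips the rest degrades gracefully instead, which is the property the paragraph above attributes to the learned statistical SLM.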
Most importantly, such a statistical SLM can be incorporated directly into the ASR search algorithm, making possible a single-pass, joint speech recognition and understanding process such as SSU. As a result, the model can be trained directly to optimize understanding accuracy. It has been shown (Wang and Acero, 2003b) that the single-pass approach achieved a 17% improvement in understanding accuracy even though the word error rate increased significantly, suggesting that optimizing ASR accuracy and optimizing SLU accuracy may indeed be two very different objectives.</Paragraph> </Section> </Paper>