File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/93/h93-1073_abstr.xml
Size: 3,809 bytes
Last Modified: 2025-10-06 13:47:46
<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1073">
  <Title>SESSION 13: NEW DIRECTIONS</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>SESSION 13: NEW DIRECTIONS</SectionTitle>
    <Paragraph position="0">The three papers of Session 13 address issues differing from those in the remainder of the workshop. Two employ a methodology to discover preferences for speech input in a multi-modal interface. The third raises issues of processing human language without any assumption that the speech or text has been converted to an online sequence of ASCII characters (or other character codes).</Paragraph>
    <Paragraph position="1">As an introduction to the first two papers, consider Wizard of Oz experiments such as those used in collecting ATIS data. In such an experiment, the subject is asked to use a system to solve one or more problems. The "system" could be a person who simulates a proposed capability, for instance to determine language and interface properties for a proposed computer capability. Alternatively, the system might be an existing capability.</Paragraph>
    <Paragraph position="2">Perhaps the first such experiment was performed by Ashok Malhotra (1975) to collect data that would suggest how varied (and challenging) textual queries would be in an interactive query application. Malhotra simulated the whole system, a very labor-intensive task.</Paragraph>
    <Paragraph position="3">The first paper ("Mode Preference in a Simple Data-Retrieval Task") employs fully implemented components to measure user preference among spoken input, filling in a form, and using a scroll bar to look up telephone numbers in an online telephone book. The paper immediately got my attention with the following statement in the introduction: "For activities in a workstation environment, formal comparisons of speech with other input modes have failed to demonstrate a clear advantage for speech on conventional aggregate measures of performance, such as time-to-completion ...". The author's experiments demonstrate a flaw in the analysis of previous results and go on to measure a marked preference for speech input, even when speech may not give the best time-to-completion results.</Paragraph>
    <Paragraph position="4">The second paper, "A Simulation-Based Research ...", involves a person behind the scenes (the wizard) simulating the system, though much is automated.</Paragraph>
    <Paragraph position="5">The environment simulated for the user is quite rich, allowing both speech and handwriting input. Careful preparation of the experimental environment enables automated support, so that responses to the user are streamlined and the user can move at his or her own pace. To illustrate the kind of studies the methodology supports, the authors show results suggesting that syntactic ambiguity is lower when filling out a form than when producing unconstrained input, and is also lower in handwriting than in speech.</Paragraph>
    <Paragraph position="6">The third paper, "Speech and Text-Image Processing in Documents," assumes minimal signal processing. For instance, the authors describe editing and indexing audio forms rather than the text file produced by continuous speech recognition. Similarly, "text-image" processing is the editing of the bitmap representation produced by scanning a document, rather than editing a sequence of bytes in some character code such as ASCII.
One of the tools described is therefore aptly named "Image Emacs". A third effort described in this paper is document image decoding, a framework for processing scanned-in documents.</Paragraph>
  </Section>
</Paper>