<?xml version="1.0" standalone="yes"?> <Paper uid="H01-1025"> <Title>Exploring Speech-Enabled Dialogue with the Galaxy Communicator Infrastructure</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. HIGHLIGHTED PROPERTIES </SectionTitle> <Paragraph position="0"> The GCSI is a distributed hub-and-spoke infrastructure which allows the programmer to develop Communicator-compliant servers in C, C++, Java, Python, or Allegro Common Lisp. This system is based on message passing rather than CORBA- or RPC-style APIs. The hub in this infrastructure supports routing of messages consisting of key-value pairs, but also supports logging and rule-based scripting. Such an infrastructure has the following desirable properties: * The scripting capabilities of the hub allow the programmer to weave together servers which may not otherwise have been intended to work together, by rerouting messages and their responses and transforming their keys.</Paragraph> <Paragraph position="1"> * The scripting capabilities of the hub allow the programmer to insert simple tools and filters to convert data among formats.</Paragraph> <Paragraph position="2"> * The scripting capabilities of the hub make it easy to modify the message flow of control in real time.</Paragraph> <Paragraph position="3"> * The scripting capabilities of the hub and the simplicity of message passing make it simple to build up systems bit by bit.</Paragraph> <Paragraph position="4"> * The standard infrastructure allows the Communicator program to develop platform- and programming-language-independent service standards for recognition, synthesis, and other better-understood resources.</Paragraph> <Paragraph position="5"> * The standard infrastructure allows members of the Communicator program to contribute generally useful tools to other program participants.</Paragraph> <Paragraph position="6"> This demonstration will illustrate a number of these properties.</Paragraph> </Section> <Section 
position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 3. DEMO CONFIGURATION AND CONTENT </SectionTitle> <Paragraph position="0"> By way of illustration, this demo will simulate a process of assembling a Communicator-compliant system, while at the same time exemplifying some of the more powerful aspects of the infrastructure. The demonstration has three phases, representing three successively more complex configuration steps. We use a graphical display of the Communicator hub to make it easy to see the behavior of this system.</Paragraph> <Paragraph position="1"> As you can see in Figure 1, the hub is connected to eight servers. We will use the flexibility of the GCSI, and the hub scripting language in particular, to change the path that messages follow among these servers.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Phase 1 </SectionTitle> <Paragraph position="0"> In phase 1, we establish audio connectivity. JDAS is MITRE's contribution to the problem of reliable access to audio resources. It is based on JavaSound 1.0 (distributed with JDK 1.3), and supports barge-in. We show the capabilities of JDAS by having the system echo the speaker's input; we also demonstrate the barge-in capabilities of JDAS by showing that the speaker can interrupt the playback with a new utterance. The goal in building JDAS is that anyone who has a desktop microphone and the Communicator infrastructure will be able to use this audio server to establish connectivity with any Communicator-compliant recognizer or synthesizer.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Changing the message path </SectionTitle> <Paragraph position="0"> The hub maintains a number of information states. 
The Communicator hub script which the developer writes can both access and update these information states, and we can invoke &quot;programs&quot; in the Communicator hub script by sending messages to the hub. This demonstration exploits this capability by using messages sent from the graphical display to change the path that messages follow, as illustrated in Figure 2. In phase 1, the hub script routed messages from JDAS back to JDAS (enabled by the message named &quot;Echo&quot;). In the next phase, we will change the path of messages from JDAS and send them to a speech recognizer.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Phase 2 </SectionTitle> <Paragraph position="0"> Now that we've established audio connectivity, we can add recognition and synthesis. In this configuration, we will route the output of the preferred recognizer to the preferred synthesizer. When we change the path through the hub script using the graphical display, the preferred servers are highlighted. Figure 3 shows that the initial configuration of phase 2 prefers SUMMIT and Festival.</Paragraph> <Paragraph position="1"> The SUMMIT recognizer and the Festival synthesizer were not intended to work together; in fact, while there is a good deal of activity in the area of establishing data standards for various aspects of dialogue systems (cf. [3]), there are no programming-language-independent service definitions for speech. The hub scripting capability, however, allows these tools to be incorporated into the same configuration and to interact with each other. The remaining incompatibilities (for instance, the differences in markup between the recognizer output and the input the synthesizer expects) are addressed by the string server, which can intervene between the recognizer and synthesizer. 
So the GCSI makes it easy both to connect a variety of tools to the hub and make them interoperate, and to insert simple filters and processors that facilitate the interoperation.</Paragraph> <Paragraph position="2"> In addition to being able to send general messages to the hub, the user can use the graphical display to send messages associated with particular servers. So we can change the preferred recognizer or synthesizer (as shown in Figure 4), or change the Festival voice (as shown in Figure 5). All these messages are configurable from the hub script.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.4 Phase 3 </SectionTitle> <Paragraph position="0"> Now that we've established connectivity with recognition and synthesis, we can add parsing and generation (or, in this case, input paraphrase). Figure 6 illustrates the final configuration, after changing recognizer and synthesizer preferences. In this phase, the output of the recognizer is routed to the parser, which produces a structure which is then paraphrased and then sent to the synthesizer. So for instance, the user might say &quot;I'd like to fly to Tacoma&quot;, and after parsing and paraphrase, the output from the synthesizer might be &quot;A trip to Tacoma&quot;.</Paragraph> </Section> </Section> </Paper>