<?xml version="1.0" standalone="yes"?> <Paper uid="H01-1018"> <Title>Domain Portability in Speech-to-Speech Translation</Title> <Section position="3" start_page="1" end_page="2" type="intro"> <SectionTitle> 3. EXPERIMENT 1: EXTENSION OF SEMANTIC GRAMMAR RULES BY HAND AND BY AUTOMATIC LEARNING </SectionTitle> <Paragraph position="0"> Experiment 1 concerns extension of the coverage of semantic grammars in the medical domain. Semantic grammars are based on semantic constituents such as request-information phrases (e.g., I was wondering ...) and location phrases (e.g., in my right arm) rather than syntactic constituents such as noun phrases and verb phrases. In other papers [12, 5], we have described how our modular grammar design enhances portability across domains. The portable grammar modules are the cross-domain module, containing rules for things like greetings, and the shared module, containing rules for things like times, dates, and locations. Figure 1 shows a parse tree for the sentence How long have you had this pain? XDM indicates nodes that were produced by cross-domain rules. MED indicates nodes that were produced by rules from the new medical domain grammar.</Paragraph> <Paragraph position="1"> The preliminary doctor-patient grammar focuses on three medical situations: give-information+existence -- giving information about the existence of a symptom (I have been getting headaches); give-information+onset -- giving information about the onset of a symptom (The headaches started three months ago); and give-information+occurrence -- giving information about the occurrence of an instance of a symptom (The headaches start behind my ears). Symptoms are expressed as body-state (e.g., pain), body-object (e.g., rash), and body-event (e.g., bleeding).</Paragraph> <Paragraph position="2"> Our experiment on extendibility was based on a hand-written seed grammar that was extended by hand and by automatic learning. 
The seed grammar covered the domain actions mentioned above, but did not cover very many ways to phrase each domain action. For example, it might have covered The headaches started three months ago but not I started getting the headaches three months ago.</Paragraph> <Paragraph position="4"> The seed grammar was extended by hand and by automatic learning to cover a development set of 133 utterances. The result was two new grammars, a human-extended grammar and a machine-learned grammar, referred to as the extended and learned grammars in Table 1. The two new grammars were then tested on 132 unseen sentences in order to compare the generality of the rules.</Paragraph> <Paragraph position="5"> Results are reported only for the 83 of the 132 sentences that were covered by the current interlingua design; the remaining 49 sentences were not covered and were not scored. Results are shown in Table 1.</Paragraph> <Paragraph position="6"> The parsed test sentences were scored in comparison to a hand-coded correct interlingua representation. Table 1 separates results for six components of the interlingua: speech act, concepts, top-level arguments, top-level values, sub-level arguments, and sub-level values, in addition to the total interlingua and the domain action (speech act and concepts combined). The components of the interlingua were described in Section 2.</Paragraph> <Paragraph position="7"> The scores for the total interlingua and domain action are reported as percent correct. The scores for the six components of the interlingua are reported as average percent precision and recall. For example, if the correct interlingua for a sentence has two concepts and the parser produces three, two of which are correct and one of which is incorrect, the precision is 66% and the recall is 100%.</Paragraph> <Paragraph position="8"> Several trends are reflected in the results. 
Both the human-extended grammar and the machine-learned grammar show improved performance over the seed grammar. However, the human-extended grammar tended to outperform the automatically learned grammar in precision, whereas the automatically learned grammar tended to outperform the human-extended grammar in recall. This result is to be expected: humans are capable of formulating correct rules, but may not have time to analyze the amount of data that a machine can analyze. (The time spent on the human-extended grammar after the seed grammar was complete was only five days.) Grammar Induction: Our work on automatic grammar induction for Experiment 1 is still in its preliminary stages. At this point, we have experimented with completely automatic induction (no interaction with a user) of new grammar rules, starting from a core grammar and using a development set of sentences that are not parsable according to the core grammar. The development sentences are tagged with the correct interlingua, and they do not stray from the concepts covered by the core grammar -- they only correspond to alternative (previously unseen) ways of expressing the same set of covered concepts. The automatic induction is based on performing tree matching between a skeletal tree representation obtained from the interlingua and a collection of parse fragments derived from parsing the new sentence with the core grammar. Extensions to the existing rules are hypothesized in a way that would produce the correct interlingua representation for the input utterance.</Paragraph> <Paragraph position="9"> Figure 2 shows a tree corresponding to an automatically learned rule. The input to the learning algorithm is the interlingua (shown in bold boxes in the figure) and three parse chunks (circled in the figure). The dashed edges are added by the learning algorithm.</Paragraph> </Section> </Paper>
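The component-wise precision/recall scoring described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual scoring code: the component names, the multiset treatment of duplicate labels, and the function names are all assumptions introduced here for clarity.

```python
# Sketch of per-component interlingua scoring against a hand-coded
# reference, as in Table 1. Counter intersection treats repeated
# labels as a multiset (an assumption; the paper does not specify).
from collections import Counter

def precision_recall(produced, reference):
    """Average percent precision and recall for one interlingua
    component (e.g. concepts, top-level arguments)."""
    correct = sum((Counter(produced) & Counter(reference)).values())
    precision = 100.0 * correct / len(produced) if produced else 0.0
    recall = 100.0 * correct / len(reference) if reference else 0.0
    return precision, recall

# Worked example from the text: the reference has two concepts and the
# parser produces three, two correct and one incorrect.
p, r = precision_recall(
    ["existence", "onset", "occurrence"],  # parser output (one wrong)
    ["existence", "onset"],                # hand-coded reference
)
# p is about 66%, r is 100%, matching the example in the paper.
```

Averaging these per-sentence scores over the 83 scorable test sentences would give the "average percent precision and recall" figures reported for each component.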
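The grammar-induction step -- matching a skeletal tree derived from the interlingua against parse fragments from the core grammar, then hypothesizing rule extensions -- can be caricatured as follows. Everything here (the flat label lists, the function name, the returned structure) is a hypothetical simplification; the paper's actual tree-matching algorithm operates on full parse trees and is not specified at this level of detail.

```python
# Hedged sketch of the induction idea in Experiment 1: chunks that the
# core grammar can already parse are matched against the labels the
# skeletal interlingua tree requires; unmatched labels become the new
# ("dashed") edges a hypothesized rule extension must supply.
def hypothesize_extension(skeletal_children, parse_chunks):
    """Propose a rule body covering the skeletal tree's children.

    skeletal_children: ordered labels required by the interlingua tree.
    parse_chunks: labels of fragments parsable with the core grammar.
    """
    chunk_set = set(parse_chunks)
    matched = [lab for lab in skeletal_children if lab in chunk_set]
    gaps = [lab for lab in skeletal_children if lab not in chunk_set]
    # A real learner would anchor the matched chunks in the skeletal
    # tree and add new edges for the gaps, as in Figure 2.
    return {"rhs": matched, "new_edges": gaps}

rule = hypothesize_extension(
    ["time-phrase", "body-state", "onset-verb"],  # from the interlingua
    ["time-phrase", "body-state"],                # parsable chunks
)
```

The hypothesized extension is accepted only if applying it makes the sentence parse to exactly the tagged interlingua, which is what keeps the fully automatic procedure from over-generating rules.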