<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2174">
  <Title>Chinese Generation in a Spoken Dialogue Translation System</Title>
  <Section position="3" start_page="1141" end_page="1141" type="metho">
    <SectionTitle>
2. Semantic Representation
</SectionTitle>
    <Paragraph position="0"> The most obvious characteristics of the semantic representation are its independence from the peculiarities of any particular language and its underspecification. It must nevertheless capture the speaker's intent. The whole semantic representation has up to four components, as shown in Figure 1: speaker tag, speech act, topic and arguments.</Paragraph>
    <Paragraph position="1"> The speaker tag is either "a" for agent or "c" for customer, indicating who is speaking. The speech act indicates the speaker's intent. The topic expresses the current focus. The arguments carry any other information needed to express the entire meaning of the source sentence.  Both the topic and the arguments are made up of attribute-value pairs, as in functional formalisms. An attribute can be any concept defined in the hotel reservation domain. A value can be an atomic symbol or, recursively, an attribute-value pair. The symbol "^" in the topic expression indicates that the expression can appear zero times or once, while the symbol "*" in the argument expression indicates that the expression can appear zero or more times. The attribute-value pairs are order-free. Both the topic and the arguments are optional parts of the USR. Let us consider a complex semantic expression extracted from our corpus. It is shown in Example 1: a: give-information: (available -- (room = (room-type = double ))) : (price = (quantity</Paragraph>
    <Paragraph position="3"> In Example 1, the speech act is give-information, which means that the agent is offering information to the customer. The topic indicates that double rooms are available. The arguments list the prices of the double rooms, which shows that two kinds of double rooms are available. So the meaning of this representation is "We have two kinds of double rooms, which cost 200 and 240 dollars respectively". The USR does not express the kinds of rooms explicitly. Only from the composite value of the concept "price" can we judge that there are two kinds of rooms, because the prices differ. This is just one example of underspecification, which requires inference from the input and the domain knowledge.</Paragraph>
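    <Paragraph> As an illustration only, the four components described in this section could be held in a structure like the following Python sketch. The class name USR, its field names and the helper concepts are hypothetical; the paper does not specify an implementation. The "^" and "*" markers are modelled here as an optional topic and a list of zero or more arguments.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# A value is an atomic symbol or, recursively, an attribute-value structure.
Value = Union[str, Dict[str, "Value"]]


@dataclass
class USR:
    """Hypothetical container for one underspecified semantic representation."""
    speaker: str                                  # "a" (agent) or "c" (customer)
    speech_act: str                               # e.g. "give-information"
    topic: Optional[Dict[str, Value]] = None      # "^": zero or one topic
    arguments: List[Dict[str, Value]] = field(default_factory=list)  # "*": zero or more

    def __post_init__(self):
        if self.speaker not in ("a", "c"):
            raise ValueError("speaker tag must be 'a' or 'c'")


def concepts(value):
    """Collect every attribute name appearing anywhere in a nested value."""
    if not isinstance(value, dict):
        return set()
    found = set(value)
    for inner in value.values():
        found |= concepts(inner)
    return found


# Topic of Example 1; the price argument is truncated in the text, so it is left out.
example = USR("a", "give-information",
              topic={"available": {"room": {"room-type": "double"}}})
```

    Because the pairs are order-free, plain dictionaries are a reasonable stand-in for the attribute-value structures in this sketch.</Paragraph>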
  </Section>
  <Section position="4" start_page="1141" end_page="1143" type="metho">
    <SectionTitle>
3. The Microplanner
</SectionTitle>
    <Paragraph position="0"> The input to our microplanner is the underspecified semantic representation. As the semantic representation above shows, it is underspecified because it lacks information such as predicate-argument structure, the cognitive status of referents, and the restrictive/attributive function of semantic properties. Some of the unspecified pieces of information, such as predicate-argument structure, are essential to generating a correct translation of the source sentence. Fortunately, much of the information that is not explicitly represented can be inferred from default knowledge about the specific domain and from general world knowledge.</Paragraph>
    <Paragraph position="1"> The microplanner includes two parts: sentence-level planning and phrase-level planning.  The sentence planner maps the semantic representation into a predicate-argument structure, and the phrase planner maps the concepts defined in the domain into Chinese phrases.</Paragraph>
    <Paragraph position="2"> To express the rules, we designed a format for them. The rules are represented as pattern-constraints-action triples. A pattern is matched against part of the input at the sentence level and against the concepts at the phrase level. The constraints describe additional context-dependent requirements that the input must fulfil. The action part describes the predicate-argument structure and other information such as mood and sentence type. An example of a sentence-level rule is shown in Figure 2.</Paragraph>
    <Paragraph position="4"> First, we match the pattern part against the input USR. If it matches, the constraint is tested. In the example, the concept price must exist in the input.</Paragraph>
    <Paragraph position="5"> The action part describes the whole sentence structure, such as predicate-argument structure, sentence type, voice and mood. The symbol "#get" in the action part indicates that the value is obtained by accessing the phrase rules or the dictionary to complete the structure recursively.</Paragraph>
    <Paragraph position="6"> The "#get" expression has two parameters. The first parameter is either "concept" or "attribute", indicating access to the dictionary or to the phrase rules respectively. The second parameter is a concept defined in the domain. In the example, "#get" expressions are used to get the values of the domain concepts room and price. The symbol "optional" indicates that the attribute-value pair following it is optional: it is filled only if the input contains the concept.</Paragraph>
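    <Paragraph> The matching procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system's actual rule interpreter: the dictionary-based rule layout, the tuple encoding of "#get" and "optional", and the function names usr_concepts, resolve and apply_rule are all hypothetical, and the dictionary and phrase rules are reduced to plain lookup tables.

```python
def usr_concepts(usr):
    """Collect every attribute name in the topic and arguments of a USR."""
    def walk(value):
        if not isinstance(value, dict):
            return set()
        names = set(value)
        for inner in value.values():
            names |= walk(inner)
        return names
    return walk(usr.get("topic", {})) | walk(usr.get("arguments", {}))


def resolve(action, usr, dictionary, phrase_rules):
    """Fill an action template, expanding ("#get", kind, concept) markers."""
    present = usr_concepts(usr)
    result = {}
    for slot, spec in action.items():
        optional = isinstance(spec, tuple) and spec[0] == "optional"
        if optional:
            spec = spec[1]                    # unwrap the optional pair
        if isinstance(spec, tuple) and spec[0] == "#get":
            _, kind, concept = spec
            if concept not in present:
                if optional:
                    continue                  # optional pair, concept absent: skip
                raise KeyError(concept)
            # "concept" consults the dictionary, "attribute" the phrase rules.
            source = dictionary if kind == "concept" else phrase_rules
            result[slot] = source[concept]
        else:
            result[slot] = spec               # literal value such as sentence type
    return result


def apply_rule(rule, usr, dictionary, phrase_rules):
    """Return the filled action if pattern and constraints hold, else None."""
    if rule["pattern"]["speech_act"] != usr["speech_act"]:
        return None
    present = usr_concepts(usr)
    if not all(concept in present for concept in rule["constraints"]):
        return None
    return resolve(rule["action"], usr, dictionary, phrase_rules)
```

    In this sketch a rule whose constraint list names price applies only to inputs that contain the concept price, and an optional pair is silently dropped when its concept is missing from the input.</Paragraph>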
    <Paragraph position="7"> After sentence- and phrase-level planning, we must access the Chinese dictionary to get the part of speech of each word and other syntactic information. If the input is the representation in Example 1, the result of microplanning is as shown in Figure 3.</Paragraph>
    <Paragraph position="9"> Figure 3: Microplanning Result for Example 1. In the above example, "cat" indicates the category of the sentence, phrases or words. "lex" denotes the Chinese words. "case" describes the semantic roles of the arguments.</Paragraph>
    <Paragraph position="10"> Target language generation in dialogue translation systems imposes strong constraints on the whole generation process. A prominent problem is ill-formed input, which forces the generation module to be robust enough to cope with erroneous and incomplete input data. To handle this, we design some general rules at this level. The input is first matched against the specific rules. If no specific rule matches, we match the input against the general rules. In this way, even when the input is somewhat ill-formed, the output still includes the main information of the input. An example is shown in (2). The utterance is meant for the customer to accept the single room offered by the agent. But the speech act is wrong, because the speech act "ok" is only used to  indicate that the customer and the agent have agreed</Paragraph>
    <Paragraph position="12"> Although example (2) is ill-formed, it includes most of the information of the source sentence. Our robust generator can produce the sentence shown in (3).</Paragraph>
    <Paragraph position="13"> [Chinese sentence garbled in extraction] (yes, a single room) (3)</Paragraph>
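    <Paragraph> The fallback from specific to general rules could look like the sketch below. It is a hedged illustration: the rule records, their "matches" predicates and the function name select_rule are assumptions made for this example, and the toy input only imitates the kind of ill-formed representation discussed above, since example (2) itself is not reproduced here.

```python
def select_rule(usr, specific_rules, general_rules):
    """Try the specific rules first; fall back to the general rules."""
    for rule in specific_rules:
        if rule["matches"](usr):
            return rule
    for rule in general_rules:
        if rule["matches"](usr):
            return rule
    return None                      # nothing applicable at all


# Hypothetical rule sets: "ok" is expected to come without a topic.
specific_rules = [
    {"name": "ok-agreement",
     "matches": lambda u: u["speech_act"] == "ok" and not u.get("topic")},
]
general_rules = [
    {"name": "topic-only",
     "matches": lambda u: bool(u.get("topic"))},
]

# Ill-formed toy input: speech act "ok" combined with a room topic.
ill_formed = {"speech_act": "ok", "topic": {"room": {"room-type": "single"}}}
chosen = select_rule(ill_formed, specific_rules, general_rules)
```

    Here the specific rule rejects the input because of the unexpected topic, and the general rule still lets the generator express the main content.</Paragraph>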
  </Section>
  <Section position="5" start_page="1143" end_page="1144" type="metho">
    <SectionTitle>
4. Syntactic Realization
</SectionTitle>
    <Paragraph position="0"> The syntactic realizer proceeds from the microplanning result, as shown in Figure 3. The realizer is based on a functional unification formalism.</Paragraph>
    <Paragraph position="1"> In this module, we also introduce the template method. If the input includes an attribute-value pair that uses "template" as the attribute, then the value is taken as canned text or as word strings with slots. It will appear in the output without any modification. So we can embed the template into the surface realization without modifying the whole generation procedure. When the hybrid method is used, the input is first matched against the defined templates. If it matches, the input goes to the surface realizer directly, skipping the microplanning process.</Paragraph>
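    <Paragraph> A minimal sketch of this hybrid control flow is given below. Several details are assumptions rather than facts from the paper: templates are keyed by speech act, slots are filled with Python's str.format, and the functions realize and generate are toy stand-ins for the real surface realizer and generator.

```python
def realize(structure):
    """Toy surface realizer: a "template" value is emitted unchanged."""
    if "template" in structure:
        return structure["template"]
    return " ".join(str(value) for value in structure.values())


def generate(usr, templates, microplan):
    """Hybrid generation: a matching template skips microplanning."""
    template = templates.get(usr["speech_act"])
    if template is not None:
        text = template.format(**usr.get("slots", {}))   # fill the slots, if any
        return realize({"template": text})
    # No template matched: take the normal route through the microplanner.
    return realize(microplan(usr))
```

    The design point is that the realizer needs only one extra case for the "template" attribute, so the rest of the generation procedure stays untouched.</Paragraph>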
    <Paragraph position="2"> The tasks of the Chinese realizer are as follows:  * Define the sentence structure * Provide ordering constraints among the syntactic constituents of the sentence * Select the function words</Paragraph>
    <Section position="1" start_page="1143" end_page="1143" type="sub_section">
      <SectionTitle>
4.1 Intermediate Representation
</SectionTitle>
      <Paragraph position="0"> The intermediate representation (IR) is made up of feature structures. It corresponds to the predicate-argument structure. The aim is to normalize the input of the surface realizer. It is of considerable practical benefit to keep the rule base as independent as possible from external conditions (such as the domain and the output of the preceding system).</Paragraph>
      <Paragraph position="1"> The intermediate representation includes three parts: predicate information, obligatory arguments and optional arguments. The predicate information describes the top-level information in a clause, including the main verb, the mood, the voice, and so on. The obligatory arguments are slots of roles that must be filled for a clause to be complete. The optional arguments specify the location, the time, the purpose of the event, etc. They are optional because they do not affect the completeness of a clause. An example is shown in Figure 4. The input is for the sentence "[Chinese sentence garbled in extraction]" (Do you have single rooms now?). "args" and "opt" in Figure 4 represent obligatory arguments and optional arguments respectively.</Paragraph>
      <Paragraph position="3"> Figure 4: Example Intermediate Representation</Paragraph>
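      <Paragraph> To make the three-part layout concrete, the following sketch models an IR as a small Python class. It is illustrative only: the class name, the role names agent, object and time, and the English glosses used as values are assumptions, because the contents of Figure 4 are not available in this text.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class IntermediateRepresentation:
    """Hypothetical three-part IR: predicate information, args and opt."""
    predicate: Dict[str, str]                               # main verb, mood, voice, ...
    args: Dict[str, object] = field(default_factory=dict)   # obligatory roles
    opt: Dict[str, object] = field(default_factory=dict)    # time, location, purpose, ...

    def missing_roles(self, required):
        """Return the obligatory roles that are not filled."""
        return [role for role in required if self.args.get(role) is None]


# Rough gloss of "Do you have single rooms now?" with assumed role names.
ir = IntermediateRepresentation(
    predicate={"verb": "have", "mood": "interrogative"},
    args={"agent": "you", "object": "single room"},
    opt={"time": "now"},
)
```

      A clause counts as complete in this sketch when missing_roles returns an empty list; entries under opt never affect that check.</Paragraph>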
    </Section>
    <Section position="2" start_page="1143" end_page="1144" type="sub_section">
      <SectionTitle>
4.2 Chinese Realization
</SectionTitle>
      <Paragraph position="0"> In the syntactic generation module, we use the functional unification formalism. At the same time, we make use of the systemic viewpoint of systemic functional grammar. The rule system is made up of many sub-systems, such as the transitivity system, mood system, tense system and voice system. The input must rely on all of these systems to make decisions at different levels.</Paragraph>
      <Paragraph position="1"> In a spoken dialogue translation system, real-time generation is the basic requirement. As we can see from the input shown in Figure 3, the input to the syntactic generation provides enough information about sentence and phrase structure.</Paragraph>
      <Paragraph position="2"> Most of the information in the input is instantiated, such as the verb, the subcategorization frame and the phrase members. So the generation engine can traverse the input in a top-down, depth-first fashion using the unification algorithm (Elhadad 1992). The whole syntactic generation process is described in Figure 5.</Paragraph>
      <Paragraph position="3"> The input is an intermediate representation and the output is Chinese text. The sentence unification phase defines the sentence structure and orders the components within the sentence.</Paragraph>
      <Paragraph position="4">  The phrase unification phase defines the phrase structure, orders the components inside the phrases and adds the function words. Unlike English, Chinese has no morphological markers for tense and mood. They are expressed with function words. Selecting function words correctly is critical for Chinese generation.</Paragraph>
      <Paragraph position="5"> Figure 5: Steps of the Syntactic Generator. The whole unification procedure is: * Unify the input with the grammar at the sentence level.</Paragraph>
      <Paragraph position="6"> * Identify the constituents inside the input. * Unify the constituents with the grammar at the phrase level recursively in a top-down, depth-first fashion.</Paragraph>
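      <Paragraph> The three steps above can be sketched as a short recursive procedure. This is a deliberately simplified stand-in for functional unification, not the algorithm of Elhadad (1992): the grammar is a hypothetical table with one feature structure per category, ordering is a plain "order" list, and the English glosses are placeholders for Chinese words.

```python
def unify(fs, rule):
    """Merge two nested feature structures; None signals a clash."""
    merged = dict(fs)
    for feature, value in rule.items():
        if feature not in merged:
            merged[feature] = value
        elif isinstance(merged[feature], dict) and isinstance(value, dict):
            inner = unify(merged[feature], value)
            if inner is None:
                return None
            merged[feature] = inner
        elif merged[feature] != value:
            return None                      # incompatible atomic values
    return merged


def linearize(fs, grammar):
    """Unify top-down, depth-first, and return the words in surface order."""
    if "lex" in fs:
        return [fs["lex"]]                   # a word: nothing left to expand
    unified = unify(fs, grammar[fs["cat"]])
    if unified is None:
        raise ValueError("unification failed for category " + fs["cat"])
    words = []
    for name in unified["order"]:            # ordering constraint from the grammar
        if name in unified:                  # constituents not present are skipped
            words.extend(linearize(unified[name], grammar))
    return words


# Hypothetical grammar: one ordering constraint per category.
grammar = {
    "clause": {"order": ["subject", "time", "predicate", "object", "particle"]},
    "np": {"order": ["modifier", "head"]},
}
```

      Because most of the input is already instantiated, each unification step in this sketch only adds the ordering information, which is consistent with the paper's argument that a single top-down, depth-first pass suffices.</Paragraph>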
    </Section>
  </Section>
</Paper>