<?xml version="1.0" standalone="yes"?>
<Paper uid="C90-2038">
  <Title>Incremental Sentence Production with a Parallel Marker-Passing Algorithm</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Activation of Lexical and Phrasal
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Hypotheses and Propositional Contents
</SectionTitle>
      <Paragraph position="0"> When a concept is recognized by the parsing process, a hypothesis for translation will be activated. The concept can be an individual concept, a phrase, or a sentence. In our model, they are all represented as CC nodes, and each instance of a concept is represented as a CI node. The basic process is, for each of the activated CCs, to activate LEX nodes in the target language. There are four possible mappings between the source-language nodes and the target-language nodes which are activated: word-to-word, phrase-to-word, word-to-phrase, and phrase-to-phrase. In our model, hypotheses for sentences and phrases are represented as CSCs. From the viewpoint of generation, either LEX nodes representing words or CSC nodes representing phrases or entire sentences are activated.</Paragraph>
      <Paragraph position="1"> LEX node activation: There are cases where a word or a phrase can be translated into a single word in the target language. In figures 3a and 3c, the word LEXSL or the phrase CSCSL activates CC1. LEXTL is activated as a hypothesis of translation for LEXSL or CSCSL interpreted as CC1. A G-Marker is created at LEXTL containing a surface realization, a cost, features, and the instance which the LEXTL represents (a CI). The G-Marker is passed up through an IS-A link. When a CC1 does not have a LEXTL, a CC2 is activated and a LEX2TL will be activated. Thus, the most specific word in the target language will be activated as a hypothesis.</Paragraph>
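The climb up the IS-A hierarchy described above can be sketched as follows. This is a minimal illustration with hypothetical concept and word names and a simplified G-Marker (a plain dictionary), not the actual implementation:

```python
# Sketch of LEX-node activation: a G-Marker is created at the
# target-language LEX node; when a concept class (CC) has no LEX node
# of its own, activation climbs the IS-A link, so the most specific
# available target-language word is hypothesized.

class CC:
    def __init__(self, name, lex=None, isa=None):
        self.name = name   # concept class name (hypothetical)
        self.lex = lex     # target-language LEX node, if any
        self.isa = isa     # parent CC reached via an IS-A link

def activate_lex(cc):
    """Return (surface word, G-Marker) for the most specific LEX node
    reachable from `cc` through IS-A links, or None if there is none."""
    node = cc
    while node is not None:
        if node.lex is not None:
            g_marker = {"surface": node.lex, "cost": 0, "features": {},
                        "instance": cc.name}   # the CI this LEX stands for
            return node.lex, g_marker
        node = node.isa            # climb to the more general concept
    return None

# Hypothetical fragment of the memory network (English-to-Japanese):
animal = CC("*animal", lex="doubutsu")
dog = CC("*dog", lex="inu", isa=animal)
puppy = CC("*puppy", lex=None, isa=dog)   # no direct lexical entry
```

Here `activate_lex(puppy)` yields the word for *dog*, because *puppy* has no LEX node of its own and the IS-A link supplies the most specific covering word.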
      <Paragraph position="2"> CSC node activation: When a CC can be represented by a phrase or sentence, a CSC node is activated and a G-Marker which contains that phrase or sentence will be created. (LEX nodes are a kind of CSC which represent the lexical entry and phonological realization of a word.)</Paragraph>
      <Paragraph position="3"> In figures 3b and 3d, LEXSL and CSCSL activate CC1, which has CSC1TL. In this case, CSC1TL will be activated as a hypothesis to translate LEXSL or CSCSL interpreted as CC1. In particular, the activation of CSCTL by CSCSL is interesting because it covers cases where two expressions can be translated only at the phrasal or sentential level, not at the lexical level. Such cases are often found in greetings and canned phrases. It should be noted that CSCs represent either syntactic rules or cases of utterance. Assuming cases are acquired from legitimate utterances of native speakers, the use of cases in the generation process should be preferred over a purely syntactic formulation of sentences, because it avoids generating sentences which are syntactically sound but never uttered by native speakers.</Paragraph>
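Phrase-level (CSC-to-CSC) correspondence can be illustrated with a toy lookup of canned phrases. The phrase pairs and the table-based matching below are illustrative assumptions, not the paper's memory-network mechanism:

```python
# Sketch of CSC-to-CSC correspondence: some expressions translate only
# as whole phrases, never compositionally at the word level. The entries
# are hypothetical examples of Japanese-to-English canned phrases.

PHRASE_PAIRS = {
    # source-language phrasal CSC        target-language phrasal CSC
    ("ohayou", "gozaimasu"):             ("good", "morning"),
    ("yoroshiku", "onegaishimasu"):      ("nice", "to", "meet", "you"),
}

def translate_phrase(tokens):
    """Return the target phrase if the whole token sequence matches a
    phrasal CSC; no word-by-word translation is attempted here."""
    return PHRASE_PAIRS.get(tuple(tokens))
```

A word-level lookup would fail on such inputs, which is exactly why phrasal and sentential hypotheses are needed alongside lexical ones.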
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Syntactic and Lexical Selections
</SectionTitle>
    <Paragraph position="0"> Syntactic and lexical selections are conducted by three processes: feature aggregation, constraint satisfaction, and competitive activation. Feature aggregation and constraint satisfaction correspond to a symbolic approach to syntactic and lexical selection, which guarantees the grammaticality and local semantic accuracy of the generated sentences; the competitive activation process is added in order to select the best decision among multiple candidates.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Feature Aggregation
</SectionTitle>
      <Paragraph position="0"> Feature aggregation is an operation which combines features in the process of passing G-Markers upward, so that only minimal features are carried. Due to the hierarchical organization of the memory network, the features which need to be carried by G-Markers differ depending upon which level of abstraction is used for generation. Given that unification is a computationally expensive operation, aggregation is an efficient mechanism for propagating features: it ensures that only minimal features are present when features are unified, and aggregation itself is a cheap operation, since it simply adds new features to the existing features in the G-Marker. Another advantage of this mechanism is that the case-based process and the constraint-based process are handled by one mechanism, because the features required for each level of processing are incrementally added to G-Markers.</Paragraph>
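A minimal sketch of aggregation as cheap feature accumulation follows; the dictionary representation and the clash handling are assumptions made for illustration:

```python
# Sketch of feature aggregation: as a G-Marker is passed up the memory
# network, each node simply adds its own features to those already in
# the marker. Unlike full unification, this is a cheap per-node
# operation; only the minimal feature set needed so far is carried.

def aggregate(g_marker_features, node_features):
    """Add a node's features to a G-Marker's features; an existing
    feature is kept only if the values agree (a clash means failure)."""
    combined = dict(g_marker_features)
    for key, value in node_features.items():
        if key in combined and combined[key] != value:
            return None        # feature clash: aggregation fails
        combined[key] = value
    return combined

# Features picked up at successive nodes (hypothetical values):
features = {}
for node_features in ({"number": "sg"}, {"person": 3}, {"tense": "past"}):
    features = aggregate(features, node_features)
```

Each step is a dictionary merge, so the cost per node is proportional to the handful of features added there, not to the size of the full feature structures that unification would compare.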
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Constraint Satisfaction
</SectionTitle>
      <Paragraph position="0"> Constraint is a central notion in modern syntax theories.</Paragraph>
      <Paragraph position="1"> Each CSC has constraint equations which define the constraints imposed on that CSC, depending on its level of abstraction. Feature structures and constraint equations interact at two stages. At the prediction stage, if the V-Marker placed on the first element of the CSC already contains a non-nil feature structure, that feature structure determines, according to the constraint equations, the possible feature structures of G-Markers which subsequent elements of the CSC can accept. At the G-V-collision stage, the feature structure in the G-Marker is tested to see whether it meets what was anticipated. If the feature structure passes this test, the information in the G-Marker and the V-Marker is combined, and more precise predictions are made as to what will be acceptable in subsequent elements. Thus, the grammaticality of the generated sentences is guaranteed. Semantic restrictions are also considered at this stage.</Paragraph>
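The two stages can be sketched as follows; the subject-verb number agreement used here as the constraint equation is an illustrative assumption, standing in for whatever equations a particular CSC carries:

```python
# Sketch of constraint satisfaction in two stages: `predict` plays the
# role of a constraint equation, restricting what later elements of the
# CSC may carry; `gv_collision` tests an incoming G-Marker against that
# prediction and, on success, combines the two feature structures.

def predict(v_features):
    """Illustrative constraint equation: the verb must agree in number
    with the subject already absorbed by the V-Marker."""
    return {"number": v_features["number"]}

def gv_collision(v_features, g_features):
    """Test a G-Marker's features against the prediction; return the
    combined (sharper) feature structure, or None on violation."""
    expected = predict(v_features)
    for key, value in expected.items():
        if g_features.get(key, value) != value:
            return None                  # constraint violated
    merged = dict(v_features)
    merged.update(g_features)            # sharpen the prediction
    return merged

subject = {"number": "sg", "person": 3}      # V-Marker after the subject
verb_ok = {"number": "sg", "tense": "pres"}  # agreeing G-Marker
verb_bad = {"number": "pl", "tense": "pres"} # clashing G-Marker
```

A successful collision returns a merged structure that constrains later elements still further, which is how predictions grow more precise as the CSC is traversed left to right.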
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Competitive Activation
</SectionTitle>
      <Paragraph position="0"> The competitive activation process, introduced either by C-Marker-passing or by the connectionist network, determines the final syntactic and lexical realization of the sentence. Here, we have adopted a cost-based scheme, as we have employed in parsing \[Kitano et al., 1989a\]. In the cost-based scheme, the hypothesis with the least cost will be selected. This idea reflects our view that parsing and generation are dynamic processes in which the state of the system tends toward a global minimum, and that a cost represents a dispersion of energy, so that higher-cost hypotheses are less likely to be taken as the state of the system. (Footnote fragment: "... constraint equations, since they are already instantiated and the CSCs are indexed in the memory network.") In the actual implementation, we compute a cost</Paragraph>
      <Paragraph position="1"> of each hypothesis which is determined by a C-Marker-passing scheme or a connectionist network.</Paragraph>
      <Paragraph position="2"> The C-Marker-passing scheme puts C-Markers at contextually relevant nodes when a conceptual root node is activated. A G-Marker which goes through a node without a C-Marker will be assigned a larger cost than others. When there are multiple hypotheses for a specific CC node, i.e., when multiple CSCs are linked with the CC, we add up the cost of each G-Marker used for each linearization, combined with any pragmatic constraints which may be assigned to each CSC and the preference for each CSC, and the hypothesis with the least cost will be selected as the translated result.</Paragraph>
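The cost computation can be sketched as follows; the penalty value, the node names, and the path-based bookkeeping are illustrative assumptions:

```python
# Sketch of the cost-based scheme: a G-Marker that passes through nodes
# carrying no C-Marker (i.e. not contextually primed) accumulates a
# penalty, and the hypothesis whose marker path has the least total
# cost is selected.

PENALTY = 1.0   # assumed extra cost for an unprimed node

def path_cost(path, c_marked):
    """Sum the penalties along the node path a G-Marker traversed."""
    return sum(0.0 if node in c_marked else PENALTY for node in path)

def select(hypotheses, c_marked):
    """hypotheses: {surface string: node path of its G-Marker}.
    Return the least-cost hypothesis."""
    return min(hypotheses, key=lambda h: path_cost(hypotheses[h], c_marked))

# Hypothetical context: a restaurant scene has primed these nodes.
c_marked = {"*restaurant", "*order", "*food"}
hypotheses = {
    "check": ["*order", "*bill", "*money"],       # partly unprimed path
    "bill":  ["*order", "*restaurant", "*food"],  # fully primed path
}
```

With these (made-up) paths, "bill" wins because its G-Marker travelled only through contextually primed nodes.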
      <Paragraph position="3"> The connectionist network can be adopted at some computational cost. When a connectionist network is fully deployed, every node in the network is connected by weighted links. A competitive excitation and inhibition process is performed to select one hypothesis. The final interpretation and translation in the target language are selected through a winner-take-all mechanism.</Paragraph>
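A winner-take-all competition can be sketched as an iterative excitation/inhibition loop; the update rule and the constants below are assumptions for illustration, not the paper's network:

```python
# Sketch of competitive activation: each hypothesis node excites itself
# and is inhibited in proportion to its competitors' total activation;
# after enough iterations one hypothesis dominates (winner-take-all).

def winner_take_all(activations, excite=0.1, inhibit=0.2, steps=50):
    """activations: {hypothesis: initial activation}. Returns the
    hypothesis whose activation survives the competition."""
    acts = dict(activations)
    for _ in range(steps):
        total = sum(acts.values())
        acts = {h: max(0.0, a + excite * a - inhibit * (total - a))
                for h, a in acts.items()}
    return max(acts, key=acts.get)
```

Even a small initial advantage is amplified: the leader suppresses its rivals, which in turn reduces the inhibition it receives, so the competition converges to a single active hypothesis.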
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Commitment and Ambiguities
</SectionTitle>
    <Paragraph position="0"> One of the most significant issues is how to resolve ambiguities of the parsing process as early as possible, so that the final translation hypothesis can be determined as early as possible. Since many sentences are ambiguous until, at least, the entire clause has been analyzed, disambiguation necessarily imposes constraints upon the scheduling of the generation process. However, it should be noted that a human interpreter does not start translating unless she/he is sure about what the sentence means. This allows our model to take a wait-and-see strategy when multiple hypotheses are present during the processing of input utterances.</Paragraph>
    <Paragraph position="1"> However, when some ambiguities still remain, the generator needs to commit to one of the hypotheses, which may turn out to be false. This becomes even more complicated when the source language and the target language have substantially different linguistic structures. For example, in English, negation comes before the verb, whereas in Japanese negation comes after the verb, and the verb comes at the very end of the sentence. In such a case, translation cannot start until the verb, which comes at the end of the sentence, has been processed and the existence of negation after the verb has been checked. In this case, a decision has to be made to delay translation until these ambiguities are resolved by encountering a clause which follows the initial clause. Fortunately, most Japanese utterances consist of multiple clauses, which makes simultaneous interpretation possible. In order to cope with these ambiguities, a simultaneous interpretation system should have capabilities such as (1) anticipating the possibility of negation at the end, (2) incorporating heuristics which recover from a false translation to a correct one, and (3) making decisions on when to start or to delay translation. Theories of commitment in ambiguity resolution and generation are not yet established; thus they are a subject of further investigation. One possible solution which we are investigating is to use probabilistic speed control of marker propagation, as seen in \[Wu, 1989\], so that the best hypothesis is presented first. This would allow the generator to commit to the present hypothesis within its local decisions.</Paragraph>
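The wait-and-see strategy, together with the forced commitment at a clause boundary, can be sketched as a simple policy function; the interface and the clause-boundary trigger are hypothetical, not the paper's mechanism:

```python
# Sketch of a commitment policy: output is delayed while more than one
# hypothesis is alive; the generator commits when a single hypothesis
# remains, or when a clause boundary forces commitment to the current
# least-cost hypothesis.

def commit_policy(hypotheses, at_clause_boundary):
    """hypotheses: {hypothesis: cost}. Returns the hypothesis to utter
    now, or None to keep waiting (wait-and-see)."""
    if len(hypotheses) == 1:
        return next(iter(hypotheses))           # unambiguous: commit
    if at_clause_boundary:
        return min(hypotheses, key=hypotheses.get)  # forced commitment
    return None                                 # wait and see
```

A policy of this shape makes the trade-off explicit: waiting longer avoids false commitments, while clause boundaries bound the delay so that translation stays roughly simultaneous.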
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Psychological Plausibility
</SectionTitle>
    <Paragraph position="0"> Psychological studies of sentence production \[Garrett, 1975\] \[Garrett, 1980\] \[Levelt and Maassen, 1981\] \[Bock, 1982\] \[Bock, 1987\] \[Kempen and Huijbers, 1983\] were taken into account in designing the model. In \[Kempen and Huijbers, 1983\], two independent retrieval processes are assumed, one accounting for abstract prephonological items (L1-items) and the other for phonological items (L2-items). Lexicalization in their model proceeds as follows: (1) simultaneous retrieval of multiple L1-items, (2) a monitoring process which watches the output of L1-lexicalization to check that it keeps within the constraints on the utterance format, and (3) retrieval of L2-items after waiting until each L1-item has been checked by the monitor and all other L1-items have become available. In our model, the CC activation stage corresponds to multiple L1-item retrieval, the constraint checks by V-Markers correspond to the monitoring, and the realization stage, which concatenates the surface string in a V-Marker, corresponds to the L2-item retrieval stage. The difference between our model and theirs is that, in our model, L2-items are already incorporated in G-Markers, whereas they assume L2-items are accessed only after the monitoring. Phenomenologically, this does not make a significant difference, because L2-items (phonological realizations) in our model are not explicitly selected until the constraints are met, at which point the monitoring is completed. However, this difference may be more explicit in the production of sentences because of the difference in the scheduling of the L2-item retrieval and the monitoring. This is due to the fact that our model retains interaction between the two levels, as investigated by \[Bock, 1987\]. Our model also explains the contradictory observations of \[Bock, 1982\] and \[Levelt and Maassen, 1981\], because the activation of CC nodes (L1-items) and LEX nodes (L2-items) is separated, with some interactions. Also, our model is consistent with the two-stage model of \[Garrett, 1975\] \[Garrett, 1980\]. The functional and positional levels of processing in his model correspond to the parallel activation of CCs and CSCs, the left-to-right V-Marker movement, and the surface string concatenation during that movement.</Paragraph>
    <Paragraph position="1"> Studies of the planning unit in sentence production \[Ford and Holmes, 1978\] give additional support to the psychological plausibility of our model. They report that the deep clause, rather than the surface clause, is the unit of sentence planning. This is consistent with our model, which employs CSCs, accounting for deep propositional units and the realization of deep clauses, as the basic units of sentence planning. They also report that people plan the next clause while speaking the current clause. This is exactly what our model does, and it is consistent with our observations from transcripts of simultaneous interpretation.</Paragraph>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
7 Relevant Studies
</SectionTitle>
    <Paragraph position="0"> Since most machine translation systems assume sequential parsing and generation, a simple extension of existing systems to combine speech recognition and synthesis would not suffice for interpreting telephony. The main problem is the inability of previously existing systems to attain simultaneous interpretation (where partial translation is performed while parsing is in progress), because in other systems the parser and the generator are independent modules, and the generation process is only invoked when the entire parse is completed and a full semantic representation is given to the generator. Our model serves as an example of an approach counter to the modular approach, and attains simultaneous interpretation capability by employing an incremental parsing and generation model. Pioneering studies of parallel incremental sentence production are seen in \[Kempen and Hoenkamp, 1987\] \[Kempen, 1987\]. They use a segment grammar, composed of Node-Arc-Node building blocks, to attain incremental formation of trees. Their studies parallel our model in many aspects. The segment grammar is a kind of semantic grammar, since the arc label of each segment makes it a syntax/semantics object. Feature aggregation and constraint satisfaction by G-Markers and V-Markers in our model correspond to distributed unification \[De Smedt, 1989\] in the segment grammar. \[De Smedt, 1990\] reports extensively on their approach to incremental sentence generation, which parallels our model in many aspects.</Paragraph>
  </Section>
  <Section position="9" start_page="0" end_page="0" type="metho">
    <SectionTitle>
8 Current Implementation
</SectionTitle>
    <Paragraph position="0"> The model of generation described in this paper has been implemented as part of ΦDMDIALOG, a speech-to-speech dialog translation system developed at the Center for Machine Translation at Carnegie Mellon University.</Paragraph>
    <Paragraph position="1"> ΦDMDIALOG is implemented on an IBM RT-PC workstation using CMU CommonLisp running on the Mach OS. Speech input and voice synthesis are done by connected hardware systems; currently, we are using Matsushita Institute's Japanese speech recognition hardware and DECtalk.</Paragraph>
    <Paragraph position="2"> Figure 4 is an example of how sentences with multiple clauses are translated simultaneously in ΦDMDIALOG.</Paragraph>
    <Paragraph position="3"> Although the input is shown as a word sequence, a real run takes speech input, and a phoneme sequence is used as the interface between the speech recognition device and the software. The current implementation translates between Japanese and English and operates in the conference registration domain, based on the corpus provided by the ATR Interpreting Telephony Research Laboratories. For more details of the generation scheme described in this paper, refer to \[Kitano, 1990\].</Paragraph>
    <Paragraph position="4"> Currently, we are designing a version of our model to be implemented on massively parallel machines: IXM \[Higuchi et al., 1989\] and SNAP \[Moldovan et al., 1989\].</Paragraph>
  </Section>
</Paper>