<?xml version="1.0" standalone="yes"?>
<Paper uid="P89-1001">
  <Title>A TRANSFER MODEL USING A TYPED FEATURE STRUCTURE REWRITING SYSTEM WITH INHERITANCE</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
A TRANSFER MODEL USING A TYPED FEATURE STRUCTURE
REWRITING SYSTEM WITH INHERITANCE
Rémi Zajac
ATR Interpreting Telephony Research Laboratories
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> We propose a model for transfer in machine translation which uses a rewriting system for typed feature structures. The grammar definitions describe transfer relations which are applied to the input structure (a typed feature structure) by the interpreter to produce all possible transfer pairs. The formalism is based on the semantics of typed feature structures as described in \[Ait-Kaci 84\].</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> We propose a new model for transfer in machine translation of dialogues. The goal is twofold: to develop a linguistically-based theory for transfer, and to develop a computer formalism with which we can implement such a theory, and which can be integrated with a unification-based parser. The desired properties of the grammar are (1) to accept as input a feature structure, (2) to produce as output a feature structure, (3) to be reversible, and (4) to be as close as possible to current theories and formalisms used for linguistic description. From (1) and (2), we need a rewriting formalism where a rule takes a feature structure as input and gives a feature structure as output. From (3), this formalism should be in the class of unification-based formalisms such as PROLOG, and there should be no distinction between input and output. From (4), as the theoretical basis of grammar development in ATR is HPSG \[Pollard and Sag 1987\], we want the formalism to be as close as possible to HPSG.</Paragraph>
    <Paragraph position="1"> To meet these requirements, a rewriting system for typed feature structures, based on the semantics of typed feature structures described in \[Ait-Kaci 84\], has been implemented at ATR by Martin Emele and the author \[Emele and Zajac 89\].</Paragraph>
    <Paragraph position="2"> The type system has a lattice structure, and inheritance is achieved through the rewriting mechanism. Type definitions are applied by the interpreter on the input structure (a typed feature structure) using typed unification in a non-deterministic and monotonic way, until no constraint can be applied.</Paragraph>
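    <Paragraph> As a rough illustration of typed unification over a type hierarchy, the following Python sketch unifies two structures given as (type, features) pairs. The hierarchy, the names SUPERTYPE, meet and unify, and the simplification of the lattice to a tree are assumptions made for this example only; co-reference tags are not modelled, and this is not the ATR implementation.

```python
# Hypothetical toy hierarchy: each type maps to its direct supertype.
SUPERTYPE = {
    "AG-REC-V": "AG-V",
    "AG-V": "VERB",
    "VERB": "TOP",
    "NOUN": "TOP",
    "TOP": None,
}


def ancestors(t):
    """Return t followed by all of its supertypes, most specific first."""
    chain = []
    while t is not None:
        chain.append(t)
        t = SUPERTYPE[t]
    return chain


def meet(a, b):
    """Greatest lower bound of two types in this tree-shaped hierarchy.

    Returns the more specific type when one subsumes the other,
    and None when the two types are incompatible.
    """
    if b in ancestors(a):
        return a
    if a in ancestors(b):
        return b
    return None


def unify(x, y):
    """Unify two typed feature structures given as (type, features) pairs.

    Returns a new structure, or None when unification fails.
    """
    (tx, fx), (ty, fy) = x, y
    t = meet(tx, ty)
    if t is None:
        return None
    features = dict(fx)
    for name, value in fy.items():
        if name in features:
            merged = unify(features[name], value)
            if merged is None:
                return None
            features[name] = merged
        else:
            features[name] = value
    return (t, features)
```

Unification only ever adds information (a more specific type, or an extra feature), which is the sense in which the process is monotonic.</Paragraph>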
    <Paragraph position="3"> Thus, the result is the set of all possible transfer pairs compatible with the input and with the constraints expressed by the grammar. Thanks to the properties of the rewriting formalism, the transfer grammar is reversible, and can even generate all possible pairs for the grammar, given only the start symbol TRANSLATE.</Paragraph>
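    <Paragraph> Reversibility can be pictured by treating the grammar as a relation, that is, a set of pairs that may be queried from either side. The Python sketch below is only an analogy: the function name transfer and the flat list PAIRS are hypothetical, and the two lexeme pairs are assumed readings of the lexical entries given in sect. 4.

```python
# Hypothetical flat relation between Japanese and English lexemes.
PAIRS = [("HON-1", "BOOK-1"), ("TE-1", "HAND-1")]


def transfer(japanese=None, english=None):
    """Return all pairs compatible with whatever side is specified.

    Leaving both sides unspecified enumerates the whole relation,
    analogous to rewriting from the start symbol alone.
    """
    return [
        (j, e)
        for j, e in PAIRS
        if japanese in (None, j) and english in (None, e)
    ]
```

The same function serves Japanese-to-English lookup, English-to-Japanese lookup, and full enumeration, because neither side is distinguished as input or output.</Paragraph>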
    <Paragraph position="4"> We give an outline of the model on a very simple example. The type inheritance mechanism is mainly used to classify common properties of the bilingual lexicon (sect. 1), and rewriting is fully exploited to describe the relation between a surface structure produced by a unification-based parser and the abstract structure used for transfer (sect. 2), and to describe the relation between Japanese and English structures (sect. 3).</Paragraph>
    <Paragraph position="5"> An example is detailed in sect. 4.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="2" type="metho">
    <SectionTitle>
1. LEXICAL TRANSFER AS A
HIERARCHY OF BILINGUAL LEXICAL
DEFINITIONS
</SectionTitle>
    <Paragraph position="0"> The type system is used to describe a hierarchy of concepts, where a sub-concept inherits all of the properties of its super-concepts. The use of type inheritance to describe the hierarchy of lexical types is advocated for example in \[Pollard and Sag 1987, chap. 8\].</Paragraph>
    <Paragraph position="1"> We use a type hierarchy to describe properties which are common to bilingual classes of the bilingual lexicon. The level of description of the bilingual lexicon is the logico-semantic level: a verb for example has a relational role and links different objects through semantic relations (agent, recipient, space-location, ...). Semantic relations in the bilingual lexicon are common to English and Japanese.</Paragraph>
    <Paragraph position="2"> Predicates can be classified according to the semantic relations they establish between objects. For example, predicates which have only an agent case are defined as Agent-Verbs, and verbs which also have a recipient role are defined as Agent-Recipient-Verbs, a sub-class of Agent-Verbs. On the leaves of the hierarchy, we find the actual bilingual entries, which describe only idiosyncratic properties, and thus are very simple.</Paragraph>
    <Paragraph position="3"> The translation relation defined by TRANSLATE is described in sect. 3. We shall concentrate on the propositional part PROP defined here as a disjunction of types:</Paragraph>
    <Paragraph position="5"> The simple hierarchy depicted graphically in Figure 1 is written as follows:</Paragraph>
    <Paragraph position="7"> AG-V = VERB \[japanese: \[agent: #j-ag\], english: \[agent: #e-ag\], trans-ag: PROP \[japanese: #j-ag, english: #e-ag\]\].</Paragraph>
    <Paragraph position="8"> This definition can be read: an Agent-Verb is-a Verb which has-properties agent for Japanese and English. We need to express how the arguments of a relation are translated. This is specified using a trans-ag slot with type symbol PROP, which will be used during the rewriting process (see details in sect. 3 and 4). Symbols prefixed with # are tags, which are used to represent co-references («sharing») of structures.</Paragraph>
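    <Paragraph> The effect of tags can be sketched in Python by letting two paths point at the same object. This is an assumed encoding for illustration only: the function make_ag_v, the plain dictionaries, and the value J-SPEAKER are hypothetical and not part of the formalism.

```python
def make_ag_v():
    """Build a toy AG-V structure where tags are modelled by object identity."""
    j_ag = {}  # stands for the tag #j-ag
    e_ag = {}  # stands for the tag #e-ag
    return {
        "japanese": {"agent": j_ag},
        "english": {"agent": e_ag},
        # The trans-ag slot shares both agents with the slots above.
        "trans-ag": {"type": "PROP", "japanese": j_ag, "english": e_ag},
    }


s = make_ag_v()
# Information added through one path is visible through the co-referent path.
s["japanese"]["agent"]["head"] = "J-SPEAKER"
```

After the last line, the Japanese side of the trans-ag slot carries the same head value, because both paths lead to one shared sub-structure, while the English agent is untouched.</Paragraph>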
    <Paragraph position="9"> In this definition, we have a one-to-one mapping between the agent arguments, and at this level of representation (semantic relations), this simple case arises frequently. However, we must also describe mappings between structures which do not have such a simple correspondence, such as idiomatic expressions. In that case, we have to describe the relation between predicate-argument structures in a more complex way, as shown for example in sect. 4.</Paragraph>
    <Paragraph position="10"> AG-REC-V = AG-V \[japanese: \[recipient: #j-recp\], english: \[recipient: #e-recp\], trans-recp: PROP \[japanese: #j-recp, english: #e-recp\]\].</Paragraph>
    <Paragraph position="11"> AG-REC-OBJ-V = AG-REC-V \[japanese: \[object: #j-obj\], english: \[object: #e-obj\], trans-obj: PROP \[japanese: #j-obj, english: #e-obj\]\].</Paragraph>
    <Paragraph position="12"> NOUN = \[japanese: JN, english: EN\].</Paragraph>
    <Paragraph position="13"> The type system is interpreted using the rewriting mechanism described in \[Ait-Kaci 84\], which gives an operational semantics for type inheritance: a feature structure which has a type AG-V, for example, is unified with the definition of this type: \[japanese: \[agent: #j-ag\], english: \[agent: #e-ag\], trans-ag: PROP \[japanese: #j-ag, english: #e-ag\]\] and the type symbol AG-V is replaced with the supertype VERB in the result of the unification. If type VERB has a definition, the structure is further rewritten, thus achieving the operational interpretation of inheritance. Disjunctions like PROP create a non-deterministic choice for further rewriting: the symbol PROP is replaced with the disjunction of symbols of the right-hand side, creating alternative paths in the rewriting process. This process of rewriting is applied to every sub-structure of a structure to be evaluated, until no type symbol can be rewritten.</Paragraph>
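    <Paragraph> The rewriting of a type symbol can be sketched as follows in Python. This is a deliberately reduced model under stated assumptions: the tables DEFINITIONS and DISJUNCTIONS and the function rewrite are hypothetical, the alternatives listed for PROP are invented for the example, features are merged without real unification, and type symbols found inside slots are not rewritten recursively.

```python
# Hypothetical conjunctive definitions: type symbol -> (supertype, features).
DEFINITIONS = {
    "AG-V": ("VERB", {"trans-ag": "PROP"}),
    "AG-REC-V": ("AG-V", {"trans-recp": "PROP"}),
    "NOUN": (None, {"japanese": "JN", "english": "EN"}),
}
# Hypothetical disjunctive definitions: type symbol -> alternative symbols.
DISJUNCTIONS = {"PROP": ["AG-V", "AG-REC-V", "NOUN"]}


def rewrite(symbol):
    """Yield (rewrite path, accumulated features) for each alternative."""
    if symbol in DISJUNCTIONS:
        # A disjunction opens one alternative path per right-hand-side symbol.
        for alternative in DISJUNCTIONS[symbol]:
            yield from rewrite(alternative)
        return
    path, features = [], {}
    while symbol in DEFINITIONS:
        supertype, own = DEFINITIONS[symbol]
        path.append(symbol)
        # Add the features of this definition, then move to the supertype.
        for name, value in own.items():
            features.setdefault(name, value)
        symbol = supertype
    if symbol is not None:
        path.append(symbol)  # a supertype without a definition ends the path
    yield path, features


results = list(rewrite("PROP"))
```

Here the alternative starting at AG-REC-V passes through AG-V and stops at VERB, collecting both the trans-recp and the trans-ag slots on the way, which is the operational reading of inheritance described above.</Paragraph>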
    <Paragraph position="14"> As the rewriting system does not have any explicit control mechanism for rule application, whenever several rules are applicable all paths are explored, and all solutions are produced in a non-deterministic way. This could be a drawback for a practical machine translation system, as only one translation should be produced in the end, and due to the non-deterministic behavior of the system, this could also lead to severe efficiency problems. However, the system is primarily intended to be used as a tool for developing a linguistic model, and thus the production of all possible solutions is necessary in order to make a detailed study of ambiguities.</Paragraph>
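    <Paragraph> The cost of exploring every path can be illustrated with a small Python sketch in which each choice point contributes its alternatives independently. The function all_solutions, the table CHOICES and the English lexemes in it are hypothetical; the real system interleaves choices with unification, which may prune some combinations.

```python
from itertools import product

# Hypothetical choice points, each with its list of alternatives.
CHOICES = {
    "relation": ["TOUCH-1", "HANDLE-1"],
    "object": ["BOOK-1", "VOLUME-1"],
}


def all_solutions(choices):
    """Enumerate every combination of alternatives, one per choice point."""
    names = list(choices)
    for combo in product(*(choices[name] for name in names)):
        yield dict(zip(names, combo))


solutions = list(all_solutions(CHOICES))
```

With two alternatives at each of two choice points, four solutions are produced; the number of solutions grows as the product of the numbers of alternatives, which is the efficiency concern raised above.</Paragraph>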
    <Paragraph position="15"> Furthermore, according to the principles of second generation MT systems \[Yngve 57, Vauquois 75, Isabelle and Macklovitch 86\], a transfer grammar should be purely contrastive, and should not include specific source or target language knowledge. As a result, the synthesis grammar should implement all necessary language specific constraints in order to rule out ungrammatical structures that could be produced after transfer, and make appropriate pragmatic decisions.</Paragraph>
  </Section>
  <Section position="5" start_page="2" end_page="2" type="metho">
    <SectionTitle>
2. RELATING SURFACE AND ABSTRACT
SPEECH ACTS
</SectionTitle>
    <Paragraph position="0"> A problem in translating dialogues is to translate adequately the speaker's communicative strategy which is marked in the utterance, a problem that does not arise in text machine translation where a structural translation is generally found sufficient \[Kume et al. 88\].</Paragraph>
    <Paragraph position="1"> Indirectness for example cannot be translated directly from the surface structure produced by a syntactic parser and needs to be further analyzed in terms independent of the peculiarities of the language \[Kogure et al. 1988\]. For example, take the representation produced by the parser for the sentence \[Yoshimoto and Kogure 1988\]: The representation has already categorized to a certain extent surface speech act types. The level of analysis produced by the parser is the level of semantic relations (relation, agent, recipient, object, ...). The representation reduced to relation features is:</Paragraph>
    <Paragraph position="3"> The level of representation we want for transfer can be basically characterized by (1) an abstract speech act type (request, declaration, question, promise, ...), (2) a manner (direct, indirect, ...), and (3) the propositional content of the speech act \[Kume et al. 88\]. A grammar, written in the same formalism, abstracts the meaning of the surface structure to: JASA \[speech-act-type: REQUEST, manner: INDIRECT-ASKING-POSSIBILITY, speaker: #speaker=J-SPEAKER, hearer: #hearer=J-HEARER, s-act: JV \[relation: O~J~J-1, agent: #hearer, recipient: #speaker, object: ~-i\]\] and this is the input for the transfer module.</Paragraph>
  </Section>
  <Section position="6" start_page="2" end_page="2" type="metho">
    <SectionTitle>
3. DEFINING THE TRANSFER RELATION
AT THE LOGICO-SEMANTIC LEVEL
</SectionTitle>
    <Paragraph position="0"> Each structure which represents an utterance has (1) an abstract speech act type, (2) a type of manner, and (3) a propositional content. Each sub-structure of the propositional content has (1) a lexical head, (2) a set of syntactic features (such as tense-aspect-modality, determination, gender, ...), and may have (3) a set of dependents which are analyzed as case roles (agent, time-location, condition, ...).</Paragraph>
    <Paragraph position="1"> The manner and abstract speech act categories are universals (or more exactly, common to this language pair for this corpus), and need not be translated: they are simply stated as identical by means of tag identity.</Paragraph>
    <Paragraph position="2"> The part which represents the propositional content is language-dependent, and the translation relation between lexical heads, syntactic features and dependents of the heads is defined indirectly by means of transfer rules. Thus, this approach can be characterized as a mix of pivot and transfer approaches to the translation relation.</Paragraph>
    <Paragraph position="3"> The definitions of the transfer grammar can be divided into three groups: 1) definitions that state equality of abstract speech act type and manner (the language-independent parts), 2) lexical definitions that relate predicate-argument structures, 3) definitions that relate syntactic features (not yet included in our grammar).</Paragraph>
    <Paragraph position="4"> Starting from the abstract speech act description, we need only one definition for specifying the direct mapping of Abstract Speech Acts by tagging, which also introduces the type symbol PROP that will trigger the rewriting process for the transfer grammar:</Paragraph>
    <Paragraph position="5"> TRANSLATE = \[japanese: JASA \[speech-act-type: #sat, manner: #manner, speaker: #j-spk, hearer: #j-hrr, s-act: #j-act=JPROP\], english: EASA \[speech-act-type: #sat, manner: #manner, speaker: #e-spk, hearer: #e-hrr, s-act: #e-act=EPROP\], trans-act: PROP \[japanese: #j-act, english: #e-act\], trans-spk: PROP \[japanese: #j-spk, english: #e-spk\], trans-hrr: PROP \[japanese: #j-hrr, english: #e-hrr\]\].</Paragraph>
    <Paragraph position="7"> In this simple example, the definition of the symbol PROP contains the full bilingual dictionary. Unifying a structure with PROP means that the structure is unified with a very large disjunction of definitions.</Paragraph>
    <Paragraph position="8"> There are several possible ways to overcome this problem. One can use the hierarchical type system to restrict the set of candidates to a small sub-set of definitions and, instead of using PROP, use the most adequate specific symbol for translating an argument: such a symbol can be viewed as the initial symbol of a sub-grammar which describes the transfer relation on a sub-class of lexemes. For example, one can write directly SP~ instead of PROP in the trans-spk slot of the above definition. Another possibility for a mono-directional system is to access the bilingual lexicon using the Japanese entry during parsing. This means that the dictionaries of the system would have to be organized as a single integrated bilingual lexical database.</Paragraph>
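    <Paragraph> Restricting the candidate set through the type hierarchy can be sketched in Python as a subsumption filter over the lexicon. Everything named here is an assumption made for illustration: the table SUPERTYPE, the set ENTRIES, the treatment of PROP as the top of a tree rather than as a disjunction, and the placement of TOUCH directly under VERB.

```python
# Hypothetical hierarchy: each symbol maps to its direct supertype.
SUPERTYPE = {
    "BOOK": "NOUN",
    "HAND": "NOUN",
    "TOUCH": "VERB",
    "NOUN": "PROP",
    "VERB": "PROP",
}
# Hypothetical bilingual entries found at the leaves of the hierarchy.
ENTRIES = {"BOOK", "HAND", "TOUCH"}


def is_subtype(t, ancestor):
    """True when t equals ancestor or inherits from it."""
    while t is not None:
        if t == ancestor:
            return True
        t = SUPERTYPE.get(t)
    return False


def candidates(symbol):
    """Bilingual entries that a slot typed with symbol may rewrite to."""
    return sorted(entry for entry in ENTRIES if is_subtype(entry, symbol))
```

A slot typed PROP has to consider every entry, whereas a slot typed NOUN only considers BOOK and HAND, so a more specific slot type directly shrinks the disjunction that must be explored.</Paragraph>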
  </Section>
  <Section position="7" start_page="2" end_page="2" type="metho">
    <SectionTitle>
4. A STEP BY STEP EXAMPLE
</SectionTitle>
    <Paragraph position="0"> We give in this section a trace of a simple example for the sentence in Figure 4. For translating, we need to add to the definition of PROP the following bilingual lexical definitions: BOOK = NOUN \[japanese: HON-1, english: BOOK-1\].</Paragraph>
    <Paragraph position="1"> HAND = NOUN \[japanese: TE-1, english: HAND-1\].</Paragraph>
    <Paragraph position="2"> A lexical definition introduces the PROP symbol for the arguments of a predicate, and the translation relation is defined recursively between argument substructures. There could be a one-to-one mapping between two substructures, but as in the example of 2~.X2H, the relation is in general not purely compositional, and not one-to-one, and argument description can be as refined as necessary. Here, the object TE-1 («hand») is a part of the meaning of «touch» in this kind of construction, and the semantic relation that links the predicate and the object being touched is a spatial destination in Japanese (perceived as a goal or a target) and an object in English.</Paragraph>
    <Paragraph position="3"> INPUT : a structure representing a deep analysis of the sentence in Figure 4. The initial symbol that will be rewritten is ~.--'g (symbols to be rewritten are in bold face).</Paragraph>
  </Section>
</Paper>