<?xml version="1.0" standalone="yes"?> <Paper uid="C90-3011"> <Title>INDEPENDENT TRANSFER USING GRAPH UNIFICATION</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2. Parsing </SectionTitle> <Paragraph position="0"> Unlike approaches such as Kaplan et al. (1989), which produce bilingual descriptions in the course of parsing source language text, transfer in our system takes a completed parse as its starting point. Currently, this parse is produced by PEG (Jensen 1986), a general-purpose parser from the IBM T.J. Watson Research Center that is not unification-based. However, its output is close enough to a directed graph to allow conversion, through a simple interface, into the form required by the transfer system.</Paragraph> <Paragraph position="1"> It appears to us that this decoupling of parsing from transfer is a safe move: knowledge of the target language is not likely to influence parsing of the source language in any significant fashion.</Paragraph> <Paragraph position="2"> 3. The transfer system Our transfer system consists of two modules. A declarative module defines translation correspondences of individual phrases, structures and features. This information is given in bilingual (or multilingual) transfer dictionaries. An algorithmic module actually builds the correspondence structure out of the source language f-structure and the transfer dictionaries. This component ensures that all necessary alternatives are considered and that the relevant information is incorporated into the correct location in the correspondence structure.</Paragraph> <Paragraph position="3"> We discuss these two modules in turn.</Paragraph> <Paragraph position="4"> 3.1. The transfer lexicon A leading idea of the lexicon system is the separation of four different lexicons as follows:</Paragraph> <Paragraph position="6"> DGLEX is a lexicon of general linguistic definitions of terms.
There are two monolingual lexicons, ELEX and FLEX, and a bilingual transfer lexicon, TFLEX. The monolingual lexicons depend on DGLEX, and TFLEX can refer to the other three. No further dependencies are allowed. This increases the independence of the component lexicons and makes them reusable for multilingual translation.</Paragraph> <Paragraph position="7"> The descriptions in the two monolingual lexicons are kept independent of one another and linguistically motivated. Complex and ad hoc statements belong in TFLEX, since it cannot be expected that all bilingual intertranslatability relations follow linguistic generalizations. Correspondingly, we may distinguish two kinds of multi-word expressions. Language-internal idioms (e.g., keep tabs in English) are given in the monolingual lexicons. The other type, which might be called &quot;transfer idioms&quot;, is referred to only at the level of transfer entries (e.g., have access to, which translates into a single Finnish verb).</Paragraph> <Paragraph position="8"> 3.2. The specification language The linguistic description language has two levels: an internal representation in terms of attribute-value graphs, and a definition language consisting of templates that abbreviate such graphs. As examples of the latter, consider the simple entries below.</Paragraph> <Paragraph position="9"> (1) (discuss v simpleobj-e) (2) (keskustella v simpleobj-ela) (3) (discuss (e (@ e::discuss)) The entries are from ELEX, FLEX, and TFLEX, respectively, and together they specify the transfer relation between English discuss and its Finnish equivalent keskustella. (The transfer entry is shown expanded into graph form in fig. 4.) The graph formalism we use is a standard attribute-value unification formalism, except that it allows cyclic graphs. The graph specification language extends the template language used in D-PATR in two respects: it includes compile-time disjunction, and it includes parametric templates. 3.3.
Transfer feature structures (TFS) The transfer relation between source and target language feature structures could be represented in different ways. Separate feature structures could be set up for the source language and the target language, and an explicit transfer relation between these two structures could be defined (Kaplan et al. 1989). In our system, there is only one larger transfer feature structure (TFS), which includes both feature structures and specifies the explicit transfer relation for intertranslatable phrases of the source and target languages.</Paragraph> <Paragraph position="10"> The TFS contains extra levels of attributes for the source and target language. Intertranslatable phrases form subdescriptions that have two attributes, one for each language. The values of these attributes are always trims- mon features and especially component phrases which, in turn, are translations of each other.</Paragraph> <Paragraph position="11"> An example of a translation relation expressed in one feature structure is given in fig. 1. This structure contains the feature descriptions of both the English and the Finnish sentence, together with coreferential links that bind the corresponding units together.</Paragraph> <Paragraph position="12"> Monolingual feature representations can be read off the bilingual one by omitting all attribute-value pairs whose attribute is the name of the other language. The Finnish language subgraph of the previous example is given in fig. 2.</Paragraph> <Paragraph position="13"> 3.4. Transfer rules A transfer rule in this approach is formally just another transfer feature structure, similar to the bilingual structure. It is a partial specification of an acceptable intertranslatability relation. The rule is applied to a TFS by unifying it with a specified node in the TFS. The transfer process thus consists simply of adding further information to a partially described instance of the transfer relation.
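The following minimal Python sketch illustrates this process. It is an illustration only, not the system's implementation, and its encoding is assumed:

* graph nodes are Python objects and atomic values are strings;
* the attributes E and F hold the English and Finnish subdescriptions;
* the rules and the feature names PRED and TENSE are hypothetical.

A rule is applied by destructive unification. A single node shared between the E and F sides plays the role of a coreference link. The Finnish representation is read off by omitting the arcs named E.

```python
# Minimal sketch: applying transfer rules to a TFS by graph unification.
# Hypothetical encoding: "E"/"F" arcs hold the English/Finnish sides.


class Node:
    """A feature-graph node: an atom (a string) or a set of attribute arcs."""

    def __init__(self, atom=None, **arcs):
        self.atom = atom
        self.arcs = dict(arcs)
        self.forward = None  # set once this node has been merged into another


def deref(node):
    """Follow forwarding pointers left behind by earlier unifications."""
    while node.forward is not None:
        node = node.forward
    return node


def is_empty(node):
    return node.atom is None and not node.arcs


def unify(a, b):
    """Destructively unify two graphs; return the merged node, or None on a clash.

    A failed unification may leave the graphs partly merged (there is no undo).
    """
    a, b = deref(a), deref(b)
    if a is b:
        return a
    if is_empty(b):  # an empty node unifies with anything
        b.forward = a
        return a
    if is_empty(a):
        a.forward = b
        return b
    if a.atom is not None or b.atom is not None:
        # Non-empty nodes, at least one atomic: only identical atoms unify.
        if a.atom == b.atom:
            b.forward = a
            return a
        return None
    b.forward = a  # forward first, so cycles lead back to a and stop
    for attr, value in list(b.arcs.items()):
        a = deref(a)
        if attr in a.arcs:
            if unify(a.arcs[attr], value) is None:
                return None
        else:
            a.arcs[attr] = value
    return deref(a)


def read_off(node, other):
    """Project a TFS onto one language by omitting arcs named `other`.

    Assumes the projected part of the graph is acyclic.
    """
    node = deref(node)
    if node.atom is not None:
        return node.atom
    return {attr: read_off(value, other)
            for attr, value in node.arcs.items() if attr != other}


# Hypothetical TFS after parsing: only the English side is filled in.
tfs = Node(E=Node(PRED=Node("discuss"), TENSE=Node("past")))

# Hypothetical lexical rule pairing the two predicates.
lexical_rule = Node(E=Node(PRED=Node("discuss")),
                    F=Node(PRED=Node("keskustella")))

# Hypothetical feature rule: one shared node coindexes the two TENSE values.
shared = Node()
tense_rule = Node(E=Node(TENSE=shared), F=Node(TENSE=shared))

for rule in (lexical_rule, tense_rule):
    tfs = unify(tfs, rule)
    assert tfs is not None, "transfer rule does not apply"

print(read_off(tfs, other="E"))
# {'F': {'PRED': 'keskustella', 'TENSE': 'past'}}
```

The sketch merges nodes with forwarding pointers instead of copying them. A shared node therefore stays visible from every path that reaches it, which is what lets one node link the two language sides. Because a node is forwarded before its arcs are visited, the same routine also terminates on cyclic graphs.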
There is no formal distinction between lexical and grammatical transfer rules. Examples of different types of rule are given in figures 3-5.</Paragraph> <Paragraph position="14"> We now briefly describe some aspects of our linguistic description. In the monolingual lexicons, shifts in grammatical function, such as the English active and passive, are described as different linkings of arguments to grammatical functions (in this case, the subject and object functions). In transfer of complement-taking elements, we can then for the most part rely on the simple rule &quot;equate arguments&quot;, which results in correct bilingual correspondences given the language-particular linkings. For example, the verb discuss (fig. 4) takes a direct object as its second argument in English but an oblique complement in Finnish. This language-particular information need not be recapitulated in the transfer entry.</Paragraph> <Paragraph position="15"> There are also translation equivalents whose arguments do not match. These receive slightly more complex transfer rules in which the argument equations are stated separately.</Paragraph> <Paragraph position="16"> Graph unification descriptions are particularly simple and effective where the relevant structures consist of predicates taking a restricted number of unique argument types, such as subject, object, or sentential complement. Adjuncts, which may have multiple instantiations for each head, need a different treatment. Each adjunct has a unique modifiend (modif = the modified word), which it may share with other adjuncts. We allow adjuncts to point back to the modifiend so that transfer rules can refer to properties of the modifiend. This means that a TFS can be a cyclic graph, as illustrated in fig. 6. 4. Generation Since complex aspects of the transfer mapping are handled by the parser and the transfer system, generation in our model remains simple.
It involves a recursive sort of the lexical entries of the target language and the generation of morphologically inflected forms from sets of morphological features.</Paragraph> <Paragraph position="17"> The linearization component uses a set of unification-based linear precedence (LP) rules operating on information in the final Finnish feature structure. Discourse-related information relevant to linearization is included in the feature structure. For Finnish subjectless clause types, we use a transfer rule that requires equating the English subject with the Finnish discourse function THEMA. Depending on the clause type, any one of the Finnish arguments may appear as a THEMA (e.g., &quot;about it one-must discuss&quot;; see fig. 7). The linearization rule then places the THEMA before the finite verb, in effect preserving the characteristic information structure of the English sentence.</Paragraph> <Paragraph position="18"> our experience. In conclusion, we survey the properties of graph unification that have proved valuable.</Paragraph> <Paragraph position="19"> * Recursive structure of TFS: No limit to the complexity of an entry. Multiword entries are on a par with one-word entries.</Paragraph> <Paragraph position="20"> * Uniformity: Linguistic information at different levels is represented in a uniform way. No dichotomy of lexical and structural transfer.</Paragraph> <Paragraph position="21"> * Unification: Structure-changing correspondences can be expressed through coindexing.</Paragraph> <Paragraph position="22"> * Subsumption: Inheritance of definitions allows generalizations to be made across entries and lexicons.</Paragraph> <Paragraph position="23"> * Partial information: No requirement of completeness of linguistic descriptions for transfer to work.</Paragraph> <Paragraph position="24"> Disjunctions are eliminated by underspecification. No need to make translation-related sense distinctions in the monolingual lexicons.</Paragraph> <Paragraph position="25"> *
Monotonicity: Entries remain valid when the lexicon is extended and enriched. Enables incremental refinement of individual entries and grammatical correspondences.</Paragraph> <Paragraph position="26"> * Commutativity and associativity: Entries remain valid when entries or sense definitions are rearranged or regrouped.</Paragraph> <Paragraph position="27"> tion of completeness of input is not essential for us. Nothing in principle rules out incremental transfer during parsing.</Paragraph> </Section></Paper>