<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1013">
  <Title>OBJECTS: Source sentence, MorphS source, SyntS source, NormS source, NormS target, SyntS target, MorphS target, Target sentence. STAGES: Morphological analysis, Parsing, Normalization, Transfer, Expansion, Syntactic synthesis</Title>
  <Section position="2" start_page="0" end_page="85" type="metho">
    <SectionTitle>
1. Introductory Remarks
</SectionTitle>
    <Paragraph position="0"> ETAP-3 is a multipurpose NLP environment that was conceived in the 1980s and has been worked out in the Institute for Information Transmission Problems, Russian Academy of Sciences (Apresian et al. 1992, Boguslavsky 1995). The theoretical foundation of ETAP-3 is the Meaning ⇔ Text linguistic theory by Igor' Mel'čuk and the Integral Theory of Language by Jurij Apresian.</Paragraph>
    <Paragraph position="1"> ETAP-3 is a non-commercial environment primarily oriented at linguistic research rather than at creating a marketable software product. The main focus of the research carried out with ETAP-3 is computational modelling of natural languages. This attitude explains our effort to develop the models in a way that is as linguistically sound as possible. We strive to incorporate into the system much linguistic knowledge irrespective of whether this knowledge is essential for better text processing (e.g. machine translation) or not. In particular, we want our parser to produce what we consider a correct syntactic representation of the sentence - first of all because we believe that this interpretation is a true fact about the natural language. We have had many occasions to see that in the long run the theoretical soundness and completeness of linguistic knowledge incorporated in an NLP application will pay.</Paragraph>
    <Paragraph position="2"> All NLP applications in ETAP-3 are largely based on an original system of three-value logic and use an original formal language of linguistic descriptions, FORET.</Paragraph>
    <Paragraph position="3">  by the UN University, is discussed in detail in Section 3.</Paragraph>
    <Paragraph position="4"> 2.1.1. ETAP-3 MT System The most important module of ETAP-3 is the MT system that serves five language pairs:</Paragraph>
    <Paragraph position="6"> By far the most advanced are the first two of these pairs. The system has at its disposal 50,000-strong so-called combinatorial dictionaries of Russian and English that contain syntactic, derivational, semantic, subcategorization, and collocational information. The system relies on comprehensive grammars of the two languages.</Paragraph>
    <Paragraph position="7"> For the other language pairs smaller scale prototypes are available.</Paragraph>
    <Paragraph position="8"> ETAP-3 is able to present multiple translations when it encounters an ambiguity it cannot resolve. By default, the system produces one parse and one translation that it considers the most probable. If the user opts for multiple translation, the system remembers the unresolved ambiguities and provides all mutually compatible parses and lexical choices.</Paragraph>
    <Paragraph position="9"> To give one example from the real output: the sentence They made a general remark that ...</Paragraph>
    <Paragraph position="10"> when submitted to the multiple translation option, yielded two Russian translations that correspond to radically different syntactic structures and lexical interpretations: (a) Oni sdelali obshchee zamechanie, chto... (= They made some common remark that ...) and (b) Oni vynudili generala otmetit', chto... (= They forced some general to remark that ...).</Paragraph>
    <Paragraph position="11"> 2.1.2. Natural Language Interface to SQL Type</Paragraph>
    <Section position="1" start_page="83" end_page="83" type="sub_section">
      <SectionTitle>
Databases
</SectionTitle>
      <Paragraph position="0"> This ETAP-3 module translates freely worded human queries to a database from Russian or English into SQL expressions. It can also work in the reverse direction, generating an NL query from an SQL expression.</Paragraph>
      <Paragraph position="1"> 2.1.3. System of Synonymous Paraphrasing The module is designed for linguistic experiments in obtaining multiple meaning-retaining paraphrases of Russian and English sentences. The paraphrasing is based on the concept of lexical functions, one of the important innovations of the Meaning ⇔ Text theory. The following example shows the kind of paraphrases that can be produced by the module: (1) The director ordered John to write a report - The director gave John an order to write a report - John was ordered by the director to write a report - John received an order from the director to write a report.</Paragraph>
      <Paragraph position="2"> It is a very promising direction of linguistic research and development that can be applied in a wide range of activities, including language learning and acquisition, authoring, and text planning. Besides that, lexical functions are used for ensuring adequate lexical choice in machine translation and in the UNL module.</Paragraph>
      <Paragraph position="3">  The module operates with Russian texts in which it finds a wide range of errors in grammatical agreement as well as case subcategorization and offers the user the correct version.</Paragraph>
      <Paragraph position="4"> 2.1.5. Computer-Aided Language Learning Tool The module is a standalone software application constructed as a dialogue type computer game intended for advanced students of Russian, English, and German as foreign languages who wish to enrich their vocabulary, especially to master the collocations of these natural languages and their periphrastic abilities. The tool relies on the apparatus of lexical functions. It can also be used by native speakers of the three languages interested in increasing their command of the vocabulary (such as journalists, school teachers, or politicians).</Paragraph>
      <Paragraph position="5"> 2.1.6. Tree Bank Workbench This is the module that utilizes the ETAP-3 dictionaries, its morphological analyzer and the parser to produce a first-ever syntactically tagged corpus of Russian texts. It is a mixed type application that combines automatic parsing with human post-editing of tree structure.</Paragraph>
    </Section>
    <Section position="2" start_page="83" end_page="85" type="sub_section">
      <SectionTitle>
2.2. Major Features
</SectionTitle>
      <Paragraph position="0"> The following are the most important features of the whole ETAP-3 environment and its modules:  In the current version of ETAP-3, its modules that process NL sentences are strictly rule-based. However, in a series of recent experiments, the MT module was supplemented by an example-based component of a translation memory type and a statistical component that provides semiautomatic extraction of translation equivalents from bilingual text corpora (see Iomdin &amp; Streiter 1999).</Paragraph>
      <Paragraph position="1"> ETAP-3 shares its stratificational feature with many other NLP systems. It is at the level of the normalized, or deep syntactic, structure that the transfer from the source to the target language takes place in MT.</Paragraph>
      <Paragraph position="2"> ETAP-3 makes use of syntactic dependency trees for sentence structure representation instead of constituent, or phrase, structure. The ETAP-3 system takes a lexicalistic stand in the sense that lexical data are considered as important as grammar information. A dictionary entry contains, in addition to the lemma name, information on syntactic and semantic features of the word, its subcategorization frame, a default translation, rules of various types, and values of lexical functions for which the lemma is the keyword. The word's syntactic features characterize its ability or inability to participate in specific syntactic constructions. A word can have several syntactic features selected from a total of more than 200 items. Semantic features are needed to check the semantic agreement between the words in a sentence. The subcategorization frame shows the surface marking of the word's arguments (in terms of case, prepositions, conjunctions, etc.). Rules are an essential part of the dictionary entry. All the rules operating in ETAP-3 are distributed between the grammar and the dictionary. Grammar rules are more general and apply to large classes of words, whereas the rules listed or simply referred to in the dictionary are restricted in their scope and only apply to small classes of words or even individual words. This organization of the rules ensures the self-tuning of the system to the processing of each particular sentence. In processing a sentence, only those dictionary rules are activated that are explicitly referred to in the dictionary entries of the words making up the sentence. A sample dictionary entry fragment for the English noun chance illustrates  introduces a semantically empty conjunction (that: a chance that we obtain a grant).</Paragraph>
      <Paragraph position="3"> Line [40] - a reference to the rule which introduces the particle to (a chance to win).</Paragraph>
      <Paragraph position="4">  To give a general idea of how the ETAP-3 NLP environment operates, we show here the layout of the MT module (Fig. 1). In a way, all the other modules can be viewed as this module's derivatives.</Paragraph>
      <Paragraph position="5">  The ETAP-3 environment has been implemented on a PC under Windows NT 4.0.</Paragraph>
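      <Paragraph> The self-tuning behaviour described above, where grammar rules always apply but dictionary rules are activated only when a word of the sentence refers to them, can be illustrated with a minimal Python sketch. It is not the ETAP-3 implementation: the class DictionaryEntry, the function active_rules, and all rule names are hypothetical and chosen only to mirror the fields listed in the text.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    """Hypothetical stand-in for a combinatorial dictionary entry."""
    lemma: str
    syntactic_features: set = field(default_factory=set)
    semantic_features: set = field(default_factory=set)
    subcat_frame: dict = field(default_factory=dict)       # argument -> surface marking
    default_translation: str = ""
    rule_refs: list = field(default_factory=list)          # dictionary rules referred to
    lexical_functions: dict = field(default_factory=dict)  # function name -> values


def active_rules(sentence_lemmas, lexicon, grammar_rules):
    """Return grammar rules plus the dictionary rules referred to by the words present."""
    active = list(grammar_rules)
    for lemma in sentence_lemmas:
        entry = lexicon.get(lemma)
        if entry is None:
            continue  # word has no entry in this toy lexicon
        for name in entry.rule_refs:
            if name not in active:
                active.append(name)
    return active


# Assumed rule names, modelled on the two rules mentioned for the noun "chance".
chance = DictionaryEntry(
    lemma="chance",
    rule_refs=["insert_conjunction_that", "insert_particle_to"],
)
lexicon = {"chance": chance}

print(active_rules(["a", "chance", "to", "win"], lexicon, ["general_agreement"]))
```

A sentence that does not contain chance leaves both dictionary rules inactive, so the rule set searched during processing stays small.</Paragraph>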
      <Paragraph position="6"> The environment has a number of auxiliary tools, including a sophisticated lexicographer's toolkit that allows the developers and the users to effectively maintain and update the ETAP-3 dictionaries.</Paragraph>
    </Section>
  </Section>
  <Section position="3" start_page="85" end_page="87" type="metho">
    <SectionTitle>
3. The UNL Interface
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="85" end_page="86" type="sub_section">
      <SectionTitle>
3.1 Aims and scenario
</SectionTitle>
      <Paragraph position="0"> The UNL project has a very ambitious goal: to break down or at least to drastically lower the language barrier for Internet users. With time and space limitations already overcome, the Internet community is still separated by language boundaries. Theoretically, this seems to be the only major obstacle standing in the way of international and interpersonal communication in the information society. This is why the problem of the language barrier on the Internet is perceived as one of the global problems of mankind, and a project aiming to solve this problem has been initiated under the UN auspices - by the Institute of Advanced Studies of the United Nations University.</Paragraph>
      <Paragraph position="1"> Started in 1996, the project currently embraces 15 universities and research institutions from Brazil, China, Egypt, France, Germany, India, Indonesia, Italy, Japan, Jordan, Latvia, Mongolia, Russia, Spain, and Thailand.</Paragraph>
      <Paragraph position="2"> In the following years more groups are expected to join, so that in the long run all languages of the UN member states will be covered.</Paragraph>
      <Paragraph position="3"> The idea of the project is as follows. An interlingua has been developed which has sufficient expressive power to represent relevant information conveyed by natural languages. This interlingua, entitled Universal Networking Language (UNL), has been proposed by H. Uchida (UNU/IAS). For each natural language, two systems should be developed: a &quot;deconverter&quot; capable of translating texts from UNL to this NL, and an &quot;enconverter&quot; which has to convert NL texts  into UNL. It should be emphasized that the procedure of producing a UNL text is not supposed to be fully automatic. It will be an interactive process with the labor divided between the computer and a human expert (&quot;writer&quot;) in UNL.</Paragraph>
      <Paragraph position="4"> This paradigm makes UNL radically different from conventional machine translation. Due to the interactive enconversion, the UNL expression, which serves as input for generation, can be made as good as one wishes. The UNL writer will edit the rough result proposed by the enconverter, correct its errors, and eliminate the remaining ambiguities. He/she can run a deconverter of his/her own language to test the validity of the UNL expression obtained and then refine it again until he/she is fully satisfied with the final result.</Paragraph>
      <Paragraph position="5"> Another important distinction from MT systems is that the interlingua representation of texts will be created and stored irrespective of its generation into particular languages. UNL can be seen as an independent means of meaning representation. UNL documents can be processed by indexing, retrieval and knowledge extraction tools without being converted to natural languages. Generation will only be needed when the document has reached the human user.</Paragraph>
      <Paragraph position="6"> A deconverter and an enconverter for each language form a Language Server residing in the Internet. All language servers will be connected in the UNL network. They will allow any Internet user to deconvert a UNL document found on the web into his/her native language, as well as to produce UNL representations of the texts he/she wishes to make available to a multiethnic public.</Paragraph>
    </Section>
    <Section position="2" start_page="86" end_page="87" type="sub_section">
      <SectionTitle>
3.2 UNL language
</SectionTitle>
      <Paragraph position="0"> We cannot describe the UNL language here in all details: this topic deserves a special paper which will hopefully be written by the author of the language design - Dr. Hiroshi Uchida. We will only characterize it to the extent necessary for the description of our deconversion module.</Paragraph>
      <Paragraph position="1"> Full specification of UNL can be found at http://www.unl.ias.unu.edu/.</Paragraph>
      <Paragraph position="2"> UNL is a computer language intended to represent information in a way that makes it possible to generate a text expressing this information in a very large number of natural languages. A UNL expression is an oriented hyper-graph that corresponds to a NL sentence in the amount of information conveyed. The arcs of the graph are interpreted as semantic relations of the type agent, object, time, place, instrument, manner, etc. The nodes of the graph are special units, the so-called Universal Words (UW) interpreted as concepts, or groups of UWs. The nodes can be supplied with attributes which provide additional information on their use in the given sentence, e.g. @imperative, @generic, @future, @obligation.</Paragraph>
      <Paragraph position="3"> Each UW is represented as an English word that can be optionally supplied with semantic specifications to restrict its meaning. In most cases, these specifications locate the concept in the knowledge base. It is done in the following way: UW A(icl&gt;B) is interpreted as 'A is subsumed under the category B'. For example, the UW coach used without any restrictions denotes anything the English coach can denote. If one wants to be more precise, one can use restrictions: coach(icl&gt;transport) denotes a bus, coach(icl&gt;human) denotes a trainer and coach(icl&gt;do) denotes the action of training. In a sense, the apparatus of restrictions makes it possible to represent UWs as disambiguated English words. On the other hand, restrictions make it possible to denote concepts which are absent in English. For example, in Russian there is a large group of motion words, whose meaning incorporates the idea of the mode of locomotion or transportation: priletet' 'come by flying', priplyt' 'come by ship', pripolzti 'come by crawling', pribezhat' 'come running', etc.</Paragraph>
      <Paragraph position="4"> English has no neutral words to denote these concepts. Still, on the basis of English one can construct UWs that approximate required concepts, e.g. come(met&gt;ship) is interpreted as 'come and the method of coming is a ship'.</Paragraph>
      <Paragraph position="5"> Here is an example of a UNL expression for the sentence (2) However, language differences are a barrier to the smooth flow of information in our society. Each line is an expression of the kind relation(UW1, UW2). For simplicity, UWs are not supplied with restrictions.</Paragraph>
      <Paragraph position="6">
aoj(barrier.@entry.@present.@indef.@however, difference.@pl)
mod(barrier.@entry.@present.@indef.@however, flow.@def)
mod(difference.@pl, language)
aoj(smooth, flow.@def)
mod(flow.@def, information)
scn(flow.@def, society)
pos(society, we)
Relations used: aoj - a relation that holds between a thing and its state, mod - a relation between a thing and its modifier, scn - a relation between an event or a state and its abstract location, pos - a relation between a thing and its possessor. Attributes: @entry - denotes the top node of the structure, @present - present tense, @def - definite NP, @pl - plural, @however - a modal meaning corresponding to English however.</Paragraph>
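      <Paragraph> The restriction notation can be made concrete with a small Python sketch that splits a UW into its English headword and an optional restriction. This is only an illustration of the notation as described here, not part of any UNL tool: the names parse_uw, UniversalWord and GLOSSES are hypothetical, and the pattern assumes a single restriction of the form relation>category.

```python
import re
from typing import NamedTuple, Optional

# Headword, optionally followed by one restriction such as (icl>transport).
UW_PATTERN = re.compile(r"^\s*([^()\s]+)\s*(?:\(\s*(\w+)\s*>\s*([^()]+?)\s*\))?\s*$")


class UniversalWord(NamedTuple):
    headword: str
    relation: Optional[str]      # e.g. "icl" or "met"; None if unrestricted
    restriction: Optional[str]   # e.g. "transport"; None if unrestricted


def parse_uw(text):
    """Parse 'coach(icl>human)' into its headword, relation and restriction."""
    match = UW_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognisable UW: {text!r}")
    return UniversalWord(*match.groups())


# Glosses taken from the examples in the text.
GLOSSES = {
    ("coach", "icl", "transport"): "a bus",
    ("coach", "icl", "human"): "a trainer",
    ("coach", "icl", "do"): "the action of training",
}

for uw in ["coach", "coach(icl>transport)", "coach(icl>human)", "come(met>ship)"]:
    parsed = parse_uw(uw)
    print(uw, "=", parsed, "|", GLOSSES.get(tuple(parsed), "no gloss listed"))
```

An unrestricted UW such as coach parses with both optional fields empty, which matches its reading as anything the English word can denote.</Paragraph>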
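      <Paragraph> To show how such an expression can be read as a labelled graph, the following Python sketch parses the seven lines for sentence (2) into arcs and locates the node carrying @entry. The helper names parse_node and parse_relation are hypothetical, and the parser assumes, as in this example, that UWs carry no restrictions and that attributes are attached with a dot.

```python
EXPRESSION = """\
aoj(barrier.@entry.@present.@indef.@however, difference.@pl)
mod(barrier.@entry.@present.@indef.@however, flow.@def)
mod(difference.@pl, language)
aoj(smooth, flow.@def)
mod(flow.@def, information)
scn(flow.@def, society)
pos(society, we)
"""


def parse_node(text):
    """Split 'flow.@def' into the UW 'flow' and its attribute tuple ('@def',)."""
    head, *attrs = [part.strip() for part in text.split(".@")]
    return head, tuple("@" + attr for attr in attrs)


def parse_relation(line):
    """Parse one 'relation(UW1, UW2)' line; UWs without restrictions only."""
    name, _, rest = line.partition("(")
    left, _, right = rest.rstrip().rstrip(")").partition(",")
    return name.strip(), parse_node(left), parse_node(right)


arcs = [parse_relation(line) for line in EXPRESSION.splitlines() if line.strip()]

# The top node of the structure is the one marked with @entry.
entry = next(head for _, (head, attrs), _ in arcs if "@entry" in attrs)

for relation, (left, _), (right, _) in arcs:
    print(f"{relation}: {left} -> {right}")
print("entry node:", entry)
```

Each arc is kept as a triple of relation name, first node and second node, so the same list can be handed to indexing or graph-processing code without generating any natural language text.</Paragraph>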
      <Paragraph position="7"> 3.3. UNL - Russian deconversion by means of ETAP-3 As was shown in Section 1, ETAP-3 is a transfer-based system where the transfer is carried out at the level of the Normalized Syntactic Structure (NormSS). This level is best suited for establishing correspondence with UNL, as UNL expressions and NormSS show striking similarities. The most important of them are as follows: 1. Both UNL expressions and NormSSs occupy an intermediate position between the surface and the semantic levels of representation. They roughly correspond to the so-called deep-syntactic level. At this level the meaning of the lexical items is not decomposed into the primitives, and the relations between the lexical items are  while a node of a UNL expression can be a subgraph; 2. Nodes of a NormSS always correspond to one word sense, while UWs may either be broader or narrower than the corresponding English words: 2.1. they can cover a meaning area that corresponds to several different word senses at a time (see above); 2.2. they can correspond to a free word combination (e.g. computer-based or high-quality); 2.3. they can correspond to a word form (e.g. best, which is a form of good or well); 2.4. they can denote a concept that has no direct correspondence in English (see above).  3. A NormSS is the simplest of all connected graphs - a tree, while a UNL expression is a hyper-graph. Its arcs may form a loop and connect sub-graphs; 4. The relations between the nodes in a NormSS are purely syntactic and are not supposed to convey a meaning of their own, while the UNL relations denote semantic roles; 5. Attributes of a NormSS mostly correspond to  grammatical elements, while UNL attributes often convey a meaning that is expressed both in English and in Russian by means of lexical items (e.g. modals);</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="87" end_page="88" type="metho">
    <SectionTitle>
6. A NormSS contains information on the word
</SectionTitle>
    <Paragraph position="0"> order, while a UNL expression does not say anything to this effect.</Paragraph>
    <Paragraph position="1"> The NormSS of the sentence (2) looks as follows: be, present  As UNL makes use of English lexical labels, it is expedient to bridge the gap between UNL and Russian via the English NormSS, which actually serves as an Intermediate Representation (IR). In this case the UNL - Russian interface will be the simplest. After the English NormSS has been reached, the conventional ETAP English-to-Russian machine translation mode of operation can be used.</Paragraph>
    <Paragraph position="2"> The UNL-to-Russian module carries out the following three steps:  1. Transfer from UNL to the intermediate representation (IR).</Paragraph>
    <Paragraph position="3">  2. Transfer from the IR to the Russian normalized syntactic structure (NormSS-R). 3. Generation of a Russian sentence from the  NormSS-R.</Paragraph>
    <Paragraph position="4"> The architecture of the UNL-Russian deconverter is shown in Fig. 3.</Paragraph>
    <Paragraph position="5"> It follows from the previous discussion that the UNL - NormSS interface should solve the following five tasks: 1. An appropriate English lexeme for every UW should be selected where it is possible; a Russian lexeme will be provided by the ETAP English - Russian transfer dictionary. If no appropriate English word can be found for a UW, other means of expression should be found.</Paragraph>
    <Paragraph position="6"> 2. UNL syntactic relations should be translated, either by means of ETAP relations or with the help of lexical items. 3. UNL attributes should be translated, either by means of grammatical features or with the help of lexical items (e.g. @however - however). 4. The UNL graph should be converted into a tree. 5. Word order should be established.</Paragraph>
    <Paragraph position="7"> The first and (partly) the second tasks are solved by means of the information stored in the UW - English and English combinatorial dictionaries. All the rest (tasks 2 to 5) is done by the rules written in the logic-based FORET formalism.</Paragraph>
    <Paragraph position="8"> Let us give one example to illustrate the transformation of UNL relations into NL words. UNL has a tim relation that holds between an event and its time. As is known, the choice of appropriate words to express this relation is to a large extent determined by lexical properties of the word denoting time; cf. on Monday, at midnight, in summer, during the war, etc. In ETAP-3 all these cases are treated as the lexical function LOC denoting (temporal) locality (on lexical functions see 2.1.3). The values of all lexical functions are given in the lexicon in the entries of their arguments (see an example in 2.2 above). While processing the UNL expression, the tim relation is linked to the lexical function LOC, which makes it possible to find a correct preposition, both in English and in Russian.</Paragraph>
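    <Paragraph> Task 4 can be illustrated with a generic technique. The Python sketch below builds a spanning tree by breadth-first traversal from the entry node, treating arcs as undirected and dropping any arc that would close a loop. This is an assumption made purely for illustration: the paper does not say how the FORET rules perform the conversion, and the function name spanning_tree is hypothetical.

```python
from collections import deque


def spanning_tree(arcs, entry):
    """Map every reachable node to (parent, relation); the entry node maps to None."""
    neighbours = {}
    for relation, source, target in arcs:
        neighbours.setdefault(source, []).append((relation, target))
        neighbours.setdefault(target, []).append((relation, source))

    parent = {entry: None}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for relation, other in neighbours.get(node, []):
            if other not in parent:   # arcs to already visited nodes would form a loop
                parent[other] = (node, relation)
                queue.append(other)
    return parent


# Arcs of the UNL expression for sentence (2), attributes omitted.
arcs = [
    ("aoj", "barrier", "difference"),
    ("mod", "barrier", "flow"),
    ("mod", "difference", "language"),
    ("aoj", "smooth", "flow"),
    ("mod", "flow", "information"),
    ("scn", "flow", "society"),
    ("pos", "society", "we"),
]

tree = spanning_tree(arcs, "barrier")
for node, link in tree.items():
    print(node, "is attached to", link)
```

A real deconverter would also have to decide which arc of a loop to drop on linguistic grounds and how to unfold sub-graph nodes; the sketch only shows that every node ends up with exactly one parent.</Paragraph>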
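    <Paragraph> The link between the tim relation and the lexical function LOC can be sketched in Python as a lookup in the entry of the time word. The lexicon below is a toy stand-in holding only three of the English examples quoted in the text; the names LEXICON and realise_tim are hypothetical, and the real values live in the ETAP-3 combinatorial dictionaries.

```python
# Values of LOC are stored in the entry of the argument, i.e. the time word.
LEXICON = {
    "Monday": {"LOC": "on"},
    "midnight": {"LOC": "at"},
    "summer": {"LOC": "in"},
}


def realise_tim(event, time_word, lexicon=LEXICON):
    """Express tim(event, time_word) using the LOC value of the time word."""
    preposition = lexicon.get(time_word, {}).get("LOC")
    if preposition is None:
        raise KeyError(f"no LOC value recorded for {time_word!r}")
    return f"{event} {preposition} {time_word}"


for time_word in ["Monday", "midnight", "summer"]:
    print(realise_tim("arrive", time_word))
```

Because the preposition comes from the entry of the time word rather than from the relation itself, the same mechanism would select the Russian preposition once the Russian entry supplies its own LOC value.</Paragraph>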
    <Paragraph position="9"> 3.4. Current state and prospects for the future The module of Russian deconversion is operational and can be tested at  http://proling.iitp.ru/Deco. We plan to put it to general use by autumn 2000. The interactive enconversion module will be our next concern. As shown in Fig. 3, the interface between UNL and Russian is established at the level of the English NormSS. At this point the ETAP English-to-Russian machine translation facility can be switched on, which carries through the phases of transfer and Russian generation. This architecture makes it possible to obtain English generation relatively cheaply, as ETAP has a Russian-to-English mode of operation as well. First experiments in this direction have been carried out and proved quite promising.</Paragraph>
  </Section>
</Paper>