<?xml version="1.0" standalone="yes"?>
<Paper uid="A88-1022">
  <Title>A Preliminary Linguistic Framework for Eurotra in : Proceedings of the Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages</Title>
  <Section position="5" start_page="161" end_page="163" type="metho">
    <SectionTitle>
3. EUROTRA'S LINGUISTICS
</SectionTitle>
    <Paragraph position="0"> The generators for the first small implementation have been defined in such a way that they mirror traditional linguistic ideas about analytical levels: morphology, surface syntax (immediate constituents and syntactic functions), deep syntax and semantics. However, in order to offer a full treatment of all texts in our text type and domain without pre-editing, we also included generators which cater for character normalisation (in order for the dictionary to be independent of typography) and text structure (e.g. lay-out, text format, figures, footnotes).</Paragraph>
    <Paragraph position="1"> At present the linguistic specifications define 6 levels in the Eurotra analysis and synthesis modules: ETS, ENT, EMS, ECS, ERS and IS. Figures 3 and 4 show an ECS and an IS representation, respectively, for the sentence: &quot;The decision adopted by the Council on 25 April 1983 was implemented by the Member States in the course of 1983&quot;.</Paragraph>
    <Paragraph position="2"> The ECS representation of Figure 3 is reduced structurally through two steps. First it is translated into the relational representation ERS, which identifies syntactic functions, elevates determiners, auxiliaries and valency-bound prepositions, and rearranges the constituents into a canonical order.</Paragraph>
    <Paragraph position="3"> Then the relational representation is translated into the interface structure IS, whereby passive is undone, empty elements are inserted for obligatory arguments which are absent in the surface form, and semantic information is added to the feature bundles.</Paragraph>
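The second step (undoing passive and inserting empty elements) can be illustrated with a minimal Python sketch. It is not the Eurotra formalism: the `Relational` record, the `arg1`/`arg2` labels and the `EMPTY` marker are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional

EMPTY = "EMPTY"  # hypothetical marker for an obligatory argument absent on the surface


@dataclass
class Relational:
    """Toy ERS-like clause holding syntactic functions only (assumed shape)."""
    predicate: str
    voice: str                       # "active" or "passive"
    subject: Optional[str] = None
    obj: Optional[str] = None
    by_agent: Optional[str] = None
    modifiers: list = field(default_factory=list)


def to_interface(clause):
    """Build a toy IS-like structure: undo passive, fill missing arguments."""
    if clause.voice == "passive":
        arg1 = clause.by_agent       # deep subject comes from the by-phrase
        arg2 = clause.subject        # surface subject is the deep object
    else:
        arg1, arg2 = clause.subject, clause.obj
    return {
        "pred": clause.predicate,
        "arg1": arg1 if arg1 is not None else EMPTY,
        "arg2": arg2 if arg2 is not None else EMPTY,
        "mods": list(clause.modifiers),   # modifiers stay with their own governor
    }


# Main clause and reduced relative clause of the example sentence.
main = Relational("implement", "passive", subject="decision",
                  by_agent="Member States", modifiers=["in the course of 1983"])
relative = Relational("adopt", "passive", by_agent="Council",
                      modifiers=["on 25 April 1983"])
is_main = to_interface(main)
is_relative = to_interface(relative)
```

In this sketch each clause keeps its own TIME modifier, and the reduced relative clause receives an empty second argument because its surface form has no overt subject.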
    <Paragraph position="4"> Note that, although the IS representation is fairly simple from a structural point of view, it still allows for modifier attachment at different levels of embedding, and thus the two TIME constituents are attached to their proper governors without the use of complicated featurised references.</Paragraph>
    <Paragraph position="5"> All generators being described in the same formal language, the representations of text structure, normalised text and morphology are built by augmented CF rules, just like the syntactic representations.</Paragraph>
    <Paragraph position="6"> [Figure 4: Example of an IS representation] The modular approach to linguistics, whereby text format, morphology, syntax and semantics are handled by separate generators, enhances the repairability and extensibility of the system. This means, for instance, that changes to the ETS, ENT and EMS grammars may be made freely as long as they produce the output needed by ECS.</Paragraph>
    <Paragraph position="7"> (Footnote to Figures 3 and 4: These trees are given in the form which is output by our parametrizable prettyprinter with the parameter set to the &quot;cat&quot; feature in Figure 3 and to the &quot;st&quot; (for semantic relation) feature in Figure 4. [...] feature information are much more complex and impractical to read.)</Paragraph>
    <Paragraph position="8"> The sequence of representations related by T-rules spans the distance between the actual text and the interface structure (IS), which is the beginning and end point of transfer.</Paragraph>
    <Paragraph position="9"> In order to simplify transfer, IS abstracts away from surface phenomena like inflection, derivation, compounding, constituent structure etc. It contains semantically labelled arguments and modifiers related to a predicate, and the transfer dictionaries, ideally, just relate lexical units of one IS to lexical units of another.</Paragraph>
    <Paragraph position="10"> In some cases, though, this does not work, because of, e.g., lexical holes like the German word &quot;Schimmel&quot; (meaning &quot;white horse&quot;), which has no lexical correspondent in English, or different functions mapping onto one another, as with the English predicate &quot;like&quot;, which maps onto a German adverbial modifier &quot;gern&quot;. In these cases explicit non-lexical T-rules must be written for transfer.</Paragraph>
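The division of labour between the transfer dictionary and explicit T-rules can be sketched as follows. This is a toy Python illustration under stated assumptions: the node shape (`lu`, `mods`), the tiny `LEXICAL` table and the function names are all hypothetical, and only the "Schimmel" example is taken from the text.

```python
# Hypothetical German-to-English lexical transfer table (illustration only).
LEXICAL = {"Pferd": "horse", "Haus": "house"}


def schimmel_rule(lu, mods):
    """Explicit rule for a lexical hole: one unit becomes head plus modifier."""
    if lu != "Schimmel":
        return None
    return {"lu": "horse", "mods": [{"lu": "white", "mods": []}] + mods}


# Explicit, non-lexical rules are tried before the dictionary.
EXPLICIT_RULES = [schimmel_rule]


def transfer(node):
    """Transfer a toy IS node; modifiers are transferred first (bottom-up)."""
    mods = [transfer(m) for m in node.get("mods", [])]
    for rule in EXPLICIT_RULES:
        out = rule(node["lu"], mods)
        if out is not None:
            return out
    # Default case: a plain lexical-unit-to-lexical-unit correspondence.
    return {"lu": LEXICAL[node["lu"]], "mods": mods}


plain = transfer({"lu": "Pferd"})
hole = transfer({"lu": "Schimmel"})
```

The fewer entries `EXPLICIT_RULES` needs, the closer transfer comes to the ideal of a pure dictionary lookup. A unit missing from both the rules and the table raises a `KeyError` in this sketch.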
    <Paragraph position="11"> The purpose of the IS experimentation in Eurotra is precisely to minimize the number of explicit transfer T-rules. The entire modular design, as proposed in the linguistic specifications, is primarily geared towards an experimentation process aimed at reaching an optimal IS through multiple cycles of prototyping.</Paragraph>
  </Section>
  <Section position="6" start_page="163" end_page="163" type="metho">
    <SectionTitle>
4. EVALUATION OF THE FRAMEWORK
</SectionTitle>
    <Paragraph position="0"> The architecture of our framework has given us full satisfaction with respect to its modularity, simplicity and the ease with which we could modify certain design characteristics of the system, as described in Section 5.</Paragraph>
    <Paragraph position="1"> Its modularity has made it possible to experiment with the interface structure quite independently from the rest of the system. For instance, research and experimentation about an adequate treatment of time and modality could be done in parallel with, and independently of, grammar implementation work concerning other levels of representation.</Paragraph>
    <Paragraph position="2"> The simplicity of the generator's formalism has had positive and negative aspects. Amongst the former we can mention the fact that it was easy to learn and teach, easy to modify and a reasonably good communication tool for the scientists involved in the definition and implementation of the system.</Paragraph>
    <Paragraph position="3"> On the negative side we must mention the fact that it was not expressive enough to describe in a natural way all the phenomena occurring in the languages we are treating. (Footnote: This is true even considering the fact that mechanisms for treating unbounded phenomena, for expressing rules in an ID/LP format and for built-in feature inheritance were given second priority and, therefore, were not implemented in the first version of the framework, nor in the first revision of the framework reported here.)</Paragraph>
    <Paragraph position="4"> Furthermore, for a system which requires the T-rules to be compositional, the operational semantics of generators turned out to be too poor, causing a proliferation of complex T-rules. For the implementation work, this has caused some problems. For instance, we couldn't implement ETS, ENT and EMS in the first cycle because the prototype only allowed structure manipulation to happen at one level. This meant that we could not build text structure, morphological structure and syntactic structure independently; they all had to be built by one generator, which then became very big and difficult to manage.</Paragraph>
    <Paragraph position="5"> Another problem was that the analytical strategy of elevating functional elements like articles and prepositions, which is motivated by the needs of MMT rather than by linguistic considerations, could not easily be reversed, because building new nodes in synthesis to represent these elements is addition of structural information.</Paragraph>
    <Paragraph position="6"> In consequence, we had to change the specifications of the virtual machine in such a way that each generator had the power of completing a representational object according to its own definition of well-formed representation, thus alleviating the task of the translators. The modifiability of our original framework was invaluable in this redesign since it allowed us to keep the core concepts basically unchanged while extending the functionality and expressiveness of the formalism.</Paragraph>
  </Section>
  <Section position="7" start_page="163" end_page="164" type="metho">
    <SectionTitle>
5. DESIGN MODIFICATIONS TO THE FRAMEWORK
</SectionTitle>
    <Paragraph position="0"> As reported above, the first implementation showed that two factors were responsible for the heaviness of the translators (not to be confused with transfer): the relatively simple operational semantics of the generators combined with the requirement that the translators be compositional.</Paragraph>
    <Paragraph position="1"> Rather than giving up compositionality, which guarantees a well defined relation between representations belonging to adjacent levels, we increased the power of the generators by modifying their operational semantics to include a controlled form of addition of structural information, rather than just addition of feature information.</Paragraph>
    <Paragraph position="2"> Translators were impoverished to the point where they can now be defined by default almost everywhere by specifying correspondences between feature theories pertaining to adjacent generators.</Paragraph>
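A default translator of this kind can be pictured with a small Python sketch. It is an assumption-laden illustration, not the Eurotra rule language: the tree shape, the `CORRESPONDENCE` table and its feature names and values are hypothetical.

```python
# Hypothetical correspondence between the feature theories of two adjacent
# levels: (source feature, source value) maps to (target feature, target value).
CORRESPONDENCE = {
    ("cat", "s"): ("cat", "s"),
    ("cat", "np"): ("cat", "np"),
    ("case", "nom"): ("sf", "subj"),
}


def translate_features(features, table):
    """Translate one feature bundle; features without a correspondence are dropped."""
    out = {}
    for name, value in features.items():
        if (name, value) in table:
            new_name, new_value = table[(name, value)]
            out[new_name] = new_value
    return out


def default_translate(tree, table):
    """Compositional default: the translation of a tree is built from the
    translations of its own features and of each of its subtrees."""
    return {
        "features": translate_features(tree["features"], table),
        "children": [default_translate(c, table) for c in tree.get("children", [])],
    }


source = {"features": {"cat": "s"},
          "children": [{"features": {"cat": "np", "case": "nom"}}]}
target = default_translate(source, CORRESPONDENCE)
```

Because the default only maps features node by node, it adds no structure of its own; in this sketch any structural completion would be left to the target generator.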
    <Paragraph position="3"> Special, exceptional cases of T-rules have still to be specified explicitly. In the modified framework, the representation output by a translator is completed by the target generator. For instance, ECS will eventually expect as input from the preceding level, which is EMS, a tree with the top node T(ext) branching into C(hapters), Se(ctions) and P(aragraphs). The P nodes should then branch directly into wordform nodes which dominate tree-structure representations of the morphological structure of each wordform. The ECS generator will then complete this tree by inserting S nodes and nonterminal categorial nodes of the constituent structure representation of each sentence, resulting in a parse tree of the sentence.</Paragraph>
    <Paragraph position="4"> From our original framework we retain the architecture, i.e. breaking up the monolingual components of the system into a sequence of generators which are related by translators. In principle the same generators are used for analysis and synthesis, but we don't know yet whether the non-default translators can also be used in both directions.</Paragraph>
    <Paragraph position="5"> The first implemented prototype based on the revised framework has been used for experiments with rewriting the grammars of the first implementational cycle, and these experiments have confirmed the assumption that recoding of grammars does not pose special problems.</Paragraph>
    <Paragraph position="6"> Here it must be mentioned that an alternative modified framework, which departs more radically from the original one, has been proposed to overcome the inadequacies discovered during the first cycle of implementation. The testing that this alternative revised framework has undergone has been more limited for various reasons. The merits and demerits of the two proposed revisions will be assessed during the first quarter of 1986 in a controlled experiment.</Paragraph>
  </Section>
  <Section position="8" start_page="164" end_page="165" type="metho">
    <SectionTitle>
7. CURRENTLY IMPLEMENTED SYSTEM
</SectionTitle>
    <Paragraph position="0"> The majority of the implementation work to date has been carried out within the original Eurotra framework, leading to a system covering to varying degrees the original seven languages foreseen, that is Danish, Dutch, English, French, German, Greek and Italian. Work done on Portuguese and Spanish is scheduled in a different way, given that Portugal and Spain became members of the Community only in 1986. Analysis modules exist for all languages, generation for five languages. Transfer components have been written for ten language pairs. An average grammar has ca. 400 rules and 500 lexical entries (accounting for ca. 3000 full-word forms in moderately inflected languages). Before the end of 1987 it is expected that all already implemented components will be recoded in the new formalism and that ten more language pairs will be added, while the lexical and linguistic coverage will be increased.</Paragraph>
    <Paragraph position="1"> Only the levels ECS, ERS and IS have been implemented to date. This means that in the first implementation each monolingual component works on the basis of a full-form dictionary, since the generator for morphology was given second priority, and accepts only single sentences as input.</Paragraph>
    <Paragraph position="2"> The linguistic coverage includes main clauses and relative clauses; all types of noun phrases; simple coordination in noun phrases; all verbal tenses (excluding modal constructions); possessive, relative, reflexive and indefinite pronouns; all prepositional phrases; adverbs; numerals; particles.</Paragraph>
    <Paragraph position="3"> Research and experimentation are still ongoing to include ellipsis; modality; negation; scope; quantification; time-tense relation and pronoun resolution, but we do not expect to be able to do pronoun resolution on the basis of real world knowledge in the near future.</Paragraph>
    <Paragraph position="4"> The implementation of the virtual machine is done mainly in C-Prolog. A new version of the software including a relational data base for the coding and maintenance of the lexicon (to be extended to grammars) has just been released.</Paragraph>
    <Paragraph position="5"> More details on the philosophy of the software construction can be found in [5].</Paragraph>
    <Paragraph position="6"> A fragment of a sample grammar for English is shown in Appendix A. Some examples of T-rules are given in Appendix B.</Paragraph>
    <Section position="1" start_page="165" end_page="165" type="sub_section">
      <SectionTitle>
Future Work
</SectionTitle>
      <Paragraph position="0"> By mid 1988 we plan to have a small scale, corpus based prototype with a coverage of 2500 lexical entries per language for all the seven original languages, together with all the 42 transfer components.</Paragraph>
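The figure of 42 transfer components follows if one component is needed per ordered language pair, i.e. 7 source languages times 6 remaining target languages. The short Python check below spells this out; the one-component-per-direction reading is an assumption inferred from the number, and the variable names are illustrative only.

```python
from itertools import permutations

# The seven original Eurotra languages named in the text.
LANGUAGES = ["Danish", "Dutch", "English", "French", "German", "Greek", "Italian"]

# One transfer component per ordered (source, target) pair of distinct languages.
transfer_pairs = list(permutations(LANGUAGES, 2))

print(len(transfer_pairs))  # 42, i.e. 7 * 6
```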
      <Paragraph position="1"> In the shorter term, several additions to the new framework are foreseen, the most important of which concern the possibility to express grammar rules and T-rules in an ID/LP type format [4] and a mechanism for treating unbounded phenomena.</Paragraph>
      <Paragraph position="2"> Current research topics in linguistics have been mentioned above. In relation to the framework, we are currently investigating a feature inheritance mechanism and some restricted form of complex features other than the set valued features mentioned in Section</Paragraph>
    </Section>
  </Section>
</Paper>