<?xml version="1.0" standalone="yes"?>
<Paper uid="A88-1004">
  <Title>TIME PRESENT :SYMPTOM (*PAIN :LOCATION (*BODY-PART :NAME *THROAT</Title>
  <Section position="4" start_page="0" end_page="25" type="metho">
    <SectionTitle>
3 The basic machinery
</SectionTitle>
    <Paragraph position="0"> The SEMSYN generator is organized into two major modules: * the generator kernel or 'realization component' and * the front end generator or 'morpho/syntactic component'.</Paragraph>
    <Paragraph position="1"> We now take a closer look at how these modules operate.</Paragraph>
    <Section position="1" start_page="0" end_page="25" type="sub_section">
      <SectionTitle>
3.1 The generator kernel
</SectionTitle>
      <Paragraph position="0"> The generator kernel starts from a semantic representation, i.e. a 'message' in the sense of [McDonald et al. 87]. Its task is to 'realize' the message, i.e. to decide how its content may be expressed in natural language: * What is the adequate syntactic form for the utterance as a whole? * How should the subparts of the conceptual representation be realized and integrated into the utterance? * What are appropriate lexicalizations - as lexemes or whole phrasal structures of the target language - for the elements of the message?</Paragraph>
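To make the first two questions concrete, the following Python sketch shows one decision the kernel has to take: which grammatical function each role of a case frame receives in a given clause form. It assumes frames are plain dicts with a "symbol" and a "roles" entry; the function labels (SUBJ, DOBJ, BY-OBJ) and the ROLE_TO_FUNCTION table are hypothetical and not taken from SEMSYN.

```python
# Hypothetical role -> grammatical-function tables, one per clause form.
# The labels are illustrative; SEMSYN's own inventory may differ.
ROLE_TO_FUNCTION = {
    "active": {":AGENT": "SUBJ", ":OBJECT": "DOBJ"},
    "passive": {":OBJECT": "SUBJ", ":AGENT": "BY-OBJ"},  # agent as a 'von'-phrase
}


def realize_clause(frame, form="active"):
    """Assign each filled role of a case frame to a grammatical function.

    `frame` is assumed to be a dict {"symbol": str, "roles": {role: filler}}.
    Roles that the chosen form does not map are kept as adjuncts.
    """
    mapping = ROLE_TO_FUNCTION[form]
    functions, adjuncts = {}, []
    for role, filler in frame["roles"].items():
        if role in mapping:
            functions[mapping[role]] = filler
        else:
            adjuncts.append((role, filler))
    return {"category": "CLAUSE", "voice": form, "head": frame["symbol"],
            "functions": functions, "adjuncts": adjuncts}


frame = {"symbol": "GENERATE",
         "roles": {":AGENT": "PROJECT", ":OBJECT": "LANGUAGE"}}
print(realize_clause(frame, "passive")["functions"])
# -> {'BY-OBJ': 'PROJECT', 'SUBJ': 'LANGUAGE'}
```

Roles the chosen form does not cover (e.g. a :TIME role) fall through as adjuncts, so the same frame can be realized in either voice without changing the frame itself.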
    </Section>
    <Section position="2" start_page="25" end_page="25" type="sub_section">
      <SectionTitle>
3.2 The linguistic representation
</SectionTitle>
      <Paragraph position="0"> The output of the generator kernel is a functional grammatical structure. This linguistic representation fully specifies the intended utterance: * the syntactic category of the whole utterance and the grammatical functions and syntactic categories of all subparts, * the syntactic features of the head of each syntactic entity, * the lexemes to be used, or special lexical items marked with category information, like :*PN for proper names or :*NC for noun compounds.</Paragraph>
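One way to picture such a structure is as a recursive record: a syntactic category, the features of the head, the lexeme (or a special item such as :*PN) and a map from grammatical functions to substructures. The FStruct class, its field names and the German lexemes chosen for the Section 3.4 example (generieren, Sprache, deutsch) are assumptions for illustration; this is not the structure SEMSYN actually produces.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FStruct:
    """Hypothetical functional grammatical structure (all names are illustrative)."""
    category: str                        # syntactic category, e.g. "S" or "NP"
    lexeme: Optional[str] = None         # root form of the head, if any
    marker: Optional[str] = None         # special lexical item, e.g. ":*PN" or ":*NC"
    features: Dict[str, str] = field(default_factory=dict)         # head features
    functions: Dict[str, "FStruct"] = field(default_factory=dict)  # function -> part

    def lexemes(self):
        """Collect the lexemes of the whole structure, depth-first."""
        found = [self.lexeme] if self.lexeme else []
        for part in self.functions.values():
            found.extend(part.lexemes())
        return found


clause = FStruct(
    "S", lexeme="generieren", features={"voice": "active", "tense": "present"},
    functions={
        "SUBJ": FStruct("NP", lexeme="SEMSYN", marker=":*PN"),
        "DOBJ": FStruct("NP", lexeme="Sprache",
                        functions={"ATTR": FStruct("AP", lexeme="deutsch")}),
    })
print(clause.lexemes())  # -> ['generieren', 'SEMSYN', 'Sprache', 'deutsch']
```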
    </Section>
    <Section position="3" start_page="25" end_page="25" type="sub_section">
      <SectionTitle>
3.3 The front end generator
</SectionTitle>
      <Paragraph position="0"> The functional grammatical structures produced by the generator kernel are the input to the front end generator. This module has to execute all syntactic and morphological processes necessary to produce the corresponding surface string. These involve: * linearization, i.e. constituent ordering, * agreement handling, * inflection.</Paragraph>
      <Paragraph position="1">  The need for an explicit linguistic representation of the intended utterance and a separate final processing step is especially obvious for highly inflectional languages with a rich repertoire of agreement phenomena (e.g. French, German).</Paragraph>
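The three front-end steps can be illustrated on a toy German noun group (definite article plus noun), where the article form depends on the noun's gender and on the group's case and number. The article table and the dative plural -n rule are standard German; the two-noun lexicon, NG_ORDER and the function names are hypothetical, and noun classes with other endings (e.g. weak masculines) are deliberately left out.

```python
# Definite article, indexed by (case, gender); "pl" stands for all plural forms.
DEF_ARTICLE = {
    ("nom", "m"): "der", ("nom", "f"): "die", ("nom", "n"): "das", ("nom", "pl"): "die",
    ("acc", "m"): "den", ("acc", "f"): "die", ("acc", "n"): "das", ("acc", "pl"): "die",
    ("dat", "m"): "dem", ("dat", "f"): "der", ("dat", "n"): "dem", ("dat", "pl"): "den",
    ("gen", "m"): "des", ("gen", "f"): "der", ("gen", "n"): "des", ("gen", "pl"): "der",
}

# Hypothetical mini root-form lexicon (only nouns with regular case endings).
NOUNS = {
    "Zahl": {"gender": "f", "sg": "Zahl", "gen_sg": "Zahl", "pl": "Zahlen"},
    "Punkt": {"gender": "m", "sg": "Punkt", "gen_sg": "Punktes", "pl": "Punkte"},
}

NG_ORDER = ["DET", "HEAD"]  # linear order of a minimal noun group


def inflect_noun(entry, case, number):
    """Inflection: pick the noun form for case and number."""
    if number == "sg":
        return entry["gen_sg"] if case == "gen" else entry["sg"]
    form = entry["pl"]
    if case == "dat" and not form.endswith(("n", "s")):
        form += "n"  # dative plural -n
    return form


def realize_noun_group(root, case, number):
    entry = NOUNS[root]
    # Agreement: the article takes gender from the noun, case and number from the group.
    gender = entry["gender"] if number == "sg" else "pl"
    words = {"DET": DEF_ARTICLE[(case, gender)],
             "HEAD": inflect_noun(entry, case, number)}
    # Linearization: emit the constituents in the fixed order.
    return " ".join(words[f] for f in NG_ORDER)


print(realize_noun_group("Punkt", "dat", "pl"))  # -> den Punkten
```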
    </Section>
    <Section position="4" start_page="25" end_page="25" type="sub_section">
      <SectionTitle>
3.4 Examples
</SectionTitle>
      <Paragraph position="0"> 3.4.1 Frame structures as semantic representation. SEMSYN's generator kernel expects its input in a frame notation. Although there are minor variations between the different applications, the basic format is fixed: frame structures consisting of a 'semantic symbol' as name and of named roles or slots with - recursively - frame structures as fillers.</Paragraph>
      <Paragraph position="1"> An example of a case frame:</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="25" end_page="25" type="metho">
    <SectionTitle>
(GENERATE
:AGENT (PROJECT :NAME (:*PN SEMSYN))
:OBJECT (LANGUAGE :ATTRIBUTES GERMAN))
</SectionTitle>
    <Paragraph position="0"> Here the top-level frame structure has the semantic symbol 'GENERATE' as its name and carries further information in two filled roles, :AGENT and :OBJECT.</Paragraph>
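A minimal sketch of reading this notation into nested Python objects follows. Each parenthesized list becomes a Frame whose first element is the semantic symbol and whose :ROLE/filler pairs become named roles; elements not preceded by a role keyword, such as SEMSYN in (:*PN SEMSYN), are kept in a separate args list. Frame, parse_frame and the args field are assumptions of this sketch, not part of SEMSYN.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Frame:
    symbol: str                                               # semantic symbol (frame name)
    roles: Dict[str, "Filler"] = field(default_factory=dict)  # :ROLE -> filler
    args: List["Filler"] = field(default_factory=list)        # e.g. SEMSYN in (:*PN SEMSYN)


Filler = Union[str, Frame]


def tokenize(text):
    return re.findall(r"[()]|[^\s()]+", text)


def read(tokens):
    """Read one s-expression, consuming tokens from the front of the list."""
    if not tokens:
        raise ValueError("unexpected end of input")
    token = tokens.pop(0)
    if token == ")":
        raise ValueError("unexpected ')'")
    if token != "(":
        return token
    items = []
    while True:
        if not tokens:
            raise ValueError("missing ')'")
        if tokens[0] == ")":
            tokens.pop(0)
            return items
        items.append(read(tokens))


def to_frame(expr):
    if isinstance(expr, str):
        return expr                      # atomic filler such as MACBETH
    if not expr or not isinstance(expr[0], str):
        raise ValueError("a frame must start with a semantic symbol")
    frame = Frame(symbol=expr[0])
    rest, i = expr[1:], 0
    while len(rest) > i:
        item = rest[i]
        if isinstance(item, str) and item.startswith(":") and len(rest) > i + 1:
            frame.roles[item] = to_frame(rest[i + 1])  # :ROLE followed by its filler
            i += 2
        else:
            frame.args.append(to_frame(item))
            i += 1
    return frame


def parse_frame(text):
    return to_frame(read(tokenize(text)))


f = parse_frame("(GENERATE :AGENT (PROJECT :NAME (:*PN SEMSYN))"
                " :OBJECT (LANGUAGE :ATTRIBUTES GERMAN))")
print(f.roles[":AGENT"].roles[":NAME"])  # -> Frame(symbol=':*PN', roles={}, args=['SEMSYN'])
```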
    <Section position="1" start_page="25" end_page="25" type="sub_section">
      <SectionTitle>
3.4.2 A realization result
</SectionTitle>
      <Paragraph position="0"> When the generator kernel realizes this case frame as a clause in active voice, the result is the following functional grammatical structure:</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="25" end_page="26" type="metho">
    <SectionTitle>
3.5 Object-oriented implementation of realization knowledge
</SectionTitle>
    <Paragraph position="0"> The main features of the object-oriented paradigm that we exploited for the implementation of realization knowledge in the generator kernel are * hierarchy as organisation principle for the knowledge base and * message passing between objects as primary control structure. (Footnote 1: English glosses added as a convenience for the reader.)</Paragraph>
    <Paragraph position="1"> The specialization hierarchy used is rooted in a general class that defines the basic methods for realization (KBS-Schema). On the next level are general classes for concepts, case frames and relations. These classes differ with respect to the possible realizations of their instances: * concept-schemata allow only realizations as noun groups, * case-schemata allow for various clausal forms (active, passive, topicalized) as well as nominalized forms, * subclasses of relation-schema incorporate knowledge about realization possibilities for (more complex) semantic relations, like the relation between :MEANS and :PURPOSE or between :REASON and :RESULT.</Paragraph>
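The same organisation can be sketched with plain Python classes standing in for FLAVORS and method calls standing in for messages. Each schema object realizes its own frame and passes role fillers on to the schema registered for the filler's symbol; a relation schema can thereby turn the Macbeth frame discussed below into a main clause with a subordinate reason clause. ReasonForSchema, the registry, the send method and the tuple-shaped output are hypothetical, not SEMSYN's actual classes.

```python
class KBSSchema:
    """Root class with the shared realization protocol (stands in for KBS-Schema)."""
    forms = ()

    def __init__(self, registry):
        self.registry = registry              # semantic symbol -> schema object

    def realize(self, frame, form=None):
        form = form or (self.forms[0] if self.forms else None)
        if form not in self.forms:
            raise ValueError(f"{type(self).__name__} cannot realize {form!r}")
        return self.realize_as(frame, form)

    def send(self, filler, form=None):
        """'Message passing': ask the schema for the filler's symbol to realize it."""
        if isinstance(filler, str):
            return filler                     # atomic filler, used as is
        return self.registry[filler["symbol"]].realize(filler, form)

    def realize_as(self, frame, form):
        raise NotImplementedError


class ConceptSchema(KBSSchema):
    forms = ("noun-group",)                   # concepts: noun groups only

    def realize_as(self, frame, form):
        modifiers = [self.send(f) for f in frame.get("roles", {}).values()]
        return ("NG", frame["symbol"], modifiers)


class CaseSchema(KBSSchema):
    forms = ("active", "passive", "topicalized", "nominalized")

    def realize_as(self, frame, form):
        parts = {role: self.send(f) for role, f in frame["roles"].items()}
        category = "NG" if form == "nominalized" else "CLAUSE"
        return (category, form, frame["symbol"], parts)


class RelationSchema(KBSSchema):
    """Base for relations such as :MEANS/:PURPOSE or :REASON/:RESULT."""


class ReasonForSchema(RelationSchema):
    forms = ("subordinate-reason",)

    def realize_as(self, frame, form):
        main = self.send(frame["roles"][":RESULT"], "active")
        reason = self.send(frame["roles"][":REASON"], "active")
        return ("CLAUSE", main, ("SUBORDINATE", reason))


registry = {}
for symbol, cls in [("MURDER", CaseSchema), ("PERSUADE", CaseSchema),
                    ("LADY-MACBETH", ConceptSchema), ("REASON-FOR", ReasonForSchema)]:
    registry[symbol] = cls(registry)

macbeth = {"symbol": "REASON-FOR", "roles": {
    ":RESULT": {"symbol": "MURDER", "roles": {":AGENT": "MACBETH", ":OBJECT": "DUNCAN"}},
    ":REASON": {"symbol": "PERSUADE", "roles": {
        ":AGENT": {"symbol": "LADY-MACBETH", "roles": {":SPECIALIZE": "AMBITIOUS"}},
        ":OBJECT": "MACBETH"}}}}
print(registry["REASON-FOR"].realize(macbeth))
```

Keeping the allowed forms as a class attribute means a subclass narrows or widens its realization possibilities simply by overriding one tuple, which mirrors how the text distinguishes concept-schemata from case-schemata.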
    <Paragraph position="2">  The semantic representation of a summary of Macbeth may contain the following frame structure:</Paragraph>
  </Section>
  <Section position="7" start_page="26" end_page="28" type="metho">
    <SectionTitle>
(REASON-FOR
:RESULT
(MURDER :AGENT MACBETH :OBJECT DUNCAN)
:REASON
(PERSUADE
:AGENT (LADY-MACBETH :SPECIALIZE AMBITIOUS)
:OBJECT MACBETH))
</SectionTitle>
    <Paragraph position="0"> One possible way to express this relation is to realize the fillers of :REASON and :RESULT as clauses and to add the clause from :REASON as a subordinate clause to that of :RE In the meantime, improved and extended versions of the SEMSYN generation system have been applied to quite a variety of input structures and generation tasks: * machine translation applications: - generation of German from (handwritten) semantic structures proposed for use within EUROTRA [Held, Rösner, Weck 87] - generation of German sentences in the domain of doctor/patient communication from semantic structures produced from Japanese and English by CMU's Universal Parser [Tomita, Carbonell 86] * text generation: - SEMTEX: generation of news stories from statistical data [Rösner 87] - GEOTEX: generation of descriptive texts for geometric constructions [Kehl 86] Although the basic design of the generator [Rösner 86b] proved flexible enough to remain untouched, each of these applications has led to additional features of the whole system.</Paragraph>
    <Section position="1" start_page="26" end_page="27" type="sub_section">
      <SectionTitle>
4.1 MT applications
</SectionTitle>
      <Paragraph position="0"> In the first application of the system, we started from semantic representations derived from titles of Japanese papers in the field of information technology. Titles are in most cases noun groups, so in order to generate German equivalents we had to provide the prototype primarily with knowledge about German noun group structures. For many of these semantic structures, however, clausal forms were possible as well. We therefore provided the system with &quot;stylistic&quot; switches that allow clauses to be generated from case frames as an alternative.</Paragraph>
      <Paragraph position="1"> The sample of semantic structures in this experiment was taken from doctor/patient communication. The semantic structures produced by CMU's parsers for Japanese and English are basically case frames, but they include syntactic information as well (e.g. about :MOOD or :TIME). The fragment of German that the SEMSYN system can generate was extended with yes/no questions and imperatives.</Paragraph>
      <Paragraph position="2"> An example: English input to CMU's parser: &quot;i have a pain in the throat&quot; Semantic structure as input to SEMSYN: In order to support the EUROTRA-D group, we ran the following experiment: a sample of semantic structures as proposed for use within EUROTRA [Steiner 86] was to serve as input to our generator.</Paragraph>
      <Paragraph position="3"> This experiment was interesting in several respects: * The semantic representation used is based on systemic grammar; since its classes are already hierarchically structured, it was relatively easy to implement them as a FLAVOR hierarchy of realization classes.</Paragraph>
      <Paragraph position="4"> * The sample of semantic structures was chosen to cover the complete list of German sentence types from a textbook [Helbig, Buscha 86]. In order to generate all of these surface forms, we had to further enrich the generable fragment with, e.g.,</Paragraph>
      <Paragraph position="5">  - infinitival complements - genitive objects - subject and object clauses.</Paragraph>
    </Section>
    <Section position="2" start_page="27" end_page="28" type="sub_section">
      <SectionTitle>
4.2 Text generation
</SectionTitle>
      <Paragraph position="0"> SEMTEX starts from raw labor market data, extracts from them a list of semantic representations as a &quot;text plan&quot; and then converts this list into texts like the following: &quot;Die Zahl der Arbeitslosen in der Bundesrepublik Deutschland ist im Dezember spürbar angestiegen.</Paragraph>
      <Paragraph position="1"> Sie hat von 2210700 auf 2347100 zugenommen. Die Arbeitslosenquote betrug Ende Dezember 9.4 Prozent.</Paragraph>
      <Paragraph position="2"> Sie hatte sich Ende Dezember des letzten Jahres auf 9.3 Prozent belaufen. Der DGB hat erklärt, er sehe in der Vergrößerung der Arbeitslosenzahl ein negatives Zeichen.&quot; (English gloss: &quot;The number of unemployed in the Federal Republic of Germany rose noticeably in December. It increased from 2210700 to 2347100. At the end of December the unemployment rate was 9.4 percent. At the end of December of last year it had amounted to 9.3 percent. The DGB has declared that it sees the increase in the number of unemployed as a negative sign.&quot;) The main concern in implementing SEMTEX has been to provide the SEMSYN generator with mechanisms that keep track of previous generation decisions, thus building up a representation of the textual context established by the sentences already uttered. This context is used: * to avoid repetition in wording, * to deliberately elide information that is still valid (e.g. about the time period concerned), * to decide on pronominalisation and other types of reference.</Paragraph>
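A minimal sketch of such a context record, assuming each message carries a referent, a full description, a pronoun, a semantic symbol and a time period. The class TextContext, plan_sentence and the message format are hypothetical; the two verbs ansteigen and zunehmen are taken from the sample text, where the second sentence varies the wording, drops the time adjunct and pronominalizes the subject.

```python
class TextContext:
    """Hypothetical record of earlier generation decisions (not SEMTEX's code)."""

    def __init__(self, synonyms):
        self.synonyms = synonyms      # semantic symbol -> candidate verb lexemes
        self.used = {}                # semantic symbol -> lexemes already uttered
        self.time = None              # time period currently in force
        self.last_subject = None      # referent of the previous sentence's subject

    def choose_verb(self, symbol):
        """Avoid repetition: prefer a lexeme not yet used for this symbol."""
        used = self.used.setdefault(symbol, [])
        fresh = [lex for lex in self.synonyms[symbol] if lex not in used]
        lexeme = fresh[0] if fresh else self.synonyms[symbol][0]
        used.append(lexeme)
        return lexeme

    def time_adjunct(self, period):
        """Return the time adjunct, or None if it is still valid and can be elided."""
        if period == self.time:
            return None
        self.time = period
        return period

    def subject(self, referent, description, pronoun):
        """Pronominalize a subject that continues the previous sentence's subject."""
        form = pronoun if referent == self.last_subject else description
        self.last_subject = referent
        return form


def plan_sentence(ctx, msg):
    return {
        "subject": ctx.subject(msg["referent"], msg["description"], msg["pronoun"]),
        "verb": ctx.choose_verb(msg["symbol"]),
        "time": ctx.time_adjunct(msg["period"]),
    }


ctx = TextContext({"INCREASE": ["ansteigen", "zunehmen"]})
msg = {"referent": "UNEMPLOYED", "description": "die Zahl der Arbeitslosen",
       "pronoun": "sie", "symbol": "INCREASE", "period": "Dezember"}
for _ in range(2):
    print(plan_sentence(ctx, msg))
# -> {'subject': 'die Zahl der Arbeitslosen', 'verb': 'ansteigen', 'time': 'Dezember'}
# -> {'subject': 'sie', 'verb': 'zunehmen', 'time': None}
```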
      <Paragraph position="3"> In addition, a representation of the temporal context is used * to dynamically determine grammatical tense and * to produce appropriate natural language descriptions for the time units mentioned [Rösner 86b].</Paragraph>
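One possible tense rule can be sketched by comparing the period a message is about with the reference period of the report. This rule is an assumption, not SEMTEX's documented procedure, though it happens to reproduce the tenses of the sample text above: events in the current period in present perfect (ist angestiegen, hat zugenommen, hat erklärt), states in the current period in simple past (betrug), and the year-ago state in past perfect (hatte sich belaufen). Periods are (year, month) tuples; choose_tense, describe_month and the reference year are hypothetical.

```python
MONTHS = ["Januar", "Februar", "März", "April", "Mai", "Juni", "Juli",
          "August", "September", "Oktober", "November", "Dezember"]


def choose_tense(period, reference, is_state):
    """Hypothetical tense rule; periods are (year, month) tuples."""
    if period == reference:
        return "simple past" if is_state else "present perfect"
    if reference > period:
        return "past perfect"
    return "future"


def describe_month(period, reference):
    """Describe a month relative to the year of the reference period."""
    year, month = period
    name = MONTHS[month - 1]
    if year == reference[0]:
        return name
    if year == reference[0] - 1:
        return f"{name} des letzten Jahres"
    return f"{name} {year}"


reference = (1986, 12)  # hypothetical reference period of the report
print(choose_tense((1985, 12), reference, is_state=True))  # -> past perfect
print(describe_month((1985, 12), reference))               # -> Dezember des letzten Jahres
```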
      <Paragraph position="4"> In the GEOTEX application, the SEMTEX text generator is combined with a tool for interactively creating geometric constructions [Kehl 86]. The latter offers formal commands for manipulating (i.e. creating, naming and, at will, deleting) basic objects of Euclidean geometry. The generator is used to produce descriptive texts related to the geometric construction: * descriptions of the geometric objects involved, * descriptions of the sequence of steps performed during a construction.</Paragraph>
      <Paragraph position="5"> Verbalizing the course of a construction: When GEOTEX describes the course of a construction in a concise and coherent text, it starts from the sequence of commands of the geometry language. Let us look at an example:</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="28" end_page="28" type="metho">
    <SectionTitle>
GEOTEX
</SectionTitle>
    <Paragraph position="0"> * to update the associated FLAVOR representation for the domain, * to display (if possible) the objects on the screen (in this case: point $A with coordinates (15, 10), point $B with coordinates (20, 7), circle $K with center $B passing through $A), * to create a message from the operation and give it as input to SEMTEX.</Paragraph>
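Since the command syntax itself is not reproduced here, the sketch below assumes hypothetical command tuples such as ("point", "$A", 15, 10) and ("circle", "$K", "$B", "$A"). It covers the first and last of the three steps: updating a dict-based domain model (in place of the FLAVOR representation) and building a message for the text generator. The display step is left out, and process_command and the message format are assumptions.

```python
def process_command(domain, command):
    """Apply one hypothetical geometry command and return a message for SEMTEX.

    `domain` maps object names to dicts and stands in for the FLAVOR model.
    """
    kind, name, *params = command
    if kind == "point":
        x, y = params
        domain[name] = {"type": "point", "coords": (x, y)}
    elif kind == "circle":
        center, through = params
        for ref in (center, through):
            if ref not in domain:
                raise KeyError(f"unknown object {ref}")
        domain[name] = {"type": "circle", "center": center, "through": through}
    else:
        raise ValueError(f"unsupported command {kind!r}")
    # Message for the text generator; this frame format is an assumption.
    return {"symbol": "CREATE",
            "roles": {":OBJECT": name, ":TYPE": kind.upper(), ":PARAMS": tuple(params)}}


domain = {}
commands = [("point", "$A", 15, 10), ("point", "$B", 20, 7), ("circle", "$K", "$B", "$A")]
messages = [process_command(domain, c) for c in commands]
print(messages[-1])
```

Checking that a circle's center and through-point already exist reflects the fact that new objects are built from known ones, the same distinction GEOTEX later exploits for constituent ordering.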
    <Paragraph position="1"> SEMTEX renders this information in the order given. For the example, this resulted in the following text: To achieve this result, SEMTEX's context-handling mechanisms have been enriched: elision is no longer restricted to adjuncts; for repetitive operations, verb and subject are elided in subsequent sentences (cf. sentences 1 and 2).</Paragraph>
    <Paragraph position="2"> The distinction between known information (i.e. known geometric objects) and new information (i.e. new objects created from known ones) is exploited to decide on constituent ordering: the constituent referring to the known object is &quot;topicalized&quot;, i.e. put at the front of the sentence (cf. sentence 3).</Paragraph>
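Both mechanisms can be sketched as small list operations: elide_repeats drops subject and verb when a clause repeats the previous operation, and linearize fronts the constituent about a known object while keeping the finite verb in second position, as German main clauses require. Both functions, their inputs (strings paired with referents) and the German example phrases are illustrative assumptions.

```python
def elide_repeats(clauses):
    """Drop subject and verb when a clause repeats the previous operation.

    `clauses` is a list of (subject, verb, object) strings.
    """
    result, previous = [], None
    for subject, verb, obj in clauses:
        if (subject, verb) == previous:
            result.append([obj])              # only the new object is uttered
        else:
            result.append([subject, verb, obj])
        previous = (subject, verb)
    return result


def linearize(verb, constituents, known):
    """Order a German main clause: topic first, finite verb second.

    `constituents` holds (text, referent) pairs in default order; the first
    one whose referent is in `known` is topicalized (default: the first one).
    """
    topic_index = next(
        (i for i, (_, ref) in enumerate(constituents) if ref in known), 0)
    rest = constituents[:topic_index] + constituents[topic_index + 1:]
    return [constituents[topic_index][0], verb] + [text for text, _ in rest]


print(" ".join(linearize("liegt", [("der Punkt $C", "$C"),
                                   ("auf dem Kreis $K", "$K")], {"$K"})))
# -> auf dem Kreis $K liegt der Punkt $C
```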
    <Paragraph position="3"> In addition, the system offers more ways of referring to objects introduced in the text: pronouns, textual deixis using demonstratives (&quot;dieser Punkt&quot;, 'this point') and names. The choice is made deliberately: pronouns are avoided if their use might create an ambiguity; reference by name is used when an object has not been constantly in focus and therefore has to be reintroduced.</Paragraph>
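The reference rules can be sketched as a small decision function. It assumes each object is described by a name, a head noun and a grammatical gender, and that the objects in focus in the previous sentence are known. Returning a demonstrative when the pronoun would be ambiguous but the head noun is not is our own hypothetical refinement of the rules stated above, and only nominative forms are covered.

```python
PRONOUN = {"m": "er", "f": "sie", "n": "es"}                  # nominative singular
DEMONSTRATIVE = {"m": "dieser", "f": "diese", "n": "dieses"}  # nominative singular


def choose_reference(obj, in_focus):
    """Choose a nominative referring expression for `obj`.

    `obj` and the elements of `in_focus` are dicts with "name", "noun" and
    "gender"; `in_focus` lists the objects focused in the previous sentence.
    """
    if obj["name"] not in {o["name"] for o in in_focus}:
        return obj["name"]                    # not in focus: reintroduce by name
    rivals = [o for o in in_focus
              if o["name"] != obj["name"] and o["gender"] == obj["gender"]]
    if not rivals:
        return PRONOUN[obj["gender"]]         # unambiguous pronoun
    if all(o["noun"] != obj["noun"] for o in rivals):
        return f"{DEMONSTRATIVE[obj['gender']]} {obj['noun']}"  # e.g. "dieser Punkt"
    return obj["name"]                        # pronoun and demonstrative both ambiguous


point_a = {"name": "$A", "noun": "Punkt", "gender": "m"}
circle_k = {"name": "$K", "noun": "Kreis", "gender": "m"}
print(choose_reference(point_a, [point_a, circle_k]))  # -> dieser Punkt
```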
  </Section>
  <Section position="9" start_page="28" end_page="29" type="metho">
    <SectionTitle>
5 SEMSYN's Software Environment
</SectionTitle>
    <Paragraph position="0"> SEMSYN's generation system has been implemented on a SYMBOLICS Lisp machine. During the implementation we aimed at utilizing as much of the functionality of this machine as possible in order to get optimal support for our work. We have built up an environment of linguistic and software tools that, though designed for our project's purposes, may be - at least in part - of interest to other projects in MT and in CL in general (footnote 2).</Paragraph>
    <Section position="1" start_page="28" end_page="29" type="sub_section">
      <SectionTitle>
5.1 Interface tools:
</SectionTitle>
      <Paragraph position="0"> This comprises all software that provides easy and comfortable communication with the system (even for casual users).</Paragraph>
      <Paragraph position="1"> SEMSYN's user interface is centered around SEMNET-GRAPHICS, a tool for visualizing semantic nets - the starting point of the generation - as mouse-sensitive graphics [Rösner 86b]. The graphical representation is embedded in an interface &quot;frame&quot; [Weinreb, Moon 81] whose &quot;panes&quot; display various intermediate structures - depending on the user's chosen &quot;frame configuration&quot; - and the generation result. (Footnote 2: These tools are best illustrated by an interactive demo.)</Paragraph>
    </Section>
    <Section position="2" start_page="29" end_page="29" type="sub_section">
      <SectionTitle>
5.2 Experimentation tools:
</SectionTitle>
      <Paragraph position="0"> These tools extend the capabilities of the user interface and are intended to enable and support experiments with the system.</Paragraph>
      <Paragraph position="1"> SEMNET-EDIT is a tool for experimenting with the generator by interactively editing semantic nets [Kehl 85]: * modification of given semantic nets, * creation of semantic nets from scratch, * generation of German from created or modified semantic nets and/or their subnets. Experimentation tools of this type are useful not only for debugging and system improvement; they have also proved very helpful as a convenient way of introducing users to the system's capabilities and limitations.</Paragraph>
    </Section>
    <Section position="3" start_page="29" end_page="29" type="sub_section">
      <SectionTitle>
5.3 Lexicon tools:
</SectionTitle>
      <Paragraph position="0"> In every realistic application, dictionaries play an important role as a body of linguistic knowledge; the need for support in maintaining and updating them seems obvious.</Paragraph>
      <Paragraph position="1"> SEMSYSTEM uses two types of dictionaries: a single German root form dictionary (with morpho/syntactic information) for the generator front end, and so-called &quot;realization dictionaries&quot; that relate semantic symbols to German lexical items (root forms of verbs, nouns, adjectives, etc.) and that may vary between applications of the generator. For both types of lexica there are window- and menu-based maintenance tools.</Paragraph>
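The division of labour between the two dictionary types can be sketched with Python dicts: one application-independent root-form dictionary for the front end and one realization dictionary per application. All entries and function names are hypothetical (the auxiliaries of ansteigen and zunehmen match the sample text's ist angestiegen and hat zugenommen), and check_consistency only hints at what a maintenance tool might verify.

```python
# Front-end root-form dictionary: root -> morpho-syntactic information (hypothetical entries).
ROOT_FORMS = {
    "Zahl": {"cat": "noun", "gender": "f", "plural": "Zahlen"},
    "Punkt": {"cat": "noun", "gender": "m", "plural": "Punkte"},
    "Kreis": {"cat": "noun", "gender": "m", "plural": "Kreise"},
    "ansteigen": {"cat": "verb", "particle": "an", "aux": "sein"},
    "zunehmen": {"cat": "verb", "particle": "zu", "aux": "haben"},
}

# One realization dictionary per application: semantic symbol -> root form.
REALIZATION = {
    "semtex": {"NUMBER": "Zahl", "INCREASE": "ansteigen"},
    "geotex": {"POINT": "Punkt", "CIRCLE": "Kreis"},
}


def lexicalize(application, symbol):
    """Map a semantic symbol to a root form plus its front-end entry."""
    root = REALIZATION[application][symbol]
    return root, ROOT_FORMS[root]


def check_consistency():
    """List realization entries whose root form is missing from ROOT_FORMS."""
    return [(app, symbol, root)
            for app, table in REALIZATION.items()
            for symbol, root in table.items()
            if root not in ROOT_FORMS]


print(lexicalize("semtex", "INCREASE"))
print(check_consistency())  # -> []
```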
    </Section>
  </Section>
  <Section position="10" start_page="29" end_page="29" type="metho">
    <SectionTitle>
6 Prospects: From mono- to multilingual generation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="29" end_page="29" type="sub_section">
      <SectionTitle>
6.1 Teaching English to the system
</SectionTitle>
      <Paragraph position="0"> In a recent experiment (footnote 3), we changed and extended our generator system in such a way that - using the same representation for the different domains - the texts of SEMTEX and GEOTEX may be produced in English as well.</Paragraph>
      <Paragraph position="1"> (Footnote 3: This work is done in collaboration with Odyssey Research Associates, Ithaca, N.Y.)</Paragraph>
      <Paragraph position="2"> An example text produced by the system for the newspaper application: Increase in the number of unemployed. NURNBERG/BONN (cpa) DECEMBER 5,85. The number of unemployed in West Germany has increased slightly during November.</Paragraph>
      <Paragraph position="3"> It has increased from 2148800 by 61900 to 2210700. At the end of November the unemployment rate had a value of 8.8 percent. At the end of the year-ago period it had a value of 8.7 percent. Gerd Muhr, the speaker of the DGB, declares, it sees a bad sign in the increase in the number of unemployed.</Paragraph>
      <Paragraph position="4"> French will be the next target language; we have started to prepare the morphological and syntactic data for such an experiment.</Paragraph>
    </Section>
  </Section>
</Paper>