<?xml version="1.0" standalone="yes"?> <Paper uid="W98-1432"> <Title>SYSTEM DEMONSTRATION OVERVIEW OF GBGEN*</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> SYSTEM DEMONSTRATION OVERVIEW OF GBGEN* </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="290" type="metho"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> This paper presents an overview of the GBGen system, a sentence realizer currently being developed for French. The system is strictly deterministic, i.e., it maps semantic structures to surface forms without simulating parallelism or using backtracking, and its performance is accordingly very satisfactory. It is procedural and based on Government &amp; Binding Theory (Chomsky 1981): several levels of syntactic representation are defined, on which configurational searches and transformations apply.</Paragraph> <Paragraph position="1"> GBGen is large-scale, based on a lexicon of approximately 185,000 entries (roughly 24,000 lexemes together with their inflected word forms). The system covers simple and complex sentences, and complex grammatical phenomena such as unbounded dependencies, raising and control structures, intrasentential coreference, cliticization, modifiers (both clausal and prepositional), and the main cases of coordination. It also handles several morphosyntactic phenomena such as agreement, contractions, and pronoun lexicalization. In what follows, we present the general characteristics of the software and detail its major components.</Paragraph> <Paragraph position="2"> 2 Overview of the system. Two main components form the GBGen system: the pseudo-semantic component, which defines the semantic input of the generation process, and the syntactic component, which produces a sentence (in written or spoken format) from the pseudo-semantic specifications. 
We describe the main aspects of these components in the following sections.</Paragraph> <Paragraph position="4"> The input of the generation process is dubbed Pseudo-Semantics. A pseudo-semantic structure (PSS) contains both lexical and abstract information (hence the term pseudo). A PSS can be one of the following four semantic objects: CLS, DPS, SLS, or CHS.</Paragraph> <Paragraph position="5"> CLSs (clause structures) represent events and states. They contain a predicate (usually a verb or an adjective), functional information such as Tense and Aspect, and other PSS objects that participate in the interpretation of the CLS (e.g., elements bearing the thematic roles assigned by the predicate). DPSs (DP structures) semantically characterize noun phrases. *The GBGen project is supported by grant no. 12-50797.97 from the Fonds National Suisse pour la Recherche Scientifique. We are grateful to the members of the LATL, especially Eric Wehrli and Christopher Laenzlinger, for comments and feedback during the development of the system. Special thanks are due to Juri Mengon, whose work since he joined the project has led to major developments of the system.</Paragraph> <Paragraph position="6"> DPSs consist of a nominal Property along with a semantic Operator, phi-features, and a referential index used for Binding resolution. SLSs (Semantic Label Structures) consist of a semantic label/function and an associated PSS. Roughly, these objects are used to characterize thematic-role-bearing elements, modifiers, or the semantic function of adverbs and adjectives. Finally, CHSs (Characteristic Structures) are used to represent adjectives and adverbs. 
All these elements can be combined to obtain the desired semantic representation, but they can also be used autonomously (a useful property when pseudo-semantics is used for machine translation).</Paragraph> <Paragraph position="7"> As an illustration, the (slightly simplified) PSS for the sentence (1a) is (1b): Let us briefly detail the components of the above PSS. The main object is a CLS with the predicate kill. Tense is represented through a modified version of Reichenbach's analysis ([Reichenbach 47]), where E is the event time point and S the speech time point, the two points being either equal or ordered by a precedence relation. Combining Tense with non-lexical aspect (progressive, perfective) yields the verbal tenses. The other functional information states that the sentence to be generated is declarative, positive, and passive. The other elements that form part of the event are listed, in no particular order, in the Satellites list. The first one is an SLS with a thematic role Theme and a DPS bearing this role. The DPS has a lexical Property dog and an Operator some_individual (the interpretation of DPSs follows the generalized quantifiers analysis, see [Barwise &amp; Cooper 81]). A CHS appears in the Satellites list of the DPS, restricting the set denotation of the property. The second SLS in the above representation contains a semantic label Eval truth and an &quot;adverbial&quot; CHS. The label states that the semantic function of the CHS is to evaluate the truth of the statement expressed in the CLS. Finally, a spatial SLS is present in the Satellites list, with a spatial label In and a DPS with a lexical Property bed and an Operator demonstrative.</Paragraph> <Paragraph position="8"> Notice, to conclude this section, that PSSs are not syntactic in nature. They are unordered, closed-class elements are represented abstractly, and recursion in these structures represents no more than minimal semantic scope. 
Hence, the efficiency of the system does not come from the fact that the input contains syntactic information, but rather from the way syntactic realization is performed.</Paragraph> <Section position="1" start_page="289" end_page="290" type="sub_section"> <SectionTitle> 2.2 Syntactic Component </SectionTitle> <Paragraph position="0"> The syntactic processing has three main steps. First, we map the pseudo-semantic information into a D-structure. This is achieved by the projection subcomponent. Briefly, each element of the PSS which has a categorial feature (X=V,N,...) is mapped into a local tree, as in (2):</Paragraph> <Paragraph position="2"> The Head/Projection distinction should be seen as a convenient presentational device. Actually, a Projection is a record of the properties of the lexical item. Thus, XPs can be combined into larger structures by using properties of heads (e.g., subcategorization). Spec and Compl are ordered lists which serve to combine all the subtrees created in the projection component, according to the properties of the subtrees. To give a concrete example, assuming the PSS in (1b), the system creates the D-structure in (3): (3) [CP [TP {past} [VP [AdvP probably] [V kill (perf.; passive)] [DP a [NP [AdjP big] dog]] [AdvP [PP in [DP this [NP bed]]]]]]] CP is the top node of each sentence and always takes a TP as its complement, which contains tense information and, in most cases, the subject of the clause (in the example, a passive sentence, the subject is omitted). VP contains the verb, so-called VP-adverbs in its Spec list, and complements/adjuncts in its Compl list. In our example, the latter list contains the theme noun phrase and an adjunct (marked with an AdvP), which is the prepositional phrase in this bed. 
Noun phrases are formed with an NP, which contains the noun, its complements and adjectives, and a DP, the projection of determiners, which subcategorizes for NPs.</Paragraph> <Paragraph position="3"> Movement and Binding algorithms apply once the D-structure has been created. They consist merely of searches in the tree, and the movement operation is the generic Move α instruction familiar to GB practitioners. In this respect, syntactic processing in the system is configurational. Going back to our example, the object of the passive verbal form is moved to the first (Spec of) TP with finite T, leaving a coindexed empty category, and the resulting S-structure is the following:</Paragraph> <Paragraph position="5"> Finally, we apply the morphological procedure, which computes agreement, selects the correct inflected verb forms, and treats other phenomena such as determiner contraction or pronoun lexicalization. In our simple example, we would obtain the final sentence in (1a).</Paragraph> </Section> </Section> <Section position="3" start_page="290" end_page="290" type="metho"> <SectionTitle> 3 Concluding Remarks </SectionTitle> <Paragraph position="0"> GBGen is written in Modula-2, developed under OpenVMS on a DEC-Alpha system, and also runs on PC-Windows. The system is being used (or will be used in the near future) in the following systems/projects: * ITS3, a multilingual machine translation system [Etchegoyhen &amp; Wehrli 98]. This system uses the IPS parser ([Wehrli 92]) to parse English, French, German, or Italian inputs and GBGen to generate the target-language output. The French-to-French version of the system, used as a test tool for GBGen, is available on the web. * CSTAR-II speech-to-speech machine translation project. The aim of the project is to produce online translation of dialogs in the domain of hotel reservation and travel information. GBGen takes as input the interlingua developed for the project and produces French spoken output. * GENE. 
This is the interactive version of GBGen, in which the user interactively creates pseudo-semantic inputs. The system will soon be part of the SAFRAN project ([Hamel &amp; Wehrli 97], [Hamel &amp; Vandeventer 98]), a toolbox for computer-assisted language learning. We have presented an overview of GBGen, a large-scale, domain-independent syntactic generator. At present, the system covers a large part of French grammar and deals with complex grammatical phenomena in a highly efficient way. The system is also strongly generic, which means that its extension to other languages should not require major changes in the procedures. A tentative adaptation to English generation has shown that the system needs only small parametric variations in the procedures to generate the major constructions of that language. Given the promising results of the approach to surface realization we have chosen, we will pursue the development of the GBGen system by extending its grammatical coverage and adding several languages to it.</Paragraph> </Section> </Paper>