<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0409"> <Title>Tactical Generation in a Free Constituent Order Language</Title> <Section position="1" start_page="0" end_page="81" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> This paper describes tactical generation in Turkish, a free constituent order language, in which the order of the constituents may change according to the information structure of the sentences to be generated. In the absence of any information regarding the information structure of a sentence (i.e., topic, focus, background, etc.), the constituents of the sentence obey a default order, but the order is almost freely changeable, depending on the constraints of the text flow or discourse. We have used a recursively structured finite state machine for handling the changes in constituent order, implemented as a right-linear grammar backbone. Our implementation environment is the GenKit system, developed at the Center for Machine Translation, Carnegie Mellon University. Morphological realization has been implemented using an external morphological analysis/generation component which performs concrete morpheme selection and handles morphographemic processes.</Paragraph> <Paragraph position="1"> Introduction Natural Language Generation is the operation of producing natural language sentences using specified communicative goals. This process consists of three main kinds of activities (McDonald, 1987): * the goals the utterance is to obtain must be determined, * the way the goals may be obtained must be planned, * the plans should be realized as text.</Paragraph> <Paragraph position="2"> Tactical generation is the realization, as linear text, of content that is usually specified as some kind of feature structure generated by a higher level process such as text planning, or transfer in machine translation applications. 
In this process, a generation grammar and a generation lexicon are used.</Paragraph> <Paragraph position="3"> As a component of a large scale project on natural language processing for Turkish, we have undertaken the development of a generator for Turkish sentences. In order to implement the variations in the constituent order dictated by various information structure constraints, we have used a recursively structured finite state machine instead of enumerating grammar rules for all possible word orders. A second reason for this approach is that many constituents, especially the arguments of verbs, are typically optional, and dealing with such optionality within rules proved to be rather problematic. Our implementation is based on the GenKit environment developed at the Center for Machine Translation, Carnegie Mellon University. GenKit supports writing a context-free backbone grammar along with feature structure constraints on the non-terminals.</Paragraph> <Paragraph position="4"> The paper is organized as follows: The next section presents relevant aspects of constituent order in Turkish sentences and the factors that determine it. We then present an overview of the feature structures for representing the contents and the information structure of these sentences, along with the recursive finite state machine that generates the proper order required by the grammatical and information structure constraints. Later, we give the highlights of the generation grammar architecture along with some example rules and sample outputs. We then present a discussion comparing our approach with similar work on Turkish generation, and conclude with some final comments.</Paragraph> <Paragraph position="5"> Turkish In terms of word order, Turkish can be characterized as a subject-object-verb (SOV) language in which constituents at certain phrase levels can change order rather freely, depending on the constraints of text flow or discourse. 
The morphology of Turkish enables morphological markings on the constituents to signal their grammatical roles without relying on their order. This, however, does not mean that word order is immaterial. Sentences with different word orders reflect different pragmatic conditions, in that the topic, focus and background information conveyed by such sentences differ.¹ Information conveyed through intonation, stress and/or clefting in fixed word order languages such as English is expressed in Turkish by changing the order of the constituents. There are, of course, certain constraints on constituent order, especially inside noun and postpositional phrases. There are also certain constraints at sentence level when explicit case marking is not used (e.g., with indefinite direct objects).</Paragraph> <Paragraph position="6"> In Turkish, the information which links the sentence to the previous context, the topic, is in the first position. The information which is new or emphasized, the focus, is in the immediately preverbal position, and the extra information which may be given to help the hearer understand the sentence, the background, is in the postverbal position (Erguvanlı, 1979).</Paragraph> <Paragraph position="7"> The topic, focus and background information, when available, alter the order of constituents of Turkish sentences. In the absence of any such control information, the constituents of Turkish sentences have the default order: subject, expression of time, expression of place, direct object, beneficiary, source, goal, location, instrument, value designator, path, duration, expression of manner, verb.</Paragraph> <Paragraph position="8"> All of these constituents except the verb are optional unless the verb obligatorily subcategorizes for a specific lexical item as an object in order to convey a certain (usually idiomatic) sense. The definiteness of the direct object adds a minor twist to the default order. 
If the direct object is an indefinite noun phrase, it has to be immediately preverbal. This is because neither the subject nor the indefinite direct object has surface case-marking that distinguishes it from the other, so word order constraints come into play to force this distinction.</Paragraph> <Paragraph position="9"> ¹ See Erguvanlı (1979) for a discussion of the function of word order in Turkish grammar.</Paragraph> <Paragraph position="10"> In order to present the flavor of word order variations in Turkish, we provide the following examples. These two sentences are used to describe the same event (i.e., have the same logical form), but they are used in different discourse situations. The first sentence presents constituents in a neutral default order, while in the second sentence 'bugün' (today) is the topic and 'Ahmet' is the focus:²</Paragraph> <Paragraph position="12"> a. Ahmet bugün evden okula otobüsle 3 dakikada gitti.</Paragraph> <Paragraph position="13"> Ahmet today home+ABL school+DAT bus+WITH 3 minute+LOC go+PAST+3SG 'Ahmet went from home to school by bus in 3 minutes today.'</Paragraph> <Paragraph position="14"> b. Bugün evden okula otobüsle today home+ABL school+DAT bus+WITH 'It was Ahmet who went from home to</Paragraph> </Section> </Paper>