<?xml version="1.0" standalone="yes"?>
<Paper uid="W94-0314">
  <Title>Generating Context-Appropriate Word Orders in Turkish</Title>
  <Section position="2" start_page="0" end_page="117" type="metho">
    <SectionTitle>
2 Free Word Order in Turkish
</SectionTitle>
    <Paragraph position="0"> The most common word order in simple transitive sentences in Turkish is SOV (Subject-Object-Verb), but all six permutations of a transitive sentence can be used in the proper discourse situation, since the subject and object are differentiated by case-marking. 1 * I would like to thank Mark Steedman and the anonymous referees for their valuable advice. This work was partially supported by DARPA N00014-90-J-1863, ARO DAAL03-89-C-0031, NSF IRI 90-16592, Ben Franklin 91S.3078C-1.</Paragraph>
    <Paragraph position="1"> 1 According to a language acquisition study in (Slobin-82), 52% of transitive sentences used by a sample of Turkish speakers were not in the canonical SOV word order.</Paragraph>
    <Paragraph position="2"> (1) a. Ayşe Fatma'yı arıyor.</Paragraph>
    <Paragraph position="3"> Ayşe Fatma-Acc seek-Pres-(3Sg).</Paragraph>
    <Paragraph position="4"> "Ayşe is looking for Fatma." b. Fatma'yı Ayşe arıyor.</Paragraph>
    <Paragraph position="5"> c. Ayşe arıyor Fatma'yı.</Paragraph>
    <Paragraph position="6"> d. Fatma'yı arıyor Ayşe.</Paragraph>
    <Paragraph position="7"> e. Arıyor Fatma'yı Ayşe.</Paragraph>
    <Paragraph position="8"> f. Arıyor Ayşe Fatma'yı.</Paragraph>
    <Paragraph position="9">  The propositional interpretation assigned to all six of these sentences is seek'(Ayşe',Fatma'). However, each word order conveys a different discourse meaning only appropriate to a specific discourse situation. The one propositional interpretation cannot capture the distinctions in meaning necessary for effective translation and communication in Turkish. The interpretations of these different word orders rely on discourse-related notions such as theme/rheme, focus/presupposition, topic/comment, etc. that describe how the sentence relates to its context.</Paragraph>
    <Paragraph position="10"> There is little agreement on how to represent the discourse-related functions of components in the sentence, i.e. the information structure of the sentence. Among Turkish linguists, Erguvanlı (Erguvanli-84) captures the general use of word order by associating each position in a Turkish sentence with a specific pragmatic function. Generally in Turkish, speakers first place the information that links the sentence to the previous context, then the important and/or new information immediately before the verb, and the information that is not really needed but may help the hearer understand the sentence better, after the verb. Erguvanlı identifies the sentence-initial position as the topic, the immediately preverbal position as the focus, and the postverbal positions as backgrounded information. The following template that I will be using in the implementation describes the general association between word order and information structure components (in bold font) for Turkish sentences: (2) Topic Neutral Focus Verb Background</Paragraph>
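    <Paragraph> The positional template in (2) can be sketched in code. The following Python fragment is a minimal illustration, not the implementation described in this paper: the function name label_by_position and the gloss-style tokens are hypothetical, and the sketch assumes that a lone preverbal element is treated as the focus rather than the topic.

```python
def label_by_position(words, verb_index):
    """Label each word with a template (2) component by its position.

    Assumption: positions alone decide the labels; familiarity status
    and intonation are ignored in this sketch.
    """
    labels = []
    for i, word in enumerate(words):
        if i == verb_index:
            labels.append((word, "verb"))
        elif i > verb_index:
            # Everything after the verb is backgrounded information.
            labels.append((word, "background"))
        elif i == verb_index - 1:
            # The immediately preverbal position is the focus.
            labels.append((word, "focus"))
        elif i == 0:
            # The sentence-initial position is the topic.
            labels.append((word, "topic"))
        else:
            labels.append((word, "neutral"))
    return labels


# Glosses of the OSV order "Fatma-Acc Ayse seek-Pres" from (1)b.
osv = label_by_position(["Fatma-Acc", "Ayse", "seek-Pres"], 2)

# Glosses of the SVO order "Ayse seek-Pres Fatma-Acc" from (1)c.
svo = label_by_position(["Ayse", "seek-Pres", "Fatma-Acc"], 1)
```

In the OSV order the object is labelled as the topic and the subject as the focus; in the SVO order the postverbal object is labelled as background.</Paragraph>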
    <Section position="1" start_page="117" end_page="117" type="sub_section">
      <SectionTitle>
7th International Generation Workshop * Kennebunkport, Maine * June 21-24, 1994
</SectionTitle>
      <Paragraph position="0"> I will call the phrase formed by the topic and the neutral components the theme of the sentence and the phrase formed by the focus and the verb, the rheme of the sentence.</Paragraph>
      <Paragraph position="1"> Using these information structure components, we can now explain why certain word orders are appropriate or inappropriate in a certain context. For example, a speaker may use the SOV order in (3)b because in that context, the speaker wants to focus the new object, Ahmet, and so places it in the immediately preverbal position. However, in (4)b, Ahmet is the topic or a link to the previous context whereas the subject, Fatma, is the focus, and thus the OSV word order is used. Here, we translate these Turkish sentences to English using different "stylistic" constructions (e.g. topicalization, it-clefts, phonological focusing etc.) in order to preserve approximately the same meanings.</Paragraph>
      <Paragraph position="2"> (3) a. Fatma kimi arıyor? Fatma who seek-Pres? "Who is Fatma looking for?"</Paragraph>
      <Paragraph position="3"> b. Fatma Ahmet'i arıyor. SOV Fatma Ahmet-Acc seek-Pres. "Fatma is looking for AHMET."</Paragraph>
      <Paragraph position="4"> (4) a. Ahmet'i kim arıyor? Ahmet-Acc who seek-Pres? "Who is looking for Ahmet?"</Paragraph>
      <Paragraph position="5"> b. Ahmet'i Fatma arıyor. OSV Ahmet-Acc Fatma seek-Pres. "As for Ahmet, it is FATMA who is looking for him."</Paragraph>
      <Paragraph position="6"> It is very common for Turkish speakers to put information already mentioned in the discourse, i.e. discourse-given, in the post-verbal positions, in the background component of the information structure. In fact, discourse-new elements cannot occur in the postverbal positions.</Paragraph>
      <Paragraph position="7"> In addition, referential status, i.e. whether the speaker uses a full noun phrase, an overt pronoun, or a null pronoun to refer to an entity in the discourse, can be used to signal familiarity information to the hearer. Thus, given information can be freely dropped (5)b1 or placed in post-verbal positions (5)b2 in Turkish. Although further research is required on the interaction between referential status and word order, I will not concentrate on this issue in this paper.  (5) a.</Paragraph>
      <Paragraph position="8"> b1.</Paragraph>
      <Paragraph position="9">  The same information structure components (topic, focus, background) can also explain the positioning of adjuncts in Turkish sentences. For example, placing a locative phrase in different positions in a sentence results in different discourse meanings, much as in English sentences:  "Fatma looked for Ahmet, in Istanbul." Long distance scrambling, i.e. word order permutation involving more than one clause, is also possible out of most embedded clauses in Turkish; in complex sentences, elements of the embedded clauses can occur in matrix clause positions. However, speakers use these word orders with long distance dependencies only for specific pragmatic functions. Generally, an element from the embedded clause can occur in the sentence-initial topic position of the matrix clause, as in (7)b, or to the right of the matrix verb as backgrounded information, as in (7)c. 2  (7) a.</Paragraph>
    </Section>
  </Section>
  <Section position="3" start_page="117" end_page="121" type="metho">
    <SectionTitle>
3 The Categorial Formalism
</SectionTitle>
    <Paragraph position="0"> Many different syntactic theories have been proposed to deal with free word order variation. It has been widely debated whether word order variation is the result of stylistic rules, the result of syntactic movement, or base-generated. I adopt a categorial framework in which the word order variations in Turkish are pragmatically driven; this lexicalist framework is not compatible with transformational movement rules.</Paragraph>
    <Section position="1" start_page="118" end_page="118" type="sub_section">
      <SectionTitle>
</SectionTitle>
      <Paragraph position="0"> 2 I have put in coindexed traces and italicized the scrambled elements in these examples to help the reader; I am not making the syntactic claim that these traces actually exist.</Paragraph>
      <Paragraph position="1"> My work is influenced by (Steedman-91) in which a theory of prosody, closely related to a theory of information structure, is integrated with Combinatory Categorial Grammars (CCGs). Often intonational phrase boundaries do not correspond to traditional phrase structure boundaries. However, by using the CCG type-raising and composition rules, CCG formalisms can produce nontraditional syntactic constituents which may match the intonational phrasing.</Paragraph>
      <Paragraph position="2"> These intonational phrases often correspond to a unit of planning or presentation with a single discourse function, much like the information structure components of topic, neutral, focus, and background in Turkish sentences. Thus, the ambiguity that CCG rules produce is not spurious, but in fact, necessary to capture prosodic and pragmatic phrasing. The surface structure of a sentence in CCGs can directly reflect its information structure, so that different derivations of the same sentence correspond to different information structures.</Paragraph>
      <Paragraph position="3"> In the previous section, we saw that the ordering of constituents in Turkish sentences depends on pragmatic functions, i.e. the information structure of the sentence, rather than on the argument structure of the sentence as in English. Moreover, information structure is distinct from argument structure in that adjuncts and elements from embedded clauses can serve a pragmatic function in the matrix sentence and thus be a component of the information structure without taking part in the argument structure of the matrix sentence. This suggests an approach where the ordering information, which is dependent on the information structure, is separated from the argument structure of the sentence.</Paragraph>
      <Paragraph position="4"> In section 3.1, I describe a version of CCGs adapted for free word order languages in (Hoffman-92) to capture the argument structure of Turkish, while producing a flexible surface structure and word order. In addition, each CCG constituent is associated with a pragmatic counterpart, described in section 3.2, that contains the context-dependent word order restrictions.</Paragraph>
    </Section>
    <Section position="2" start_page="118" end_page="119" type="sub_section">
      <SectionTitle>
3.1 {}-CCG
</SectionTitle>
      <Paragraph position="0"> Multi-set Combinatory Categorial Grammars, {}-CCGs, (Hoffman-92) are a version of CCGs for free word order languages in which the subcategorization information associated with each verb does not specify the order of the arguments. Each verb is assigned a function category in the lexicon which specifies a multiset of arguments, so that it can combine with its arguments in any order. For instance, a transitive verb has the category S|{Nn, Na}, which defines a function looking for a set of arguments, a nominative case noun phrase (Nn) and an accusative case noun phrase (Na), and resulting in the category S, a complete sentence, once it has found these arguments. Some phrase structure information is lost by representing a verb as a function with a set of arguments. However, this category is also associated with a semantic interpretation.</Paragraph>
      <Paragraph position="1"> For instance, the verb "see" could have the following category, where the hierarchical information among the arguments is expressed within the semantic interpretation, separated from the syntactic representation by a colon: S : see(X,Y)|{Nn : X, Na : Y}. This category can easily be transformed into a DAG representation like the following, where coindices, x and y, are indicated by italicized font. 3  We can modify the CCG application rules for functions with sets as follows. The sets indicated by braces in these rules are order-free, i.e. Y in the following rules can be any element in the set. Functions can specify a direction feature for each of their arguments, notated in the rules as an arrow above the argument. 4 We assume that a category X|{ }, where { } is the empty set, rewrites by a clean-up rule to just X.</Paragraph>
      <Paragraph position="3"> Using these new rules, a verb can apply to its arguments in any order. For example, the following is a derivation of a sentence with the word order ... 3 ... arguments in the set can be associated with feature labels which indicate their category and case.</Paragraph>
      <Paragraph position="4"> 4 Since Turkish is not strictly verb-final, most verbs will not specify the direction features of their arguments.</Paragraph>
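      <Paragraph> The order-free application of a verb to its argument multiset can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the formalism's actual implementation: a category is modelled as a pair of a result and a collections.Counter, direction features and semantic interpretations are ignored, and the names apply_to, derive, and TRANSITIVE are hypothetical.

```python
from collections import Counter
from itertools import permutations


def apply_to(category, argument):
    """Apply a function category to one argument from its multiset."""
    result, wanted = category
    if wanted[argument] == 0:
        raise ValueError(f"{argument} is not in the argument set")
    remaining = wanted - Counter([argument])
    # Clean-up rule: a category X|{} with an empty set rewrites to X.
    return (result, remaining) if remaining else result


def derive(category, arguments):
    """Feed the arguments to the verb category one at a time."""
    for argument in arguments:
        if isinstance(category, str):
            raise ValueError("the category is already saturated")
        category = apply_to(category, argument)
    return category


# Transitive verb S|{Nn, Na}: nominative and accusative noun phrases.
TRANSITIVE = ("S", Counter(["Nn", "Na"]))

# Collect the outcome of applying the verb to each argument order.
outcomes = {derive(TRANSITIVE, order) for order in permutations(["Nn", "Na"])}
```

Because the arguments are held in a multiset rather than a list, both argument orders reach the complete sentence category S, and a partial application leaves a function over the remaining arguments.</Paragraph>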
    </Section>
    <Section position="3" start_page="119" end_page="119" type="sub_section">
      <SectionTitle>
</SectionTitle>
      <Paragraph position="0"> Instead of using the set notation, we could imagine assigning Turkish verbs multiple lexical entries, one for each possible word order permutation; for example, a transitive verb could be assigned the categories S\Nn\Na, S\Na\Nn, S\Na/Nn, etc., instead of the one entry S|{Nn, Na}. However, we will see below that the set notation is more than a shorthand representing multiple entries because it allows us to handle long distance scrambling, i.e. permutations involving more than one clause, as well.</Paragraph>
      <Paragraph position="1"> The following composition rules are proposed to combine two functions with set-valued arguments, e.g. two verbs.</Paragraph>
      <Paragraph position="3"> These composition rules allow two verb categories with sets of arguments to combine together. For example, (12) go-gerund-acc knows.</Paragraph>
      <Paragraph position="4"> S~o : go(y)|{Ng : y}    S : know(x,p)|{Nn : x, S.. : p}    &lt;B    S : know(x, go(y))|{Ng : y, Nn : x} As the two verbs combine, their arguments collapse into one argument set in the syntactic representation. However, the verbs' respective arguments are still distinct within the semantic representation of the sentence. The predicate-argument structure of the subordinate clause is embedded into the semantic representation of the matrix clause. Long distance scrambling can easily be handled by first composing the verbs together to form a complex verbal function which can then apply to all of the arguments in any order.</Paragraph>
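      <Paragraph> The way two verb categories collapse their arguments into one set can be sketched as follows. This is a minimal Python illustration, not the composition rules themselves: it ignores direction features and semantics, the function name compose is hypothetical, and S_emb is a placeholder label for the result category of the embedded clause.

```python
from collections import Counter


def compose(matrix, embedded):
    """Compose a matrix verb with the embedded verb it selects.

    The matrix verb's clausal argument is replaced by the embedded
    verb's own arguments, giving one merged argument multiset.
    """
    matrix_result, matrix_args = matrix
    embedded_result, embedded_args = embedded
    if matrix_args[embedded_result] == 0:
        raise ValueError("the matrix verb does not select this clause")
    merged = (matrix_args - Counter([embedded_result])) + embedded_args
    return (matrix_result, merged)


# Embedded verb "go": looks for a genitive subject (Ng).
GO = ("S_emb", Counter(["Ng"]))

# Matrix verb "know": looks for a nominative subject and a clause.
KNOW = ("S", Counter(["Nn", "S_emb"]))

# Complex verbal function over the arguments of both verbs.
COMPLEX_VERB = compose(KNOW, GO)
```

The composed category looks for Ng and Nn in a single set, so an argument of the embedded verb can be found in any position relative to the matrix arguments, which is how the formalism accommodates long distance scrambling.</Paragraph>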
      <Paragraph position="5"> Certain coordination constructions (such as 'SO and SOV' seen in (13) as well as 'SOV and SO') can be handled in a CCG based formalism by type-raising NPs into functions over verbs. Two type-raised noun phrases can combine together using the composition rules to form a nontraditional constituent which can then coordinate. (13) Ayşe kitabı, Fatma da gazeteyi okuyor.</Paragraph>
      <Paragraph position="6"> Ayşe book-acc, Fatma too newspaper-acc reads.</Paragraph>
      <Paragraph position="7"> "Ayşe is reading the book and Fatma the newspaper." Order-preserving type-raising rules that are modified for {}-CCGs are used to convert nouns in the grammar into functors over the verbs. These rules can be obligatorily activated in the lexicon when case-marking morphemes attach to the noun stems.</Paragraph>
      <Paragraph position="8">  The first rule indicates that a noun in the presence of a case morpheme becomes a functor looking for a verb on its right; this verb is also a functor looking for the original noun with the appropriate case on its left. After the noun functor combines with the appropriate verb, the result is a functor which is looking for the remaining arguments of the verb. The notation ... is a variable which can unify with one or more elements of a set. The second type-raising rule indicates that a case-marked noun is looking for a verb on its left. {}-CCGs can model a strictly verb-final language like Korean by restricting the noun phrases of that language to the first type-raising rule. Since most, but not all, case-marked nouns in Turkish can occur behind the verb, certain pragmatic and semantic properties of a Turkish noun determine whether it can type-raise to the category produced by the second rule.</Paragraph>
      <Paragraph position="9"> The {}-CCG for Turkish described above can be used to parse and generate all word orders in Turkish sentences. However, it does not capture the more interesting questions about word order variation: namely, why speakers choose a certain word order in a certain context and what additional meaning these different word orders provide to the hearer. Thus, we need to integrate the {}-CCG formalism with a level of information structure that represents pragmatic functions, such as topic and focus, of constituents in the sentence in a compositional way.</Paragraph>
    </Section>
    <Section position="4" start_page="119" end_page="121" type="sub_section">
      <SectionTitle>
3.2 A Grammar for Word Order
</SectionTitle>
      <Paragraph position="0"> In (Steedman-91; Prevost/Steedman-93), a theory of prosody, closely related to a theory of information structure, is integrated with CCGs by associating every CCG category encoding syntactic and semantic properties with a prosodic category. Taking advantage of the non-traditional constituents that CCGs can produce, two CCG constituents are allowed to combine only if their prosodic counterparts can also combine.</Paragraph>
      <Paragraph position="1"> Similarly, I adopt a simple interface between {}-CCG and ordering information by associating each syntactic/semantic category with an ordering category which bears linear precedence information. These two categories are linked together by the features of the information structure. For example, the verb "arıyor" (seeks) is assigned the lexical entry seen in the category feature of the DAG in Figure 1. The category feature contains the argument structure in the features syn and sem as well as the information structure in the feature info. This lexical entry is associated with an ordering category seen in the feature order of the DAG in Figure 1. This ordering feature is linked to the category feature via the co-indices T, N, F, and B.</Paragraph>
      <Paragraph position="2"> The ordering categories are assigned to lexical entries according to context-dependent word order restrictions found in the language. All Turkish verbs are assigned the ordering category seen in the order feature in Figure 1; this is a function which can use the categorial application rules to first combine with a focused constituent on its left, then a neutral constituent on its left, then a topic constituent on its left, and then a background constituent on its right, finally resulting in a complete utterance. Following (Erguvanli-84), this function represents the template in example (2) for assigning discourse functions according to their positional relation to the verb. However, it is more flexible than Erguvanlı's approach in that it allows more than one possible information structure. The parentheses around the arguments of the ordering category indicate that they are optional arguments. The sentence may contain all or some or none of these information structure components. It may turn out that we need to restrict the optionality on these components. For instance, if there is no topic found in the sentence-initial position, then we may need to infer a topic or a link to the previous context. In the current implementation, the focus is an obligatory constituent in order to ensure that the parser produces the derivation with the most likely information structure first, and there is an additional ordering category possible where the verb itself is focused and where there are no pre-verbal elements in the sentence.</Paragraph>
      <Paragraph position="3"> Categories other than verbs, such as nouns, determiners, adjectives, and adverbs, are associated with an ordering category that is just a basic element, not a function. In Turkish, the familiarity status of entities in the discourse model serves a role in determining their discourse function. For example, discourse-new entities cannot occur in the post-verbal or sentence-initial positions in Turkish sentences. Thus, discourse-new elements can be assigned ordering categories with the feature-attribute focus or neutral with their semantic properties as the feature-value, but they cannot be associated with background or topic ordering categories.</Paragraph>
      <Paragraph position="4"> There are no such restrictions for discourse-old entities; thus they can be assigned a variable which can unify with any of the information structure components.</Paragraph>
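      <Paragraph> The familiarity restriction on ordering categories can be sketched as a simple check. The following Python fragment is a minimal illustration, not the lexical assignment used in the implementation: the names allowed_components and is_licensed are hypothetical, and the discourse model is reduced to a set of discourse-new entity names.

```python
COMPONENTS = ("topic", "neutral", "focus", "background")


def allowed_components(is_discourse_new):
    """Return the information structure components an entity may fill."""
    if is_discourse_new:
        # Discourse-new entities may only be focus or neutral.
        return {"focus", "neutral"}
    # Discourse-old entities behave like a variable over all components.
    return set(COMPONENTS)


def is_licensed(assignment, discourse_new):
    """Check an assignment of (entity, component) pairs.

    discourse_new is the set of entities not yet mentioned in the
    discourse; every other entity counts as discourse-old.
    """
    return all(
        component in allowed_components(entity in discourse_new)
        for entity, component in assignment
    )


# Ahmet is new to the discourse, Fatma has already been mentioned.
new_entities = {"Ahmet"}
focused_new = is_licensed([("Fatma", "topic"), ("Ahmet", "focus")], new_entities)
backgrounded_new = is_licensed([("Fatma", "focus"), ("Ahmet", "background")], new_entities)
```

Here the first assignment is accepted, since the discourse-new entity is focused, while the second is rejected because it places a discourse-new entity in the background.</Paragraph>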
      <Paragraph position="5"> During a derivation in parsing or generation, two constituents can combine only if the categories in their category features can combine using the {}-CCG rules presented in the previous section, and the categories in their order features can combine using the following rewriting rules. A sample derivation involving the ordering grammar can be seen in Figure 2.</Paragraph>
      <Paragraph position="7"> The identity rule allows two constituents with the same discourse function to combine. The resulting constituent may not be a traditional syntactic constituent; however, as argued in (Steedman-91), this is where we see the advantage of using a CCG based formalism. Through type-raising and composition, CCG formalisms can produce nontraditional syntactic constituents which may have a single discourse function. For example in Figure 2, the NPs Fatma and Ayşe form a pragmatic constituent using the identity rule in the ordering grammar; in order to form a syntactic constituent as well, they must be type-raised and combine together using the {}-CCG composition rule.</Paragraph>
      <Paragraph position="8"> Type-raising in Turkish is needed for sentences involving more than one NP in the neutral and background positions.</Paragraph>
      <Paragraph position="9"> The ordering grammar also allows adjuncts and elements from other clauses (long distance scrambled) to be components in the information structure. This is because the information structure in a verb's lexical entry does not specify that its components must be arguments of the verb in its argument structure. Thus, adjuncts and elements from embedded clauses can serve a purpose in the information structure of the matrix clause, although they are not subcategorized arguments of the matrix verb. For long distance scrambling, the additional restriction (that Y is not a functor) on the application rules given above ensures that a verb in the embedded clause has already combined with all of its obligatory arguments or skipped all of its optional arguments before combining with the matrix verb.</Paragraph>
      <Paragraph position="10"> The ordering grammar presented above is similar to the template grammars in (Danlos-87), the syntax specialists in PAULINE (Hovy-88), and the realization classes in MUMBLE (McDonald/Pustejovsky-85) in that it allows certain pragmatic distinctions to influence the syntactic construction of the sentence. The ordering grammar does not make as fine-grained pragmatic distinctions as the generation systems above, but it represents language-specific and context-dependent word order restrictions that can be lexicalized into compositional categories. The categorial formalism presented above captures the general discourse meaning of word order variation in languages such as Turkish while using a compositional method.</Paragraph>
    </Section>
  </Section>
</Paper>