File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/95/e95-1034_metho.xml
Size: 17,591 bytes
Last Modified: 2025-10-06 14:14:01
<?xml version="1.0" standalone="yes"?> <Paper uid="E95-1034"> <Title>Integrating &quot;Free&quot; Word Order Syntax and Information Structure</Title> <Section position="3" start_page="0" end_page="246" type="metho"> <SectionTitle> 2 The Turkish Data </SectionTitle> <Paragraph position="0"> The arguments of a verb in Turkish (as well as many other &quot;free&quot; word order languages) do not have to occur in a fixed word order. For instance, all six permutations of the transitive sentence below are possible, since case-marking, rather than word order, serves to differentiate the arguments. (Footnote 1: The accusative, dative, genitive, ablative, and locative cases are associated with specific morphemes (and their vowel-harmony variants) which attach to the noun; nominative case and subject-verb agreement for third person singular are unmarked.) (1) a. Fatma Ahmet'i gördü.</Paragraph> <Paragraph position="1"> Fatma Ahmet-Acc see-Past.</Paragraph> <Paragraph position="2"> &quot;Fatma saw Ahmet.&quot; b. Ahmet'i Fatma gördü.</Paragraph> <Paragraph position="3"> c. Fatma gördü Ahmet'i.</Paragraph> <Paragraph position="4"> d. Ahmet'i gördü Fatma.</Paragraph> <Paragraph position="5"> e. Gördü Ahmet'i Fatma.</Paragraph> <Paragraph position="6"> f. Gördü Fatma Ahmet'i.</Paragraph> <Paragraph position="7"> Although all the permutations have the same propositional interpretation, see(Fatma, Ahmet), each word order conveys a different discourse meaning, appropriate only to a specific discourse situation. We can generally associate the sentence-initial position with the topic, the immediately preverbal position with the focus, which receives the primary stress in the sentence, and postverbal positions with backgrounded information (Erguvanli, 1984). 
The postverbal positions are influenced by the given/new status of entities within the discourse; postverbal elements are always evoked discourse entities or are inferrable from entities already evoked in the previous discourse, and thus help to ground the sentence in the current context.</Paragraph> <Paragraph position="8"> I define topic and focus according to their informational status. A sentence can be divided into a topic and a comment, where the topic is the main element that the sentence is about, and the comment is the main information we want to convey about this topic. Assuming the hearer's discourse model or knowledge store is organized by topics, the sentence topic can be seen as specifying an &quot;address&quot; in the hearer's knowledge store (Reinhart, 1982; Vallduvi, 1990). The informational focus is the most information-bearing constituent in the sentence (Vallduvi, 1990); it is the new or important information in the sentence (within the comment), and receives prosodic prominence in speech. These information structure components are successful in describing the context-appropriate answer to database queries. In this domain, the focus is the new or important part of the answer to a wh-question, while the topic is the main entity that the question and answer are both about, which can be paraphrased using the clause &quot;As for X&quot;. In other domains, finding the topic and focus of sentences according to the context may be more complicated.</Paragraph> <Paragraph position="9"> We can now explain why certain word orders are appropriate or inappropriate in a certain context, in this case database queries. For example, a speaker may use the SOV order in (2b) to answer the wh-question in (2a) because the speaker wants to focus the new object, Ahmet, and so places it in the immediately preverbal position. 
However, given a different wh-question in (3), the subject,</Paragraph> <Paragraph position="10"> Fatma, is the focus of the answer, while Ahmet is the topic, a link to the previous context, and thus the OSV word order is used. 2 &quot;As for Ahmet, FATMA saw him.&quot; Adjuncts can also occur in different sentence positions in Turkish sentences depending on the context. The different positions of the sentential adjunct &quot;yesterday&quot; in the following sentences result in different discourse meanings, much as in English.</Paragraph> <Paragraph position="11"> Clausal arguments, just like simple NP arguments, can occur anywhere in the matrix sentence as long as they are case-marked, (5)a and b. Subordinate verbs in Turkish resemble gerunds in English; they take a genitive-marked subject and are case-marked like NPs, but they assign structural case to the rest of their arguments like verbs. The arguments and adjuncts within most embedded clauses can occur in any word order, also seen in (5)a and b. In addition, elements from the embedded clause can occur in matrix clause positions, i.e. long distance scrambling, (5c). As indicated by the translations, word order variation in complex sentences also affects the interpretation. (5) a.</Paragraph> <Paragraph position="12"> The information structure (IS) is distinct from predicate-argument structure (AS) in languages such as Turkish because adjuncts and elements long distance scrambled from embedded clauses can take part in the IS of the matrix sentence without taking part in the AS of the matrix sentence. 
As motivated by the data, a formalism for &quot;free&quot; word order languages such as Turkish must be flexible enough to handle word order variation among the arguments and the adjuncts in all clauses, as well as the long distance scrambling of elements from embedded clauses. In addition, to capture the context-appropriate use of word order, the formalism must associate information structure components such as topic and focus with the appropriate sentence positions, regardless of the predicate-argument structure of the sentence, and be able to handle the information structure of complex sentences. In the next sections I will present a combinatory categorial formalism which can handle these characteristics of &quot;free&quot; word order languages.</Paragraph> </Section> <Section position="4" start_page="246" end_page="247" type="metho"> <SectionTitle> 3 &quot;Free&quot; Word Order Syntax </SectionTitle> <Paragraph position="0"> In Multiset-CCG (see footnote 3), we capture the syntax of free</Paragraph> <Paragraph position="1"> argument order within a clause by relaxing the subcategorization requirements of a verb so that it does not specify the linear order of its arguments. Each verb is assigned a function category in the lexicon which subcategorizes for a multiset of arguments, without linear order restrictions. For instance, a transitive verb has the category S|{Nn, Na}, a function looking for a set of arguments, a nominative case noun phrase (Nn) and an accusative case noun phrase (Na), and resulting in the category S, a complete sentence, once it has found these arguments in any order.</Paragraph> <Paragraph position="2"> The syntactic category for verbs provides no hierarchical or precedence information. However, it is associated with a propositional interpretation that does express the hierarchical ranking of the arguments. 
For example, the verb &quot;see&quot; is assigned the lexical category S : see(X, Y) | {Nn : X, Na : Y}, and the noun &quot;Fatma&quot; is assigned Nn : Fatma, where the semantic interpretation is separated from the syntactic representation by a colon. These categories are a shorthand for the many syntactic and semantic features associated with each lexical item. The verbal functions can also specify a direction feature for each of their arguments, notated in the rules as an arrow above the argument. Thus, verb-final languages such as Korean can be modeled by using this direction feature in verbal categories, e.g. S|{Nn, Na} with a leftward arrow above each argument.</Paragraph> <Paragraph position="3"> Multiset-CCG contains a small set of rules that combine these categories into larger constituents. (Footnote 3: A preliminary version of the syntactic component of the grammar was presented in (Hoffman, 1992).)</Paragraph> <Paragraph position="4"> The following application rules allow a function</Paragraph> <Paragraph position="5"> such as a verbal category to combine with one of its arguments to its right (>) or left (&lt;). We assume that a category X|∅, where there are no arguments left in the multiset, rewrites by a clean-up rule to just X.</Paragraph> <Paragraph position="6"> (6) a. Forward Application (>):</Paragraph> <Paragraph position="8"> Using these application rules, a verb can apply to its arguments in any order. For example, the following is a derivation of a transitive sentence with the word order Object-Subject-Verb; variables in the semantic interpretations are In fact, all six permutations of this sentence can be derived by the Multiset-CCG rules, and all are assigned the same propositional interpretation, see(Fatma, Ahmet).</Paragraph> <Paragraph position="9"> The following composition rules combine two functions with set-valued arguments, e.g. two verbal categories, or a verbal category and an adjunct. (8) a. Forward Composition (> B): X|(ArgsX ∪ {Y}) Y|ArgsY ⇒ X|(ArgsX ∪ ArgsY) b. 
Backward Composition (&lt; B): Y|ArgsY X|(ArgsX ∪ {Y}) ⇒ X|(ArgsX ∪ ArgsY) c. Restriction: Y ≠ NP.</Paragraph> <Paragraph position="10"> Through the use of the composition rules, Multiset-CCGs can handle the free word order of sentential adjuncts. Adjuncts are assigned a function category S|{S} that can combine with any function that will also result in S, a complete sentence. The same composition rules allow two verbs to compose together to handle complex sentences with embedded clauses. This will be discussed further in section 5.</Paragraph> <Paragraph position="11"> The restriction Y ≠ NP on the Multiset-CCG composition rules prevents the categories for verbs, S|{NP}, and for adjectives, NP|{NP}, from combining together before combining with a bare noun. This captures the fact that simple NPs must be continuous and head-final in Turkish. Multiset-CCG is flexible enough to handle &quot;free&quot; word order languages that are freer than Turkish, such as Warlpiri, through the use of unrestricted composition rules, but it can also handle languages more restrictive in word order, such as Korean, by restricting the categories that can take part in the composition rules.</Paragraph> </Section> <Section position="5" start_page="247" end_page="248" type="metho"> <SectionTitle> 4 The Discourse Meaning of &quot;Free&quot; Word Order </SectionTitle> <Paragraph position="0"> Word order variation in Turkish and other &quot;free&quot; word order languages is used to express the information structure of a sentence. The grammar presented in the last section determines the predicate-argument structure of a sentence, regardless of word order. In this section, I add the ordering component of the grammar, in which the information structure of a sentence is determined. The simple compositional interface described below allows the AS and the IS of a sentence to be derived in parallel. 
This interface is very similar to Steedman's approach to integrating prosody and syntax in CCGs for English (Steedman, 1991).</Paragraph> <Paragraph position="1"> A. Each Multiset-CCG category encoding syntactic and semantic properties in the AS is associated with an Ordering Category which encodes the ordering of IS components.</Paragraph> <Paragraph position="2"> B. Two constituents can combine if and only if i. their syntactic/semantic categories can combine using the Multiset-CCG application and composition rules, ii. and their Ordering Categories can combine using the rules below: Every verbal category in Multiset-CCG is associated with an ordering category, which serves as a template for the IS. The ordering category in (9) is a function that specifies the components which must be found to complete a possible IS. The forward and backward slashes in the category indicate the direction in which the arguments must be found, and the parentheses around arguments indicate optionality. The variables T, F, G1, G2 will be unified with the interpretations of the proper constituents in the sentence during the derivation. The function above can use the simple application rules to first combine with a focused constituent on its left, then a ground constituent on its left, then a topic constituent on its left, and a ground constituent on its right. This function will result in a complete IS only if it finds the obligatory sentence-initial topic and the immediately preverbal focus constituent; its other arguments (the ground) are optional and can be skipped during the derivation through a category rewriting rule, X|(Y) ⇒ X, that may apply after the application rules (see footnote 5). Nonverbal elements are associated with simpler ordering categories, often just a variable which can unify with the topic, focus, or any other component in the IS template during the derivation. The identity rule allows two constituents with the same discourse function (often variables) to combine. 
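The parallel construction of AS and IS described so far can be illustrated with a small Python sketch. It is a deliberate simplification rather than the formalism itself: it reads the IS directly off linear positions instead of combining ordering categories by function application, it covers only the one template with an overt sentence-initial topic and an immediately preverbal focus, and the toy lexicon, the category tag "V" and the function name analyse are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon: surface form -> (category, meaning).
LEXICON = {
    "Fatma": ("Nn", "Fatma"),    # nominative noun phrase
    "Ahmet'i": ("Na", "Ahmet"),  # accusative noun phrase
    "gördü": ("V", "see"),       # transitive verb, S|{Nn, Na}
}


def analyse(sentence):
    """Return (AS, IS) for a transitive clause, or raise ValueError."""
    tagged = [LEXICON[word] for word in sentence.rstrip(".").split()]
    verb_positions = [i for i, (cat, _) in enumerate(tagged) if cat == "V"]
    if len(verb_positions) != 1:
        raise ValueError("expected exactly one verb")
    v = verb_positions[0]

    # AS: the verb consumes the multiset {Nn, Na}; linear order is ignored.
    arguments = [item for i, item in enumerate(tagged) if i != v]
    if Counter(cat for cat, _ in arguments) != Counter(["Nn", "Na"]):
        raise ValueError("arguments do not match the multiset {Nn, Na}")
    meaning = dict(arguments)
    arg_structure = f"{tagged[v][1]}({meaning['Nn']}, {meaning['Na']})"

    # IS: obligatory sentence-initial topic and immediately preverbal focus;
    # every other constituent is treated as optional ground.
    before = [sem for _, sem in tagged[:v]]
    after = [sem for _, sem in tagged[v + 1:]]
    if len(before) >= 2:
        info_structure = {
            "topic": before[0],
            "focus": before[-1],
            "ground": before[1:-1] + after,
        }
        return arg_structure, info_structure
    raise ValueError("this sketch needs an overt topic and an overt focus")


for order in ("Fatma Ahmet'i gördü.", "Ahmet'i Fatma gördü."):
    print(order, analyse(order))
```

Both word orders receive the same AS, see(Fatma, Ahmet), while topic and focus are swapped. Verb-medial and verb-initial orders are rejected by this sketch because it does not model the additional IS templates (verb focus, inferrable topic) discussed in the text.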
These simpler ordering categories also contain a feature which indicates whether they represent given or new information in the discourse model, which is dynamically checked during the derivation. Restrictions (such as the requirement that elements to the right of the verb be discourse-old information in Turkish) are expressed as features on the arguments of the verbal ordering functions.</Paragraph> <Paragraph position="3"> What is novel about this formalism is that the predicate-argument structure and the information structure of a sentence are built in parallel in a compositional way. For example, given the following question, we may answer in a word order which indicates that &quot;today&quot; is the topic of the sentence, and &quot;Little Ahmet&quot; is the focus. The derivation for this answer is seen in Figure 1.</Paragraph> <Paragraph position="4"> (10) a. Bugün kimi görecek Fatma? In Figure 1, every word in the sentence is associated with a lexical category right below it, which is then associated with an ordering category in the next line. Parallel lines indicate the application of rules to combine two constituents together; the first line is for combining the syntactic categories, and the second line is for combining the ordering categories of the two constituents. The syntactic constituents are allowed to combine to form a larger constituent only if their pragmatic counterparts (the ordering categories) can also combine. Thus, the derivation reflects the single surface structure for the sentence, while compositionally building the AS and the IS of the sentence in</Paragraph> <Paragraph position="5"> parallel. (Footnote 5: Another IS is available where the topic component is marked as &quot;inferrable&quot;, for those cases where the topic is a zero pronoun instead of an element which is realized in the sentence. 
After the derivation is complete, further discourse processing infers the identity of the unrealized topic from among the salient entities in the discourse model.)</Paragraph> <Paragraph position="6"> Using this formalism, I have implemented a database query system (Hoffman, 1994) which generates Turkish sentences with context-appropriate word orders, in answer to database queries. In generation, the same topic found in the database query is maintained in the answer. For wh-questions, the information that is retrieved from the database to answer the question becomes the focus of the answer. I have extended the system to also handle yes-no questions involving the question morpheme &quot;mi&quot;, which is placed next to whatever element is being questioned in the sentence. If the verb is being questioned, this is a cue that the assertion or negation of the verb will be the focus of the answer: &quot;No, (as for Ahmet) Fatma did NOT see him.&quot; In most Turkish sentences, the immediately preverbal position is prosodically prominent, and this corresponds with the informational focus. However, verbs can be focused in Turkish by placing the primary stress of the sentence on the verb instead of the immediately preverbal position, and by lexical cues such as the placement of the question morpheme. Thus, we must have more than one IS available for verbs, where verbs can be in the focus or the ground component of the IS. In addition, it is possible to focus the whole VP or the whole sentence, which can be determined by the context, in this case the database query: (13) a. Bugün Fatma ne yapacak? Today Fatma what do-Fut? 
&quot;What's Fatma going to do today?&quot; b.</Paragraph> <Paragraph position="7"> Bugün Fatma [kitap okuyacak]F.</Paragraph> <Paragraph position="8"> Today Fatma book read-Fut.</Paragraph> <Paragraph position="9"> &quot;Today, Fatma is going to [read a BOOK]F.&quot; In yes/no questions, if a non-verbal element is being focused by the question morpheme and the answer is no, the system provides a more natural and helpful answer by replacing the focus of the question with a variable and searching the database for an alternate entity that satisfies the rest of the question.</Paragraph> <Paragraph position="10"> Thus, Multiset-CCG allows certain pragmatic distinctions to influence the syntactic construction of the sentence using a lexicalized compositional method. In addition, it provides a uniform approach to handle word order variation among arguments and adjuncts, and as we will see in the next section, across clause boundaries.</Paragraph> </Section></Paper>
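The yes/no fallback described in Section 4 (replace the questioned element with a variable and search for an alternate entity) can be sketched in Python as follows. This is an illustration under stated assumptions, not the implemented system: the fact store FACTS, the entity "Ali", the triple representation and the function name answer_yes_no are all hypothetical.

```python
# Hypothetical fact store of (predicate, subject, object) triples.
FACTS = {("see", "Fatma", "Ali")}


def answer_yes_no(pred, subj, obj, questioned):
    """Answer 'Did subj pred obj?'; questioned is 'subj', 'obj' or 'verb'.

    Returns (polarity, focus), where focus is the alternate entity that
    becomes the focus of the answer, or None if there is none.
    """
    if questioned not in ("subj", "obj", "verb"):
        raise ValueError("unknown questioned element")
    if (pred, subj, obj) in FACTS:
        return "yes", None
    if questioned == "verb":
        # The negation of the verb itself is the focus of the answer.
        return "no", None
    # Replace the questioned element with a variable and look for
    # alternate entities that satisfy the rest of the question.
    if questioned == "obj":
        alternates = sorted(o for p, s, o in FACTS if (p, s) == (pred, subj))
    else:
        alternates = sorted(s for p, s, o in FACTS if (p, o) == (pred, obj))
    return "no", (alternates[0] if alternates else None)


print(answer_yes_no("see", "Fatma", "Ahmet", "obj"))
```

With the toy facts above, asking whether Fatma saw Ahmet, with the object questioned, yields a negative answer together with the alternate entity Ali, which a generator could then place in the focus position of the reply.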