<?xml version="1.0" standalone="yes"?>
<Paper uid="C90-2058">
  <Title>IMPLEMENTING THE GENERALIZED WORD ORDER GRAMMARS OF CHOMSKY AND DIDERICHSEN</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
IMPLEMENTING THE GENERALIZED WORD ORDER GRAMMARS OF
CHOMSKY AND DIDERICHSEN
</SectionTitle>
    <Paragraph position="0"> by Bengt Sigurd, Dept of Linguistics, Lund University, SWEDEN, Helgonabacken 12, S-223 62 Lund, e-mail: linglund@gemini.ldc.lu.se. Many of the insights of Transformational Grammar (TG) concern the movability of constituents. But in recent versions (Government &amp; Binding, GB; cf. Chomsky, 1982, Sells, 1985) the sentence representations (trees) include both the site of the moved constituent and the site from where it has been moved; the original site of the moved constituent is marked as a trace (t) or empty (e, []). In the sentence schema (Field or Position Grammar) developed by the Danish linguist Paul Diderichsen (1946), there are also positions both for the new and the old site of moved constituents. Thus Diderichsen observes that an adverb could introduce or be the fundament of a sentence, in which case the subject np "remains" in its "normal" position after the finite verb (Swedish example: Idag kom pojken; literally: Today came the boy). If the subject np introduces the sentence (Pojken kom idag), its "original" place after the finite verb must be empty. (For comparisons between Transformational Grammar and Diderichsen's grammar, cf. Teleman, 1972, Platzack, 1986.)</Paragraph>
    <Paragraph position="1"> Underlying both Chomskyan GB grammar and Diderichsen's Field Grammar is a grammatical system which consists of a general word or constituent order schema supplemented with co-occurrence restrictions.</Paragraph>
    <Paragraph position="2"> This type of system may be called Generalized Word Order Grammar (GWOG), and this paper deals with ways of implementing such a system on the computer using Definite Clause Grammar (DCG; Clocksin &amp; Mellish, 1981), a formalism available in most Prolog versions.</Paragraph>
    <Paragraph position="3"> Definite Clause Grammar is a convenient rewriting system with an arrow (--&gt;) familiar to generative linguists. It allows one to state the maximum sequence of constituents (the order schema) to the right of the arrow. A set of constraining conditions can then be used to prohibit overgeneration.</Paragraph>
    <Paragraph position="4"> Such restrictions are stated within curly brackets in the Definite Clause Grammar formalism. Constraining conditions may require that certain slots be filled or empty, that a certain variable have a certain value, that certain constituents cannot occur at the same time (co-occurrence restrictions), etc.</Paragraph>
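    <Paragraph> A minimal sketch of how a condition in curly brackets prunes an order schema, assuming a toy lexicon of three Swedish words. The nonterminal name frame and its argument layout are illustrative assumptions, not rules taken from the system described in this paper.

```prolog
% Toy lexicon (assumed for illustration): np and adv slots may be empty.
np(pojken) --> [pojken].
np([])     --> [].
adv(idag)  --> [idag].
adv([])    --> [].
vi(kom)    --> [kom].

% Order schema: adv, verb, np, in that order; only the verb is obligatory.
% The goal in curly brackets is an ordinary Prolog test, not a constituent:
% it rejects every parse in which the np slot is left empty.
frame(V, Np, Adv) --> adv(Adv), vi(V), np(Np), { Np \= [] }.
```

With these rules the query phrase(frame(V, Np, Adv), [idag, kom, pojken]) succeeds with V = kom, Np = pojken and Adv = idag, whereas phrase(frame(_, _, _), [kom]) fails because the condition on the np slot is not met. </Paragraph>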
    <Paragraph position="5"> In addition one may have further conditions which state that a certain constituent is to have a certain functional role, e.g. be the subject or the object of the sentence. Such conditions may be called functional role conditions (f-conditions) as they build a functional structure (f-representation). This structure may be built in a certain slot (as an additional argument) to the left of the arrow.</Paragraph>
    <Paragraph position="6"> Further conditions may concern the topic (focus), mode, clause type, lacking constituent, etc. of the sentence, and this information may also be gathered as arguments in slots to the left of the arrow.</Paragraph>
    <Paragraph position="7"> The system to be presented in this paper also incorporates many of the ideas of Referent Grammar (RG; Sigurd, 1987), a functional generalized phrase structure grammar used in the automatic translation project Swetra (Sigurd &amp; Gawronska-Werngren, 1988). I hereby acknowledge the help of Mats Eeg-Olofsson, Barbara Gawronska-Werngren and Per Warter in the Swetra group at Lund.</Paragraph>
    <Paragraph position="8"> The generalized word order schemas of Chomsky and Diderichsen. As can be seen from articles and textbooks (e.g. Sells, 1985), a typical Chomskyan Government &amp; Binding representation is a high binary hierarchical tree with complementizer phrases (C-phrases) on top of I(nfl)- and V-phrases. A tree for the Swedish sentence "Vem slog pojken?" (literally: Whom hit the boy?), given here in a parenthesis notation, might look as follows, assuming "pojken" ("the boy") to be the subject:</Paragraph>
    <Paragraph position="10"> "Whom hit the boy" This simplified representation means that the object "vem" is found in a front slot called "XP", the finite verb is found in the slot called "C(omplement)" and the subject "pojken" is found in the "specifier" slot under IP. The "spec" under "VP" is empty and so are the verb slot under V' and the NP slot under V'.</Paragraph>
    <Paragraph position="11"> The transformational (process) description would say that "vem" ("whom") has been moved from its final position, leaving a trace indexed with the same number (e:i) for reference. Similarly, the transformational description would say that the finite verb "slog" and "pojken" have left coindexed traces (e:j, e:k) behind. The Swedish sentence "Vem slog pojken" is ambiguous and could also be interpreted as "Who hit the boy". In that case the question pronoun "vem" (now equivalent to English "who") should be coindexed with a trace in the position where "pojken" was found in the first case, and "pojken" should be found in the "object position" under V'.</Paragraph>
    <Paragraph position="12"> Diderichsen uses a simpler model - he did his work long before Chomsky when formal grammar was not as highly developed.</Paragraph>
    <Paragraph position="13"> He would have stated the facts in the following way:
Fund  v     s       a  V  S       A
Vem   slog  pojken  -  -  -       -
Vem   slog  -       -  -  pojken  -
For the first interpretation of the sentence the "object slot" S(ubstantive=nominal) is empty; for the second interpretation the subject slot s(ubstantive) is empty, besides the empty slots for sentence adverbs (a), non-finite verbs (V) and other adverbs (A), also marked by the minus sign (-). Diderichsen calls the first three slots "the nexus field" and the last three "the content field" (indholdsfeltet). This division suits sentences containing an auxiliary with infinitives or participles, but for other sentences the division between a nexus field and a content field is unfortunate. The objects (in S) get separated from the finite verb (v) in simple transitive sentences. In the model to be presented below infinitives and participles are treated as subordinate (minor) clauses with their own objects and adverbs.</Paragraph>
    <Paragraph position="14"> GWOG rules: a simple illustration. The following (simplified) Prolog (Definite Clause Grammar) rules illustrate how examples like those mentioned in the introduction can be handled by Generalized Word Order Grammar rules.</Paragraph>
    <Paragraph position="15">  This basic rule is a rewriting rule. It states that we get the information in the argument slots after "sent" if we find the (phrase or word) categories to the right of the arrow in the order they are given. Further phrase and word (lexical) rules defining an adverb (adv), an np, and an intransitive verb (vi), e.g. as described in Sigurd (1987), are needed. The lexical rules needed in order to generate our examples can have the following simplified form:
np(np(pojken)) --&gt; [pojken].
np([]) --&gt; [].
vi(kom) --&gt; [kom].
adv(adv(idag)) --&gt; [idag].
adv([]) --&gt; [].
The categories np and adv may be empty ([]). The verb is obligatory.</Paragraph>
    <Paragraph position="16"> Diderichsen's "fundament" ("fund") is an initial position unspecified as a syntactic category. Both an np and an adverb may occur as fundament in our simple example, so the following two fundament rules are needed:
fund(F) --&gt; np(F). /* an np is fundament */
fund(F) --&gt; adv(F). /* an adv is fundament */
As can be seen, the schema would be overgenerating if no co-occurrence restrictions were introduced. Such restrictions or conditions are written within curly brackets ({ }) in Definite Clause Grammar, and they state which conditions are to hold on the variables specified. (Variables begin with capital letters in Prolog.) Two alternatives are shown with examples. The first alternative occurs if the fundament is an np: np(_,[Fund],[]). In that case no second np (Np2) can be found after the intransitive finite verb. (This is our way of stating that an np has been fronted.) In addition to the co-occurrence restrictions, the sample rules illustrate how information about functional roles and topic is stated. In the first case the fundament (Fund) is assigned the functional role of subject. The value of the fundament is also assigned to the Topic variable (T).</Paragraph>
    <Paragraph position="17"> In the second alternative, given after the semicolon (;), an adverb is the fundament: adv(_,[Fund],[]). Then there must be an Np2 (Np2 cannot be empty: Np2 \= []). In that case the subject is assigned the value (Np2) and the adverb (Fund) is the topic of the sentence. The value of the adverb (Fund) is also assigned to the adverbial (Advl) of the functional representation. In both cases the Pred is assigned the value (V) of the verb, and in both cases the mode of the sentence is declarative, so M(ode) is set to d(eclarative). The two examples would both receive the following functional representation: s(subj(pojken),pred(kom),advl(idag)).</Paragraph>
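    <Paragraph> The rule described in the two preceding paragraphs can be sketched as follows. This is a reconstruction under stated assumptions, not the author's rule: the fundament is tagged with its category (np(...) or adv(...)) so that the conditions can test it by unification instead of calling np(_,[Fund],[]), the lexical entries return bare words, an empty fundament is excluded, and the condition Adv2 = [] in the second alternative is an added assumption.

```prolog
% Lexical rules (simplified): np and adv slots may be empty, the verb may not.
np(pojken) --> [pojken].
np([])     --> [].
vi(kom)    --> [kom].
adv(idag)  --> [idag].
adv([])    --> [].

% The fundament is tagged with its category so the conditions can inspect it.
% Assumption: in this sketch the fundament itself must not be empty.
fund(np(F))  --> np(F),  { F \= [] }.
fund(adv(F)) --> adv(F), { F \= [] }.

% sent(Mode, Topic, FRepr): order schema first, then conditions in braces.
sent(d, T, s(subj(Subj), pred(V), advl(Advl))) -->
    fund(Fund), vi(V), np(Np2), adv(Adv2),
    {   % Alternative 1: an np is fronted, so the np slot after the verb is empty.
        Fund = np(F), Np2 = [],
        Subj = F, T = F, Advl = Adv2
    ;   % Alternative 2: an adverb is fronted, so the subject follows the verb.
        Fund = adv(F), Np2 \= [],
        Adv2 = [],                      % assumption: no second adverb
        Subj = Np2, T = F, Advl = F
    }.
```

Under these assumptions both phrase(sent(M, T, F), [idag, kom, pojken]) and phrase(sent(M, T, F), [pojken, kom, idag]) give M = d and F = s(subj(pojken), pred(kom), advl(idag)); the topic T is idag in the first case and pojken in the second. </Paragraph>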
    <Paragraph position="18"> This functional representation agrees with the standard format of Referent Grammar used in machine translation. The order in an RG functional representation is fixed: subject, predicate, dative obj, direct object, sentence adverbials, other adverbials.</Paragraph>
    <Paragraph position="19"> As can be seen, there are slots for Mode, Topic and Functional representation with "sent". The output of the parsing of a sentence is information about mode, topic and the functional representation. In more advanced and extensive rules, information about clause type and defectiveness (in order to handle the percolation of missing constituents) is also gathered in additional slots with "sent". A generalized word order schema for Swedish. Generalizing from the word and constituent orders found in Swedish, one may suggest the following basic rule for main clauses:
sent(M,Cl_type,Defect,T,F_repr) --&gt;
fund(Fund),       idag     []       pojken
v(V),             kom      gav      lovade
sadv(Sadv2),      inte
np(Np2),          pojken   pojken   flickan
sadv(Sadv3),               inte
np(Np3),                   flickan
prediv(Prediv),
np(Np4),                   hunden
sunt(Sunt),                         att gå
adv(Adv2),                 idag
The Swedish examples to the right show how slots may be filled differently: "Idag kom inte pojken" (literally: Today came not the boy), "Gav pojken inte flickan hunden idag?" (literally: Gave the boy not the girl the dog today?), "Pojken lovade flickan att gå" (literally: The boy promised the girl to go). "Sunt" is the category containing subordinate clauses and minor (infinitive or participial) clauses.</Paragraph>
    <Paragraph position="20"> Compared to Diderichsen's model there is a longer sequence of categories, and non-finite verbs are treated as subordinate clauses. Chomsky and his followers try to define functional roles configurationally, but our approach is rather a formalization of Diderichsen's verbal descriptions. The functional representation is built as a list in the more advanced versions, but we will not go into such technical details here.</Paragraph>
    <Paragraph position="21"> The following are further illustrations of the conditions needed:  M=d}. /* pojken lovade flickan att gå */ The first condition states that if there is nothing (Fund=[]) before a doubly transitive finite verb (vtt), the mode must be "q(uestion)" and the noun phrases are assigned the roles subject, dative object (dobj) and direct object (obj), in that order. This covers our example "Gav pojken (inte) flickan hunden idag?" (literally: Gave the boy (not) the girl the dog today?). The second alternative (after ;) shows the case of "verba dicendi" (vd) as in "Pojken lovade flickan att gå" (literally: The boy promised the girl to go). In that case the first noun phrase after the finite verb (Np2) is taken as a dative object and the infinitive clause represented by "Sunt" as the direct object.</Paragraph>
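    <Paragraph> The two conditions can be sketched on top of the generalized schema as follows. The lexicon, the two-argument verb category v(Cat, V), the reduced argument list sent(Mode, FRepr), the always-empty prediv slot and the ASCII spelling ga for the infinitive are assumptions made for this illustration; adverbials are parsed but left out of the functional representation, and a fundament other than an np is not covered.

```prolog
% Toy lexicon (assumed for illustration). Every slot except the verb may be empty.
np(pojken)  --> [pojken].
np(flickan) --> [flickan].
np(hunden)  --> [hunden].
np([])      --> [].
sadv(inte)  --> [inte].
sadv([])    --> [].
adv(idag)   --> [idag].
adv([])     --> [].
prediv([])  --> [].                 % slot kept in the schema, never filled here
sunt(inf(ga)) --> [att, ga].        % minor (infinitive) clause
sunt([])    --> [].
fund(F)     --> np(F).              % an empty fundament gives a verb-initial sentence
v(vtt, gav)    --> [gav].           % doubly transitive verb
v(vd,  lovade) --> [lovade].        % verbum dicendi

% sent(Mode, FRepr): the order schema, then the co-occurrence conditions.
sent(M, F) -->
    fund(Fund), v(Cat, V), sadv(_), np(Np2), sadv(_), np(Np3),
    prediv(_), np(Np4), sunt(Sunt), adv(_),
    {   % Nothing before a doubly transitive verb: question with three nps.
        Cat = vtt, Fund = [], M = q,
        Np2 \= [], Np3 \= [], Np4 \= [], Sunt = [],
        F = s(subj(Np2), pred(V), dobj(Np3), obj(Np4))
    ;   % Verbum dicendi: Np2 is the dative object, the minor clause the object.
        Cat = vd, Fund \= [], M = d,
        Np2 \= [], Np3 = [], Np4 = [], Sunt \= [],
        F = s(subj(Fund), pred(V), dobj(Np2), obj(Sunt))
    }.
```

In this sketch phrase(sent(M, F), [gav, pojken, inte, flickan, hunden, idag]) gives M = q and F = s(subj(pojken), pred(gav), dobj(flickan), obj(hunden)), while phrase(sent(M, F), [pojken, lovade, flickan, att, ga]) gives M = d and F = s(subj(pojken), pred(lovade), dobj(flickan), obj(inf(ga))). </Paragraph>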
    <Paragraph position="22">  It is clear that there is a trade-off between the extension (generality) of the order schema and the co-occurrence restrictions. A very general schema requires many constraining restrictions; several simpler schemas require fewer restrictions, but the overall system grows bigger. Chomsky and his followers seem to prefer to use one schema to cover all types of clauses in order to catch as many generalizations as possible. The node name "comp(lementizer)" clearly stems from subordinate clauses, but it has been generalized to all sentences in GB. Diderichsen used one general schema for all types of main sentences, but a separate schema for subordinate clauses. For a general discussion of the potential of positional systems in syntax, morphology and phonology see Brodda &amp; Karlgren, 1964.</Paragraph>
    <Paragraph position="23"> Some of our restrictions and constraints on the value of certain variables and co-occurrence of constituents, etc. can be related to the constraining principles and filters used in GB.</Paragraph>
    <Paragraph position="24"> Swedish subordinate clauses differ from main clauses by having the sentence adverbs before the finite verb, and generally subordinate clauses are characterized by initial complementizers, such as subjunctions, infinitive markers or relative pronouns. In the current implementation subordinate clauses are treated by separate rules. In Swedish, almost all information about clause type, topic, and mode is to be found in the positions before the finite verb.</Paragraph>
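    <Paragraph> The difference in adverb placement can be sketched with one rule per clause type. The rule names main_clause and sub_clause, the toy lexicon and the shape of the functional representation are assumptions made for this illustration; the separate rules of the actual implementation are not reproduced here.

```prolog
% Toy lexicon (assumed for illustration).
np(pojken) --> [pojken].
sadv(inte) --> [inte].
sadv([])   --> [].
vi(kom)    --> [kom].
comp(att)  --> [att].               % subjunction "att" (that)

% Main clause: the sentence adverb follows the finite verb.
main_clause(s(subj(Np), pred(V), sadvl(Sadv))) -->
    np(Np), vi(V), sadv(Sadv).

% Subordinate clause: an initial complementizer, and the sentence adverb
% stands before the finite verb.
sub_clause(Comp, s(subj(Np), pred(V), sadvl(Sadv))) -->
    comp(Comp), np(Np), sadv(Sadv), vi(V).
```

Here phrase(main_clause(F), [pojken, kom, inte]) and phrase(sub_clause(C, F), [att, pojken, inte, kom]) build the same functional representation s(subj(pojken), pred(kom), sadvl(inte)), while the main-clause order after a complementizer, [att, pojken, kom, inte], is rejected by sub_clause. </Paragraph>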
    <Paragraph position="25"> It is clear that the GWOG model suits the Nordic and Germanic languages well with their finite verb second and fairly fixed word order, but not languages with fairly free word order (e.g. Slavic languages) where the schema must allow for almost any combination of the words.</Paragraph>
    <Paragraph position="26"> The program illustrated works nicely for analysis, but when used for synthesis (generation) further conditions are needed and the components have to be rearranged somewhat. The program may be considered as an alternative to Pereira's Extraposition grammar (1981).</Paragraph>
  </Section>
</Paper>