<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1402"> <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics A generation-oriented workbench for Performance Grammar: Capturing linear order variability in German and Dutch</Title> <Section position="4" start_page="0" end_page="9" type="metho"> <SectionTitle> -- D-PATR for Unification Grammar (Kart- </SectionTitle> <Paragraph position="0"> tunen, 1986) or XTAG for Tree-Adjoining Grammars (Paroubek et al., 1992) are early examples. However, a parser is not a convenient tool for checking whether the current grammar implementation licenses all and only the strings that qualify as well-formed expressions of a given input. Sentence generators that try out all possible combinations of grammar rules applicable to the current input are better suited.</Paragraph> <Paragraph position="1"> Few workbenches in the literature come with such a facility. LinGO (Copestake & Flickinger, 2000), for Head-Driven Phrase Structure Grammar, provides a generator in addition to a parser. For Tree-Adjoining Grammars, several workbenches with generation components have been built; InTeGenInE (Harbusch & Woch, 2004) is a recent example.</Paragraph> <Paragraph position="2"> Fine-tuning the grammar so that it neither over- nor undergenerates is a major problem for semi-free word order languages (e.g., German; cf. Kallmeyer & Yoon, 2004). Working out a satisfactory solution to this problem is logically prior to designing a generator capable of selecting, from the set of all possible paraphrases, those that sound &quot;natural,&quot; i.e., the ones human speakers/writers would choose in the situation at hand (cf. Kempen & Harbusch, 2004).</Paragraph> <Paragraph position="3"> Verb constructions in German and Dutch exhibit extremely intricate word order patterns (cf. Seuren & Kempen, 2003).
One of the factors contributing to this complexity is the phenomenon of clause union, which allows constituents of a complement clause to be interspersed between those of the dominating clause. The resulting sequences exhibit, among other things, cross-serial dependencies and clause-final verb clusters. Further complications arise from all sorts of 'movement' phenomena such as fronting, extraction, dislocation, extraposition, scrambling, etc. Given the limited space available, we cannot describe the Performance Grammar (PG) formalism and the linearization algorithm that enables generating a broad range of linear order phenomena in Dutch, German, and English verb constructions. Instead, we refer to Harbusch & Kempen (2002) and Kempen & Harbusch (2002, 2003).</Paragraph> <Paragraph position="4"> Here, we present the generation-oriented PG Workbench (PGW), which assists grammar developers, among other things, in testing whether the implemented syntactic and lexical knowledge allows all and only well-formed permutations.</Paragraph> <Paragraph position="5"> In Section 2, we describe PG's topology-based linearizer implemented in the PGW generator, whose software design is sketched in Section 3. Section 4 shows the PGW at work and draws some conclusions.</Paragraph> <Paragraph position="6"> 2 Linearization in PG and PGW Performance Grammar (PG) is a fully lexicalized grammar that belongs to the family of tree substitution grammars and deploys disjunctive feature unification as its main structure-building mechanism. It adheres to the ID/LP format (Immediate Dominance vs. Linear Precedence) and includes separate components generating the hierarchical and the linear structure of sentences. Here, we focus on the linearization component.</Paragraph> <Paragraph position="7"> PG's hierarchical structures consist of unordered trees composed of elementary building blocks called lexical frames.
Every word is the head of a lexical frame, which specifies the subcategorization constraints of the word. Associated with every lexical frame is a topology. Topologies serve to assign a left-to-right order to the branches of lexical frames. In this paper, we will only be concerned with topologies for verb frames (clauses). We assume that clausal topologies of Dutch and German contain exactly nine slots -- see (1).</Paragraph> <Paragraph position="8"> (1) Wat wil je dat ik doe? / what want you that I do / 'What do you want me to do?'</Paragraph> <Paragraph position="10"> The slot labeled F1 makes up the Forefield (from Ger. Vorfeld); slots M1-M6 make up the Midfield (Mittelfeld); slots E1 and E2 define the Endfield (Nachfeld). Every constituent (subject, head, direct object, complement, etc.) has a small number of placement options, i.e., slots in the topology associated with its &quot;own&quot; clause.</Paragraph> <Paragraph position="11"> How is the direct object NP wat 'what' 'extracted' from the complement clause and 'promoted' into the main clause? Movement of phrases between clauses is due to lateral topology sharing. If a sentence contains more than one verb, the lexical frame of each verb instantiates its own topology. This applies to verbs of any type -- main, auxiliary, or copula. In such cases, the topologies are allowed to share identically labeled lateral (i.e., left- and/or right-peripheral) slots, subject to several restrictions (not explained here; but see Harbusch & Kempen, 2002).</Paragraph> <Paragraph position="12"> After two slots have been shared, they are no longer distinguishable; in fact, they are unified and become the same object. In example (1), the embedded topology shares its F1 slot with the F1 slot of the matrix clause. This is indicated by the dashed borders of the bottom F1 slot.
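Slot sharing and a left-to-right, depth-first read-out can be illustrated for example (1) with a minimal Python sketch. It is not the PGW's implementation (which is written in Java). The class names Slot and Topology, the method share and the function read_out are hypothetical. Apart from F1 (shared, holding wat) and E2 (holding the finite complement clause), the slot assignments used for wil, je, dat, ik and doe are illustrative assumptions. Unification of two slots is modeled by letting both topologies point to one and the same slot object, so the read-out skips any slot it has already visited.

```python
# Minimal sketch of PG-style topologies with lateral slot sharing.
# Hypothetical data model for illustration; not the PGW's (Java) classes.
SLOTS = ['F1', 'M1', 'M2', 'M3', 'M4', 'M5', 'M6', 'E1', 'E2']


class Slot:
    def __init__(self, label):
        self.label = label
        self.items = []  # words and/or embedded Topology objects


class Topology:
    """Nine-slot clausal topology, one per verb frame."""

    def __init__(self):
        self.slots = {label: Slot(label) for label in SLOTS}

    def put(self, label, item):
        self.slots[label].items.append(item)

    def share(self, lower, labels):
        """Unify identically labeled slots of this (higher) topology and
        a lower one: afterwards both hold the very same Slot object."""
        for label in labels:
            merged = self.slots[label]
            if lower.slots[label] is not merged:
                merged.items.extend(lower.slots[label].items)
                lower.slots[label] = merged


def read_out(topology, seen=None):
    """Left-to-right, depth-first traversal; a shared slot is read
    only once, at its first (highest) occurrence."""
    if seen is None:
        seen = set()
    words = []
    for label in SLOTS:
        slot = topology.slots[label]
        if id(slot) in seen:
            continue
        seen.add(id(slot))
        for item in slot.items:
            if isinstance(item, Topology):
                words.extend(read_out(item, seen))
            else:
                words.append(item)
    return words


# Example (1): Wat wil je dat ik doe?
matrix, embedded = Topology(), Topology()
matrix.share(embedded, ['F1'])  # embedded F1 unified with matrix F1
matrix.put('M1', 'wil')         # assumed slot
matrix.put('M2', 'je')          # assumed slot
matrix.put('E2', embedded)      # finite complement clause in E2
embedded.put('F1', 'wat')       # lands in the shared F1 slot
embedded.put('M1', 'dat')       # assumed slot
embedded.put('M2', 'ik')        # assumed slot
embedded.put('M6', 'doe')       # assumed slot
print(' '.join(read_out(matrix)))  # wat wil je dat ik doe
```

Because the embedded topology's F1 is the same object as the matrix F1, wat is emitted while the matrix topology is being scanned, and it is skipped when the read-out later descends into the complement clause in E2.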
Sharing the F1 slots effectively causes the embedded direct object wat to be preposed into the main clause (black dot in F1 above the single arrow in (1)). The dot in E2 above the double arrow marks the position selected by the finite complement clause.</Paragraph> <Paragraph position="13"> The overt surface order is determined by a read-out module that traverses the hierarchy of topologies in a left-to-right, depth-first manner. For example, wat is already read out while the module scans the higher topology, because the shared F1 slot belongs to that topology as well.</Paragraph> </Section> <Section position="5" start_page="9" end_page="10" type="metho"> <SectionTitle> 3 A sketch of PGW's software design </SectionTitle> <Paragraph position="0"> The PGW is a computational grammar development tool for PG. Written in Java, it comes with an advanced graphical direct-manipulation user interface. All lexical and grammatical data have been encoded in a relational database schema. This contrasts with the predominance of hierarchical databases in present-day computational linguistics. Relational lexical databases tend to be easier to maintain and update than hierarchical ones, especially for linguists with limited programming experience. The software was designed with an eye toward easy cross-language portability of the encoded information. For German, we developed a lexicon converter that maps the [...] In order to convey an impression of the capabilities of the PGW, we show it at work in generating verb constructions that involve rather delicate linearization phenomena: &quot;Particle Hopping&quot; in Dutch (2) and &quot;Scrambling&quot; in German (3).</Paragraph> <Paragraph position="1"> The finite complement clause (2) includes the verb meezingen 'sing along with,' where mee 'with' is a preposition functioning as a separable particle. The three other verbs are auxiliaries. According to a topology sharing rule for Dutch, clauses headed by auxiliaries are free to share 4, 5, or 6 left-peripheral slots of their own topology with that of their complement.
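How the 4, 5 or 6 sharing options affect the particle can be sketched in Python. The sketch is hypothetical and self-contained: from the discussion of (2) it takes only that dit occupies M3 and mee occupies M4 of the lowest topology, and that each auxiliary shares its first 4, 5 or 6 slots (F1 up to M5) with its complement. Placing dat in M1, ze in M2, each verb in M6 and each verbal complement in E1 are assumptions made purely for illustration. Sharing is modeled by letting two topologies hold the same list object; it is applied top-down and before any word is placed, which keeps chains of shared slots pointing at a single object.

```python
from itertools import product

# Hypothetical sketch: particle placement in (2) under 4, 5 or 6 shared slots.
SLOTS = ['F1', 'M1', 'M2', 'M3', 'M4', 'M5', 'M6', 'E1', 'E2']


def new_topology():
    # One list per slot; sharing makes two topologies hold the same list.
    return {label: [] for label in SLOTS}


def share(upper, lower, k):
    # Share the k left-peripheral slots (F1, M1, ...) of upper with lower.
    # Apply top-down and before placing words, so chains stay unified.
    for label in SLOTS[:k]:
        lower[label] = upper[label]


def read_out(topology, seen):
    # Left-to-right, depth-first; a shared slot is read only once.
    words = []
    for label in SLOTS:
        slot = topology[label]
        if id(slot) in seen:
            continue
        seen.add(id(slot))
        for item in slot:
            if isinstance(item, dict):  # embedded topology
                words.extend(read_out(item, seen))
            else:
                words.append(item)
    return words


def generate(k1, k2, k3):
    """Surface order of (2) given the sharing values of the three auxiliaries."""
    t1, t2, t3, t4 = (new_topology() for _ in range(4))
    share(t1, t2, k1)  # zouden shares with kunnen
    share(t2, t3, k2)  # kunnen shares with hebben
    share(t3, t4, k3)  # hebben shares with meegezongen
    # Assumed slots, except dit in M3 and mee in M4 (from the text).
    t1['M1'].append('dat')
    t1['M2'].append('ze')
    t1['M6'].append('zouden')
    t1['E1'].append(t2)
    t2['M6'].append('kunnen')
    t2['E1'].append(t3)
    t3['M6'].append('hebben')
    t3['E1'].append(t4)
    t4['M3'].append('dit')       # always-shared area
    t4['M4'].append('mee')       # optionally shared area
    t4['M6'].append('gezongen')
    words = ' '.join(read_out(t1, set()))
    # Write the particle together with the participle when adjacent.
    return words.replace('mee gezongen', 'meegezongen')


orders = sorted({generate(*ks) for ks in product([4, 5, 6], repeat=3)})
for order in orders:
    print(order)
```

Over all 27 combinations of sharing values, this sketch yields four distinct orders: mee directly before gezongen (written meegezongen), before hebben, before kunnen, or before zouden, but never to the left of dit, since M4 always follows M3 in the read-out.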
The most restrictive sharing option is shown in (2).</Paragraph> <Paragraph position="2"> (2) .. dat ze dit (lied) zouden kunnen hebben meegezongen / .. that they this (song) would be-able-to have along-sung / '.. that they might have sung along with this (song)' The direct object dit lands in M3 of the lowest topology. As this slot belongs to the four left-peripheral ones, it is always shared, and its content gets promoted all the way up into the highest clause (see single arrows). The particle mee always lands in the fifth slot (M4), i.e., in the optionally shared area. Hence, its surface position depends on the actual number of shared left-peripheral slots. In (2), with minimal slot sharing, mee stays in its standard position immediately preceding the head verb. With non-minimal topology sharing, the particle may move leftward up to (but no farther than) the direct object, thus yielding exactly the set of grammatical placement options.</Paragraph> <Paragraph position="3"> The quality of PGW's treatment of Scrambling in German can be assessed in terms of a set of 30 word order variations of sentence (3), discussed by Rambow (1994), who also provides grammaticality ratings for all members of the set. Seuren (2003) presents similar grammaticality judgments obtained from an independent group of native speakers. As the rating scores appeared to vary considerably (cf. (3a) and (3b)), we checked which permutations are actually generated by the PGW. It turned out to be easy to find a set of topology sharing values that generates all and only the paraphrases with high or satisfactory grammaticality scores.</Paragraph> <Paragraph position="4"> In conclusion, although the performance data discussed here are very limited, we believe they justify positive expectations with respect to the potential of a topology-based linearizer to closely approximate the grammaticality judgments of native speakers and thus to avoid over- and undergeneration.</Paragraph> <Paragraph position="5"> (3) a. ...
weil niemand das Fahrrad zu reparieren zu versuchen verspricht / because nobody the bike to repair to try promises / '... because nobody promises to try to repair the bike' b. *... weil zu versuchen das Fahrrad niemand zu reparieren verspricht</Paragraph> </Section> </Paper>