<?xml version="1.0" standalone="yes"?> <Paper uid="W97-1506"> <Title>The ConTroll System as Large Grammar Development Platform</Title> <Section position="4" start_page="39" end_page="39" type="metho"> <SectionTitle> 2 A graphical user interface </SectionTitle> <Paragraph position="0"> Two practical problems arise once one tries to implement larger grammars. On the one hand, the complex data structures of such grammars contain an overwhelming number of specifications which are difficult to present to the user. On the other hand, the interaction of grammar constraints tends to get very complex for realistic linguistic theories.</Paragraph> <Section position="1" start_page="39" end_page="39" type="sub_section"> <SectionTitle> 2.1 Data Structure Visualization </SectionTitle> <Paragraph position="0"> In ConTroll, the powerful graphical user interface Xtroll addresses the presentation problem. The Xtroll GUI programmed by Carsten Hess allows the user to interactively view AVMs, search attributes or values in those representations, compare two representations (e.g. multiple results to a query) and highlight the differences, etc. Fonts and Colors can be freely assigned to the attributes and types. The displayed structures (or any part of it) can be can be printed or saved as postscript file. The GUI comes with a clean backend interface and has already been used as frontend for other natural language applications, e.g., in the VER.BMOBIL project.</Paragraph> <Paragraph position="1"> A special feature of Xtroll is that it offers a mechanism for displaying feature structures as trees according to user specified patterns. Note that displaying trees is not an obvious display routine in ConTroll, since the system does not impose a phrase structure backbone but rather allows a direct implementation of HPSG grammars whic:h usually encode the constituent structure under DTRS or some similar attribute. Since trees are a very compact representation allowing a good overview of the structure, Xtroll allows the user to specify that certain paths under a type are supposed to be displayed in a tree structure. As labels for the tree nodes, Xtroll can display a user definable selection of the following: the feature path to the node, the type of the structure, the phonology, and finally an abbreviation resulting from matching user specified feature structure patterns. An example for such a tree output is shown in Fig. 2. In this tree, the abbreviations were used to display category information in an X-bar fashion.</Paragraph> <Paragraph position="2"> Clicking on the labels displays the AVM associated with this node. In the example, we did open three of the nodes to show the modification going on between the adjective sehnelles (fast) and the noun fahrrad (bike). 
<Paragraph position="3"> In our experience, the fully user definable, sophisticated display possibilities of Xtroll have turned out to be indispensable for developing large typed feature based grammars.</Paragraph> </Section> <Section position="2" start_page="39" end_page="39" type="sub_section"> <SectionTitle> 2.2 A graphical debugger </SectionTitle> <Paragraph position="0"> The second problem is addressed with a sophisticated tracing and debugging tool which was developed to allow stepwise inspection of the complex constraint resolution process.</Paragraph> <Paragraph position="1"> The debugger displays the feature structure(s) to be checked for grammaticality and marks the nodes on which constraints still have to be checked. As a result of the determinacy check, each such node can also be marked as failed, delayed or deterministic.</Paragraph> <Paragraph position="2"> Similar to standard Prolog debuggers, the user can step, skip, or fail a constraint on a node, or request all deterministic processing to be undertaken. An interesting additional possibility for non-deterministic goals is that the user can inspect the matching defining clauses and choose which one the system should try.</Paragraph> <Paragraph position="3"> For example, in Fig. 3, the selected goal with tag \[\] is listed as delayed, and the display at the bottom shows that it has two matching defining clauses out of seven possible ones. Using the mouse, the user can choose to display the matching or all defining clauses in separate windows.</Paragraph> <Paragraph position="4"> We believe that the availability of a sophisticated debugger like the one implemented for the ConTroll system is an important prerequisite for large scale grammar development.</Paragraph> <Paragraph position="6"/> </Section> </Section> <Section position="5" start_page="39" end_page="42" type="metho"> <SectionTitle> 3 Grammar Organization Issues </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="39" end_page="39" type="sub_section"> <SectionTitle> 3.1 A modular grammar file organization </SectionTitle> <Paragraph position="0"> To organize grammars in a modular fashion, it is important to be able to distribute the grammar over several files to permit modification, loading, and testing of the different parts of a grammar separately. Also, only a modular file organization allows for distributed grammar development, since such an organization makes it possible to coordinate the work on the individual files with software engineering tools such as a revision control system.</Paragraph> <Paragraph position="1"> ConTroll supports the use of a grammar configuration file which can contain basic directory and file specifications, as well as a more sophisticated system allowing the linguist to specify the dependencies between signature, theory and lexicon files.</Paragraph> <Paragraph position="2"> To find out which signature, theory, and lexicon is supposed to be used and in which directories the files are to be found, the system looks for a grammar configuration file. If no such file is found, default names for the signature and the theory file are used. If there is a configuration file, it can specify the theory, signature, and lexicon files to be used, as well as the relevant directories.</Paragraph>
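<Paragraph> As a minimal sketch of this lookup behavior in Python (the configuration file name, its line format, and the default names below are assumptions, not ConTroll's actual conventions):
import os

# Hypothetical defaults and file name; the paper does not give the concrete syntax.
DEFAULTS = {"signature": ["signature"], "theory": ["theory"], "lexicon": []}

def read_grammar_config(grammar_dir, config_name="grammar.cfg"):
    """Return the signature, theory, and lexicon files to load for a grammar directory."""
    config_path = os.path.join(grammar_dir, config_name)
    if not os.path.exists(config_path):
        # No configuration file: fall back to the default file names.
        return {k: [os.path.join(grammar_dir, f) for f in v] for k, v in DEFAULTS.items()}
    files = {"signature": [], "theory": [], "lexicon": []}
    with open(config_path) as cfg:
        for line in cfg:
            line = line.strip()
            if not line or line.startswith("%"):    # skip blanks and comments
                continue
            parts = line.split(None, 1)             # e.g. "theory  vp-topic.gram"
            if len(parts) == 2 and parts[0] in files:
                files[parts[0]].append(os.path.join(grammar_dir, parts[1]))
    return files
</Paragraph>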
<Paragraph position="3"> The downside of this explicit mode of specification is that each time one wants to load a part of the grammar, e.g. for testing, one needs to work out which files are needed for this part of the grammar in order to list them explicitly. While this might seem like a trivial task, our experience has shown that in a distributed grammar development environment such a complete specification requires significant insight into the entire grammar.</Paragraph> <Paragraph position="4"> ConTroll therefore provides a more intelligent way of specifying the files needed for a module by allowing statements which make the dependencies between the different theory, signature, and lexicon files explicit. These specifications were modeled after the makefiles used in some programming environments. Once the dependencies are provided in the configuration file, the parts of the grammar to be loaded can be selected without being aware of the whole grammar organization: one simply specifies one file for each module of the grammar that needs to be included. The signature, theory, and lexicon files which are needed for the selected files are then loaded automatically according to the dependency specifications.</Paragraph> </Section> <Section position="2" start_page="39" end_page="41" type="sub_section"> <SectionTitle> 3.2 Automatic macro detection </SectionTitle> <Paragraph position="0"> When writing a typed feature structure based grammar, one usually wants to abbreviate often used feature paths or complex specifications. In ConTroll this can be done using the definite clause mechanism. However, from a processing point of view, it is inefficient to treat macros in the same way as ordinary relations. We thus implement a fast, purely syntactic preprocessing step that finds the relations that can be treated as macros, i.e., unfolded at compile time. These macro relations are then compiled into an internal representation during a second preprocessing step. When the rest of the grammar is parsed, any macro goal is simply replaced by its internal representation.</Paragraph> <Paragraph position="1"> After the actual compilation, ConTroll closes the grammar under deterministic computation. This step must be carefully distinguished from the macro detection described above. A goal is deterministic in case it matches at most one defining clause, but a relation is a macro by virtue of its definition, irrespective of the instantiation of the actual calling goals. Of course, the macro detection step could be eliminated, since the deterministic closure will also unfold all macros. However, for our reference grammar, adding the macro detection step reduced compile times by a factor of 20. Thus, for large grammars, compilation without macros is simply not practical.</Paragraph> <Paragraph position="2"> Obviously, making automatic macro detection a property of the compiler relieves the grammar developer from the burden of distinguishing between macros and relations, thereby eliminating a potential source of errors.</Paragraph>
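<Paragraph> The paper does not spell out the syntactic criterion; a minimal sketch in Python, assuming a relation counts as a macro when it has exactly one defining clause whose body contains only calls to other macros (the clause representation is likewise an assumption, not ConTroll's internals):
# Illustrative sketch of purely syntactic macro detection.
def find_macros(clauses):
    """clauses maps a relation name to a list of (head_args, body_goals) pairs,
    where each body goal is (relation_name, args). A relation is treated as a
    macro if it has exactly one defining clause whose body only calls macros."""
    macros = set()
    changed = True
    while changed:                      # iterate until no new macros are found
        changed = False
        for rel, defs in clauses.items():
            if rel in macros or len(defs) != 1:
                continue
            _, body = defs[0]
            if all(goal_rel in macros for goal_rel, _ in body):
                macros.add(rel)
                changed = True
    return macros
Goals calling a relation identified in this way can then be unfolded at compile time, which is what accounts for the factor of 20 mentioned above.</Paragraph>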
</Section> <Section position="3" start_page="41" end_page="41" type="sub_section"> <SectionTitle> 3.3 Automatic macro generation </SectionTitle> <Paragraph position="0"> Since HPSG theories usually formulate constraints about different kinds of objects, the grammar writer usually has to write a large number of macros to access the same attribute, or to make the same specification, namely one for each type of object that the macro is to apply to. For example, when formulating immediate dominance schemata, one wants to access the VFORM specification of a sign. When specifying the valence information, one wants to access the VFORM specification of a synsem object. And when specifying something about non-local dependencies, one may want to refer to VFORM specifications of local objects.</Paragraph> <Paragraph position="1"> ConTroll provides a mechanism which automatically derives definitions of relations describing one type of object on the basis of relations describing another type of object - as long as the linguist tells the system which path of attributes leads from the first type of object to the second.</Paragraph> <Paragraph position="2"> Say we want abbreviations to access the VFORM of a sign, a synsem, a local, a cat, and a head object. Then we need to define a relation accessing the most basic object having a VFORM, namely head: vform_h(X-vform) :== vform:X.</Paragraph> <Paragraph position="3"> Second, access_suffix and access_rule declarations need to be provided once per grammar. The former define a naming convention for the generated relations by pairing types with relation name suffixes. The latter define the rules to be used by the mechanism by specifying the relevant paths from one type of object to another. For our example, the grammar should include the recipes shown in Fig. 4. This causes the macros shown in Fig. 5 to be generated.</Paragraph> <Paragraph position="4"> For a large grammar, which usually specifies hundreds of macros, this mechanism can save a significant amount of work. It also provides a systematic rather than eclectic way of specifying abbreviations in a grammar, which is vital if several people are involved in grammar development.</Paragraph>
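<Paragraph> Figures 4 and 5 are not reproduced here, but the derivation can be illustrated with a rough Python sketch; the suffix pairings, the attribute paths, and the emitted format below are assumptions standing in for the actual access_suffix and access_rule declarations:
# Illustrative sketch of automatic macro generation.
ACCESS_SUFFIX = {"sign": "_s", "synsem": "_ss", "local": "_loc", "cat": "_c", "head": "_h"}
ACCESS_RULE = {"sign": ("synsem", "synsem"),   # from a sign, SYNSEM leads to a synsem
               "synsem": ("loc", "local"),     # from a synsem, LOC leads to a local
               "local": ("cat", "cat"),        # from a local, CAT leads to a cat
               "cat": ("head", "head")}        # from a cat, HEAD leads to a head

def derive_accessors(base_type, attribute):
    """For every type with a declared path to base_type, emit a derived accessor."""
    defs = {}
    for start in ACCESS_SUFFIX:
        path, t = [], start
        while t != base_type and t in ACCESS_RULE:
            attr, t = ACCESS_RULE[t]
            path.append(attr)
        if t == base_type:
            name = attribute + ACCESS_SUFFIX[start]
            defs[name] = ":".join(path + [attribute]) + ":X"
    return defs

# derive_accessors("head", "vform") yields, e.g.,
# {"vform_s": "synsem:loc:cat:head:vform:X", ..., "vform_h": "vform:X"}
Each generated relation simply prefixes the base accessor's path with the attribute path leading from the respective type of object to a head object.</Paragraph>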
</Section> <Section position="4" start_page="41" end_page="42" type="sub_section"> <SectionTitle> 4.1 Lexical rules for a compact and efficient lexicon encoding </SectionTitle> <Paragraph position="0"> Lexical rules receive special treatment in ConTroll. The lexical rule compiler implements the covariation approach to lexical rules (Meurers and Minnen, 1995). It translates a set of HPSG lexical rules and their interaction into definite relations used to constrain lexical entries. In HPSG, lexical rules are intended to &quot;preserve all properties of the input not mentioned in the rule&quot; (Pollard and Sag, 1987, p. 314). The lexical rule compiler of the ConTroll system is, to our knowledge, the only one which provides a computational mechanism for such lexical rules, automatically computing the frame predicates that account for the intended preservation of properties. Since the lexical rules do not need to be expanded at compile time, ConTroll is able to handle the infinite lexica which have been proposed in a number of HPSG theories.</Paragraph> <Paragraph position="1"> Constraint propagation is used as a program transformation technique on the definite clause encoding produced by the lexical rule compiler (Meurers and Minnen, 1996). The relation between parsing times with the expanded (EXP), the covariation (COV) and the constraint propagated covariation (OPT) lexicon for a German HPSG grammar (Hinrichs, Meurers, and Nakazawa, 1994) can be represented as OPT : EXP : COV = 0.75 : 1 : 18. Thus, the lexical rule compiler results not only in a compact representation but also in more efficient processing of a lexicon including lexical rules.</Paragraph> </Section> <Section position="5" start_page="42" end_page="42" type="sub_section"> <SectionTitle> 4.2 Incremental compilation and global grammar optimization </SectionTitle> <Paragraph position="0"> To keep development cycles short, a fast compiler is essential. Particularly when developing a large grammar, small changes should not necessitate the recompilation of the whole grammar - an incremental compiler is called for. This is relatively easy for systems where the compilation of individual pieces of code does not depend on the rest of the program.</Paragraph> <Paragraph position="1"> In ConTroll, this task is complicated for two reasons. 1. Interaction of universal constraints. If several different universal constraints apply to objects of the same type, the compiler will merge them. Changing a single high-level constraint may thus necessitate the recompilation of large parts of the grammar.</Paragraph> <Paragraph position="2"> 2. Off-line deterministic closure. Since the grammar is closed under deterministic computation at compile time, a change in some relation entails recompilation of all clauses that have inlined a call to that relation, which in turn may lead to changes in yet other relations, and so on. Nothing less than the maintenance of a complete call graph for the whole grammar would enable the compiler to know which parts of the grammar need to be recompiled.</Paragraph> <Paragraph position="3"> We decided on a compromise for incremental compilation and made our compiler aware of the first sort of dependency, but not the second. This means that incremental recompilation is always done on the basis of the grammar before deterministic closure. Therefore, after incremental recompilation, deterministic closure needs to be redone for the whole grammar.</Paragraph> </Section> <Section position="6" start_page="42" end_page="42" type="sub_section"> <SectionTitle> 4.3 Arbitrary multiple indexing of grammar constraints </SectionTitle> <Paragraph position="0"> ConTroll allows the specification of indexing information for each predicate individually. This is comparable to the indexing of terms in relational databases, e.g., the SICStus Prolog external database (Nilsson, 1995). Figure 6 shows the definition of a two-place relation r including a typing declaration and two indexing instructions. Given a fully instantiated goal for the relation r, the run-time environment of ConTroll can deterministically pick the right clause without leaving behind a choice-point.</Paragraph> <Paragraph position="1"> The indexing mechanism works not only for relations, but also for implicational constraints. Figure 7 shows possible indexing instructions for the lexical type word, namely for the phonological form and the syntactic category: index(word,phon:hd:string). index(word,synsem:loc:cat:head:head).</Paragraph>
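<Paragraph> The idea behind such multiple indexing can be illustrated with a small Python sketch; feature structures are modeled as nested dictionaries and the key handling is an assumption, not ConTroll's run-time machinery:
# Illustrative sketch of multi-path clause indexing.
def path_value(fs, path):
    """Follow a colon-separated path through a nested dict; None if undefined."""
    for attr in path.split(":"):
        if not isinstance(fs, dict) or attr not in fs:
            return None
        fs = fs[attr]
    return fs

def build_index(clauses, paths):
    """Map the tuple of values at the indexed paths to the clauses carrying them."""
    index = {}
    for clause in clauses:
        key = tuple(path_value(clause, p) for p in paths)
        index.setdefault(key, []).append(clause)
    return index

def candidate_clauses(goal, clauses, paths, index):
    """For a goal instantiated on all indexed paths, return only the matching clauses;
    otherwise fall back to trying every clause."""
    key = tuple(path_value(goal, p) for p in paths)
    if None in key:
        return clauses
    return index.get(key, [])
For word entries indexed on phon:hd:string and synsem:loc:cat:head:head, a lexical lookup with a known phonology or head category then touches only the matching entries instead of opening a choice point over the whole lexicon.</Paragraph>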
</Section> </Section> <Section position="6" start_page="42" end_page="43" type="metho"> <SectionTitle> 5 Experience using the System </SectionTitle> <Paragraph position="0"> Our implementation has been tested with several smaller and one large (> 5000 lines) grammar, a linearization-based grammar of a sizeable fragment of German. The grammar was developed in a distributed fashion by eight people and consists of 57 files. It provides an analysis for simple and complex verb-second, verb-first and verb-last sentences with scrambling in the Mittelfeld, extraposition phenomena, wh-movement and topicalization, integrated verb-first parentheticals, and an interface to an illocution theory, as well as the three kinds of infinitive constructions (coherent, incoherent, third construction), nominal phrases, and adverbials (Hinrichs et al., 1997).</Paragraph> <Paragraph position="1"> With grammars of this size, it is necessary to pay careful attention to control to achieve acceptable parsing times. With our Prolog based interpreter, parse times were around 1-5 sec. for 5 word sentences and 10-60 sec. for 12 word sentences. We are currently experimenting with a C based compiler (Zahnert, 1997) using an abstract machine with a specialized set of instructions based on the WAM (Warren, 1983; Aït-Kaci, 1991). This compiler is still under development, but it is reasonable to expect speed improvements of an order of magnitude.</Paragraph> </Section> </Paper>