<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1001">
  <Title>Grammar Modularity and its Impact on Grammar Documentation</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Lexical-Functional Grammar
</SectionTitle>
    <Paragraph position="0"> LFG is a constraint-based linguistic theory (Bresnan (2001), Dalrymple (2001)). It defines different levels of representation to encode syntactic, semantic and other information.</Paragraph>
    <Paragraph position="1"> The levels that are relevant here are constituent structure (c-structure) and functional structure (f-structure). The level of c-structure represents the constituents of a sentence and the order of the terminals. The level of f-structure encodes the functions of the constituents (e.g. subject, adjunct) and morpho-syntactic information, such as case, number, and tense.</Paragraph>
    <Paragraph position="2"> The c-structure of a sentence is determined by a context-free phrase structure grammar and is represented by a tree. In contrast, the f-structure is represented by a matrix of attribute-value pairs. The two structures are linked by a correspondence function (or mapping relation), called the "φ-projection". The (simplified) analysis of the sentence in (1) illustrates both levels of representation; see fig. 1.</Paragraph>
    <Paragraph position="3"> (1) Maria liest oft Bücher / M. reads often books / 'Maria often reads books'. As an example, we display the CP rule in (2) (which gives rise to the top-most subtree in fig. 1). (2) CP → NP C', with NP annotated (↑SUBJ)=↓ and C' annotated ↑=↓. The arrows ↑ and ↓ refer to f-structures; they define the φ-projection from c-structure nodes to f-structures. The ↑-arrow refers to the f-structure of the mother node (= the CP), the ↓-arrow to the f-structure of the node itself (= NP, C').2 That is, the above rule states that CP dominates an NP and a C' node; the NP functions as the subject (SUBJ) of CP, and C' is the head of CP (sharing all features, by unification of their respective f-structures). However, the NP preceding C' may just as well function as the direct object (OBJ) or indirect object (OBJ2), depending on case marking. We therefore refine the CP rule by making use of disjunctive annotations, marked by curly brackets, cf. (3).</Paragraph>
    <Paragraph position="4">  (3) CP →</Paragraph>
    <Paragraph position="6"/>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Grammar Modularity
</SectionTitle>
    <Paragraph position="0"> Large grammars are similar to other types of large software projects in that modularity plays an important role in the maintainability and, hence, reusability of the code. Modularity implies that the software code consists of different modules, which in ordinary software engineering are characterized by two prominent properties: (P1) they are "black boxes", and (P2) they are functional units.</Paragraph>
    <Paragraph position="1"> Black boxes. Modules serve to encapsulate data and are "black boxes" to each other. That is, the input and output of each module (i.e. the interfaces between the modules) are clearly defined, while the module-internal routines, which map the input to the output, are invisible to other modules.</Paragraph>
    <Paragraph position="2"> Functional units. Usually, a module consists of pieces of code that belong together in some way (e.g. they perform similar actions on the input).</Paragraph>
    <Paragraph position="3"> 2 Whenever an arrow is followed by a feature, e.g. SUBJ, the two are enclosed in parentheses: (↑SUBJ).</Paragraph>
    <Paragraph position="4"> That is, the code is structured according to functional considerations.</Paragraph>
    <Paragraph position="5"> Modular code design supports transparency, consistency, and maintainability of the code. (i) Transparency: irrelevant details of the implementation can be hidden in a module, i.e. the code is not obscured by too many details. (ii) Consistency is furthered by applying once-defined modules to many problem instances. (iii) Maintainability: if a certain functionality of the software is to be modified, the software developer ideally only has to modify the code within the module encoding that functionality.</Paragraph>
    <Paragraph position="6"> In this way, all modifications are local in the sense that they do not require subsequent adjustments to other modules.</Paragraph>
    <Paragraph position="7"> Turning now to modules in grammar implementations, we see that similar to modules in ordinary software projects, grammar modules encode generalizations (functional units, property P2). However, we argue below that (certain) grammar modules are not black boxes (whose internal structure is irrelevant, property P1), because these generalizations encode important linguistic insights.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Grammar Modules
</SectionTitle>
      <Paragraph position="0"> Similarly to modules in ordinary software projects, modules in grammar implementations assemble pieces of code that are functionally related: they do this by encoding linguistic generalizations. A linguistic generalization is a statement about properties that are common to/shared by different constructions. A grammar module consists of a coherent piece of code that encodes such common properties and in this sense represents a functional unit. In a modularized grammar, all constructions that share a certain property should make use of the same grammar module to encode this property.</Paragraph>
      <Paragraph position="1"> Generalizations that remain implicit (i.e. generalizations that are not encoded by modules) are error-prone. If the analysis of a certain phenomenon is modified, all constructions that adhere to the same principles should be affected as well, automatically--which is not the case with implicit generalizations.</Paragraph>
      <Paragraph position="2"> Which sorts of modules can be distinguished in a grammar implementation? In this paper, we limit ourselves to two candidate modules: (i) syntactic rules and (ii) macros.</Paragraph>
      <Paragraph position="3"> Syntactic rules. Each syntactic rule, such as the CP rule in (3), can be viewed as a module. A syntactic category occurring on the right-hand side of a rule (e.g. NP in (3)) then corresponds to a module call (routine call); the f-structure annotations of such a category, e.g. (↑SUBJ)=↓, can be seen as the instantiated (actual) parameters that are passed to the routine. Groups of rules (e.g. CP, C', and C) form higher-level modules: X'-projections.</Paragraph>
      <Paragraph position="4"> To sum up, syntactic rules can sensibly be viewed as modules (cf. also Wintner (1999), Zajac and Amtrup (2000)). Their internal expansion is irrelevant for the calling rule (property P1), and they form a linguistically motivated unit (property P2).3 Macros. Grammar development environments (such as XLE, the Xerox Linguistic Environment, described in Butt et al. (1999, ch. 11)) provide further means of abstraction to modularize the grammar code, e.g. (parametrized) macros and templates. Each macro/template can be viewed as a module, encoding common properties.4 An example macro is NPfunc in (4), which may be used by the closely related annotations of NPs in different positions in German, e.g. the annotations of NPs dominated by CP and by VP, cf. (5). (Macro calls are indicated by '@'.) 3 [...] canonical black boxes. LFG provides powerful referencing means within global f-structures, i.e. f-structure restrictions are not (and in general cannot be) limited to local subtrees. In a way, f-structure information represents what is called "global data" in software engineering: all rules and macros are essentially operating on the same "global" data structures. 4 XLE macros/templates can be used to encapsulate c-structure and f-structure code. Moreover, macros/templates can be nested, and can thus be used to model constraints similar to type hierarchies (Dalrymple et al., To Appear).</Paragraph>
      <Paragraph position="5"> That is, NPfunc is used to encapsulate the alternative NP functions in German. This encoding technique has the advantage that the code is easier to maintain. For instance, the grammar writer might decide to rename the function OBJ2 to IOBJ. Then she/he simply has to modify the definition of the macro NPfunc rather than the annotations of all NPs in the code. Clearly, NPfunc represents a functional unit; the question of whether NPfunc is a black box to other modules, such as the syntactic rule CP, is addressed in the next section.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Code Transparency and Black Boxes
</SectionTitle>
      <Paragraph position="0"> The above example shows how macros can be used to encode common properties. In this way, the intentions of the grammar writer are encoded explicitly: it is not by accident that the NPs within the CP and VP are annotated by identical annotations. In this sense, the use of macros improves code transparency. Further, macros help guarantee code maintainability: if the analysis of the NP functions is modified, only one macro (NPfunc) has to be adjusted. In another sense, however, the grammar code is now obscured: the functionality of the CP and VP rules cannot be understood properly without the definition of the macro NPfunc. Macro definitions may even be stacked, and thus need to be traced back to understand the rule encodings. In this sense, one might say that the use of macros hinders code transparency.5 In order to distinguish these opposing views more precisely, we introduce two notions of transparency, which we call intensional and extensional.</Paragraph>
      <Paragraph position="1"> Intensional transparency of grammar code means that the characteristic defining properties of a construction are encoded by means of suitable macros, i.e. in terms of generalizing definitions.</Paragraph>
      <Paragraph position="2"> Hence, all constructions that share certain defining properties make use of the same macros to encode these properties (e.g. the CP and VP rules in (5)).</Paragraph>
      <Paragraph position="3"> Conversely, distinguishing properties of different constructions are encoded by different macros, even if the content of the macros is identical. Extensional transparency means that linguistic generalizations are stated "extensionally", i.e. macros are replaced by their content/definition (similar to a compiled version of the code). The grammar rules thus introduce the constraints directly rather than by calling a macro that would introduce them (similar to the CP rule in (3)).</Paragraph>
      <Paragraph position="4"> Comparing both versions, the extensional version (3) may seem easier to grasp and, hence, more transparent. To understand the generalized version in (5), it is necessary to follow the macro calls and look up the respective definitions. Obviously, one needs to read more lines of code in this version, and often these lines of code are spread over different places and files. For instance, the CP rule may be part of a file covering the CP-internal rules, while the macro NPfunc figures in some other file. 5 The same argumentation applies to type hierarchies: to understand the functionality of a certain type, constraints that are inherited from less specific, related types must be traced back.</Paragraph>
      <Paragraph position="5"> Especially for people who are not well acquainted with the grammar, the intensional version thus requires more effort for understanding. In contrast, people who work regularly on the grammar code know the definitions/functionalities of macros more or less by heart. They certainly grasp the grammar and its generalizations more easily in the intensional version.</Paragraph>
      <Paragraph position="6"> One might argue that to know the name of a macro, such as NPfunc, often suffices to "understand" or "know" (or to correctly guess) the functionality of the macro. Hence, a macro would be a black box (whose definition/internal structure is irrelevant), similar to modules in ordinary software programs.</Paragraph>
      <Paragraph position="7"> However, there is an important difference between grammar implementations and canonical software programs: grammars encode linguistic insights. The grammar code by itself represents important information in that it encodes formalizations of linguistic phenomena (in a particular linguistic framework). As a consequence, users of the grammar are not only interested in the pure functionality (the input-output behaviour) of a grammar module.</Paragraph>
      <Paragraph position="8"> Instead, the concrete definition of the module is relevant, since it represents the formalization of a linguistic generalization.</Paragraph>
      <Paragraph position="9"> We therefore conclude that macro modules, such as NPfunc, are only defined by property P2 (functional unit), not by property P1 (black box).</Paragraph>
      <Paragraph position="10"> The criteria of maintainability and consistency clearly favour intensional over extensional transparency. We argue that the shortcomings of intensional transparency--namely, poorer readability for casual users of the grammar--can be compensated for by a special documentation structure.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Grammar Documentation
</SectionTitle>
    <Paragraph position="0"> In large software projects, code documentation consists of high-level and low-level documentation. The high-level documentation comprises information about the function and requirements of (high-level) modules and keeps track of higher-level design decisions (e.g. which modules are distinguished). More detailed documentation includes lower-level design decisions, such as the reasons for the chosen algorithms or data structures.</Paragraph>
    <Paragraph position="1"> The lowest level is that of code-level documentation. However, it reports on the code's intent rather than on implementation details, i.e. it focuses on "why" rather than "how". For instance, it summarizes relevant features of functions and routines. A large part of the code-level documentation is taken over by "good programming style", e.g. "use of straightforward and easily understandable approaches, good variable names, good routine names" (McConnell (1993, p. 454)).</Paragraph>
    <Paragraph position="2"> The level that is of interest to us is that of code-level documentation. In contrast to documentation of other types of software, grammar documentation has to focus both on "why" and "how", due to the fact that in a grammar implementation the code in and of itself represents important information, as argued above. That is, the details of the input-output mapping represent the actual linguistic analysis. As a consequence, large parts of grammar documentation consist of highly detailed code-level documentation. Moreover, the content/definition of certain dependent modules (such as macros) is relevant to the understanding of the functionality of the mother rule. Hence, the content of dependent modules must be accessible in some way within the documentation of the mother rule.</Paragraph>
    <Paragraph position="3"> One way of encoding such dependencies is by means of links. Within the documentation of the mother rule, a pointer would point to the documentation of the macros that are called by this rule. The reader of the documentation would simply follow these links (which might be realized by hyperlinks).6 However, a typical grammar rule calls many macros, and macros often call other macros. This hierarchical structure makes the reading of link-based documentation troublesome, since the reader has to follow all the links to understand the functionality of the top-most module.7 We therefore conclude that the structure of the documentation should be independent of the structure of the grammar code. We suggest a documentation method that permits copying of relevant grammar parts (such as macros) and results in a user-friendly presentation of the documentation.</Paragraph>
    <Paragraph position="4"> 6 Certain programming languages provide tools for the automatic generation of documentation, based on comments within the program code (e.g. Java provides the documentation tool Javadoc, URL: http://java.sun.com/javadoc/). The generated documentation makes use of hyperlinks as described above, which point to the documentation of all routines and functions that are used by the documented module. 7 Routines and functions in ordinary software may be hierarchically organized as well. In contrast to grammar modules, however, these modules are (usually) black boxes. That is, a reader of the documentation is not forced to follow all the links to understand the functionality of the top-most module.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 An XML-based Grammar
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Documentation Technique
</SectionTitle>
      <Paragraph position="0"> In our approach, grammar code and documentation are represented by separate documents. The documentation of a rule comprises (automatically generated) copies of the relevant macros rather than simple links to these macros. In a way, our documentation tool mirrors a compiler, which replaces each macro call by the content/definition of the respective macro. In contrast to a (simple) compiler, however, our documentation keeps a record of the macro calls (i.e. the original macro calls are still apparent). In the terminology introduced above, our documentation thus combines extensional transparency (by copying the content of the macros) with intensional transparency (by keeping a record of the macro calls).</Paragraph>
      <Paragraph position="1"> The copy-based method has the advantage that the structure of the documentation is totally independent of the structure of the code which is being documented.</Paragraph>
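      <Paragraph> The copy-based idea can be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the authors' tool (which works on XML markup via XSLT): the macro table, the wrapper macro NPinCP, the macro bodies, the use of ^ and ! as ASCII stand-ins for the up and down arrows, and the bracketed output layout are all hypothetical. The sketch replaces each macro call by its (recursively expanded) content while leaving the original call visible.

```python
import re

# Hypothetical macro table: names and bodies are illustrative assumptions,
# not the definitions used in the grammar described in the paper.
MACROS = {
    "NPfunc": "{ (^ SUBJ)=! | (^ OBJ)=! | (^ OBJ2)=! }",
    "NPinCP": "@NPfunc (^ TOPIC)=!",
}

# A macro call is written as '@' followed by the macro name.
CALL = re.compile(r"@(\w+)")


def expand(text, macros, depth=0, max_depth=10):
    """Replace each @Name call by its body, keeping the call visible."""
    if depth == max_depth:
        raise RecursionError("macro nesting too deep (possible cycle)")

    def substitute(match):
        name = match.group(1)
        if name not in macros:
            return match.group(0)  # unknown call: leave it untouched
        body = expand(macros[name], macros, depth + 1, max_depth)
        # Record of the call (intensional) plus copied content (extensional).
        return "[@" + name + " := " + body + "]"

    return CALL.sub(substitute, text)


rule = "CP --> NP: @NPinCP; C': ^=!."
print(expand(rule, MACROS))
```

Because nested calls are expanded before they are copied, a reader of the generated text sees the full constraint set in one place, while the bracketed call names still show which generalization each constraint comes from.</Paragraph>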
      <Paragraph position="2"> We propose an XML-based documentation method, i.e. the source documentation and the grammar code are enriched by XML markup. XSLT stylesheets operate on this markup to generate the actual documentation (e.g. an HTML document or a LaTeX document, which is further processed to result in a postscript or PDF file). The XML tags are used to link and join the documentation text and the grammar code. In this way, the documentation is independent of the structure of the code.</Paragraph>
      <Paragraph position="3"> Fig. 2 presents the generation of the output documentation. The source documentation is created manually in XML format (e.g. by means of an XML editor); the source grammar is written manually in LFG/XLE format. Next, XML markup is added to the source LFG grammar via Perl processing. Specific XML tags within the documentation refer to tags within the grammar code. The XSLT processing copies the referenced parts of the code to the output documentation.</Paragraph>
      <Paragraph position="4"> This approach guarantees that the code fragments that are displayed in the documentation are always up-to-date: whenever the source documentation or grammar have been modified, the output documentation is newly created by XSLT processing, which newly copies the code parts from the most recent version of the grammar.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Further Features of the Approach
</SectionTitle>
      <Paragraph position="0"> The described documentation method is a powerful tool. Besides the copying task, it can be exploited in various other ways, both to further the readability of the documentation and to support the task of grammar writing (see also the suggestions by Erbach (1992)).8 8 Except for the different output formats, all of the features mentioned in this paper have been implemented.</Paragraph>
      <Paragraph position="1"> Snapshots. Grammar documentation is much easier to read if pictures of c- and f-structures illustrate the analyses of example sentences. XLE supports the generation of snapshot postscript files, displaying trees and f-structures, which can be included in a LaTeX document. Note, however, that after any grammar modification, such snapshots have to be updated, since the modified grammar may now yield different c- and f-structure analyses.</Paragraph>
      <Paragraph position="2"> In our approach, snapshots are updated automatically: All example sentences in the source documentation are marked by a special XML tag. XLE snapshots are triggered by this markup and automatically generated and updated for the entire documentation, by running the XSLT stylesheet.</Paragraph>
      <Paragraph position="3"> Indices. In our approach, the documentation does not follow the grammar structure but assembles grammar code from different modules. Moreover, documentation may refer to partial rules only (or macros). That is, the complete documentation of an entire rule can be spread over different sections of the documentation.</Paragraph>
      <Paragraph position="4"> User-friendly documentation therefore has to include an index that associates a grammar rule (or macro) with the documentation sections that comment on this rule. That is, besides referencing from the documentation to the grammar (by copying), the documentation must also support referencing (indexing) from various parts of the grammar to the relevant parts of the documentation.</Paragraph>
      <Paragraph position="5"> Again, such indices are generated automatically based on XML tags in our approach.</Paragraph>
      <Paragraph position="6"> Test-suites. Example sentences in the documentation can be used to automatically generate a test-suite. In this way, the grammar writer can easily check whether the supposed coverage (as reported by the documentation) and the actual coverage of the grammar are identical.</Paragraph>
      <Paragraph position="7"> It is also possible to create specialized test-suites. For instance, one can create a test-suite of interrogative NPs, by extracting all examples occurring within the section documenting interrogative NPs.</Paragraph>
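      <Paragraph> A minimal sketch of such test-suite extraction, written in Python with the standard library's ElementTree module, might look as follows. The element names (documentation, section, example), the topic attribute, and the sample sentences other than example (1) are assumptions made for illustration; the paper does not specify the actual tag set.

```python
import xml.etree.ElementTree as ET


def build_sample_doc():
    """Build a tiny stand-in for the source documentation (tag names assumed)."""
    doc = ET.Element("documentation")
    for topic, sentences in [
        ("declaratives", ["Maria liest oft Bücher"]),
        ("interrogative-NPs", ["Welche Bücher liest Maria", "Wer liest Bücher"]),
    ]:
        section = ET.SubElement(doc, "section", {"topic": topic})
        for sentence in sentences:
            ET.SubElement(section, "example").text = sentence
    return doc


def extract_test_suite(doc, topic=None):
    """Collect example sentences, optionally restricted to one section topic."""
    suite = []
    for section in doc.iter("section"):
        if topic is None or section.get("topic") == topic:
            suite.extend(example.text for example in section.iter("example"))
    return suite


doc = build_sample_doc()
print(extract_test_suite(doc))                             # full test-suite
print(extract_test_suite(doc, topic="interrogative-NPs"))  # specialized suite
```

Restricting the extraction to one section is what yields a specialized test-suite; the resulting sentence list could then be handed to the parser to compare reported and actual coverage.</Paragraph>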
      <Paragraph position="8"> Up to now, we have seen how to create and exploit XML-based grammar documentation. The next section addresses the question of how to maintain such a type of documentation.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Maintainability
</SectionTitle>
      <Paragraph position="0"> A grammar implementation is a complex software project and, hence, often needs to be modified, e.g.</Paragraph>
      <Paragraph position="1"> to fix bugs, to widen coverage, to reduce overgeneration, to improve performance, or to adapt the grammar to specific applications. Obviously, the documentation sections that document the modified grammar parts need to be modified as well.9 In our approach, grammar code and documentation are represented by separate documents.</Paragraph>
      <Paragraph position="2"> Compared to code-internal comments, such code-external documentation is less likely to remain up-to-date, because it is not as closely associated with the code. This section discusses techniques that could be applied to support maintenance of our XML-based documentation.</Paragraph>
      <Paragraph position="3"> We distinguish three types of grammar modifications. (i) An existing rule (or macro) is deleted. (ii) An existing rule is modified. (iii) A new rule is added to the code.</Paragraph>
      <Paragraph position="4"> In case (i), the XSLT processing indicates whether a documentation update is necessary: Any rule that is documented in the documentation is referenced by its 'id' attribute. If such a rule is deleted from the code, the referenced 'id' attribute does not exist any more. In this case, the XSLT processing prints out a warning that the referenced element could not be found.</Paragraph>
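      <Paragraph> The check for deleted rules amounts to comparing the ids referenced by the documentation with the ids present in the grammar. The following Python sketch shows that comparison only; the function name and the sample ids are hypothetical, and in the approach described here the warning is produced during XSLT processing rather than by a separate script.

```python
def find_dangling_references(referenced_ids, grammar_ids):
    """Return documented ids that no longer exist in the grammar, sorted."""
    return sorted(set(referenced_ids) - set(grammar_ids))


# Hypothetical ids: the rule VPold has been deleted from the grammar,
# but the documentation still refers to it.
grammar_ids = {"CP", "VP", "NPfunc"}
referenced_ids = ["CP", "NPfunc", "VPold"]

for missing in find_dangling_references(referenced_ids, grammar_ids):
    print("warning: referenced element not found: " + missing)
```
</Paragraph>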
      <Paragraph position="5"> If, instead, rules are modified or added (cases (ii) and (iii)), utilities such as the UNIX command 'diff' can be applied to the output text files: Suppose that the grammar has been modified while leaving the documentation text untouched. Now, if the LaTeX files are newly generated, the only parts that may possibly have changed are the parts citing grammar code. These parts can be located by means of the 'diff' command. If such changes between the last and the current LaTeX files have occurred, these changes indicate that the surrounding documentation sections may need to be updated. If no changes have occurred, despite the grammar modifications, this implies that the modified parts are not documented in the (external) documentation and, hence, no update is necessary. By this technique, the grammar writer gets precise hints as to where to search for documentation parts that may need to be adjusted. To sum up, maintenance of the documentation text can be supported by techniques that give hints as to where the text needs to be adjusted. In the scenarios sketched above, the grammar writer would first modify the grammar only and generate some new, temporary output documentation. Comparing the current with the last version of the output documentation would yield the desired hints. After an update of the documentation text, a second run of the XSLT processing would generate the final output documentation.</Paragraph>
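      <Paragraph> The comparison step can be approximated in Python with the standard difflib module instead of the UNIX 'diff' command. The sketch below is an assumption-laden illustration: the two sample output texts, their layout, and the function name are invented, and only the idea of locating changed code citations is taken from the text.

```python
import difflib


def changed_lines(old_output, new_output):
    """Return (tag, line) pairs for lines removed from or added to the output."""
    hints = []
    for line in difflib.ndiff(old_output.splitlines(), new_output.splitlines()):
        # ndiff prefixes removed lines with '- ' and added lines with '+ '.
        if line.startswith("- ") or line.startswith("+ "):
            hints.append((line[0], line[2:]))
    return hints


# Hypothetical output documentation before and after renaming OBJ2 to IOBJ.
old = "\n".join([
    "Section: NP functions",
    "Cited code: (^ OBJ2)=!",
    "Section: Heads",
    "Cited code: ^=!",
])
new = old.replace("OBJ2", "IOBJ")

for tag, line in changed_lines(old, new):
    print(tag, line)
```

Only the cited code under the first section differs between the two versions, so that section is the one whose surrounding documentation text may need revising; an empty result would indicate that the modified grammar parts are not cited anywhere.</Paragraph>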
      <Paragraph position="6"> 9 As mentioned above, in some respects, the (output) documentation is updated automatically by our XML/XSLT-based approach. XSLT operates on the most recent version of the grammar; therefore, all grammar-related elements within the output documentation that are generated via XSLT are automatically synchronized with the current grammar (e.g. snapshots).</Paragraph>
    </Section>
  </Section>
</Paper>