<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0211">
  <Title>How to build a (quite general) linguistic</Title>
  <Section position="5" start_page="76" end_page="78" type="metho">
    <SectionTitle>
3 Design
</SectionTitle>
    <Paragraph position="0"> We provide in this section a high-level specification of the editor. Details of implementation are given in section 5.</Paragraph>
    <Section position="1" start_page="76" end_page="77" type="sub_section">
      <SectionTitle>
3.1 Assumptions
</SectionTitle>
      <Paragraph position="0"> We make two basic assumptions. First, the well-formedness of diagrams is stated in terms of a context free grammar. This point will be illustrated below. Such an assumption is entirely in accord with practice in the areas of the specification of syntax and semantics of linguistic and semantic formalisms, including the graphical conventions used by such formalisms. Second, there is a small set of graphical primitives to state the layout of diagrams and a means for labelling parts of diagrams. Our context free assumption above means that, generally, we can require the layout problem to be deterministic for each proper subpart of a diagram and thus for diagrams as a whole, as well.</Paragraph>
      <Paragraph position="1">  Our current specification makes use of three kinds of primitives. Leaf primitives specify the typeface in which to set a sequence of characters, for example plain, italic, et cetera (a total of seven). Shape primitives surround a single figure, for example with brackets of various kinds or with a box or circle and so on (a total of five). Layout primitives arrange one or more figures into larger diagrams, and these provide for vertical, horizontal, tree and array layouts (six primitives). 1 So, leaf primitives are fully specified by a series of characters; layout primitives take one or more operands, each of which may be any of the primitives; shape primitives require a single operand. 2 These primitives have been selected on the grounds of generality, while preserving the property that layout is deterministic.</Paragraph>
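      <Paragraph> As an illustration only, the three kinds of primitives described above might be modelled as follows. The class names Leaf, Shape and Layout, their fields and the helper leaves are assumptions made for this sketch; they are not taken from the editor itself.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    """Leaf primitive: a typeface applied to a sequence of characters."""
    face: str  # e.g. "plain" or "italic"
    text: str


@dataclass
class Shape:
    """Shape primitive: surrounds exactly one figure."""
    kind: str  # e.g. "box" or "circle"
    operand: "Figure"


@dataclass
class Layout:
    """Layout primitive: arranges one or more figures."""
    kind: str  # e.g. "hbox", "vbox" or "tree"
    operands: List["Figure"]

    def __post_init__(self):
        # The text requires one or more operands for a layout primitive.
        if not self.operands:
            raise ValueError("a layout primitive needs at least one operand")


Figure = Union[Leaf, Shape, Layout]


def leaves(fig):
    """Collect the character sequences of a figure in operand order."""
    if isinstance(fig, Leaf):
        return [fig.text]
    if isinstance(fig, Shape):
        return leaves(fig.operand)
    return [t for op in fig.operands for t in leaves(op)]


# A boxed vertical arrangement of two leaves (hypothetical content).
example = Shape("box", Layout("vbox", [Leaf("italic", "x"), Leaf("plain", "walks(x)")]))
```

 Because every operand is itself a primitive, a figure is a tree of primitives, which is what allows layout to be computed bottom-up for each subpart.</Paragraph>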
      <Paragraph position="2">  In addition to specifying layout, we also need to indicate when a type of diagram has variable subparts, and what types of diagram may appear in those subparts. To take a particular example, we may wish to say that a drs consists of a universe, which is a collection of referents, and its conditions. Each of the conditions may be atomic, an implication or of still other types. As a point of terminology, where any number of diagrams may appear in a particular location, we will say that the diagrams that may occur there represent a repeating type.</Paragraph>
      <Paragraph position="3"> Each of the elements in italic above indicates the type of a particular subpart of a larger diagram, and together they constitute a context free rule relating a diagram and its subparts. In the abstract (i.e. ignoring details of layout) and with the usual interpretation of Kleene star, we end up with the following characterization: 3
  (1) drs → referent* condition*
  In order for the content of a diagram to be interpretable, we allow the subparts of a diagram to be named, for example (and again in the abstract):
  (2) drs → universe:referent* conditions:condition*
  The names of subparts must be unique within any one type of diagram. All that remains is for such specifications to include layout information. A possible specification would then be as follows, where square brackets delimit sequences of specifications, and hbox and vbox provide horizontal and vertical dispositions.</Paragraph>
      <Paragraph position="4"> 1 In general, these primitives may take options to control details of layout, for example the selection of smaller or larger fonts, or alignment within layouts. In examples here, these options have been suppressed for clarity. Similarly, primitives for controlling the appearance of branches and horizontal and vertical padding are not described here.</Paragraph>
      <Paragraph position="5"> 2 Available tree layouts include the "standard" vertical orientation commonly used in linguistic presentations, and horizontally disposed dendro- (or clado-)grams.</Paragraph>
      <Paragraph position="6">  given in section 5 below.</Paragraph>
      <Paragraph position="8"> In some cases (see for example the treatment of trees shown in the Appendix), more than one type of diagram may appear in some position in a diagram.</Paragraph>
      <Paragraph position="9"> In this case, one may specify a 'union' of diagram types. Overall (and ignoring labels), a grammar of diagrams allows two kinds of production rules:</Paragraph>
      <Paragraph position="11"> where M and N are non-terminal symbols and the rewrite for any non-terminal is unique. C is a non-terminal or terminal symbol. 4 The first states that a diagram of type M consists exactly of subdiagrams of types C1...Cm. The second, a diagram union, states that diagram types C1...Cn are alternative ways of realizing a diagram of type N. It is clear that any context free grammar can be rewritten so as to fall within this class. This choice of organization contributes greatly to the simplicity of the editor's user interface.</Paragraph>
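      <Paragraph> The two kinds of rule can be sketched as two tables, one for sequence rules and one for union rules, together with a check that a grammar stays within the class just described. This is a minimal sketch under stated assumptions: the table layout, the function check_grammar, the subpart name right, and the treatment of referent as a union over a leaf type id are all illustrative guesses rather than the authors' grammar.

```python
# Sequence rules: a type rewrites to named subparts (name, type, repeating).
SEQUENCES = {
    "drs": [("universe", "referent", True), ("conditions", "condition", True)],
    "implication": [("left", "drs", False), ("right", "drs", False)],
}
# Union rules: a type rewrites to a list of alternative types.
UNIONS = {
    "condition": ["atomic", "implication"],
    "referent": ["id"],  # assumed for illustration
}
# Terminal types, whose content is a sequence of characters.
TERMINALS = {"atomic", "id"}


def check_grammar(sequences, unions, terminals):
    """Return a list of problems; an empty list means the grammar is acceptable."""
    problems = []
    # The rewrite for any non-terminal must be unique.
    for t in sorted(set(sequences).intersection(unions)):
        problems.append("type %s has more than one rewrite" % t)
    defined = set(sequences) | set(unions) | set(terminals)
    for t, parts in sequences.items():
        # Names of subparts must be unique within one type of diagram.
        names = [name for name, _, _ in parts]
        if len(names) != len(set(names)):
            problems.append("subpart names of %s are not unique" % t)
        for _, sub, _ in parts:
            if sub not in defined:
                problems.append("undefined type %s used in %s" % (sub, t))
    for t, alternatives in unions.items():
        for sub in alternatives:
            if sub not in defined:
                problems.append("undefined type %s used in %s" % (sub, t))
    return problems
```

 Keeping sequence rules and union rules in separate tables mirrors the restriction that each non-terminal has exactly one rewrite of exactly one kind.</Paragraph>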
      <Paragraph position="12"> The labelling of subparts of a diagram allows the content of a diagram to be represented in terms of sets of paths through the diagram. In general, a path is a sequence of elements of one of the following forms (where t is a diagram type, v the name of a subpart and n an integer): (5) t v    t v n  The first assigns a diagram type and picks out a subpart of the diagram. The second references the nth diagram within a repeating type. A path may be terminated by a pair t s where s is a sequence of characters. So, a path such as (6) drs conditions 1 implication left refers to the LHS DRS in an implication which appears as the (say) first element in the conditions of a DRS. Similarly (7) drs universe 1 id "x" identifies the content of the first referent in a DRS's universe.</Paragraph>
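      <Paragraph> To make the path notation concrete, the following sketch resolves paths (6) and (7) against a small diagram. The representation of a diagram as a pair of type and content, the function follow, and the sample DRS are assumptions made for illustration; the editor's own data model is not described in this section.

```python
# A diagram is (type, content). Content is either a string (terminal)
# or a dict mapping a subpart name to a diagram or, for a repeating
# type, to a list of diagrams.
EMPTY_DRS = ("drs", {"universe": [], "conditions": []})
drs = ("drs", {
    "universe": [("id", "x")],
    "conditions": [("implication", {"left": EMPTY_DRS, "right": EMPTY_DRS})],
})


def follow(diagram, path):
    """Return what a path picks out: a diagram, or a string for a final t s pair."""
    node, items = diagram, list(path)
    while items:
        dtype, content = node
        t = items.pop(0)
        if t != dtype:
            raise ValueError("expected type %s, found %s" % (t, dtype))
        if isinstance(content, str):
            # Terminating pair t s: the path states the character content.
            if items != [content]:
                raise ValueError("content mismatch at %s" % t)
            return content
        v = items.pop(0)
        part = content[v]
        if isinstance(part, list):
            # Form t v n: pick the nth diagram, counting from 1.
            n = int(items.pop(0))
            if n not in range(1, len(part) + 1):
                raise ValueError("no diagram number %d in %s" % (n, v))
            part = part[n - 1]
        node = part
    return node
```

 With this representation, follow(drs, ["drs", "conditions", "1", "implication", "left"]) picks out the left-hand DRS of the first condition, and follow(drs, ["drs", "universe", "1", "id", "x"]) confirms the content of the first referent.</Paragraph>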
      <Paragraph position="13"> Ultimately, this type of specification is interestingly reminiscent of proposals for "rule-to-rule" semantics, for example (Gazdar et al, 1985), where the interpretation (and in our case that can be taken to mean "graphical interpretation") of a structure is given in terms of a function of its subparts. More practically, one effect of the restriction to context free rules is that it is extremely easy to generate an SGML document type definition (DTD) (Goldfarb, 1990) for the content of a particular class of diagrams. This at once provides a validator for data that the editor may be expected to display and a means of specifying stream-based communication protocols between the editor and other applications. Needless to say, the existence of a declarative specification of diagram types goes a long way towards avoiding the problem of obsolescence. In our implementation, SGML is used as the 'persistence format' for users' data.</Paragraph>
      <Paragraph position="14"> 4 For completeness, a treatment of terminals is required and can be given straightforwardly in terms of arbitrary sequences over a limited alphabet.</Paragraph>
    </Section>
    <Section position="2" start_page="77" end_page="77" type="sub_section">
      <SectionTitle>
3.2 User interface
</SectionTitle>
      <Paragraph position="0"> One of the most obvious benefits of the above assumptions is that the range of possible actions a user may perform on a diagram is extremely limited, regardless of how complex a class of diagrams is. In general, the actions of the user consist only of selecting a subpart of a diagram and choosing one of the diagram types allowed at that point or of performing some other action on the selected subpart.</Paragraph>
      <Paragraph position="1"> Notice how the grammar is used to constrain the range of possible types at any one location. The only "structure-based" editors we are aware of with comparable generality are those, such as psgml (Staflin, 1996), which interpret an SGML DTD to determine allowable material in a context dependent way.</Paragraph>
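      <Paragraph> The way the grammar constrains the user's choices at a selected location can be sketched as an expansion of union types. The function choices and the table layout are hypothetical; the union names condition and tree_top follow the examples in this paper, and the sketch says nothing about how the editor itself computes its button states.

```python
# Union rules: each union type maps to its alternative diagram types.
UNIONS = {
    "condition": ["atomic", "implication"],
    "tree_top": ["one_branch", "two_branch"],
}


def choices(slot_type, unions):
    """Expand the type declared for a location into the concrete types on offer."""
    result, pending, seen = [], [slot_type], set()
    while pending:
        t = pending.pop(0)
        if t in seen:
            continue  # guard against unions that mention each other
        seen.add(t)
        if t in unions:
            # A union contributes its alternatives, which may be unions too.
            pending.extend(unions[t])
        else:
            result.append(t)
    return result
```

 For a location declared as condition this yields atomic and implication, while a location declared with a non-union type such as drs offers only that type.</Paragraph>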
      <Paragraph position="2"> The virtues of this simplicity should be obvious, but are worth stating. First, for educational purposes, users unfamiliar with some class of diagrams are explicitly guided through possible choices, in a way which provides immediate feedback on the consequence of choices. Second, this form of interaction is efficient. Effectively, the user provides all and only that information required to fully specify a diagram.</Paragraph>
      <Paragraph position="3"> Finally, there will be a corresponding simplicity in the relationship of the editor with a back-end processor controlling the operations of the editor for the purpose of animating operations over diagrams.</Paragraph>
    </Section>
    <Section position="3" start_page="77" end_page="78" type="sub_section">
      <SectionTitle>
3.3 Limitations
</SectionTitle>
      <Paragraph position="0"> There are substantial restrictions in the design we propose. There are many classes of diagrams used in linguistics which are more complex than trees, for example autosegmental diagrams, cf. (Bird and Klein, 1990), state transition diagrams, as used in finite state morphology, or the networks of Systemic Functional Grammar. In order to support the construction of diagrams in those particular areas, more complex systems are inevitably required. Our proposal is not intended to be so general, for precisely the reasons and benefits discussed above.</Paragraph>
      <Paragraph position="1"> On the other hand, there are other limitations  closer to home. A natural operation over attributes in an AVM is to order them (and their values) in some way. Similarly, an AVM editor might allow type constraints as discussed in (Carpenter, 1992) to be automatically verified. One might build such information into a diagram specification (and it may be feasible in some cases to do so automatically).</Paragraph>
      <Paragraph position="2"> These limitations stem from the essential part of our design which separates clearly the graphical conventions in use in some class of diagrams from the interpretation of the content of diagrams. Under that view, if one requires some formally equivalent, but graphically different representation of some information, it makes sense for the determination of equivalence to be made by a processor dedicated to a particular formalism. In other words, issues to do with the interpretation of a diagram are not to be decided by the editor. It is our opinion that the benefits fully justify this distinction.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="78" end_page="78" type="metho">
    <SectionTitle>
4 Applications
</SectionTitle>
    <Paragraph position="0"> This system has been used to deliver drilling materials to undergraduates studying syntactic trees and a simplified form of DRT. Figure 6 in the appendix below shows an editor based on the relevant class of diagrams. Experiments reveal (Cox et al, 1998) that viewing dynamic diagrams (perhaps with an accompanying discussion by one or more people) enhances performance significantly on tasks such as syntactic category labelling and tree construction.</Paragraph>
    <Paragraph position="1"> This enhancement is seen even when the grammar rules and categories are novel, and is (most intriguingly) still significant if no verbal explanation of the diagrams is provided.</Paragraph>
    <Paragraph position="2"> We have also provided an interface to a locally developed tokenization engine. This tool provides a graphical interface to complex rules. Off-the-shelf technology, in the form of an SGML processor (Thompson and McKelvie, 1996), provides a simple mapping to the format required by the tokenizer. We have developed (on the basis of (Smithers, 1997)) a treatment of diagrams in (Pollard and Sag, 1994), used to construct Figure 1.</Paragraph>
    <Paragraph position="3"> Finally, we have provided Web-based visualization tools for a major corpus of dialogues (Anderson et al, 1991).</Paragraph>
    <Paragraph position="4"> Other classes of diagrams for which we have provided reasonably comprehensive grammars are: trees with unlimited branching and multipart node labels; categorial derivations in alternative styles; metrical trees; cladistic or cluster diagrams.</Paragraph>
    <Paragraph position="5"> There are many other kinds of applications which can be envisaged for such a system. Here we mention just a few. The "derivation checkers" or tree editors of (Calder, 1993) and (Paroubek et al, 1992) can be viewed as a mode in which each action by a user is verified for consistency with respect to a grammar.</Paragraph>
    <Paragraph position="6"> Recasting that mode within the context of delaying systems for the interpretation of constraint-based formalisms (e.g. (Dörre and Dorna, 1993)) would provide a debugger in which the grammar writer could perform an instantiation and view the results, perhaps in an animated fashion. On the other hand, the "off-line" construction of trees would provide a way of querying tree banks in a more perspicuous way than via the manual construction of a query in some query language.</Paragraph>
  </Section>
  <Section position="7" start_page="78" end_page="79" type="metho">
    <SectionTitle>
5 Implementation
</SectionTitle>
    <Paragraph position="0"> The system described here has been implemented in Java. Figure 6 is a screen capture of an editor instance using a diagram class specification very much like that given in the Appendix. There, a tree has been constructed and a partial conversion of another tree to a DRS has been performed. In this implementation, a box containing an ellipsis indicates a position permitting one or more occurrences of a diagram type or types, a box containing a question mark indicates a location allowing a single occurrence of the available types, and a question mark on its own indicates a location where characters may appear. In the state shown in the figure, the lowest ellipsis (i.e. the one immediately below 'Pip') is selected. The state of the buttons labelled by diagram type names reflects the choice open to the user at that position in structure. On instantiating a diagram at a location marked by an ellipsis, a new diagram is introduced and the location of the ellipsis moved rightward or downward according to the enclosing layout. Ellipses may be hidden (or revealed) by choosing the option Show ... (or Hide ...).</Paragraph>
    <Paragraph position="1"> The operation Kill allows the deletion of any selected diagram, while Yank will be available if the most recently deleted material is of a type compatible with the currently selected position. Other operations include preparing a printable form of the image or a DTD for the class of diagrams.</Paragraph>
    <Paragraph position="2"> We use a function-like syntax to indicate the primitives and their operands. To indicate how drawing  use of a description of a diagram and could be processed by the editor to draw a subtree of the tree on the left of Figure 6.</Paragraph>
    <Paragraph position="3"> A diagram type is specified by means of a statement such as shown in Figure 3. (Further examples are given in the Appendix.) A variable subpart of a diagram is indicated by the syntax var(name, type). That is, a diagram of the stated type may appear in this position and be referred to by the stated name. The use of square brackets, as in both uses of var above, is equivalent to the Kleene star in the abstract formulation of section 3.1.2, i.e. any number of diagrams of that type may occur at this position. As a further illustration, consider the definitions shown in Figure 4. As their names suggest, the first of these limits the daughters of a tree to two, while the second allows any number of daughters. The last line illustrates the concrete syntax for diagram unions.</Paragraph>
    <Paragraph position="4">
diagram_spec(two_branch, tree([var(mother, category), var(left, leaf_or_tree), var(right, leaf_or_tree)]))
diagram_spec(arbitrary_tree, tree([var(mother, category), var(daughter, [leaf_or_tree])]))
diagram_union(tree_top, [one_branch, two_branch])
    </Paragraph>
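    <Paragraph> As a rough illustration of how the function-like syntax could be read, the sketch below parses a statement into nested terms and lists its variable subparts. It is not the editor's parser: the functions tokenize, parse, parse_items and variables, and the nested tuple and list representation, are assumptions made for this example, and options to primitives are ignored.

```python
import re

# A token is a name or one of the punctuation marks ( ) [ ] ,
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z_0-9]*|[()\[\],])")


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos != len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(tokens):
    """Parse one term: name, name(args) or [items]. Returns (term, rest)."""
    head, rest = tokens[0], tokens[1:]
    if head == "[":
        return parse_items(rest, "]")
    if rest[:1] == ["("]:
        args, rest = parse_items(rest[1:], ")")
        return (head, args), rest
    return head, rest


def parse_items(tokens, closer):
    """Parse comma-separated terms up to the closing bracket."""
    items = []
    if tokens[:1] == [closer]:
        return items, tokens[1:]
    while True:
        item, tokens = parse(tokens)
        items.append(item)
        sep, tokens = tokens[0], tokens[1:]
        if sep == closer:
            return items, tokens
        if sep != ",":
            raise ValueError("expected ',' or %r, found %r" % (closer, sep))


def variables(term):
    """List (name, type, repeating) for every var(name, type) in a term."""
    found = []
    if isinstance(term, tuple):
        name, args = term
        if name == "var" and len(args) == 2:
            vname, vtype = args
            # Square brackets around the type mark a repeating type.
            repeating = isinstance(vtype, list)
            found.append((vname, vtype[0] if repeating else vtype, repeating))
        else:
            for arg in args:
                found.extend(variables(arg))
    elif isinstance(term, list):
        for item in term:
            found.extend(variables(item))
    return found


spec = "diagram_spec(arbitrary_tree, tree([var(mother, category), var(daughter, [leaf_or_tree])]))"
term, rest = parse(tokenize(spec))
```

 Applied to the second definition of Figure 4, variables(term) reports mother as a single category and daughter as a repeating leaf_or_tree, matching the reading of square brackets as the Kleene star.</Paragraph>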
  </Section>
</Paper>