<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2144"> <Title>Thistle and Interarbora</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Diagrams play a crucial role in (computational) linguistics, in presenting analyses and characterizing fragments of theories. To date, this role has not been adequately supported by programs for the creation, maintenance and delivery of diagrams. We conjecture that this has to do with three main factors. First, in a changing field, obsolescence may be a concern. Second, it may be difficult to see how to provide a uniform interface to an appropriately wide range of kinds of diagrams. Third, integration with delivery systems may be difficult to achieve. We argue below that the design of the Thistle diagram editor provides mechanisms for obviating each of these problems. We start with a brief description of the design of the editor, stressing design decisions that avoid the problems just mentioned. We then turn briefly to some implementation details, before describing and exemplifying the classes of diagrams which have been developed so far. We end with a discussion of current and future directions for this work. All of the examples can be accessed on-line at http://www.ltg.ed.ac.uk/software/thistle.</Paragraph> <Paragraph position="1"> Some of our practical considerations are worth emphasising. First, we aim for typographic quality as close as possible to standard print presentations of the diagrams in use. The diagrams shown in this paper are presented using the PostScript generated by Thistle. They have essentially the same form as delivered by a web browser. Second, the system should be lightweight in several senses. It should be usable without specialist knowledge of the diagrams in question. The user interface should be simple. It should be deployable with minimal assumptions about the hosting environment.
These considerations mean that other programs for manipulating diagrams, such as more general purpose graph editors (for example daVinci, DiaGen (Viehstaedt and Minas 1995) or VGJ), are generally unsuitable, as are more complex tools for data annotation, such as the MATE workbench (Dybkjær et al. 2000). Such systems may of course be able to present more complex diagrams than Thistle, or offer alternative functionality.</Paragraph> <Paragraph position="2"> Crucial to the simplicity of Thistle is the assumption that many diagram classes of interest can be characterized using only context free methods. As we will demonstrate below, this assumption is consistent with a usefully wide range of classes. We first discuss motivation for the design of Thistle, and describe the grammars that characterize classes of diagram. We then discuss briefly some example classes and the Interarbora service. After giving details of the current implementation and recent enhancements, we describe the settings in which these tools have been exploited. Finally, we describe our current work, and possible strategies for usefully broadening the kinds of diagram that Thistle can describe.</Paragraph> </Section> <Section position="4" start_page="0" end_page="993" type="metho"> <SectionTitle> 2 Design </SectionTitle> <Paragraph position="0"> Thistle is a parameterizable diagram editor. A class of diagrams is selected by providing Thistle with a grammar which characterizes the diagrams of interest. The grammar describes the hierarchical structure of diagrams, and provides information about layout.</Paragraph> <Paragraph position="1"> Grammars for diagram classes use a particular form of context free grammar, in which there are two kinds of statement. In the first, the left hand side of a rule names a particular type of diagram, and its rewrite describes the abstract structure and concrete layout of that diagram type.
In the second, the rewrite is a set of names of other diagram types, representing a disjunctive choice between them. Left hand sides are required to be unique throughout. (It is straightforward to show that any context free grammar can be encoded in this form.) Figure 1 shows a fragment of the grammar used to generate the diagram in Figure 2. This fragment can be used to analyse the part of the diagram expressing the value of the feature CONTENT.</Paragraph> <Paragraph position="3"> The first and third statements here express the hierarchical structure and layout of attribute-value matrices.</Paragraph> <Paragraph position="4"> One can gloss the first as: &quot;A diagram of type plain_avm consists of any number of diagrams of type avm_line. The subdiagrams are arranged vertically and enclosed by a pair of square brackets.&quot; In other words, var elements stand for a variable subpart of a diagram and indicate the type of diagram that can appear at that location. Note that such elements also assign a label to each variable subpart. The second statement above indicates that a diagram of type avm_line can be realized as either of the named types. The fourth statement indicates how diagram types may introduce sequences of characters.</Paragraph> <Paragraph position="5"> This form of CFG leads directly to a user interface based on top-down rewriting, where a rule of the first kind is invoked, leading to choices in the diagrams introduced as subparts, and so on. In practical terms, then, given a class of diagrams, a particular instance may be constructed by selecting a location in a diagram and choosing among the possible types of diagram for that location. What the user sees on the surface is a WYSIWYG presentation of the consequences of the particular arrangement of diagram types.</Paragraph> <Paragraph position="6"> These aspects of the design address at once the problems of obsolescence and of providing a uniform user interface.
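As a minimal sketch of the two kinds of statement and of top-down rewriting, the following Python models a grammar as a dictionary from type names to statements. It is not Thistle's own grammar notation: the classes Structure, Choice and Chars, the layout keywords, and every type name except plain_avm and avm_line are hypothetical, and the choices offered for avm_line are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    """First kind of statement: abstract structure plus concrete layout."""
    layout: str               # hypothetical layout keyword
    subparts: tuple           # (label, type) pairs, one per variable subpart
    repeatable: bool = False  # True if any number of subparts may occur


@dataclass(frozen=True)
class Choice:
    """Second kind of statement: a disjunctive choice among other types."""
    options: tuple


@dataclass(frozen=True)
class Chars:
    """Leaf statement: the type introduces a sequence of characters."""


# Dict keys are unique, mirroring the requirement that left hand sides be
# unique throughout. All type names apart from plain_avm and avm_line are
# hypothetical.
GRAMMAR = {
    "plain_avm": Structure("vertical-square-brackets",
                           (("line", "avm_line"),), repeatable=True),
    "avm_line": Choice(("feature_line", "tag_line")),
    "feature_line": Structure("horizontal",
                              (("attr", "attr_name"), ("val", "value"))),
    "tag_line": Chars(),
    "attr_name": Chars(),
    "value": Choice(("plain_avm", "atom")),
    "atom": Chars(),
}


def choices_at(grammar, type_name):
    """Types offered to the user at a location of type `type_name`.

    Choice statements are expanded transitively (depth first, in listed
    order), so the user only ever picks a structured or character type.
    """
    seen, offered, stack = set(), [], [type_name]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        rule = grammar[name]
        if isinstance(rule, Choice):
            # Push in reverse so options are popped in their listed order.
            stack.extend(reversed(rule.options))
        else:
            offered.append(name)
    return offered


def rewrite(grammar, chosen):
    """One top-down step: instantiate `chosen` (a result of choices_at).

    A structured type gets one open location per labelled subpart;
    a character type gets an empty character sequence.
    """
    rule = grammar[chosen]
    if isinstance(rule, Structure):
        return {"type": chosen, "open": dict(rule.subparts)}
    return {"type": chosen, "chars": ""}
```

Using a dictionary makes the uniqueness of left hand sides automatic. In this sketch, a location of type avm_line offers feature_line and tag_line, and choosing feature_line opens two new locations labelled attr and val, which is the select-and-choose loop described above.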
In order to provide a new class of diagram, one has only to construct a grammar for that class, provided the class is amenable to context free treatment (see Section 6 below). We make use of existing standards in tackling the problem of integrating with other systems. Any instance of the editor may be used via a web browser, so that local installation of software is not essential.</Paragraph> <Paragraph position="7"> The graphical presentation of a diagram may be saved in PostScript, while the logical content of a diagram is stored as SGML. The precise format of a diagram's logical content exploits the fact that each variable subpart of a diagram is assigned a unique name.</Paragraph> <Paragraph position="8"> In addition to the construction of static diagrams, Thistle may also be used to construct step-time sequences of diagrams. A 'diagram player' can be used to step through (or jump between) diagrams in the sequence. One example shows the states visited by a top-down backtracking parser on some input, with respect to a given grammar.</Paragraph> </Section> <Section position="5" start_page="993" end_page="993" type="metho"> <SectionTitle> 3 Example diagram classes </SectionTitle> <Paragraph position="0"> A wide range of diagram classes is currently available, ranging from an essentially complete treatment of the diagrams in Pollard and Sag (1994) (Figure 2) and in Kamp and Reyle (1993) (Figure 3), to small but useful classes for diagrams from particular areas of linguistics, such as metrical trees and categorial derivations. There are also a number of generic diagram classes, such as trees with unlimited or fixed branching.</Paragraph> </Section> <Section position="6" start_page="993" end_page="993" type="metho"> <SectionTitle> 4 Interarbora </SectionTitle> <Paragraph position="0"> Interarbora supports the construction and display of tree diagrams via Web browsers.
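Interarbora's input, described next, is a labelled bracketed string such as the trees of the Penn Treebank. The following is a hedged Python sketch of a liberal reader for such strings, not Interarbora's actual analyser; the nested (label, children) representation and the empty label given to unlabelled brackets are assumptions.

```python
import re

# A bracket, or any run of characters that is neither whitespace nor a bracket.
TOKEN = re.compile(r"[()]|[^\s()]+")


def parse_brackets(text):
    """Parse a labelled bracketed string into nested (label, children) tuples.

    Leaves are plain strings. A bracket with no label, such as the outer
    "( ... )" wrapper in Penn Treebank files, gets the label "".
    Several top-level trees may follow one another; all are returned.
    Unbalanced brackets raise IndexError (missing close) or ValueError
    (stray close).
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_node():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == ")":
            raise ValueError("unexpected closing bracket")
        if token != "(":
            return token  # a leaf (word)
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(parse_node())
        pos += 1  # consume the closing bracket
        return (label, children)

    trees = []
    while pos != len(tokens):
        trees.append(parse_node())
    return trees
```

Accepting unlabelled brackets and several top-level trees is one way of being liberal about input; each resulting tree would then be mapped onto a simple Thistle diagram class.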
The user supplies a tree specification as a labelled bracketed string, which is then analysed to produce a specification of a Thistle diagram for a simple diagram class. This information is then passed back to the Web browser, which computes a Thistle diagram for display.</Paragraph> <Paragraph position="1"> The analyser for bracketed strings attempts to be quite liberal. One target format that we handle successfully is that of the Penn Treebank. Figure 3 shows a simple example from Interarbora. As with the other diagrams in this paper, this example is formatted here using PostScript generated by Interarbora. There is no discernible difference between this presentation and that delivered by a web browser. Interarbora is described in more detail by Calder (2000).</Paragraph> </Section> <Section position="7" start_page="993" end_page="994" type="metho"> <SectionTitle> 5 Current status </SectionTitle> <Paragraph position="0"> The system described above is fully implemented and is available at no charge for non-commercial purposes. As our implementation platform is Java, there are relatively few portability issues. In addition to the mode of operation described above, where a user selects a location in a diagram and chooses a type for that location, we have also investigated modes which are not strictly top-down. Such modes are essential in tasks such as annotation, where one has, for example, a given string or text to mark up.
In this case, one is interested in adding to the (possibly minimal) existing structure, and this cannot be done straightforwardly under a pure top-down model.</Paragraph> <Paragraph position="1"> Consequently, we have added a range of operations over diagrams, including: split (a sequence of characters is replaced by two or more of its subsequences, with appropriate structural adjustments); join (the inverse of split); demote (a diagram is adjoined into the diagram at the current location); and promote (the diagram at the current location replaces its mother).</Paragraph> <Paragraph position="2"> Several points about these operations are worth noting. First, whether an operation is possible is in general determined through grammatical inference. So it is not possible to split a sequence of characters in a location where only one such sequence is allowed by the grammar. Second, the demote operation is the exact analog of adjunction in Tree Adjoining Grammars (see e.g. Joshi et al., 1991). A demote operation is only allowed if the type of diagram at the current location is permitted within some other diagram type t, and the type t is also permissible at the current location in the structure. In general, having selected a location for a demote operation, there may be several ways of executing the operation. For example, the user may be asked to choose which daughter in a finite branching local tree should receive the diagram at the currently selected location.
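The four operations can be sketched over a simple tree representation. Everything here is hypothetical: the Node class, the SLOTS table of permitted daughter types, and the several_allowed flag, which stands in for the grammatical inference that decides whether a split is possible. demote_slots applies the condition stated above (t must be permitted at the current location, and the current diagram's type must be permitted within t) and returns every daughter position that could receive the diagram, mirroring the case where the user is asked to choose.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    type: str
    children: list = field(default_factory=list)  # None marks an open location
    chars: str = ""


# Hypothetical grammar fragment: for each structured type, the set of
# types permitted in each of its (finitely many) daughter positions.
SLOTS = {
    "np": [{"det", "noun", "np"}, {"noun", "np"}],
}


def split(node, offsets, several_allowed):
    """Replace a character sequence by its subsequences cut at `offsets`.

    `several_allowed` stands in for the grammatical check: if the grammar
    allows only one sequence at this location, the split is refused.
    """
    if not several_allowed:
        raise ValueError("grammar allows only one sequence here")
    bounds = [0, *sorted(offsets), len(node.chars)]
    return [Node(node.type, chars=node.chars[a:b])
            for a, b in zip(bounds, bounds[1:])]


def join(nodes):
    """Inverse of split: concatenate adjacent character sequences."""
    return Node(nodes[0].type, chars="".join(n.chars for n in nodes))


def demote_slots(slots, allowed_here, current, t):
    """Daughter positions of a new type-t diagram that could receive `current`.

    Empty unless t is permitted at the current location and has daughters.
    """
    if t not in allowed_here or t not in slots:
        return []
    return [i for i, permitted in enumerate(slots[t]) if current.type in permitted]


def demote(slots, current, t, position):
    """Adjoin a type-t diagram at this location; `current` moves down into it."""
    children = [None] * len(slots[t])  # other daughters start as open locations
    children[position] = current
    return Node(t, children)


def promote(mother, index):
    """The daughter at `index` replaces its mother."""
    return mother.children[index]
```

Because none of these functions mentions a particular type except through the SLOTS table, the same code would serve any grammar, in line with the point that the operations are not grammar specific.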
Finally, these operations are not grammar specific, so that the same kinds of operations are available whether one is dealing with corpus annotation tools or an editor for HPSG diagrams.</Paragraph> <Paragraph position="3"> Our implementation predates later versions of Java which provide a tree abstraction, and so our current implementation does not make use of this facility.</Paragraph> <Paragraph position="4"> 6 Current use and on-going and future work The system is in use in the support of teaching in a variety of settings. Cox et al. report results on the effectiveness of Thistle in teaching concepts to do with phrase structure and category membership. Understanding of these concepts seems to have been improved simply by viewing a video capture of trees being edited. Interarbora is used at several institutions in junior level courses. We have used Thistle as a front end to a variety of rule formats, including those for the tokenization tool TTT (Grover et al. 2000). The diagram player has been used for the visualization of the results of corpus searches in GSEARCH and of dialogue states, in concert with software developed in the TRINDI project.</Paragraph> <Paragraph position="5"> On-going support work includes changing the persistence format of diagrams from SGML to XML, and bringing diagram classes within the same format. There are a large number of minor improvements we intend to make, including generalizing the Web interfaces so that diagram classes and persistence formats may be supplied by the user.</Paragraph> <Paragraph position="6"> Our current research has a number of aspects. The limitation to context free diagram classes simplifies many aspects of implementation, most notably in the area of layout. On the other hand, many diagram classes require greater than context free power for their adequate description. Important classes include state transition diagrams, systemic functional networks and autosegmental diagrams.
We are looking at compromises which will allow the construction and display of such diagrams while avoiding difficult layout problems.</Paragraph> <Paragraph position="7"> Another area in which the context free assumption is being examined has to do with diagrams where constraints such as equality are required to hold within a diagram. An example of this is the notion of proper binding in Discourse Representation Theory: a variable occurring as an argument must be appropriately introduced (and vice versa). A further example is the enforcement of appropriateness conditions within a typed feature framework. Strictly speaking, this case does not violate our context free assumption, but encoding such conditions in a context free way is cumbersome. In these cases, we are interested in ways of further constraining the content of diagrams. One possibility, which sits happily enough with Thistle's background in formal language theory, is to exploit the notion of path, a sequence of variable-type pairs. Any Thistle diagram corresponds to a set of such paths, and, because these are generated by a context free grammar, the language of such paths is regular. We could enforce appropriateness in a typed feature setting, for example, by expressing further regular constraints over paths. Using greater than regular power would result in diagrams whose structure was no longer context free.</Paragraph> <Paragraph position="8"> Other possibilities include looking at logics to express constraints over diagrams. We can view the set of paths as a model of some logical theory.
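The notion of path introduced above can be illustrated with a short Python sketch that checks an appropriateness condition as a regular pattern over paths. The tuple encoding of diagrams, the string rendering of a path, and the AGR/agr condition are all hypothetical.

```python
import re


def paths(diagram, label=""):
    """Yield every root-to-node path as a tuple of (variable, type) pairs.

    A diagram is modelled as (type, {variable_label: subdiagram}); the
    root, which fills no variable, is given the label "".
    """
    diagram_type, subparts = diagram
    here = ((label, diagram_type),)
    yield here
    for sub_label, sub in subparts.items():
        for tail in paths(sub, sub_label):
            yield here + tail


def encode(path):
    """Render a path as a string, e.g. ':avm/AGR:agr'."""
    return "/".join(f"{var}:{typ}" for var, typ in path)


# Hypothetical appropriateness condition, stated as a regular pattern:
# a path is bad if a variable AGR is filled by anything other than type agr.
BAD_AGR = re.compile(r"(^|/)AGR:(?!agr(/|$))")


def violations(diagram, forbidden):
    """Encoded paths of `diagram` that contain a match of `forbidden`."""
    return [encode(p) for p in paths(diagram) if forbidden.search(encode(p))]
```

As long as such a pattern avoids non-regular regex features such as backreferences, the constraint it expresses stays within the regular power discussed above.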
As our diagrams are necessarily finite, logical frameworks of considerable power could be invoked.</Paragraph> <Paragraph position="9"> One further element of our work examines ways of providing programmatic control of diagrams, with applications in interactive diagram design, where a cooperating program may fill in details which are logically implied, and in the debugging of complex representations.</Paragraph> </Section> </Paper>