Thistle and Interarbora 
Jo Calder 
University of Edinburgh 
Division of Informatics 
Language Technology Group 
2 Buccleuch Place 
Edinburgh Scotland EH8 9LW 
J.Calder@ed.ac.uk
Abstract 
We present a system for manipulating a wide class of 
linguistic diagrams, which is configurable and extensi- 
ble, and allows deployment as a web-delivered system. 
A major theme of this work is the transfer of the devices
of formal grammar into the analysis and construction of
diagrams.
1 Introduction 
Diagrams play a crucial role in (computational) linguis- 
tics, in presenting analyses and characterizing fragments 
of theories. This role has not to date been adequately 
supported by programs for the creation, maintenance and 
delivery of diagrams. We conjecture that this has to do 
with three main factors. First, in a changing field, obsolescence
may be a concern. Second, it may be difficult
to see how to provide a uniform interface to an appropri- 
ately wide range of kinds of diagrams. Third, integration 
with delivery systems may be difficult to achieve. We ar- 
gue below that the design of the Thistle diagram editor 
provides mechanisms for obviating each of these prob- 
lems. We start with a brief description of the design of 
the editor, stressing design decisions that avoid the prob- 
lems just mentioned. We then turn briefly to some im- 
plementation details, before describing and exemplifying 
the classes of diagrams which have been developed so 
far. We end with a discussion of current and future directions
for this work. All of the examples can be accessed
on-line 1.
Some of our practical considerations are worth empha- 
sising. First, we aim for typographic quality as close 
as possible to standard print presentations of the dia- 
grams in use. The diagrams shown in this paper are pre- 
sented using the PostScript generated by Thistle. They 
have essentially the same form as delivered by a web 
browser. Second, the system should be lightweight in 
several senses. It should be usable without specialist 
knowledge of the diagrams in question. The user interface
should be simple. It should be deployable with minimal
assumptions about the hosting environment. These
considerations mean that other programs for manipulating
diagrams, such as more general purpose graph editors
(for example daVinci 2, DiaGen (Viehstaedt and Minas 1995)
or VGJ 3) are generally unsuitable, as are more
complex tools for data annotation, such as the MATE
workbench (Dybkjær et al 2000). Such systems may of
course be able to present more complex diagrams than
Thistle, or offer alternative functionality.
1 http://www.ltg.ed.ac.uk/software/thistle
Crucial to the simplicity of Thistle is the assumption
that many diagram classes of interest can be characterized
using only context free methods. As we will demonstrate
below, this assumption is consistent with a usefully
wide range of classes. We first discuss motivation for the
design of Thistle, and describe the grammars that characterize
classes of diagram. We then discuss briefly some
example classes and the Interarbora service. After giving
details of the current implementation and recent enhancements,
we describe the settings in which these tools
have been exploited. Finally, we describe our current
work, and possible strategies for usefully broadening the
kinds of diagram that Thistle can describe.
2 Design 
Thistle is a parameterizable diagram editor. A class of
diagrams is selected by providing Thistle with a grammar
which characterizes the diagrams of interest. The grammar
describes the hierarchical structure of diagrams, and
provides information about layout.
Grammars for diagram classes utilize a particular form
of context free grammar, in which there are two kinds
of statement. In the first, the left hand side of a rule
names a particular type of diagram, and its rewrite describes
the abstract structure and concrete layout of a
diagram type. In the second, the rewrite is a set of names
of other diagram types, representing a disjunctive choice
between the latter. Left hand sides are required to be
unique throughout. (It is straightforward to show that
any context free grammar can be encoded in this form.)
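The encoding claimed in the parenthesis can be illustrated with a minimal Python sketch. It assumes a grammar given as a mapping from nonterminals to lists of alternative right hand sides; the function name `encode` and the `_1`, `_2` suffixes for fresh type names are hypothetical and are not part of Thistle. The sketch further assumes that the fresh names do not clash with existing nonterminals.

```python
def encode(cfg):
    """Encode a CFG as 'spec' and 'union' statements with unique left hand sides.

    cfg maps each nonterminal to a list of alternatives; each alternative
    is a list of symbols (nonterminals or terminal strings).
    """
    specs, unions = {}, {}
    for lhs, alternatives in cfg.items():
        if len(alternatives) == 1:
            # A single rewrite is already a statement of the first kind.
            specs[lhs] = list(alternatives[0])
        else:
            # Several rewrites: one fresh type per alternative, plus a
            # union statement offering a disjunctive choice between them.
            members = []
            for i, rhs in enumerate(alternatives, start=1):
                fresh = f"{lhs}_{i}"  # hypothetical naming scheme
                specs[fresh] = list(rhs)
                members.append(fresh)
            unions[lhs] = members
    # No name may be the left hand side of both kinds of statement.
    assert not set(specs) & set(unions)
    return specs, unions


cfg = {
    "S": [["NP", "VP"]],
    "NP": [["PN"], ["Det", "N"]],
}
specs, unions = encode(cfg)
```

Each nonterminal with several rewrites becomes a union over freshly named types, so every left hand side occurs exactly once.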
Figure 1 shows a fragment of the grammar used to gen- 
erate the diagram in Figure 2. This fragment can be used 
to analyse that part of the diagram expressing the value
of the feature CONTENT.
2 http://www.informatik.uni-bremen.de/daVinci/
3 http://www.eng.auburn.edu/csse/research/research_groups/graph_drawing/graph_drawing.html
diagram_spec(plain_avm,
    bracket([delimiter(square)],
        vbox(var(avpairs, [avm_line]))))
diagram_union(avm_line,
    [avpair, path_value])
diagram_spec(avpair,
    array_element([align(baseline)],
        [var(attribute, attribute),
         var(value, value)]))
diagram_spec(attribute,
    smallcaps(var(name, Text)))
Figure 1: A simplified fragment from the grammar used to generate the diagram in Figure 2
[Diagram: an attribute-value matrix with features including LOCAL, NONLOCAL, CAT, HEAD, MOD, CONTENT, INDEX, RESTR, TO-BIND and SLASH]
Figure 2: A diagram, reproduced using Thistle, from Pollard and Sag (1994).
The first and third statements here express the hierarchical
structure and layout of attribute-value matrices.
One can gloss the first as: "A diagram of type plain_avm
consists of any number of diagrams of type avm_line. 4
The subdiagrams are arranged vertically and enclosed by
a pair of square brackets." In other words, var elements
stand for a variable subpart of a diagram and indicate the
type of diagram that can appear at that location. Note
that such elements also assign a label to each variable
subpart. The second statement above indicates that a
diagram of type avm_line can be realized as either of the
named types. The fourth statement indicates how diagram
types may introduce sequences of characters.
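The hierarchical content of the four statements in Figure 1 can be sketched as plain data, together with a check that a diagram instance conforms to it. This is a hedged illustration in Python, not Thistle's implementation: the names `SPECS`, `UNIONS`, `realizations` and `conforms`, and the dictionary shape used for diagram instances, are assumptions made for the example, and layout information is left out entirely.

```python
# Statements of the first kind: type -> {variable label: (subtype, repeatable)}.
# Layout information (bracket, vbox, smallcaps, ...) is omitted in this sketch.
SPECS = {
    "plain_avm": {"avpairs": ("avm_line", True)},  # [avm_line]: any number
    "avpair": {"attribute": ("attribute", False), "value": ("value", False)},
    "attribute": {"name": ("Text", False)},
}
# Statements of the second kind: type -> alternative types.
UNIONS = {"avm_line": ["avpair", "path_value"]}


def realizations(type_name):
    """Return the types that may realize type_name, with unions expanded."""
    if type_name in UNIONS:
        result = []
        for member in UNIONS[type_name]:
            result.extend(realizations(member))
        return result
    return [type_name]


def conforms(diagram, expected):
    """Check a diagram instance against the type expected at its location."""
    if expected == "Text":
        return isinstance(diagram, str)
    if not isinstance(diagram, dict) or diagram["type"] not in realizations(expected):
        return False
    spec = SPECS.get(diagram["type"])
    if spec is None:
        return True  # type not covered by this fragment: left unchecked
    if set(diagram["parts"]) != set(spec):
        return False
    for label, (subtype, repeatable) in spec.items():
        part = diagram["parts"][label]
        children = part if repeatable else [part]
        if not all(conforms(child, subtype) for child in children):
            return False
    return True


avm = {
    "type": "plain_avm",
    "parts": {"avpairs": [
        {"type": "avpair", "parts": {
            "attribute": {"type": "attribute", "parts": {"name": "CONTENT"}},
            "value": {"type": "value", "parts": {}},
        }},
    ]},
}
is_valid = conforms(avm, "plain_avm")
```

Types such as value and path_value are not defined in the fragment, so the sketch accepts them without further checking.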
This form of CFG leads directly to a user interface
based on top-down rewriting, 5 where a rule of the first
kind is invoked, leading to choices in the diagrams introduced
as subparts, and so on. In practical terms, then,
given a class of diagrams, a particular instance may be
constructed by selecting a location in a diagram, and
choosing among the possible types of diagram for that
location. What the user sees on the surface is a WYSIWYG
presentation of the consequences of the particular
arrangements of diagram types.
4 The square brackets in [avm_line] are an ad hoc way of expressing
the Kleene star.
5 Other ways of constructing a diagram are possible, as discussed in
§5 below.
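The top-down interaction can be sketched as follows: each unfilled location records the type it expects, the editor offers the types that may realize it, and choosing one opens a new unfilled location for every variable subpart. The grammar below (a binary tree class) and the names `choices`, `hole` and `rewrite` are hypothetical and serve only to illustrate the idea.

```python
# A hypothetical diagram class: binary trees whose leaves carry a word.
SPECS = {
    "branch": [("left", "node"), ("right", "node")],
    "leaf": [("word", "Text")],
}
UNIONS = {"node": ["branch", "leaf"]}


def choices(type_name):
    """Types the user may pick at a location expecting type_name."""
    if type_name in UNIONS:
        return [c for member in UNIONS[type_name] for c in choices(member)]
    return [type_name]


def hole(type_name):
    """An unfilled location that expects a diagram of type_name."""
    return {"expects": type_name, "filled": None, "parts": {}}


def rewrite(location, chosen):
    """Fill a location with the chosen type, opening a hole per variable."""
    if chosen not in choices(location["expects"]):
        raise ValueError(f"{chosen} is not allowed here")
    location["filled"] = chosen
    location["parts"] = {label: hole(sub) for label, sub in SPECS.get(chosen, [])}
    return location


root = hole("node")
rewrite(root, "branch")                 # the user picks 'branch' at the root
rewrite(root["parts"]["left"], "leaf")  # then 'leaf' for the left daughter
```

After these two steps the right daughter is still an unfilled location for which the editor would offer branch or leaf.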
These aspects of the design address at once problems
of obsolescence and of providing a uniform user interface.
In order to provide a new class of diagram, one
has only to construct a grammar for that class, providing
the class is amenable to context free treatment (see
§6 below). We make use of existing standards in tackling
the problem of integrating with other systems. Any
instance of the editor may be used via a web browser,
so that local installation of software is not essential.
The graphical presentation of a diagram may be saved
in PostScript, while the logical content of a diagram is
stored as SGML. 6 The precise format of a diagram's logical
content exploits the fact that each variable subpart of
a diagram is assigned a unique name.
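The paper does not give the storage format itself. As a purely illustrative sketch of how unique variable names could drive a markup representation, the Python below names each element after the variable that holds the subpart and records the diagram type in a `type` attribute. The element naming, the `type` attribute, the root name `diagram` and the example types `atom` and `psoa` are all assumptions, and XML-style output is used only because Python's standard library provides it.

```python
import xml.etree.ElementTree as ET


def to_element(label, diagram):
    """Build an element named after the variable label that holds diagram."""
    if isinstance(diagram, str):
        # Character sequences become text content.
        element = ET.Element(label)
        element.text = diagram
        return element
    element = ET.Element(label, {"type": diagram["type"]})
    for child_label, child in diagram["parts"].items():
        element.append(to_element(child_label, child))
    return element


avpair = {"type": "avpair", "parts": {
    "attribute": {"type": "attribute", "parts": {"name": "CONTENT"}},
    "value": {"type": "atom", "parts": {"name": "psoa"}},
}}
markup = ET.tostring(to_element("diagram", avpair), encoding="unicode")
```

Because variable names are unique within a diagram type, each subpart can be found again by name when the markup is read back. Repeated subparts are not handled in this sketch.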
In addition to the construction of static diagrams, Thistle
may also be used to construct step-time sequences of
diagrams. A 'diagram player' can be used to step through
(or jump between) diagrams in the sequence. One example
shows the states visited by a top-down backtracking
parser, on some input and with respect to a given grammar.
6 See also §6 below.
3 Example diagram classes 
There is a wide range of diagram classes currently avail- 
able, ranging from an essentially complete treatment of 
the diagrams in Pollard and Sag (1994) (Figure 2), and 
in Kamp and Reyle (1993) (Figure 3), to small but useful 
classes for diagrams from particular areas of linguistics, 
such as metrical trees and categorial derivations. There
are also a number of generic diagram classes such as 
trees with unlimited or fixed branching. 
x y
Jones(x)
Ulysses(y)
x owns y
x fascinates y
Figure 3: A diagram based on Kamp and Reyle 1993 
4 Interarbora 
Interarbora 7 is an internet based service allowing the
construction and display of tree diagrams via Web
browsers. The user supplies a tree specification as a labelled
bracketed string, which is then analysed to produce
a specification of a Thistle diagram for a simple
diagram class. This information is then passed back to
the Web browser, which computes a Thistle diagram for
display.
The analyser for bracketed strings attempts to be quite
liberal. One target format that we handle successfully
is that of the Penn Treebank 8. Figure 4 shows a simple
example from Interarbora. As with the other diagrams
in this paper, this example is formatted here using
PostScript generated by Interarbora. There is no discernible
difference between this presentation and that delivered
by a web browser. Interarbora is described in more
detail by Calder (2000).
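The analysis of a labelled bracketed string can be sketched with a small recursive reader. This is an illustration in Python under our own assumptions rather than the Interarbora analyser: the function name `parse_brackets` and the tuple representation of trees are hypothetical. In a liberal spirit it accepts round or square brackets, does not insist that the two kinds match, and allows an unlabelled outer bracket of the kind found in Penn Treebank files.

```python
import re


def parse_brackets(text):
    """Parse a labelled bracketed string into (label, children) tuples.

    Leaves are plain strings. Round and square brackets are both accepted.
    """
    tokens = re.findall(r"[()\[\]]|[^\s()\[\]]+", text)
    position = 0

    def node():
        nonlocal position
        position += 1  # skip the opening bracket
        label = ""
        if position < len(tokens) and tokens[position] not in "()[]":
            label = tokens[position]
            position += 1
        children = []
        while position < len(tokens) and tokens[position] not in ")]":
            if tokens[position] in "([":
                children.append(node())
            else:
                children.append(tokens[position])
                position += 1
        if position == len(tokens):
            raise ValueError("missing closing bracket")
        position += 1  # skip the closing bracket
        return (label, children)

    if not tokens or tokens[0] not in "([":
        raise ValueError("expected an opening bracket")
    tree = node()
    if position != len(tokens):
        raise ValueError("unexpected material after the tree")
    return tree


tree = parse_brackets("(S (NP (PN Hank)) (VP (V chased) (NP (PN Frank))))")
```

The resulting nested tuples mirror the tree for "Hank chased Frank" and could then be mapped onto a simple tree diagram class.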
7 http://www.ltg.ed.ac.uk/~jo/interarbora/
8 http://www.cis.upenn.edu/~treebank/home.html
           S
         /   \
      NP       VP
      |       /  \
      PN     V    NP
      |      |    |
     Hank chased  PN
                  |
                Frank
Comments: Hank chasing Frank
Figure 4: An example tree from Interarbora
5 Current status
The system described above is fully implemented and is
available at no charge for non-commercial purposes. As
our implementation platform is Java, there are relatively
few portability issues. 9 In addition to the mode of operation
described above, where a user selects a location
in a diagram and chooses a type for that location, we 
have also investigated modes which are not strictly top- 
down. Such modes are essential in tasks such as annota- 
tion, where one has, for example, a given string or text 
to mark up. In this case, one is interested in adding to
the (possibly minimal) existing structure, and this cannot 
be straightforwardly done under a pure top-down model. 
Consequently, we have added a range of operations over 
diagrams, including: 
split: a sequence of characters is replaced by two (or
    more) of its subsequences with appropriate structural
    adjustments
join: the inverse of split
demote: a diagram is adjoined into the diagram at the
    current location
promote: the diagram at the current location replaces its
    mother.
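The demote and promote operations can be sketched over a simple tree structure. The Python below is an illustration under stated assumptions, not Thistle's code: the grammar is reduced to a hypothetical table `DAUGHTERS` listing which types each diagram type admits as subparts, and the function names are ours. A demotion is admitted only for a type t that is permitted at the current location and that itself permits the type of the diagram being demoted.

```python
# Hypothetical grammar summary: the types each diagram type admits as subparts.
DAUGHTERS = {
    "S": {"NP", "VP"},
    "NP": {"NP", "PN", "AP"},
    "VP": {"V", "NP"},
}


def demote_options(parent, index):
    """Types t that may be adjoined above parent's child at position index.

    t must be permitted where the child currently sits, and the child's
    own type must be permitted within t.
    """
    child = parent["children"][index]
    return sorted(
        t for t in DAUGHTERS.get(parent["type"], set())
        if child["type"] in DAUGHTERS.get(t, set())
    )


def demote(parent, index, new_type):
    """Adjoin a new node of new_type above the child at position index."""
    if new_type not in demote_options(parent, index):
        raise ValueError(f"the grammar does not license {new_type} here")
    child = parent["children"][index]
    parent["children"][index] = {"type": new_type, "children": [child]}


def promote(grandparent, index, child_index):
    """Replace the node at position index by one of its own children."""
    mother = grandparent["children"][index]
    child = mother["children"][child_index]
    if child["type"] not in DAUGHTERS.get(grandparent["type"], set()):
        raise ValueError("the grammar does not license this promotion")
    grandparent["children"][index] = child


pn_node = {"type": "PN", "children": []}
np_node = {"type": "NP", "children": [pn_node]}
options = demote_options(np_node, 0)  # types that may be adjoined above PN
demote(np_node, 0, options[0])        # NP now dominates a new NP over PN
promote(np_node, 0, 0)                # undo: PN replaces its new mother
```

When `demote_options` returns several types, or the adjoined type has several daughter positions, an editor would have to ask the user to choose.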
There are a number of interesting points to these op- 
erations. First, the possibility of such operations is in 
general determined through grammatical inference. So it 
is not possible to split a sequence of characters in a lo- 
cation where only one such sequence is allowed by the 
grammar. Second, the demote operation is the exact ana- 
log of adjunction in Tree Adjoining Grammars (see e.g. 
Joshi et al, 1991). A demote operation is only allowed 
if the type of diagram at the current location is permit- 
ted within some other diagram type t and the type t is 
also permissible at the current location in structure. In 
general, having selected a location for a demote opera- 
tion, there may be several ways of executing the oper- 
ation. For example, the user may be asked to choose 
which daughter in a finite branching local tree should receive
the diagram at the currently selected location. Finally,
these operations are not grammar specific, so that
the same kinds of operations are available, whether one
is dealing with corpus annotation tools or an editor for
HPSG diagrams.
9 Our implementation predates later versions of Java which provide
a tree abstraction, and so our current implementation does not make
use of this facility.
6 Current use and on-going and future work
The system is in use in the support of teaching in a variety
of settings. Cox et al (1999) report results about the
effectiveness of Thistle in teaching concepts to do with phrase
structure and category membership. Understanding of these concepts
seems to have been improved simply by viewing a
video capture of trees being edited. Interarbora is used
at several institutions in junior level courses. We have
used Thistle as a front end to a variety of rule formats,
including those for the tokenization tool TTT (Grover et
al 2000). The diagram player has been used for the visualization
of the results of corpus searches in GSEARCH 10
and of dialogue states, in concert with software developed
in the TRINDI project 11.
On-going support work includes changing the persistence
format of diagrams from SGML to XML, and
bringing diagram classes within the same format. There
are a large number of minor improvements we intend to
make, including generalizing the Web interfaces so that
diagram classes and persistence formats may be supplied
by the user.
Our current research has a number of aspects. The limitation
to context free diagram classes simplifies many
aspects of implementation, most notably in the area of
layout. On the other hand, many diagram classes require
greater than context free power for their adequate
description. Important classes include state transition diagrams,
systemic functional networks and autosegmental
diagrams. We are looking at compromises which will allow
the construction and display of such diagrams while
avoiding difficult layout problems.
Another area in which the context free assumption
is being examined has to do with diagrams where constraints
such as equality are required to hold within a
diagram. An example of this is the notion of proper binding
in Discourse Representation Theory: a variable occurring
as an argument must be appropriately introduced
(and vice versa). A further example is the enforcement of
appropriateness conditions within a typed feature framework.
Strictly speaking, this case doesn't violate our
context free assumption, but encoding such conditions
in a context free way is cumbersome. In these cases,
we are interested in looking at ways of further constraining
the content of diagrams. One possibility, which sits
happily enough with Thistle's background of formal language
theory, is to exploit the notion of path, a sequence
of variable-type pairs. Any Thistle diagram corresponds
to a set of such paths, and, because these are generated
by a context free grammar, the set of such paths
is regular. We could enforce appropriateness in a typed
feature setting, for example, by expressing further regular
constraints over paths. Using greater than regular
power would result in diagrams whose structure was no
longer context free.
10 http://www.hcrc.ed.ac.uk/gsearch/
11 http://www.ling.gu.se/research/projects/trindi/trindikit.html
Other possibilities include looking at logics to express 
constraints over diagrams. We can view the set of paths 
as a model of some logical theory. As our diagrams are 
necessarily finite, this means that logical frameworks of
considerable power could be invoked.
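The path-based view of a diagram can be made concrete with a short sketch. The Python below enumerates the paths of a diagram as sequences of variable-type pairs and checks them against a regular expression standing in for a regular constraint. The diagram encoding, the string form `variable:type/variable:type` and the example appropriateness condition (case is appropriate only for noun, vform only for verb) are assumptions made for illustration.

```python
import re


def paths(diagram, prefix=()):
    """Yield each root-to-leaf path as a tuple of (variable, type) pairs."""
    if not diagram["parts"]:
        yield prefix
        return
    for variable, child in diagram["parts"].items():
        yield from paths(child, prefix + ((variable, child["type"]),))


def path_string(path):
    """Write a path as a string so that a regular expression can inspect it."""
    return "/".join(f"{variable}:{type_name}" for variable, type_name in path)


def satisfies(diagram, pattern):
    """True if every path of the diagram matches the regular constraint."""
    compiled = re.compile(pattern)
    return all(compiled.fullmatch(path_string(p)) for p in paths(diagram))


# Hypothetical appropriateness condition for a tiny typed feature system.
APPROPRIATE = r"head:(noun/case:(nom|acc)|verb/vform:(fin|inf))"

good = {"type": "sign", "parts": {
    "head": {"type": "noun", "parts": {
        "case": {"type": "nom", "parts": {}}}}}}
bad = {"type": "sign", "parts": {
    "head": {"type": "verb", "parts": {
        "case": {"type": "nom", "parts": {}}}}}}

good_ok = satisfies(good, APPROPRIATE)
bad_ok = satisfies(bad, APPROPRIATE)
```

Here the first diagram meets the condition and the second does not, since its path assigns case to a verb.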
One further element of our work examines ways of
providing programmatic control of diagrams, with applications
in interactive diagram design, where a cooperating
program may fill in details which are logically implied,
and debugging of complex representations.
7 Conclusions 
We have seen above that Thistle provides a flexible, 
lightweight interface to a wide variety of diagram types. 
Furthermore, it can be used for the delivery of diagrams
(and sequences of diagrams) in a variety of settings. The
Interarbora service provides a way of allowing visualization
of tree structures suitable for a wide variety of users.

References 

Calder, J. (2000) Interarbora and Thistle: Delivering linguistic
structure via the Internet, in Proceedings of the
Second Language Resources and Evaluation Conference,
31 May-2 June 2000, Athens, Greece.

Cox, R., McKendree, J., Tobin, R., Lee, J. & Mayes,
T. (1999) Vicarious learning from dialogue and discourse:
A controlled comparison. Instructional Science
27, pp431-458.

Dybkjær, L., Møller, M. B., Bernsen, N. O., Grosse, M.,
Olsen, M. and Schiffrin, A. (2000) Annotating Communication
Problems Using the MATE Workbench, in
Proceedings of the Second Language Resources and
Evaluation Conference, 31 May-2 June 2000, Athens,
Greece.

Grover, C., Matheson, C., Mikheev, A. and Moens, M.
(2000) LT TTT - A Flexible Tokenisation Tool, in
Proceedings of the Second Language Resources and
Evaluation Conference, 31 May-2 June 2000, Athens,
Greece.

Joshi, A. K., Vijay-Shanker, K. and Weir, D. J. (1991)
The convergence of mildly context-sensitive grammatical
formalisms, in P. Sells, S. M. Shieber and T. Wasow
(eds.) Foundational Issues in Natural Language
Processing. MIT Press: Cambridge, MA.

Kamp, H. & Reyle, U. (1993). From Discourse to Logic.
Kluwer Academic: Dordrecht and London.

Pollard, C. & Sag, I. A. (1994). Head-Driven Phrase
Structure Grammar. CSLI: Stanford and University of
Chicago Press: Chicago and London.

Viehstaedt, G. & Minas, M. (1995). Generating editors
for direct manipulation of diagrams. In B. Blumenthal,
J. Gornostaev & C. Unger, editors, Proc. 5th International
Conference on Human-Computer Interaction
(EWHCI'95), Moscow, Russia, LNCS 1015, pp17-25.
Springer-Verlag.
