Vi-xfst: A Visual Regular Expression Development Environment for Xerox
Finite State Tool
Kemal Oflazer and Yasin Y lmaz
Human Language and Speech Technology Laboratory
Sabanc University
Istanbul, Turkey
oflazer@sabanciuniv.edu, yyilmaz@su.sabanciuniv.edu
Abstract
This paper describes Vi-xfst, a visual interface and
a development environment, for developing  nite
state language processing applications using the Xe-
rox Finite State Tool, xfst. Vi-xfst lets a user con-
struct complex regular expressions via a drag-and-
drop visual interface, treating simpler regular ex-
pressions as  Lego Blocks. It also enables the vi-
sualization of the structure of the regular expres-
sion components, providing a bird’s eye view of
the overall system, enabling a user to easily under-
stand and track the structural and functional rela-
tionships among the components involved. Since
the structure of a large regular expression (built in
terms of other regular expressions) is now transpar-
ent, users can also interact with regular expressions
at any level of detail, easily navigating among them
for testing. Vi-xfst also keeps track of dependen-
cies among the regular expressions at a very  ne-
grained level. So when a certain regular expression
is modi ed as a result of testing, only the depen-
dent regular expressions are recompiled resulting in
an improvement in development process time, by
avoiding  le level recompiles which usually causes
redundant regular expression compilations.
1 Introduction
Finite state machines are widely used in many lan-
guage processing applications to implement com-
ponents such as tokenizers, morphological analyz-
ers/generators, shallow parsers, etc. Large scale  -
nite state language processing systems built using
tools such as the Xerox Finite State Tool (Kart-
tunen et al., 1996; Karttunen et al., 1997; Beesley
and Karttunen, 2003), van Noord’s Prolog-based
tool (van Noord, 1997), the AT&T weighted  nite
state machine suite (Mohri et al., 1998) or the IN-
TEX System (Silberztein, 2000), involve tens or
hundreds of regular expressions which are compiled
into  nite state transducers that are interpreted by
the underlying run-time engines of the (respective)
tools.
Developing such large scale  nite state systems is
currently done without much of a support for the
 software engineering aspects. Regular expres-
sions are constructed manually by the developer
with a text-editor and then compiled, and the result-
ing transducers are tested. Any modi cations have
to be done afterwards on the same text  le(s) and the
whole project has to be recompiled many times in a
development cycle. Visualization, an important aid
in understanding and managing the complexity of
any large scale system, is limited to displaying the
 nite state machine graph (e.g., Gansner and North
(1999), or the visualization functionality in INTEX
(Silberztein, 2000)). However, such visualization
(sort of akin to visualizing the machine code of a
program written in a high-level language) may not
be very helpful, as developers rarely, and possibly
never, think of such large systems in terms of states
and transitions. The relationship between the reg-
ular expressions and the  nite state machines they
are compiled into are opaque except for the simplest
of regular expressions. Further, the size of the re-
sulting machines, in terms of states and transitions,
is very large, usually in the thousands to hundreds
of thousands states, if not more, making such visu-
alization meaningless. On the other hand, it may
prove quite useful to visualize the structural compo-
nents of a set of regular expressions and how they
are put together, much in the spirit of visualizing the
relationships amongst the data objects and/or mod-
ules in a large program. However such visualiza-
tion and other maintenance operations for large  -
nite state projects spanning over many  les, depend
on tracking the structural relationships and depen-
dencies among the regular expressions, which may
prove hard or inconvenient when text-editors are the
only development tool.
This paper presents a visual interface and develop-
ment environment, Vi-xfst (Y lmaz, 2003), for the
Xerox Finite State Tool, xfst, one of the most sophis-
ticated tools for constructing  nite state language
processing applications (Karttunen et al., 1997).
                                                                  Barcelona, July 2004
                                              Association for Computations Linguistics
                       ACL Special Interest Group on Computational Phonology (SIGPHON)
                                                    Proceedings of the Workshop of the
Vi-xfst enables incremental construction of com-
plex regular expressions via a drag-and-drop inter-
face, treating simpler regular expressions as  Lego
Blocks . Vi-xfst also enables the visualization of the
structure of the regular expression components, so
that the developer can have a bird’s eye view of the
overall system, easily understanding and tracking
the relationships among the components involved.
Since the structure of a large regular expression
(built in terms of other regular expressions) is now
transparent, the developer can interact with regular
expressions at any level of detail, easily navigating
among them for testing and debugging. Vi-xfst also
keeps track of the dependencies among the regular
expressions at a very  ne-grained level. So, when
a certain regular expression is modi ed as a result
of testing or debugging, only the dependent regu-
lar expressions are recompiled. This results is an
improvement in development time, by avoiding  le
level recompiles which usually causes substantial
redundant regular expression compilations.
In the following sections, after a short overview of
the Xerox xfst  nite state machine development en-
vironment, we describe salient features of Vi-xfst
through some simple examples.
2 Overview of xfst
xfst is a sophisticated command-line-oriented inter-
face developed by Xerox Research Centre Europe,
for building large  nite state transducers for lan-
guage processing applications. Users of xfst employ
a high-level regular expression language which pro-
vides an extensive palette of high-level operators.1
Such regular expressions are then compiled into  -
nite state transducers and interpreted by a run-time
engine built into the tool. xfst also provides a further
set of commands for combining, testing and inspect-
ing the  nite state transducers produced by the regu-
lar expression compiler. Transducers may be loaded
onto a stack maintained by the system, and the top-
most transducer on the stack is available for testing
or any further operations. Transducers can also be
saved to  les which can later be reused or used by
other programs in the Xerox  nite state suite.
Although xfst provides quite useful debugging facil-
ities for testing  nite state networks, it does not pro-
vide additional functionality beyond the command-
1Details of the operators are available at http://www.
xrce.xerox.com/competencies/content-analysis/
fsCompiler/fssyntax.html and http://www.xrce.
xerox.com/competencies/content-analysis/
fsCompiler/fssyntax-explicit.html.
line interface to alleviate the complexity of develop-
ing large scale projects. Building a large scale  nite
state transducer-based application such as a mor-
phological analyzer or a shallow  nite state parser,
consisting of tens to hundreds of regular expres-
sions, is also a large software engineering undertak-
ing. Large  nite state projects can utilize the make
functionality in Linux/Unix/cygwin environments,
by manually entering ( le level) dependencies be-
tween regular expressions tered into a make le. The
make program then invokes the compiler at the shell
level on the relevant  les by tracking the modi ca-
tion times of  les. Since whole  les are recompiled
at a time even when a very small change is made,
there may be redundant recompilations that may in-
crease the development time.
3 Vi-xfst  a visual interface to xfst
As a development environment, Vi-xfst has two im-
portant features that improve the development pro-
cess of complex large scale  nite state projects with
xfst.
1. It enables the construction of regular expres-
sions by combining previously de ned regular
expressions via a drag-and-drop interface.
2. As regular expressions are built by combining
other regular expressions, Vi-xfst keeps track of
the topological structure of the regular expres-
sion  how component regular expressions re-
late to each other. It derives and maintains the
dependency relationships of a regular expres-
sion to its components, and via transitive clo-
sure, to the components they depend on. This
structure and dependency relations can then be
used to visualize a regular expression at vari-
ous levels of detail, and also be used in very
 ne-grained recompilations when some regu-
lar expressions are modi ed.
3.1 Using Vi-xfst
In this section, we describe important features Vi-
xfst through some examples.2 The  rst example is
for a simple date parser described in Karttunen et
al. (1996). This date parser is implemented in xfst
using the following regular expressions:3
2The examples we provide are rather simple ones, as length
restrictions do not allow us to include large  gures to visualize
complex  nite state projects.
3The define command de nes a named regular expres-
sion which can then be subsequently referred to in later regular
expressions. | denotes the union operator. 0 (without quotes)
denotes the empty string traditionally represented by  in the lit-
erature. The quotes "are used to literalize sequence of symbols
which have special roles in the regular expression language.
define 1to9 [1|2|3|4|5|6|7|8|9];
define Day [Monday|Tuesday|Wednesday|
Thursday|Friday|
Saturday|Sunday];
define Month [January|February|March|
April|May|June|July|
August|September|October|
November|December];
define def2 [1|2];
define def4 [3];
define def5 ["0"|1];
define Empty [ 0 ]
define def16 [ (Day ", ")];
define SPACE [" "];
define 0To9 ["0" | 1To9];
define Date [1to9|[def2 0To9]|
[def4 def5]];
define Year [1to9 [[0To9 [[[0To9
[0To9|Empty]]
|Empty]|Empty]]
|Empty]];
define DateYear [(", " Year)];
define LeftBracket [ "[" ];
define RightBracket [ "]" ];
define AllDates [Day|[def16 Month
SPACE Date DateYear]];
define AllDatesParser
[AllDates @->
LeftBracket ... RightBracket];
read regex AllDatesParser;
The most important regular expression above is
AllDates, a pattern that describes a set of
calendar dates. It matches date expressions
such as Sunday, January 23, 2004 or just
Monday. The subsequent regular expression
AllDatesParser uses the longest match down-
ward bracket operator (the combination of @->and
...) to de ne a transducer that puts [ and ]
around the longest matching patterns in the input
side of the transducer.
Figure 1 shows the state of the screen of Vi-xfst just
after the AllDatesParser regular expression is
constructed. In this  gure, the left side window
shows, under the Definitions tab, the regular
expressions de ned. The top right window shows
the template for the longest match regular expres-
sion slots  lled by drag and drop from the list on the
left. The AllDatesParserregular expression is
entered by selecting the longest-match downward
bracket operator (depicted with the icon @-> with
... underneath) from the palette above, which then
inserts a template that has empty slots  three in this
case. The user then  picks up regular expressions
from the left and drops them into the appropriate
slots. When the regular expression is completed, it
can be sent to the xfst process for compilation. The
bottom right window, under the Messages tab,
shows the messages received from the xfst process
running in the background during the compilation
of this and the previous regular expressions.
Figure 2 shows the user testing a regular expression
loaded on to the stack of the xfst. The left win-
dow under the Networkstab, shows the networks
pushed on to the xfst stack. The bottom right win-
dow under Test tab lists a series of input, one of
which can be selected as the input string and then
applied up or down to the topmost network on the
stack.4 The result of application appears on the
bottom pane on the right. In this case, we see the
input with the brackets inserted around the longest
matching date pattern, Sunday, January 23,
2004 in this case.
3.2 Visualizing regular expression structure
When developing or testing a large  nite state trans-
ducer compiled from a regular expression built as a
hierarchy of smaller regular expressions, it is very
helpful, especially during development, to visualize
the overall structure of the regular expression to eas-
ily see how components relate to each other.
Vi-xfst provides a facility for viewing the structure
of a regular expression at various levels of detail.
To illustrate this, we use a simple cascade of trans-
ducers simulating a coke machine dispensing cans
of soft drink when the right amount of coins are
dropped in.5 The regular expressions for this ex-
ample are:6
define N [ n ];
define D [ d ];
define Q [ q ];
define DefPLONK [ PLONK ];
define CENT [ c ];
define SixtyFiveCents [ [ [ CENT ]^65 ]
.x.
DefPLONK ];
define CENTS [[N .x. [[ CENT ]^5 ]|
[D .x. [[ CENT ]^10 ]]]|
[Q .x. [[ CENT ]^25 ]]];
define BuyCoke [ [ [ CENTS ]* ]
.o.
SixtyFiveCents ];
4xfst only allows the application of inputs to the topmost
network on the stack.
5See http://www.xrce.xerox.com/competencies/
content-analysis/fsCompiler/fsexamples.html for
this example.
6The additional operators in this example are: .x. repre-
senting the cross-product and .o. representing the composi-
tion of transducers, and caret operator (  )denoting the repeated
concatenation of its left argument as many times as indicated by
its right argument.
Figure 1: Constructing a regular expression via the drag-and-drop interface
The last regular expression here BuyCoke de nes
a transducer that consist of the composition of two
other transducers. The transducer [ CENTS ]*
maps any sequence of symbols n, d, and q repre-
senting, nickels, dimes and quarters, into the appro-
priate number of cents, represented as a sequence
of c symbols. The transducer SixtyFiveCents
maps a sequence of 65 c symbols to the symbol
PLONK representing a can of soft drink (falling).
Figure 3 shows the simplest visualization of the
BuyCoke transducer in which only the top level
components of the compose operator (.o.) are dis-
played. The user can navigate among the visible
regular expressions and  zoom into any regular ex-
pressions further, if necessary. For instance, Figure
4 shows the rendering of the same transducer af-
ter the top transducer is expanded where we see the
union of three cross-product operators, while Figure
5 shows the rendering after both components are ex-
panded. When a regular expression laid out, the user
can select any of the regular expressions displayed
and make that the active transducer for testing (that
is, push it onto the top of the xfst transducer stack)
Figure 3: Simplest view of a regular expression
and rapidly navigate among the regular expressions
without having to remember their names and loca-
tions in the  les.
As we re-render the layout of a regular expression,
we place the components of the compose and cross-
product operators in a vertical layout, and others in
Figure 2: Testing a regular expression.
Figure 4: View after the top regular expression is
expanded.
a horizontal layout and determine the best layout
of the components to be displayed in a rectangular
bounding box. It is also possible to render the up-
ward and downward replace operators in a vertical
layout, but we have opted to render them in a hori-
zontal layout (as in Figure 1). The main reason for
this is that although the components of the replace
part of such an expression can be placed vertically,
the contexts need to be placed in a horizontal layout.
A visualization of a complex network employing a
different layout of the replace rules is shown in Fig-
ure 6 with the Windows version of Vi-xfst. Here we
see a portion of a Number-to-English mapping net-
work7 where different components are visualized at
different structural resolutions.
3.3 Interaction of Vi-xfst with xfst
Vi-xfst interacts with xfst via inter-process commu-
nication. User actions on the Vi-xfst side get trans-
lated to xfst commands and get sent to xfst which
maintains the overall state of the system in its own
universe. Messages and outputs produced by xfst
are piped back to Vi-xfst, which are then parsed
7Due to Lauri Karttunen; see http://www.cis.
upenn.edu/~cis639/assign/assign8.htmlfor the
xfst script for this transducer. It maps numbers like 1234 into
English strings like One thousand two hundred and thirty four.
Figure 6: Mixed visualization of a complex network
Figure 5: View after both regular expressions are
expanded.
and presented back to the user. If a direct API is
available to xfst, it would certainly be possible to
implement tighter interface that would provide bet-
ter error-handling and slightly improved interaction
with the xfst functionality.
All the  les that Vi-xfst produces for a project are
directly compatible with and usable by xfst; that is,
as far as xfst is concerned, those  les are valid reg-
ular expression script  les. Vi-xfst maintains all the
additional bookkeeping as comments in these  les
and such information is meaningful only to Vi-xfst
and used when a project is re-loaded to recover all
dependency and debugging information originally
computed or entered. Currently, Vi-xfst has some
primitive facilities for directly importing hand gen-
erated  les for xfst to enable manipulation of al-
ready existing projects.
4 Selective regular expression compilation
Selective compilation is one of the simple facili-
ties available in many software development envi-
ronments. A software development project uses se-
lective compilation to compile modules that have
been modi ed and those that depend (transitively) in
some way (via say header  le inclusion) to the mod-
i ed modules. This selective compilation scheme,
typically known as the make operation, depends on
a manually or automatically generated make le cap-
turing dependencies. It can save time during devel-
opment as only the relevant  les are recompiled af-
ter a set of modi cations.
In the context of developing large scale  nite state
language processing application, we encounter the
same issue. During testing, we recognize that a cer-
tain regular expression is buggy,  x it, and then have
to recompile all others that use that regular expres-
sion as a component. It is certainly possible to use
make and recompile the appropriate regular expres-
sion  les. But, this has two major disadvantages:
 The user has to manually maintain the make-
 le that captures the dependencies and invokes
the necessary compilation steps. This may be
a non-trivial task for a large project.
 When even a singular regular expression is
modi ed, the  le the regular expression resides
in, and all the other  les containing regular ex-
pressions that (transitively) depend on that  le,
have to be recompiled. This may waste a con-
siderable amount of time as many other regu-
lar expressions that do not need to be recom-
piled, are compiled just because they happen
to reside in the same  le with some other reg-
ular expression. Since some regular expres-
sions may take a considerable amount of time
to compile, this unnecessarily slows down the
development process.
Vi-xfst provides a selective compilation functional-
ity to address this problem by automatically keeping
track of the regular expression level dependencies as
they are built via the drag-and-drop interface. This
dependency can then be exploited by Vi-xfst when a
recompile needs to be done.
Figure 7 shows the directed acyclic dependency
graph of the regular expressions in Section 3.1, ex-
tracted as the regular expressions are being de ned.
A node in this graph represents a regular expression
that has been de ned, and when there is an arc from
a node to another node, it indicates that the regu-
lar expression at the source of the arc directly de-
pends on the regular expression at the target of the
arc. For instance, in Figure 7, the regular expres-
sion AllDatesdirectly depends on the regular ex-
pressions Date, DateYear,Month, SPACE, and
def16.
Figure 7: The dependency graph for the regular ex-
pressions of the DateParser.
After one or more regular expressions are modi ed,
we  rst recompile (by sending a de ne command to
xfst) those regular expressions, and then recompile
all regular expressions starting with immediate de-
pendents and traversing systematically upwards to
the regular expressions of all  top nodes on which
no other regular expressions depend, making sure
that
 all regular expressions that a regular expres-
sion depends on and have to be recompiled, are
recompiled before that regular expression is re-
compiled, and
 every regular expression that needs to be re-
compiled is recompiled only once.
To achieve these, we compute the subgraph of the
dependency graph that has all the nodes correspond-
ing to the modi ed regular expressions and any
other regular expressions that transitively depends
on these regular expressions. Then, a topological
sort of the resulting subgraph gives a possible linear
ordering of the regular expression compilations.
For instance for the dependency subgraph in Fig-
ure 7, if the user modi es the de nition of
the network 1to9, the dependency subgraph
of the regular expressions that have to be re-
compiled would be the one shown in Figure
8. A (reverse) topological sort of this depen-
Figure 8: The dependency subgraph graph induced
by the regular expression 1to9.
dency subgraph gives us one of the possible or-
ders for recompiling only the relevant regular
expressions as: 1to9, 0To9, Date, Year,
DateYear, AllDates, AllDatesParser
5 Conclusions and future work
We have described Vi-xfst, a visual interface and a
development environment for the development of
large  nite state language processing application
components, using the Xerox Finite State Tool xfst.
In addition to a drag-and-drop user interface for
constructing regular expressions in a hierarchical
manner, Vi-xfst can visualize the structure of a reg-
ular expression at different levels of detail. It also
keeps track of how regular expressions depend on
each other and uses this dependency information for
selective compilation of regular expressions when
one or more regular expressions are modi ed dur-
ing development.
The current version of Vi-xfst lacks certain features
that we plan to add in the future versions. One im-
portant functionality that we plan to add is user cus-
tomizable operator de nitions so that new regular
expression operators can be added by the user as op-
posed to being  xed at compile-time. The user can
de ne the relevant aspects (slots, layout) of an oper-
ator in a con guration  le which can be read at the
program start-up time. Another important feature
is the importing of libraries of regular expressions
much like symbol libraries in drawing programs and
the like.
The interface of Vi-xfst to the xfst itself is localized
to a few modules. It is possible to interface with
other  nite state tools by rewriting these modules
and providing user-de nable operators.
6 Acknowledgments
We thank XRCE for providing us with the xfst and
other related programs in the  nite state suite.

References
Kenneth R. Beesley and Lauri Karttunen. 2003. Fi-
nite State Morphology. CSLI Publications, Stan-
ford University.
Emden R. Gansner and Stephen C. North. 1999. An
open graph visualization system and its applications
to software engineering. Software  Practice and
Experience.
Lauri Karttunen, Jean-Pierre Chanod, Gregory
Grefenstette, and Anne Schiller. 1996. Regular ex-
pressions for language engineering. Natural Lan-
guage Engineering, 2(4):305 328.
Lauri Karttunen, Tamas Gaal, and Andre Kempe.
1997. Xerox Finite-State Tool. Technical report,
Xerox Research Centre Europe.
Mehryar Mohri, Fernando Pereira, and Michael Ri-
ley. 1998. A rational design for a weighted  nite 
state transducer library. In Lecture Notes in Com-
puter Science, 1436. Springer Verlag.
Max Silberztein. 2000. Intex: An fst toolbox. The-
oretical Computer Science, 231(1):33 46, January.
Gertjan van Noord. 1997. FSA utilities: A toolbox
to manipulate  nite state automata. In D. Raymond,
D. Wood, and S. Yu, editors, Automata Implemen-
tation, number 1260 in Lecture Notes in Computer
Science. Springer Verlag.
Yasin Y lmaz. 2003. Vi-XFST: A visual interface
for Xerox Finite State Toolkit. Master’s thesis, Sa-
banc University, July.
