BetaText: An Event Driven Text Processing and Text Analyzing System
Benny Brodda 
Department of Linguistics 
University of Stockholm 
S-106 91 Stockholm, Sweden
Abstract. BetaText can be described as an event
driven production system, in which (combinations of)
text events lead to certain actions, such as the
printing of sentences that exhibit certain, say,
syntactic phenomena. The analysis mechanism used
allows for arbitrarily complex parsing, but is
particularly suitable for finite state parsing. A
careful investigation of what is actually needed in
linguistically relevant text processing resulted in
a rather small but carefully chosen set of
"elementary actions" to be implemented.
1. Introduction. The field of computational
linguistics seems, roughly speaking, to comprise two
rather disjoint subfields, one in which the typical
researcher predominantly occupies himself with
problems such as "concordance generation", "backward
sorting", "word frequencies" and so on, whereas the
prototypical researcher in the other field has things
like "parsing strategies" and "semantic
representations" uppermost in his mind.
This division into almost disjoint subfields is
to be regretted, because we all are (or should be)
students of one and the same thing - language as it
is. The responsibility for this sad state of affairs
can probably be divided equally between the
researchers in these two subfields: the "concordance
makers" because they seem so entirely happy with
rather unsophisticated computational tools developed
already in the sixties (and which allow the
researcher to look at words or word forms only, and
their distribution), and the theoreticians because
they seem so obsessed with the idea of developing
their fantastic models of language in greater and
greater detail, models that on closer scrutiny are
found to comprise a lexicon of, at best, a couple of
hundred words, and to cover, at best, a couple of
hundred sentences or so. No wonder that the
researchers in these two camps think so little of
each other.
One way of closing the gap can be to develop
more sophisticated tools for the investigation of
actual texts; there is a need for the theoreticians
to test to what extent their models actually cover
actual language (and to get impulses from actual
language), and there is a need for the
"practitioners" to have simple tools for
investigating more complex structures in texts than
mere words and word forms. BetaText is an attempt to
provide tools for both those needs.
2. Text events and text operations. BetaText is a
system intended both for scientific investigations
(or analyses) of texts, and text processing in a
more technical sense, such as reformatting, washing
spurious characters away, and so on. Due to the
internal organisation of the system, even large
texts can be run at a reasonable cost (cf.
Brodda-Karlsson 1981). In this section we give some
general definitions, and show their consequences for
BetaText.
An elementary (text) event consists of the
observation of one specified, concrete string in the
text. The system records such an observation through
the introduction of a specific internal state (or
through a specific change of the internal state),
the internal state being an internal variable that
can take arbitrary, positive integral values.
Arbitrarily chosen states (sets of states, in
fact) can be tied to specific activities (or
processes), and each time such a state is introduced
(i.e. the internal state becomes equal to that
state) the corresponding process is activated. Such
states are called action states.
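The interplay of elementary events, internal state and action states can be illustrated with a minimal sketch. It is only an illustration under assumptions: the function name, the dictionary layout and the use of process names as plain strings are all hypothetical and are not taken from BetaText itself.

```python
# Hypothetical sketch; names and data layout are assumptions, not BetaText's own.
def scan(text, events, action_states):
    """Scan text from left to right.

    events:        maps an observed (non-empty) string to the state it introduces
    action_states: maps a state to the name of the process tied to it
    """
    state = 1      # internal state, a positive integer
    fired = []     # (position, process name) for each activation
    pos = 0
    while pos < len(text):
        for string, new_state in events.items():
            if text.startswith(string, pos):
                state = new_state                  # elementary event recorded
                if state in action_states:         # an action state was introduced
                    fired.append((pos, action_states[state]))
                pos += len(string)
                break
        else:
            pos += 1                               # nothing observed at this position
    return state, fired
```

Here observing "and" introduces state 3, and because state 3 is listed as an action state, the process tied to it is activated at that point in the text.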
A complex event (or just event; even elementary
events can be complex in the sense used here) is the
combined result of a sequence of interconnected
elementary events, possibly resulting in an action
state.
In BetaText all this is completely controlled by
a set of production rules (cf. Smullyan 1961) of the
type:
(<string>, <set of states>) ->
(<new string>, <move>, <new state>, <action(s)>)
where <string> is the string that is to be observed,
<set of states> a condition for applying the rule,
viz. that the current internal state belongs to this
set; it is via such conditions that the chaining of
several elementary events into one complex event is
achieved. <new string> is a string that is
substituted for the observed string (the default is
that the original string is retained), <move> is a
directive to the system where (in the text) it shall
continue the analysis; the default is immediately to
the right of the observed string. <new state> is the 
state that the application of the rule results in. 
<action(s)>, finally, is the set of actions that are 
invoked through the application of the rule; the 
action part of a rule is only indirectly present, as 
the actions are invoked if the resulting state of 
the rule belongs to the corresponding action sets. 
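A rule of this type, and the way state conditions chain elementary events into a complex event, might be sketched as follows. This is a simplified illustration, not the Beta rule interpreter: the Rule class and the run function are hypothetical names, the <move> directive is left out (the cursor always continues immediately to the right of the resulting string), and actions are reduced to recording where an action state was reached.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical encoding of (<string>, <set of states>) -> (...)."""
    string: str                        # string to be observed (non-empty)
    states: FrozenSet[int]             # rule applies only if the state is in this set
    new_state: int                     # state resulting from the application
    new_string: Optional[str] = None   # None: the observed string is retained


def run(text, rules, action_states, start_state=1):
    """Apply the first applicable rule at each cursor position."""
    state, pos, hits = start_state, 0, []
    while pos < len(text):
        for rule in rules:
            if state in rule.states and text.startswith(rule.string, pos):
                start = pos
                repl = rule.string if rule.new_string is None else rule.new_string
                text = text[:pos] + repl + text[pos + len(rule.string):]
                pos += len(repl)            # default move: right of the string
                state = rule.new_state
                if state in action_states:  # actions are tied to the new state
                    hits.append(start)
                break
        else:
            pos += 1                        # no rule applicable: move rightwards
    return text, state, hits
```

With the two rules Rule("the ", frozenset({1}), 2) and Rule("cat", frozenset({2}), 3, "CAT"), the string "cat" is only acted upon after "the " has been observed, since the second rule requires state 2. In this sketch the state simply persists until some rule changes it.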
The actual rule format also allows for context 
conditions (and not only state conditions as is 
indicated above), but it is the way state conditions
are evaluated that makes the Beta formalism as 
strong as it is; cf. Brodda-Karlsson 81 and Brodda 
86. 
3. Internal organization. The text corpus to be
analyzed is assumed to adhere to a format that more
or less has become the international standard, where
each line of the running text is assumed to be
preceded by a fixed length line head, usually
containing some kind of line identifier. (Typically a
document identifier + a running line enumeration.)
The running text is presented to the user (well, the
program) as if consisting of one immensely long
string (without the line heads) and in which the
original line divisions are represented by number
signs (or some other unique symbol not appearing
otherwise in the text). The original line heads are
also lined up in an internal queue, and the
correspondence between lines and line heads is
retained via pointers. (This is completely hidden
from the user.)
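One possible way to picture this organization is sketched below. The function names, the choice of a list of start offsets as "pointers", and the decision to put a number sign after every line (including the last) are assumptions made for the illustration; the paper does not specify these details.

```python
import bisect


def load_corpus(lines, head_len, sep="#"):
    """Split fixed-length line heads from the text and join the text lines.

    Returns the long string, the queue of line heads, and for each line
    the offset in the long string at which it starts.
    """
    heads, starts, parts = [], [], []
    offset = 0
    for line in lines:
        heads.append(line[:head_len])
        body = line[head_len:]
        starts.append(offset)
        parts.append(body + sep)       # line division marked by the separator
        offset += len(body) + len(sep)
    return "".join(parts), heads, starts


def head_at(pos, heads, starts):
    """Return the line head belonging to a position in the long string."""
    return heads[bisect.bisect_right(starts, pos) - 1]
```

For the two input lines "D1001 the cat" and "D1002 sat down" with a head length of 6, the long string becomes "the cat#sat down#", and any position from 8 onwards maps back to the head "D1002 ".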
The system has (or can be thought to have) a
cursor that is moved to and fro inside the text. At
start up, the cursor is placed at the beginning of
the text, and the internal state is initialized to 1;
from there on, the user has complete control (via
the application of rules) of the cursor and the
internal state. (The cursor is, however,
automatically moved rightwards in the text as long as
there is no rule applicable.)
Output is again organized in the line head, text 
line format, but now the line head may be given an 
internal structure, viz. as 
<-kwoc-field-> <-id-field-> <-enum-field->
where the id-field corresponds to the line head of 
the input text, the kwoc-field may be filled with 
material from the text itself (e.g. words if one is 
making a word concordance of the KWOC-type), and the 
enum(eration)-field, if defined, contains a running
enumeration. These fields - if defined - must be
explicitly filled with corresponding material, 
through the application of action rules, which we 
describe in the next section. 
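As an illustration of the structured line head, an output line could be assembled roughly as below. The field widths, the padding conventions and the single space between head and text are assumptions for this sketch only; the paper does not state them.

```python
def format_line(text_line, id_field, kwoc="", enum=None,
                kwoc_width=12, enum_width=5):
    """Build one output line: kwoc-field, id-field, optional enum-field, text.

    All widths are hypothetical. The kwoc-field is padded or cut to a fixed
    width; the enum-field is only emitted if a number is supplied.
    """
    head = kwoc.ljust(kwoc_width)[:kwoc_width] + id_field
    if enum is not None:
        head += str(enum).rjust(enum_width)
    return head + " " + text_line
```

Sorting such lines on the kwoc-field would give a word concordance of the KWOC-type, since the keyword stands first in every line head.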
4. Actions. The actions that can be invoked through 
the applications of rules can be divided into four 
different groups: i) analysis actions, actions that
control in detail how the analysis proceeds
internally; ii) block and line head actions, actions
through which one can move material from the text
into the line head (and vice versa); iii) output (or
print) actions, actions which result in some kind
of output, and, finally, iv) count actions.
The analysis actions control how the analysis is
to proceed internally. In an accumulating rule the
resulting state is added to (or subtracted from) the
current internal state, rather than assigned to it
(which is the default case). In stack rules some
important internal parameters (internal state and
the present positions of the cursor and the flag;
cf. below) are pushed onto or popped from an
internal stack. Through the use of stack actions
ATN-like grammars can be written very conveniently in
the Beta formalism (cf. Brodda 86).
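The difference between assigning and accumulating, and the effect of the stack actions, can be sketched with a small class. The class and method names are hypothetical; only the behaviour described above (state added rather than assigned; state, cursor and flag pushed and popped together) is taken from the text.

```python
class Machine:
    """Hypothetical holder of the internal parameters named in the text."""

    def __init__(self):
        self.state = 1     # internal state
        self.cursor = 0    # present position of the cursor
        self.flag = 0      # present position of the flag
        self.stack = []    # internal stack

    def assign(self, value):
        """Default case: the resulting state replaces the current one."""
        self.state = value

    def accumulate(self, delta):
        """Accumulating rule: the resulting state is added (or subtracted)."""
        self.state += delta

    def push(self):
        """Stack rule: save state, cursor and flag together."""
        self.stack.append((self.state, self.cursor, self.flag))

    def pop(self):
        """Stack rule: restore the most recently saved parameters."""
        self.state, self.cursor, self.flag = self.stack.pop()
```

Pushing before entering a sub-analysis and popping afterwards gives the return behaviour that an ATN-like grammar needs for its sub-networks.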
Block and line head actions: A flag setting 
action implies that an internal pointer is set to 
the present position of the cursor. The flag can 
later be the target of move directives (i.e. the 
cursor can be moved back to the flag). The area from 
the flag to the current position of the cursor can 
also be moved out into the kwoc-field as one block 
in a kwoc action. 
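A minimal sketch of the flag and the kwoc action follows. The class and its attribute names are assumptions; in particular, letting the block run from whichever of flag and cursor comes first is a choice made here, not something the paper states.

```python
class TextCursor:
    """Hypothetical cursor with a flag and a kwoc-field."""

    def __init__(self, text):
        self.text = text
        self.pos = 0          # present position of the cursor
        self.flag = 0         # internal pointer set by the flag setting action
        self.kwoc_field = ""

    def set_flag(self):
        """Flag setting action: remember the present cursor position."""
        self.flag = self.pos

    def move_to_flag(self):
        """Move directive with the flag as target."""
        self.pos = self.flag

    def kwoc(self):
        """Kwoc action: copy the area between flag and cursor as one block."""
        lo, hi = sorted((self.flag, self.pos))
        self.kwoc_field = self.text[lo:hi]
```

Setting the flag at the start of a word and invoking the kwoc action at its end would place that word in the kwoc-field of the line head.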
In output actions the output can be formatted in
many convenient ways. In kwic-format, for instance,
always exactly one line at a time is output, and in
such a way that the cursor is positioned in a fixed
column.
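The idea of positioning the cursor in a fixed column can be sketched as follows. The function name and the default column and width are hypothetical, and the sketch cuts a window out of the long text string rather than reproducing how BetaText selects the line to print.

```python
def kwic_line(text, cursor, column=30, width=60):
    """Return one output line of `width` characters (or fewer at the end of
    the text) in which the character at `cursor` lands in column `column`.
    """
    left = text[max(0, cursor - column):cursor]      # context before the cursor
    right = text[cursor:cursor + (width - column)]   # cursor position and after
    return left.rjust(column) + right
```

Because the left context is right-justified to the column width, the keywords of successive lines are aligned under each other, which is what makes a KWIC listing easy to read.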
BetaText has not in itself any advanced
statistical apparatus, but one can at least count
things, and perhaps in a little bit more advanced
way than is usually the case. Arbitrary sets of
states can be assigned specific registers (up to 128
such sets can presently be defined), and whenever
any of these states is introduced, the corresponding
register is raised by one. The contents of the
registers are then presented in a log file that
accompanies all sessions with BetaText.
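The counting mechanism can be sketched as a list of registers, each tied to a set of states. The class name and its interface are assumptions; the limit of 128 sets is the figure given above.

```python
MAX_REGISTERS = 128   # limit on the number of state sets, as stated in the text


class Registers:
    """Hypothetical count registers, one per set of states."""

    def __init__(self, state_sets):
        if len(state_sets) > MAX_REGISTERS:
            raise ValueError("at most 128 sets of states can be defined")
        self.state_sets = [frozenset(s) for s in state_sets]
        self.counts = [0] * len(self.state_sets)

    def observe(self, state):
        """Raise by one every register whose set contains the new state."""
        for i, states in enumerate(self.state_sets):
            if state in states:
                self.counts[i] += 1
```

Since a state may belong to several sets, one introduced state can raise more than one register; whether BetaText permits overlapping sets is not stated in the paper and is assumed here.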
Several examples of actual analyses will be
shown at the conference. 
References

Brodda, B. & Karlsson, F. "An Experiment with
Automatic Morphological Analysis of Finnish",
Department of Linguistics, University of Helsinki,
Helsinki 1981.

Brodda, B. "An Experiment with Heuristic Parsing of
Swedish" in Papers from the Seventh Scandinavian
Conference of Linguistics, Publications No. 10,
Department of Linguistics, University of Helsinki,
Helsinki 1983.

Brodda, B. "BetaText: An Event Driven Text
Processing System and Text Analyzing System", to
appear in Papers from the English Language and
Literature Department, University of Stockholm,
Stockholm 1986.

Smullyan, R.M. "Theory of Formal Systems", Annals
of Math. Studies, New York 1961. 
