COMPUTER-AIDED GRAMMATICAL TAGGING OF SPOKEN ENGLISH 
Jan Svartvik, University of Lund 
Department of English 
Helgonabacken 14 
S-ZZ3 6Z Lund, Sweden 
Abstract 
The paper presents an outline of a system for 
grammatical tagging of the London-Lund Corpus 
of spoken English consisting of some 450 000 
words. The material, all of which will be 
available on magnetic computer tape, and part 
of which is now available in both machine- 
readable and printed form, has been transcrib- 
ed orthographically with prosodic marking for 
tone units, nuclei, stresses, pauses, etc (see 
Samples 1 and Z). Whereas there is now con- 
siderable agreement on the usefulness of a 
tagged corpus, there is as yet no consensus on 
the best type of tagging, let alone the procedure 
involved. The analysis proposed here is of 
course specifically aimed at tagging spoken 
English, but should be largely applicable also 
to written English. 
The syntactic tagging will initially be based on 
surface properties, since we are interested in 
gaining information that is directly available 
through the signals that hearers use for decod- 
ing a message, ie their perceptual strategies. 
In this respect, the plan is no innovation. One 
computer discourse model which is intended 
"to tackle problems that a speaker evidently 
tackles" has recently been reported by Davey 
(1978.4). His model, however, is designed to 
produce, not understand. Another and more 
important difference between the SSE system 
and the Davey model and most other computer 
discourse models is that the latter have been 
devised to handle restricted and artificial 
universes of discourse, such as describing 
games or moving blocks. However, the work 
of Winograd (197Z), for example, is directly 
relevant to our task, since it deals with wider 
aspects of language and makes impressive use 
of Halliday's systemic grammar for producing 
parsing algorithms. 
One of our aims is to make the tagging proce- 
dure as automatic as possible. Specifically, 
we would like to see how far it is possible to 
carry out syntactic analysis based on graphic 
words and prosody (provided by the material) 
and word class tags (provided by a general- 
purpose dictionary). Given that no fully 
automatic system for grammatical tagging 
exists, we propose to implement an interactive, 
semi-manual mode of analysis. 
The paper will present word class tagging of 
types from the Longman Dictionary of Con- 
temporary English, disambiguation of tokens 
and phrase tagging by means of a set of parsing 
algorithms. The basic unit of analysis will be 
the tone unit. In a previous study of Survey 
material of spoken English, it was found that 
the overall average length of a tone unit was 
5.3 words and that "there was considerable 
correlation between the length of tone units and 
their grammatical contents" with a "high degree 
of co-extensiveness between tone units and 
grammatical units of group, phrase, and 
clause structure" (Quirk et al 1964). 
The search for grammatical phrases will be 
from right to left within the tone unit. Since 
this search sequence is definitely unorthodox, 
some explanation may be called for. By and 
large, English phrase structure typically has 
the head to the right, as in 
Verb phrases: will be DOING 
Noun phrases: the nice little DOG 
Adjective phrases: stunningly BEAUTIFUL 
Assuming that a good number of the tone units 
consist of, at least, grammatical phrases, the 
nucleus will occur within the phrase and, more 
often than not, within the head of the phrase. 
Thus, it is likely that it will be linguistically 
rewarding as well as computationally economi- 
cal to search from right to left. It seems that 
a left-to-right search method also runs into 
difficulties with solving left-recursion struc- 
tures and predicting numerous alternatives. 
The phrase recognition rules are to be applied 
in the following order: 
(VPH) Verb phrases 
(APH) Adverb phrases 
(JPH) Adjective phrases 
(NPH) Noun phrases 
(PPH) Prepositional phrases 
29-- 
The typical features of this system are: taking 
tone units as the basis of grammatical analysis, 
choosing a general-purpose dictionary for word 
class tagging, making extensive use of phrase 
structure rules which are applied in a certain 
order and cyclically, and partly adopting an 
interactive mode of analysis. 
Sample i. Computer version of Text S.i.\]: TUs 71-i02. 
W 
s~ 
¢ 
m 
se 
d~ 
o 
w 
¢ 
@ 
I 
! 
,o 
u.J 
t,. Ib 
G 
L. IU 
kJ 
.m ~ li IE =m O" 
E U • "0 ~ • ~"- *"~ .£ • 
¢ ~ ~' f,,3 ~'~ • • ~ e,~ Oi, r"1 (.a 'P" 
~ P. I..--4 i.. ~ ~., O • ~ II IU ~ C~ )'~' 
! 
/.3 i.. -" ,I,. 
.,1 ~2 ~ ~a ~ E 
.'~o E ~ ~" 4:: ( 
ri,,,. ~ *J ~ .P .~o ,m 
iiiH 
aH 
e$ ~ 
o 
( ¢ 
.3 ~z. ~a,.: e,.,p ,~ ~ I 
! ,-- ~ (,3 ~ ~ ~,, ,~" ~ ~ ¢1: 
I J a~l.i;I q q II q ~ ~ ~( tB 
I 1 ~ (~ e~ 4m 41 '1 ~ II J ~1 ~ 4m L& 
qp--. ~lp..- i,- i.m, e,-. qp- ~r-* qr-. ~r-. ~f-.. q,.~ q.-. i-- r,- qp- qr-- ir,m ,ip- ~p- r-" ~p-, qr-. ~r~, 4p.- qp qr-,, qr- qp,, qp., qr-- qp-.. qr- i-, it- ip, j f.. L~) 
C,~& ~ "" qP"= qP" q=" 'qr" qr'- q" r" 1"..4 q~.,-" ~,'- g~,~l ql,'-" ,i-" ~ ~f" r o ~" ~'-' ~r'." ,i,-. 
f~ P~ ~ I,~, ~ I ~ U,~., " I,.. r,J ~i ~ ~ ~ ~ 
~1'-- qP'- qp-. 
I~1 ql- ~" t" q" ~=" i" qP - it" tr I" i-- 
I- qP- 41- r- iP ¢~J qP- i- qr- p r It" 
;I I .................... ;i 
(J ~J C J L~ LJ LJ .J i- e" r q- i- 
qP i- i" P r v-- I- ~P e- e-- i- l-- 
U.. 9L au~ 
lw ~--, ir- 
le ie Je 
qr-- y~ qp-, 
~'~ qP- qr" 
30 
Sample 2. Printed version of Text S.i.i: TUs 71-136. 
A 
B 
A 
B 
A 
> B 
A 
B 
A 
>B 
A 
>B 
A 
>B 
A 
II 
>A 
B 
71 \[o:m\]- - Delllaney's the CA,~N~DIAN • ST~I)ENT {rl..nM{:mm:.rs\] | 7., lilast 
Y i~a r s 
73 lllm hr~\]ll 
774 \[a:\] he llshould have had, his . dissertation "I'NII 73 ((at the)) bellginning of 
M'~YII • 776 ((but)) tile lldamn thing ((hasn't)) COMEI - 777 \[~:\] | lldid gel a 
td~'.~S'tCAaD Fr6M himll - - 778 llsaying that \[o:m\] the ,~lhing is now ,,a~ADYII 
779 and thai he will Usend it by the ,,end • of tJ~NEII • 80 llthat's what he 
/~s/~YSl • 81 llnow • zxA he may not • send it • guile as soon as . ,~THATI 
82 ,,nd I1.11 ts.~ it Ilmay take a hell of a long time to At'~MIill , 84 Ilif he Aputs 
it into the ~,diplomatic IIAGI 85 Ilas In:m\] - AWH~T'S his r-.namel • 
86 Mickey IIC~ItN ~-didll • ~7 Ilthen ((it's)) not so .~DII - 88 Ilbut Io:l ~ow 
/ are YOu going to be i't.ACEDI S9 tlfor ,~.((~HA~VINGI))'~. 
,~0 ~\[~:1 -~ I Ilwouldn't want it before the :,end of June ,~aNYHOW Rt~YNAaDI 
91 bellcause I'm zxgoing I0 MADaI'DII • 92 on the Irr~NrHII ~-~ and Ilcoming 
/ back on the TWENIh-NINTttll - 94 e\[~,:\]~ , I +shall+ "llnot 
95 *llI S{:l,:ll* q6 +llyl~Sll..l. 
94, lli~l 977 allway from home aTII'{NI 98 IINII'IYI I t~9 ill IIANY ralel l(~l the 
II~ND of ~-~ allbout the end of ~UGUST| - - 
1(13 so ~llany time in JUL~I 104 Iland~ AUGUSTI 105 Ilbut \[~:\] + • + 
i06 ~( _ _ a hiss-whistle)* +IIY'ESI+ 
105 ~not too'far into 'August if ~P~SSIBI,I:Ila - 107 II~TtlI~RWISI:J 108 I'1l be 
Ilstuck until about 16~:\] 
/, 10s Alwentielh . \[~\] l'm UlIOPIN(iI II0 to llgel into SPAINI • III from allbout 
/ the ~twenty-. t~lfill'lH of AUGt\]S'rl 112 '((to)) unlltil aboul the Atwentieth or 
,~something of that kind of SEPI-~MB~:r| e. * t,.~ but 
114 ~IIY~.AH II ~ 
113 II\[A~3~W\] altpart from :,TtlATII . 115 !'11 be at IIH~M£1 It0 and al'tlthough 
I'll be doing cs(" t>stuffll 117 and tlthat kind of "IHINGII 118 III can always 
/ 'put it on one ~SIl)El~ 119 and Ilget on with the P~i'F~RII 
120 ~IIY~.AHIIle 121 10:\] yOU Ilsee the ~'Ti4ER ~man~ n2 IK.'H(3MLEYII 
i23 Ilought • Ilought . Ilought ZX~LSOII 124 to have • Ilgot his in on Tt'l,,,tEII 
125 and I SUS'UP~.CH~DII ;26 II~twaYsll ~277 that Delllaney would be L,~TEII • 
/ ;28 thal llChomley would be on "rlMEII 129 and that Ilthis would • produce a 
nice '~ST~G(JERINGI 130 Of • of their arllrival on your ~D~.SKII e-~ ~3~ \[o:m\] 
Itnow it looks as if they they both 
.32 ~ imllllh~lll~ 
131 AR,,,R~VIil IJ3 \[~\] I llthink that we ,muslrl'l worry too +,+much A,~m~UT TH'ISl 
134 Ilwe we limake it ~+perfectly ~lear that :,papers must be in on the ~first of 
:.MXYI '~-* 135 l~:ml 
L~6 ,\[mlltlh~\]I, 
31 
