EXPLORING TIlE ROLE OF PUNCq'UATION IN PARSING NATURAl, TEXT 
Bernard E M Jones 
Centre for Cognitive Science, University of l;klinbnrgh, Edinburgh l'~H8 9ITvV, Scotland 
Email: bernieOcogsci.ed.ac.uk 
ABSTR.ACT 
Few, if any, current NLP systems make any 
significant use of punctuation. Intuitively, a 
treatment of lrunctuation seems necessary to the 
analysis and production of text. Whilst this 
has been suggested in the fiekls of discourse 
strnetnre, it is still nnclear whether punctu- 
ation can help in the syntactic field. This 
investigation atteml)ts to answer this question 
by parsing some corpus-based material with 
two similar grammars - one including rules 
for i)unctuation, the other igno,'ing it. The 
punctuated grammar significantly outq)erforms 
the unpunctnated on% and so the conclnsion 
is that punctuation can play a usefifl role in 
syntactic processing. 
INTRODUCTION 
qT'here are no cnrrent text I)ased natural language 
analysis or generation systems that make flfll use of 
punctuation, and while there are some that make 
limited use, like the \],klitor's Assistant \[l)ale 1990\], 
they tend to be the exception rather than the 
rule. Instead, punctuation is usually stripped out 
of the text belbre l)rocessing, and is not included in 
generated text. 
Intnitively, this seems very wrong. Punctuation 
is such an integral part of written language that it. 
is difficult to imagine natnrally producing any signi- 
ficant body of unpunctuated text, or being al)le to 
easily understand any such body of text. 
IIowever, this is what has been done in the compu- 
tational linguistics field. The reason that it has always 
been too difficult to incorporate any coherent account 
of punctuation into any system is because no such 
coherent account exists. 
I)unctuation has long been considered to be 
intimately related to intonation: that is that difDrent 
punctuation lnarks simply give the reader tiles :_ts to 
the possible prosodic and l)ausal characteristics of the 
text \[Markwardt, t942\]. This claim is questioned by 
Nunberg \[1990\], since such a transcriptional view of 
punctuation is theoretically nninteresting, and also 
correlates rather lradly with intonation in any case. 
However, even if we reeognise that punctuation 
fulfils a linguistic role of its own, it is by no means 
clear how this role is defined. Since there is still no 
concise linguistic aeconnt of the flmction of pllltCtU- 
ation, we have to rely mainly on personal intuitions. 
This in turn introduces new probhmas, since there is 
a great deal of idiosyncrasy associated with the use 
of Imnctuation marks. Whilst most people may agree 
on (:ore situations in which use of a given punctuation 
mark is desirable, or ewm necessary, there are still 
tnany situations where their nse is less clear. 
In his recent review, lhurlphreys \[1993\] suggests 
Llmt acconnts O\['lnmctuation fall into three categories: 
'"I'he first ...is selllessly dedicated to the task of 
bringing Punctuation to the Peasantry ,.. The second 
sort is the Style (I uide, written by editors and printers 
for tile private pleasure of fellow professionals ...The 
third, on tile linguistics 
ranch the rarest of all." 
Thus whilst we do 
lmblishers ' style guides, 
of the punctuation systenr, is 
not really want to rely on 
since the accounts of i)unctu- 
ation they contain are rather too proscriptive and 
concentrate on tile nse of punctuation rather than its 
ine~.tnillg, tim academic accounts of l)nnetnation are 
far from numerous. In the work of Dale \[1991\], the 
potential o1' punctuation in the tiehl of discourse and 
natnral hmguage generation is explored. However, 
little mention is made anywhere of tile role of 
lmnctuation within a syntactic framework. '£herefore 
the current investigation tries to determine whether 
taldng consideration of lmnetuation can further the 
goals of syntactic analysis of natural language. 
PUNCTUATION 
Punctuation, as we consider it, can 1)o defined w~ the 
central part of the range oF non-le×ical orthography. 
All,hough arguments could Ire made for including the 
md)-Iexical marks (e.g. hyl)hens, apostrol)hes ) and 
structural marlcs (e.g. bullets in itemisations), they 
are excluded since they Lend to be lexicalised or 
rather difficult to represent, respectively. Indeed, it 
is difficult t,o imagine the representation of struc- 
tural punctuation, other than through the use of some 
special structural description language such ~m SGM I,. 
Within our definition o\[' punctuation then, we lind 
bro~*dly three types of mark: delimiting, separating 
and disambigu~tting, as described by Nunberg \[1990\]. 
Some marks, the COlnlna especially, fall into multiple 
categories since they can have different roles, and the 
categories each per\[brm distinct lingnistic functions. 
l)elimiters (e.g. comma, (hush, l)arenthesis) occur 
to either side of a l)articular lexical expression to 
remove that exl)ression from the immediate syntactic 
42"/ 
context of the surrounding sentence (1). Tile 
delimited phr~e acts as a modifier to the adjacent 
phrase instead. 
(1) John, my friend, fell over and died. 
Separating marks come between similar gramma- 
tical items and indicate that the items form a list (2). 
They are therefore similar to conjunctions in their 
behavionr, and can sometimes replace conjunctions 
in a list. 
(2) I came, I saw, I conquered. 
I want butter, eggs and titan'. 
Disambiguating marks, usually commas, occur 
where an unintentional ambiguity coukl result if the 
marks were not there (3), and so perhaps illustrate 
best why tile use of puncttmtion within NL systems 
could be beneliciM. 
(3) Earlier, work was halted. 
In addition to the nature of different punctuation 
marks, there are several phenomena described by 
Nunberg \[1990\] which it is useful to consider before 
implementing any treatment of punctuation: 
Point absorption: strong point symbols (comma, 
dash, semicolon, etc.) absorb weaker adjacent 
ones (4). Commas are least powerfnl, and 
periods t most powerful; 
(4) It was John, my fi'iend. 
Bracket absorption: commas and dashes are removed 
if they occur directly before an end quote or 
parenthesis (5); 
(5) ... (my brother, ~.he teacher)... 
Quote tmt, sposition: punctuation directly to the 
right of an end quote is moved to the left of that 
character (6). This phenomenon occurs chiefly 
in American English, but can occur generally; 
(6) IIe said, "I love you." 
Graphic absorption: orthogral)hically, but not lingui- 
stically, similar coincident symbols are absorbed 
(7). Thus the dot marking an abbreviation will 
absorb an adjacent period whereas it would not 
absorb an adjacent comma. 
(7) I wm'k fro" the C.1.A., not the F.B.I. 
In addition to the phenomena associated with the 
interaction of punctuation, there are also distinct 
phenomena observable in the interaction of punctu- 
ation and lexical expressions. Thus delimited phrases 
cannot immediately contain delimited phrases of the 
IThroughout this paper I shall refer to sentence-final dots 
as periods rather than full-stops, to avoid confusion. 
same type (the sole exception may be with parenthe- 
tieals, though many people ob.iect to nested paren- 
theses) and a<buncts such as the colon-expansion 
cannot contain further similar adjuncts. Therefore, 
in tile context of colon and semicolon seeping, (8) is 
ambiguous, but (9) is not. 
(8) words : words ; words . 
(9) words : words ; words : words . 
THE GI~AMMA1L 
Recognition of punctuational phenomena does not 
imply tha.t they can be successfully encoded into a 
NL grammar, or whether the use of such a punctuated 
grammar will result in arty analytical advantages. 
Nunberg \[1990\] adw~cates two separate grammars, 
operatiug at different levels. A lexical grammar 
is proposed \['or the lexical expressions occurring 
between l~unctuation marl;s, and a text grammar is 
proposed for the structure of the punctuation, and the 
relation of those marks to the lexical expre.ssions they 
separate. The text gralllluar has within it distinct 
levels, such as phrasal and clausal, at which distinct 
punctuational phenomena can occur. 
This should, in theory, make for a very 
neat system: l.he lexical syntact, ic processes being 
kept separate from those that handle ImnCtUation. 
llowever, in pracl.ice, this system seems mdikely l,o 
succeed since in order to work, the lexical expressions 
that occur between punctuation marks must carry 
additional information about the syntactic categories 
occurring at their edges so that the text grammar can 
constrain the function of the punctuation marks. 
For example, if a sentence includes an itemised 
noun phrase (10), the lexical expression before the 
comma must be marked as ending with a noun 
phrase, and the lexleal expression after the comma 
must be marked as starting with a nottn phrase. 
A rule in the text grammar could then process the 
sel)arating comma as it clea,'ly Col nes between two 
similar syntactic elements. 
(10) lie lilies Willy, lan and Tom. 
\[e.d: ,,p\] \[sta,'~,: ,,p\] 
Ilowever, as (11) shows, the separating 
comina concept could require intbrmation about the 
categories at arbitrarily deep levels occurring a.t the 
ends of \]exical expressious surronnding punctuation 
rllarks, 
(u) 1 like to walk, skip, and rmt. 
I like to walk, to sldp, and to rtm. 
1 like to walk, like to skip, but hate to run. 
Even with the above edge-category information, 
the parsing process is not necessarily made any easier 
(since often the fllll partial parses of a.II the separate 
expressions have to be held and joined). Therefore we 
seem to be at no advantage if wc use this approach. 
422 
In add(lieu, it is dill\]cult to imagine what linguistic or 
psychological m0tivatidn such a separation of punctu- 
ation from lexical text could hold, since it seems 
rather unlikely that people process punctuation at a 
separate level to the text it surrounds. 
tIence it seems more sensible to use an integrated 
grammar, which handles both words and punctu- 
ation. This lets us describe the iuteraction of lulnetu - 
at(on and lexieal expressions far more logically and 
concisely than if the two were separated. Good 
examples of this are disaml)iguatillg comnlas I ill a 
unified grammar we can simply write rules with an 
optional comma among the daughters (12). ' 
(t2) .~ -~ np (~o,nm,O ~i'. 
S 4 lip (eonllna) s. 
A featnre-based tag grammar was written for 
this investigation (based loosely on one written by 
Briscoe and Waegner \[1992\]), and used in conjun- 
ction with tile parser inchlded in the Alvey Tools' 
Grammar Development Environment (ODE) \[Carroll 
etal, 1991\], which allows for rapid prototyping aud 
e,~sy analysis of parses. It should be stressed that 
this grammar is solely one of tags, aild so is not very 
detailed syntactically. 
In order to handle the additional complications of 
punctuation, tile notion of stoppedness of a category 
liars been introduced. Thus every category in the 
grammar has a stop feature which describes the 
punctu~Ltional character following it (13), and defaults 
to \[st -\] (unstopped) if there is no such character. 
(ca) tll~ man, = \[st el 
with the flowers, = \[st, f\] 
Since the rnles of the grammar further dictate 
that the mother category inherits the stop value of 
its rightmost daughter, ouly rules to specifically add 
pnnctuation for categories which could be lexicalised 
are necessary. Thus a rule for the additional of a 
punctuation marie after a lexicalised nouli would be 
as in lid). ('\['hc calligraphic letters rel)resellt unili- 
cation variables.) 
(14) n0\[st S\] --4- n0\[st, -1 \[punt N\] 
We can then specify that top level categories must 
be \[st f\] (period), that items in a list should be \[st c\] 
(comma), etc. In rules where we want to force a parti- 
cular punctuation mark to the right of a category, 
that mark can be included in the rule, with the 
preceding category unstopped: (15) illustrates the 
addition of a comlna-delimited noun phrase to a iloun 
llilrase. Specifically mentioning tile l)unctuation nlark 
prevents the delimited phrase from being unstopped, 
resulting in an unstopped mother category. Note 
(,hat Cite phenomenon of point absorption has beeu 
captured by unifying the wdue of the st feature of tile 
mother and the identity of tile final punctuation marie. 
Thus the possible vahies of st are all the possible 
values of punt in addition to \[st -\], 
(15) up\[st S\] -~ up\[st c\] np\[st -\] \[punt S\]. 
'J2hus the stop feature seems sufficient to cope with 
tile punctuational phenomena inl;roduccd M)ove. |li 
order to incorporate tile pllenomena of interaction 
betweeu plmctuatiou and lexical expressions (e.g. 
preventing immediate nesting of similar delimited 
phrases), we need to iiltroduce it small Ullnlber of 
additioual features into the graunnar. If, for example, 
we make a comma-delimited noun phr,~se \[cm +\], we 
can then stipulate that any noun phrase that inchides 
a comma-delimited phrase has the feature \[cm \], so 
that the two cannot unify (16). Note that the unifi- 
cation of nmtlter and right-lnost daughter stop values 
is onlitted t7)r clarity of prescntal, ion. 
(is) ,~I,\[<:,,: -\] -~ l,V\[.~t (:\] ,,pill,, +, st, \] 
~,'Ve can iUCOl'porato the relative scoping of coh)ns 
and semicolons, as discussed previously, into the 
granunar w;ry easily too. The semicolon rule (117) 
accepts any vahle of co in its arguments, but the eolou 
rule (18) only accepts fee -\]. The mother category 
of the eolou rule bears the feature fee t-\] to preveut 
inchlsiOll into further cololl-bearing sentences. Note 
that there are more versions of I, he colon rule, which 
deal witll dill'etch( constituents to either side of the 
colou, and also that, since the GI)E does not pel'nlit 
the disjunction of ligature values, the semicolon rule is 
merely an abbreviation of the innltiple rules required 
in the granlmar. ~top unilication is again omitted. 
(17) s\[co (dl V B)\] -~ s\[co A, sl, so\] s\[co B\]. 
(18) s\[,:o +1 -~ s\[<:o -, ~t ~,,\] .+o % 
Ilenc0 the inclusion of a few simph~ extra features 
in it aorlnal granllnar h;_lS achieved an acceptable 
I.reatnlent of lnu~ctuatioual phenomer:a. ,qincc this 
work ouly represents the initial steps of providing a 
full aim pl'Ol)er accounb of tile role of puuc.tuatiou, no 
claims are lllade for the theoretical validity or colriplc- 
teness of this approach! 
THE COIl.PUS 
For the current hlw~stigat\[on it was necessary to use 
a corpus sulliciently rich in lmltctuation to illustrate 
the possible advantages or (lisadvantages of uLilising 
punctual.ion within the parsing process. Obviously 
a sentence whMl inchldes no lmnctuation will be 
equally difficult to parse with both punctuated and 
Ulqmnctuated gralniuars. Sinlihu'ly, for s(~iltCllCes 
including only ()lie or two marks of pllnctilation, l.he 
llSO of punctliatlon is likely to bc raLller procedural, 
and hence not necessarily very revealing. 
Therefore the tagged Spoken English Corpus was 
chosen \['lh.ylor ,~ Knowles, 1988\]. This featlu'es some 
very long seutences, and includes rich and varied 
punctuation. Since IJle corpus has l)cen l)unctnated 
IYlallually, by several different people, some idiosyn- 
crasy occurs ill tile pnnctuatlollal style, I)ul, there is 
423 
little punctuation which wonld be deemed inappro- 
priate to the positidn it'oceurs in. 
A subset of 50 sentences w~ chosen from the 
whole corpus. Between them these sentences include 
material taken from news broadcasts, poetry readings, 
weather forecasts and programme reviews, so a wide 
variety of language is covered. 
The lengths of the sentences varied from 3 words to 
63 words, the average being 31 words; and the punctu- 
ational complexity of the sentences varied from one 
mark (just a period) to 16 marks, the average being 
4 punctuation marks. A sample tagged sentence is 
shown in (19), where fs denotes a period. 
(19) Their_APP$ meeting_NN1 involves~VVZ a_ATI 
ldnd_NNl of_|O life_NN1 swap_NN1 fs_l,'S 
The punctuated grammar, developed with this 
subset of the corpns, was used to parse the corpus 
subset, and then an unpunctuated version of the 
same grammar was used to parse the same subset. 
The reason that testing was performed on the 
training corpus was that, in the absence of a 
complete treatment of punctuation, the pnnetuational 
phenomena in the training corpus were the only ones 
the grammar could work with, and although they 
included almost all of the core phenomena mentioned, 
slightly different instances of the same phenomena 
could cause a parse failure. For reference, a small 
set of novel sentences were also parsed with the 
grammars, to determine their coverage outside the 
closed test. 
The unpunetuated version of the grammar was 
prepared by removing all the features relating 
to specifically punetuational phenomena, and also 
removing explicit mention of punctuation marks from 
the rules. This, of course, left behind certain rules 
that were fimetionally identical, and so duplicate rules 
were removed from the grammar. Similarly for rnles 
which performed the same function at different levels 
in tire grammar (e.g. attachment of prepositions to 
tile end of a sentence with a comma was also catered 
for by rules allowing prepositions to be attached to 
noun and verb phrases without a comma). 
I~ESULTS 
Results of parsing with the punctuated grammar were 
very good, yielding, on average, a surprisingly small 
number of parses. The number of parses ranged 
fi'om 1 to 520, with an average of 38. This average 
is unrepresentatively high, however, since only 4 
sentences had over 50 parses. These were, in general, 
those with high numbers of punctuation marks, all 
containing at least 5, as in (20). Ignoring the four 
smallest and four largest results then, the average 
number of parses is reduced to just 15. Example (21) 
is more representative of parsing. On examination, 
a great number of the ambiguities seem to be due 
to inaccuracies or over-generality in the lexieal tags 
assigned to words in the corpus. The word more, for 
example, is triple ambiguous as determiner, adjective 
and noun, irrespective of where it occurs in a sentence. 
(20) (The sunlit weeks between were fifll of maids: 
Sarah, with orange wig and horsy teeth, was so 
bad-tempered that she scarcely spoke; Maud 
was my hateful nurse who smelled of SOal) , an(I 
forced me to eat chewy bits of fish, thrusl;ing 
me I)ack t.o babyhood with threats of nappies, 
dummies, and the feeding bottle.) 
520 l)unct, uated parses 
(21) (More news about, the reverend Sun Myung 
Moon, lbunder of the Unification Church, who's 
currently in jail fox" tax evasion: he was awarded 
an lmnorary degree lasL week hy the Roman 
Catholic University of la Plata in l/uenos Aires, 
Argentina.) t8 punctuated parses 
Besides the ambiguity of corpus tags, a l)roblem 
arose with words that had been completely mistagged. 
If these caused the parse to fail completely, the 
tag was changed in the development phase of tile 
grammar, but even so, the number of complete 
mistags was rather small in the sub-corlms used: 
around 10 words in the 50 sentences used. 
Initial attempts at parsing the corpus subset using 
the nnpunctuated version of the grammar were unsuc- 
cessfl, l on even the most powerfifl machine awtilable. 
This was due to the failure of the machine to represent 
all the l)arses sinmltaneously when unpacking the 
parse forest produced by the chart parser. A speciM 
section of code written for the (~I)E (grateful thanks 
are due to John Carroll for supplying this piece of 
code) to estimate the munber of individual parses 
represented by the packed parse-forest showed that for 
all but the most basically punctuated sentences, the 
number of parses was ridiculously huge. The figure for 
the sentence in (211.) w,ts in excess of 6.3x 10 le parses! 
F, ven though this estimate is an upper bound, since 
effects of feature value percolation during nnpaeldng 
are ignored, it has been fairly accura.te with most 
grammars in the past and still indicates that rather 
too many parses are being produced! Not all sentences 
produced such a massive number of parses: the 
sentence in (22) yielded only 1.q2 parses with the 
unplnletuated granlmar which was by far the smallest 
nnmbcr or nnpttnctuated parses. Most sentences that 
managed to pass tile estimation process produced 
between 10 (; and 110 9 parses. 
(22) (Ih'otestants, however, are a tiny minority in 
Argentina, and tile delegation won't be 
including a. I~.oman Catholic.) 
9 punctuated parses 
On examination of tile grammar and tile corpus, 
it is possible to understand why this has happened. 
'I'he punctuated grammar had to allow for sentences 
including comma-delimited noun phrases adjacent to 
undelimited noun phrases, as illustrated by the rules 
(15) and (16). These are relatively easy to mark and 
recognise when the punctuation is available, Itowever, 
424 
without punctuational clues, and with the under- 
specific tagging system, any compound noun could 
appear as a set of delimited noun phrases with the 
unpunetuated grammar. 
Therefore the unpunetuated grammar was filrther 
trimmed, to such an extent that parses no longer 
accurately retlected the linguistic structure of the 
sentences, since, for example, comma delimited 
noun l)hr~es and compomtd nomls became indistin- 
guishable. Some manual preparation of the sentences 
was also carried out to prevent the reoccurrance of 
simple, but costly, misl)arses. 
"\['he results of the parse now became nmch more 
tractable. For bmsie sentences, as predict,ed, there 
was little difference in the performance of punctuated 
and unpunetuated gramlnars. Results were within 
an order of magnitude, showing that no signiticaut 
adwmtage w,'Ls gained through the use of lmnctuation. 
'l'he sentences in (23) and (24) received t and 11 
parses respectively with the unpunetuated grammar. 
(23) ('vVell, just recently, a (lay conference on 
miracles was convened by the research 
scientists, Christian Fellowship.) 
4 punctuated parses 
(24) (The assembly will also be discussing the Lit( 
immigration laws, lIong Kong, teenagers in the 
church, and of course, chur(:h mdl.y schemes.) 
2 punctuated parses 
(25) (They want to know whether, for instance, in a 
scientific age, Christians can really believe in 
the story of the feeding of the five thousmM as 
described, or was the miracle that those in the 
crowd with food shm'ed it with those who had 
none?) 24 punctuated parses 
l"or the most complex sentences, however, the 
number of parses with tl,e unlmnctuated grammar 
was t,ypically more than two orders of magnitude 
higher than with the punctuated grammar. The 
sentence in (25) had 12,096 unpunctuat,ed parses. 
Parsing a set of ten previously unseen l)UnCtU- 
ationally complex sentences with the l)uncttmted 
grammar resulted in seven of the ten being 
unparsable. The other three parsed successfully, 
with the number of parses failing within the range 
of the results of the first part, of the investigation. 
The parse failures, on examination, were due to 
novel punctuational construct,ions occurring in the 
sentences which the grammar had not been designed 
to handle. Parsing the unseen sente,~ces with the. 
unpunetuated grammar resulted in one parse failure, 
with the results for the other 9 sentences rel'lectiug 
the previous results for complex sentences. 
DISCUSSION 
This investigation seems to supl)ort the original 
premise --- that inclusion and use of punctuational 
phenomena within natural language syntax can assist, 
the general aims of natural language processing. 
We have seen that for the simplest sentences, use 
of punctuation gives us little or no advantage over 
the more simple grammar, but, conversely, does no 
harm and can reflect the actual linguistic construction 
a lit,t,h', more accurately. 
For the longer sentences of real language, however, 
a grammar which makes use of punctuation massively 
outperforms an otherwise similar grammar that 
ignores it. Indeed, it, is diiIieult to see how any 
grammar that takes no notice of punctuation eoukl 
ever become successful ~d. analysing such sentences 
mfless some huge amount of semantic and pragmatic 
knowledge is used to disambiguate the analysis. 
I\[owever, as was shown by the attempt at parsing 
the novel sentences, knowledge of the role of punet,u- 
alien is still severely limited. The grammar only 
performed reliably on those l)unctuational phelmmena 
it, had been designed with. Unexpected constructs 
caused it to fail totally. Therefore, following l, he 
recognition that l)unctuation can play a crucial role in 
natural language syntax, what is needed is a thorough 
investigation into the theory of lmnCtuation. Then 
theoretically based analyses of lmnctuation can play 
a full and important part in the analysis of language. 
ACKNOWLEDGEMENTS 
This work was carried out under Esprit Acquilex-II, 
lIRA 7315, and an ESRC l/,eseareh Stndentship, 
1/.004293:1,1171. '\['hanks tbr instrt{etive and helpful 
comments to Ted Briseoe, John Carroll, Rol)ert Dale, 
Ilenry 'Fhompson and anonymous CoLing reviewers. 
R+EFEI~ENCES 
Ih'iseoe, E J and N Waegner (1992). "Robust 
Stochastic Parsing Using the Inside-Outside 
Algorithm." In Proceedings, AAAI WorL'shop on 
Statistically-based NLP Techniques, San Jose, CA. 
Carroll, J; E J Briscoe; and C (.\]rover (1991). 
"A l)evelol)ment, Enviro,mmnt for l,arge Natural 
\],anguage Grammars." TechuieaI I{.eport 233, 
Carol)ridge University Computer Lal)oratory. 
Dale, I{ (1991). "l",xploring the Role of l)ulwtu- 
ation in the Signalling of Discourse Struetm'e." 
In Ibveeedings, Workshop of Text Representation 
and Domain Modelling, T. U. Berlin, ppll0-120. 
\])ale, 1~ (1990). "A Rule-based approach to 
Couaputer-Assisted Copy Editing." Computer 
Assisted Language Learning, 2, 1)1)59-67. 
lhmaphreys, l:L L (1993). "Book Review: The Lingui- 
stics of Punctuation." Mochlne Translation, 7. 
Markwardt, AlI (1942). httrodvction to the English 
Lam.luage, Oxford University Press, New York. 
Nunberg, O (1990). The Linguistics of Punctuation, 
CSLI Leetnre Notes 18, Star, ford, CA. 
Taylor, L J and G Knowles (1988), Mamtal of Infor- 
mation to Accompany the SEC Corpus, University 
of Lancaster. 
425 
