Tagging of very large corpora: Topic-Focus Articulation
Eva Buráňová and Eva Hajičová and Petr Sgall
Institute of Formal and Applied Linguistics,
Faculty of Mathematics and Physics
Charles University, Prague, Czech Republic
Abstract 
After a brief characterization of the theory of the
topic-focus articulation of the sentence (TFA),
rules are formulated that determine the assignment
of appropriate values of the TFA attribute
in the process of syntactico-semantic tagging of
a very large corpus of Czech.
1 Introduction: The Prague 
Dependency Treebank (PDT) 
PDT is a corpus (a part from the Czech National
Corpus), tagged on the following levels:
1. morphemic (POS and annotations using a
very large number of tags, as required by
the language, with rich inflection; cf. (Hajič
and Hladká, 1997));
2. 'analytic' (dependency syntax, with nodes
for all word occurrences, also for punctuation
marks etc., and with the tags for morphemic
units and for basic kinds of surface
syntactic relations (Subject, Object, Adverbial,
Adjunct), cf. (Hajič));
3. tectogrammatical (underlying) syntax,
with a much more detailed classification
of syntactic relations and with nodes for
autosemantic lexical occurrences only
(rather than function words), with indices
corresponding to the syntactic relations,
such as Actor, Addressee, Objective (Patient),
Locative, Manner, Means, etc., and
to morphological values such as Preterite
(Anterior), Conditional, Plural, etc., and
also as the prototypical values of 'in', 'into',
'on', 'from', etc.; correlates of functional
words (and morphemes) on this level have
the form of indices of lexical node labels.1
1An exception concerns coordinating conjunctions, which, in PDT, are
treated as head nodes of the coordinated groups. This makes it possible
to represent the tectogrammatical structures of all sentences as trees
(rather than using more-dimensional networks); in this point, PDT
differs from the theoretical assumptions of the Praguian Functional
Generative Description (now discussed in (Hajičová et al., 1998)).
2 Representing Topic-Focus
Articulation (TFA) in TGTSs
2.1 A brief characterization of TFA
The tectogrammatical tree structures (TGTSs) should capture not only
the syntactic (dependency) relations, but also the TFA of the
utterances in the corpus, since TFA is expressed by grammatical means
and is relevant for the meaning of the sentence (even for its truth
conditions), i.e. it constitutes one of the basic aspects of
underlying structures. The semantic relevance of TFA can be
illustrated by examples such as (1), which is a translation of the
Czech ex. (1') (the capitals denote the placement of the intonation
centre, i.e. the focus proper):2
(1) (a) English is spoken in the SHETLANDS.
    (b) In the Shetlands, ENGLISH is spoken.
(1') (a) Anglicky se mluví na Shetlandských OSTROVECH.
     (b) Na Shetlandských ostrovech se mluví ANGLICKY.
2In the prototypical case the intonation centre is characterized by
falling (or rising-falling) stress, but there are also cases in which
(similarly as in questions, to a certain degree) the centre has a
rising stress. This concerns utterances displaying a feature of
hesitation or incompleteness, cf. (Steedman); often also with
greetings (such as Czech Dobré jitro [Good morning]) a difference of
this kind marks the 'starting' token, connected with the expectation
of an answering token, which exhibits a falling stress. Although in a
sentence containing occurrences of both a rising and a falling stress
the former expresses a contrastive (part of) topic, we prefer to
analyze it as the focus in a sentence without an occurrence of the
latter; in such a position, the rising stress regularly is carried by
an item referring to 'new' information. In written texts, some
occurrences of the rising stress are marked by a semicolon or by
'...'.
The communicative function of the sentence
can basically be rendered by understanding its
topic (T) as 'what is the sentence about', and
its focus (F) as the information that is asserted
about the topic, i.e., schematically, the interpretation
of the sentence S can be understood as
S = F(T)
Thus, (1)(a) asserts, on its preferred reading
(with just the locative modification constituting
its focus) about where English is spoken that
it is in the Shetlands, which can hardly be accepted
as true w.r.t. what we know of the actual
world, if no specific context is present. (1)(b) is
understood as true, stating about English that it is
spoken in the Shetlands.
In the TGTSs the order of nodes is such that 
all parts of T precede all parts of F. Moreover, 
the order of nodes corresponds to the scale of 
communicative dynamism (CD, see Section 3 
below); a less dynamic node prototypically has 
the broader scope than a more dynamic one (if 
the nodes correspond to operators). F proper is 
then the most dynamic (the rightmost) node. 
TFA is relevant also for the semantics of negation:
(2) John didn't come because he was ILL.
(a) The reason for John's not-coming was his illness.
(b) The reason for John's coming (e.g. to the doctor) was not his
    illness but something else (e.g. he wanted to invite the doctor
    for a party).
With the paraphrase (a), the negated verb 
'come' is included in T, i.e. the fact that John's 
being ill is the cause of an event is asserted about 
the event that he did not come. With (b), the
main verb 'come' also belongs to T, but what is
negated is the relation between T and F: John
came, but what is asserted about his coming is 
that the cause of this event was not his illness 
(he might have been ill, though). 
Every node in a TGTS is either contextually
bound (CB) or non-bound (NB); this opposition
is a linguistic counterpart of the cognitive
dichotomy of 'given' vs. 'new', where also an
item, if corresponding to a 'given' referent presented
as occupying a newly characterized specific
position (often in relation to one or more
'given' items), has the feature NB, cf.:
(3) Give this to YOUR mother. (My parents
don't like such gifts.)
(4) (... knows both Peter and Jane.) However,
this time she only invited HER.
The indexical pronoun 'your' in (3) and the
anaphoric pronoun 'her' in (4) can only refer to
items that in a sense are 'known' in the given
situation. However, in these examples, both of
them occur as NB; their stress indicates their
function as F proper of the respective sentence.
Prototypically, an NB node belongs to F and a
CB node is in T; however, a node not dependent
immediately on a finite verb (esp. an adjunct)
need not meet this condition. Thus, in (5), 'my'
as a shifter, directly determined by the conditions
of the discourse, is CB, although belonging
to F, since it depends on a part of F (see
(Hajičová et al., 1998) for a definition of T and
F on the basis of contextual boundness and of
syntactic dependency, as well as for other details
of the given descriptive framework).
2.2 The attribute TFA in PDT 
Three values of the attribute TFA are distin- 
guished with every node in a TGTS: 
1. T a non-contrastive CB node, which always 
has a lower degree of CD than its governor, 
if any; 
2. F an NB node (if different from the main 
verb, then following after its head word in 
the TGTS) 
3. C a contrastive CB node 
Examples: 
(5) (Volby v Izraeli.) Po volbách(T) si Izraelci(T) zvykají(F) na
nového(F) premiéra(F).
(Headline in the newspapers: Elections in Israel.) After the
elections(T), the Israelis(T) get used(F) to a new(F) Prime
Minister(F).
(6) Sportovec(C) on(T) je(F) dobrý(F), ale jako politik(C)
nevyniká(F).
(As a) sportsman(C) he(T) is(F) good(F), but as a politician(C) he
does not excel(F).
The instructions for the assignment of the
values of TFA can be briefly specified as follows,
if the surface word order and the position
of the intonation center (IC, see footnote
2 above) is taken into account, as well as the
'systemic' (canonical) ordering of the kinds of
dependents (which, in fact, can differ with different
head words; SO is specified either in the
valency frames in the individual lexical entries,
or, if possible, for whole lexical classes and
subclasses):
1. the bearer of IC => F; typically the rightmost dependent of the
   verb
2. if the IC is placed on a node other than the rightmost one, the
   complementations placed after IC => T
3. a left side dependent of the verb => T or C, except for cases in
   which it clearly carries IC
4. the verb and those of its dependents that stand between the verb
   and the F-node (see 1) and that are ordered (without an
   intervening sister node) according to SO => F; among sister
   nodes, all those carrying T follow after all those with C, and
   all those carrying F follow after all those with T; there are two
   sets of exceptions:
   (a) a focus sensitive particle can carry F even when preceding
       its governing node that carries C, cf. Section 3.2 below
   (b) a node M carrying T or C can follow after its mother node if
       a node with F is present among the nodes subordinate to M,
       but is absent both among the sisters of M and among its
       superordinate nodes (here the relation of 'superordinate' and
       'subordinate' is the transitive closure of 'governing' and
       'dependent'); cf. the notion of 'proxy focus', characterized
       in (Hajičová et al., 1998), and examples such as (Kterého
       učitele jsi tam viděl?) Viděl jsem tam učitele chemie [lit.
       (Which teacher.Accus have-you there seen?) I saw there (the)
       teacher of-chemistry], with which the Patient 'učitele'
       follows after the verb in the underlying tree, although it
       carries T.
   Note: For Czech, the SO of the main types of dependency has been
   found (on the basis of empirical analysis of texts and of
   experiments with groups of speakers, see (Sgall et al., 1995)) to
   have (with most verbs and other heads) the following form, as for
   the main kinds of dependents:
   Actor < Temporal < Location < Instrument < Addressee < Patient <
   Effect 3
5. embedded attributes => F (unless they are only repeated or
   restored)
6. indexical expressions (já [I], ty [you], teď [now], tady [here]),
   weak forms of pronouns, pronominal expressions with a general
   meaning (někdo [somebody], jednou [once upon a time]...) => T
   (except in cases of contrast or as bearers of IC)
7. strong forms of pronouns => F (after prepositions and in
   coordinated constructions the assignment of T or F in Czech is
   guided by the general rules 1 through 4)
8. restored nodes, deleted in the surface forms of sentences => T;
   we devote Section 2.3 below to the placement of the restored
   nodes
   Note: There are special cases of coordination, both in Czech and
   in English, which do not meet this condition: e.g. in 'They drank
   white and red wine' the first occurrence of 'wine', which may be
   NB, is deleted in the surface (and restored in the TGTS).
9. a node N dependent to the left in a way not meeting the condition
   of projectivity => C (this node is then placed more to the right,
   to meet that condition; these and other movements are discussed
   in Section 2.4 below)
10. the nodes subordinate to such an N move together with it and get
    T or F (according to the rules above)
3Let us note that Directional.3 ('where to') follows after Patient in
Czech as well as in English and also in German, according to the
empirical research discussed in (Steedman); thus it is not exact to
characterize the canonical order of German as a "mirror image" of
that of English.
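As a rough illustration of rules 1 to 3 and of the verb part of rule 4, the following Python sketch assigns TFA values to a verb and its immediate dependents given in surface order. The function name assign_tfa, its parameters and the flat list representation are assumptions made for this illustration only and are not part of the PDT tooling. Dependents standing between the verb and the bearer of IC are left undecided, because rule 4 needs the SO of the individual head word, and the decision between T and C (rule 3) is passed in by hand.

```python
from typing import Iterable, List, Optional


def assign_tfa(items: List[str], verb: int, ic: int,
               contrastive: Iterable[int] = ()) -> List[Optional[str]]:
    """Assign T, C or F to a verb and its immediate dependents.

    items       -- the verb and its dependents in surface order (assumed input)
    verb, ic    -- indices of the verb and of the bearer of the intonation centre
    contrastive -- indices of left-side dependents judged contrastive (assumed input)
    """
    contrast = set(contrastive)
    tags: List[Optional[str]] = [None] * len(items)
    for i in range(len(items)):
        if i == ic:
            tags[i] = "F"                              # rule 1: bearer of IC
        elif i > ic:
            tags[i] = "T"                              # rule 2: placed after IC
        elif i < verb:
            tags[i] = "C" if i in contrast else "T"    # rule 3: left of the verb
        elif i == verb:
            tags[i] = "F"                              # rule 4, verb only
        # otherwise: between verb and IC bearer; rule 4 needs SO, left as None
    return tags


# Simplified versions of examples (5), (6) and (8): direct dependents only.
ex5 = assign_tfa(["volbách", "Izraelci", "zvykají", "premiéra"], verb=2, ic=3)
ex6 = assign_tfa(["sportovec", "on", "je", "dobrý"], verb=2, ic=3, contrastive=[0])
ex8 = assign_tfa(["Aksjonenko", "udržuje", "Berezovským", "styky"], verb=1, ic=0)
```

In this simplified setting the values obtained for (5), (6) and (8) agree with the annotation given in the examples for the nodes considered; embedded attributes (rule 5) and focus sensitive particles are outside the scope of the sketch.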
Note: The resulting TGTSs are projective,
i.e. for every pair of nodes x, y in a TGTS it
holds that if x depends on y and x follows (precedes)
y, then every node z following (preceding)
y and preceding (following) x is subordinate to
y. Thus, 'not to meet the condition of projectivity'
concerns the 'analytic' trees; this means,
in other words, that this condition would not be
met if the positions of x and y in the left-to-right
order of the nodes in the TGTS (in the 'underlying
word order') always corresponded to their
positions in the surface (morphemic and 'analytic')
word order.
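The projectivity condition stated in this Note can be checked mechanically. The sketch below is a minimal Python illustration; the function is_projective and the encoding of the tree as a mapping from each node to its governor are assumptions of this example. It uses the dependency relations of example (7), with the node for negation left out for brevity.

```python
from typing import Dict, List, Optional


def is_projective(order: List[str], head: Dict[str, Optional[str]]) -> bool:
    """order: nodes from left to right; head: node -> governor (None for the root)."""
    pos = {node: i for i, node in enumerate(order)}

    def subordinate_to(z: str, y: str) -> bool:
        # 'subordinate' is the transitive closure of 'dependent'
        h = head[z]
        while h is not None:
            if h == y:
                return True
            h = head[h]
        return False

    for x, y in head.items():
        if y is None:
            continue
        lo, hi = sorted((pos[x], pos[y]))
        # every node strictly between x and its governor y must be subordinate to y
        for z in order[lo + 1:hi]:
            if not subordinate_to(z, y):
                return False
    return True


# Dependencies of example (7): jásot and nejmenší depend on důvod, důvod on být.
head = {"být": None, "důvod": "být", "jásot": "důvod", "nejmenší": "důvod"}
surface = ["jásot", "být", "nejmenší", "důvod"]      # K jásotu není nejmenší důvod
underlying = ["být", "jásot", "důvod", "nejmenší"]   # order of the TGTS in (7')

surface_ok = is_projective(surface, head)        # False: být stands between jásot and důvod
underlying_ok = is_projective(underlying, head)  # True
```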
Example (with a very simplified linearized notation
of the TGTS, in which every dependent
is closed in its pair of parentheses):
(7) K jásotu(C) není(F) nejmenší(F) důvod(F).
    For triumphing is-not the-least reason
(7') (neg.F) být.F ((jásot.C) důvod.F (nejmenší.F))
     (neg.F) be.F ((triumphing.C) reason.F (least.F))
A sentence with a non-prototypical placement
of the IC:
(8) (Většina ministrů Stěpašinovy nové vlády patří k věrným druhům
nejznámějšího ruského intrikána Berezovského.)
(The majority of the ministers of Stěpašin's new government belongs
to faithful friends of the best known Russian intriguer
Berezovskij.)
I(F) AKSJONENKO(F) udržuje(T) s Berezovským(T) blízké(F) styky(T).
Even(F) AKSJONENKO(F) keeps(T) with Berezovskij(T) close(F)
contacts(T).
2.3 The position of a restored node 
The degree of CD of a node that is being re- 
stored (i.e. supposed to have been deleted in 
the surface form of the sentence), and thus also 
its position in the underlying word order, is de- 
termined on the basis of its relationship to its 
governing node. Since such a node almost always
is contextually bound (with the exception
of the specific case of coordinated structures, see 
the Note after point 8 in Section 2.2 above), it 
is placed to the left of its governing word; more 
specifically: 
(a) if the restored node RN depends on a verb, then:
    (aa) if RN is not the single item depending on the given verb
         token, then RN is to be added in the 'Wackernagel
         position';
    (ab) if RN has no sister nodes, then it is placed at the
         beginning of the clause;
(b) if RN is restored as depending on a noun (or adjective), RN is
    placed as the least dynamic dependent of this governing word;
(c) if more than one node are inserted as depending on one and the
    same item, then their order should conform to the systemic
    ('canonical') ordering of the valency slots (see the remark on
    SO in Section 2.2 above, point 4).
Point (a) appears to be substantiated by the
fact that e.g. the subject pronoun appears in the
zero form in Czech under similar conditions as
the weak, clitic pronouns, for which the position
immediately to the left of the verb is typical, cf.
sentences such as Včera (on) přišel pozdě [Yesterday
(he) came here late], Janu (oni) neviděli
[lit.: Jane-Accus they have-not-seen], or (On)
spal [He was-sleeping]. This concerns also such
deletable items as e.g. the Directional with přijet
[arrive], cf. Jan dnes (sem/tam) nepřijel [lit.
John to-day (here/there) has-not-arrived].
The appropriateness of these preliminary 
rules is being checked during the tagging proce- 
dure, the results of which will be of importance 
for a more exact (and more complete) formula- 
tion of the relevant parts of the description of 
the sentence structure of Czech. This aspect
of the usefulness of the corpus tagging concerns
also many other points of grammar.
2.4 Underlying and surface word order 
Within the tagging procedure, the differences
between the two levels of the left-to-right order
can be described by movement rules, a preliminary
form of which can be briefly characterized
as follows:
1. if a node M1 carries C and a node M2 depending on M1 is placed to
   the right of a node M3 superordinate to M1 in the surface word
   order, then M1 is placed immediately to the left of M2 in the
   resulting tree; cf. e.g. Sportovec (M1) on je (M3) dobrý (M2)
   [lit. (As a) sportsman he is good], see ex. (6) in Section 2.2
2. if the positions of the nodes M1, M2 and M3 differ from point 1
   only in that M1 depends on M2, then again M1 is placed
   immediately to the left of M2 in the resulting tree; cf. example
   (7) in Section 2.2 above, in which jásot occupies the position of
   M1, důvod that of M2, and není that of M3, or:
   (9) Jirku (M1) jsme plánovali (M3) poslat (M2) do Francie.
       [lit. George.Accus (M1) we-planned (M3) to-send (M2) to
       France]
3. a comparative of an adjective that precedes its governing noun in
   the surface is moved to the right of this noun in examples such
   as větší město než Boston [a larger town than Boston]; this
   surface order probably should be limited (by a rule of grammar)
   to cases in which the two nouns belong to a single semantic
   subclass.
4. in sentences exhibiting a secondary placement of IC, the bearer
   of IC occupies the rightmost position in the resulting tree; cf.
   example (1)(b) in Section 2.1 above, in which 'English' is the
   focus proper; the assumption underlying the placement of IC in a
   written text is that a written form of a sentence may correspond
   to different (spoken) sentences, according to the differences of
   the placement of IC in the appropriate way of pronouncing the
   sentence.
3 The special case of focus sensitive 
particles 
Since the focus sensitive particles are identified
(by the functor value RHEM for 'rhematizer' or
'focalizer'), it is possible to use PDT also for
a specification of their occurrences in different
positions both in the dependency structure of
the sentence and in its TFA. The starting hypotheses,
which might be checked on the basis of
PDT, are as follows (cf. (Hajičová et al., 1998)):
3.1 Focus sensitive particles in 
prototypical positions
The prototypical syntactic position of a focalizer
can be understood as that of a dependent
of a verb node; thus, in examples like (10) or
(11), it is possible to specify the scope of the
focalizer as the whole subtree subordinated to
the verb (where 'subordinated' is understood as
the transitive closure of 'dependent' in the reflexive
sense, so that the verb itself is included);
the scope is divided into background and focus
of the focalizer (ff), as will be specified in 3.2.
Thus, in the interpretation of (10) on the reading
represented (with many simplifications) by
(10') it is included that (according to what P.
knows) among those whom T. saw there was
no one else than M. (i.e. while 'T. saw' constitutes
the background of 'only', its ff is 'Mary').
Similarly, if in (11) the negation (although expressed
by a prefix in Czech) is handled as a
dependent of the verb, its background is the
subject and ff includes both the verb and the
object.
(10) Pavel ví, že Tomáš viděl jen MARII.
     'Paul knows that Thomas saw only MARY.'
(10') (Paul) knows ((Thomas) saw (only) (Mary))
(11) Martin nečte NOVINY.
     'Martin not-reads NEWSPAPERS.'
In (12) only the adjective constitutes the ff of
'only', its background consisting of 'car' (among
all cars, P. only wants a blue one); thus, the focalizer
can best be described here as depending
on 'car'.
(12) Petr chce jen MODRÉ auto.
     'Petr wants only (a) BLUE car.'
3.2 Focus sensitive particles in the
hierarchy of communicative
dynamism
The primary position of a focalizer in a TR is at
the boundary between the topic and the focus
of the verb clause and the focus of the clause is
then identical to the focus of the focalizer. If a
focalizer is included in the topic, then its focus
contains those items which in the TR are placed
between this focalizer and the next item marked
as C to the right and are more dynamic than the
focalizer.
It should be noted that CD is understood here 
as a partial ordering defined so that: 
(i) in every set of a head and its daughter
nodes, every daughter node placed to the
right of its head is more dynamic than every
daughter node placed to the left of its
head;
(ii) the relation 'more dynamic' is determined
by the irreflexive transitive closure of (i).
Thus, e.g. in the TR (10'), 'knows' is more dynamic
than 'Paul' and less dynamic than 'saw'
according to the point (i), and both 'only' and
'Mary', being more dynamic than 'saw', are
more dynamic than 'knows' according to the
point (ii); however, 'Thomas' is neither more
nor less dynamic than 'knows'. If (10) is embedded
into a more complex sentence as (a part
of) its topic, then 'Mary' is more dynamic than
'only' and has the feature C; thus, e.g. with
'Since Paul knows that Thomas saw only Mary,
he is not afraid', 'Mary' constitutes the whole ff
of 'only', similarly as in (10').
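The partial ordering of CD for (10') can be recomputed with a few lines of Python. This is only a sketch under stated assumptions: the tree is encoded as a dictionary from a head to its left and right daughters, the function name more_dynamic_pairs is invented here, and the head itself is taken to stand between its left and right daughters, which is how the statement about 'knows', 'Paul' and 'saw' above is read.

```python
from itertools import product

# (10') (Paul) knows ((Thomas) saw (only) (Mary))
# head -> (daughters to its left, daughters to its right)
tree = {
    "knows": (["Paul"], ["saw"]),
    "saw": (["Thomas"], ["only", "Mary"]),
}


def more_dynamic_pairs(tree):
    """Return the set of pairs (a, b) read as 'a is more dynamic than b'."""
    base = set()
    for head, (left, right) in tree.items():
        for l in left:
            base.add((head, l))        # assumption: head above its left daughters
        for r in right:
            base.add((r, head))        # assumption: right daughters above the head
        for l, r in product(left, right):
            base.add((r, l))           # point (i)
    closure = set(base)
    changed = True
    while changed:                     # point (ii): transitive closure
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


cd = more_dynamic_pairs(tree)
thomas_vs_knows = ("Thomas", "knows") in cd or ("knows", "Thomas") in cd
```

The pair ('Thomas', 'knows') is absent in both directions, which reflects that CD is only a partial ordering.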
The underlying word order W (a linear ordering)
is then defined on the basis of CD, with (iii)
and (iv) holding for every two nodes x and y in
a tree:
(iii) if node x is more dynamic than node y, then
x follows y under W;
(iv) if node x follows node y under W, node u is
subordinate to x and node z is subordinate
to y, then u follows both y and z, and x
follows z under W.
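One simple way to obtain a linear order compatible with (iii) is to read the tree from left to right, placing each head between its left and right daughters. The Python sketch below does this for (10'); the function underlying_order and the dictionary encoding of the tree are assumptions of this illustration. The traversal keeps every subtree contiguous, which is how the sketch reads (iv) for nodes that are not subordinate to one another; that reading is likewise an assumption.

```python
# (10') (Paul) knows ((Thomas) saw (only) (Mary))
# head -> (daughters to its left, daughters to its right)
tree = {
    "knows": (["Paul"], ["saw"]),
    "saw": (["Thomas"], ["only", "Mary"]),
}


def underlying_order(tree, root):
    """Left-to-right reading of the tree: left subtrees, head, right subtrees."""
    left, right = tree.get(root, ([], []))
    order = []
    for daughter in left:
        order.extend(underlying_order(tree, daughter))
    order.append(root)
    for daughter in right:
        order.extend(underlying_order(tree, daughter))
    return order


W = underlying_order(tree, "knows")
```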
Among the non-prototypical, secondary positions
of focalizers, there are also the cases of
their clustering (e.g. 'not only'), as well as the
sentences in which a focalizer itself constitutes
the whole focus of the sentence ('He DID realize
this').
4 Summary 
After a brief characterization of the Prague Dependency
Treebank and of the Praguian theory
of Topic-Focus Articulation we have presented
a proposal how the main aspects of the information
structure of the sentence (i.e. of its
topic-focus articulation) can be integrated into
the tagging system that captures the underlying
structures. The present form of the system
makes it possible to check our hypotheses on a
large text corpus, and thus perhaps to achieve
a higher degree of automation (and reliability)
of the proposed procedure. The last section
exemplifies how the proposed approach makes it
possible to analyze structures with the so-called
focus sensitive operators.

References 

Jan Hajič. Building a syntactically annotated
corpus: The Prague Dependency Treebank.
In E. Hajičová, editor, Issues of Valency
and Meaning, Studies in Honour of
Jarmila Panevová, pages 106-132. Karolinum,
Prague.

Jan Hajič and Barbora Hladká. 1997. Probabilistic and rule-based tagger of an inflective
language - a comparison. In Proceedings of
the Fifth Conference on Applied Natural Language Processing, pages 111-118, Washington, D.C.

Eva Hajičová, B. Partee, and Petr Sgall. 1998.
Topic-focus articulation, tripartite structures,
and semantic content. Kluwer, Amsterdam.

M. Steedman. Information structure and
the syntax-phonology interface. Unpublished
manuscript.

Petr Sgall, O. Pfeiffer, W. U. Dressler, and
M. Půček. 1995. Experimental research on
systemic ordering. Theoretical Linguistics.
