Parse Forest Computation of Expected Governors
Helmut Schmid
Institute for Computational Linguistics
University of Stuttgart
Azenbergstr. 12
70174 Stuttgart, Germany
schmid@ims.uni-stuttgart.de
Mats Rooth
Department of Linguistics
Cornell University
Morrill Hall
Ithaca, NY 14853, USA
mats@cs.cornell.edu
Abstract
In a headed tree, each terminal word
can be uniquely labeled with a gov-
erning word and grammatical relation.
This labeling is a summary of a syn-
tactic analysis which eliminates detail,
reflects aspects of semantics, and for
some grammatical relations (such as
subject of finite verb) is nearly un-
controversial. We define a notion
of expected governor markup, which
sums vectors indexed by governors and
scaled by probabilistic tree weights.
The quantity is computed in a parse for-
est representation of the set of tree anal-
yses for a given sentence, using vector
sums and scaling by inside probability
and flow.
1 Introduction
A labeled headed tree is one in which each non-terminal vertex has a
distinguished head child, and in the usual way non-terminal nodes are
labeled with non-terminal symbols (syntactic categories such as NP) and
terminal vertices are labeled with terminal symbols (words such as
reads).[1] We work with syntactic trees in which terminals are in
addition labeled with uninflected word forms (lemmas) derived from the
lexicon. By percolating lemmas up the chains of heads, each node in a
headed tree may be labeled with a lexical head. Figure 1 is an example,
where lexical heads are written as subscripts. We use the notation h(v)
for the lexical head of a vertex v, and c(v) for the ordinary category
or word label of v.

[S_read [NP_Peter Peter]
        [VP_read [V_read reads]
                 [NP_paper [NP_paper [D_every every] [N_paper paper]]
                           [PP:on_markup [P:on_on on]
                                         [NP_markup [N_markup markup]]]]]]

Figure 1: A tree with percolated lexical heads.

The governor algorithm was designed and implemented in the Reading
Comprehension research group in the 2000 Workshop on Language
Engineering at Johns Hopkins University. Thanks to Marc Light, Ellen
Riloff, Pranav Anand, Brianne Brown, Eric Breck, Gideon Mann, and Mike
Thelen for discussion and assistance. Oral presentations were made at
that workshop in August 2000, and at the University of Sussex in
January 2001. Thanks to Fred Jelinek, John Carroll, and other members
of the audiences for their comments.
The governor label for a terminal vertex v in such a labeled tree is a
triple which represents the syntactic and lexical environment at the
top of the chain of vertices headed by v. Where u is the maximal vertex
of which v is a head vertex, and u′ is the parent of u, the governor
label for v is the tuple ⟨c(u), c(u′), h(u′)⟩.[2] Governor labels for
the example tree are given in Figure 2.

[1] Headed trees may be constructed as tree domains, which are sets of
addresses of vertices. 0 is used as the relative address of the head
vertex, negative integers are used as relative addresses of child
vertices before the head, and positive integers are used as relative
addresses of child vertices after the head. A headed tree domain is a
set of finite sequences of integers D such that (i) if αi ∈ D, then
α ∈ D; (ii) if αi ∈ D and 0 ≤ j ≤ i or i ≤ j ≤ 0, then αj ∈ D.

position  word    governor label
1         Peter   ⟨NP, S, read⟩
2         reads   ⟨S, STARTC, startw⟩
3         every   ⟨D, NP, paper⟩
4         paper   ⟨NP, VP, read⟩
5         on      ⟨P:ON, PP:ON, markup⟩
6         markup  ⟨NP, PP:ON, paper⟩

Figure 2: Governor labels for the terminals in the tree of Figure 1.
For the head of the sentence, special symbols startc and startw are
used as the parent category and parent lexical governor.
As observed in Chomsky (1965), grammatical relations such as subject
and object may be reconstructed as ordered pairs of category labels,
such as ⟨NP, S⟩ for subject. So, a governor label encodes a grammatical
relation and a governing lexical head.
Given a unique tree structure for a sentence,
governor markup may be read off the tree. How-
ever, in view of the fact that robust broad coverage
parsers frequently deliver thousands, millions, or
thousands of millions of analyses for sentences of
free text, basing annotation on a unique tree (such
as the most probable tree analysis generated by a
probabilistic grammar) appears arbitrary.
Note that different trees may produce the same
governor labels for a given terminal position.
Suppose for instance that the yield of the tree in
Figure 1 has a different tree analysis in which the
PP is a child of the VP, rather than NP. In this
case, just as in the original tree, the label for the
fourth terminal position (with word label paper)
is ⟨NP, VP, read⟩. Supposing that there are only
two tree analyses, this label can be assigned to the
fourth word with certainty, in the face of syntac-
tic ambiguity. The algorithm we will define pools
governor labels in this way.
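The labeling scheme can be sketched directly from the definitions above. The following Python fragment is a minimal sketch; the nested-tuple tree encoding and the head-index convention are our own, not from the paper. It percolates lemmas up head chains and reads off governor labels, here for the PP-less fragment Peter reads every paper:

```python
def lex_head(node):
    # Lemma percolated up the chain of head children.
    if len(node) == 2:                     # terminal: (word, lemma)
        return node[1]
    cat, head_idx, children = node
    return lex_head(children[head_idx])

def governor_labels(node, label, out):
    # 'label' is the governor triple for the terminal at the bottom of
    # this node's head chain; each non-head child starts a new chain,
    # governed by this node's category and lexical head.
    if len(node) == 2:
        out[node[0]] = label
        return
    cat, head_idx, children = node
    for i, ch in enumerate(children):
        if i == head_idx:
            governor_labels(ch, label, out)
        else:
            governor_labels(ch, (ch[0], cat, lex_head(node)), out)

# "Peter reads every paper": (category, head child index, children)
tree = ("S", 1, [
    ("NP", 0, [("Peter", "Peter")]),
    ("VP", 0, [
        ("V", 0, [("reads", "read")]),
        ("NP", 1, [("D", 0, [("every", "every")]),
                   ("N", 0, [("paper", "paper")])]),
    ]),
])
labels = {}
governor_labels(tree, ("S", "STARTC", "startw"), labels)
```

On the full tree of Figure 1 the same recursion applies, with the grammar's head conventions for prepositional phrases deciding the chains for on and markup.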
2 Expected Governors

Suppose that a probabilistic grammar licenses headed tree analyses
t_1, ..., t_n for a sentence s, and assigns them probabilistic weights
p_1, ..., p_n.

[2] In a headed tree domain, u is a head of w if u is of the form w0^k
for some k ≥ 0.
word       label                     PCFG  lexicalized
that       ⟨NP, SC, deprive⟩         .95   .99
would      ⟨MD2, VFP, deprive⟩       .98   1
deprive    ⟨S, STARTC, startw⟩       1     1
all        ⟨DETPL2, NC, student⟩     .83   .98
beginning  ⟨NSG1, NPL1, student⟩     .75   .98
students   ⟨NP, VFP, deprive⟩        .82   .98
           ⟨NP, VGP, begin⟩          .16
of         ⟨PP, NP, student⟩         .53
           ⟨PP, VFP, deprive⟩        .38   .99
their      ⟨DETPL2, NC, lunch⟩       .92   .98
high       ⟨ADJMOD, NPL2, lunch⟩     .78   .23
           ⟨ADJMOD, NSG2, school⟩    .15   .76
school     ⟨NCHAIN, NPL1, lunch⟩     .16
           ⟨NSG1, NPL1, lunch⟩       .76   .98
lunches    ⟨NP, PP, of⟩              .91   .98
.          ⟨PERC, S, deprive⟩        .88   .86
           ⟨PERC, X, deprive⟩        .14

Figure 3: Expected governors in the sentence That would deprive all
beginning students of their high school lunches. For a label g in
column 2, column 3 gives r(g) as computed with a PCFG weighting of
trees, and column 4 gives r(g) as computed with a head-lexicalized
weighting of trees. Values below 0.1 are omitted. According to the
lexicalized model, the PP headed by of probably attaches to VFP (finite
verb phrase) rather than NP.
Let g_1, ..., g_n be the governor labels for word position k determined
by t_1, ..., t_n respectively. We define a scheme which divides a count
of 1 among the different governor labels. For a given governor tuple g,
let

  r(g) =def ( Σ_{i : g_i = g} p_i ) / ( Σ_{1 ≤ i ≤ n} p_i ).    (1)

The definition sums the probabilistic weights of trees with markup g,
and normalizes by the sum of the probabilities of all tree analyses
of s.
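Definition (1) can be read as a small pooling computation over an enumerated set of analyses. A sketch, with illustrative weights and labels (not taken from the paper):

```python
from collections import defaultdict

def expected_governors(analyses):
    """analyses: list of (weight, governor_label) pairs, one pair per
    tree, for a fixed word position k.  Returns r(g) for each label g,
    i.e. the weight of trees with label g over the total weight."""
    total = sum(w for w, _ in analyses)
    r = defaultdict(float)
    for w, g in analyses:
        r[g] += w / total
    return dict(r)

# Three analyses; two of them assign the same label to the word, so
# that label pools 3.0 / 4.0 = 0.75 of the probability mass.
r = expected_governors([(2.0, ("NP", "VP", "read")),
                        (1.0, ("NP", "VP", "read")),
                        (1.0, ("NP", "NP", "paper"))])
```

The values r(g) sum to 1 by construction, which is the sense in which a count of 1 is divided among the labels.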
The definition may be justified as follows. We work with a markup space
M = C × C × L, where C is the set of category labels and L is the set
of lemma labels. For a given markup triple g, let

  χ_g : M → {0, 1}

be the function which maps g to 1, and g′ to 0 for g′ ≠ g. We define a
random variate

  X : Trees → [M → {0, 1}]

which maps a tree t to χ_g, where g is the governor markup for word
position k which is determined by tree t. The random variate X is
defined on labeled trees licensed by the probabilistic grammar. Note
that the function space [M → ℝ], of which the range of X is a subset,
is a vector space (with pointwise sums and scalar products), so that
expectations and conditional expectations may be defined. In these
terms, r is the conditional expectation of X, conditioned on the yield
being s.

This definition, instead of a single governor label for a given word
position, gives us a set of pairs of a markup g and a real number r(g)
in [0, 1], such that the real numbers in the pairs sum to 1. In our
implementation (which is based on Schmid (2000a)), we use a cutoff of
0.1, and print only indices g where r(g) is above the cutoff. Figure 3
is an example.
A direct implementation of the above definition using an iteration over
trees to compute r would be unusable, because in the robust grammar of
English we work with, the number of tree analyses for a sentence is
frequently large, greater than 10^9 for about 1/10 of the sentences in
the British National Corpus. We instead calculate r in a parse forest
representation of the set of tree analyses.
3 Parse Forests

A parse forest (see also Billot and Lang (1989)) in labeled grammar
notation is a tuple F = ⟨N_F, T_F, R_F, S_F, π_F⟩, where
⟨N_F, T_F, R_F, S_F⟩ is a context free grammar (consisting of
non-terminals N_F, terminals T_F, rules R_F, and a start symbol S_F)
and π_F is a function which maps elements of N_F to non-terminals in an
underlying grammar G = ⟨N, T, R, S⟩ and elements of T_F to terminals in
G. By using π_F on the symbols on the left hand and right hand sides of
a parse forest rule, π_F can be extended to map the set of parse forest
rules R_F to the set of underlying grammar rules R. π_F is also
extended to map trees licensed by the parse forest grammar to trees
licensed by the underlying grammar. An example is given in Figure 4.
Where x ∈ N_F ∪ T_F ∪ R_F, let τ_F(x) be the set of trees licensed by
⟨N_F, T_F, R_F, S_F⟩ which have root symbol x in the case of a symbol,
and the set of trees which have x as the rule expanding the root in the
case of a rule. τ(x) is defined to be the multiset image of τ_F(x)
under π_F. τ(x) is the multiset of inside trees represented by the
parse forest symbol or rule x.[3] Let C_F(x) be the set of trees in
τ_F(S_F) which contain x as a symbol or use x as a rule. C(x) is
defined to be the multiset image of C_F(x) under π_F. C(x) is the
multiset of complete trees represented by the parse forest symbol or
rule x.

S_1  → NP_1 VP_1        NP_4 → N_2
VP_1 → V_1 NP_2         NP_1 → Peter
VP_1 → VP_2 PP_1        V_1  → reads
NP_2 → NP_3 PP_1        D_1  → every
NP_3 → D_1 N_1          N_1  → paper
PP_1 → P_1 NP_4         P_1  → on
VP_2 → V_1 NP_3         N_2  → markup

Figure 4: Rule set R_F of a labeled grammar representing two tree
analyses of Peter reads every paper on markup. The labeling function
drops subscripts, so that π_F(VP_1) = VP.
Where p is a probability function on trees licensed by the underlying
grammar and x is a symbol or rule in F,

  β(x) =def Σ_{t ∈ τ(x)} p(t)    (2)

  φ(x) =def ( Σ_{t ∈ C(x)} p(t) ) / ( Σ_{t ∈ C(S_F)} p(t) ).    (3)

β(x) is called the inside probability for x, and φ(x) is called the
flow for x.[4]
Parse forests are often constructed so that all inside trees
represented by a parse forest nonterminal X ∈ N_F have the same span,
as well as the same parent category. To deal with headedness and
lexicalization of a probabilistic grammar, we construct parse forests
so that, in addition, all inside trees represented by a parse forest
nonterminal have the same lexical head. We add to the labeled grammar a
function λ_F which labels parse forest symbols with lexical heads. In
our implementation, an ordinary context free parse forest is first
constructed by tabular parsing, and then in a second pass parse forest
symbols are split according to headedness. Such an algorithm is shown
in Appendix B. This procedure gives worst case time and space
complexity proportional to the fifth power of the length of the
sentence. See Eisner and Satta (1999) for discussion and an algorithm
with time and space requirements proportional to the fourth power of
the length of the input sentence in the worst case. In practical
experience with broad-coverage context free grammars of several
languages, we have not observed super-cubic average time or space
requirements for our implementation. We believe this is because, for
our grammars and corpora, there is limited ambiguity in the position of
the head within a given category-span combination.

[3] We use multisets rather than set images to achieve correctness of
the inside algorithm in cases where F represents some tree more than
once, something which is possible given the definition of labeled
grammars. A correct parser produces a parse forest which represents
every parse for the input sentence exactly once.

[4] These quantities can be given probabilistic interpretations and/or
definitions, for instance with reference to conditionally expected rule
frequencies for flow.

PF-INSIDE(F, θ)
1  Initialize float array β[R_F ∪ N_F ∪ T_F] ← 0
2  for v ∈ T_F
3      do β[v] ← 1
4  for r in R_F in bottom-up order
5      do β[r] ← θ(π_F(r)) · Π_{v ∈ rhs(r)} β[v]
6         β[lhs(r)] ← β[lhs(r)] + β[r]
7  return β

Figure 5: Inside algorithm.
The governor algorithm stated in the next section refers to headedness
in parse forest rules. This can be represented by constructing parse
forest rules (as well as ordinary grammar rules) with headed tree
domains of depth one.[5] Where u is a parse forest symbol on the right
hand side of a parse forest rule r, we will simply state the condition
"u is the head of r".

The flow and governor algorithms stated below call an algorithm
PF-INSIDE(F, θ) which computes inside probabilities in F, where θ is a
function giving probability parameters for the underlying grammar. Any
probability weighting of trees may be used which allows inside
probabilities to be computed in parse forests. The inside algorithm for
ordinary PCFGs is given in Figure 5. The parameter θ maps the set of
underlying grammar rules R, which is the image of π_F on R_F, to reals,
with the interpretation of rule probabilities. In step 5, π_F maps the
parse forest rule r to a grammar rule π_F(r) which is the argument of
θ. The functions lhs and rhs map rules to their left hand and right
hand sides, respectively.

[5] See footnote 1. Constructed in this way, the first rule in the
parse forest in Figure 4 has domain {ε, -1, 0} and labeling function
{⟨ε, S_1⟩, ⟨-1, NP_1⟩, ⟨0, VP_1⟩}. When parse forest rules are mapped
to underlying grammar rules, the domain is preserved, so that π_F
applied to the parse forest rule just described is the tree with domain
{ε, -1, 0} and label function {⟨ε, S⟩, ⟨-1, NP⟩, ⟨0, VP⟩}. ε is the
empty string.

PF-FLOW(F, β)
1  Initialize float array φ[R_F ∪ N_F ∪ T_F] ← 0
2  φ[S_F] ← 1
3  for r in R_F in top-down order
4      do φ[r] ← (β[r] / β[lhs(r)]) · φ[lhs(r)]
5         for v in rhs(r)
6             do φ[v] ← φ[v] + φ[r]
7  return φ

Figure 6: Flow algorithm.

Given an inside algorithm, the flow φ may be computed by the flow
algorithm in Figure 6, or by the inside-outside algorithm.
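Figures 5 and 6 can be sketched in Python on the forest of Figure 4. The rule probabilities theta below are illustrative (the paper gives no numbers), and lexical rules default to probability 1:

```python
# Figure 4's forest; rules are listed in bottom-up order, and pi strips
# the subscript distinguishing forest symbols from grammar symbols.
rules = [
    ("NP_1", ("Peter",)), ("V_1", ("reads",)), ("D_1", ("every",)),
    ("N_1", ("paper",)), ("P_1", ("on",)), ("N_2", ("markup",)),
    ("NP_4", ("N_2",)), ("PP_1", ("P_1", "NP_4")),
    ("NP_3", ("D_1", "N_1")), ("NP_2", ("NP_3", "PP_1")),
    ("VP_2", ("V_1", "NP_3")),
    ("VP_1", ("V_1", "NP_2")), ("VP_1", ("VP_2", "PP_1")),
    ("S_1", ("NP_1", "VP_1")),
]
terminals = {"Peter", "reads", "every", "paper", "on", "markup"}
pi = lambda s: s.split("_")[0]
theta = {("S", ("NP", "VP")): 1.0, ("VP", ("V", "NP")): 0.6,
         ("VP", ("VP", "PP")): 0.4, ("NP", ("NP", "PP")): 0.3,
         ("NP", ("D", "N")): 0.4, ("NP", ("N",)): 0.3,
         ("PP", ("P", "NP")): 1.0}

def pf_inside(rules, terminals, theta):
    """Figure 5: beta[x] sums the weights of the inside trees of x."""
    beta = {v: 1.0 for v in terminals}               # steps 2-3
    beta_r = []
    for lhs, rhs in rules:                           # bottom-up
        p = theta.get((pi(lhs), tuple(pi(v) for v in rhs)), 1.0)
        for v in rhs:
            p *= beta[v]                             # step 5
        beta_r.append(p)
        beta[lhs] = beta.get(lhs, 0.0) + p           # step 6
    return beta, beta_r

def pf_flow(rules, start, beta, beta_r):
    """Figure 6: phi[x] is the relative weight of complete trees
    containing x."""
    phi = {start: 1.0}
    for i in range(len(rules) - 1, -1, -1):          # top-down
        lhs, rhs = rules[i]
        f = beta_r[i] / beta[lhs] * phi.get(lhs, 0.0)   # step 4
        for v in rhs:
            phi[v] = phi.get(v, 0.0) + f             # step 6
    return phi

beta, beta_r = pf_inside(rules, terminals, theta)
phi = pf_flow(rules, "S_1", beta, beta_r)
```

With these weights, β[S_1] sums the two analyses (0.0216 + 0.0288 = 0.0504), φ[PP_1] is 1 because PP_1 occurs in every complete tree, and the two VP_1 rules split the flow 3/7 versus 4/7.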
4 Governors Algorithm

The governor algorithm annotates parse forest symbols and rules with
functions from governor labels to real numbers. Let t be a tree in the
parse forest grammar, let v be a symbol in t, let u be the maximal
symbol in t of which v is a head, or v itself if v is a non-head child
of its parent in t, and let u′ be the parent of u in t. Recall that

  χ_{⟨π_F(u), π_F(u′), λ_F(u′)⟩}    (4)

is a vector mapping the markup triple ⟨π_F(u), π_F(u′), λ_F(u′)⟩ to 1
and other markups to 0. We have constructed parse forests such that
⟨π_F(u), π_F(u′), λ_F(u′)⟩ agrees with the governor label for the
lexical head of the node corresponding to v in π_F(t).

A parse forest tree t and a symbol v in t thus determine the vector
(4), where u and u′ are defined as above. Call the vector determined in
this way χ(t, v). Where v is a parse forest symbol in F and r is a
parse forest rule in F, let

  γ(v) =def ( Σ_{t ∈ C_F(v)} p(π_F(t)) χ(t, v) )
            / ( Σ_{t ∈ C_F(S_F)} p(π_F(t)) )    (5)
PF-GOVERNORS(F, θ)
1  β ← PF-INSIDE(F, θ)
2  φ ← PF-FLOW(F, β)
3  Initialize array γ[R_F ∪ N_F ∪ T_F] to empty maps from governor
   labels to float
4  γ[S_F] ← χ_{⟨π_F(S_F), startc, startw⟩}
5  for r in R_F in top-down order
6      do γ[r] ← (β[r] / β[lhs(r)]) · γ[lhs(r)]
7         for u in rhs(r)
8             do if u is the head of r
9                then γ[u] ← γ[u] + γ[r]
10               else γ[u] ← γ[u] + φ[r] · χ_{⟨π_F(u), π_F(lhs(r)), λ_F(lhs(r))⟩}
11 return γ

Figure 7: Parse forest computation of governor vector.

  γ(r) =def ( Σ_{t ∈ C_F(r)} p(π_F(t)) χ(t, lhs(r)) )
            / ( Σ_{t ∈ C_F(S_F)} p(π_F(t)) ).    (6)
Assuming that F = ⟨N_F, T_F, R_F, S_F, π_F⟩ is a parse forest
representing each tree analysis for a sentence exactly once, the
quantity r for terminal position k (as defined in section 2) is found
by summing γ(v) for the terminal symbols v in T_F which have string
position k.[6]

[6] This procedure requires that symbols in T_F correspond to a unique
string position, something which is not enforced by our definition of
parse forests. Indeed, such cases may arise if parse forest symbols are
constructed as pairs of grammar symbols and strings (Tendeau, 1998)
rather than pairs of grammar symbols and spans. Our parser constructs
parse forests organized according to span.

The algorithm PF-GOVERNORS is stated in Figure 7. Working top down, it
fills in an array γ[·] which is supposed to agree with the quantity
γ(·) defined above. Scaled governor vectors are created for non-head
children in step 10, and summed down the chain of heads in step 9. In
step 6, vectors are divided in proportion to inside probabilities (just
as in the flow algorithm), because the complete trees for the left hand
side of r are partitioned among the parse forest rules which expand the
left hand side of r.

Consider a parse forest rule r, and a parse forest symbol u on its
right hand side which is not the head of r. In each tree in C_F(r), u
is the top of a chain of heads, because u is a non-head child in rule
r. In step 10, the governor tuple describing the syntactic environment
of u in trees in C_F(r) (or rather, their images under π_F) is
constructed
as χ_{⟨π_F(u), π_F(lhs(r)), λ_F(lhs(r))⟩}. The scalar multiplier φ[r]
is

  ( Σ_{t ∈ C_F(r)} p(t) ) / ( Σ_{t ∈ C_F(S_F)} p(t) ),

the relative weight of trees in C_F(r). This is appropriate because
γ(u) as defined in equation (5) is to be scaled by the relative weight
of trees in C_F(u).

In line 9 of the algorithm, γ is summed into the head child u. There is
no scaling, because every tree in C_F(r) is a tree in C_F(u).
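Steps 6, 9, and 10 can be exercised on the forest of Figure 4. The sketch below combines the inside, flow, and governor passes of Figures 5-7 in one routine; the head marking, the λ_F values, and the rule probabilities are illustrative, and the categories are simplified (plain PP rather than PP:ON), so the labels differ in detail from Figure 2:

```python
from collections import defaultdict

pi = lambda s: s.split("_")[0]
# Head-marked forest of Figure 4: each rule records the index of its
# head child; lam gives the percolated lexical head of each symbol.
rules = [  # (lhs, rhs, head_index, prob), in top-down order
    ("S_1", ("NP_1", "VP_1"), 1, 1.0),
    ("VP_1", ("V_1", "NP_2"), 0, 0.6),
    ("VP_1", ("VP_2", "PP_1"), 0, 0.4),
    ("VP_2", ("V_1", "NP_3"), 0, 0.6),
    ("NP_2", ("NP_3", "PP_1"), 0, 0.3),
    ("NP_3", ("D_1", "N_1"), 1, 0.4),
    ("PP_1", ("P_1", "NP_4"), 1, 1.0),
    ("NP_4", ("N_2",), 0, 0.3),
    ("NP_1", ("Peter",), 0, 1.0), ("V_1", ("reads",), 0, 1.0),
    ("D_1", ("every",), 0, 1.0), ("N_1", ("paper",), 0, 1.0),
    ("P_1", ("on",), 0, 1.0), ("N_2", ("markup",), 0, 1.0),
]
terminals = {"Peter", "reads", "every", "paper", "on", "markup"}
lam = {"S_1": "read", "VP_1": "read", "VP_2": "read", "V_1": "read",
       "NP_1": "Peter", "NP_2": "paper", "NP_3": "paper",
       "D_1": "every", "N_1": "paper", "PP_1": "markup", "P_1": "on",
       "NP_4": "markup", "N_2": "markup"}

def pf_governors(rules, terminals, start, lam):
    """Inside pass bottom-up, then flow and governor vectors top-down."""
    beta = {v: 1.0 for v in terminals}
    beta_r = [0.0] * len(rules)
    for i in range(len(rules) - 1, -1, -1):        # bottom-up
        lhs, rhs, _, p = rules[i]
        for v in rhs:
            p *= beta[v]
        beta_r[i] = p
        beta[lhs] = beta.get(lhs, 0.0) + p
    phi = defaultdict(float)
    phi[start] = 1.0
    gamma = defaultdict(lambda: defaultdict(float))
    gamma[start][(pi(start), "startc", "startw")] = 1.0
    for i, (lhs, rhs, head, _) in enumerate(rules):  # top-down
        w = beta_r[i] / beta[lhs]                  # steps 4 and 6
        f = w * phi[lhs]
        for j, v in enumerate(rhs):
            phi[v] += f
            if j == head:                          # step 9
                for g, val in gamma[lhs].items():
                    gamma[v][g] += w * val
            else:                                  # step 10
                gamma[v][(pi(v), pi(lhs), lam[lhs])] += f
    return gamma

gamma = pf_governors(rules, terminals, "S_1", lam)
```

With these weights, gamma["paper"] assigns ⟨NP, VP, read⟩ weight 1 despite the attachment ambiguity (the pooling effect noted in section 1), while gamma["markup"] splits 4/7 : 3/7 between ⟨PP, VP, read⟩ and ⟨PP, NP, paper⟩.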
A probability parameter vector θ is used in the inside algorithm. In
our implementation, we can use either a probabilistic context free
grammar, or a lexicalized context free grammar which conditions rules
on parent category and parent lexical head, and conditions the heads of
non-head children on child category, parent category, and parent head
(Eisner, 1997; Charniak, 1995; Carroll and Rooth, 1998). The requisite
information is directly represented in our parse forests by π_F and
λ_F. Thus the call to PF-INSIDE in line 1 of PF-GOVERNORS may involve
either a computation of PCFG inside probabilities or head-lexicalized
inside probabilities. However, in both cases the algorithm requires
that the parse forest symbols be split according to heads, because of
the reference to λ_F in line 10. Construction of head-marked parse
forests is presented in Appendix B.
The LoPar parser (Schmid, 2000a) on which
our implementation of the governor algorithm is
based represents the parse forest as a graph with
at most binary branching structure. Nodes with
more than two daughter nodes in a conventional
parse forest are replaced with a right-branching
tree structure and common sub-trees are shared
between different analyses. The worst-case space complexity of this
representation is cubic (cf. Billot and Lang (1989)).
LoPar already provided functions for the com-
putation of the head-marked parse forest, for the
flow computation and for traversing the parse for-
est in depth-first and topologically-sorted order
(see Cormen et al. (1994)). So it was only neces-
sary to add functions for data initialization, for the
computation of the governor vector at each node
and for printing the result.
5 Pooling of grammatical relations
The governor labels defined above are derived
from the specific symbols of a context free gram-
mar. In contrast, according to the general markup
methodology of current computational linguis-
tics, labels should not be tied to a specific gram-
mar and formalism. The same markup labels
should be produced by different systems, making
it possible to substitute one system for another,
and to compare systems using objective tests.
Carroll et al. (1998) and Carroll et al. (1999)
propose a system of grammatical relation markup
to which we would like to assimilate our proposal.
As grammatical relation symbols, they use atomic labels such as dobj
(direct object) and ncsubj (non-clausal subject). The labels are
arranged in a hierarchy, with for instance subj having subtypes ncsubj,
xsubj, and csubj.

There is another problem with the labels we have used so far. Our
grammar encodes a variety of features, such as the feature VFORM on
verb projections. As a result, instead of a single object grammatical
relation ⟨NP, VP⟩, we have grammatical relations ⟨NP, VP.N⟩,
⟨NP, VP.FIN⟩, ⟨NP, VP.TO⟩, ⟨NP, VP.BASE⟩, and so forth. This may result
in frequency mass being split among different but similar labels. For
instance, a verb phrase will have read every paper might have some
analyses in which read is the head of a base form VP and paper is the
head of the object of read, and others where read is the head of a
finite form VP and paper is the head of the object of read. In this
case, frequencies would be split between ⟨NP, VP.BASE, read⟩ and
⟨NP, VP.FIN, read⟩ as governor labels for paper.
To address these problems, we employ a pooling function GR which maps
pairs of categories to symbols such as ncsubj or obj. The governor
tuple ⟨c(u), c(u′), h(u′)⟩ is then replaced by
⟨GR(c(u), c(u′)), h(u′)⟩ in the definition of the governor label for a
terminal vertex v. Line 10 of PF-GOVERNORS is changed to

  γ[u] ← γ[u] + φ[r] · χ_{⟨GR(π_F(u), π_F(lhs(r))), λ_F(lhs(r))⟩}.

More flexibility could be gained by using a rule and the address of a
constituent on the right hand side as arguments of GR. This would allow
the following assignments.

  GR(VP.FIN → VC.FIN′ NP NP, 1) = dobj
  GR(VP.FIN → VC.FIN′ NP NP, 2) = obj2
  GR(VP.FIN → VC.FIN′ VP.TO, 1) = xcomp
  GR(VP.FIN → VP.FIN′ VP.TO, 1) = xmod
The head of a rule is marked with a prime. In the first pair, the two
objects in a double object construction are distinguished using the
address. In each case, the child-parent category pair is ⟨NP, VP.FIN⟩,
so that the original proposal could not distinguish the grammatical
relations. In the second pair, a VP.TO argument is distinguished from a
VP.TO modifier using the category of the head. In each case, the
child-parent category pair is ⟨VP.TO, VP.FIN⟩. Notice that in line 10
of PF-GOVERNORS, the rule r is available, so that the arguments of GR
could be changed in this way.
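A rule-and-address keyed pooling function is just a finite map. A sketch, where the rule encoding with an explicit head index and the default relation are our own illustrative choices:

```python
# Rules are encoded as (lhs, rhs, head_index); the primes in the text
# above correspond to the head index here.  The pooling table keys are
# (rule, child address); unknown pairs fall back to a default symbol.
GR = {
    (("VP.FIN", ("VC.FIN", "NP", "NP"), 0), 1): "dobj",
    (("VP.FIN", ("VC.FIN", "NP", "NP"), 0), 2): "obj2",
    (("VP.FIN", ("VC.FIN", "VP.TO"), 0), 1): "xcomp",
    (("VP.FIN", ("VP.FIN", "VP.TO"), 0), 1): "xmod",
}

def pool(rule, addr, default="mod"):
    """Map a rule/address pair to a grammatical relation symbol."""
    return GR.get((rule, addr), default)
```

Because both the rule and the address are keys, the two NP objects of a double object construction receive different relations even though their child-parent category pairs are identical.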
6 Discussion
The governor algorithm was designed as a com-
ponent of Spot, a free-text question answering
system. Current systems usually extract a set
of candidate answers (e.g. sentences), score
them and return the n highest-scoring candidates
as possible answers. The system described in
Harabagiu et al. (2000) scores possible answers
based on the overlap in the semantic represen-
tations of the question and the answer candi-
dates. Their semantic representation is basically
identical to the head-head relations computed by
the governor algorithm. However, Harabagiu
et al. extract this information only from maxi-
mal probability parses whereas the governor al-
gorithm considers all analyses of a sentence and
returns all possible relations weighted with esti-
mated frequencies. Our application in Spot works
as follows: the question is parsed with a spe-
cialized question grammar, and features including
the governor of the trace are extracted from the
question. Governors are among the features used
for ranking sentences, and answer terms within
sentences. In collaboration with Pranav Anand
and Eric Breck, we have incorporated governor
markup in the question answering prototype, but
not debugged or evaluated it.
Expected governor markup summarizes syn-
tactic structure in a weighted parse forest which
is the product of exhaustive parsing and inside-
outside computation. This is a strategy of
dumbing down the product of computation-
ally intensive statistical parsing into unstructured
markup. Estimated frequency computations in
parse forests have previously been applied to
tagging and chunking (Schulte im Walde and
Schmid, 2000). Governor markup differs in that
it is reflective of higher-level syntax. The strat-
egy has the advantage, in our view, that it allows
one to base markup algorithms on relatively so-
phisticated grammars, and to take advantage of
the lexically sensitive probabilistic weighting of
trees which is provided by a lexicalized probabil-
ity model.
Localizing markup on the governed word in-
creases pooling of frequencies, because the span
of the phrase headed by the governed item is
ignored. This idea could be exploited in other
markup tasks. In a chunking task, categories and
heads of chunks could be identified, rather than
categories and boundaries.
A Relation Between Flow and Inside-Outside Algorithm

The inside-outside algorithm computes inside probabilities β[v] and
outside probabilities α[v]. We will show that these quantities are
related to the flow φ(v) by the equation φ[v] = α[v] β[v] / β[S_F].
β[S_F] is the inside probability of the root symbol, which is also the
sum of the probabilities of all parse trees.

According to Charniak (1993), the outside probabilities in a parse
forest are computed by:

  α[v] = Σ_{r : v ∈ rhs(r)} α[lhs(r)] · β[r] / β[v]

The outside probability of the start symbol is 1. We prove by induction
over the depth of the parse forest that the following relationship
holds:

  φ[v] = α[v] β[v] / β[S_F]

It is easy to see that the claim holds for the root symbol S_F:

  φ[S_F] = 1 = α[S_F] β[S_F] / β[S_F]

The flow in a parse forest is computed by:

  φ[v] = Σ_{r : v ∈ rhs(r)} φ[lhs(r)] · β[r] / β[lhs(r)]

Now, we insert the induction hypothesis:

  φ[v] = Σ_{r : v ∈ rhs(r)} ( α[lhs(r)] β[lhs(r)] / β[S_F] )
                            · β[r] / β[lhs(r)]

After a few transformations, we get the equation

  φ[v] = ( β[v] / β[S_F] ) · Σ_{r : v ∈ rhs(r)} α[lhs(r)] · β[r] / β[v]

which is equivalent to

  φ[v] = α[v] β[v] / β[S_F]

according to the definition of α[v]. So, the induction hypothesis holds
generally.
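The identity can be checked numerically on a toy forest (two analyses, with illustrative weights 0.6 and 0.2):

```python
# Toy forest: S_1 rewrites as A_1 (weight .6) or B_1 (weight .2).
rules = [("S_1", ("A_1",), 0.6), ("S_1", ("B_1",), 0.2),
         ("A_1", ("a",), 1.0), ("B_1", ("b",), 1.0)]
beta = {"a": 1.0, "b": 1.0}
beta_r = [0.0] * len(rules)
for i in range(len(rules) - 1, -1, -1):          # inside, bottom-up
    lhs, rhs, p = rules[i]
    for v in rhs:
        p *= beta[v]
    beta_r[i] = p
    beta[lhs] = beta.get(lhs, 0.0) + p
alpha = {"S_1": 1.0}                             # outside
phi = {"S_1": 1.0}                               # flow
for i, (lhs, rhs, _) in enumerate(rules):        # top-down
    for v in rhs:
        alpha[v] = alpha.get(v, 0.0) + alpha[lhs] * beta_r[i] / beta[v]
        phi[v] = phi.get(v, 0.0) + phi[lhs] * beta_r[i] / beta[lhs]
# At every symbol v, phi[v] equals alpha[v] * beta[v] / beta[S_F].
```

Here A_1 carries flow 0.6 / 0.8 = 0.75, and the relation φ[v] = α[v] β[v] / β[S_F] holds at every symbol.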
B Parse Forest Lexicalization
The function LEXICALIZE below takes an unlex-
icalized parse forest as argument and returns a
lexicalized parse forests, where each symbol is
uniquely labeled with a lexical head. Symbols are
split if they have more than one lexical head.
LEXICALIZE(F)
 1  initialize F' as an empty parse forest
 2  initialize array L(n ∈ N ∪ T) ← ∅
 3  for A in T
 4      do A' ← NEWT(F', A)
 5         L(A) ← {A'}
 6  for r in R in bottom-up order
 7      do assume rhs(r) = ⟨A_1, A_2, ..., A_k⟩
 8         assume A_h is the head of r
 9         for ⟨A'_1, A'_2, ..., A'_k⟩ ∈ L(A_1) × ... × L(A_k)
10             do if A_h ∈ T
11                   then w ← LEM(r)
12                   else w ← h(A'_h)
13                D' ← ⟨A'_1, ..., A'_k⟩
14                A_0 ← lhs(r)
15                r' ← ADD(F', r, w, D')
16                L(A_0) ← L(A_0) ∪ {lhs(r')}
17  return F'
LEXICALIZE creates new terminal symbols by
calling the function NEWT. The new symbols are
linked to the original ones by means of the map L(·). For
each rule in the old parse forest, the set of all
possible combinations of the lexicalized daugh-
ter symbols is generated. The function LEM(r)
returns the lemma associated with lexical rule r.
ADD(F, r, w, D)
1  if ∃ A ∈ L(lhs(r)) such that h(A) = w
2     then A' ← A
3     else A' ← NEWNT(F)
4          g(A') ← g(lhs(r))
5          h(A') ← w
6  r' ← NEWRULE(F, A', D)
7  g(r') ← g(r)
8  return r'
For each combination of lexicalized daughter
symbols, a new rule is inserted by calling ADD.
ADD calls NEWNT to create new non-terminals
and NEWRULE to generate new rules; new symbols and
rules inherit the grammar annotation g(·) of their originals. A new non-
terminal is created only if no symbol with the
same lexical head is already linked to the original node.
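The splitting performed by LEXICALIZE and ADD can be sketched compactly in Python. In this minimal sketch (function and data names are illustrative, not taken from the paper's implementation), a forest rule is a triple (lhs, daughters, head_index), and each split symbol is represented as a pair of the original category and its lexical head; the probability bookkeeping and the rule deduplication done by ADD are omitted:

```python
from itertools import product

def lexicalize(rules, terminals):
    """Split every forest symbol into one copy per possible lexical head.

    rules: (lhs, daughters, head_index) triples in bottom-up order;
    terminal daughters are the words themselves.
    """
    lex = {t: {t} for t in terminals}   # original symbol -> set of lexicalized copies
    head = {t: t for t in terminals}    # lexicalized symbol -> lexical head
    new_rules = []
    for lhs, daughters, h in rules:
        # all combinations of lexicalized daughter symbols
        for combo in product(*(sorted(lex.get(d, ())) for d in daughters)):
            w = head[combo[h]]          # inherit the head daughter's lexical head
            new_lhs = (lhs, w)          # a split copy of lhs, labeled with w
            head[new_lhs] = w
            lex.setdefault(lhs, set()).add(new_lhs)
            new_rules.append((new_lhs, combo, h))
    return new_rules, lex

# A hypothetical NP fragment in which N has two possible lexical heads:
rules = [
    ("D", ("every",), 0),
    ("N", ("paper",), 0),
    ("N", ("papers",), 0),
    ("NP", ("D", "N"), 1),              # N is the head daughter
]
new_rules, lex = lexicalize(rules, {"every", "paper", "papers"})
# NP is split into one lexicalized symbol per possible head
assert lex["NP"] == {("NP", "paper"), ("NP", "papers")}
```

Because the daughter combinations are enumerated with a Cartesian product, each rule of the old forest can give rise to several lexicalized rules, exactly as in the pseudocode above.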

References

Sylvie Billot and Bernard Lang. 1989. The structure
of shared forests in ambiguous parsing. In Proceedings of the 27th Annual Meeting of the ACL, University of British Columbia, Vancouver, B.C., Canada.

Glenn Carroll and Mats Rooth. 1998. Valence induction with a head-lexicalized PCFG. In Proceedings
of Third Conference on Empirical Methods in Natural Language Processing, Granada, Spain.

John Carroll, Antonio Sanfilippo, and Ted Briscoe.
1998. Parser evaluation: a survey and a new proposal. In Proceedings of the International Conference of Language Resources and Evaluation, pages
447-454, Granada, Spain.

John Carroll, Guido Minnen, and Ted Briscoe. 1999.
Corpus annotation for parser evaluation. In Proceedings of the EACL 99 workshop on Linguistically Interpreted Corpora (LINC), Bergen, Norway,
June.

Eugene Charniak. 1993. Statistical Language Learning. The MIT Press, Cambridge, Massachusetts.

Eugene Charniak. 1995. Parsing with context-free grammars and word statistics. Technical Report CS-95-28, Department of Computer Science, Brown University.

Noam Chomsky. 1965. Aspects of the Theory of Syntax. M.I.T. Press, Cambridge, MA.

Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. 1994. Introduction to Algorithms. The MIT Press, Cambridge, Massachusetts.

Jason Eisner and Giorgio Satta. 1999. Efficient parsing for bilexical context-free grammars and head
automaton grammars. In Proceedings of the 37th
Annual Meeting of the Association for Computational Linguistics (ACL '99), College Park, MD.

Jason Eisner. 1997. Bilexical grammars and a cubic-time probabilistic parser. In Proceedings of the 4th
international Workshop on Parsing Technologies,
Cambridge, MA.

S. Harabagiu, D. Moldovan, M. Pasca, R. Mihalcea, M. Surdeanu, R. Bunescu, R. Girju, V. Rus, and P. Morarescu. 2000. Falcon: Boosting knowledge for answer engines. In Proceedings of the Ninth Text REtrieval Conference (TREC 9), Gaithersburg, MD, USA, November.

Helmut Schmid. 2000a. LoPar: Design and Implementation. Number 149 in Arbeitspapiere des Sonderforschungsbereiches 340. Institute for Computational Linguistics, University of Stuttgart.

Helmut Schmid. 2000b. LoPar man pages. Institute for Computational Linguistics, University of Stuttgart.

Sabine Schulte im Walde and Helmut Schmid. 2000. Robust German noun chunking with a probabilistic context-free grammar. In Proceedings of the 18th International Conference on Computational Linguistics, pages 726-732, Saarbrücken, Germany, August.

Frederic Tendeau. 1998. Computing abstract decorations of parse forests using dynamic programming and algebraic power series. Theoretical Computer Science, 199:145-166.
