Optimizing Typed Feature Structure Grammar Parsing through
Non-Statistical Indexing
Cosmin Munteanu and Gerald Penn
University of Toronto
10 King’s College Rd.
Toronto M5S 3G4
Canada
{mcosmin,gpenn}@cs.toronto.edu
Abstract
This paper introduces an indexing method based on
static analysis of grammar rules and type signatures
for typed feature structure grammars (TFSGs). The
static analysis tries to predict at compile-time which
feature paths will cause unification failure during
parsing at run-time. To support the static analysis,
we introduce a new classification of the instances
of variables used in TFSGs, based on what type of
structure sharing they create. The indexing actions
that can be performed during parsing are also enu-
merated. Non-statistical indexing has the advan-
tage of not requiring training, and, as the evalua-
tion using large-scale HPSGs demonstrates, the im-
provements are comparable with those of statistical
optimizations. Such statistical optimizations rely
on data collected during training, and their perfor-
mance does not always compensate for the training
costs.
1 Introduction
Developing efficient all-paths parsers has been a
long-standing goal of research in computational lin-
guistics. One particular class still in need of pars-
ing time improvements is that of TFSGs. While
simpler formalisms such as context-free grammars
(CFGs) also face slow all-paths parsing times when
the size of the grammar increases significantly, TF-
SGs (which generally have fewer rules than large-
scale CFGs) become slow as a result of the com-
plex structures used to describe the grammatical cat-
egories. In HPSGs (Pollard and Sag, 1994), one cat-
egory description could contain hundreds of feature
values. This has been a barrier in transferring CFG-
successful techniques to TFSG parsing.
For TFSG chart parsers, one of the most time-
consuming operations is the retrieval of categories
from the chart during rule completion (closing of
constituents in the chart under a grammar rule).
Looking in the chart for a matching edge for a
daughter is accomplished by attempting unifications
with edges stored in the chart, resulting in many
failed unifications. The large and complex structure
of TFS descriptions (Carpenter, 1992) leads to slow
unification times, affecting the parsing times. Thus,
failing unifications must be avoided during retrieval
from the chart.
To our knowledge, there have been only four
methods proposed for improving the retrieval com-
ponent of TFSG parsing. One (Penn and Munteanu,
2003) addresses only the cost of copying large cate-
gories, and was found to reduce parsing times by an
average of 25% on a large-scale TFSG (MERGE).
The second, a statistical method known as quick-
check (Malouf et al., 2000), determines the paths
that are likely to cause unification failure by pro-
filing a large sequence of parses over representa-
tive input, and then filters unifications at run-time
by first testing these paths for type consistency.
This was measured as providing up to a 50% im-
provement in parse times on the English Resource
Grammar (Flickinger, 1999, ERG). The third (Penn,
1999b) is a similar but more conservative approach
that uses the profile to re-order sister feature values
in the internal data structure. This was found to im-
prove parse times on the ALE HPSG by up to 33%.
The problem with these statistical methods is that
the improvements in parsing times may not jus-
tify the time spent on profiling, particularly during
grammar development. The static analysis method
introduced here does not use profiling, although it
does not preclude it either. Indeed, an evaluation of
statistical methods would be more relevant if mea-
sured on top of an adequate extent of non-statistical
optimizations. Although quick-check is thought to
produce parsing time improvements, its evaluation
used a parser with only a superficial static analysis
of chart indexing.
That analysis, rule filtering (Kiefer et al., 1999),
reduces parse times by filtering out mother-daughter
unifications that can be determined to fail at
compile-time. True indexing organizes the data
(in this case, chart edges) to avoid unnecessary re-
trievals altogether, does not require the operations
that it performs to be repeated once full unification
is deemed necessary, and offers the support for eas-
ily adding information extracted from further static
analysis of the grammar rules, while maintaining
the same indexing strategy. Flexibility is one of the
reasons for the successful employment of indexing
in databases (Elmasri and Navathe, 2000) and auto-
mated reasoning (Ramakrishnan et al., 2001).
In this paper, we present a general scheme for in-
dexing TFS categories during parsing (Section 3).
We then present a specific method for statically an-
alyzing TFSGs based on the type signature and the
structure of category descriptions in the grammar
rules, and prove its soundness and completeness
(Section 4.2.1). We describe a specific indexing
strategy based on this analysis (Section 4), and eval-
uate it on two large-scale TFSGs (Section 5). The
result is a purely non-statistical method that is com-
petitive with the improvements gained by statistical
optimizations, and is still compatible with further
statistical improvements.
2 TFSG Terminology
TFSs are used as formal representatives of rich
grammatical categories. In this paper, the formal-
ism from (Carpenter, 1992) will be used. A TFSG
is defined relative to a fixed set of types and set of
features, along with constraints, called appropriate-
ness conditions. These are collectively known as
the type signature (Figure 3). For each type, appropriateness
specifies all and only the features that must have values defined
in TFSs of that type. It also specifies the types of the values
that those features can take. The set of types is partially
ordered, and has a unique most general type ($\bot$, “bottom”).
This order is called subsumption ($\sqsubseteq$): more specific
(higher) types inherit appropriate features from their more
general (lower) supertypes. Two types $t_1$ and $t_2$ unify
($t_1 \sqcup t_2 \downarrow$) iff they have a least upper bound
in the hierarchy. Besides a type signature, TFSGs contain a set
of grammar (phrase) rules and lexical descriptions. A simple
example of a lexical description is:
john $\longrightarrow$ SYNSEM : (SYN : np $\wedge$ SEM : j), while
an example of a phrase rule is given in Figure 1.
(SYN : s $\wedge$ SEM : (VPSem, AGENT : NPSem)) $\Longrightarrow$
(SYN : np $\wedge$ AGR : Agr $\wedge$ SEM : NPSem),
(SYN : vp $\wedge$ AGR : Agr $\wedge$ SEM : VPSem).
Figure 1: A phrase rule stating that the syntactic category
s can be combined from np and vp if their values for
agr are the same. The semantics of s is that of the verb
phrase, while the semantics of the noun phrase serves as
agent.
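Type unification as a least upper bound can be illustrated with a
small sketch over the signature of Figure 3. The encoding below (a
PARENTS dictionary, an explicit "bottom" type placed directly under
every top-level type, and the helper names ancestors, subsumes and
unify_types) is an assumption made for illustration only; it is not
the representation used by ALE or by the parsers evaluated here.

```python
# Hypothetical encoding of the Figure 3 hierarchy: each type maps to its
# immediate, more general supertypes ("bottom" is the most general type).
PARENTS = {
    "bottom": [],
    "throwing": ["bottom"], "index": ["bottom"],
    "pers": ["bottom"], "num": ["bottom"], "gend": ["bottom"],
    "first": ["pers"], "second": ["pers"], "third": ["pers"],
    "singular": ["num"], "plural": ["num"],
    "masculine": ["gend"], "feminine": ["gend"], "neuter": ["gend"],
}

def ancestors(t):
    """Return t together with every type that is more general than t."""
    seen, stack = set(), [t]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(PARENTS[u])
    return seen

def subsumes(general, specific):
    """True iff `general` is at least as general as `specific`."""
    return general in ancestors(specific)

def unify_types(t1, t2):
    """Return the least upper bound of t1 and t2, or None if none exists."""
    upper = [t for t in PARENTS if subsumes(t1, t) and subsumes(t2, t)]
    least = [t for t in upper if all(subsumes(t, u) for u in upper)]
    return least[0] if least else None
```

Under this encoding, pers and third unify to third, while singular
and plural have no common upper bound and therefore do not unify.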
2.1 Typed Feature Structures
A TFS (Figure 2) is like a recursively defined record
in a programming language: it has a type and fea-
tures with values that can be TFSs, all obeying
the appropriateness conditions of the type signature.
TFSs can also be seen as rooted graphs, where arcs correspond to
features and nodes to substructures. A node typing function
$\theta(q)$ associates a type to every node $q$ in a TFS. Every
TFS $F$ has a unique starting or root node, $q_F$. For a given
TFS, the feature value partial function $\delta(f, q)$ specifies
the node reachable from $q$ by feature $f$ when one exists. The
path value partial function $\delta(\pi, q)$ specifies the node
reachable from $q$ by following a path of features $\pi$ when one
exists. TFSs can be unified as well. The result represents the
most general consistent combination of the information from two
TFSs. That information includes typing (by unifying the types),
feature values (by recursive unification), and structure sharing
(by an equivalence closure taken over the nodes of the
arguments). For large TFSs, unification is computationally
expensive, since all the nodes of the two TFSs are visited. In
this process, many nodes are collapsed into equivalence classes
because of structure sharing. A node $x$ in a TFS $F$ with root
$q_F$ and a node $x'$ in a TFS $F'$ with root $q_{F'}$ are
equivalent ($\equiv$) with respect to $F \sqcup F'$ iff
$x = q_F$ and $x' = q_{F'}$, or if there is a path $\pi$ such
that $\delta_{F \sqcup F'}(\pi, q_F) = x$ and
$\delta_{F \sqcup F'}(\pi, q_{F'}) = x'$.
throwing
  THROWER: index
    PERSON: third
    NUMBER: [1] singular
    GENDER: masculine
  THROWN: index
    PERSON: third
    NUMBER: [1]
    GENDER: neuter
Figure 2: A TFS. Features are written in uppercase,
while types are written with bold-face lowercase. Struc-
ture sharing is indicated by numerical tags, such as [1].
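The graph view of a TFS and the partial functions $\theta$ and
$\delta$ can be sketched as follows for the TFS of Figure 2. The
node numbering and the dictionary layout are assumptions chosen
for illustration; only the types, features and the shared NUMBER
value come from the figure.

```python
# Hypothetical graph encoding of the Figure 2 TFS: node id -> (type, arcs).
# Node 4 is reachable by two paths, which is how the tag [1] is modelled.
TFS = {
    0: ("throwing", {"THROWER": 1, "THROWN": 2}),
    1: ("index", {"PERSON": 3, "NUMBER": 4, "GENDER": 5}),
    2: ("index", {"PERSON": 6, "NUMBER": 4, "GENDER": 7}),
    3: ("third", {}), 4: ("singular", {}), 5: ("masculine", {}),
    6: ("third", {}), 7: ("neuter", {}),
}
ROOT = 0

def theta(q):
    """Node typing function: the type of node q."""
    return TFS[q][0]

def delta(path, q):
    """Path value partial function: follow `path` from q; None if undefined."""
    for feature in path:
        q = TFS[q][1].get(feature)
        if q is None:
            return None
    return q

# Both NUMBER paths lead to the same node, i.e. the structure is shared.
shared = delta(["THROWER", "NUMBER"], ROOT) == delta(["THROWN", "NUMBER"], ROOT)
```

Representing sharing by node identity rather than by copying values
is what makes a later type change at the shared node visible along
both paths.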
Subtypes:
  gend: masculine, feminine, neuter
  num: singular, plural
  pers: first, second, third
Appropriateness:
  throwing: THROWER: index, THROWN: index
  index: PERSON: pers, NUMBER: num, GENDER: gend
Figure 3: A type signature. For each type, appropriate-
ness declares the features that must be defined on TFSs
of that type, along with the type restrictions applying to
their values.
2.2 Structure Sharing in Descriptions
TFSGs are typically specified using descriptions,
which logically denote sets of TFSs. Descriptions
can be more terse because they can assume all of
the information about their TFSs that can be in-
ferred from appropriateness. Each non-disjunctive
description can be associated with a unique most
general feature structure in its denotation called a
most general satisfier (MGSat). While a formal
presentation can be found in (Carpenter, 1992), we
limit ourselves to an intuitive example: the TFS
from Figure 2 is the MGSat of the description:
throwing $\wedge$ THROWER : (PERSON : third $\wedge$ NUMBER :
(singular $\wedge$ Nr), GENDER : masculine) $\wedge$ THROWN :
(PERSON : third $\wedge$ NUMBER : Nr, GENDER : neuter).
Descriptions can also contain variables, such as Nr.
Structure sharing is enforced in descriptions
through the use of variables. In TFSGs, the scope
of a variable extends beyond a single description, re-
sulting in structure sharing between different TFSs.
In phrase structure rules (Figure 1), this sharing
can occur between different daughter categories in
a rule, or between a mother and a daughter. Unless
the term description is explicitly used, we will use
“mother” and “daughter” to refer to the MGSat of a
mother or daughter description.
We can classify instances of variables based on
what type of structure sharing they create. Inter-
nal variables are the variables that represent inter-
nal structure sharing (such as in Figure 2). The oc-
currences of such variables are limited to a single
category in a phrase structure rule. External vari-
ables are the variables used to share structure be-
tween categories. If a variable is used for struc-
ture sharing both inside a category and across cat-
egories, then it is also considered an external vari-
able. For a specific category, two kinds of external
variable instances can be distinguished, depending
on their occurrence relative to the parsing control
strategy: active external variables and inactive ex-
ternal variables. Active external variables are in-
stances of external variables that are shared between
the description of a category D and one or more de-
scriptions of categories in the same rule as D vis-
ited by the parser before D as the rule is extended
(completed). Inactive external variables are the ex-
ternal variable instances that are not active. For ex-
ample, in bottom-up left-to-right parsing, all of a
mother’s external variable instances would be active
because, being external, they also occur in one of
the daughter descriptions. Similarly, all of the left-
most daughter’s external variable instances would
be inactive because this is the first description used
by the parser. In Figure 1, Agr is an active external
variable in the second daughter, but it is inactive in
the first daughter.
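The classification above can be sketched in a few lines, assuming a
rule is given as the list of its categories in the order in which
the parser visits them (daughters left to right, then the mother,
for bottom-up left-to-right parsing), each category being the list
of variables its description mentions. This input format and the
function name classify_variables are hypothetical.

```python
from collections import Counter

def classify_variables(categories):
    """Label each variable of each category as internal, active or inactive.

    categories: list of lists of variable names, in parser visiting order.
    """
    # A variable is external if it occurs in more than one category.
    occurrences = Counter(v for cat in categories for v in set(cat))
    seen_before = set()
    result = []
    for cat in categories:
        labels = {}
        for v in set(cat):
            if occurrences[v] == 1:
                labels[v] = "internal"
            elif v in seen_before:
                labels[v] = "active"      # shared with an earlier category
            else:
                labels[v] = "inactive"    # external, but first seen here
        result.append(labels)
        seen_before.update(cat)
    return result

# The rule of Figure 1: np daughter, vp daughter, then the mother s.
FIGURE1_RULE = [["Agr", "NPSem"], ["Agr", "VPSem"], ["VPSem", "NPSem"]]
LABELS = classify_variables(FIGURE1_RULE)
```

For the rule of Figure 1 this labels Agr as inactive in the first
daughter and active in the second, and labels both of the mother's
variables as active, in line with the discussion above.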
The active external variable instances are im-
portant for path indexing (Section 4.2), because
they represent the points at which the parser must
copy structure between TFSs. They are therefore
substructures that must be provided to a rule by
the parsing chart if these unifications could poten-
tially fail. They also represent shared nodes in the
MGSats of a rule’s category descriptions. In our
definitions, we assume without loss of generality that parsing
proceeds bottom-up, with left-to-right traversal of rule
daughters. This is the ALE system’s (Carpenter and Penn, 1996)
parsing strategy.
Definition 1. If $D_1, \ldots, D_n$ are daughter descriptions in
a rule and the rules are extended from left to right, then
$\mathit{Ext}(\mathit{MGSat}(D_i))$ is the set of nodes shared
between $\mathit{MGSat}(D_i)$ and
$\mathit{MGSat}(D_1), \ldots, \mathit{MGSat}(D_{i-1})$. For a
mother description $M$, $\mathit{Ext}(\mathit{MGSat}(M))$ is the
set of nodes shared with any daughter in the same rule.
Because the completion of TFSG rules can cause the categories to
change in structure (due to external variable sharing), we need
some extra notation to refer to a phrase structure rule’s
categories at different times during a single application of that
rule. By $\hat{M}$ we symbolize the mother $M$ after $M$’s rule
is completed (all of the rule’s daughters are matched with edges
in the chart). $\hat{D}$ symbolizes the daughter $D$ after all
daughters to $D$’s left in $D$’s rule were unified with edges
from the chart. An important relation exists between $M$ and
$\hat{M}$: if $q_M$ is $M$’s root and $\hat{q}_M$ is $\hat{M}$’s
root, then $\forall x \in M, \forall \hat{x} \in \hat{M}$ such
that $\exists \pi$ for which $\delta(\pi, q_M) = x$ and
$\delta(\pi, \hat{q}_M) = \hat{x}$,
$\theta(x) \sqsubseteq \theta(\hat{x})$. In other words,
extending the rule extends the information states of its
categories monotonically. A similar relation exists between $D$
and $\hat{D}$. The set of all nodes $x$ in $M$ such that
$\exists \pi$ for which $\delta(\pi, q_M) = x$ and
$\delta(\pi, \hat{q}_M) = \hat{x}$ will be denoted by
$[\hat{x}]^{-1}$ (and likewise for nodes in $D$). There may be
more than one node in $[\hat{x}]^{-1}$ because of unifications
that occur during the extension of $M$ to $\hat{M}$.
3 The Indexing Timeline
Indexing can be applied at several moments dur-
ing parsing. We introduce a general strategy for in-
dexed parsing, with respect to what actions should
be taken at each stage.
Three main stages can be identified. The first
one consists of indexing actions that can be taken
off-line (along with other optimizations that can be
performed at compile-time). The second and third
stages refer to actions performed at run time.
Stage 1. In the off-line phase, a static analysis
of grammar rules can be performed. The complete
content of mothers and daughters may not be ac-
cessible, due to variables that will be instantiated
during parsing, but various sources of information,
such as the type signature, appropriateness specifi-
cations, and the types and features of mother and
daughter descriptions, can be analyzed and an ap-
propriate indexing scheme can be specified. This
phase of indexing may include determining: (1a)
which daughters in which rules will certainly not
unify with a specific mother, and (1b) what informa-
tion can be extracted from categories during parsing
that can constitute indexing keys. It is desirable to
perform as much analysis as possible off-line, since
the cost of any action taken during run time pro-
longs the parsing time.
Stage 2. During parsing, after a rule has been
completed, all variables in the mother have been ex-
tended as far as they can be before insertion into
the chart. This offers the possibility of further in-
vestigating the mother’s content and extracting sup-
plemental information from the mother that con-
tributes to the indexing keys. However, the choice
of such investigative actions must be carefully stud-
ied, since it might burden the parsing process.
Stage 3. While completing a rule, for each
daughter a matching edge is searched in the chart.
At this moment, the daughter’s active external vari-
ables have been extended as far as they can be be-
fore unification with a chart edge. The information
identified in stage (1b) can be extracted and unified
as a precursor to the remaining steps involved in cat-
egory unification. These steps also take place at this
stage.
4 TFSG Indexing
To reduce the time spent on failures when search-
ing for an edge in the chart, each edge (edge’s cat-
egory) has an associated index key which uniquely
identifies the set of daughter categories that can po-
tentially match it. When completing a rule, edges
unifying with a specific daughter are searched for in
the chart. Instead of visiting all edges in the chart,
the daughter’s index key selects a restricted number
of edges for traversal, thus reducing the number of
unification attempts.
The passive edges added to the chart represent
specializations of rules’ mothers. When a rule is
completed, its mother M is added to the chart ac-
cording to M’s indexing scheme, which is the set of
index keys of daughters that might possibly unify
with M. The index is implemented as a hash, where
the hash function applied to a daughter yields the
daughter’s index key (a selection of chart edges).
For a passive edge representing M, M’s index-
ing scheme provides the collection of hash entries
where it will be added.
Each daughter is associated with a unique index
key. During parsing, a specific daughter is searched
for in the chart by visiting only those edges that have
a matching key, thus reducing the time needed for
traversing the chart. The index keys can be com-
puted off-line (when daughters are indexed by posi-
tion), or during parsing.
4.1 Positional Indexing
In positional indexing, the index key for each daughter is
represented by its position (rule number and daughter position in
the rule). The structure of the index can be determined at
compile-time (first stage). For each mother $M$ in the grammar, a
collection
$L(M) = \{(R_i, D_j) \mid \text{daughters that can match } M\}$
is created ($M$’s indexing scheme), where each element of $L(M)$
represents the rule number $R_i$ and daughter position $D_j$
inside rule $R_i$ ($1 \leq j \leq \mathit{arity}(R_i)$) of a
category that can match with $M$.
For TFSGs it is not possible to compute off-line
the exact list of mother-daughter matching pairs, but
it is possible to rule out certain non-unifiable pairs
before parsing — a compromise that pays off with a
very low index management time.
During parsing, each time an edge (representing a rule’s mother
$M$) is added to the chart, it is inserted into the hash entries
associated with the positions $(R_i, D_j)$ from the list $L(M)$
(the number of entries where $M$ is inserted is $|L(M)|$). The
entry associated with the key $(R_i, D_j)$ will contain only
categories that can possibly unify with the daughter at position
$(R_i, D_j)$ in the grammar.
Because our parsing algorithm closes categories depth-first under
leftmost daughter matching, only daughters $D_i$ with $i \geq 2$
are searched for in the chart (and consequently, indexed). We
used the EFD-based modification of this algorithm (Penn and
Munteanu, 2003), which needs no active edges, and requires a
constant two copies per edge, rather than the standard one copy
per retrieval found in Prolog parsers. Without this, the cost of
copying TFS categories would have overwhelmed the benefit of the
index.
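The two phases of positional indexing can be sketched as follows.
This is a toy illustration under stated assumptions, not the
implementation evaluated in this paper: categories are reduced to
atomic strings, the compile-time filter may_unify is a hypothetical
predicate supplied by the caller, and chart positions (string
spans) are left out. Only daughters at positions 2 and above are
indexed, as described above.

```python
from collections import defaultdict

def build_schemes(rules, may_unify):
    """Compile-time step: compute L(M) for the mother of every rule.

    rules: dict rule_name -> (mother, [daughters]).
    Returns rule_name -> list of (rule_name, daughter_position) keys.
    """
    schemes = {}
    for name, (mother, _) in rules.items():
        schemes[name] = [
            (other, j)
            for other, (_, daughters) in rules.items()
            for j, daughter in enumerate(daughters, start=1)
            if j >= 2 and may_unify(mother, daughter)
        ]
    return schemes

class PositionalChart:
    """Run-time step: a hash from positional keys to lists of edges."""

    def __init__(self, schemes):
        self.schemes = schemes
        self.entries = defaultdict(list)

    def add_edge(self, rule_name, edge):
        # Insert the edge into every entry listed in its mother's scheme.
        for key in self.schemes[rule_name]:
            self.entries[key].append(edge)

    def candidates(self, rule_name, position):
        # Only these edges need a full unification attempt.
        return self.entries[(rule_name, position)]

# Toy grammar with atomic categories; equality stands in for the filter.
RULES = {
    "s_rule": ("s", ["np", "vp"]),
    "vp_rule": ("vp", ["v", "np"]),
    "np_rule": ("np", ["det", "n"]),
}
SCHEMES = build_schemes(RULES, lambda mother, daughter: mother == daughter)
```

With real TFS categories the filter can only rule out pairs that are
certain to fail, so an entry may still contain edges whose full
unification with the daughter fails at run time.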
4.2 Path Indexing
Path indexing is an extension of positional index-
ing. Although it shares the same underlying prin-
ciple as the path indexing used in automated rea-
soning (Ramakrishnan et al., 2001), its functionality
is related to quick-check: extract a vector of types
from a mother (which will become an edge) and a
daughter, and test the unification of the two vectors
before attempting to unify the edge and the daugh-
ter. Path indexing differs from quick-check in that
it identifies these paths by a static analysis of gram-
mar rules, performed off-line and with no training
required. Path indexing is also built on top of po-
sitional indexing, therefore the vector of types can
be different for each potentially unifiable mother-
daughter pair.
4.2.1 Static Analysis of Grammar Rules
Similar to the abstract interpretation used in program
verification (Cousot and Cousot, 1992), the static analysis tries
to predict a run-time phenomenon (specifically, unification
failures) at compile-time. It tries to identify nodes in a mother
that carry no relevant information with respect to unification
with a particular daughter. For a mother $M$ unifiable with a
daughter $D$, these nodes will be grouped in a set
$\mathit{StaticCut}(M, D)$. Intuitively, these nodes can be left
out or ignored while computing the unification of $\hat{M}$ and
$\hat{D}$. The StaticCut can be divided into two subsets:
$\mathit{StaticCut}(M, D) = \mathit{RigidCut}(M, D) \cup \mathit{VariableCut}(M, D)$.
The RigidCut represents nodes that can be left out because
neither they, nor one of their $\delta_\pi$-ancestors, can have
their type values changed by means of external variable sharing.
The VariableCut represents nodes that are either externally
shared, or have an externally shared ancestor, but still can be
left out.
Definition 2. $\mathit{RigidCut}(M, D)$ is the largest subset of
nodes $x \in M$ such that, $\forall y \in D$ for which
$x \equiv y$:
1. $x \notin \mathit{Ext}(M)$, $y \notin \mathit{Ext}(D)$,
2. $\forall x' \in M$ s.t. $\exists \pi$ s.t. $\delta(\pi, x') = x$, $x' \notin \mathit{Ext}(M)$, and
3. $\forall y' \in D$ s.t. $\exists \pi$ s.t. $\delta(\pi, y') = y$, $y' \notin \mathit{Ext}(D)$.
Definition 3. VariableCut is the largest subset of nodes
$x \in M$ such that:
1. $x \notin \mathit{RigidCut}(M, D)$, and
2. $\forall y \in D$ for which $x \equiv y$, $\forall s \sqsupseteq \theta(x), \forall t \sqsupseteq \theta(y)$, $s \sqcup t$ exists.
In words, a node can be left out even if it is ex-
ternally shared (or has an externally shared ances-
tor) if all possible types this node can have unify
with all possible types its corresponding nodes in
D can have. Due to structure sharing, the types of
nodes in M and D can change during parsing, by
being specialized to one of their subtypes. Condi-
tion 2 ensures that the types of these nodes will re-
main compatible (have a least upper bound), even if
they specialize during rule completion. An intuitive
example (real-life examples cannot be reproduced
here — a category in a typical TFSG can have hun-
dreds of nodes) is presented in Figure 4.
[Figure 4 graphic: a type signature over the types t0 to t8, a
mother M with nodes x1 to x4, and a daughter D with nodes y1 to y5.]
Figure 4: Given the above type signature, mother $M$ and daughter
$D$ (externally shared nodes are pointed to by dashed arrows),
nodes $x_1$, $x_2$, and $x_3$ from $M$ can be left out when
unifying $M$ with $D$ during parsing. $x_1$ and
$x_3 \in \mathit{RigidCut}(M, D)$, while
$x_2 \in \mathit{VariableCut}(M, D)$ ($\theta(y_2)$ can promote
only to $t_7$, thus $x_2$ and $y_2$ will always be compatible).
$x_4$ is not included in the StaticCut, because if $\theta(y_5)$
promotes to $t_5$, then $\theta(y_4)$ will promote to $t_5$ (not
unifiable with $t_3$).
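Condition 2 of Definition 3 can be checked by enumerating the
specializations of the two node types. The sketch below uses a
small made-up hierarchy (not the one in Figure 4), and it assumes
that the hierarchy is bounded complete, so that two types with any
common specialization also have a least upper bound. The names
CHILDREN, specializations, types_unify and condition2 are
hypothetical.

```python
# Hypothetical hierarchy: each type maps to its immediate subtypes.
# "c" is a common subtype of "a" and "b"; "d" is a subtype of "b" only.
CHILDREN = {
    "bottom": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": [],
    "d": [],
}

def specializations(t):
    """Return t together with every type more specific than t."""
    out, stack = set(), [t]
    while stack:
        u = stack.pop()
        if u not in out:
            out.add(u)
            stack.extend(CHILDREN[u])
    return out

def types_unify(s, t):
    """Two types unify iff they share at least one specialization."""
    return bool(specializations(s) & specializations(t))

def condition2(theta_x, theta_y):
    """Every specialization of theta_x unifies with every one of theta_y."""
    return all(
        types_unify(s, t)
        for s in specializations(theta_x)
        for t in specializations(theta_y)
    )
```

Here a and b unify as they stand (their join is c), yet
condition2("a", "b") is false, because a node of type b could be
specialized to d during rule completion, and d is incompatible
with c. This is the situation that excludes a node from the
VariableCut.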
When computing the unification between a mother and a daughter
during parsing, the same outcome (success or failure) will be
reached by using a reduced representation of the mother
($\hat{M}_{sD}$), with nodes in $\mathit{StaticCut}(M, D)$
removed from $\hat{M}$.
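Proposition 1. For a mother $M$ and a daughter $D$, if
$M \sqcup D \downarrow$ before parsing, and $\hat{M}$ (as an edge
in the chart) and $\hat{D}$ exist, then during parsing:
(1) $\hat{M}_{sD} \sqcup \hat{D} \downarrow \;\longrightarrow\; \hat{M} \sqcup \hat{D} \downarrow$,
(2) $\hat{M}_{sD} \sqcup \hat{D} \uparrow \;\longrightarrow\; \hat{M} \sqcup \hat{D} \uparrow$.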
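Proof. The second part
($\hat{M}_{sD} \sqcup \hat{D} \uparrow \;\longrightarrow\; \hat{M} \sqcup \hat{D} \uparrow$)
of Proposition 1 has a straightforward proof: if
$\hat{M}_{sD} \sqcup \hat{D} \uparrow$, then
$\exists \hat{z} \in \hat{M}_{sD} \sqcup \hat{D}$ such that
$\nexists t$ for which $\forall \hat{x} \in [\hat{z}]_{\equiv}$,
$t \sqsupseteq \theta(\hat{x})$. Since
$\hat{M}_{sD} \subseteq \hat{M}$,
$\exists \hat{z} \in \hat{M} \sqcup \hat{D}$ such that
$\nexists t$ for which $\forall \hat{x} \in [\hat{z}]_{\equiv}$,
$t \sqsupseteq \theta(\hat{x})$, and therefore,
$\hat{M} \sqcup \hat{D} \uparrow$.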
The first part of the proposition will be proven by showing that
$\forall \hat{z} \in \hat{M} \sqcup \hat{D}$, a consistent type
can be assigned to $[\hat{z}]_{\equiv}$, where
$[\hat{z}]_{\equiv}$ is the set of nodes in $\hat{M}$ and
$\hat{D}$ equivalent to $\hat{z}$ with respect to the unification
of $\hat{M}$ and $\hat{D}$.¹
Three lemmata need to be formulated:
Lemma 1. If $\hat{x} \in \hat{M}$ and $x \in [\hat{x}]^{-1}$,
then $\theta(\hat{x}) \sqsupseteq \theta(x)$. Similarly, for
$\hat{y} \in \hat{D}$, $y \in [\hat{y}]^{-1}$,
$\theta(\hat{y}) \sqsupseteq \theta(y)$.
Lemma 2. If types $t_0, t_1, \ldots, t_n$ are such that
$\forall t'_0 \sqsupseteq t_0, \forall i \in \{1, \ldots, n\}$,
$t'_0 \sqcup t_i \downarrow$, then $\exists t \sqsupseteq t_0$
such that $\forall i \in \{1, \ldots, n\}$, $t \sqsupseteq t_i$.
¹ Because we do not assume inequated TFSs (Carpenter, 1992)
here, unification failure must result from type inconsistency.
Lemma 3. If $\hat{x} \in \hat{M}$ and $\hat{y} \in \hat{D}$ for
which $\hat{x} \equiv \hat{y}$, then
$\exists x \in [\hat{x}]^{-1}, \exists y \in [\hat{y}]^{-1}$ such
that $x \equiv y$.
In proving the first part of Proposition 1, four cases are
identified:
Case A: $|[\hat{z}]_{\equiv} \cap \hat{M}| = 1$ and $|[\hat{z}]_{\equiv} \cap \hat{D}| = 1$,
Case B: $|[\hat{z}]_{\equiv} \cap \hat{M}| = 1$ and $|[\hat{z}]_{\equiv} \cap \hat{D}| > 1$,
Case C: $|[\hat{z}]_{\equiv} \cap \hat{M}| > 1$ and $|[\hat{z}]_{\equiv} \cap \hat{D}| = 1$,
Case D: $|[\hat{z}]_{\equiv} \cap \hat{M}| > 1$ and $|[\hat{z}]_{\equiv} \cap \hat{D}| > 1$.
Case A is trivial, and D is a generalization of B and C.
Case B. It will be shown that $\exists t \in \mathit{Type}$ such
that $\forall \hat{y} \in [\hat{z}]_{\equiv} \cap \hat{D}$ and
for $\{\hat{x}\} = [\hat{z}]_{\equiv} \cap \hat{M}$,
$t \sqsupseteq \theta(\hat{y})$ and
$t \sqsupseteq \theta(\hat{x})$.
Subcase B.i: $\hat{x} \in \hat{M}, \hat{x} \notin \hat{M}_{sD}$.
$\forall \hat{y} \in [\hat{z}]_{\equiv} \cap \hat{D}$,
$\hat{y} \equiv \hat{x}$. Therefore, according to Lemma 3,
$\exists x \in [\hat{x}]^{-1}, \exists y \in [\hat{y}]^{-1}$ such
that $x \equiv y$. Thus, according to Condition 2 of
Definition 3,
$\forall s \sqsupseteq \theta(y), \forall t \sqsupseteq \theta(x)$,
$s \sqcup t \downarrow$. But according to Lemma 1,
$\theta(\hat{y}) \sqsupseteq \theta(y)$ and
$\theta(\hat{x}) \sqsupseteq \theta(x)$. Therefore,
$\forall \hat{y} \in [\hat{z}]_{\equiv} \cap \hat{D}$,
$\forall s \sqsupseteq \theta(\hat{y})$,
$\forall t \sqsupseteq \theta(\hat{x})$, $s \sqcup t \downarrow$,
and hence,
$\forall \hat{y} \in [\hat{z}]_{\equiv} \cap \hat{D}, \forall t \sqsupseteq \theta(\hat{x}), t \sqcup \theta(\hat{y}) \downarrow$.
Thus, according to Lemma 2,
$\exists t \sqsupseteq \theta(\hat{x}), \forall \hat{y} \in [\hat{z}]_{\equiv} \cap \hat{D}$,
$t \sqsupseteq \theta(\hat{y})$.
Subcase B.ii: $\hat{x} \in \hat{M}, \hat{x} \in \hat{M}_{sD}$.
Since $\hat{M}_{sD} \sqcup \hat{D} \downarrow$,
$\exists t \sqsupseteq \theta(\hat{x})$ such that
$\forall \hat{y} \in [\hat{z}]_{\equiv} \cap \hat{D}$,
$t \sqsupseteq \theta(\hat{y})$.
Case C. It will be shown that
$\exists t \sqsupseteq \theta(\hat{y})$ such that
$\forall \hat{x} \in [\hat{z}]_{\equiv}$,
$t \sqsupseteq \theta(\hat{x})$. Let
$\{\hat{y}\} = [\hat{z}]_{\equiv} \cap \hat{D}$. The set
$[\hat{z}]_{\equiv} \cap \hat{M}$ can be divided into two
subsets:
$S_{ii} = \{\hat{x} \in [\hat{z}]_{\equiv} \cap \hat{M} \mid \hat{x} \in \hat{M}_{sD}\}$,
and
$S_i = \{\hat{x} \in [\hat{z}]_{\equiv} \cap \hat{M} \mid \hat{x} \in \hat{M}, \hat{x} \notin \hat{M}_{sD}, \text{ and } x \in \mathit{VariableCut}(M, D)\}$.
If $x$ were in $\mathit{RigidCut}(M, D)$, then necessarily
$|[\hat{z}]_{\equiv} \cap \hat{M}|$ would be 1. Since
$S_{ii} \subseteq \hat{M}_{sD}$ and
$\hat{M}_{sD} \sqcup \hat{D} \downarrow$, then
$\exists t' \sqsupseteq \theta(\hat{y})$ such that
$\forall \hat{x} \in S_{ii}, t' \sqsupseteq \theta(\hat{x})$ (*).
However, $\forall \hat{x} \in S_{ii}$,
$\hat{x} \equiv \hat{y}$. Therefore, according to Lemma 3,
$\forall \hat{x} \in S_{ii}, \exists x \in [\hat{x}]^{-1}, \exists y \in [\hat{y}]^{-1}$
such that $x \equiv y$. Thus, since
$x \in \mathit{VariableCut}(M, D)$, Condition 2 of Definition 3
holds, and therefore, according to Lemma 1,
$\forall s_1 \sqsupseteq \theta(\hat{x}), \forall s_2 \sqsupseteq \theta(\hat{y}), s_1 \sqcup s_2 \downarrow$.
More than this, since $t' \sqsupseteq \theta(\hat{y})$ (for the
type $t'$ from (*)),
$\forall s_1 \sqsupseteq \theta(\hat{x}), \forall s'_2 \sqsupseteq t', s_1 \sqcup s'_2 \downarrow$,
and hence,
$\forall s'_2 \sqsupseteq t', s'_2 \sqcup \theta(\hat{x}) \downarrow$.
Thus, according to Lemma 2 and to (*),
$\exists t \sqsupseteq t' \sqsupseteq \theta(\hat{y})$ such that
$\forall \hat{x} \in S_{ii}, t \sqsupseteq \theta(\hat{x})$.
Thus, $\exists t$ such that
$\forall \hat{x} \in [\hat{z}]_{\equiv}$,
$t \sqsupseteq \theta(\hat{x})$.
While Proposition 1 could possibly be used by
grammar developers to simplify TFSGs themselves
at the source-code level, here we only exploit it for
internally identifying index keys for more efficient
chart parsing with the existing grammar. There may
be better static analyses, and better uses of this static
analysis. In particular, future work will focus on us-
ing static analysis to determine smaller representa-
tions (by cutting nodes in Static Cuts) of the chart
edges themselves.
4.2.2 Building the Path Index
The indexing schemes used in path indexing are
built on the same principles as those in positional
indexing. The main difference is the content of the
indexing keys, which now includes a third element.
Each mother $M$ has its indexing scheme defined as:
$L(M) = \{(R_i, D_j, V_{i,j})\}$. The pair $(R_i, D_j)$ is the
positional index key (as in positional indexing), while $V_{i,j}$
is the path index vector containing type values extracted from
$M$. A different set of types is extracted for each
mother-daughter pair. So, path indexing uses a two-layer indexing
method: the positional key for daughters, and types extracted
from the typed feature structure. Each daughter’s index key is
now given by $L(D_j) = \{(R_i, V_{i,j})\}$, where $R_i$ is the
rule number of a potentially matching mother, and $V_{i,j}$ is
the path index vector containing types extracted from $D_j$.
The types extracted for the indexing vectors are those of nodes
found at the end of indexing paths. A path $\pi$ is an indexing
path for a mother-daughter pair $(M, D)$ iff: (1) $\pi$ is
defined for both $M$ and $D$, (2)
$\exists x \in \mathit{StaticCut}(M, D), \exists f$ s.t.
$\delta(f, x) = \delta(\pi, q_M)$ ($q_M$ is $M$’s root), and (3)
$\delta(\pi, q_M) \notin \mathit{StaticCut}(M, D)$. Indexing
paths are the “frontiers” of the non-statically-cut nodes of $M$.
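The three conditions can be applied literally to enumerate indexing
paths once a StaticCut is known. The sketch below assumes acyclic
TFSs encoded as dictionaries from node names to feature arcs, and
takes the StaticCut as a given set of mother nodes; the node names,
the toy graphs and the function names are all hypothetical.

```python
def delta(graph, path, q):
    """Follow `path` from node q; return None where the path is undefined."""
    for feature in path:
        q = graph[q].get(feature)
        if q is None:
            return None
    return q

def all_paths(graph, q, prefix=()):
    """Yield every feature path defined from node q (acyclic graphs only)."""
    yield prefix
    for feature, child in sorted(graph[q].items()):
        yield from all_paths(graph, child, prefix + (feature,))

def indexing_paths(mother, root_m, daughter, root_d, static_cut):
    """Return the paths satisfying conditions (1) to (3)."""
    result = []
    for path in all_paths(mother, root_m):
        node = delta(mother, path, root_m)
        defined_in_d = delta(daughter, path, root_d) is not None      # (1)
        has_cut_parent = any(node in mother[x].values()
                             for x in static_cut)                     # (2)
        if defined_in_d and has_cut_parent and node not in static_cut:  # (3)
            result.append(path)
    return result

# Toy mother and daughter with the same shape; m0 and m1 are statically cut.
M = {"m0": {"F": "m1", "G": "m2"}, "m1": {"H": "m3"}, "m2": {}, "m3": {}}
D = {"d0": {"F": "d1", "G": "d2"}, "d1": {"H": "d3"}, "d2": {}, "d3": {}}
PATHS = indexing_paths(M, "m0", D, "d0", {"m0", "m1"})
```

In this toy case the indexing paths are F:H and G, the first nodes
below the cut region along each branch.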
A similar key extraction could be performed during Stage 2 of
indexing (as outlined in Section 3), using $\hat{M}$ rather than
$M$. We have found that this on-line path discovery is generally
too expensive to be performed during parsing, however.
As stated in Proposition 1, the nodes in
$\mathit{StaticCut}(M, D)$ do not affect the success/failure of
$\hat{M} \sqcup \hat{D}$. Therefore, the types of the first nodes
not included in $\mathit{StaticCut}(M, D)$ along each path $\pi$
that stems from the root of $M$ and $D$ are included in the
indexing key, since these nodes might contribute to the
success/failure of the unification. It should be mentioned that
the vectors $V_{i,j}$ are filled with values extracted from
$\hat{M}$ after $M$’s rule is completed, and from $\hat{D}$ after
all daughters to the left of $D$ are unified with edges in the
chart. As an example, assuming that the indexing paths are
THROWER:PERSON, THROWN, and THROWN:GENDER, the path index vector
for the TFS shown in Figure 2 is (third, index, neuter).
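The example vector can be reproduced with a short sketch. The
nested-dictionary encoding of the Figure 2 TFS and the helper
names node_at and path_index_vector are assumptions made for
illustration.

```python
def leaf(type_name):
    """A node with a type and no features."""
    return {"type": type_name, "feats": {}}

# The NUMBER value is one shared object, mirroring the tag [1] in Figure 2.
SHARED_NUMBER = leaf("singular")

FIGURE2 = {
    "type": "throwing",
    "feats": {
        "THROWER": {"type": "index", "feats": {
            "PERSON": leaf("third"),
            "NUMBER": SHARED_NUMBER,
            "GENDER": leaf("masculine"),
        }},
        "THROWN": {"type": "index", "feats": {
            "PERSON": leaf("third"),
            "NUMBER": SHARED_NUMBER,
            "GENDER": leaf("neuter"),
        }},
    },
}

def node_at(tfs, path):
    """Follow a colon-separated feature path from the root of tfs."""
    node = tfs
    for feature in path.split(":"):
        node = node["feats"][feature]
    return node

def path_index_vector(tfs, paths):
    """Collect the types found at the end of each indexing path."""
    return tuple(node_at(tfs, p)["type"] for p in paths)

VECTOR = path_index_vector(
    FIGURE2, ["THROWER:PERSON", "THROWN", "THROWN:GENDER"])
# VECTOR == ("third", "index", "neuter")
```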
4.2.3 Using the Path Index
Inserting and retrieving edges from the chart using
path indexing is similar to the general method pre-
sented at the beginning of this section. The first
layer of the index is used to insert a mother as
an edge into appropriate chart entries, according to
the positional keys for the daughters it can match.
Along with the mother, its path index vector is in-
serted into the chart.
When searching for a matching edge for a daughter, the search is
restricted by the first indexing layer to a single entry in the
chart (labeled with the positional index key for the daughter).
The second layer restricts searches to the edges that have a
compatible path index vector. The compatibility is defined as
type unification: the type pointed to by the element $V_{i,j}(n)$
of an edge’s vector $V_{i,j}$ should unify with the type pointed
to by the element $V_{i,j}(n)$ of the path index vector $V_{i,j}$
of the daughter on position $D_j$ in a rule $R_i$.
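The two-layer lookup can be sketched as a hash keyed by position
whose entries store each edge together with its path index vector.
The class name PathIndexedChart, the toy unify_types function and
the flat SUPERTYPE table are hypothetical; any type unification
routine that returns None on failure could be supplied instead.

```python
from collections import defaultdict

# Toy one-level hierarchy: each specific type maps to its supertype.
SUPERTYPE = {"first": "pers", "third": "pers",
             "masculine": "gend", "neuter": "gend"}

def unify_types(a, b):
    """Return the more specific of two compatible types, else None."""
    if a == b:
        return a
    if SUPERTYPE.get(a) == b:
        return a
    if SUPERTYPE.get(b) == a:
        return b
    return None

def compatible(edge_vector, daughter_vector, unify):
    """Vectors are compatible iff their types unify position by position."""
    return all(unify(s, t) is not None
               for s, t in zip(edge_vector, daughter_vector))

class PathIndexedChart:
    def __init__(self, unify):
        self.unify = unify
        self.entries = defaultdict(list)  # (rule, position) -> [(vector, edge)]

    def insert(self, key, vector, edge):
        # First layer: file the edge under a positional key.
        self.entries[key].append((vector, edge))

    def retrieve(self, key, daughter_vector):
        # Second layer: keep only edges with a compatible vector.
        return [edge for vector, edge in self.entries[key]
                if compatible(vector, daughter_vector, self.unify)]
```

Only the edges returned by retrieve need to go through full
unification with the daughter; edges filtered out by the vector
test are never copied or unified.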
5 Experimental Evaluation
Two TFSGs were used to evaluate the performance
of indexing: a pre-release version of the MERGE
grammar, and the ALE port of the ERG (in its final
form). MERGE is an adaptation of the ERG which
uses types more conservatively in favour of rela-
tions, macros and complex-antecedent constraints.
This pre-release version has 17 rules, 136 lexical
items, 1157 types, and 144 introduced features. The
ERG port has 45 rules, 1314 lexical entries, 4305
types and 155 features. MERGE was tested on 550
sentences of lengths between 6 and 16 words, ex-
tracted from the Wall Street Journal annotated parse
trees (where phrases not covered by MERGE’s vo-
cabulary were replaced by lexical entries having the
same parts of speech), and from MERGE’s own
test corpus. ERG was tested on 1030 sentences of
lengths between 6 and 22 words, extracted from the
Brown Corpus and from the Wall Street Journal an-
notated parse trees.
Rather than use the current version of ALE, TFSs
were encoded as Prolog terms as prescribed in
(Penn, 1999a), where the number of argument po-
sitions is the number of colours needed to colour
the feature graph. This was extended to allow for
the enforcement of type constraints during TFS uni-
fication. Types were encoded as attributed variables
in SICStus Prolog (Swedish Institute of Computer
Science, 2004).
5.1 Positional and path indexing evaluation
The average and best improvements in parsing times
of positional and path indexing over the same EFD-
based parser without indexing are presented in Ta-
ble 1. The parsers were implemented in SICStus
3.10.1 for Solaris 8, running on a Sun Server with 16
GB of memory and 4 UltraSparc v.9 processors at
1281 MHz. For MERGE, parsing times range from
10 milliseconds to 1.3 seconds. For ERG, parsing
times vary between 60 milliseconds and 29.2 sec-
onds.
          Positional Index     Path Index
          average   best       average   best
MERGE     1.3%      50%        1.3%      53.7%
ERG       13.9%     36.5%      12%       41.6%

Table 1: Parsing time improvements of positional and
path indexing over the non-indexed EFD parser.
5.2 Comparison with statistical optimizations
Non-statistical optimizations can be seen as a first
step toward a highly efficient parser, while statistical
optimization can be applied as a second step. How-
ever, one of the purposes of non-statistical index-
ing is to eliminate the burden of training while of-
fering comparable improvements in parsing times.
A quick-check parser was also built and evaluated
and the set-up times for the indexed parsers and
the quick-check parser were compared (Table 2).
Quick-check was trained on a 300-sentence training
corpus, as prescribed in (Malouf et al., 2000). The
training corpus included 150 sentences also used in
testing. The number of paths in path indexing is dif-
ferent for each mother-daughter pair, ranging from
1 to 43 over the two grammars.
                      Positional   Path      Quick
                      Index        Index     Check
Compiling grammar     6’30”        6’30”     6’30”
Compiling index       2”           1’33”     -
Training              -            -         3h28’14”
Total set-up time:    6’32”        8’3”      3h34’44”

Table 2: The set-up times for non-statistically indexed
parsers and statistically optimized parsers for MERGE.
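The totals in Table 2 can be checked with a few lines of arithmetic. The sketch below assumes that the single grammar compilation time of 6’30” applies to all three parsers, which is the reading under which the reported totals add up. The helper names `seconds` and `fmt` are illustrative only.

```python
def seconds(h=0, m=0, s=0):
    """Convert hours, minutes and seconds to a number of seconds."""
    return 3600 * h + 60 * m + s


# Assumption: grammar compilation (6'30") is shared by all three parsers.
GRAMMAR = seconds(m=6, s=30)

SETUP = {
    "positional index": GRAMMAR + seconds(s=2),
    "path index": GRAMMAR + seconds(m=1, s=33),
    "quick-check": GRAMMAR + seconds(h=3, m=28, s=14),
}


def fmt(total):
    """Format seconds in the h / ' / " notation used in Table 2."""
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return (f"{h}h" if h else "") + f"{m}'{s}\""


totals = {name: fmt(t) for name, t in SETUP.items()}
ratio = SETUP["quick-check"] / SETUP["path index"]
```

Under these figures, the quick-check set-up takes roughly 27 times as long as the path index set-up.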
As seen in Table 3, quick-check alone surpasses
positional and path indexing for the ERG. How-
ever, it is outperformed by them on the MERGE,
recording slower times than even the baseline. But
the combination of quick-check and path indexing
is faster than quick-check alone on both grammars.
In average parsing time, path indexing at best equalled
positional indexing alone in these experiments (on
MERGE) and was slightly worse on the ERG, attesting
to the difficulty of maintaining efficient index keys
in an implementation.
          Positional   Path        Quick     Quick +
          Indexing     Indexing    Check     Path
MERGE     1.3%         1.3%        -4.5%     -4.3%
ERG       13.9%        12%         19.8%     22%

Table 3: Comparison of average improvements over non-
indexed parsing among all parsers.
The quick-check evaluation presented in (Malouf
et al., 2000) uses only sentences with a length of
at most 10 words, and the authors do not report the
set-up times. Quick-check has an additional advan-
tage in the present comparison, because half of the
training sentences were included in the test corpus.
                        Failed unifications                      Failure rate reduction (vs. no index)
Grammar  Successful     EFD          Positional  Path    Quick   Positional  Path    Quick
         unifications   non-indexed  Index       Index   Check   Index       Index   Check
MERGE    159            755          699         552     370     7.4%        26.8%   50.9%
ERG      1078           215083       109080      108610  18040   49.2%       49.5%   91.6%

Table 4: The number of successful and failed unifications for the non-indexed, positional indexing, path indexing, and
quick-check parsers, over MERGE and ERG (collected on the slowest sentence in the corresponding test sets).

While quick-check improvements on the ERG
confirm other reports on this method, it must be
noted that quick-check appears to be parochially
very well-suited to the ERG (indeed quick-check
was developed alongside testing on the ERG). Al-
though the recommended first 30 most probable
failure-causing paths account for a large part of
the failures recorded in training on both grammars
(94% for ERG and 97% for MERGE), only 51 paths
caused failures at all for MERGE during training,
compared to 216 for the ERG. Further training with
quick-check for determining a better vector length
for MERGE did not improve its performance.
This discrepancy in the number of failure-causing
paths could result in an overfitted quick-check
vector, or perhaps the 30 paths chosen for MERGE
really are not the best 30 (quick-check uses a greedy
approximation). In addition, as shown in Table 4,
the improvements made by quick-check on the ERG
are explained by the drastic reduction of (chart look-
up) unification failures during parsing relative to the
other methods. It appears that only a drastic reduction
can justify the overhead of maintaining the index, which
is largest for quick-check because some of its paths must
be traversed at run-time, whereas path indexing only uses
paths available at compile-time in the grammar source.
Note that path indexing outperforms quick-check on
MERGE in spite of its lower failure reduction rate,
because of its smaller overhead.
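The failure rate reductions in Table 4 follow directly from the failed-unification counts. As a small worked check, the sketch below recomputes them as (baseline - failed) / baseline, using the counts reported in the table. The dictionary layout and the function names are illustrative only.

```python
# Failed unifications from Table 4 (slowest sentence of each test set).
FAILED = {
    "MERGE": {"none": 755, "positional": 699, "path": 552, "quick-check": 370},
    "ERG": {"none": 215083, "positional": 109080, "path": 108610,
            "quick-check": 18040},
}


def failure_reduction(baseline, failed):
    """Percentage of baseline unification failures avoided by an index."""
    return 100.0 * (baseline - failed) / baseline


def reductions(grammar):
    """Failure rate reduction of each indexed parser against no index."""
    counts = FAILED[grammar]
    baseline = counts["none"]
    return {
        method: failure_reduction(baseline, failed)
        for method, failed in counts.items()
        if method != "none"
    }


merge = reductions("MERGE")
erg = reductions("ERG")
```

The reported percentages agree with these values truncated to one decimal place. Read next to Table 3, they show that a parser can avoid about half of the failures (quick-check on MERGE) and still be slower than the baseline once its own overhead is counted.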
6 Conclusions and Future Work
The indexing method proposed here is suitable for
several classes of unification-based grammars. The
index keys are determined statically and are based
on an a priori analysis of grammar rules. A ma-
jor advantage of such indexing methods is the elim-
ination of the lengthy training processes needed
by statistical methods. Our experimental evalu-
ation demonstrates that indexing by static analy-
sis is a promising alternative to optimizing parsing
with TFSGs, although the time consumed by on-line
maintenance of the index is a significant concern —
echoes of an observation that has been made in ap-
plications of term indexing to databases and pro-
gramming languages (Graf, 1996). Further work
on efficient implementations and data structures is
therefore required. Indexing by static analysis of
grammar rules, combined with statistical methods,
can also provide a higher aggregate benefit.
The current static analysis of grammar rules used
as a basis for indexing does not consider the effect
of the universally quantified constraints that typi-
cally augment the signature and grammar rules. Fu-
ture work will investigate this extension as well.
References
B. Carpenter and G. Penn. 1996. Compiling typed
attribute-value logic grammars. In H. Bunt and
M. Tomita, editors, Recent Advances in Parsing
Technologies, pages 145–168. Kluwer.
B. Carpenter. 1992. The Logic of Typed Feature
Structures. Cambridge University Press.
P. Cousot and R. Cousot. 1992. Abstract interpre-
tation and application to logic programs. Journal
of Logic Programming, 13(2–3).
R. Elmasri and S. Navathe. 2000. Fundamentals of
database systems. Addison-Wesley.
D. Flickinger. 1999. The English Resource Gram-
mar. http://lingo.stanford.edu/erg.html.
P. Graf. 1996. Term Indexing. Springer.
B. Kiefer, H.U. Krieger, J. Carroll, and R. Malouf.
1999. A bag of useful techniques for efficient and
robust parsing. In Proceedings of the 37th An-
nual Meeting of the ACL.
R. Malouf, J. Carroll, and A. Copestake. 2000.
Efficient feature structure operations without
compilation. Natural Language Engineering, 6(1).
G. Penn and C. Munteanu. 2003. A tabulation-
based parsing method that reduces copying. In
Proceedings of the 41st Annual Meeting of the
ACL, Sapporo, Japan.
G. Penn. 1999a. An optimised Prolog encoding of
typed feature structures. Technical Report 138,
SFB 340, Tübingen.
G. Penn. 1999b. Optimising don’t-care non-determinism
with statistical information. Technical Report 140,
SFB 340, Tübingen.
C. Pollard and I. Sag. 1994. Head-driven Phrase
Structure Grammar. The University of Chicago
Press.
I.V. Ramakrishnan, R. Sekar, and A. Voronkov.
2001. Term indexing. In Handbook of Auto-
mated Reasoning, volume II, chapter 26. Elsevier
Science.
Swedish Institute of Computer Science. 2004. SIC-
Stus Prolog 3.11.0. http://www.sics.se/sicstus.
