Abductive Explanation-based Learning Improves Parsing Accuracy and
Efficiency
Oliver Streiter
Language and Law, European Academy, Bolzano, Italy
ostreiter@eurac.edu
Abstract
Natural language parsing has to be accurate and quick. Explanation-based learning (EBL) is a technique to speed up parsing. Accuracy, however, often declines with EBL. This paper shows that the accuracy loss is not due to the EBL framework as such, but to deductive parsing. Abductive EBL allows the deductive closure of the parser to be extended. We present a Chinese parser based on abduction. Experiments show improvements in both accuracy and efficiency.1
1 Introduction
The difficulties of natural language parsing in general, and of parsing Chinese in particular, are due to local ambiguities of words and phrases. Extensive linguistic and non-linguistic knowledge is required for their resolution (Chang, 1994; Chen, 1996). Different parsing approaches provide different types of knowledge. Example-based parsing approaches offer rich syntagmatic contexts for disambiguation, richer than rule-based approaches do (Yuang et al., 1992). Statistical approaches to parsing acquire mainly paradigmatic knowledge and require larger corpora, c.f. (Carl and Langlais, 2003); they handle unseen events via smoothing. Rule-based approaches use abstract category labels.
1This research has been carried out within the Logos Gaias project, which integrates NLP technologies into an Internet-based natural language learning platform (Streiter et al., 2003).
Example-based parsing generalizes examples during compilation time, e.g. (Bod and Kaplan, 1998), or performs a similarity-based fuzzy match during runtime (Zavrel and Daelemans, 1997). Both techniques may be computationally demanding; their effects on parsing, however, are quite different, c.f. (Streiter, 2002a).
Explanation-based learning (EBL) is a method to speed up rule-based parsing via the caching of examples. EBL, however, trades speed for accuracy. For many systems, a small loss in accuracy is acceptable if an order of magnitude less computing time is required. Apart from speed, it is generally recognized that EBL acquires some kind of knowledge from texts. But what is this knowledge like if it does not help with parsing? Couldn't a system improve by learning from its own output? Can a system learn to parse Chinese by parsing Chinese? This paper sets out to tackle these questions in theory and practice.
1.1 Explanation-based Learning (EBL)
Explanation-based learning techniques transform a general problem solver (PS) into a specific and operational PS (Mitchell et al., 1986). The caching of the general PS's output accounts for this transformation. Besides its output, the PS generates a documentation of the reasoning steps involved (the explanation). The explanation determines which output the system will cache.
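The transformation can be illustrated in a few lines of Python; a minimal sketch, assuming a general problem solver that returns an analysis together with its explanation (all names are illustrative, not taken from any cited system):

    # Sketch of the EBL transformation: cache the general PS's output,
    # letting the explanation decide what is cached (illustrative names).
    def make_ebl_parser(parse_general, cache_filter):
        cache = {}
        def parse(sentence):
            if sentence in cache:              # operational PS: fast lookup
                return cache[sentence]
            analysis, explanation = parse_general(sentence)
            if cache_filter(explanation):      # the explanation determines
                cache[sentence] = analysis     # which output is cached
            return analysis
        return parse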
The utility problem questions the claim of
speeding-up applications (Minton, 1990): Retriev-
ing cached solutions in addition to regular process-
ing requires extra time. If retrieval is slow and
cached solutions are rarely re-used, the cost-benefit
ratio is negative.
The accuracy of the derived PS is generally below that of the general PS. This may be due to the EBL framework as such or to the deductive base of the PS. Research in abductive EBL (A-EBL) seems to suggest the latter: A-EBL has the potential to acquire new knowledge (Dimopoulos and Kakas, 1996). The relation between knowledge and accuracy, however, is not a direct and logical one. The U-shaped language learning curves in children exemplify this indirect relation (Marcus et al., 1992): wrong regular word forms supplant correct irregular forms when rules are learned. We therefore cannot simply equate automatic knowledge acquisition with accuracy improvement, in particular for complex language tasks.
1.2 EBL and Natural Language Parsing
Previous research has applied EBL to speed up large and slow grammars: sentences are parsed, the resulting parse trees are filtered and cached, and subsequent parsing uses the cached trees. A complex HPSG grammar is thus transformed into tree structures with instantiated values (Neumann, 1994); a single hash-table lookup of POS sequences replaces typed-feature unification. Experiments in EBL-augmented parsing consistently report a speed-up of the parser and a drop in accuracy (Rayner and Samuelsson, 1994; Srinivas and Joshi, 1995).
A loss of information may explain the drop in accuracy. Contextual information taken into account by the original parser may be unavailable in the new operational format (Sima'an, 1997), especially if partial, context-dependent solutions are retrieved. In addition, the set of cached parse trees, judged "sure to cache", is necessarily biased (Streiter, 2002b): most cached tree structures are short noun phrases, and parsing from biased examples will bias the parsing.
A further reason for the loss in accuracy is incorrect parses which leak into the cache. A stricter filter does not solve the problem: it increases the bias in the cache, reduces the size of the cache, and evokes the utility problem.
EBL can actually improve parsing accuracy (Streiter, 2002b) if the grammar derives the parses to be cached not via deduction but via abduction. The deductive closure,2 which cannot increase with EBL from deductive parsing, may increase with abductive parsing.
2 A Formal View on Parsing and Learning
We use the following notation throughout the paper: $f \mapsto (x) = y$ (function $f$ applied to $x$ yields $y$); $f \leadsto (x) = y$ (relation $f$ applied to $x$ yields $y$). $\langle\ldots\rangle$ and $\{\ldots\}$ represent tuples and sets respectively. The $\#$ prefix denotes the cardinality of a collection, e.g. $\#\{e_1, e_2\} = 2$. Uppercase variables stand for collections and lowercase variables for elements. Collections may contain the anonymous variable $\_$ (the variable _ in PROLOG). Over-braces and under-braces label sub-expressions to facilitate reading, e.g. $\underbrace{\langle e, a \rangle}_{c}$.
A theory $T$ is $\langle A, B, R \rangle$, where $R$ is a set of rules $r$, and $A$ and $B$ are two disjoint sets of attributes $a$ and $b$ (e.g. $A = \{np, vp, \ldots\}$, $B = \{word, syllable, \ldots\}$). A rule is written as $r = \langle e, a \rangle$ or $r \leadsto (e) = a$. A rule specifies the relation between an observable fact $e$ and an attribute $a$ assigned to it. $E$ is the set of observable data, with each $e \in E$ being a tuple $e = \langle a, b \rangle$.3 $C$ is the set of data classified according to $T$, with $c = \langle e, a \rangle$. $e$, $b$ and $a$ may have an internal structure in the form of ordered or unordered collections of more elementary $e$, $b$ and $a$ respectively.
Transferring this notation to the description of parsing, $T$ is a syntactic formalism and $R$ a grammar. $A$ is the union of syntax trees and morpho-syntactic tags. $E$ is a corpus tagged with $A$. $B$ corresponds to a list of words, phrases or sentences (the surface strings). $C$ is a treebank, a cache of parse trees, or a history of explanations.

$c_{parse} = \langle \langle a_{pos}, b_{sentence} \rangle, a_{tree} \rangle$   (1)
2.1 Parsing: $p \leadsto (e) = \{c_{new}\}$
A parser defines a relation between $E$ and $C$ (c.f. 2). Parsing is a relation between $e$ and a subset of $C$ (c.f. 3).

$p \leadsto (E) = C$   (2)

$p \leadsto (e) = \{c_{new}\}$   (3)

2The deductive closure of a set of axioms is the set of all statements which can be proved from it.
3The formalization follows (Dimopoulos and Kakas, 1996).
Simplifying, we can assume that $p$ is defined as the set of rules, i.e. $p = \langle E, C \rangle = R$. A specific parser $p$ is derived by the application of $l$ to the training material (e.g. $C$): $l \leadsto (C) = p$. The set of possible relations $l$ is $L$. Elements of $L$ are caching (no generalization), induction (a hypothesis formed after data inspection) and abduction (a hypothesis formed during classification). Equation (5) describes the cycle of grammar learning and grammar application.

$l \leadsto (C) = p$   (4)

$\underbrace{(l \leadsto (C_{old}))}_{p} \leadsto (E) = C_{new}$   (5)
2.1.1 Memory-based Parsing
$p$ is based on memory if $(l \leadsto (c)) = \langle c, c \rangle = \langle r, r \rangle$. $m$ in (6) is the trivial formalization of caching. Parsing proceeds via recalling $re$, defined in (7). The cycle of grammar learning and parsing $re \leadsto (m)$ is defined in (8): the training material $c_k$ yields the parsing output $c_k$.4
$m \leadsto (\langle e_i, a_k \rangle) = \underbrace{\langle e_i, a_k \rangle}_{r}$   (6)

$re \leadsto (e_i) = \langle e_i, a_k \rangle$   (7)

$\underbrace{(m \leadsto (\overbrace{\langle e_i, a_k \rangle}^{c_k}))}_{\text{learning } re \text{ from } c} \leadsto \overbrace{(e_i)}^{\text{parsing } e_i} = \underbrace{\langle e_i, a_k \rangle}_{c_k}$   (8)
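The caching function $m$ and the recalling function $re$ of (6)-(8) can be rendered as a small Python sketch; the dictionary representation of the cache is an assumption for illustration:

    # (6): caching a classified datum c = (e, a); the cached pair is the rule.
    def m(cache, c):
        e_i, a_k = c
        cache[e_i] = a_k

    # (7): recalling yields the stored pair.
    def re(cache, e_i):
        return (e_i, cache[e_i])

    cache = {}
    m(cache, (("N", "V"), "S"))                        # learning c_k
    assert re(cache, ("N", "V")) == (("N", "V"), "S")  # (8): output equals c_k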
2.1.2 Deduction-based Parsing
Let $delete$ be a function which replaces one or more elements of a collection by a named variable or $\_$. $p$ is a deductive inference if $r$ is obtained from an induction (a reduction of $e$ with the help of $delete$). The following expressions define induction $in$ (9), deduction $de$ (10) and the inductive-deductive cycle $de \leadsto (in)$ (11):
4We use subscripts to indicate the identity of variables. The same subscript on two variables implies that they are identical; different subscripts imply nothing, i.e. the variables may or may not be identical. In memory-based parsing, learning material and parsing output are identical.
$in \leadsto (\underbrace{\langle \langle a_k, b_k \rangle, a_i \rangle}_{c_j}) = \underbrace{\langle \overbrace{delete \mapsto (\langle a_k, b_k \rangle)}^{e_{abstr}}, a_i \rangle}_{r_{hypo}}$   (9)

$de \leadsto (\underbrace{\langle a_k, b_l \rangle}_{e_{new}}) = \langle \langle a_k, b_l \rangle, a_i \rangle$   (10)

$\underbrace{(in \leadsto (\overbrace{\langle \langle a_k, b_k \rangle, a_i \rangle}^{c_j}))}_{\text{learning } de \text{ from } c_j} \leadsto \overbrace{(\langle a_k, b_l \rangle)}^{\text{parsing } e_m} = \langle \langle a_k, b_l \rangle, a_i \rangle$   (11)
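A Python sketch of (9)-(11), with the anonymous variable modelled as None and the surface string as the deleted element; a toy illustration, not the parser's actual data structures:

    ANY = None  # the anonymous variable _

    def delete(e):                 # reduce an observation: drop the string
        a, b = e
        return (a, ANY)

    def induce(c):                 # (9): c = ((a_k, b_k), a_i) yields rule r
        e, a_i = c
        return (delete(e), a_i)

    def deduce(r, e_new):          # (10): apply r to an unseen e
        (a_k, _), a_i = r
        a_new, b_new = e_new
        if a_new == a_k:           # the POS sequence matches
            return (e_new, a_i)

    r = induce((("N V", "wo lai"), "S"))
    assert deduce(r, ("N V", "ta qu")) == (("N V", "ta qu"), "S")  # (11)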
2.1.3 Abduction-based Parsing
Abduction, defined as $\underbrace{(hy \leadsto (c))}_{r} \leadsto (e)$, is a runtime generalization which is triggered by a concrete $e$ to be classified. We separate hypothesis generation $hy$ from its application $ab$ for presentation purposes only.5 The relation $q$ may express a similarity, a temporal or a causal relation. Equation (12) and the cycle $ab \leadsto (hy)$ (13) define abduction.

$hy \leadsto (c) = q \leadsto (c) = r$   (12)
$\underbrace{(hy \leadsto (\langle \langle a_l, b_l \rangle, a_j \rangle))}_{\text{learning } ab \text{ from } c} \leadsto \overbrace{(\langle a_n, b_n \rangle)}^{\text{parsing } e_n} = \langle \underbrace{\langle a_n, b_n \rangle}_{e_n}, a_o \rangle$   (13)
Abduction subsumes reasoning by analogy: abduction is an analogy if $q$ describes a similarity. Reasoning from rain to snow is a typical analogy; reasoning from a wet street to rain is an abductive inference. For a parsing approach based on analogy c.f. (Lepage, 1999).
5Abduction is a process of hypothesis generation. Deduction and abduction may work conjointly whenever deductive inferences encounter gaps: a deductive inference stops in front of a gap between the premises and a possible conclusion, and abduction creates a new hypothesis which bridges the gap and allows the inference to continue.
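Abduction in the sense of (12)-(13) can be sketched as nearest-neighbour retrieval; the similarity relation $q$ below (POS sequences differing in at most one tag) and the example data are toy assumptions:

    def q_similar(a_l, a_n):       # toy similarity relation q
        return (len(a_l) == len(a_n)
                and sum(x != y for x, y in zip(a_l, a_n)) <= 1)

    def abduce(C, e_n):            # (13): runtime generalization over C
        a_n, b_n = e_n
        for (a_l, b_l), a_j in C:
            if q_similar(a_l, a_n):    # hypothesis: classify e_n like c
                return ((a_n, b_n), a_j)

    C = [((("N", "Vt", "N"), "wo ai ni"), "S")]
    print(abduce(C, (("N", "Vc", "N"), "ta shi shei")))  # -> (..., "S")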
2.2 Learning: $C \cup ((l \leadsto (C)) \leadsto (e))$
In this section we formalize EBL. We mechanically substitute $l$ in the definition of EBL by $m$, $in$ and $hy$ to show their learning potential.
A learning system changes internal states which influence its performance. The internal states of $p$ are determined by $C$ and $L$. We assume that, for a given $p$, $L$ remains identical before and after learning. Therefore, comparing $C$ (before learning) with $C \cup C_{new}$ (after learning) reveals the acquired knowledge.
We define EBL in (14). $(l \leadsto (C))$ is the parser before learning. This parser applies to $e$ and yields $C_{new}$, formalized as $(l \leadsto (C)) \leadsto (e)$. The new parser is the application of $l$ to the union of $C$ and $C_{new}$.

$p_{new} = l \leadsto (C \cup \underbrace{(\underbrace{(l \leadsto (C))}_{p_{old}} \leadsto (e))}_{\{c_{new}\}})$   (14)
Of two otherwise identical parsers, the parser with a $c = \langle \langle a_k, \_ \rangle, a_j \rangle$ not present in the other has the greater deductive closure. The cardinality of $\langle \langle a_k, \_ \rangle, a_j \rangle \in C$ reflects empirical knowledge. Empirical knowledge does not allow one to conclude something new, but to resolve ambiguities in accordance with observed data, e.g. for a sub-language, as shown in (Rayner and Samuelsson, 1994). Both kinds of learning have the potential to improve accuracy.
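The cycle of (14) reduces to a single update step; the sketch below assumes $l$ is given as a Python function, here exemplified by an illustrative caching learner:

    def ebl_step(learn, C, e):
        p_old = learn(C)               # the parser before learning
        C_new = p_old(e)               # {c_new}: the parsed output
        return learn(C | C_new)        # p_new, trained on the union

    # e.g. a caching learner (illustrative):
    def learn_cache(C):
        table = {e: a for (e, a) in C}
        return lambda e: {(e, table[e])} if e in table else set()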
2.2.1 Learning through Parsing
Substituting $l$ with $m$, $in$ and $hy$ reveals the transformation of $C_{old}$ into $C_{new}$. We start with caching and recalling (15).

$p_{new} = m \leadsto (\{c_k\} \cup \underbrace{(\underbrace{(m \leadsto (c_k))}_{r} \leadsto (e_k))}_{c_k})$   (15)

Parsing $e_k$ with the cache of $c_k$ yields $c_k$. The deductive closure is not enlarged; only quantitative relations with respect to $e$ change in $C$. If $c_k$ is not cached twice, memory-based EBL is idempotent.6
6Idempotence is the property of an operation that results in
the same state no matter how many times it is executed.
EBL with induction and deduction is shown in (16). Here the subscripts merit special attention: $e = \langle a_l, b_n \rangle$ is parsed from $c = \langle \langle a_l, b_l \rangle, a_i \rangle$. This yields $c_{new} = \langle \langle a_l, b_n \rangle, a_i \rangle$. Integrating $c_{new}$ into $C$ changes the empirical knowledge with respect to $a$ and $b$. If the empirical knowledge does not influence $in$, D-EBL is idempotent. The deductive closure does not increase, as $\langle \langle a_l, \_ \rangle, a_i \rangle \in C$.

$p_{new} = in \leadsto (\{\langle \langle a_l, b_l \rangle, a_i \rangle\} \cup \underbrace{((in \leadsto (\underbrace{\langle \langle a_l, b_l \rangle, a_i \rangle}_{c})) \leadsto (\langle a_l, b_n \rangle))}_{\langle \langle a_l, b_n \rangle, a_i \rangle})$   (16)
Abductive EBL (A-EBL) is shown in (17). A-EBL acquires empirical knowledge similarly to D-EBL. In addition, a new $\langle \langle a_n, \_ \rangle, a_o \rangle$ is acquired. This $c_{new}$ may differ from $c_{old}$ with respect to $a_n$ and/or $a_o$. In the A-EBL experiments reported below, $a_n \neq a_l$ and $a_o = a_j$ hold.

$p_{new} = hy \leadsto (\{\langle \langle a_l, b_l \rangle, a_j \rangle\} \cup \underbrace{(\underbrace{(hy \leadsto (\overbrace{\langle \langle a_l, b_l \rangle, a_j \rangle}^{c}))}_{\text{learning } ab} \leadsto (\overbrace{\langle a_n, b_n \rangle}^{e_n}))}_{\langle \langle a_n, b_n \rangle, a_o \rangle})$   (17)
2.2.2 Recursive Rule Application
Parsing is a classification task in which $a \in A$ is assigned to $e \in E$. Differently from typical classification tasks in machine learning, natural language parsing requires an open set $A$. This is obtained via the recursive application of $R$, which, unlike non-recursive styles of analysis (Srinivas and Joshi, 1999), yields $A$ (syntax trees) of any complexity. Then $delete$ is applied to $A$ so that $delete \mapsto (a)$ can be matched by further rules (c.f. 18). Without this reduction, recursive parsing could not go beyond memory-based parsing.
$r_t = \langle \langle \langle a_t, b_t \rangle, \langle delete \mapsto (r_s \leadsto (e_s)), b_s \rangle \rangle, \langle a_n, \langle a_t, r_s \leadsto (e_s) \rangle \rangle \rangle$   (18)
Figure 1: An explanation produced by OCTOPUS. At the top, the final parse obtained via deductive substitutions. Abductive term identification bridges gaps in the deduction (X ~ Y). The marker '?' is a graphical shortcut for the set of lexemes $\{b\}$ in $c$.
The function $delete$ defines an induction, and recursive parsing is thus a deduction. Combinations of memory-based and deduction-based parsing are deductions; combinations of abduction-based parsing with any other kind of parsing are abductions.
Macro learning is the common term for the combination of EBL with recursive deduction (Tadepalli, 1991). A macro $r_{macro}$ is a rule which yields the same result as a set of rules $R'$ with $\#R' \geq 2$ and $r_{macro} \notin R'$ does. In terms of a grammar, such macros correspond to redundant phrases, i.e. phrases that are obtained by composing smaller phrases of $R$. Macros represent shortcuts for the parser and, possibly, improved likelihood estimates of the composed structure compared to the estimates under an independence assumption (Abney, 1996). When the usage of macros excludes certain types of analysis, e.g. by trying to find the longest/best matches, we can speak of pruning. This is the contribution of D-EBL to parsing.
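A macro in this sense can be illustrated with a toy grammar; the rules and the sequence below are invented for the example:

    rules = {("D", "N"): "NP", ("NP", "V"): "S"}

    def parse_chain(seq):              # two rule applications
        return rules[(rules[(seq[0], seq[1])], seq[2])]

    macro = {("D", "N", "V"): "S"}     # one shortcut rule, same yield

    assert parse_chain(("D", "N", "V")) == macro[("D", "N", "V")]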
3 Experiments in EBL
3.1 Experimental purpose and setup
The aim of the experiments is to verify whether new
knowledge is acquired in A-EBL and D-EBL. Sec-
ondly, we want to test the influence of new knowl-
edge on parsing accuracy and speed.
The general setup of the experiments is the following. We use a section of a treebank as seed corpus ($C_{seed}$). We train a corpus-based parser on the seed corpus. Using a test corpus we establish the parsing accuracy and speed of the parser ($evaluate(p \leadsto (E_{test})) =$ (recall, precision, f-score, time)).
Figure 2: The main parsing algorithm of OCTOPUS. The parser interleaves memory-based, deductive, and abductive parsing strategies in five steps: recalling, non-recursive deduction, deduction via chunk substitution (first with lexemes, then without), and finally abduction.
parse(⟨a_pos, b_lex⟩):
  # 1 recalling from POS (a) and lexeme (b)
  RETURN c IF (c ∈ m(⟨a_pos, b_lex⟩))
  # 2 deduction on the basis of POS (a)
  RETURN c IF (c ∈ de(⟨a_pos, _⟩))
  # 3 deductive, recursive parsing with POS and lexeme;
  #   substitutions are defined as in TAGs (Joshi, 2003)
  RETURN substitute(parse(⟨a_chunk, b_chunk⟩), parse(⟨a_rest, b_rest⟩))
      IF (⟨a_chunk, b_chunk⟩ matches with lexemes)
  # 4a deductive recursive parsing with lexeme,
  # 4b compared to abductive parsing
  RETURN substitute(ab(⟨a_chunk, b_chunk⟩), parse(⟨a_rest, b_rest⟩))
      IF (⟨a_chunk, b_chunk⟩ matches without lexemes)
  # 5 abduction as robust parsing solution
  RETURN ab(⟨a_pos, b_lex⟩)
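The preference cascade of Fig. 2 amounts to trying the strategies in order and returning the first parse found; a runnable Python sketch, with the individual step functions passed in (OCTOPUS's actual interfaces are not claimed here):

    def cascade(steps, e):
        # steps: recall, non-recursive deduction, substitution with and
        # without lexemes, abduction; each returns a parse or None.
        for step in steps:
            c = step(e)
            if c is not None:          # first preference that succeeds wins
                return c

    # e.g.: cascade([lambda e: cache.get(e), ..., lambda e: abduce(C, e)], e)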
Figure 3: Abductive parsing with k-nn retrieval and adaptation of retrieved examples.

ab(⟨a_pos, b_lex⟩):
  RETURN adapt(retrieve(k-nn(⟨a_pos, b_lex⟩)))
Then we parse a large corpus ($p \leadsto (E) = \{C_{new}\}$). A filter criterion that operates on the explanation applies. We train the parser on those trees which pass the filter ($l \leadsto (C_{seed} \cup \{C_{new}\}) = p_{new}$). The parsing accuracy and speed are then tested against the same test corpus ($evaluate(p_{new} \leadsto (E_{test})) =$ (recall, precision, f-score, time)).
Sections of the Chinese Sinica Treebank (Huang et al., 2000) are used as seed treebank and as gold standard for parsing evaluation. Seed corpora range between 1,000 and 20,000 trees. We train the parser OCTOPUS (Streiter, 2002a) on them. This parser integrates memory-, deduction- and abduction-based parsing in a hierarchy of preferences: (1) memory-based parsing, (2) non-recursive deductive parsing, (3, 4) recursive deductive parsing and (5) finally abductive parsing (Fig. 2).
Learning the seed corpora ($l \leadsto (C_{1000} \ldots C_{20000})$) results in $p_{1000} \ldots p_{20000}$. For each $p \in \{p_{1000} \ldots p_{20000}\}$, a POS-tagged corpus $E$ with $\#E = 200{,}000$ is parsed, producing the corpora $C_{p_{1000}} \ldots C_{p_{20000}}$. The corpus used is a subset of the 5-million-word Sinica Corpus (Huang and Chen, 1992).
For every $e \in E$ the parser produces one parse tree $c = \langle e, a \rangle$ and an explanation. The explanation has the form of a derivation tree in TAGs, c.f. (Joshi, 2003). The deduction and abduction steps are visible in the explanation. Filters apply to the explanation and create sub-corpora that belong to one inference type.
The first filter requires the explanation to contain only one non-recursive deduction, i.e. only parsing step 2. As deductive parsing is attempted after memory-based parsing (step 1), $b_l \neq b_i$ holds.
A second filter extracts those structures which are obtained by parsing step 4a or 5, where only one POS label may differ, and only in its last characters (e.g. $q$ maps a POS sequence to one differing in a single POS label). The resulting corpora are $C^{D}_{p_{1000}} \ldots C^{D}_{p_{20000}}$ (from the first filter) and $C^{A}_{p_{1000}} \ldots C^{A}_{p_{20000}}$ (from the second).
3.2 The Acquired Knowledge
We want to know whether or not new knowledge has been acquired and what the nature of this acquired knowledge is. As parsing was not recursive, we can approximate the closure by the types of POS-sequences from all trees and their subtrees in a corpus. We contrast this with the types of lexeme sequences. The data show that only A-EBL increases the closure. But even at the level of lexemes, i.e. empirical knowledge, A-EBL acquires richer information than D-EBL does.
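Counting the types of POS-sequences over all trees and their subtrees can be sketched as follows, assuming trees are given as (label, children) tuples with POS tags at the leaves (an assumption for illustration, not the treebank's actual format):

    def pos_sequences(tree, types):
        label, children = tree
        if not children:                   # leaf: a single POS tag
            return (label,)
        seq = tuple(tag for child in children
                        for tag in pos_sequences(child, types))
        types.add(seq)                     # every subtree contributes a type
        return seq

    def closure_size(corpus):
        types = set()
        for tree in corpus:
            pos_sequences(tree, types)
        return len(types)

    corpus = [("S", [("NP", [("N", [])]), ("VP", [("V", []), ("N", [])])])]
    print(closure_size(corpus))            # -> 3 types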
Figure 4: The number of types of POS-sequences, as an approximation of the closure, for $C_{seed}$, $C_{seed} \cup C_A$ and $C_{seed} \cup C_D$; below, the number of types of LEXEME-sequences, both as a function of seed corpus size.
The representativeness of the cached parses is gauged by the percentage of NPs and VPs (including Ss) as top nodes. Fig. 5 shows the bias of the cached parses, which is more pronounced with D-EBL than with A-EBL.
Figure 5: The proportion of top-NPs and top-VPs (including Ss) in the abduced and deduced corpora, compared to the standard distribution.
3.3 Evaluating Parsing
The experiments consist in evaluating the parsing accuracy and speed for each $C_{seed} \cup C^{A}_{p_{1000}} \ldots C_{seed} \cup C^{D}_{p_{20000}}$.
Figure 6: The parsing accuracy (f-score) with abductive EBL ($C_{seed} + C_A$) and deductive EBL ($C_{seed} + C_D$), as a function of seed corpus size.
We test the parsing accuracy on 300 untrained and randomly selected sentences, using the f-score on unlabeled dependency relations. Fig. 6 shows the parsing accuracy depending on the size of the seed corpus. The graphs show side branches where we introduce the EBL-derived training material. This allows comparing the effects of A-EBL, D-EBL and hand-coded trees (the baseline). Fig. 7 shows the parsing speed in words per second (processor: 1000 MHz, memory: 128 MB) for the same experiments. Rising lines indicate a speed-up in parsing. We have interpolated and smoothed the curves.
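The unlabeled dependency f-score used here can be computed as follows; dependencies are taken as (head, dependent) index pairs, a sketch of the standard measure rather than the exact evaluation script:

    def f_score(gold, parsed):
        correct = len(gold & parsed)
        precision = correct / len(parsed)
        recall = correct / len(gold)
        return 2 * precision * recall / (precision + recall)

    gold = {(2, 1), (0, 2), (2, 3)}            # (head, dependent) pairs
    parsed = {(2, 1), (0, 2), (3, 2)}
    print(round(f_score(gold, parsed), 2))     # -> 0.67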
Figure 7: The parsing speed (words per second) with A-EBL ($C_{seed} + C_A$) and D-EBL ($C_{seed} + C_D$).
The experimental results confirm the drop in parsing accuracy with D-EBL. This finding is consistent across all experiments. With A-EBL, the parsing accuracy rises beyond its level of departure. The data also show a speed-up in parsing. This speed-up is more pronounced and less data-hungry with A-EBL. Improving accuracy and improving efficiency are thus not mutually exclusive, at least for A-EBL.
4 Conclusions
Explanation-based learning has been used to speed up natural language parsing. We show that the loss in accuracy results from the deductive basis of parsers, not from the EBL framework. D-EBL does not extend the deductive closure and acquires only empirical (disambiguation) knowledge. The accuracy declines due to cached errors, the statistical bias the filters introduce, and the usage of shortcuts with limited contextual information.
If the parser uses abduction instead, the deductive closure of the parser is enlarged. This makes accuracy improvements possible, although they are not a logical consequence. In practice, the extended deductive closure compensates for negative factors such as wrong parses or unbalanced distributions in the cache.
On a more abstract level, the paper treats the problem of automatic knowledge acquisition for Chinese NLP. Theory and practice show that abduction-based NLP applications acquire new knowledge and increase accuracy and speed. Future research will aim to maximize these gains.

References
Steven Abney. 1996. Partial Parsing via Finite-State Cas-
cades. In Proceedings of the ESSLLI ’96 Robust Pars-
ing Workshop.
Rens Bod and Ronald M. Kaplan. 1998. A probabilistic
corpus-driven model for lexical-functional analysis. In
COLING-ACL’98.
Michael Carl and Philippe Langlais. 2003. Tuning gen-
eral translation knowledge to a sublanguage. In Pro-
ceedings of CLAW 2003, Dublin, Ireland, May, 15-17.
Hsing-Wu Chang. 1994. Word segmentation and sen-
tence parsing in reading Chinese. In Advances in the
Study of Chinese Language Processing, National Tai-
wan University, Taipei.
Keh-Jiann Chen. 1996. A model for robust Chinese
parser. Computational Linguistics and Chinese Lan-
guage, 1(1):183–204.
Yannis Dimopoulos and Antonis Kakas. 1996. Abduction and inductive learning. In L. De Raedt, editor, Advances in Inductive Logic Programming, pages 144–171. IOS Press.
Chu-Ren Huang and Keh-Jiann Chen. 1992. A Chi-
nese corpus for linguistics research. In COLING’92,
Nantes, France.
Chu-Ren Huang, Feng-Yi Chen, Keh-Jiann Chen, Zhao-
ming Gao, and Kuang-Yu Chen. 2000. Sinica tree-
bank: Design criteria, annotation guidelines and on-
line interface. In M. Palmer, M. Marcus, A. K. Joshi,
and F. Xia, editors, Proceedings of the Second Chinese
Language Processing Workshop, Hong Kong, October.
ACL.
Aravind K. Joshi. 2003. Tree-adjoining grammars. In
R. Mitkov, editor, The Oxford Handbook of Computa-
tional Linguistics. Oxford University Press, Oxford.
Yves Lepage. 1999. Open set experiments with direct
analysis by analogy. In Proceedings NLPRS’99 (Nat-
ural Language Processing Pacific Rim Symposium),
pages 363–368, Beijing.
Gary F. Marcus, Steven Pinker, Michael Ullman,
Michelle Hollander, John T. Rosen, and Fei Xu. 1992.
Overregularization in Language Learning. Mono-
graphs of the Society for Research in Child Develop-
ment, 57 (No. 4, Serial No. 228).
Steven Minton. 1990. Quantitative results concerning
the utility problem of explanation-based learning. Ar-
tificial Intelligence, 42:363–393.
Tom M. Mitchell, R. Keller, and S. Kedar-Cabelli. 1986. Explanation-based generalization: A unifying view. Machine Learning, 1(1).
Günter Neumann. 1994. Application of explanation-
based learning for efficient processing of constraint-
based grammars. In The 10th Conference on Artificial
Intelligence for Applications, San Antonio, Texas.
Manny Rayner and Christer Samuelsson. 1994. Corpus-based grammar specification for fast analysis. In Spoken Language Translator: First Year Report, SRI Technical Report CRC-043, pages 41–54.
Khalil Sima'an. 1997. Explanation-based learning of partial-parsers. In W. Daelemans, A. van den Bosch, and A. Weijters, editors, Workshop Notes of the ECML/ML Workshop on Empirical Learning of Natural Language Processing Tasks, pages 137–146, Prague, Czech Republic, April.
Bangalore Srinivas and Aravind K. Joshi. 1995. Some novel applications of explanation-based learning to parsing lexicalized tree-adjoining grammars. In 33rd Annual Meeting of the ACL, Cambridge, MA.
Bangalore Srinivas and Aravind K. Joshi. 1999. Su-
pertagging: An approach to almost parsing. Compu-
tational Linguistics, 25(2):237–265.
Oliver Streiter, Judith Knapp, and Leonhard Voltmer.
2003. Gymn@zilla: A browser-like repository for
open learning resources. In ED-Media, World Con-
ference on Educational Multimedia, Hypermedia &
Telecommunications, Honolulu, Hawaii, June, 23-28.
Oliver Streiter. 2002a. Abduction, induction and memo-
rizing in corpus-based parsing. In ESSLLI-2002 Work-
shop on ”Machine Learning Approaches in Computa-
tional Linguistics”, pages 73–90, Trento, Italy, August
5-9.
Oliver Streiter. 2002b. Treebank development with de-
ductive and abductive explanation-based learning: Ex-
ploratory experiments. In Workshop on Treebanks and
Linguistic Theories 2002, Sozopol, Bulgaria, Septem-
ber 20-21.
Prasad Tadepalli. 1991. A formalization of explanation-
based macro-operator learning. In IJCAI, Proceedings
of the International Joint Conference of Artificial In-
telligence, pages 616–622, Sydney, Australia. Morgan
Kaufmann.
Chunfa Yuang, Changming Huang, and Shimei Pan.
1992. Knowledge acquisition and Chinese parsing
based on corpus. In COLING’92.
Jakub Zavrel and Walter Daelemans. 1997. Memory-
based learning: Using similarity for smoothing. In
W. Daelemans, A. van den Bosch, and A. Weijters, ed-
itors, Workshop Notes of the ECML/ML Workshop on
Empirical Learning of Natural Language Processing
Tasks, pages 71–84, Prague, Czech Republic, April.
