Adapting a Semantic Question Answering System to the Web
Sven Hartrumpf
Intelligent Information and Communication Systems (IICS)
University of Hagen (FernUniversität in Hagen)
58084 Hagen, Germany
Sven.Hartrumpf@fernuni-hagen.de
Abstract
This paper describes how a question an-
swering (QA) system developed for small-
sized document collections of several mil-
lion sentences was modified in order to
work with a monolingual subset of the
web. The basic QA system relies on com-
plete sentence parsing, inferences, and se-
mantic representation matching. The ex-
tensions and modifications needed for use-
ful and quick answers from web docu-
ments are discussed. The main extension
is a two-level approach that first accesses a
web search engine and downloads some of
its document hits and then works similarly
to the basic QA system. Most modifica-
tions are restrictions like a maximal num-
ber of documents and a maximal length
of investigated document parts; they en-
sure acceptable answer times. The result-
ing web QA system is evaluated on the
German test collection from QA@CLEF
2004. Several parameter settings and
strategies for accessing the web search engine
are investigated. The main results are:
precision-oriented extensions and experimentally
derived parameter settings are
needed to achieve similar performance on
the web as on small-sized document collections
that show higher homogeneity and
quality of the contained texts; adapting a
semantic QA system to the web is feasible,
but answering a question is still expensive
in terms of bandwidth and CPU time.
1 Introduction
There are question answering (QA) systems in-
tended for small-sized document collections (tex-
tual QA systems) and QA systems aiming at
the web (web(-based) QA systems). In this pa-
per, a system of the former type (InSicht, see
(Hartrumpf, 2005b)) is transformed into one of
the latter type (called InSicht-W3 for short). In
Sect. 2, the textual QA system for German is pre-
sented. The extensions and modifications required
to get it working with the web as a virtual doc-
ument collection are described and discussed in
Sect. 3. The resulting system is evaluated on a
well-known test collection and compared to the
basic QA system (Sect. 4). After the conclusion,
some directions for further research are indicated.
2 The Basic QA System
The semantic [1] QA system that was turned into a
web QA system is InSicht. It relies on complete
sentence parsing, inferences, and semantic repre-
sentation matching and comprises six main steps.
In the document processing step, all docu-
ments from a given collection are transformed into
a standard XML format (CES, corpus encoding
standard, see http://www.cs.vassar.edu/CES/) with
word, sentence, and paragraph borders marked
up by XML elements w, s, and p, respectively.
Then, all preprocessed documents are parsed by
WOCADI (Hartrumpf, 2003) yielding a syntactic
dependency structure and, more importantly, a se-
mantic representation, a semantic network of the
MultiNet formalism (Helbig, 2006) for each doc-
ument sentence. The parser can produce intersen-
tential coreference links for a document.
In the second step (query processing), the user’s
question is parsed by WOCADI. Determining the
sentence type (here, often a subtype of question)
is especially important because it controls some
parts of two later steps: query expansion and an-
swer generation.
Next comes query expansion: Equivalent and
similar semantic networks are derived from the
original query network by means of lexico-semantic
relations from HaGenLex (Hagen German Lexicon,
(Hartrumpf et al., 2003)) and a lexical database
(GermaNet), equivalence rules, and inferential rules
like entailments for situations (applied in backward
chaining). The result is a set of disjunctively
connected semantic networks that try to cover many
possible kinds of representations of sentences
possibly containing an explicit or implicit answer
to the user’s question.
[1] Semantic in the sense that formal semantic representations
of documents and questions are automatically produced
by a parser and form the system center.
EACL 2006 Workshop on Multilingual Question Answering - MLQA06
In the fourth step (semantic network matching),
all document sentences matching at least one of
the semantic networks from query expansion are
collected. A two-level approach is chosen for efficiency
reasons. First, an index of concepts (disambiguated
words with IDs from the lexicon) is consulted
with the relevant concepts from the query
networks. Second, the retrieved documents are
compared sentence network by sentence network
to find a match with a query network.
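The first level of this matching step can be pictured as an inverted index from concept IDs to documents. The following is only a minimal sketch: the function names, the toy data, and the criterion that a candidate document must contain every concept of at least one query network are assumptions for illustration, not details taken from InSicht.

```python
from collections import defaultdict

def build_concept_index(doc_concepts):
    """Map each concept ID to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, concepts in doc_concepts.items():
        for concept in concepts:
            index[concept].add(doc_id)
    return index

def candidate_documents(index, query_networks):
    """First level: keep documents that contain every concept of at least
    one query network (assumed retrieval criterion)."""
    candidates = set()
    for concepts in query_networks:
        postings = [index.get(concept, set()) for concept in concepts]
        if postings:
            candidates |= set.intersection(*postings)
    return candidates

# Toy collection: document ID -> concepts found in its sentence networks.
doc_concepts = {
    "d1": {"sitzen.1.1", "gitter.1.1", "lacour.0fe", "hugo.0fe"},
    "d2": {"hugo.0fe", "lacour.0fe", "mensch.1.1"},
    "d3": {"gitter.1.1"},
}
# Two alternative query networks, reduced to their relevant concepts.
query_networks = [
    {"sitzen.1.1", "gitter.1.1", "lacour.0fe"},
    {"lacour.0fe", "hugo.0fe"},
]

index = build_concept_index(doc_concepts)
candidates = candidate_documents(index, query_networks)
print(sorted(candidates))
```

Only the candidates returned here would then be compared sentence network by sentence network on the second level, which is the expensive part.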
Answer generation is next: Natural language
(NL) generation rules are applied to semantic net-
works that match a query network in order to gen-
erate an NL answer string from the deep seman-
tic representations. The sentence type and the se-
mantic network control the selection of generation
rules. The rules also act as a filter for uninforma-
tive or bad answers. The results are tuples of gen-
erated answer string, numerical score, supporting
document ID, and supporting sentence ID.
To deal with different answer candidates, an
answer selection step is required. It realizes a
quite simple but successful strategy that combines
a preference for more frequent answers and a pref-
erence for more elaborate answers. The best an-
swers (by default only the single best answer) and
the supporting sentences are presented to the user
that posed the question.
The first step, document processing, is run off-
line in InSicht; it is run online in InSicht-W3 to
avoid unacceptable parsing costs before the sys-
tem can be used. The remaining five steps will be
left mostly unchanged for InSicht-W3, in parts just
differently parameterized.
3 QA System Extensions for the Web
3.1 A Naive Approach to Web-based QA
The simplest approach to turning InSicht into a
web QA system would be to collect German web
pages and work with the resulting document col-
lection as described in Sect. 2. However, a deep
semantic analyzer will need several years to parse
all web pages in German (even if excluding the
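pages from the much larger deep web). The following
formula provides a rough estimate [2] of the
CPU years needed:
\[
t = \frac{\#\text{documents} \cdot \#\text{sent per document}\,[\text{sent}]}{\text{parser speed}\,[\text{sent/h}]}
  = \frac{500{,}000{,}000 \cdot 320\,[\text{sent}]}{4000\,[\text{sent/h}]}
  = 40{,}000{,}000\,\text{h} \approx 4{,}566\,\text{a}
\]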
This long time indicates that the naive approach
is currently not an option. Therefore a multi-level
approach was investigated. In this paper, only a
two-level approach is discussed, which has been
tried in shallow QA systems.
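The estimate can be checked with a few lines of arithmetic. This sketch uses the three figures from the formula; the conversion of 8,760 hours per year (a non-leap year with the parser running around the clock) is an assumption, chosen because it reproduces the stated result.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year, continuous parsing

def parse_time_hours(documents, sentences_per_document, sentences_per_hour):
    """CPU hours needed to parse every sentence of every document."""
    return documents * sentences_per_document / sentences_per_hour

hours = parse_time_hours(500_000_000, 320, 4000)
years = hours / HOURS_PER_YEAR
print(f"{hours:,.0f} h = {years:,.0f} CPU years")
```

Even a parser ten times faster would still need several hundred CPU years, so the conclusion does not depend on the exact parameter values.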
3.2 A Two-Level Approach to Web-based QA
As in other applications, a web search engine can
be used (as a first level) to preselect possibly rel-
evant documents for the task at hand (here, an-
swering a question posed by a user). In InSicht-
W3, the web is accessed when similar and equiva-
lent semantic networks have been generated dur-
ing query expansion. Clearly, one cannot di-
rectly use this semantic representation for retriev-
ing document URLs from any service out there on
the web—at least till the arrival of a web anno-
tated with formal NL semantics. One must trans-
form the semantic networks to an adequate level;
in many web search engines, this is the level of
search terms connected in a Boolean formula us-
ing and and or.
To derive a search engine query from a seman-
tic network, all lexical concepts from the seman-
tic network are collected. For example, question
qa04 068 from QA@CLEF 2004 in example (1)
leads to the semantic network depicted in Fig. 1
(and related semantic networks; for a QA@CLEF
2004 question, 4.8 additional semantic networks
on average).
(1) Wo    sitzt Hugo Lacour hinter Gittern?
    Where sits  Hugo Lacour behind bars?
    ‘Where is Hugo Lacour imprisoned?’
The lexical concepts are roughly speaking those
concepts (represented as nodes in graphical form)
whose names are not of the form c1, c2, etc. The
lexical concepts are named after the lemma of the
corresponding word plus a numerical homograph
identifier and a numerical reading identifier.
[2] The average number of sentences per document has been
determined from a sample of 23,800 preprocessed web documents
in InSicht-W3.
Figure 1: Semantic network for CLEF question qa04 068: Wo sitzt Hugo Lacour hinter Gittern? (‘Where
is Hugo Lacour imprisoned?’). Some edges are shown folded below the node name.
[Graph not reproduced. Recoverable named nodes: lacour.0fe, hugo.0fe, nachname.1.1, vorname.1.1,
mensch.1.1, sitzen.1.1, gitter.1.1, underspecified-location.0, present.0; recoverable relation labels:
SUB, SUBS, PRED, ATTR, VAL, LOC, SCAR, TEMP, *HINTER.]
As current web search engines provide no (or
only rough) lemmatization for German, one must
go one step further away from the semantic net-
work by generating full forms for each word be-
longing to a lexical concept. (Others have tried
a similar approach; for example, Neumann and
Sacaleanu (2005) construct IR queries which can
be easily adapted to different IR engines.) In the
example, the node sitzen.1.1 (to sit) leads to 26
different full forms (shown below in example (2)).
These forms can be used as search terms; as alter-
natives, they are connected disjunctively. Thus for
each semantic network from query expansion, a
conjunction of disjunctions of word forms is con-
structed. Word forms can belong to words not
occurring in the question because query expan-
sion can introduce new concepts (and thereby new
words), e.g. by inferential rules. The Boolean
formulae for several semantic networks are con-
nected disjunctively and the resulting formula is
simplified by applying logical equivalences. What
one can pass to a web search engine as a prese-
lection query for question (1) is shown in prefix
notation in example (2):
(2) (or
      (and
        (or Gitter Gittern Gitters)
        (or Hugo Hugos)
        (or Lacour Lacours)
        (or gesessen gesessene gesessenem gesessenen gesessener gesessenes
            saß saßen saßest saßet saßt
            sitze sitzen sitzend sitzende sitzendem sitzenden sitzender sitzendes
            sitzest sitzet sitzt
            säße säßen säßest säßet))
      ...; query derived from second query network
    )
If a search engine does not offer Boolean operators
(or limits Boolean queries to some hundred char-
acters), one can transform the Boolean query into
DNF and issue for each disjunct a search engine
query that requires all its terms (for example, by
prepending a plus sign to every term).
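The DNF fallback can be sketched as follows. The conjunction of disjunctions is taken from example (2), with the verb alternatives truncated to three forms to keep the output small; the function names and the list-of-lists representation are assumptions for illustration.

```python
from itertools import product

# One query network as a conjunction of disjunctions of word forms.
cnf = [
    ["Gitter", "Gittern", "Gitters"],
    ["Hugo", "Hugos"],
    ["Lacour", "Lacours"],
    ["sitzt", "saß", "gesessen"],  # truncated list of verb forms
]

def to_dnf(cnf):
    """Distribute AND over OR: pick one word form from every disjunction."""
    return [list(choice) for choice in product(*cnf)]

def plus_queries(cnf):
    """Render each DNF disjunct as a query that requires all of its terms."""
    return [" ".join("+" + term for term in disjunct) for disjunct in to_dnf(cnf)]

queries = plus_queries(cnf)
print(len(queries))
print(queries[0])
```

The number of disjuncts is the product of the disjunction sizes (here 3 · 2 · 2 · 3 = 36; with all 26 verb forms it would be 312), which is one reason why limiting the generated word forms matters.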
There are several problems related to break-
ing down a semantic network representation into
a Boolean formula over word forms. For example,
lexical concepts that do not stem from a surface
form in the question should be excluded from the
query. An example is the concept nachname.1.1
(‘last name’) which the named entity subparser of
WOCADI installs in the semantic network when
constructing the semantic representation of a hu-
man being whose last name is specified in the text
(see Fig. 1).
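The derivation of a preselection query from one semantic network might be sketched like this. The node list mirrors Fig. 1, but the (name, from_surface) representation, the flag values, and the toy full-form table are assumptions; a real system would use a morphological generator and the parser's own bookkeeping.

```python
import re

# Toy full-form table (verb forms truncated).
FULL_FORMS = {
    "gitter.1.1": ["Gitter", "Gittern", "Gitters"],
    "hugo.0fe": ["Hugo", "Hugos"],
    "lacour.0fe": ["Lacour", "Lacours"],
    "sitzen.1.1": ["sitzt", "saß", "gesessen"],
}

# (node name, stems from a surface form in the question?) -- assumed flags.
nodes = [
    ("c2", False), ("sitzen.1.1", True),
    ("c26", False), ("mensch.1.1", False),
    ("c28", False), ("nachname.1.1", False), ("lacour.0fe", True),
    ("c27", False), ("vorname.1.1", False), ("hugo.0fe", True),
    ("c32", False), ("gitter.1.1", True),
    ("c37", False), ("underspecified-location.0", False),
]

def lexical_concepts(nodes):
    """Drop non-lexical nodes (c1, c2, ...) and concepts without surface form."""
    return sorted(
        name for name, from_surface in nodes
        if from_surface and not re.fullmatch(r"c\d+", name)
    )

def preselection_query(nodes):
    """Build a conjunction of disjunctions of word forms in prefix notation."""
    disjunctions = (
        "(or " + " ".join(FULL_FORMS[concept]) + ")"
        for concept in lexical_concepts(nodes)
    )
    return "(and " + " ".join(disjunctions) + ")"

query = preselection_query(nodes)
print(query)
```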
The generation of a preselection query de-
scribed above indicates that there is a fundamental
level mismatch between the first level (web search
engine) and the second level (an NL understand-
ing system like the QA system InSicht): words
(plus proximity information [3]) vs. concepts within
(possibly normalized) semantic representations of
sentences (or texts). This level mismatch lowers
the precision of the first level and thereby
the recall of the second level. The latter might be
surprising. The precision of the first level (viewed
in isolation) is not problematic because the second
level achieves the same precision no matter how
low the precision of the first level is. But precision
does matter because the limited number of docu-
ments that the first level can return and that the
second level can efficiently process might lead to
not seeing relevant documents (i.e. lower recall) or
at least to seeing them much later (which can im-
ply unacceptable answer times). Or to put it differ-
ently: Very low precision in the first level can lead
to lower recall of the whole multi-level system if
there are any retrieval set limits for the first level.
At the end of the first level, the retrieved web
documents are simplified by a script that uses the
text dump function of the lynx web browser to ex-
tract the textual content. The resulting document
representations can be processed by InSicht.
3.3 Deficiencies of Web Search Engines
Web search engines typically restrict the length
of queries, e.g. to 1000 bytes, 1500 bytes, or 10
terms. But the rich morphology of languages like
German can lead to long and complex preselection
queries so that one needs to reduce preselection
queries as far as possible (e.g. by dropping rare
word forms). An additional strategy is to split the
query into several independent subqueries.
The QA system is currently based on match-
ing semantic representations with the granularity
of a sentence only. Thus one would achieve much
better precision if the web search engine allowed
a same-sentence operator or at least a proximity
operator (often named NEAR) of say 50 words
which ignores linguistic units like sentences and
paragraphs but suffices for our goals. Although
some web search engines once had some kind of
proximity operator, currently none of the search
engines with large indices and Boolean queries
seems to offer this facility.
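What such a proximity operator would test can be sketched on the client side. This is an illustrative check only, assuming a query given as a conjunction of disjunctions of word forms and a simple regular-expression tokenizer; it is not how any particular search engine implements NEAR.

```python
import re

def near_match(text, cnf, window=50):
    """True if some window of `window` consecutive words contains at least
    one word form from every disjunction of the query."""
    tokens = re.findall(r"\w+", text)
    for start in range(max(1, len(tokens) - window + 1)):
        span = set(tokens[start:start + window])
        if all(any(form in span for form in disjunction) for disjunction in cnf):
            return True
    return False

cnf = [["Hugo", "Hugos"], ["Lacour", "Lacours"]]
far_apart = "Hugo " + "und " * 60 + "Lacour"

print(near_match("Hugo Lacour sitzt hinter Gittern.", cnf))
print(near_match(far_apart, cnf))
print(near_match(far_apart, cnf, window=100))
```

Because the window ignores sentence and paragraph borders, it can only approximate a same-sentence operator, as noted above.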
[3] At least some years ago and hopefully in the future again,
see Sect. 3.3.
3.4 Pragmatic Limits for Web-based QA
Pragmatic limits are needed to ensure acceptable
answer times when dealing with the vast number
of web documents. The following constraining
parameters are implemented in InSicht-W3:
number of documents d (BT-R) [4] The number
of documents retrieved from a search engine
is limited by InSicht-W3. A separate query
is sent to the search engine for each top-level
disjunct, which corresponds to one semantic
network. For each of the q subqueries, d/q
documents are retrieved, at most.
Many search engines employ a maximal
number of hits h. (A popular choice is cur-
rently 1000.) Due to other filtering con-
straints (like snippet testing as defined be-
low), the limit h can inhibit the performance
of InSicht-W3 even if h is greater than d.
document format (BT-R) Only documents en-
coded as HTML or plain text are covered.
Other formats are omitted by adding a format
constraint to the preselection query, by inves-
tigating the URL, and by interpreting the re-
sult of the UNIX program file.
document language (BPT-R) Only documents
that are mainly written in German are treated.
A language identification module is available
in InSicht-W3, but the selection of document
language in the web search engine was
precise enough.
document length l (T-R) If a document is longer
than l bytes it is shortened to the first l bytes
(current value: 1,000,000 bytes).
document URL (BPT-R) The URL of a docu-
ment must belong to certain top-level do-
mains (.at, .ch, .de, .info, .net, .org) and must
not be on a blacklist. Both conditions aim at
higher quality pages. The blacklist is short
and should be replaced by one that is collab-
oratively maintained on the web.
snippet testing (BPT-R) Many search engines
return a text snippet for each search hit that
typically contains all word forms from the
search engine query. Sometimes the snip-
pet does not fulfill the preselection query. A
stronger constraint is that a snippet passage
without ellipsis (...) must fulfill the preselection
query. This realizes an imperfect same-sentence
or similar proximity operator.
[4] Trade-offs (and goals) are indicated in parentheses,
between bandwidth (B), precision (P), (answer) time (T), on the
one hand, and recall (R), on the other hand.
word form selection (BT-R) To shorten prese-
lection queries for highly inflecting word
categories (e.g. German verbs), InSicht-W3
omits full forms that are less likely to be con-
tained in answering document sentences (e.g.
verb forms that can only be imperative or first
and second person). For adjectives, only the
forms matching the degree (positive, compar-
ative, and superlative in German) used in the
question are included. The most restrictive
parameter setting is to choose only one word
form per content word, e.g. the word form
occurring in the question.
These constraints can be checked before the sec-
ond level starts.
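Three of the first-level constraints above (document URL, snippet testing, and the per-subquery document budget) can be sketched as simple checks. The top-level domains come from the list above; the blacklist entry, the function names, the integer division for d/q, and the case-sensitive regular-expression tokenizer are assumptions for illustration.

```python
import re
from urllib.parse import urlparse

ALLOWED_TLDS = {"at", "ch", "de", "info", "net", "org"}
BLACKLIST = {"spam.example.org"}  # hypothetical blacklist entry

def url_allowed(url):
    """Document URL constraint: allowed top-level domain, not blacklisted."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1] in ALLOWED_TLDS and host not in BLACKLIST

def fulfills(cnf, text):
    """True if the text contains a word form from every disjunction."""
    words = set(re.findall(r"\w+", text))
    return all(any(form in words for form in disjunction) for disjunction in cnf)

def snippet_passes(cnf, snippet):
    """Stronger snippet test: one passage without ellipsis must fulfill the query."""
    return any(fulfills(cnf, passage) for passage in snippet.split("..."))

def docs_per_subquery(d, q):
    """At most d/q documents for each of the q subqueries (rounded down)."""
    return d // q

cnf = [["Hugo", "Hugos"], ["Lacour", "Lacours"], ["sitzt", "saß"], ["Gitter", "Gittern"]]
print(url_allowed("http://www.example.de/page.html"))
print(snippet_passes(cnf, "Hugo Lacour ... sitzt hinter Gittern"))
print(snippet_passes(cnf, "Hugo Lacour sitzt hinter Gittern ... weitere Meldungen"))
print(docs_per_subquery(300, 6))
```

All of these checks need only the search engine's result list, so rejected hits cost no download bandwidth.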
The following parameters are applicable on the
second level:
number of parsed sentences s (T-R) For each
document, at most s sentences are parsed
(current value: 100).
sentence language (PT-R) As many web pages
contain passages in several languages, it is
sometimes important to discard sentences
that seem to be in a different language than
the document itself. If the percentage of unknown
words in a sentence reaches a threshold,
the sentence is considered to be written
in a foreign language.
sentence selection (T) Only a sentence whose set
of word forms fulfills the preselection query
sent to the web search engine is parsed by
WOCADI. This realizes a same-sentence op-
erator on the second level. Of course, this
operator would be more helpful in the web
search engine (see Sect. 3.3).
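The second-level parameters can be combined into one filter that decides which sentences reach the parser. In this sketch the threshold of 0.5, the toy lexicon, the order of the checks, and all function names are assumptions; only the limit of 100 parsed sentences is taken from the text.

```python
import re

def tokens(sentence):
    return re.findall(r"\w+", sentence)

def is_foreign(sentence, lexicon, threshold=0.5):
    """Sentence language: share of unknown words reaches the threshold."""
    words = [w.lower() for w in tokens(sentence)]
    if not words:
        return True
    unknown = sum(1 for w in words if w not in lexicon)
    return unknown / len(words) >= threshold

def fulfills(cnf, sentence):
    """Sentence selection: word forms must fulfill the preselection query."""
    forms = set(tokens(sentence))
    return all(any(f in forms for f in disjunction) for disjunction in cnf)

def sentences_to_parse(sentences, cnf, lexicon, s=100):
    """Return at most s sentences that pass both filters."""
    selected = [
        sent for sent in sentences
        if not is_foreign(sent, lexicon) and fulfills(cnf, sent)
    ]
    return selected[:s]

lexicon = {"hugo", "lacour", "sitzt", "in", "luxemburg", "hinter",
           "gittern", "gitter", "sind", "teuer"}  # toy German lexicon
cnf = [["Gitter", "Gittern", "Gitters"], ["Hugo", "Hugos"],
       ["Lacour", "Lacours"], ["sitzt", "saß"]]
sentences = [
    "Hugo Lacour sitzt in Luxemburg hinter Gittern.",
    "Hugo Lacour was arrested yesterday.",
    "Gitter sind teuer.",
]
selected = sentences_to_parse(sentences, cnf, lexicon)
print(selected)
```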
3.5 Parameter Settings of the QA System
In general, the QA system must be parameterized
differently when going from a high quality, low-
redundancy, small-sized collection (like archives
of some newspapers) to a mixed quality, high-
redundancy, large-sized collection (like a consid-
erable subset of the web). For example, the varia-
tion of concepts (like synonyms, hyponyms, and
hyperonyms) is less important on the web than
in smaller collections due to its large redundancy
and increased chance to find a formulation whose
semantic representation contains exactly the con-
cepts from the question. This is helpful because
full concept variations lead for some questions to
very long preselection queries.
3.6 Language-specific Problems
InSicht-W3 has to deal with several problems in
the German language and German web pages,
which might be less problematic for English or
some other languages. On the one hand, the rich
morphology requires generating word forms for
content words derived from query networks; on
the other hand, to avoid very long preselection
queries the generation must be limited (e.g. with
the word form selection parameter).
The morphological phenomenon of separable
prefix verbs in German requires special treatment.
Certain word forms of such verbs must appear as
one orthographic word or two orthographic words
(often separated by many intervening words). The
choice depends on the word order of the sentence
or clause that contains the verb. InSicht-W3’s pre-
selection queries contain both variants, e.g. for the
word form absitzt the following formula is gener-
ated: (or absitzt (and ab sitzt)).
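The formula for separable prefix verbs can be generated mechanically once the prefix is known. This sketch assumes the prefix is supplied by the lexicon and that the word form is one that can actually be split; the function name is hypothetical.

```python
def separable_variants(form, prefix):
    """Return a prefix-notation formula covering the one-word and the
    two-word realization of a separable prefix verb form."""
    if not form.startswith(prefix) or form == prefix:
        raise ValueError(f"{form!r} does not start with prefix {prefix!r}")
    rest = form[len(prefix):]
    return f"(or {form} (and {prefix} {rest}))"

print(separable_variants("absitzt", "ab"))
```

Without a proximity operator, the two-word variant also matches pages where the prefix and the verb stem occur in unrelated sentences, which is one more source of low first-level precision.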
In German web pages, two orthography sys-
tems can be found, the old one and the new one
introduced in 1998. As both systems are often
mixed in pages (and question sets), the NL pars-
ing must be flexible and exploit the links between
old spellings, new spellings, and spelling vari-
ants. The parser normalizes all words and concept
names to the old orthography. For preselection
queries, word forms for all orthography systems
must be generated.
Not only might the orthography system change
within a web page; the language itself might also
switch (often without correct markup), e.g. between
English and German. Therefore the sentence
language parameter from Sect. 3.4 was introduced.
Such effects also occur for other languages.
German texts contain important characters that
are not covered by the ASCII character set. (The
most frequent ones are ä, ö, ü, Ä, Ö, Ü, and ß.) Unfortunately,
different character encodings are used
for German web pages, sometimes with a mis-
match of declared (or implied) encoding and ac-
tually used encoding. This problem has been suc-
cessfully delegated to the lynx web browser (see
end of Sect. 3.2).
Table 1: Evaluation results for the German questions from QA@CLEF 2004. InSicht used all 200
questions, InSicht-W3 only 170 questions (see text).
setup                                             answers (%)
system      document collection                   non-empty                empty              K1
                                                  right  inexact  wrong    right  wrong
InSicht     QA@CLEF document collection           30.5   2.5      0.0      10.0   57.0        0.28
InSicht-W3  web (as virtual document collection)  22.9   2.9      2.4      0.0    71.8        0.18
4 Evaluation of the Web QA System
The resulting web QA system is evaluated on an
established set of test questions: QA@CLEF 2004
(Magnini et al., 2005). Some questions in this
set have an important implicit temporal context.
For example, the correct answer to questions like
qa04 090 in example (3) critically depends on the
time this present tense question is uttered.
(3) Wer ist Präsident von UNICE?
    Who is  president of  UNICE?
In QA@CLEF, a supported correct answer to this
question can be the name of a UNICE presi-
dent only from a certain time period because
the QA@CLEF document collection consists of
newspaper and newswire articles from 1994 and
1995 (and because there are no documents about
the history of UNICE). On the web, there are additional
correct answers. [5] Questions with implicit
time restriction (30 cases) are excluded from the
evaluation so that 170 questions remain for eval-
uation in InSicht-W3. Alternatives would be to
refine these questions by making the temporal re-
striction explicit or to extend the gold standard by
answers that are to be considered correct if work-
ing on the web.
Table 1 contains evaluation results for InSicht-
W3: the percentages of right, inexact, and wrong
answers (separately for non-empty answers and
empty answers) and the K1-measure (see (Her-
rera et al., 2005) for a definition). For compar-
ison, the results of the textual QA system InSicht
on the QA@CLEF document collection are shown
in the first row. The percentages for non-empty answers
differ for right answers and wrong answers.
In both aspects, InSicht-W3 is worse than InSicht.
The main reason for these changes is that the structure
and textual content of documents on the web
are much more diverse than in the QA@CLEF collection.
For example, InSicht-W3 regarded some
sequences of words from unrelated link anchor
texts as a sentence, which led to some wrong answers
especially for definition questions. Also
for empty answers, InSicht-W3 performs worse
than InSicht. This is in part due to the too optimistic
assumption during answer assessment that
all German questions from QA@CLEF 2004 have
a correct answer on the web (therefore the column
labelled right empty answers contains 0.0
for InSicht-W3). However, InSicht-W3 found 12
right answers that InSicht missed. So there is a
potential for a fruitful system combination.
[5] Or just one correct answer, which differs from the one
in QA@CLEF, when one interprets the present tense in the
question as referring to today.
The impact of some interesting parameters was
evaluated by varying parameter settings (Table 2).
The query network quality qmin (column 2) is a
value between 0 (worst) and 1 (best) that mea-
sures how far away a derived query network is
from the original semantic network of the ques-
tion. For example, omitting information from the
semantic network for the question (like the first
name of a person if also the last name is specified),
leads to a reduction of the initial quality value of
1. The value in column 2 indicates at what thresh-
old variant query networks are ignored. Column
3 corresponds to the parameter of word form se-
lection (see Sect. 3.4). The two runs with d = 300
and qmin = 0.8 show that morphological genera-
tion pays off, but the effect is smaller than in other
document collections. This is probably due to the
fact that the large and redundant web often con-
tains answers with the exact word forms from the
question. Another interesting aspect found in Ta-
ble 2 is that even with d = 500 results still im-
prove; it remains to be seen at what number of
documents performance stays stable (or maybe degrades).
The impact of a lower quality threshold
qmin was not significant. This might change if one
operates with a larger parameter d and more infer-
ential rules.
Table 2: Influence of parameters on InSicht-W3 results. Parameter d is the maximal number of documents
used from the search engine results (see Sect. 3.4); qmin is the minimal query network quality.
parameter setting    results
d    qmin  morph.    #docs. from   #sent. matching  %docs. with  right       inexact    wrong       K1
           generat.  first level   pre. query       matching     non-empty   answ. (%)  non-empty
                     per quest.    per quest.       sent.        answ. (%)              answ. (%)
100  0.9   frequent  18.1          30.5             59.7         18.8        4.1        1.8         0.13
200  0.9   frequent  34.4          59.0             60.4         20.6        3.5        2.9         0.14
300  0.9   frequent  45.5          76.9             60.2         22.4        3.5        2.4         0.16
300  0.8   frequent  48.4          84.2             61.7         22.4        3.5        2.4         0.16
300  0.8   none      57.3          91.3             61.5         19.4        3.5        3.5         0.12
500  0.8   frequent  68.9          119.8            62.2         22.9        2.9        2.4         0.18
Error analysis and classification is difficult as
soon as one steps to the vast web. One could start
with a classification of error reasons for wrong
empty answers. The error class that can be most
easily determined is that the search engine re-
turned no results for the preselection query. 40
of the 170 questions (23.5%) belong to this class.
This might surprise users that believe that they
can find nearly every bit of information on the
web. But there are areas where this assumption is
wrong: many QA@CLEF questions relate to very
specific events of the year 1994 or 1995. Twelve
years ago, the web publishing rate was much lower
than today; and even if the relevant pieces of in-
formation were on the web at that time (or in the
following years), they might have been moved to
archives or removed in recent years. Other er-
ror reasons are very difficult and labor-intensive
to separate:
1. Result pages from the first level do not contain
an answer, but if one requests more documents
(e.g. by raising the parameter d), result
pages with an answer will be found.
2. Result pages from the first level contain an
answer, but one or more components of the
textual QA system cause the answer to be
missed. Subclasses could be defined
by adapting the hierarchy that Hartrumpf
(2005a) applied to evaluate InSicht.
On average, the 40th search engine result is
the first that contains a right answer (in the best
InSicht-W3 run). [6] Although this result is for the
system and not for a real user (they differ in their
strengths and weaknesses in NL understanding), it
indicates that human users of web search engines
might find the correct answer quite late if at all.
This is a strong argument for more advanced, sec-
ond level tools like a semantic QA system: How
many users will read through 40 search engine re-
sults and (possibly) 40 underlying web pages?
The answer time of InSicht-W3 currently
ranges between 15 and 1200 seconds. The doc-
ument download time was excluded because it de-
pended on too many external factors (caches, in-
tranet effects, parallel vs. sequential downloads)
to be measured consistently and reliably.
5 Related Work
There are few QA systems for German. The sys-
tem described by Neumann and Xu (2003) works
on German web pages. Its general approach dif-
fers from InSicht because it relies on shallow, but
robust methods, while InSicht builds on sentence
parsing and semantic representations. In this re-
spect, InSicht resembles the (English) textual QA
system presented by Harabagiu et al. (2001). In
contrast to InSicht, this system applies a theorem
prover and a large knowledge base to validate can-
didate answers. An interesting combination of
web and textual QA is presented by Vicedo et al.
(2005): English and Spanish web documents are
used to enhance textual QA in Spanish.
[6] This high number is in part an artifact of the preselection
query generation. In a more thorough analysis, human users
could be given the NL question and be asked to formulate a
corresponding search engine query.
One of the first web QA systems for English
was Mulder (Kwok et al., 2001). Mulder parses
only the text snippets returned by the search en-
gine, while InSicht-W3 parses the underlying web
pages because the text snippets often have omis-
sions (‘...’) so that full parsing becomes problem-
atic or impossible. InSicht-W3’s approach needs
more time, especially if the web pages are not in
the local cache that InSicht-W3 maintains in order
to reduce its bandwidth requirements.
6 Conclusion and Perspectives
In this paper, an existing textual QA system was
extended and modified to work successfully on
the German web as a virtual document collection.
The main results are: precision-oriented exten-
sions and experimentally derived parameter set-
tings are needed to achieve similar performance
on the vast web as on small-sized document col-
lections that show higher homogeneity and quality
of the contained texts; taking a semantic QA sys-
tem to the web is feasible as demonstrated in this
paper, but answering a question is still expensive
in terms of bandwidth and CPU time.
There are several interesting directions for future
work. The first level of InSicht-W3 can be improved
by finding a better suited search engine or
by building and running a new one in a distributed
manner. Ideally, it should support arbitrarily com-
plex Boolean queries, parameterized proximity
operators like NEAR/N (N ∈ {1,2,...,100}), or
even linguistically informed operators like same-
sentence and same-paragraph.
The second level (the textual QA system) can
be improved by acquiring more inferential knowledge
to allow better query expansion. The matching
approach can be extended from the unit sentence
to a larger linguistic unit like paragraph, text,
and even text collection. Distributed architectures
and algorithms can reduce answer times.

References
Sanda Harabagiu, Dan Moldovan, Marius Paşca, Rada
Mihalcea, Mihai Surdeanu, Răzvan Bunescu, Roxana
Gîrju, Vasile Rus, and Paul Morărescu. 2001.
The role of lexico-semantic feedback in open-
domain textual question-answering. In Proceedings
of the 39th Annual Meeting of the Association for
Computational Linguistics (ACL-2001), pages 274–
281, Toulouse, France.
Sven Hartrumpf, Hermann Helbig, and Rainer Oss-
wald. 2003. The semantically based computer lex-
icon HaGenLex – Structure and technological en-
vironment. Traitement automatique des langues,
44(2):81–105.
Sven Hartrumpf. 2003. Hybrid Disambiguation in
Natural Language Analysis. Der Andere Verlag,
Osnabrück, Germany.
Sven Hartrumpf. 2005a. Question answering using
sentence parsing and semantic network matching. In
(Peters et al., 2005), pages 512–521.
Sven Hartrumpf. 2005b. University of Hagen
at QA@CLEF 2005: Extending knowledge and
deepening linguistic processing for question an-
swering. In Carol Peters, editor, Results of the
CLEF 2005 Cross-Language System Evaluation
Campaign, Working Notes for the CLEF 2005 Work-
shop. Centromedia, Wien, Austria.
Hermann Helbig. 2006. Knowledge Representation
and the Semantics of Natural Language. Springer,
Berlin.
Jesús Herrera, Anselmo Peñas, and Felisa Verdejo.
2005. Question answering pilot task at CLEF 2004.
In (Peters et al., 2005), pages 581–590.
Cody Kwok, Oren Etzioni, and Daniel S. Weld. 2001.
Scaling question answering to the web. ACM Trans-
actions on Information Systems, 19(3):242–262.
Bernardo Magnini, Alessandro Vallin, Christelle Ayache,
Gregor Erbach, Anselmo Peñas, Maarten
de Rijke, Paulo Rocha, Kiril Simov, and Richard
Sutcliffe. 2005. Overview of the CLEF 2004 mul-
tilingual question answering track. In (Peters et al.,
2005), pages 371–391.
Günter Neumann and Bogdan Sacaleanu. 2005.
Experiments on robust NL question interpretation
and multi-layered document annotation for a cross-
language question/answering system. In (Peters et
al., 2005), pages 411–422.
Günter Neumann and Feiyu Xu. 2003. Mining answers
in German web pages. In Proceedings of the
International Conference on Web Intelligence (WI-
2003), Halifax, Canada.
Carol Peters, Paul Clough, Julio Gonzalo, Gareth J. F.
Jones, Michael Kluck, and Bernardo Magnini, ed-
itors. 2005. Multilingual Information Access for
Text, Speech and Images: 5th Workshop of the
Cross-Language Evaluation Forum, CLEF 2004,
volume 3491 of LNCS. Springer, Berlin.
José L. Vicedo, Maximiliano Saiz, Rubén Izquierdo,
and Fernando Llopis. 2005. Does English help
question answering in Spanish? In (Peters et al.,
2005), pages 552–556.
