MAYA: A Fast Question-answering System Based On A Predictive 
Answer Indexer* 
Harksoo Kim, Kyungsun Kim 
Dept. of Computer Science, 
Sogang University 
1 Sinsu-Dong, Mapo-Gu, Seoul, 
121-742, Korea 
{hskim, kksun}@nlpzodiac.sogang.ac.kr
Gary Geunbae Lee 
Dept. of Computer Science 
and Engineering, 
Pohang University of 
Science and Technology 
San 31, Hyoja-Dong, 
Pohang, 790-784, Korea 
gblee@postech.ac.kr 
Jungyun Seo 
Dept. of Computer Science, 
Sogang University 
1 Sinsu-Dong, Mapo-Gu, 
Seoul, 121-742, Korea 
seojy@ccs.sogang.ac.kr 
(Currently Visiting CSLI Stanford University) 
 
 
 
 
Abstract 
We propose a Korean Question-Answering (QA) system that uses a predictive answer indexer. At indexing time, the predictive answer indexer first extracts all answer candidates in a document. It then assigns scores to the adjacent content words that are closely related to each answer candidate, and stores the weighted content words together with each candidate in a database. Using this technique, along with a complementary analysis of questions, the proposed QA system saves response time because it does not need to extract and score answer candidates at retrieval time. If the QA system is combined with a traditional Information Retrieval (IR) system, it can improve document retrieval precision for closed-class questions with only a minimal loss of retrieval time.
1 Introduction∗ 
Information Retrieval (IR) systems have been applied successfully to large-scale search tasks in which indexing and searching speed is important. Unfortunately, they return a large number of documents that merely contain the indexing terms of a user's query. Hence, the user has to look carefully through the whole text to find the short phrase that precisely answers his/her question.

∗ This research was partly supported by the BK21 program of the Ministry of Education and by the Technology Excellency Program of the Ministry of Information and Telecommunications.
Question-answering (QA), an area of IR, is attracting increasing attention, as shown by the proceedings of AAAI (AAAI, 1999) and TREC (TREC, http://trec.nist.gov/overview.html). A QA system searches a large collection of texts and filters out inadequate phrases or sentences within those texts. Using a QA system, a user can reach the answer phrase promptly, without tedious manual scanning. However, most current QA systems (Ferret et al., 1999; Hull, 1999; Srihari and Li, 1999; Prager et al., 2000) have the following two problems:
• They cannot respond correctly to all of the users' questions; they can only answer questions that fall into pre-defined categories such as person, date, and time.
• They require more indexing or searching time than traditional IR systems do because they need deep linguistic knowledge such as the syntactic or semantic roles of words.
 
To solve these problems, we propose MAYA (MAke Your Answer), a QA system that uses a predictive answer indexer. New categories can be added to MAYA easily, simply by supplementing domain dictionaries and rules, and we do not have to revise MAYA's searching engine because the indexer is designed as a separate component that extracts candidate answers. In addition, a user can obtain answer phrases promptly at retrieval time because MAYA indexes answer candidates in advance.
Most previous approaches in IR have focused on methods for representing terms efficiently, so that large amounts of data can be indexed and searched in a short time (Salton et al., 1983; Salton and McGill, 1983; Salton, 1989). These approaches have been applied successfully to commercial search engines (e.g. http://www.altavista.com) on the World Wide Web (WWW). However, in the sense of information retrieval rather than document retrieval, a user still needs to find an answer phrase within the vast amount of retrieved documents, even though such engines let him/her find the relevant documents promptly. Recently, several QA systems have been proposed to remove this unnecessary answer-finding effort (Ferret et al., 1999; Hull, 1999; Moldovan et al., 1999; Prager et al., 1999; Srihari and Li, 1999).
Recent research has combined the strengths of traditional IR systems and QA systems (Prager et al., 2000; Prager et al., 1999; Srihari and Li, 1999). Most of the combined systems access a huge amount of electronic information using IR techniques and improve precision using QA techniques. In detail, they retrieve a large set of documents relevant to a user's query using well-known TF·IDF weighting. Then, at retrieval time, they extract answer candidates from those documents and filter the candidates using an expected answer type and some rules. Although these systems are based on shallow NLP techniques (Sparck-Jones, 1999), they consume much more retrieval time than traditional IR systems do because of the additional effort described above. To save retrieval time, MAYA extracts answer candidates and computes their scores at indexing time. At retrieval time, it only calculates the similarities between a user's query and the candidates. As a result, it can minimize the retrieval time.
This paper is organized as follows. In Section 2, we review previous work on QA systems. In Section 3, we describe the applied NLP techniques and present our system. In Section 4, we analyze the results of our experiments. Finally, we draw conclusions in Section 5.
2 Previous Works 
Current QA approaches can be classified into two groups: text-snippet extraction systems and noun-phrase extraction systems (also called closed-class QA) (Vicedo and Ferrándex, 2000).
Text-snippet extraction approaches locate and extract the sentences or paragraphs most relevant to the query, assuming that this text will probably contain the correct answer. These approaches were the most commonly used by participants in the last TREC QA Track (Ferret et al., 1999; Hull, 1999; Moldovan et al., 1999; Prager et al., 1999; Srihari and Li, 1999). ExtrAns (Berri et al., 1998) is a representative text-snippet extraction system. It locates the phrases in a document from which a user can infer an answer. However, it is difficult to port the system to other domains because it uses syntactic and semantic information that covers only a very limited domain (Vicedo and Ferrándex, 2000).
Noun-phrase extraction approaches find concrete information, mainly noun phrases, requested by users' closed-class questions. A closed-class question is a question stated in natural language that assumes a definite answer typified by a noun phrase rather than a procedural answer. MURAX (Kupiec, 1993) is one of the noun-phrase extraction systems. MURAX uses modules for shallow linguistic analysis: a Part-Of-Speech (POS) tagger and a finite-state recognizer for matching lexico-syntactic patterns. The finite-state recognizer determines what the user expects and filters out answer hypotheses. For example, the answers to questions beginning with the word Who are likely to be people's names. Some QA systems participating in the Text REtrieval Conference (TREC) use shallow linguistic knowledge and start from approaches similar to MURAX (Hull, 1999; Vicedo and Ferrándex, 2000). These QA systems use specialized shallow parsers to identify the asking point (who, what, when, where, etc.). However, they take a long time to respond because, at retrieval time, they apply rules to each sentence containing answer candidates and score each answer.
MAYA uses shallow linguistic information: a POS tagger, a lexico-syntactic parser similar to the finite-state recognizer in MURAX, and a dictionary-based Named Entity (NE) recognizer. However, MAYA returns answer phrases in a very short time compared with those previous systems because it extracts answer candidates and scores each answer with pre-defined rules at indexing time.
3 MAYA Q/A approach 
MAYA has been designed as a separate component that interfaces with a traditional IR system; in other words, it can run without an IR system. It consists of two engines: an indexing engine and a searching engine.
The indexing engine first extracts all answer candidates from the collected documents. For answer extraction, it uses the dictionary-based NE recognizer and finite-state automata. It then assigns scores to the terms that surround each candidate, and stores each candidate and its surrounding terms with their scores in an index DataBase (DB). For example, if n surrounding terms affect a candidate, n candidate-term pairs are stored in the DB with n scores. As shown in Figure 1, the indexing engine keeps separate index DBs that correspond to pre-defined semantic categories (i.e. users' asking points or question types).
The searching engine identifies the user's asking point and selects the index DB that contains the answer candidates for his/her query. It then calculates similarities between the terms of the query and the terms surrounding the candidates; the similarities are based on the p-Norm model (Salton et al., 1983). Finally, it ranks the candidates according to these similarities.
 
 
Figure 1. A basic architecture of the QA engines 
 
Figure 2 shows the overall architecture of the total system, in which MAYA is combined with a traditional IR system. As shown in Figure 2, the total system has two index DBs: one for the IR system, which retrieves relevant documents, and one for MAYA, which extracts relevant answer phrases.
 
 
Figure 2. The overall architecture of the combined MAYA system
3.1 Predictive Answer indexing 
The answer indexing phase is divided into two stages: answer finding and term scoring. For answer finding, we classify users' asking points into 14 semantic categories: person, country, address, organization, telephone number, email address, homepage Uniform Resource Locator (URL), the number of people, physical number, the number of abstract things, rate, price, date, and time. We believe these 14 semantic categories cover the questions most frequently asked of general IR systems. To extract the answer candidates belonging to each category from documents, the indexing engine uses a POS tagger and an NE recognizer. The NE recognizer makes use of two dictionaries and a pattern matcher. The first dictionary, called the PLO dictionary (487,782 entries), contains the names of people, countries, cities, and organizations. The second, called the unit dictionary (430 entries), contains units of length (e.g. cm, m, km), units of weight (e.g. mg, g, kg), and others. After looking up the dictionaries, the NE recognizer assigns a semantic category to each answer candidate, after disambiguation using POS tagging.
For example, in the Korean sentence translated as "Yahoo Korea (CEO Jinsup Yeom, www.yahoo.co.kr) expanded the size of the storage for free email service to 6 megabytes.", the NE recognizer extracts four answer candidates annotated with four semantic categories: Yahoo Korea belongs to organization, Jinsup Yeom is a person, www.yahoo.co.kr is a homepage URL, and 6 megabytes is a physical number. Complex lexical candidates such as www.yahoo.co.kr are extracted by the pattern matcher, which handles formatted answers such as telephone numbers, email addresses, and homepage URLs. The patterns are described as regular expressions. For example, a homepage URL satisfies one of the following regular expressions:
• ^(http://)[_A-Za-z0-9\-]+(\.[_A-Za-z0-9\-]+)+(/[_~A-Za-z0-9\-\.]+)*$
• ^[0-9]{3}(\.[0-9]{3})(\.[0-9]{2,}){2}(/[_~A-Za-z0-9\-\.]{2,})*$
• ^[0-9]*[_A-Za-z\-]{1,}[_A-Za-z0-9\-]+(\.[_A-Za-z0-9\-]{2,}){2,}(/[_~A-Za-z0-9\-\.]{2,})*$
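
As an illustration of how such patterns can be applied, the short Python sketch below (ours; the paper does not describe the system at code level) compiles the three homepage-URL expressions and tests a token against them:

import re

# The three homepage-URL patterns listed above, compiled once.
URL_PATTERNS = [
    re.compile(r'^(http://)[_A-Za-z0-9\-]+(\.[_A-Za-z0-9\-]+)+(/[_~A-Za-z0-9\-\.]+)*$'),
    re.compile(r'^[0-9]{3}(\.[0-9]{3})(\.[0-9]{2,}){2}(/[_~A-Za-z0-9\-\.]{2,})*$'),
    re.compile(r'^[0-9]*[_A-Za-z\-]{1,}[_A-Za-z0-9\-]+(\.[_A-Za-z0-9\-]{2,}){2,}'
               r'(/[_~A-Za-z0-9\-\.]{2,})*$'),
]

def is_homepage_url(token):
    # A token is treated as a homepage URL if any pattern matches it in full.
    return any(p.match(token) for p in URL_PATTERNS)

print(is_homepage_url("www.yahoo.co.kr"))   # True (third pattern)
print(is_homepage_url("free email"))        # False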
 
In the next stage, the indexing engine gives scores to the content words that occur with answer candidates within a context window. The maximum size of the context window is 3 sentences: the previous sentence, the current sentence, and the next sentence. The window size can be changed dynamically: when the indexing engine decides the window size, it checks whether neighboring sentences contain anaphora or lexical chains.
 
Figure 3. An example with the adjusted window 
size 
 
If the next sentence contains anaphors or lexical chains referring to the current sentence, and the current sentence does not contain anaphors or lexical chains referring to the previous sentence, the indexing engine sets the window size to 2. If neither neighboring sentence contains anaphors or lexical chains, the window size is 1. Figure 3 shows an example in which the window size is adjusted.
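
The window-sizing rule can be summarized in a few lines. The following sketch is our own rendering of it, assuming boolean flags that indicate whether a sentence is linked to its neighbor by anaphors or lexical chains:

def context_window_size(prev_linked, next_linked):
    # prev_linked: the current sentence refers back to the previous sentence
    #              through anaphors or lexical chains.
    # next_linked: the next sentence refers back to the current sentence
    #              through anaphors or lexical chains.
    if prev_linked and next_linked:
        return 3        # previous + current + next sentence (the maximum)
    if prev_linked or next_linked:
        return 2        # current sentence plus the linked neighbor
    return 1            # current sentence only
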
The score of a content word indicates the magnitude of its influence on an answer candidate. For example, when www.yahoo.co.kr is an answer candidate in the Korean sentence translated as "Yahoo Korea (www.yahoo.co.kr) starts a new service.", Yahoo Korea receives a higher score than service because it is a much stronger clue to www.yahoo.co.kr. We call this score a term score. The indexing engine assigns term scores to content words according to the five scoring features described below.
• POS: the part of speech of a content word. The indexing engine gives 2 points to each content word annotated with a proper-noun tag and 1 point to content words annotated with other tags such as noun and number. For example, in the sample sentence above, Yahoo Korea obtains 2 points and service obtains 1 point.
• Grammatical Role: the grammatical relation of the word among the subcategorized functions of the main verb in a sentence. The indexing engine gives 4 points to a topic word, 3 points to a subject, 2 points to an object, and 1 point to the rest. The grammatical roles can be decided by case markers such as un/nun, i/ga, and ul/lul, since Korean is a language with well-developed morphemic markers. For example, in the sample sentence, Yahoo Korea obtains 3 points because it is the subject, and service obtains 2 points because it is the object.
• Lexical Chain: words that re-occur in adjacent sentences. The indexing engine gives 2 points to each word that forms a lexical chain and 1 point to the others. For example, if the sentence following the sample sentence is the one translated as "The members of the service can use the free storage of 6 megabytes for email.", then service obtains 2 points.
• Distance: the distance between the sentence containing a target content word and the sentence containing the answer candidate. The indexing engine gives 2 points to each content word in the sentence containing the answer candidate and 1 point to the others. For example, Yahoo Korea and service in the sample sentence each obtain 2 points because they occur in the sentence containing the answer candidate www.yahoo.co.kr.
• Apposition: the IS-A relation between a content word and an answer candidate. The indexing engine extracts appositive terms using syntactic information such as explicit IS-A relations, pre-modification, and post-modification. For example, Yahoo Korea is in a pre-modification relation with www.yahoo.co.kr in the sample sentence. The indexing engine gives 2 points to each appositive word and 1 point to the others.
 
The indexing engine adds up the scores of the 5 
features, as shown in Equation 1. 
 
ts_i = (A·f_i1 + B·f_i2 + C·f_i3 + D·f_i4 + E·f_i5) / (A + B + C + D + E)    (1)
 
ts_i is the term score of the ith term, and f_ij is the score of the jth feature for the ith term. A, B, C, D, and E are weighting factors that rank the five features according to preference; the indexing engine uses the ranking order E > C > B > A > D. The weighted term scores are normalized as shown in Equation 2.
 
ts_ij = (0.5 + 0.5 · ts_ij / Max_ts_j) · log(N / n) / log N,   if ts_ij > 0
ts_ij = 0,                                                     if ts_ij = 0    (2)
 
Equation 2 is similar to the TF·IDF formula (Fox, 1983). In Equation 2, ts_ij is the term score of the ith term in the context window relevant to the jth answer candidate, and Max_ts_j is the maximum term score in that context window. n is the number of answer candidates affected by the ith term, and N is the number of answer candidates in the same semantic category. The indexing engine saves the normalized term scores, together with the position information of the relevant answer candidate, in the DB. The position information consists of a document number and the distance between the beginning of the document and the answer candidate. As a result, the indexing engine creates 14 DBs corresponding to the 14 semantic categories. We call them answer DBs.
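
The two equations are small enough to state directly in code. The sketch below is ours: the weighting factors are placeholders chosen only to respect the stated order E > C > B > A > D, and we assume N > 1.

import math

# Placeholder weighting factors respecting the preference order E > C > B > A > D.
A, B, C, D, E = 2.0, 3.0, 4.0, 1.0, 5.0

def raw_term_score(f1, f2, f3, f4, f5):
    # Equation 1: weighted average of the five feature points of a term.
    return (A * f1 + B * f2 + C * f3 + D * f4 + E * f5) / (A + B + C + D + E)

def normalized_term_score(ts, max_ts, n, N):
    # Equation 2: TF.IDF-style normalization of a raw term score.
    #   ts     raw score of the term for this answer candidate
    #   max_ts maximum raw score in the candidate's context window
    #   n      number of answer candidates affected by this term
    #   N      number of answer candidates in the same semantic category (N > 1)
    if ts == 0:
        return 0.0
    return (0.5 + 0.5 * ts / max_ts) * math.log(N / n) / math.log(N)
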
3.2 Lexico-syntactic Query processing 
In the query processing stage, the searching engine takes a user's question and converts it into a suitable form using a semantic dictionary called the query dictionary. The query dictionary contains the semantic markers of words, and query words are converted into semantic markers before pattern matching. For example, the Korean query translated as "Who is the CEO of Yahoo Korea?" is converted into a sequence of lexical forms, semantic markers, and POS tags whose English gloss is "%who auxiliary-verb %person preposition Yahoo Korea symbol"; here %person and %who are the semantic markers. Content words that are not in the query dictionary keep their lexical forms, and functional words (e.g. auxiliary verbs, prepositions) are converted into POS tags. After the conversion, the searching engine matches the converted query against one of 88 lexico-syntactic patterns and classifies the query into one of the 14 semantic categories. When two or more patterns match the query, the searching engine returns the first matched category.
 
%person (xsn)* (j)? %who .* $
%person (xsn)* (j)? %name (j) (%what)? .* $
%person (xsn)* (j)? (%name)? %want_to_know .* $
%which %person .* $
Figure 4. Lexico-syntactic patterns
 
Figure 4 shows some of the lexico-syntactic patterns for the person category. The sample query above matches the first pattern in Figure 4.
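
A minimal sketch of this matching step, using the person-category patterns glossed in Figure 4 (the regular expressions, the helper, and the converted query string below are our transcription, not the system's actual grammar):

import re

# Person-category patterns transcribed from the glosses in Figure 4.
# They are searched rather than anchored at the start, mirroring the ".* $" tails.
PERSON_PATTERNS = [
    re.compile(r'%person\s+(xsn\s+)*(j\s+)?%who\b'),
    re.compile(r'%person\s+(xsn\s+)*(j\s+)?%name\s+(j\s+)?(%what\s+)?'),
    re.compile(r'%person\s+(xsn\s+)*(j\s+)?(%name\s+)?%want_to_know\b'),
    re.compile(r'%which\s+%person\b'),
]

def classify(converted_query):
    # Return "person" for the first matching pattern, None otherwise.
    for pattern in PERSON_PATTERNS:
        if pattern.search(converted_query):
            return "person"
    return None

# The converted sample query (in Korean word order, our rendering) matches the first pattern.
print(classify("yahoo korea j %person j %who jp ef sf"))   # person
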
After classifying the query into a semantic category, the searching engine calculates the term scores of the content words in the query. As shown in Rule 1, the term scores are computed by heuristic rules, and they range between 0 and 1. Using the heuristic rules, the searching engine gives high scores to the content words that carry the user's intention. For example, when a user inputs the Korean query translated as "In what year was Yahoo founded?", he/she wants to know only the year, rather than the organizer or the URL of Yahoo. So, unlike an ordinary IR searching engine, the QA searching engine gives a higher score to year than to Yahoo.
 
1. The last content word in a sentence receives a high score. For example, CEO in "The CEO of Yahoo?" receives a high score.
2. The content word immediately following a specific interrogative such as which or what receives a high score. For example, mountain in "Which mountain is the highest?" receives a high score.
3. The content word following a specific preposition such as about receives a low score, and the content word preceding it receives a high score. For example, the score of article in "the article about China" is lower than that of China.
Rule 1. Heuristic rules for scoring query terms
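
The sketch below is a loose illustration of the three rules; the neutral default of 0.5 and the concrete high/low values are our assumptions (the paper only states that scores lie between 0 and 1), and the tokens are assumed to appear in Korean word order.

INTERROGATIVES = {"which", "what"}    # Rule 2 triggers (English stand-ins)
LOW_PREPOSITIONS = {"about"}          # Rule 3 triggers (English stand-in)

def score_query_terms(tokens):
    # tokens: query tokens in Korean word order, e.g. ["which", "mountain", "highest"].
    scores = {tok: 0.5 for tok in tokens}            # neutral default (assumption)
    scores[tokens[-1]] = 1.0                         # Rule 1: last content word
    for i, tok in enumerate(tokens):
        if tok in INTERROGATIVES and i + 1 < len(tokens):
            scores[tokens[i + 1]] = 1.0              # Rule 2: word after an interrogative
        if tok in LOW_PREPOSITIONS:
            if i + 1 < len(tokens):
                scores[tokens[i + 1]] = 0.1          # Rule 3: word after "about" scores low
            if i > 0:
                scores[tokens[i - 1]] = 1.0          # Rule 3: word before "about" scores high
    return scores

# "Which mountain is the highest?": "mountain" receives a high score.
print(score_query_terms(["which", "mountain", "highest"]))
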
3.3 Answer scoring and ranking 
The searching engine calculates the similarities between the query and the answer candidates, and ranks the answer candidates according to these similarities. To compute the similarities, the searching engine uses the AND operation of the well-known p-Norm model (Salton et al., 1983), as shown in Equation 3.
 
Sim_and(Q, A) = 1 - [ (q_1^p (1 - a_1)^p + q_2^p (1 - a_2)^p + ... + q_n^p (1 - a_n)^p) / (q_1^p + q_2^p + ... + q_n^p) ]^(1/p)    (3)
 
In Equation 3, A is an answer candidate, and a_i is the ith term score in the context window of the answer candidate; a_i is stored in the answer DB. q_i is the ith term score in the query, and p is the P-value of the p-Norm model.
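
Equation 3 translates almost directly into code; the sketch below assumes the query term scores q_i and the stored candidate term scores a_i have already been aligned term by term.

def pnorm_and_similarity(q, a, p=2.0):
    # AND-operator similarity of the p-Norm model (Equation 3).
    #   q: query term scores q_1..q_n
    #   a: stored term scores a_1..a_n for one answer candidate
    #   p: the P-value of the p-Norm model
    numerator = sum((qi ** p) * ((1.0 - ai) ** p) for qi, ai in zip(q, a))
    denominator = sum(qi ** p for qi in q)
    return 1.0 - (numerator / denominator) ** (1.0 / p)

# A candidate whose stored terms fully match the query scores 1.0.
print(pnorm_and_similarity([1.0, 0.5], [1.0, 1.0]))   # 1.0
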
The answer scoring and ranking phase takes a relatively short time because the indexing engine has already calculated the scores of the terms that affect the answer candidates. In other words, the searching engine simply adds up the weights of co-occurring terms, as shown in Equation 3, and then ranks the answer candidates according to the similarities. The method for answer scoring is similar to the document-scoring method of traditional IR engines; however, MAYA differs in that it indexes, retrieves, and ranks answer candidates rather than documents.
We can easily combine MAYA with a traditional IR system because MAYA has been designed as a separate component that interfaces with the IR system. We implemented an IR system based on TF·IDF weighting and the p-Norm model (Lee et al., 1999).
To improve the precision rate of the IR system, we combine MAYA with the IR system. The total system merges the outputs of MAYA with the outputs of the IR system. MAYA can produce multiple similarity values per document when two or more answer candidates occur within a document, whereas the IR system produces a single similarity value per document. Therefore, the total system combines the similarity value of the IR system with the maximum similarity value of MAYA, as shown in Equation 4.
 
Sim(D, Q) = (α · IRsim(D, Q) + β · QAsim_d(A_i, Q)) / (α + β)    (4)
 
In Equation 4, QAsim_d(A_i, Q) is the similarity value between query Q and the ith answer candidate A_i in document d, and IRsim(D, Q) is the similarity value between query Q and document D. α and β are weighting factors; we set α to 0.3 and β to 0.7.
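
Because a document may contain several answer candidates, only the best QA similarity enters the combination. A short rendering of Equation 4 with the weighting factors reported above (the helper itself is our illustration):

ALPHA, BETA = 0.3, 0.7      # weighting factors used in the paper

def combined_similarity(ir_sim, qa_sims):
    # Equation 4: merge the IR document score with the best QA candidate score.
    #   ir_sim:  IRsim(D, Q) for the document
    #   qa_sims: QAsim_d(A_i, Q) for all answer candidates found in the document
    best_qa = max(qa_sims) if qa_sims else 0.0     # no candidate -> 0 (our assumption)
    return (ALPHA * ir_sim + BETA * best_qa) / (ALPHA + BETA)
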
The total system ranks the retrieved 
documents by using the combined similarity 
values, and shows the sentences including 
answer candidates in the documents. 
4 Evaluation 
4.1 The experiment data 
To evaluate MAYA, we collected 14,321 documents (65,752 kilobytes) from two web sites: korea.internet.com (6,452 documents) and www.sogang.ac.kr (7,869 documents). The former provides its members with on-line articles on Information Technology (IT); the latter is the homepage of Sogang University. The indexing engine created the 14 answer DBs (one per semantic category).
For the test data, we collected 50 question-answer pairs from 10 graduate students. Table 1 shows the 14 semantic categories and the number of collected question-answer pairs in each category. As shown in Table 1, 2 of the question-answer pairs fall outside the 14 semantic categories; they are not closed-class question-answers but explanation-seeking question-answers, such as "Question: How can I search the on-line Loyola library for any books? Answer: Connect your computer to http://loyola1.sogang.ac.kr".
 
Category    person     country    address      organization
# of QAs    9          3          3            9
Category    telephone  email      URL          people num.
# of QAs    3          5          4            0
Category    phy. num.  abs. num.  rate         price
# of QAs    1          1          0            4
Category    date       time       out of cat.  total
# of QAs    5          1          2            50
Table 1. The number of collected question-answer pairs in each category
 
We use two evaluation schemes. To evaluate MAYA itself, we compute the performance score as the Reciprocal Answer Rank (RAR) of the first correct answer to each question. To compute the overall performance, we use the Mean Reciprocal Answer Rank (MRAR), as shown in Equation 5 (Voorhees and Tice, 1999).
 




= ∑
i
iranknMRAR /1/1
 (5) 
 
For the total system that combines MAYA with the IR system, we use the Reciprocal Document Rank (RDR) and the Mean Reciprocal Document Rank (MRDR). RDR is the reciprocal rank of the first document containing the correct answer to each question.
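
Both MRAR and MRDR are means of reciprocal ranks over the 50 questions. A small sketch of the computation (treating a failure as contributing 0, which is our reading of the failure rows in Tables 2 and 3):

def mean_reciprocal_rank(ranks):
    # ranks: for each question, the rank of the first correct answer (MRAR)
    #        or of the first correct document (MRDR); None marks a failure.
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)

# Example: two questions answered at ranks 1 and 2, one failure.
print(mean_reciprocal_rank([1, 2, None]))   # (1 + 0.5 + 0) / 3 = 0.5
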
4.2 Analysis of experiment results 
The performance of MAYA is shown in Table 2. We obtained the correct answer for 33 of the 50 questions at Top 1.
 
Rank            Top 1   Top 2   Top 3   Top 4   Top 5   Top 6~   Failure   Total (MRAR)
# of answers    33      4       3       2       1       2        5         50 (0.80)
Table 2. The performance of the QA system
Table 3 shows the performance of the total system. As shown in Table 3, the total system significantly improves the document retrieval performance of the underlying IR system on closed-class questions.
The average retrieval time of the IR system is 0.022 seconds per query, and that of the total system is 0.029 seconds per query. The difference in retrieval time between the IR system and the total system is small, which means that the overhead of the QA component is negligible. The IR system shows the user some sentences that contain query terms, whereas the total system shows the sentences that contain answer candidates. This function saves the user the trouble of reading through a whole document to find the answer phrase.
 
Rank              Top 1   Top 2   Top 3   Top 4   Top 5   Top 6~   Failure   Total (MRDR)
# of answers 1    22      8       5       2       3       10       0         50 (0.54)
# of answers 2    36      5       2       1       2       4        0         50 (0.76)
# of answers 1: the number of answers ranked at top n by the IR system
# of answers 2: the number of answers ranked at top n by the total system
Table 3. The performance of the total system
 
MAYA could not extract the correct answers to certain questions in this experiment. The failure cases are the following, and all of them can be easily solved by extending the resources and pattern rules:
• The lexico-syntactic parser failed to classify some users' queries into the predefined semantic categories. We think that most of these failed queries can be dealt with by supplementing additional lexico-syntactic grammars.
• The NE recognizer failed to extract some answer candidates. To resolve this problem, we should supplement the entries in the PLO dictionary, the entries in the unit dictionary, and the regular expressions. We should also work to improve the precision of the NE recognizer.
5 Conclusion 
We presented a fast and high-precision Korean QA system that uses a predictive answer indexer. The predictive answer indexer extracts answer candidates and the terms surrounding them at indexing time, and stores each candidate with the surrounding terms and their scores in the answer DBs. At retrieval time, the QA system only calculates the similarities between a user's query and the answer candidates. Therefore, it can minimize the retrieval time and enhance precision. Our system can easily be ported to other domains because it is based on shallow NLP and IR techniques such as POS tagging, NE recognition, pattern matching, and term weighting with TF·IDF. The experimental results show that the QA system can improve document retrieval precision for closed-class questions with an insignificant loss of retrieval time when it is combined with a traditional IR system. In the future, we plan to concentrate on resolving the semantic ambiguity that arises when a user's query matches two or more lexico-syntactic patterns. We are also working on an automatic and dynamic way of extending the semantic categories so that users' queries can be categorized more flexibly.

References 
AAAI Fall Symposium on Question Answering. 
1999. 
Berri, J., Molla, D., and Hess, M. 1998. Extraction 
automatique de réponses: implémentations du 
systéme ExtrAns. In Proceedings of the fifth 
conference TALN 1998,  pp. 10-12. 
Ferret, O., Grau, B., Illouz, G., and Jacquemin, C. 1999. QALC – the Question-Answering program of the Language and Cognition group at LIMSI-CNRS. In Proceedings of The Eighth Text REtrieval Conference (TREC-8), http://trec.nist.gov/pubs/trec8/t8_proceedings.html.
Fox, E.A. 1983. Extending the Boolean and Vector 
Space Models of Information Retrieval with P-
norm Queries and Multiple Concept Types, Ph.D. 
Thesis, CS, Cornell University. 
Hull, D.A. 1999. Xerox TREC-8 Question 
Answering Track Report. In Proceedings of The 
Eighth Text REtrieval Conference (TREC-8), 
http://trec.nist.gov/pubs/trec8/t8_proceedings.html. 
Kupiec, J. 1993. Murax: A Robust Linguistic 
Approach for Question Answering Using an On- 
line Encyclopedia. In Proceedings of SIGIR’93. 
Lee, G., Park, M., and Won, H. 1999. Using syntactic 
information in handling natural language queries 
for extended boolean retrieval model. In 
Proceedings of the 4th international workshop on 
information retrieval with Asian languages 
(IRAL99), pp. 63-70. 
Moldovan, D., Harabagiu, S., Pasca, M., Mihalcea, 
R., Goodrum, R., Gîrju, R., and Rus, V. 1999. 
LASSO: A Tool for Surfing the Answer Net. In 
Proceedings of The Eighth Text REtrieval 
Conference (TREC-8), http://trec.nist.gov/pubs/trec8/t8_proceedings.html.
Prager, J., Brown, E., Coden A., and Radev D. 2000. 
Question-Answering by Predictive Annotation. In 
Proceedings of SIGIR 2000, pp. 184-191. 
Prager, J., Radev, D., Brown, E., and Coden, A. 1999. 
The Use of Predictive Annotation for Question 
Answering in TREC8. In Proceedings of The 
Eighth Text REtrieval Conference (TREC-8), 
http://trec.nist.gov/pubs/trec8/t8_proceedings.html. 
Salton, G., Fox, E.A., and Wu, H. 1983. Extended 
Boolean Information Retrieval, Communication of 
the ACM, 26(12):1022-1036. 
Salton, G., and McGill, M. 1983. Introduction to 
Modern Information Retrieval (Computer Series), 
New York:McGraw-Hill. 
Salton, G. 1989. Automatic Text Processing: The 
Transformation, Analysis and Retrieval of 
Information by Computer. Reading, MA:Addison-
Wesley. 
TREC (Text REtrieval Conference) Overview, 
http://trec.nist.gov/overview.html. 
Sparck-Jones, K. 1999. What is the role of NLP in Text Retrieval? In T. Strzalkowski (ed.), Natural Language Information Retrieval, Kluwer Academic Publishers, pp. 1-24.
Srihari, R., and Li, W. 1999. Information Extraction 
Supported Question Answering. In Proceedings of 
The Eighth Text REtrieval Conference (TREC-8), 
http://trec.nist.gov/pubs/trec8/t8_proceedings.html. 
Vicedo, J. L., and Ferrándex, A. 2000. Importance of 
Pronominal Anaphora resolution in Question 
Answering systems. In Proceedings of ACL 2000, 
pp. 555-562. 
Voorhees, E., and Tice, D. M. 1999. The TREC-8 
Question Answering Track Evaluation. In 
Proceedings of The Eighth Text REtrieval 
Conference (TREC-8), http://trec.nist.gov/pubs/trec8/t8_proceedings.html.
