Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions, pages 29–32,
Sydney, July 2006. ©2006 Association for Computational Linguistics
K-QARD: A Practical Korean Question Answering Framework for
Restricted Domain
Young-In Song, HooJung Chung,
Kyoung-Soo Han, JooYoung Lee,
Hae-Chang Rim
Dept. of Computer Science & Engineering
Korea University
Seongbuk-gu, Seoul 136-701, Korea
{song, hjchung, kshan, jylee,
rim}@nlp.korea.ac.kr
Jae-Won Lee
Computing Lab.
Samsung Advanced Institute of Technology
Nongseo-ri, Giheung-eup,
Yongin-si, Gyeonggi-do 449-712, Korea
jwonlee@samsung.com
Abstract
We present a Korean question answer-
ing framework for restricted domains,
called K-QARD. K-QARD is developed to
achieve domain portability and robustness,
and the framework is successfully applied
to build question answering systems for
several domains.
1 Introduction
K-QARD is a framework for implementing a fully
automated question answering system, including
Web information extraction (IE). The goal of
the framework is to provide a practical environment
for building restricted-domain question answering
(QA) systems that meets the following requirements:
• Domain portability: Domain adaptation of
  QA systems based on the framework should
  be possible with minimal human effort.
• Robustness: The framework has to provide
  methodologies to ensure robust performance
  for various expressions of a question.
For domain portability, K-QARD is designed
as a domain-independent architecture that
keeps all domain-dependent elements in external
resources. In addition, the framework employs
various techniques for reducing human effort,
such as simplifying rules with linguistic
information and applying machine learning approaches.
Our effort toward robustness is focused on
question analysis. Instead of attempting a deep
understanding of the question, the question
analysis component of K-QARD extracts only the
information essential for answering, using
information extraction techniques with linguistic
information. Such an approach helps not only
robustness but also domain portability, because it
generally requires a smaller set of hand-crafted
rules than a complex semantic grammar.
[Figure 1: Architecture of K-QARD. Domain-independent
framework: Question Analysis, Web Information Extraction,
Answer Finding, Answer Generation, Database. Inputs and
outputs: NL Question, Web Page, NL Answer. Domain-dependent
external resources: semantic frames, TE/TR rules, domain
ontology, training examples, answer frames.]
K-QARD answers questions using structured information
that is automatically extracted from Web pages
containing domain-specific information. This has the
disadvantage that the coverage of the QA system is
limited, but it simplifies the question answering
process and yields robust performance.
2 Architecture of K-QARD
As shown in Figure 1, K-QARD has four major
components: Web information extraction, ques-
tion analysis, answer finding, and answer gener-
ation.
The Web information extraction (IE) component
extracts the domain-specific information for
question answering from Web pages and stores
it in a relational database. For domain
portability, the Web IE component is based on an
automatic wrapper induction approach, which can be
learned from a small number of training examples.
The question analysis component analyzes an
input question, extracts important information us-
ing the IE approach, and matches the question with
pre-defined semantic frames. The component out-
puts the best-matched frame whose slots are filled
with the information extracted from the question.
In the answer finding component, K-QARD re-
trieves the answers from the database using the
SQL generation script defined in each semantic
frame. The SQL script dynamically generates
SQL using the values of the frame slots.
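As a rough illustration of this step, the following sketch generates a parameterized query from the filled slots of a frame. It is not the SQL generation script of K-QARD: the `programs` table, its columns, the `FOCUS_COLUMN` mapping, and the slot values are assumptions made for the example; only the frame and focus names follow Figure 2.

```python
import sqlite3

# Hypothetical filled frame; slot names double as column names of an
# assumed "programs" table, and an unfilled slot is None.
frame = {
    "frame": "PROGRAM_QUESTION",
    "focus": "QF_program",
    "slots": {"channel": "NBC", "begin_date": "2006-07-17", "begin_time": None},
}

# Assumed mapping from a question focus to the column that answers it.
FOCUS_COLUMN = {"QF_program": "title", "QF_channel": "channel"}


def generate_sql(frame):
    """Build a parameterized SELECT from the filled slots of a frame."""
    column = FOCUS_COLUMN[frame["focus"]]
    conditions, params = [], []
    for slot, value in frame["slots"].items():
        if value is not None:  # unfilled slots add no constraint
            # Slot names come from the frame definition, not from the user;
            # user-supplied values are always passed as parameters.
            conditions.append(f"{slot} = ?")
            params.append(value)
    sql = f"SELECT {column} FROM programs"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


def find_answers(conn, frame):
    """Run the generated query and return the answer values."""
    sql, params = generate_sql(frame)
    return [row[0] for row in conn.execute(sql, params)]


# Toy database standing in for the output of the Web IE component.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE programs (title TEXT, channel TEXT, begin_date TEXT, begin_time TEXT)"
)
conn.executemany(
    "INSERT INTO programs VALUES (?, ?, ?, ?)",
    [
        ("Movie Night", "NBC", "2006-07-17", "21:00"),
        ("Evening News", "NBC", "2006-07-18", "19:00"),
        ("Late Drama", "MBC", "2006-07-17", "22:00"),
    ],
)
answers = find_answers(conn, frame)
```

Because each filled slot contributes one condition and empty slots contribute none, the same frame definition can serve questions that mention different subsets of the slots.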
The answer generation component provides the
answer to the user as a natural language sentence
or a table by using the generation rules and the
answer frames which consist of canned texts.
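A minimal sketch of canned-text generation is given below. The template strings, the `ANSWER_FRAMES` layout, and the rule that several answers are rendered as a table are assumptions for illustration, not the generation rules used in K-QARD.

```python
# Hypothetical answer frames: canned texts keyed by question focus.
ANSWER_FRAMES = {
    "QF_program": {
        "single": "{title} will be shown on {channel}.",
        "empty": "No matching program was found on {channel}.",
    },
}


def generate_answer(focus, slots, answers):
    """Render answers as a sentence (zero or one answer) or a text table."""
    templates = ANSWER_FRAMES[focus]
    if not answers:
        return templates["empty"].format(**slots)
    if len(answers) == 1:
        return templates["single"].format(title=answers[0], **slots)
    # Assumed rule: several answers are listed in a one-column table.
    header = "Title"
    width = max(len(header), *(len(a) for a in answers))
    lines = [header.ljust(width), "-" * width]
    lines += [a.ljust(width) for a in answers]
    return "\n".join(lines)


sentence = generate_answer("QF_program", {"channel": "NBC"}, ["Movie Night"])
table = generate_answer("QF_program", {"channel": "NBC"}, ["A", "Long Title"])
```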
3 Question Analysis
The key component for ensuring robustness
and domain portability is question analysis,
because it naturally requires many domain-dependent
resources and is responsible for handling
the many ways in which a question can be
expressed. In K-QARD, a question is analyzed
using methods derived from the information
extraction approach. This IE-based question
analysis method consists of several steps:
1. Natural language analysis: Analyzing the
syntactic structure of the user’s question and
identifying named entities and some important
words, such as domain-specific predicates
or terms.
2. Question focus recognition: Finding the
intention of the user’s question using the
question focus classifier, which is learned from
training examples with a decision tree
(C4.5) (Quinlan, 1993).
3. Template Element (TE) recognition: Finding
the important concepts for filling the slots
of the semantic frame, namely template
elements, using rules, NE information, the
ontology, etc.
4. Template Relation (TR) recognition: Finding
the relations between TEs and the question
focus based on TR rules, syntactic
information, etc.
Finally, the question analysis component selects
the proper frame for the question and fills each
slot of the selected frame with the proper value.
Compared to other question analysis methods
such as the complex semantic grammar (Martin et
al., 1996), our approach has several advantages.
First, it performs robustly across variations
of a question, because the IE-based approach does not
require understanding the entire sentence. It
is sufficient to identify and process only the
important concepts. Second, it also enhances the
portability of QA systems. The method is based on
a divide-and-conquer strategy and uses only limited
context information. By virtue of these
characteristics, question analysis can be carried out
with a small number of simple rules.
In the following subsections, we will describe
each component of our question analyzer in K-
QARD.
3.1 Natural language analysis
The natural language analyzer in K-QARD identifies
morphemes, assigns part-of-speech tags to them,
and analyzes the dependency relations between the
morphemes. A stochastic part-of-speech tagger
and a dependency parser (Chung and Rim, 2004) for
the Korean language, both trained on a general-domain
corpus, are used for the analyzer. Then,
domain-specific named entities, such as a
TV program name, and general named entities,
such as a date, are recognized in the question using
our dictionary- and pattern-based named entity
tagger (Lee et al., 2004). Finally, some important
words, such as domain-specific predicates,
terminology, or interrogatives, are replaced by the
proper concept names in the ontology. The manually
constructed ontology includes two different
types of information: domain-specific and
general-domain words.
The role of this analyzer is to analyze the user’s
question and transform it into a more generalized
representation. With this generalized linguistic
information, the tasks of question focus
recognition and TE/TR recognition can be
simplified without decreasing the performance
of the question analyzer.
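The effect of this generalization can be sketched as follows. English tokens stand in for Korean morphemes, and the dictionary and ontology entries are invented for the example; only the label names (NE_Channel, NE_Time, C_what, C_prog, C_play) follow Figure 2.

```python
# Hypothetical named entity dictionary and ontology (lower-cased keys).
NE_DICTIONARY = {"today": "NE_Date", "nbc": "NE_Channel", "tonight": "NE_Time"}
ONTOLOGY = {
    "what": "C_what", "which": "C_what",
    "movie": "C_prog", "program": "C_prog",
    "played": "C_play", "shown": "C_play",
}


def generalize(tokens):
    """Attach an NE tag or ontology concept name to each token, if any."""
    generalized = []
    for token in tokens:
        key = token.lower()
        label = NE_DICTIONARY.get(key) or ONTOLOGY.get(key)
        generalized.append((token, label))
    return generalized


def labels(tokens):
    """Return only the labels, dropping tokens that received none."""
    return [label for _, label in generalize(tokens) if label is not None]


q1 = "What movie will be played on NBC tonight".split()
q2 = "Which program will be shown on NBC tonight".split()
same_representation = labels(q1) == labels(q2)
```

Two differently worded questions reduce to the same label sequence, so later rules can be written over labels rather than surface words.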
One possible drawback of using such linguistic
information is a loss of robustness caused
by errors in the NLP components. However,
our IE-based approach to question analysis uses
only very restricted, essential contextual
information in each step, which limits this risk.
An example of the analysis performed by this
component is shown in Figure 2-(1).

[Figure 2 content; Korean words of the sample question are given by their English glosses]

Question: (today) (on NBC) (at night) (program) (play) (what)
          ("What movie will be played on NBC tonight?" in English)

(1) Natural Language Analysis
    (today)/NE_Date   "NBC" (on NBC)/NE_Channel   (at night)/NE_Time
    (what)/C_what   (program)/C_prog   (play)/C_play

(2) Question Focus Recognition
    Tagged words as in (1), with the question focus region marked
    Question focus: QF_program

(3) TE Recognition
    (today): TE_BEGINDATE   (on NBC): TE_CHANNEL   (at night): TE_BEGINTime
    Question focus: QF_program

(4) TR Recognition
    TE_BEGINDATE: REL_OK   TE_BEGINTime: REL_OK   TE_CHANNEL: REL_OK

Translation of Semantic Frame
    FRM: PROGRAM_QUESTION
    Question focus: QF_program
    Begin Date: "Today"
    Begin Time: "Night"
    Channel: "NBC"

‘NE_*’ denotes that the corresponding word is a named entity of type *.
‘C_*’ denotes that the corresponding word belongs to the concept C_* in the ontology.
‘TE_*’ denotes that the corresponding word is a template element whose type is *.
‘REL_OK’ indicates that the corresponding TE and the question focus are related.
Figure 2: Example of Question Analysis Process in K-QARD
3.2 Question focus recognition
We define a question focus as the type of information
that a user wants to know. For example, in
the question “What movies will be shown on TV
tonight?”, the question focus is a program title, or
titles. As another example, the question focus is
the current rainfall in the question “San Francisco is
raining now, is it raining in Los Angeles too?”.
To find the question focus, we define the question
focus region, a part of a question that may contain
clues for deciding the question focus. The question
focus region is identified with a set of simple
rules that consider the characteristics of Korean
interrogatives. Generally, the question focus
region has a fixed pattern that is typically used in
interrogative questions (Akiba et al., 2002). Thus
a small number of simple rules is enough to cover
most question focus region patterns. Figure
2-(2) shows the part recognized as the question
focus region in the sample question.
After recognizing the region, the actual focus of
the question is determined from features extracted
from the question focus region. For this step,
we build the question focus classifier using a
decision tree (C4.5) and several linguistic or
domain-specific features, such as the kind of
interrogative and the concept name of the predicate.
Dividing the focus recognition process into two
parts helps to increase domain portability. While
the second part of deciding the actual question fo-
cus is domain-dependent because every domain-
application has its own set of question foci, the
first part that recognizes the question focus region
is domain-independent.
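The domain-dependent second part can be illustrated with a small decision tree learner over categorical features. The sketch below is a simplified ID3-style learner that splits on plain information gain; it is a stand-in for C4.5, which additionally uses gain ratio, pruning, and continuous attributes. The feature names, the training examples, and the foci QF_time, QF_channel, and QF_cast are invented for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_feature(rows, labels, features):
    """Pick the feature with the highest information gain."""
    base = entropy(labels)

    def gain(feature):
        remainder = 0.0
        for value in {row[feature] for row in rows}:
            subset = [l for row, l in zip(rows, labels) if row[feature] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder

    return max(features, key=gain)


def build_tree(rows, labels, features):
    """Recursively build a tree; leaves are class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    feature = best_feature(rows, labels, features)
    rest = [f for f in features if f != feature]
    branches = {}
    for value in {row[feature] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[feature] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], rest
        )
    return {"feature": feature, "branches": branches, "default": majority}


def classify(tree, row):
    """Follow branches until a leaf; unseen values fall back to the majority."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree


# Hypothetical features extracted from question focus regions.
train_rows = [
    {"interrogative": "C_what", "predicate": "C_play"},
    {"interrogative": "C_when", "predicate": "C_play"},
    {"interrogative": "C_where", "predicate": "C_play"},
    {"interrogative": "C_who", "predicate": "C_appear"},
    {"interrogative": "C_when", "predicate": "C_start"},
    {"interrogative": "C_what", "predicate": "C_start"},
]
train_labels = ["QF_program", "QF_time", "QF_channel", "QF_cast", "QF_time", "QF_program"]

tree = build_tree(train_rows, train_labels, ["interrogative", "predicate"])
focus = classify(tree, {"interrogative": "C_when", "predicate": "C_play"})
```

Porting to a new domain would, under this sketch, only require new training rows and labels; the learner and the feature extraction from the focus region stay unchanged.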
3.3 TE recognition
In the TE identification phase, pre-defined words,
phrases, and named entities are identified as
slot-filler candidates for the appropriate slots, according to
TE tagging rules. For instance, movie and NBC
are tagged as Genre and Channel in the sample
question “Tell me the movie on NBC tonight.” (i.e.,
movie will be used to fill the Genre slot and NBC
will be used to fill the Channel slot in a semantic
frame). The hand-crafted TE tagging rules basically
consider the surface form and the concept
name (derived from the domain ontologies) of a target
word. The context surrounding the target word or
word dependency information is also considered
in some cases. In the example question of Figure
2, the date expression meaning ‘today’, the time
expression meaning ‘night’, and the channel name ‘MBC’
are selected as TE candidates.
In K-QARD, this identification is accomplished
by a set of simple rules that examine only
the semantic type of each constituent word
in the question, except for the words in the question
focus region. This follows from our divide-and-conquer
strategy motivated by IE. The result of
this component may include some wrong template
elements, which have no relation to the
user’s intention or the question focus. However,
they are expected to be removed by the next
component, the TR recognizer, which examines the
relation between each recognized TE and the question
focus.
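A minimal sketch of such type-based rules is shown below. The rule table maps a named entity type to a TE type and is an assumption for illustration (the TE type names follow Figure 2); real TE tagging rules may also consult surface forms, context, and dependency information.

```python
# Hypothetical TE tagging rules: semantic type of a word -> TE type.
TE_RULES = {
    "NE_Date": "TE_BEGINDATE",
    "NE_Time": "TE_BEGINTime",
    "NE_Channel": "TE_CHANNEL",
}


def recognize_tes(tokens, focus_region):
    """Tag TE candidates.

    tokens: list of (word, label) pairs from natural language analysis.
    focus_region: set of token indices inside the question focus region,
    which the TE rules skip.
    """
    tes = []
    for index, (word, label) in enumerate(tokens):
        if index in focus_region:
            continue
        te_type = TE_RULES.get(label)
        if te_type is not None:
            tes.append({"index": index, "word": word, "type": te_type})
    return tes


# English glosses of the sample question in Figure 2.
tokens = [
    ("today", "NE_Date"), ("NBC", "NE_Channel"), ("night", "NE_Time"),
    ("what", "C_what"), ("program", "C_prog"), ("play", "C_play"),
]
candidates = recognize_tes(tokens, focus_region={3, 4, 5})
```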
[Figure 3: Broadcast-domain QA System using K-QARD. Panels:
(1) Broadcast-domain QA system; (2) Answer for the sample question
“What soap opera will be played on MBC tonight?”]
3.4 TR recognition
In the TR recognition phase, all entities identified
in the TE recognition phase are examined to determine
whether they have any relationship with the question
focus region of the question. For example, in the
question “Is it raining in Los Angeles like in San
Francisco?”, both Los Angeles and San Francisco
are identified as TEs. However, TR recognition
identifies only Los Angeles as an entity related
to the question focus region.
Selectional restrictions and dependency relations
between TEs are the main evidence considered in TR tagging
rules. Thus, the TR rules can be kept quite simple.
For example, many relations between the TEs and
the question focus region can be identified simply by
examining whether there is a syntactic dependency
between them, as shown in Figure 2-(4). Moreover,
to make up for errors in dependency parsing,
lexico-semantic patterns are also encoded in the
TR tagging rules.
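The dependency test can be sketched as follows for the Los Angeles example. The token list and dependency arcs are hand-made assumptions rather than parser output, TE_LOCATION is a hypothetical TE type, and the lexico-semantic fallback patterns are not shown.

```python
def recognize_trs(tes, heads, focus_region):
    """Keep TEs that have a direct dependency with the question focus region.

    heads: dict mapping a token index to the index of its syntactic head.
    focus_region: set of token indices in the question focus region.
    """
    related = []
    for te in tes:
        i = te["index"]
        depends_on_focus = heads.get(i) in focus_region
        governs_focus = any(heads.get(j) == i for j in focus_region)
        if depends_on_focus or governs_focus:
            related.append({**te, "relation": "REL_OK"})
    return related


# Simplified tokens: 0 "raining", 1 "Los Angeles", 2 "like", 3 "San Francisco".
# Assumed arcs: "Los Angeles" and "like" attach to "raining",
# while "San Francisco" attaches to "like".
heads = {1: 0, 2: 0, 3: 2}
tes = [
    {"index": 1, "word": "Los Angeles", "type": "TE_LOCATION"},
    {"index": 3, "word": "San Francisco", "type": "TE_LOCATION"},
]
related = recognize_trs(tes, heads, focus_region={0})
```

Under these assumed arcs, San Francisco is discarded because it is connected to the focus region only indirectly, through the comparison word.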
4 Application of K-QARD
To evaluate the K-QARD framework, we built
restricted-domain question answering systems for
several domains: weather, broadcast, and traffic.
To adapt the QA system to each domain,
we rewrote the domain ontology consisting
of about 150 concepts, about 30 TE/TR rules, and
7-23 semantic frames and answer templates. In
addition, we trained the question focus classifier
on about 100 training questions for
each domain. All information for question
answering was automatically extracted using
the Web IE module of K-QARD, which was also
trained on examples consisting of several
annotated Web pages of the target Web site. It
took about half a week for two graduate students
who clearly understood the framework to
build each QA system. Figure 3 shows an example
of a QA system applied to the broadcast domain.
5 Conclusion
In this paper, we described K-QARD, a Korean question
answering framework for restricted domains.
The framework is designed to enhance robustness
and domain portability. To achieve this goal, we use
an IE-based question analyzer that relies on the
generalized information acquired by several NLP components.
We also showed the usability of K-QARD by
successfully applying the framework to several
domains.
References
T. Akiba, K. Itou, A. Fujii, and T. Ishikawa. 2002.
Towards speech-driven question answering: Experiments
using the NTCIR-3 question answering collection.
In Proceedings of the Third NTCIR Workshop.
H. Chung and H. Rim. 2004. Unlexicalized dependency
parser for variable word order languages
based on local contextual pattern. Lecture Notes in
Computer Science, (2945):112–123.
J. Lee, Y. Song, S. Kim, H. Chung, and H. Rim. 2004.
Title recognition using lexical pattern and entity
dictionary. In Proceedings of the 1st Asia Information
Retrieval Symposium (AIRS2004), pages 345–348.
P. Martin, F. Crabbe, S. Adams, E. Baatz, and
N. Yankelovich. 1996. SpeechActs: A spoken language
framework. IEEE Computer, 29(7):33–40.
J. Ross Quinlan. 1993. C4.5: Programs for Machine
Learning. Morgan Kaufmann Publishers Inc., San
Francisco, CA, USA.
