<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1902"> <Title>Multilingual Question Answering with High Portability on Relational Databases</Title> <Section position="1" start_page="0" end_page="6" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> This paper describes a highly portable multilingual question answering system for multiple relational databases. We apply semantic categories and pattern-based grammars to natural language interfaces to relational databases. Lexico-semantic patterns (LSP) and multi-level grammars achieve portability across languages, domains, and DBMSs. The LSP-based linguistic processing does not require deep analysis that sacrifices robustness and flexibility, yet it can handle delicate natural language questions. To maximize portability, we drive the various dependent parts into two tight corners, i.e., the language-dependent part into front-end linguistic analysis, and the domain-dependent and database-dependent parts into back-end SQL query generation.</Paragraph> <Paragraph position="1"> Experiments with 779 queries produce only constraint-missing errors, which can be easily corrected by adding new terms, at rates of 2.25% for English and 5.67% for Korean.</Paragraph> <Paragraph position="2"> Introduction As a natural language (NL) interface, question answering [7] on relational databases, which we also call NLIDB (Natural Language Interface to DataBases), allows users to access information stored in databases through requests in natural language [16], and generates natural language sentences, tables, and graphical representations as output.</Paragraph> <Paragraph position="3"> The NL interface can be combined with other interfaces to databases: a formal query language interface that uses SQL directly, a form-based interface with fields for entering query patterns, and a graphical interface that uses a keyboard and a mouse to access tables. 
The NL interface does not require users to learn a formal query language, easily expresses negation and quantification [4], and supports discourse processing [8].</Paragraph> <Paragraph position="4"> The use of natural language has both advantages and disadvantages. Beyond general NLP problems such as quantifier scoping, PP attachment, anaphora resolution, and elliptical questions, current NLIDB has several shortcomings [1]. First, as a frequent complaint, it is difficult for users to understand which kinds of questions are allowed. Second, users assume that the system is intelligent; they think NLIDB has common sense and can deduce facts. Finally, users do not know whether a failure is caused by limited linguistic coverage or by a conceptual mismatch. Nevertheless, natural language requires neither training in a communication medium nor predefined access patterns.</Paragraph> <Paragraph position="5"> NLIDB systems [2], one of the first applications of natural language processing, have been developed since the 1970s, including &quot;LUNAR&quot; [23]. In the 1980s, research focused on intermediate representation and portability, and attempted to interface with various systems. CHAT-80 [22] transforms an English query into a PROLOG representation, and ASK [20] allows users to teach the system new words and concepts. From the 1990s, commercial systems based on linguistic theories such as GPSG, HPSG, and PATR-II appeared [13], and some systems attempted to construct domain knowledge semi-automatically. MASQUE/SQL [1] uses a semi-automatic domain editor, and LOQUI [3], a commercial system, adopts a GPSG grammar. Meanwhile, Demers introduces a lexicalist approach to natural language to SQL translation [6], and Meng and Chu, in the CoBase project at UCLA, combine information retrieval and a natural language interface [14]. The major problems of the previous systems are as follows. 
First, they do not effectively carry the vocabulary used to describe database attributes over into linguistic processing. Second, they require users to pose a natural language query all at once in a single sentence, rather than offering the flexibility of dialog-based query processing. The mismatch between the attribute vocabulary and the linguistic processing vocabulary makes domain knowledge hard to port because of the knowledge acquisition bottleneck; the systems need extensive effort from experts who are highly experienced in linguistics as well as in the domain and the task.</Paragraph> <Paragraph position="6"> Androutsopoulos [1] [2], the main source for this section, classifies NLIDB approaches into the following four major categories.</Paragraph> <Paragraph position="7"> Pattern matching systems: Some of the early systems exclude linguistic processing. They are easy to implement, but have many critical limitations caused by linguistic shallowness [17]. Syntax-based systems: They syntactically analyze user questions, and use grammars that transform parse trees into SQL queries [23].</Paragraph> <Paragraph position="8"> However, the mapping rules are difficult and tedious to devise, which reduces portability across languages and domains.</Paragraph> <Paragraph position="9"> Semantic grammar systems: These systems interleave syntactic and semantic processing, and generate SQL queries from the result [19] [21]. They are useful for rapidly developing parsers in limited domains, but do not port well to new domains because their semantic information is hard-wired and domain-dependent [18].</Paragraph> <Paragraph position="10"> Intermediate representation language systems: Most current systems place an intermediate logical query between the NL question and SQL [5]. The processes before the intermediate query are defined as the linguistic front-end (LFE), and the other processes as the database back-end (DBE). 
This architecture has the merits that the LFE is DBMS-independent and that an inference module can be placed between the LFE and the DBE. However, the limitations of parsing and semantic analysis require semantic post-processing, and even then it is difficult to achieve high-quality analysis for database applications.</Paragraph> <Paragraph position="11"> In contrast, we apply linguistic processing based on lexico-semantic patterns (LSP), a prominent method verified in text-based question answering [10] [12], to NLIDB, and propose multi-level grammars to represent query structures and translate them into SQL queries. Our system is a hybrid of the pattern matching system and the intermediate representation language system. However, our LSP-based patterns cover matching from the lexical to the semantic level, and the multi-level grammars for intermediate representation clearly separate the database back-end from the linguistic front-end. Thus, our method retains the separation of LFE and DBE, while promising greater adaptability owing to the hybrid linguistic analysis and the pattern-matching characteristics.</Paragraph> <Paragraph position="12"> The LSP-based linguistic processing does not require deep analysis that sacrifices robustness and flexibility, yet it handles delicate NL questions. To maximize portability across languages, domains, and DBMSs, we drive the various dependent parts into two tight corners, i.e., the language-dependent part into front-end linguistic analysis, and the domain-dependent and database-dependent parts into back-end SQL query generation. In our LSP description, attribute vocabularies are also represented as semantic classes that capture the meaning of words. 
Thus, the domain-dependent attributes and values are automatically extracted from databases and registered in a semantic category dictionary.</Paragraph> <Section position="1" start_page="1" end_page="6" type="sub_section"> <SectionTitle> Multi-level Grammars </SectionTitle> <Paragraph position="0"> A lexico-semantic pattern (LSP) is a structure in which linguistic entries and semantic types can be combined to abstract certain sequences of words in a text [12] [15]. Linguistic entries consist of words, phrases and part-of-speech tags, such as &quot;television,&quot; &quot;3D , such as &quot;@model,&quot; &quot;@person,&quot; and &quot;%each.&quot; LSP-based language processing simplifies the natural language interface through the following characteristics. First, linguistic elements ranging from lexical items to semantic categories offer flexibility in representing natural language. Second, simple LSP matching without fragile high-level analyses ensures a robust linguistic model. Third, the use of common semantic types across different languages reduces the burden of cross-linguistic porting, i.e., eases multilingual expansion. Finally, the separation between dictionary and rules makes it easy to enrich domain knowledge while minimizing conflicts among the rules.</Paragraph> <Paragraph position="1"> Multi-level grammars are designed to construct the intermediate representation that serves as the source for SQL query generation. The grammars interpret the lexico-semantic patterns obtained from the linguistic front-end, i.e., morphological analysis, and build attribute-value trees for the database back-end. We introduce three levels of grammars whose rules are described with lexico-semantic patterns: a QT grammar to determine question types, an AV-TYPE grammar to construct attribute-value nodes (see section 2.1), and an AV-OP grammar to find the relations between the nodes (see section 2.2). 
Using the QT grammar, query-to-LSP transfer builds a lexico-semantic pattern from a given question [9]. Lexico-semantic patterns enhance information abstraction through a many-to-one mapping from questions to patterns.</Paragraph> </Section> </Section> </Paper>