<?xml version="1.0" standalone="yes"?> <Paper uid="C80-1082"> <Title>USING A NATURAL-ARTIFICIAL HYBRID LANGUAGE FOR DATABASE ACCESS</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> USING A NATURAL-ARTIFICIAL HYBRID LANGUAGE FOR DATABASE ACCESS Teruaki AIZAWA and Nobuko HATADA NHK Technical Research Laboratories </SectionTitle> <Paragraph position="0"> In this paper we propose a natural-artificial hybrid language for database access. The global construction of a sentence in this language is highly schematic, but allows expressions in the chosen natural language such as Japanese or English. Its artificial language part, SML, is closely related to our newly introduced data model, called the scaled lattice. Adopting Japanese as its natural language part, we implemented a Japanese-SML hybrid language processing system for our compact database system SCLAMS, whose database consists of scaled lattices. The main features of this implementation are (1) a small lexicon and limited grammar, and (2) an almost free form in writing Kana Japanese.</Paragraph> </Section> <Section position="2" start_page="0" end_page="546" type="metho"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Various query languages for database access have been developed, among which unambiguous artificial ones are better adapted to computers. For man, on the other hand, it would be more convenient to communicate with computers in a natural language. The possibility of man-machine communication in a natural language has been one of the main concerns in the field of artificial intelligence, and considerable results have been obtained specifically in research into natural language access to a database.
1~5 These results, however, seem to be too complex and inflexible for practical application to general-purpose database systems.</Paragraph> <Paragraph position="1"> We propose in this paper a &quot;natural-artificial hybrid&quot; language for database access. The global construction of a sentence in this language is highly schematic but allows expressions in the chosen natural language such as Japanese or English. A Japanese version of this language has been implemented for our compact database system SCLAMS 6 (SCaled LAttice Manipulation System). The main features of this implementation are: (1) Use of only a small lexicon and limited grammar, so that they are quite easy to implement, and (2) Allowance of almost free form in writing Kana Japanese.</Paragraph> <Paragraph position="2"> Feature (1), which will also be achieved when using other languages like English, French, and so on, is one of the most noticeable merits obtained by using such a natural-artificial hybrid language for database access.</Paragraph> <Paragraph position="3"> We begin with an explanation of our basic logical unit of data, the Scaled Lattice, or S.L. for short, since the proposed language is closely related to this unit.</Paragraph> <Paragraph position="4"> 2. SML: Scaled lattice manipulation language</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Scaled lattice as a data model </SectionTitle> <Paragraph position="0"> What the normalization theory in the relational data model tells us can be stated very loosely as &quot;one fact in one place&quot;. 8 The concept of the Scaled Lattice also goes in this direction.</Paragraph> <Paragraph position="1"> Roughly speaking, an S.L. is a multi-dimensional table, and is defined as a collection of data of one species arranged at multi-dimensional lattice points corresponding to the combinations of attribute values. Fig. 1 shows a graphical image of an S.L.
which represents population data by year, prefecture, and sex.</Paragraph> <Paragraph position="3"> All of the male population data are arranged on this axis.</Paragraph> <Paragraph position="4"> Fig. 1 Graphical image of S.L. data model. This is an example of a three-dimensional S.L., which can furthermore be regarded as a mapping, or a function of three variables in the mathematical sense. Let S1, S2, and S3 be finite sets such as</Paragraph> <Paragraph position="6"> Also let A be an appropriate set having enough elements to represent values of population. Then the above S.L. can be naturally regarded as a mapping: F : S1 × S2 × S3 → A, (1) which associates any triple (x, y, z) of attribute values in S1 × S2 × S3 with the corresponding population value F(x, y, z). Thus, for example, F(1980, Tokyo, male) denotes the male population of Tokyo in 1980.</Paragraph> <Paragraph position="7"> Generally an S.L. is a mapping F of the direct product of finite sets S1, ..., Sn into an appropriate set A denoted by</Paragraph> <Paragraph position="9"> These sets S1, ..., Sn and their elements will sometimes be called root words and leaf words respectively.</Paragraph> <Paragraph position="10"> The following are the advantages of this data model: (1) Data contained in an S.L. can be displayed exactly in the two-dimensional table form, which is visually very understandable.</Paragraph> <Paragraph position="11"> (2) In order to display data in table form, it is necessary to cut out an appropriate two-dimensional cross section from the S.L., or more precisely to select two appropriate scales on which the table is constructed, and, at the same time, to fix the remaining scales at some attribute values.</Paragraph> <Paragraph position="12"> This is nothing but a retrieval operation. Cutting out such a section is very easy, which means that certain retrieval operations are also easy.</Paragraph> <Paragraph position="13"> (3) Since an S.L.
is regarded as a mapping, precise and powerful notations concerning &quot;sets and mappings&quot; are directly applicable to manipulation of the S.L. data.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Brief outline of SCLAMS </SectionTitle> <Paragraph position="0"> We have implemented a compact database system SCLAMS (Scaled lattice manipulation system), whose database consists of S.L.'s. 6, 7 SCLAMS has the following three major modes: (1) Storage mode: Storage of data, edited from any file, into the database as a set of S.L.'s.</Paragraph> <Paragraph position="1"> (2) Retrieval mode: Selection of one or more suitable S.L.'s from the database.</Paragraph> <Paragraph position="2"> (3) Manipulation mode: Data extraction from the above S.L.'s and some operation on the data.</Paragraph> <Paragraph position="3"> Thus, a retrieval operation according to a user's query is divided into two modes: Retrieval and Manipulation. Retrieval mode is similar to a document retrieval system, and Manipulation mode to a database system in a narrow sense, regarding each S.L. as a small file. The main concern in our design of SCLAMS was to combine these two modes effectively, in other words, to integrate the function of document retrieval systems and that of database systems.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 Manipulation of scaled lattices by SML </SectionTitle> <Paragraph position="0"> In this paper we will focus our attention exclusively on Manipulation mode of SCLAMS. The major function of this mode is to manipulate S.L.'s in a variety of ways such as extraction of data satisfying specified conditions, join of data from two or more S.L.'s, elementary calculations on extracted data, etc.
These operations are done through a query language for end users, named SML (Scaled lattice Manipulation Language).</Paragraph> <Paragraph position="1"> We now show a few examples to illustrate some aspects of SML. Let F1 and F2 be two S.L.'s, i.e. two mappings such as</Paragraph> <Paragraph position="3"> scribers.</Paragraph> <Paragraph position="4"> These S.L.'s may be considered as an output of Retrieval mode.</Paragraph> <Paragraph position="5"> Each example below consists of an informal query and the corresponding formal one expressed in SML. Notice that the SML expressions contain mathematical notations to describe sets and mappings.</Paragraph> <Paragraph position="6"> Example 1. List the male population of Tokyo in 1980.</Paragraph> <Paragraph position="8"> Example 2. List names and the number of prefectures in which the male population in 1980 is greater than one million.</Paragraph> <Paragraph position="10"> In this example B is defined as the set of prefectures X with the population value F1(1980, X, male) &gt; 1,000,000, and C as COUNT of B, where COUNT is one of the aggregate functions prepared in SCLAMS.</Paragraph> <Paragraph position="11"> Example 3. List the numbers of TV subscribers in 1980 of the prefectures in which the female population in 1975 is less than one million.</Paragraph> <Paragraph position="13"> In this example the two S.L.'s F1 and F2 are related by a common scale S2.</Paragraph> <Paragraph position="14"> The general format of a query or a sentence in SML is shown in Fig.
2 General format of a query by SML. In this format each of the variables a1, ..., am is equal to one of b1, ..., bn; and the order of b1, ..., bn is arbitrary.</Paragraph> <Paragraph position="17"> The types of expressions can be classified into the following six categories: 1) Numerical or literal constants; e.g.</Paragraph> <Paragraph position="18"> 1980, Tokyo, male, etc.</Paragraph> <Paragraph position="19"> 2) Aggregate function values; e.g.</Paragraph> <Paragraph position="20"> COUNT (x), SUM (y), etc.</Paragraph> <Paragraph position="21"> 3) S.L. values; e.g.</Paragraph> <Paragraph position="22"> F(x1, ..., xn), etc.</Paragraph> <Paragraph position="23"> 4) Set operation formulas; e.g.</Paragraph> <Paragraph position="24"> x &amp; y, x | y, x - y, etc.</Paragraph> <Paragraph position="25"> 5) Set definition formulas; e.g.</Paragraph> <Paragraph position="26"> &lt;3, 5, 7, 11&gt;, &lt;Tokyo, Nagoya, Osaka&gt;, &lt;xi : F(x1, ..., xi, ..., xn) &lt; y&gt;, etc. 6) Abbreviated notations for elements of a scale, i.e. leaf words; e.g.</Paragraph> <Paragraph position="27"> S.1, S.11-20, etc.</Paragraph> <Paragraph position="28"> * The latter, for example, represents the 11th to 20th elements of a scale S.</Paragraph> <Paragraph position="29"> It is easily seen from the above explanation that a query in SML is expressed basically as a set of &quot;nonprocedural&quot; local queries, and thus the query as a whole is also of a non-procedural nature.</Paragraph> <Paragraph position="30"> 3. Hybridization of SML with a natural language</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 An illustrative example </SectionTitle> <Paragraph position="0"> We have confirmed that our query language SML is sufficiently flexible and has strong expressive power, specifically for those who are familiar with mathematical notations concerning sets and mappings.
However, we can also say that SML is less convenient than a natural language, which seems to be best suited for casual users. We therefore tried to hybridize SML with a natural language like English, Japanese, etc., believing that such a natural-artificial hybrid language should be one of the milestones toward the realization of database systems wholly accessible via unrestricted natural languages.</Paragraph> <Paragraph position="1"> The next example, closely related to Example 2 in the last section, will show us how to hybridize SML with a natural language, say English.</Paragraph> <Paragraph position="2"> Example 4. List names and the number of prefectures in which the male population in 1980 is less than the female population of Tokyo in 1970.</Paragraph> <Paragraph position="3"> Now we consider the following two types of expressions for this query.</Paragraph> <Paragraph position="4"> Type I (Original formal expression by</Paragraph> <Paragraph position="6"> are: (1) The global construction is quite similar to that of the Type I expression, but it allows us to write phrases in the chosen natural language for definitions of variables such as A, B, and C. (If necessary, some of the variables may retain the original formal definitions.) (2) Notice that variable symbols such as A and C can be embedded in ordinary English phrases, so that the original query expressed as a complex sentence is divided into some simple queries. This contributes to readability of queries both for man and computer.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Features of a Japanese-SML version </SectionTitle> <Paragraph position="0"> We have implemented a &quot;Japanese-SML&quot; hybrid language processing system, as an extension of SCLAMS. The major design goal was to be practical rather than just ambitious.
The processing system, which will be called Translator, is essentially a translator of a Japanese phrase into the corresponding SML expression, or in the above terminology, of a Type II expression into its Type I equivalent. The main process of Translator is shown in Fig. 3.</Paragraph> <Paragraph position="1"> Some considerations in achieving practicability of the implemented system are: (1) In our implementation a Japanese sentence or phrase can be written as a string of only Kana characters, in which case it is desirable, for convenience, to guarantee freedom from segmentation as much as possible. Our system indeed allows the free writing of a Kana sentence, as long as the leaf words (the elements of scales) cause no confusion with the reserved words in the lexicon.</Paragraph> <Paragraph position="2"> (2) It is desirable to keep the grammar as compact as possible to save storage space and processing time.</Paragraph> <Paragraph position="3"> This was done by restricting the forms of possible Type II expressions.</Paragraph> <Paragraph position="4"> 4. Translation of Japanese into SML</Paragraph> </Section> <Section position="6" start_page="0" end_page="546" type="sub_section"> <SectionTitle> 4.1 Micro-grammar for Japanese </SectionTitle> <Paragraph position="0"> As mentioned in Section 2.3, the set of all Type I expressions is classified into six categories 1)~6). The possible Type II expressions which our Translator can accept are restricted to those corresponding to categories 2), 3), and a part of 5), i.e. the so-called implicit set definitions. It should be noticed that expressions belonging to the other categories are more neatly expressed in Type I form.</Paragraph> <Paragraph position="1"> We now show the lexicon and the grammatical rules prescribing these Type II expressions.</Paragraph> <Paragraph position="2"> Lexical items and their categories. There are 12 categories of lexical items.
1) Num : Numbers, e.g.</Paragraph> <Paragraph position="3"> 12, 165.3, -0.137, etc.</Paragraph> <Paragraph position="4"> 2) Naux : Auxiliary numbers, e.g.</Paragraph> <Paragraph position="5"> hyaku, byaku, pyaku, sen, man (hundred, thousand, ten thousand), etc.</Paragraph> <Paragraph position="6"> 3) Agg : Names of aggregate functions, 6) Comp2 : Particle for comparison, i.e. yori, yorimo (than).</Paragraph> <Paragraph position="7"> 7) adj : Adjectives, e.g.</Paragraph> <Paragraph position="8"> ookii, hayai, shouno, daino (large, early, small, wide), etc.</Paragraph> <Paragraph position="9"> 8)* Root : Root words, i.e. names of scales, e.g.</Paragraph> <Paragraph position="10"> nen, ken (year, prefecture), etc.</Paragraph> <Paragraph position="11"> 9)* Leaf : Leaf words, i.e. elements of scales, e.g.</Paragraph> <Paragraph position="12"> 1980, Tokyo, otoko (male), etc.</Paragraph> <Paragraph position="13"> 10)* Unit : Words for data units, e.g. en, nin, km (Yen, person, kilometer), etc.</Paragraph> <Paragraph position="14"> 11)* SL : Names of S.L.'s representing the sort of the S.L. data, usually given at Storage mode, e.g.</Paragraph> <Paragraph position="15"> jinko, TV keiyakusha (population, TV subscriber), etc.</Paragraph> <Paragraph position="16"> 12)** Var : Variable names such as A, B, KEN, etc.</Paragraph> <Paragraph position="17"> The items in the categories marked by one asterisk are automatically added to the lexicon at the beginning of Manipulation mode in order to cover those S.L.'s which are passed from Retrieval mode, and deleted after use. They are thus highly application-oriented. The lexicon would become very large if it included the items in the Leaf category. We tried to exclude them from our lexicon by devising a method of recognizing them from context, so that the lexicon contains only about 100 application-independent items plus application-oriented ones.
The Var category marked by two asterisks was also excluded from our lexicon, since the formation rules of this category are well defined and easily programmed.</Paragraph> <Paragraph position="18"> Grammatical rules. It was sufficient to prepare merely a dozen grammatical rules expressed as context-free-like productions with conditions of application. An example of a parse tree produced by this grammar is given in Fig. 4. We assume that the 'jinko' S.L. is of dimension three.</Paragraph> </Section> <Section position="7" start_page="546" end_page="546" type="sub_section"> <SectionTitle> 4.2 Translation into SML </SectionTitle> <Paragraph position="0"> Translation from Type II expressions in Japanese into Type I expressions in 'pure' SML is performed by using two fundamental tools: a word-for-word conversion table and a conversion procedure.</Paragraph> <Paragraph position="1"> Word-for-word conversion table.</Paragraph> <Paragraph position="2"> This is prepared for the following five categories of lexical items: Agg, Comp1, adj, Root*, SL*.</Paragraph> <Paragraph position="3"> For the asterisked categories the table is made up whenever Manipulation mode is invoked. A portion of the conversion table is shown in Table 1.</Paragraph> <Paragraph position="4"> Conversion procedure. Since the proposed grammar is so compact, we considered that the conversion procedure including syntax analysis would be best realized through a general-purpose programming language, say PL/I, rather than a comprehensive grammar-writing system like ATN. 9) This will also contribute to the portability of the system.</Paragraph> <Paragraph position="5"> The programming considerations were: (1) To ensure free writing of a Japanese Kana phrase, we adopted left-to-right parsing, predicting the succeeding category.
However, since the lexicon does not include the leaf words, we had to impose the restriction that any leaf word should be enclosed by spaces or apostrophes.</Paragraph> <Paragraph position="6"> (2) An SML expression is generated, by introducing a new variable symbol in the form 'SYS**', whenever a partial result of parsing becomes sufficient to do so. (This point can be best illustrated by the example given below.) (3) Two important steps in a parsing flow are the decisions: a) Which of the initial productions can be applied: S → R, S → D, or S → V? b) Which phrase actually appears, R or R?</Paragraph> </Section> <Section position="8" start_page="546" end_page="546" type="sub_section"> <SectionTitle> 4.3 An example </SectionTitle> <Paragraph position="0"> We now return to Example 4 in Section 3.1. That query will be written in Type II form in Japanese as follows. (We adopt here a real notation of our</Paragraph> <Paragraph position="2"/> </Section> </Section> </Paper>