<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-2110">
  <Title>Design Tool Combining Keyword Analyzer and Case-based Parser for Developing Natural Language Database Interfaces</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 CAPIT Flow
</SectionTitle>
    <Paragraph position="0"> We have been collecting Japanese corpora which untrained users typed at computer terminals in order to access on-line databases. We found that a large part of the corpora are simple data retrievals from databases, like &amp;quot;Pass me salt&amp;quot;. Many sentences have simple grammatical or extra-grammatical structures.</Paragraph>
    <Paragraph position="1"> Complex linguistic patterns are very rare. One extreme example is just a sequence of keywords like, &amp;quot;Dynamic Memory author&amp;quot;, instead of asking &amp;quot;Who is the author of the book titled Dynamic Memory?&amp;quot;.</Paragraph>
    <Paragraph position="2"> We hypothesized that the processing mechanism for such simple expressions is different from the processing mechanism for grammatical expressions. The two-parsing-module structure of CAPIT reflects this hypothesis. Figure-1 describes the flow of CAPIT. First, the application designer who develops a NL interface using CAPIT collects the corpora of users' queries in the target domain. The queries of the collected corpora are given to CAPIT one by one. The case-based parser (CBP) tries to interpret the sentence (Step-1). If CBP finds a fully matched linguistic pattern in its case base, the corresponding concept is output as the meaning for the input sentence (Step-2). If CBP cannot find any matching pattern, the NL query is passed to the keyword-based parsing module (KBP).</Paragraph>
    <Paragraph position="3"> If CBP finds a pattern in its case base which matches a part of the query, CBP replaces the matched part of the NL query with the corresponding concept, then passes the modified NL query to KBP (Step-3).</Paragraph>
    <Paragraph position="4"> KBP extracts only keywords from the query, and constructs its meaning (Step-4). KBP always constructs the meaning for a given sentence.</Paragraph>
    <Paragraph position="5"> The meaning generated by CBP and/or KBP is shown to the application designer. The application designer judges whether or not the interpretation is correct (Step-5). If it is correct, the examination using this NL query finishes, and the next NL query is taken from the corpora for the next examination. If it is not correct, the Pattern Definition Interviewer module (PDI) is activated. PDI asks the application designer for the correct interpretation of the NL query. He/she defines linguistic patterns and/or semantic concepts and/or the mappings between linguistic patterns and semantic concepts for the NL query (Step-6). The new definition is stored in KBP's knowledge base and/or CBP's case base. The next time CAPIT encounters the same query or similar queries, it succeeds in interpreting them correctly.</Paragraph>
    <Paragraph position="6"> After a number of such examinations, CBP's case base becomes rich, and the NL interface application can be released.</Paragraph>
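    <Paragraph> The dispatch between CBP and KBP in Step-1 to Step-4 can be sketched as follows. This is only an illustration under stated assumptions: the case base is modeled as a plain dictionary from pattern strings to concepts, matching is reduced to string equality and substring search, and the names cbp_match and interpret are hypothetical, not CAPIT's own.</Paragraph>
    <Paragraph><![CDATA[
```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in for CBP's case base: linguistic pattern -> concept.
CaseBase = Dict[str, str]


def cbp_match(query: str, case_base: CaseBase) -> Tuple[str, Optional[str]]:
    """Return ('full', concept), ('partial', rewritten query) or ('none', None)."""
    if query in case_base:                      # fully matched pattern (Step-2)
        return "full", case_base[query]
    for pattern, concept in case_base.items():  # pattern matches a part (Step-3)
        if pattern in query:
            return "partial", query.replace(pattern, concept)
    return "none", None


def interpret(query: str, case_base: CaseBase, kbp: Callable[[str], str]) -> str:
    """Try CBP first (Step-1); hand over to KBP when CBP has no full match."""
    status, result = cbp_match(query, case_base)
    if status == "full":
        return result            # the concept is the meaning of the query
    if status == "partial":
        return kbp(result)       # KBP receives the partially rewritten query
    return kbp(query)            # KBP always constructs a meaning (Step-4)
```
]]></Paragraph>
    <Paragraph> Steps 5 and 6, where the designer accepts or repairs the interpretation, are interactive and are left out of this sketch.</Paragraph>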
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 KBP Mechanism
</SectionTitle>
    <Paragraph position="0"> This section describes the KBP mechanism, using a simple example. Table-1 shows a simple example of a CAPIT target database. Linguistic patterns are attached as indices which refer to specific fields and to the values of specific fields of records in the table. For example, the indices to the &amp;quot;Title&amp;quot; field are &amp;quot;book&amp;quot;, &amp;quot;title&amp;quot;, &amp;quot;book name&amp;quot;, &amp;quot;named&amp;quot;, etc. We call an index to a field name a field-name index. An index attached to the value of a field of a record is called a field-value index. For example, &amp;quot;the father of AI&amp;quot; is a field-value index to &amp;quot;Minsky&amp;quot;, which is the value of the &amp;quot;Author&amp;quot; field in a specific record. The value of each field of each record is itself a field-value index. For example, &amp;quot;1983&amp;quot; is a field-value index to the value of the &amp;quot;Date&amp;quot; field in a record. Field-name indices and field-value indices are stored in KBP's knowledge base.</Paragraph>
    <Paragraph position="1"> KBP always regards the meaning of a given NL query as an imperative, &amp;quot;Select records in a table which satisfy specific conditions, and return the value of the requested fields from the selected records&amp;quot;. The imperative is represented in SQL: SELECT field-k, field-l, ... FROM target table WHERE field-i = value-i, field-j = value-j, ... ;</Paragraph>
    <Paragraph position="2"> The KBP algorithm to generate the SQL expression from a NL query is as follows: 1. KBP extracts only field-name indices and field-value indices from a given NL query. The rest of the NL query is discarded.</Paragraph>
    <Paragraph position="3"> 2. When a field-name index is extracted, the field name it refers to is kept as a SELECT-clause element. 3. When a field-value index is extracted, the field value it refers to and the name of the field holding that value are kept as a WHERE-clause element, in the form of (field name = field value).</Paragraph>
    <Paragraph position="4"> 4. After all extracted indices are processed, all SELECT-clause elements and WHERE-clause elements are merged. Then, they are assigned to a SELECT-FROM-WHERE structure.</Paragraph>
    <Paragraph position="5"> Next, we explain this algorithm, using a NL query example.</Paragraph>
    <Paragraph position="6"> [Actes de COLING-92, Nantes, 23-28 août 1992, 737; Proc. of COLING-92, Nantes, Aug. 23-28, 1992] S1: &amp;quot;Show me the books published by S&amp;S&amp;quot;. KBP extracts only &amp;quot;book&amp;quot;, &amp;quot;published&amp;quot; and &amp;quot;S&amp;S&amp;quot; from S1. &amp;quot;Book&amp;quot; is a field-name index to the &amp;quot;Title&amp;quot; field. &amp;quot;Published&amp;quot; is a field-name index to the &amp;quot;Publisher&amp;quot; field. Since &amp;quot;S&amp;S&amp;quot; is a field-value index to the value of the &amp;quot;Publisher&amp;quot; field, the WHERE-clause element (Publisher = S&amp;S) is kept. From these indices, the following SQL command is generated: SELECT Title, Publisher FROM target table WHERE Publisher = S&amp;S; The SQL command is evaluated, and its answer is returned. The answer is &amp;quot;Society of Mind&amp;quot; and &amp;quot;S&amp;S&amp;quot;. They are the reply to the above query.</Paragraph>
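    <Paragraph> The four algorithm steps and the S1 example can be illustrated with the minimal sketch below. It rests on several assumptions that are not in the paper: the index tables list only entries mentioned in the text, keyword extraction is reduced to case-insensitive substring search, the table name target_table is a placeholder, and WHERE conditions are joined with AND so that the output is valid SQL. The function name kbp_to_sql is hypothetical, and the heuristic rules of the actual KBP are not modeled.</Paragraph>
    <Paragraph><![CDATA[
```python
from typing import Dict, List, Tuple

# Hypothetical field-name indices: linguistic pattern -> field name.
FIELD_NAME_INDEX: Dict[str, str] = {
    "book": "Title",
    "title": "Title",
    "book name": "Title",
    "named": "Title",
    "published": "Publisher",
    "publisher": "Publisher",
    "author": "Author",
}

# Hypothetical field-value indices: linguistic pattern -> (field name, value).
FIELD_VALUE_INDEX: Dict[str, Tuple[str, str]] = {
    "s&s": ("Publisher", "S&S"),
    "society of mind": ("Title", "Society of Mind"),
    "dynamic memory": ("Title", "Dynamic Memory"),
    "the father of ai": ("Author", "Minsky"),
}


def kbp_to_sql(query: str, table: str = "target_table") -> str:
    """Build a SELECT-FROM-WHERE string from the keywords found in the query."""
    text = query.lower()  # Step 1: only indexed keywords are used below

    # Step 2: each field-name index contributes a SELECT-clause element,
    # kept in order of first appearance and without duplicates.
    hits = sorted((text.find(key), field)
                  for key, field in FIELD_NAME_INDEX.items() if key in text)
    select_fields: List[str] = []
    for _, field in hits:
        if field not in select_fields:
            select_fields.append(field)

    # Step 3: each field-value index contributes a WHERE-clause element.
    where_terms = [f"{field} = '{value}'"
                   for key, (field, value) in FIELD_VALUE_INDEX.items()
                   if key in text]

    # Step 4: merge everything into one SELECT-FROM-WHERE structure.
    # Falling back to '*' when no field is requested is an assumption.
    sql = f"SELECT {', '.join(select_fields) or '*'} FROM {table}"
    if where_terms:
        sql += " WHERE " + " AND ".join(where_terms)
    return sql + ";"
```
]]></Paragraph>
    <Paragraph> With these tables, kbp_to_sql applied to the S1 sentence selects both Title and Publisher with the condition on Publisher, which mirrors the unfiltered answer described above.</Paragraph>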
    <Paragraph position="7"> The actual KBP has several heuristic rules to select SELECT-clause elements and WHERE-clause elements. For example, the right answer to S1 is just &amp;quot;Society of Mind&amp;quot;. &amp;quot;S&amp;S&amp;quot; must not be produced. With the actual KBP, a heuristic rule suppresses the production of &amp;quot;S&amp;S&amp;quot; in the above example.</Paragraph>
    <Paragraph position="8"> Though the actual KBP is more complex than this simple explanation, it is still very simple [2]. Since KBP constructs a query meaning from only the keywords in a NL query, it can treat extra-grammatical expressions, keyword sequences and linguistic fragments in the same way as ordinary natural language queries. For example, even the following strange queries on Table-1 are accepted by KBP: &amp;quot;Publishers?&amp;quot;, &amp;quot;Dynamic Memory author&amp;quot;, &amp;quot;When the book named Society of Mind appear?&amp;quot;, &amp;quot;Society of Mind, how much&amp;quot;, etc.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 The Role of CBP
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 The Situations KBP Fails to Interpret
</SectionTitle>
      <Paragraph position="0"> KBP can handle the majority of queries, which are simple data retrievals. So, in what kind of situations does KBP fail to interpret? CBP processes only those queries which KBP fails to interpret. The application designer must define the pattern-concept pairs which CBP uses to interpret such queries. Therefore, we have to know the limitations of KBP's interpretation capability. The following are KBP's typical failure cases.</Paragraph>
      <Paragraph position="1"> Failure-1 Cases where an application designer forgot to define necessary patterns as indices: If a necessary linguistic pattern is not defined as either a field-name index or a field-value index, KBP cannot interpret the NL queries concerned correctly.</Paragraph>
      <Paragraph position="2"> Failure-2 Cases where a NL query includes idiomatic expressions or spatial expressions: KBP cannot generate correct meanings if idiomatic expressions like &amp;quot;greater than 10ft&amp;quot;, or spatial expressions like &amp;quot;the switch between A and B&amp;quot;, are included in a NL query.</Paragraph>
      <Paragraph position="3"> Failure-3 Cases where the meaning of a NL query is not represented in the form of SELECT-FROM-WHERE: KBP assumes that any NL query is translated into a SELECT-FROM-WHERE structure. If a NL query has a different SQL structure, like SELECT-FROM-GROUP BY-HAVING, KBP cannot generate a correct meaning. For example, a NL query like &amp;quot;Select author and its amount which is bigger than 1000&amp;quot; is represented with the SELECT-FROM-GROUP BY-HAVING structure.</Paragraph>
      <Paragraph position="4"> Failure-4 Cases the meaning for a NL query can not be represented in SQL language: If a NL query is a meta-level question for the target database, like &amp;quot;What kind of information can I get from this?&amp;quot;, KBP can not interpret it.</Paragraph>
      <Paragraph position="5"> Failure-5 Cases where KBP generates many candidate interpretations of a NL query: Since KBP generates the meaning of a NL query using only the keywords in the query, it sometimes generates not only a correct meaning but also wrong meanings. For example, KBP generates several different meanings from the following query: &amp;quot;Show me the publisher of the book titled L.A.&amp;quot;.</Paragraph>
      <Paragraph position="6"> When KBP encounters these failures, the application designer must repair them by enriching and modifying KBP's knowledge base and/or CBP's case base. Such a failure-repair mechanism is analogous to those of case-based reasoning [6] [8].</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Repairs of KBP's Failures
</SectionTitle>
      <Paragraph position="0"> There are four types of repair for KBP's failures.</Paragraph>
      <Paragraph position="1"> Three of the four are realized by defining new linguistic pattern-concept pairs in CBP's case base.</Paragraph>
      <Paragraph position="2">  This corresponds to Failure-1, and is the easiest of the four repair types.</Paragraph>
      <Paragraph position="3"> Repair-2 To define a pattern-concept pair, where the concept part is represented as SELECT-clause elements and/or WHERE-clause elements: This corresponds to Failure-2. It is useful for defining idiomatic expressions or spatial expressions. Suppose that KBP could not interpret a NL query which included the expression &amp;quot;price is more than $100, and less than $200&amp;quot;. The application designer judges that this part of the query must be defined as a pattern-concept pair. Then, he/she defines a new pattern-concept pair: [Definition-1] If a pattern sequence is: [ &amp;quot;field-name(Field), {Field is-type-of numerical}, more than, number(N1), less than, number(N2)&amp;quot; ], do the following: (1) keep the field name, &amp;quot;Field&amp;quot;, as a SELECT-clause element, and (2) keep the expression &amp;quot;Field &gt; N1, Field &lt; N2&amp;quot; as a WHERE-clause element. In this notation, a term starting with a capital letter is a variable. An expression surrounded by a pair of braces ({ and }) is a constraint to be satisfied; it is a meta-level description, and is not regarded as a part of the pattern sequence.</Paragraph>
      <Paragraph position="4"> This definition means selecting records whose &amp;quot;Field&amp;quot; has a value more than N1 and less than N2, and returning the value of &amp;quot;Field&amp;quot; of the selected records.</Paragraph>
      <Paragraph position="5"> Repair-3 To define a pattern-concept pair, where the concept part is represented as an SQL expression which is not SELECT-FROM-WHERE: This corresponds to Failure-3. The application designer must enumeratively define a new SQL structure corresponding to a given linguistic pattern (See Figure-2).</Paragraph>
      <Paragraph position="6"> Repair-4 To define a pattern-concept pair, where the concept is represented as a semantic concept which is a meta-level expression for the target database and cannot be defined as an SQL form: This corresponds to Failure-4. CAPIT provides a frame-like language to define semantic concepts. The application designer defines a new semantic concept using the language. He/she also defines a reply generation procedure. The procedure is called when the corresponding linguistic pattern is matched with an input query (See Figure-3).</Paragraph>
      <Paragraph position="7"> Repair-4 is the most difficult of all repair types for an application designer. In Repair-4, he/she must define not only a new semantic concept, but also the definitions of slots in the semantic concept, the procedures which fill the slots, the relations between the new semantic concept and other existing semantic concepts, various constraints among concepts, etc. However, remember that he/she would have to carry out such complicated tasks for all possible linguistic patterns in his/her target domain if he/she used the case-based parsing approach alone.</Paragraph>
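      <Paragraph> As an illustration of Repair-2, the pattern-concept pair of Definition-1 can be sketched as below. This is a rough approximation under stated assumptions: the pattern sequence is encoded as a regular expression, the field-name indices and the set of numerical fields are hypothetical, the two conditions are joined with AND, and the name apply_definition_1 is not part of CAPIT.</Paragraph>
      <Paragraph><![CDATA[
```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical field-name indices and type information for the target table.
FIELD_NAMES: Dict[str, str] = {"price": "Price", "date": "Date", "title": "Title"}
NUMERIC_FIELDS = {"Price", "Date"}

# Regex stand-in for: field-name(Field), more than, number(N1), less than, number(N2)
RANGE_PATTERN = re.compile(
    r"(?P<field>\w+)(?:\s+is)?\s+more\s+than\s+\$?(?P<n1>\d+(?:\.\d+)?)"
    r"\s*,?\s*(?:and\s+)?less\s+than\s+\$?(?P<n2>\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def apply_definition_1(text: str) -> Optional[Tuple[str, str]]:
    """Return (SELECT-clause element, WHERE-clause element), or None if no match."""
    match = RANGE_PATTERN.search(text)
    if match is None:
        return None
    field = FIELD_NAMES.get(match.group("field").lower())
    # Constraint {Field is-type-of numerical}: reject non-numerical fields.
    if field is None or field not in NUMERIC_FIELDS:
        return None
    where = f"{field} > {match.group('n1')} AND {field} < {match.group('n2')}"
    return field, where
```
]]></Paragraph>
      <Paragraph> The brace constraint is checked after matching rather than inside the pattern, in line with its description as a meta-level condition that is not part of the pattern sequence.</Paragraph>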
      <Paragraph position="8">  5 Dialogue Example between PDI and an Application Designer. PDI (Pattern Definition Interviewer) is CAPIT's interface to an application designer. A dialogue between PDI and an application designer progresses as follows: 1. PDI shows the application designer a NL query which both KBP and CBP have failed to interpret, and asks him/her to define the correct interpretation of the input NL query.</Paragraph>
      <Paragraph position="9"> 2. The application designer analyzes the reason why KBP failed to interpret the NL query.</Paragraph>
      <Paragraph position="10"> 3. The application designer selects a repair type for the failure, and performs the repair. The definition is stored in either KBP's knowledge base or CBP's case base. Here, he/she can generalize/modify the linguistic pattern, using linguistic pattern generalization/modification operators [10].</Paragraph>
      <Paragraph position="11"> 4. PDI retries interpreting the NL query, and asks the application designer whether or not the new interpretation is correct. If it is correct, the definition process for the NL query ends. If it is not correct, go back to 1.</Paragraph>
      <Paragraph position="12"> Next, we show a typical sample dialogue between PDI and an application designer. The situation is that the application designer is developing a guidance system which can understand various natural language queries on a specific commercial VCR. The guidance system has an internal database containing data about the functions and the elements of the specific VCR. The features of each of them are represented in a record of the vcr-function-table (Figure-4). The dialogue is an example of Failure-2 and Repair-2. In this example, KBP and CBP cooperatively generate the meaning of a given sentence.</Paragraph>
      <Paragraph position="13"> Suppose CAPIT is trying to interpret a new input sentence, S2: &amp;quot;Why does PAUSE exist?&amp;quot; Since CBP finds no matching pattern, S2 is sent to KBP. KBP extracts keywords from the sentence. Then, KBP generates its meaning. KBP's interpretation and its generated meaning are shown to the application designer. He/she rejects them. He/she defines a new linguistic pattern which matches a part of S2, &amp;quot;why omissible(does) * exist?&amp;quot;, as a field-name index to the &amp;quot;function&amp;quot; field of the target database (See Figure-4). Here, &amp;quot;omissible&amp;quot; is a linguistic pattern modification operator [10], and the special symbol &amp;quot;*&amp;quot; in a linguistic pattern is a CAPIT pattern definition notation which matches any sequence of words. This definition means that the reason why a specific element exists is described in the &amp;quot;function&amp;quot; field of its corresponding record. After the designer defines the repair of KBP's failure, PDI tries to interpret the same sentence again. This time, since CBP matches &amp;quot;why omissible(does) * exist&amp;quot; with a part of the S2 sentence, CBP replaces the matched part of the S2 sentence with its corresponding concept, that is, the &amp;quot;function&amp;quot; field. As a result, the input sentence is transformed into S2': &amp;quot;field-name(function) PAUSE ?&amp;quot;.</Paragraph>
      <Paragraph position="14"> The transformed input sentence is passed to KBP.</Paragraph>
      <Paragraph position="15"> KBP extracts keywords from the input sentence.</Paragraph>
      <Paragraph position="16"> The extracted keywords are field-name(function) and field-value(PAUSE). KBP generates a new SQL expression, which is different from the previous one. The application designer judges whether the new interpretation is right.</Paragraph>
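      <Paragraph> The partial-match rewriting of S2 into S2' can be sketched as below. The sketch assumes one possible reading of the notation: omissible(w) makes the word w optional, and the symbol * matches any non-empty sequence of words and is carried over into the rewritten query. The translation into a regular expression and the names compile_pattern and rewrite are assumptions made for illustration, not CAPIT's actual matcher.</Paragraph>
      <Paragraph><![CDATA[
```python
import re
from typing import List


def compile_pattern(pattern: str):
    """Translate a CAPIT-like pattern string into a compiled regex (a sketch)."""
    parts: List[str] = []
    for token in pattern.split():
        optional = re.fullmatch(r"omissible\((\w+)\)", token)
        if optional:
            # omissible(w): the word w may or may not appear.
            parts.append(rf"(?:\b{re.escape(optional.group(1))}\b\s+)?")
        elif token == "*":
            # '*': any sequence of words, captured so that it can be kept.
            parts.append(r"(.+?)\s*")
        else:
            parts.append(rf"\b{re.escape(token)}\b\s*")
    return re.compile("".join(parts), re.IGNORECASE)


def rewrite(query: str, pattern: str, concept: str) -> str:
    """Replace the part of the query matched by the pattern with the concept."""
    match = compile_pattern(pattern).search(query)
    if match is None:
        return query  # no partial match: the query is passed on unchanged
    kept = " ".join(group.strip() for group in match.groups())
    replaced = f"{concept} {kept}".strip()
    return (query[:match.start()] + replaced + " " + query[match.end():]).strip()
```
]]></Paragraph>
      <Paragraph> Calling rewrite with the S2 sentence, the pattern why omissible(does) * exist and the concept field-name(function) yields the transformed sentence S2' given above, which would then be passed to KBP.</Paragraph>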
    </Section>
  </Section>
</Paper>