<?xml version="1.0" standalone="yes"?>
<Paper uid="P82-1010">
  <Title>ENGLISH WORDS AND DATA BASES: HOW TO BRIDGE THE GAP</Title>
  <Section position="4" start_page="0" end_page="57" type="metho">
    <SectionTitle>
II THE ENGLISH-ORIENTED LEVEL OF MEANING
REPRESENTATION
</SectionTitle>
    <Paragraph position="0"> The highest level of semantic representation is independent of the subject-domain. It contains a semantic primitive for every descriptive lexical item of the input-language 2. The semantic types of these primitives are systematically related to the syntactic categories of the corresponding lexical items. For example, for every noun there is a constant which denotes the set of individuals which fall under the description of this noun: corresponding to &quot;employee&quot; and &quot;employees&quot; there is a constant EMPLOYEES denoting the set of all employees, corresponding to &quot;department&quot; and &quot;departments&quot; there is a constant DEPARTMENTS denoting the set of all departments. Corresponding to an n-place verb there is an n-place predicate. For instance, &quot;to have&quot; corresponds to the 2-place predicate HAVE. Thus, the input analysis component</Paragraph>
    <Paragraph position="1"> 1 There is no space for a definition of the logical formalism I use in this paper. Closely related logical languages are defined in Scha (1976), Landsbergen and Scha (1979), and Bronnenberg et al. (1980). 2 In previous papers it has been pointed out that this idea, taken strictly, leads not to an ordinary logical language, but requires a formal language which is ambiguous. I ignore this aspect here. What I call EFL corresponds to what was called EFL- in some other papers. See Landsbergen and Scha (1979) and Bronnenberg et al. (1980) for discussion.</Paragraph>
    <Paragraph position="2">  of the system translates the question &quot;How many departments have more than 100</Paragraph>
    <Paragraph position="4"/>
  </Section>
  <Section position="5" start_page="57" end_page="57" type="metho">
    <SectionTitle>
III THE DATA BASE ORIENTED LEVEL OF MEANING
REPRESENTATION
</SectionTitle>
    <Paragraph position="0"> A data base specifies an interpretation of a logical language, by specifying the extension of every constant. A formalization of this view on data bases, and its application to a CODASYL data base, can be found in Bronnenberg et al. (1980).</Paragraph>
    <Paragraph position="1"> The idea is equally applicable to relational data bases. A relational data base specifies an interpretation of a logical language which contains for every relation R [K, A1, ..., An] a constant K denoting a set, and n functions A1, ..., An which have the denotation of K as their domain. Thus, if we have an EMPLOYEE file with a DEPARTMENT field, this file specifies the extension of a set EMPS and of a function DEPT which has the denotation of EMPS as its domain. In terms of such a data base structure, (1) above may be formulated as formula (3): Count({x ∈ (for: EMPS, apply: DEPT) | Count({y ∈ EMPS | DEPT(y) = x}) &gt; 100}). I pointed out before that it would be unwise to design a system which would directly assign the meaning (3) to the question (1). A more sensible strategy is to first assign (1) the meaning (2).</Paragraph>
    <Paragraph position="2"> The formula (3), or a logically equivalent one, may then be derived on the basis of a specification of the relation between the English word meanings used in (1) and the primitive concepts at the data base level.</Paragraph>
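    <Paragraph> As an illustration of how a relational file fixes the extensions of EMPS and DEPT, and of how formula (3) is then evaluated, the following minimal Python sketch may help. The records, the function name departments_with_more_than and the small thresholds are hypothetical choices made for this example only; they are not part of the system described in this paper.
<![CDATA[
```python
# Hypothetical toy EMPLOYEE file: each record is (employee id, DEPARTMENT field).
EMPLOYEE_FILE = [
    ("e1", "sales"), ("e2", "sales"), ("e3", "sales"),
    ("e4", "research"), ("e5", "research"),
    ("e6", "legal"),
]

# The file fixes the extension of the set constant EMPS ...
EMPS = {emp for emp, _ in EMPLOYEE_FILE}
# ... and of the function constant DEPT, whose domain is EMPS.
DEPT = dict(EMPLOYEE_FILE)


def departments_with_more_than(threshold):
    """Evaluate Count({x in (for: EMPS, apply: DEPT) |
                        Count({y in EMPS | DEPT(y) = x}) > threshold})."""
    # (for: EMPS, apply: DEPT) is the set of values DEPT takes on EMPS.
    depts = {DEPT[y] for y in EMPS}
    return sum(
        1
        for x in depts
        if len({y for y in EMPS if DEPT[y] == x}) > threshold
    )


if __name__ == "__main__":
    # With the toy data, "sales" and "research" have more than one employee.
    print(departments_with_more_than(1))
```
]]>
    Note that departments are only reachable here as values of DEPT, which is why the outer set in (3) is written as (for: EMPS, apply: DEPT) rather than as a separate constant.</Paragraph>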
  </Section>
  <Section position="6" start_page="57" end_page="57" type="metho">
    <SectionTitle>
IV THE RELATION BETWEEN EFL AND DBL
</SectionTitle>
    <Paragraph position="0"> Though we defined EFL and DBL independently of each other (one on the basis of the possible English questions about the subject-domain, the other on the basis of the structure of the data base about it) there must be a relation between them.</Paragraph>
    <Paragraph position="1"> The data base contains information which can serve to answer queries formulated in EFL. This means that the denotation of certain EFL expressions is fixed if an interpretation of DBL is given.</Paragraph>
    <Paragraph position="2"> We now consider how the relation between EFL and DBL may be formulated in such a way that it can easily serve as a basis for an effective translation from EFL expressions into DBL expressions. The most general formulation would take the form of a set of axioms, expressed in a logical language encompassing both EFL and DBL. If we allow the full generality of that approach, however, it leads to the use of algorithms which are not efficient and which are not guaranteed to terminate. An alternative formulation, which is attractive because it can easily be implemented by effective procedures, is one in terms of translation rules. This is the approach adopted in the PHLIQA1 system. It is described in detail in Bronnenberg et al. (1980) and can be summarized as follows.</Paragraph>
    <Paragraph position="3"> The relation between subsequent semantic levels can be described by means of local translation rules which specify, for every descriptive constant of the source language, a corresponding expression of the target language 1. A set of such translation rules defines for every source language query-expression an equivalent target language expression. An effective algorithm can be constructed which performs this equivalence translation for any arbitrary expression.</Paragraph>
    <Paragraph position="4"> A translation algorithm which applies the translation rules in a straightforward fashion often produces large expressions which allow for considerably simpler paraphrases. As we will see later on in this paper, it may be essential that such simplifications are actually performed. Therefore, the result of the EFL-to-DBL translation is processed by a module which applies logical equivalence transformations in order to simplify the expression.</Paragraph>
    <Paragraph position="5"> At the most global level of description, the PHLIQA system can thus be thought to consist of the following sequence of components: input analysis, yielding an EFL expression; EFL-to-DBL translation; simplification of the DBL expression; evaluation of the resulting expression.</Paragraph>
    <Paragraph position="6"> For the example introduced in the sections II and III, a specification of the EFL-to-DBL translation rules might look like this: DEPARTMENTS → (for: EMPS, apply: DEPT)</Paragraph>
    <Paragraph position="8"> These rules can be directly applied to the formula (2). Substitution of the right hand expressions for the corresponding left hand constants in (2), followed by λ-reduction, yields (3).</Paragraph>
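    <Paragraph> The substitution and λ-reduction steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the PHLIQA1 implementation. The tuple encoding of expressions and all function names are hypothetical. Only the rule for DEPARTMENTS appears in the text above; the rules for EMPLOYEES and HAVE, and the exact shape given to formula (2), are assumptions chosen so that the result has the form of (3).
<![CDATA[
```python
# Expressions are nested tuples whose first element is a tag, for example
# ("const", "EMPS"), ("var", "x"), ("app", fn, arg1, arg2) or
# ("lam", ["d", "e"], body).
# Assumption of this sketch: all bound variable names are distinct, so
# substitution never has to rename variables to avoid capture.


def map_children(expr, f):
    """Rebuild expr, applying f to every direct sub-expression (tuple part)."""
    return tuple(f(part) if isinstance(part, tuple) else part for part in expr)


def translate(expr, rules):
    """Local translation: replace each source constant by its target expression."""
    if expr[0] == "const" and expr[1] in rules:
        return rules[expr[1]]
    return map_children(expr, lambda e: translate(e, rules))


def substitute(expr, env):
    """Replace variables by the expressions bound to them in env."""
    if expr[0] == "var":
        return env.get(expr[1], expr)
    return map_children(expr, lambda e: substitute(e, env))


def beta_reduce(expr):
    """Reduce every application of a lambda-abstraction to its arguments."""
    expr = map_children(expr, beta_reduce)
    if expr[0] == "app" and expr[1][0] == "lam":
        _, params, body = expr[1]
        return beta_reduce(substitute(body, dict(zip(params, expr[2:]))))
    return expr


DEPT_OF = lambda v: ("app", ("const", "DEPT"), ("var", v))

RULES = {
    # Rule given in the text: DEPARTMENTS -> (for: EMPS, apply: DEPT).
    "DEPARTMENTS": ("image", ("const", "EMPS"), ("const", "DEPT")),
    # Assumed rules (not legible in the source text).
    "EMPLOYEES": ("const", "EMPS"),
    "HAVE": ("lam", ["d", "e"], ("eq", DEPT_OF("e"), ("var", "d"))),
}

# Assumed shape of (2):
# Count({x in DEPARTMENTS | Count({y in EMPLOYEES | HAVE(x, y)}) > 100})
FORMULA_2 = ("count", ("setof", "x", ("const", "DEPARTMENTS"),
             ("gt", ("count", ("setof", "y", ("const", "EMPLOYEES"),
                     ("app", ("const", "HAVE"), ("var", "x"), ("var", "y")))),
              ("num", 100))))

# (3): Count({x in (for: EMPS, apply: DEPT) | Count({y in EMPS | DEPT(y) = x}) > 100})
FORMULA_3 = ("count", ("setof", "x",
             ("image", ("const", "EMPS"), ("const", "DEPT")),
             ("gt", ("count", ("setof", "y", ("const", "EMPS"),
                     ("eq", DEPT_OF("y"), ("var", "x")))),
              ("num", 100))))

if __name__ == "__main__":
    print(beta_reduce(translate(FORMULA_2, RULES)) == FORMULA_3)
```
]]>
    The point of the design is that translate inspects one constant at a time and never needs to recognize larger patterns in the source expression.</Paragraph>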
  </Section>
  <Section position="7" start_page="57" end_page="58" type="metho">
    <SectionTitle>
V THE PROBLEM OF COMPOUND ATTRIBUTES
</SectionTitle>
    <Paragraph position="0"> It is easy to imagine a different data base which would also contain sufficient information to answer question (1). One example would be a data base which has a file of DEPARTMENTS, and which has NUMBER-OF-EMPLOYEES as an attribute of this file. This data base specifies an interpretation of a logical language which contains the set-constant DEPTS and the function #EMP (from departments to integers) as its descriptive constants. In terms of this data base, the query expressed by (1) would be formula (5): Count({x ∈ DEPTS | #EMP(x) &gt; 100}). If we try to describe the relation between EFL and DBL for this case, we face a difficulty which did not arise for the data base structure of section III: the DBL constants do not allow the construction of DBL expressions whose denotations involve employees. So the EFL constant EMPLOYEES cannot be translated into an equivalent DBL expression - nor can the relation HAVE, for lack of a suitable domain. This may seem to force us to give up local translation for certain cases: instead, we would have to design an algorithm which looks out for sub-expressions of the form</Paragraph>
    <Paragraph position="1">  (λy: Count({x ∈ EMPLOYEES | HAVE(y,x)})), where y is ranging over DEPARTMENTS, and then translates this whole expression into: #EMP. This is not attractive - it could only work if EFL expressions were first transformed so as to always contain this expression in exactly this form, or if we had an algorithm for recognizing all its variants. (Footnote 1: I ignore the complexities which arise because of the typing of variables, if a many-sorted logic is used. Again, see Bronnenberg et al. (1980) for details.)</Paragraph>
    <Paragraph position="2"> Fortunately, there is another solution. Though in DBL terms one cannot talk about employees, one can talk about objects which stand in a one-to-one correspondence to the employees: the pairs consisting of a department d and a positive integer i such that i is not larger than the value of #EMP for d. Entities which have a one-to-one correspondence with these pairs, and are disjoint with the extensions of all other semantic types, may be used as &quot;proxies&quot; for employees. Thus, we may define the following translation:  where id_emp is a function which establishes a one-to-one correspondence between its domain and its range (its range is disjoint with all other semantic types); rid_emp is the inverse of id_emp; INTS is a function which assigns to any integer i the set of integers j such that 0 &lt; j ≤ i.</Paragraph>
    <Paragraph position="3"> Application of these rules to (2) yields:</Paragraph>
  </Section>
  <Section position="8" start_page="58" end_page="58" type="metho">
    <SectionTitle>
Count({x ∈ DEPTS |
Count({y ∈ ∪(for: DEPTS,
</SectionTitle>
    <Paragraph position="0"> apply: (λd: (for: INTS(#EMP(d)), apply:</Paragraph>
    <Paragraph position="2"> which is logically equivalent to (5) above.</Paragraph>
    <Paragraph position="3"> It is clear that this data base, because of its greater &quot;distance&quot; to the English lexicon, requires a more extensive set of simplification rules if the DBL query produced by the translation rules is to be transformed into its simplest possible form. A simplification algorithm dealing successfully with complexities of the kind just illustrated was implemented by W.J. Bronnenberg as a component of the PHLIQA1 system.</Paragraph>
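    <Paragraph> The proxy construction can be made concrete with a small Python sketch. It is a hedged illustration only: the toy values of #EMP, the class EmployeeProxy and the definition of have (a proxy belongs to the department stored in its pair) are assumptions made for this example, since the corresponding translation rules are not legible in the text above. The sketch checks that counting proxies per department gives the same answer as the direct query (5).
<![CDATA[
```python
from dataclasses import dataclass

# Hypothetical data base: a DEPARTMENTS file with a NUMBER-OF-EMPLOYEES attribute.
NUM_EMP = {"sales": 3, "research": 2, "legal": 1}  # the function #EMP
DEPTS = set(NUM_EMP)                               # the set constant DEPTS


@dataclass(frozen=True)
class EmployeeProxy:
    """Stand-in for an employee; a distinct type keeps the range of id_emp
    disjoint from departments, integers and plain pairs."""
    dept: str
    index: int


def ints(i):
    """INTS: the set of integers j such that 0 < j <= i."""
    return set(range(1, i + 1))


def id_emp(pair):
    """One-to-one map from (department, integer) pairs to proxies."""
    return EmployeeProxy(*pair)


def rid_emp(proxy):
    """Inverse of id_emp."""
    return (proxy.dept, proxy.index)


# Proxies for the employees: one per pair (d, i) with i in INTS(#EMP(d)).
EMPLOYEES = {id_emp((d, i)) for d in DEPTS for i in ints(NUM_EMP[d])}


def have(dept, employee):
    """Assumed translation of HAVE: the proxy's pair starts with dept."""
    return rid_emp(employee)[0] == dept


def count_via_proxies(threshold):
    """Unsimplified query: count proxies per department, then compare."""
    return sum(
        1 for x in DEPTS
        if len({y for y in EMPLOYEES if have(x, y)}) > threshold
    )


def count_via_attribute(threshold):
    """Formula (5): Count({x in DEPTS | #EMP(x) > threshold})."""
    return sum(1 for x in DEPTS if NUM_EMP[x] > threshold)


if __name__ == "__main__":
    print(count_via_proxies(1), count_via_attribute(1))
```
]]>
    The proxy version enumerates as many objects as there are employees, which illustrates why simplifying the translated expression to the form of (5) matters in practice.</Paragraph>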
  </Section>
  <Section position="9" start_page="58" end_page="58" type="metho">
    <SectionTitle>
VI EXTENDING THE DATA BASE LANGUAGE
</SectionTitle>
    <Paragraph position="0"> Consider a slight variation on question (1): &quot;How many departments have more than 100 people?&quot; (7). We may want to treat &quot;people&quot; and &quot;employees&quot; as non-synonymous. For instance, we may want to be able to answer the question &quot;Are all employees employed by a department?&quot; with &quot;Yes&quot;, but &quot;Are all people employed by a department?&quot; with &quot;I don't know&quot;. Nevertheless, (7) can be given a definite answer on the basis of the data base of section III. The method as described so far has a problem with this example: although the answer to (7) is determined by the data base, the question as formulated refers to entities which are not represented in the data base, cannot be constructed out of such entities, and do not stand in a one-to-one correspondence with entities which can be so constructed.</Paragraph>
    <Paragraph position="1"> In order to be able to construct a DBL translation of (7) by means of local substitution rules of the kind previously illustrated, we need an extended version of DBL, which we will call DBL*, containing the same constants as DBL plus a constant NONEMPS, denoting the set of persons who are not employees.</Paragraph>
    <Paragraph position="2"> Now, local translation rules for the EFL-to-DBL* translation may be specified. Application of these translation rules to the EFL representation of (7) yields a DBL* expression containing the unevaluable constant NONEMPS. The system can only give a definite answer if this constant is eliminated by the simplification component.</Paragraph>
    <Paragraph position="3"> If the elimination does not succeed, PHLIQA still gives a meaningful &quot;conditional answer&quot;. It translates NONEMPS into ∅ and prefaces the answer with &quot;if there are no people other than employees, ...&quot;. Again, see Bronnenberg et al. (1980) for details.</Paragraph>
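    <Paragraph> The conditional-answer behaviour can be sketched in Python as below. This is a simplified illustration, not the PHLIQA mechanism itself: the tuple encoding, the toy evaluator, the extension of EMPS and the function names are all hypothetical. The sketch takes an already simplified DBL* expression, answers it directly if NONEMPS no longer occurs, and otherwise substitutes the empty set for NONEMPS and prefixes the caveat quoted above.
<![CDATA[
```python
# Expressions are nested tuples such as ("const", "EMPS"),
# ("union", a, b) and ("count", set_expr). EMPTY stands for the empty set.
EMPTY = ("empty",)

# Hypothetical extensions fixed by the data base. NONEMPS has none:
# it is the unevaluable constant of the extended language DBL*.
EXTENSIONS = {"EMPS": {"e1", "e2", "e3"}}

CAVEAT = "if there are no people other than employees, "


def mentions(expr, name):
    """True if the constant called name occurs anywhere in expr."""
    if expr == ("const", name):
        return True
    return any(mentions(part, name) for part in expr if isinstance(part, tuple))


def replace_const(expr, name, value):
    """Replace every occurrence of the constant called name by value."""
    if expr == ("const", name):
        return value
    return tuple(
        replace_const(part, name, value) if isinstance(part, tuple) else part
        for part in expr
    )


def evaluate(expr):
    """Toy evaluator covering only the constructs used in this sketch."""
    tag = expr[0]
    if tag == "empty":
        return set()
    if tag == "const":
        return EXTENSIONS[expr[1]]
    if tag == "union":
        return evaluate(expr[1]) | evaluate(expr[2])
    if tag == "count":
        return len(evaluate(expr[1]))
    raise ValueError("unsupported expression: %r" % (expr,))


def answer(simplified_expr):
    """Definite answer if NONEMPS was eliminated, conditional answer otherwise."""
    if not mentions(simplified_expr, "NONEMPS"):
        return str(evaluate(simplified_expr))
    assumed = replace_const(simplified_expr, "NONEMPS", EMPTY)
    return CAVEAT + str(evaluate(assumed))


if __name__ == "__main__":
    people = ("union", ("const", "EMPS"), ("const", "NONEMPS"))
    print(answer(("count", ("const", "EMPS"))))
    print(answer(("count", people)))
```
]]>
    In this sketch the simplification component is taken as given; the example only shows what happens to its output.</Paragraph>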
  </Section>
  <Section position="10" start_page="58" end_page="58" type="metho">
    <SectionTitle>
VII DISCUSSION
</SectionTitle>
    <Paragraph position="0"> Some attractive properties of the translation method are probably clear from the examples. Local translation rules can be applied effectively and have to be invoked only when they are directly relevant. Using the techniques of introducing &quot;proxies&quot; (section V) and &quot;complementary constants&quot; (section VI) in DBL, a considerable distance between the English lexicon and the data base structure can be covered by means of local translation rules.</Paragraph>
    <Paragraph position="1"> The problem of simplifying the DBL* expression (and other, intermediate expressions, in the full version of the PHLIQA method) can be treated separately from the peculiarities of particular data bases and particular constructions of the input language.</Paragraph>
  </Section>
  <Section position="11" start_page="58" end_page="58" type="metho">
    <SectionTitle>
VIII ACKNOWLEDGEMENTS
</SectionTitle>
    <Paragraph position="0"> Some of the ideas presented here are due to Jan Landsbergen. My confidence in the validity of the translation method was greatly enhanced by the fact that others have applied it successfully. Especially relevant for the present paper is the work by Wim Bronnenberg and Eric van Utteren on the translation rules for the PHLIQA1 data base. Bipin Indurkhya (1981) implemented a program which shows how this approach accommodates the meaning postulates of Montague's PTQ and similar fragments of English.</Paragraph>
  </Section>
</Paper>