File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/94/c94-2166_metho.xml

Size: 15,254 bytes

Last Modified: 2025-10-06 14:13:43

<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2166">
  <Title>A Dutch to SQL database interface using Generalized Quantifier Theory</Title>
  <Section position="3" start_page="7029" end_page="7029" type="metho">
    <SectionTitle>
2 GSR: GENERAL OUTLINE
</SectionTitle>
    <Paragraph position="0"> The question of what GSR should look like was, to a large extent, tackled in a very pragmatic way. As far as the linguistic module of the program is concerned, the following criteria were formulated. GSR had to be a formal representation (i) with sufficient expressive power that every potentially useful query can be formulated in it in a not too complex fashion, and (ii) that is relatively easy to reach computationally, starting from natural language.</Paragraph>
    <Paragraph position="1"> A general observation is that, considering the kind of NL sentences one can expect as input to the system, GSR inevitably had to differ from logical formalisms such as those used in formal semantics (which focus on propositions). In view of the general decision to work with intermediate semantic expressions whose denotation is the answer to the NL question, the basic types of complete expressions listed in Fig. 3 were found useful. In this figure φ stands for an arbitrary proposition in some logical language L. The extension of L created by introducing these new types will be called L'.</Paragraph>
    <Paragraph position="2"> (i) propositions (format: φ), to be used when people ask yes-or-no questions</Paragraph>
  </Section>
  <Section position="4" start_page="7029" end_page="7029" type="metho">
    <SectionTitle>
3 FROM DUTCH TO GSR
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="7029" end_page="7029" type="sub_section">
      <SectionTitle>
3.1 ∃ and ∀: problems
</SectionTitle>
      <Paragraph position="0"> The traditional way of coping with quantification in NL database interfaces is by using ∃ and ∀, the instruments of classical first-order predicate logic (PL) (see e.g. Warren &amp; Pereira, 1982). This approach, however, does not meet the criteria set out above. To illustrate this, we rely on two observations that Barwise &amp; Cooper (1981) made to show a fundamental difference between the natures of NL and PL.</Paragraph>
      <Paragraph position="1"> Their observations will be 'transposed' to the computational application at hand.</Paragraph>
      <Paragraph position="2"> The first observation is illustrated in Fig. 4, which contains some Dutch questions and their most natural PL' counterparts. Whereas the Dutch sentences have the same syntactic structure, their PL' counterparts have different formats. These and many other examples suggest that there is no trivially compositional way of translating NL expressions into their nearest PL' equivalents. The problem is that the quantificational information, which in NL has a fixed location, is spread over the PL' expression in a seemingly arbitrary way.</Paragraph>
      <Paragraph position="3"> It may be concluded that criterion (ii) for a good GSR is violated.</Paragraph>
      <Paragraph position="4"> A second, more serious reason for the inadequacy of ∃ and ∀ is that some forms of NL quantification can only be expressed in a very complex way (e.g. Fig. 4, examples 2 and 3) or cannot be expressed at all (e.g. Fig. 4, example 4). Here criterion (i) is not satisfied.</Paragraph>
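      <Paragraph> To make the complexity argument concrete, the following Python sketch evaluates a question like 'Are exactly three employees married?' on a toy finite model in two ways: once with a PL-style formula built only from quantifiers, equality and inequality, and once as a single relation between two sets. The model and the function names are hypothetical illustrations, not part of the system described here.

```python
from itertools import product

# Hypothetical toy model: a finite domain and two unary predicates as sets.
DOMAIN = {"ann", "bob", "cor", "dirk", "eva"}
EMPLOYEE = {"ann", "bob", "cor", "dirk"}
MARRIED = {"ann", "bob", "dirk", "eva"}


def exactly_three_pl(domain, p, q):
    """PL-style encoding using only quantifiers, = and !=:
    Ex1 Ex2 Ex3 (x1, x2, x3 pairwise distinct, each in p and in q,
    and Ay (y in p and y in q implies y = x1 or y = x2 or y = x3))."""
    for x1, x2, x3 in product(domain, repeat=3):
        distinct = x1 != x2 and x1 != x3 and x2 != x3
        witnesses = all(x in p and x in q for x in (x1, x2, x3))
        no_fourth = all(
            y in (x1, x2, x3) for y in domain if y in p and y in q
        )
        if distinct and witnesses and no_fourth:
            return True
    return False


def exactly_three_gq(p, q):
    """Generalized-quantifier reading: one relation between two sets."""
    return len(p.intersection(q)) == 3


print(exactly_three_pl(DOMAIN, EMPLOYEE, MARRIED))  # True
print(exactly_three_gq(EMPLOYEE, MARRIED))          # True
```

The PL version grows with the numeral and scatters the quantificational information over the whole formula, whereas the set-based version keeps the determiner in one place, as in the NL sentence.</Paragraph>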
      <Paragraph position="5"> A third problem, mentioned in Kaan, Kas &amp; Puhland (1990), is that in practice, e.g. in implementations, one is tempted to make rough translations and to neglect nuances or strong conversational implicatures in natural language when one is limited to ∃ and ∀. For instance, in Warren &amp; Pereira (1982) 'a', 'some' and 'the' are all simply interpreted as ∃.</Paragraph>
    </Section>
    <Section position="2" start_page="7029" end_page="7029" type="sub_section">
      <SectionTitle>
3.2 L(GQ)': a solution
</SectionTitle>
      <Paragraph position="0"> There are many ways to try and get around the shortcomings of the traditional approach. To score better on criterion (i), i.e. to increase expressive power, one could consider introducing numbers into the logical formalism. However, if done in an ad hoc way, this extension could result in a formalism that is hybrid with respect to quantification and shows an even greater syntactic mismatch with NL (lowering the score on criterion (ii)).</Paragraph>
      <Paragraph position="1"> A solution to these problems was first explored by Montague (1973), and later worked out thoroughly by Barwise &amp; Cooper (1981) in a formalism called L(GQ).</Paragraph>
      <Paragraph position="2"> In contrast to traditional PL, which only has ∃ and ∀, the language of generalized quantifiers L(GQ) places no limit on the number of primitives used to express quantification. All kinds of determiners can be used.</Paragraph>
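      <Paragraph> In L(GQ) a determiner denotes a relation between two sets, so new determiners can be added without changing the rest of the formalism. A minimal Python sketch of that idea follows; the dictionary DETERMINERS and the toy extensions are hypothetical, and the definitions of 'most' and 'exactly n' follow the standard generalized-quantifier semantics rather than anything specified in this paper.

```python
from typing import Callable, Dict, FrozenSet

Det = Callable[[FrozenSet[str], FrozenSet[str]], bool]


def exactly(n: int) -> Det:
    """'exactly n': the intersection of the two sets has n elements."""
    return lambda a, b: len(a.intersection(b)) == n


# Each determiner is a binary relation between a domain set A
# and a second set B.
DETERMINERS: Dict[str, Det] = {
    "all": lambda a, b: not a.difference(b),
    "some": lambda a, b: bool(a.intersection(b)),
    "no": lambda a, b: not a.intersection(b),
    "most": lambda a, b: len(a.intersection(b)) > len(a.difference(b)),
    "exactly3": exactly(3),
}

employees = frozenset({"ann", "bob", "cor", "dirk"})
married = frozenset({"ann", "bob", "dirk"})

for name, det in DETERMINERS.items():
    print(name, det(employees, married))
```

Because every determiner has the same type, a translation step can treat them uniformly, which is what gives the L(GQ) expressions their shared structure.</Paragraph>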
      <Paragraph position="3"> The translation of the examples of Fig. 4 to L(GQ)' is given in Fig. 5. Some special notational conventions … [Fig. 5: L(GQ)' translations of the Fig. 4 examples, including 'Zijn precies drie werknemers gehuwd?' (Are exactly three employees married?).] The denotation of L(GQ)' determiners is defined at a meta-level. Some examples are given in (1) to (4). In these examples, I stands for an interpretation function mapping an expression onto its denotation.</Paragraph>
      <Paragraph position="4"> (1) a) $I(\mathit{all}(X, Y)) = \mathit{true}$ iff $I(X) \setminus I(Y) = \emptyset$; b) $I(\mathit{all}(X, Y)) = \mathit{false}$ otherwise.</Paragraph>
      <Paragraph position="6"> In Fig. 5 the structural similarity of the NL expressions is reflected in that of the L(GQ)' expressions.</Paragraph>
      <Paragraph position="7"> Furthermore, all NL examples can be expressed almost equally easily in L(GQ)'. Consequently, the formalism does not force people to be satisfied with rough translations. In short, the problems of traditional logical quantification are overcome.</Paragraph>
    </Section>
    <Section position="3" start_page="7029" end_page="7029" type="sub_section">
      <SectionTitle>
3.3 L(GQ)': complications
</SectionTitle>
      <Paragraph position="0"> Unfortunately, there are two reasons for not considering L(GQ)' an ideal solution.</Paragraph>
      <Paragraph position="1"> The first problem is actually not inherent to L(GQ); it stems from the fact that Barwise &amp; Cooper take over the Montagovian way of coping with possible ambiguity due to quantifier scope. In these cases Barwise &amp; Cooper generate one reading in a straightforward way. To allow for alternative readings, they introduce extra machinery (called the 'quantification rule').</Paragraph>
      <Paragraph position="2"> The latter mechanism, however convenient from a theoretical point of view, is rather implementation-unfriendly. It operates on complete structural descriptions (i.e. non-trivial trees) and generates complete structural descriptions. Allowing for such a rule drastically changes the profile of the parser that is needed.</Paragraph>
      <Paragraph position="3"> The second problem is that it is undesirable for GSR, being an interface language with a non-NLP module, to contain the set of (NL-inspired) determiners that L(GQ)' contains. It would probably be better if GSR had fewer primitives, preferably of a type not completely uncustomary in traditional DBMSs.</Paragraph>
    </Section>
    <Section position="4" start_page="7029" end_page="7029" type="sub_section">
      <SectionTitle>
3.4 GSR: an L(GQ)' derivative
</SectionTitle>
      <Paragraph position="0"> As a solution to these problems, L(GQ)' gets two new neighbours in the translation process, as shown in</Paragraph>
    </Section>
    <Section position="5" start_page="7029" end_page="7029" type="sub_section">
      <SectionTitle>
L(GQ)'
</SectionTitle>
      <Paragraph position="0"> In order to avoid the application of the 'quantification rule', the choice has been to first generate an expression that is neutral with respect to the scope of its quantifiers (SR1), and then to solve the scope problem in a second step, thereby generating an L(GQ)' expression. The trick of first generating a scope-neutral expression is not new. For instance, it is used in the LOQUI system (see Gailly, Ribbens &amp; Binot, 1990).</Paragraph>
      <Paragraph position="1"> The originality lies rather in the effort to respect well-formedness in the scope-neutral expressions.</Paragraph>
      <Paragraph position="2"> Informally speaking, SR1 is a predicate-logical formalism in which the arguments of the predicates are internally structured like the NL arguments of verbs. The most important consequence is that determiners are located within the predicate arguments. To give an example, 'Werken alle werknemers aan twee projekten?' (Do all employees work on two projects?) would be represented as (5). For identity and cardinality questions, the formats in Fig. 3 are made superfluous by the pseudo-determiners WH and CARD. For instance, the question 'Welke werknemers werken aan twee projekten?' (Which employees work on two projects?) is translated to (6).</Paragraph>
      <Paragraph position="3">
$$\mathit{work}(\mathit{all}(\{x \mid \mathit{employee}(x)\}), 2(\{x \mid \mathit{project}(x)\})) \tag{5}$$
$$\mathit{work}(\mathit{WH}(\{x \mid \mathit{employee}(x)\}), 2(\{x \mid \mathit{project}(x)\})) \tag{6}$$
The translation of NL to SR1 is a straightforward compositional process, comparable to the Barwise &amp; Cooper processing of readings for which no 'quantification rule' is needed. The algorithm for going from SR1 to L(GQ)' is given in Fig. 7.</Paragraph>
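      <Paragraph> The scope-neutral SR1 format keeps each determiner inside the predicate argument it belongs to. The Python sketch below mirrors that layout with two small data classes and encodes examples (5) and (6); the class names Arg and Pred, the predicate name work and the string rendering are assumptions for illustration, not the program's own data structures.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Arg:
    """An SR1 argument: a determiner applied to its domain set, with the
    domain written as a one-place predicate name (hypothetical encoding)."""
    det: str      # e.g. "all", "2", or a pseudo-determiner "WH" / "CARD"
    domain: str   # e.g. "employee" stands for {x | employee(x)}

    def render(self) -> str:
        return f"{self.det}({{x | {self.domain}(x)}})"


@dataclass(frozen=True)
class Pred:
    """An SR1 proposition: a predicate whose arguments carry their own
    determiners, so no scope order is fixed yet."""
    name: str
    args: Tuple[Arg, ...]

    def render(self) -> str:
        return f"{self.name}(" + ", ".join(a.render() for a in self.args) + ")"


ex5 = Pred("work", (Arg("all", "employee"), Arg("2", "project")))
ex6 = Pred("work", (Arg("WH", "employee"), Arg("2", "project")))
print(ex5.render())  # work(all({x | employee(x)}), 2({x | project(x)}))
print(ex6.render())  # work(WH({x | employee(x)}), 2({x | project(x)}))
```

Since the argument order follows the NL sentence, a later scope step can read the surface order of the determiners directly off the args tuple.</Paragraph>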
      <Paragraph position="4"> If an SR1 expression contains a pseudo-determiner WH or CARD, the schema in Fig. 7 is adapted as follows. In the first step the arguments with real determiners are replaced by variables $v_1$ up to $v_n$, and the argument with the pseudo-determiner by the special variable $v_0$. Further, the result φ of the normal second step is turned into a set expression or a numerical expression: $\{v_0 \mid v_0 \in S_0 \wedge \varphi\}$ or $\#(\{v_0 \mid v_0 \in S_0 \wedge \varphi\})$. The third step, which is φ-internal, remains unchanged.</Paragraph>
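      <Paragraph> The adaptation for pseudo-determiners can be pictured extensionally: once the real determiners have been scoped, the remaining proposition φ is a condition on v0, and WH collects the elements of S0 that satisfy it while CARD counts them. The Python sketch below assumes a hypothetical toy model (EMPLOYEE, PROJECT, WORKS_ON) and reads the numeral in 'werken aan twee projekten' as 'exactly two', which is itself an assumption.

```python
# Hypothetical toy database: who works on which project.
EMPLOYEE = {"ann", "bob", "cor"}
PROJECT = {"p1", "p2", "p3"}
WORKS_ON = {("ann", "p1"), ("ann", "p2"), ("bob", "p1"), ("cor", "p1"),
            ("cor", "p2"), ("cor", "p3")}


def phi(v0: str) -> bool:
    """phi: 2({x | project(x)}) applied inside work(v0, .), with '2' read
    as 'exactly two' (an assumption made for this sketch)."""
    projects_of_v0 = {p for p in PROJECT if (v0, p) in WORKS_ON}
    return len(projects_of_v0) == 2


# WH turns phi into a set expression {v0 | v0 in S0 and phi}.
which = {v0 for v0 in EMPLOYEE if phi(v0)}
# CARD turns it into a numerical expression #({v0 | v0 in S0 and phi}).
how_many = len(which)

print(sorted(which), how_many)  # ['ann'] 1
```

The same φ serves both question types; only the outermost wrapper differs, which is why the third step of the schema can stay unchanged.</Paragraph>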
      <Paragraph position="5"> The essential part of Fig. 7 is the procedure that determines the possible scope configurations. In the program only one scope configuration, the most probable one, is generated. The algorithm states that the earlier a quantifier occurs in the NL expression, the larger its scope should be in the L(GQ)' expression. In the NL fragment that was tested extensively with the program, this procedure proved to be amazingly accurate (see Specimen, 1992, 85-98). The future goal, however, is to generate, instead of one most probable reading, a list of all possible readings, each tagged with a degree of probability. Since the procedure is a separate module, it can be extended or altered without affecting the rest of the program.</Paragraph>
      <Paragraph position="7"> Fig. 7, step 1: every argument $D_i(S_i)$ is replaced by a new, unique variable $v_i$ ($i \in \{1, \ldots, n\}$).</Paragraph>
      <Paragraph position="9"> Fig. 7, step 2: an independent procedure is run to determine the probable or possible scope of the determiners. The determiners are wrapped around the initial proposition according to this scope. Formally, the scope-determining procedure generates, for every probable or possible reading, a permutation $f$ of $\{1, \ldots, n\}$.</Paragraph>
      <Paragraph position="11"> Fig. 7, step 3: the remaining lacunae are filled by adding, as shown, to every determiner $D_i$ its original domain set $S_i$, and the variable $v_i$ that was introduced before to replace</Paragraph>
      <Paragraph position="14"> What remains to be overcome is the fact that introducing a large set of determiners in GSR would burden the interpreters used in the database subsystem with an extra, NLP-type recognition task. This problem is solved by giving L(GQ)' a right-hand neighbour (see Fig. 6), in which the determiners are replaced by what was originally the meta-level definition of their semantics (see (1)-(4)). In the resulting L(GQ)' derivative, called GSR, the number of primitives (set, set intersection, set difference, set cardinality, ...) is drastically reduced. Furthermore, the new primitives are much closer to, and even at the heart of, the procedural and semantic building blocks of traditional computer science in general, and of relational DBMSs in particular.</Paragraph>
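      <Paragraph> Replacing each determiner by its meta-level definition leaves only set-level primitives. A Python sketch of such a rewrite for a few determiners is given below; the target notation (diff, inter, card, EMPTY) and the table GSR_DEFINITIONS are hypothetical stand-ins rather than actual GSR syntax. Only the rule for all follows definition (1); the others use standard generalized-quantifier definitions.

```python
from typing import Callable, Dict

# Hypothetical rewrite rules: determiner -> expression over set primitives.
GSR_DEFINITIONS: Dict[str, Callable[[str, str], str]] = {
    # (1): all(X, Y) is true iff X minus Y is empty.
    "all": lambda x, y: f"diff({x}, {y}) = EMPTY",
    # Standard GQ definitions, assumed here for illustration.
    "some": lambda x, y: f"not(inter({x}, {y}) = EMPTY)",
    "no": lambda x, y: f"inter({x}, {y}) = EMPTY",
    "exactly3": lambda x, y: f"card(inter({x}, {y})) = 3",
}


def to_gsr(det: str, x: str, y: str) -> str:
    """Replace a determiner application det(X, Y) by its set-level definition."""
    return GSR_DEFINITIONS[det](x, y)


print(to_gsr("all", "{x1 | employee(x1)}", "{x1 | married(x1)}"))
# diff({x1 | employee(x1)}, {x1 | married(x1)}) = EMPTY
```

After this step a database-side interpreter only needs to understand sets, difference, intersection, cardinality and comparisons, not every NL determiner.</Paragraph>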
      <Paragraph position="15"> An example of the complete procedure, going from SR1 to L(GQ)' to GSR, is given in (7) to (9). The question is 'Zijn alle werknemers gehuwd?' (Are all employees married?).</Paragraph>
      <Paragraph position="16"> $$\mathit{all}(\{x_1 \mid \mathit{employee}(x_1)\}, \{x_1 \mid \mathit{married}(x_1)\}) \tag{8}$$</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="7029" end_page="7029" type="metho">
    <SectionTitle>
4 FROM GSR TO SQL
</SectionTitle>
    <Paragraph position="0"> Like the NLP subsystem, the database subsystem is fully implemented. However, we restrict ourselves here to a very brief sketch of its functionality. As can be seen in Fig. 2, a GSR expression is first translated to a formalism called DBSR. This was done for reasons of modularity, primarily to facilitate extending the system to different target languages.</Paragraph>
    <Paragraph position="1"> DBSR, which stands for DataBase-Specific Semantic Representation, is a declarative relational database query language that is both close to GSR and easily translatable into any of the commercial RDBMS query languages. Apart from the treatment of quantification, the formalism is very similar to relational calculus. The major effort in the step from GSR to DBSR lies in adapting GSR terminology to the concrete names of tables and columns of a database. This is done using a DB lexicon, which can be seen as an augmented ER model of a database.</Paragraph>
    <Paragraph position="2"> The last step, from DBSR to SQL, is extremely straightforward. Sets and cardinality expressions are translated to (sub)queries. Relations between sets or cardinality expressions are translated to conditions on (sub)queries.</Paragraph>
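      <Paragraph> The last two steps can be illustrated end to end for 'Zijn alle werknemers gehuwd?'. The Python sketch below uses a hypothetical DB lexicon that maps the GSR predicates employee and married onto an assumed table emp with a 0/1 column married, translates each set into a subquery, and translates the relation from definition (1), an empty set difference, into a NOT EXISTS condition. It is not the system's DBSR or its actual SQL output; it simply runs against an in-memory SQLite database from Python's standard library.

```python
import sqlite3

# Hypothetical DB lexicon: GSR predicate -> SQL subquery for its extension.
DB_LEXICON = {
    "employee": "SELECT id FROM emp",
    "married": "SELECT id FROM emp WHERE married = 1",
}


def all_query(domain_pred: str, scope_pred: str) -> str:
    """all(X, Y), i.e. X minus Y is empty, as a yes/no SQL query:
    the set difference is a subquery, emptiness is a NOT EXISTS test."""
    x_sql = DB_LEXICON[domain_pred]
    y_sql = DB_LEXICON[scope_pred]
    return (
        "SELECT CASE WHEN NOT EXISTS ("
        f"{x_sql} EXCEPT {y_sql}"
        ") THEN 'YES' ELSE 'NO' END"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, married INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(1, 1), (2, 1), (3, 0)])

query = all_query("employee", "married")
print(query)
print(conn.execute(query).fetchone()[0])  # NO: employee 3 is not married
```

Keeping the schema names in the lexicon means the same GSR expression could be mapped onto a different database by changing only the lexicon entries.</Paragraph>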
    <Paragraph position="3"> For completeness, an example of the database subsystem output is given. For the last example of the preceding section, a DBSR expression and an SQL query are given in (10) and (11), respectively. YES contains</Paragraph>
  </Section>
  <Section position="6" start_page="7029" end_page="7029" type="metho">
    <SectionTitle>
5 IMPLEMENTATION
</SectionTitle>
    <Paragraph position="0"> The system is written in Common Lisp (according to the de facto standard, Steele 1990) and generates standard SQL queries (ISO). It has proved to be a perfectly portable product. Originally written on a Macintosh SE/30, it has since been tested on several Symbolics, Macintosh and PC platforms.</Paragraph>
    <Paragraph position="1"> The major modules of the linguistic component are a 'letter tree' tool for efficient communication with the lexicon, a transition network based morphological analysis tool, and an augmented chart parser for syntactic and semantic analysis.</Paragraph>
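    <Paragraph> A 'letter tree' is what is usually called a trie: lexicon entries are stored along paths of characters, so a lookup costs time proportional to the length of the word rather than to the size of the lexicon. The paper does not specify how its tool stores entries, so the LetterTree class and the feature dictionaries in this Python sketch are assumptions; it shows a generic trie with exact lookup and a prefix search that a morphological analyser could use.

```python
from typing import Dict, List, Optional


class LetterTree:
    """A minimal trie mapping words to lists of lexical entries."""

    def __init__(self) -> None:
        self.children: Dict[str, "LetterTree"] = {}
        self.entries: List[dict] = []

    def insert(self, word: str, entry: dict) -> None:
        node = self
        for ch in word:
            node = node.children.setdefault(ch, LetterTree())
        node.entries.append(entry)

    def lookup(self, word: str) -> Optional[List[dict]]:
        """Entries stored for exactly this word, or None."""
        node = self
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.entries or None

    def prefixes(self, text: str) -> List[str]:
        """All stored words that are prefixes of text, e.g. stems
        that remain once an ending is split off."""
        found, node = [], self
        for i, ch in enumerate(text):
            node = node.children.get(ch)
            if node is None:
                break
            if node.entries:
                found.append(text[: i + 1])
        return found


lexicon = LetterTree()
lexicon.insert("werknemer", {"cat": "noun", "sem": "employee"})
lexicon.insert("werk", {"cat": "verb", "sem": "work"})
print(lexicon.lookup("werknemer"))     # [{'cat': 'noun', 'sem': 'employee'}]
print(lexicon.prefixes("werknemers"))  # ['werk', 'werknemer']
```

A single walk down the tree finds every candidate stem of 'werknemers', leaving the morphological tool to decide which split is valid.</Paragraph>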
  </Section>
</Paper>