<?xml version="1.0" standalone="yes"?>
<Paper uid="P84-1088">
  <Title>A Response to the Need for Summary Responses</Title>
  <Section position="3" start_page="432" end_page="433" type="metho">
    <SectionTitle>
3. The Heuristics
</SectionTitle>
    <Paragraph position="0"> The heuristics employed in the system are procedural in nature. They guide the system in searching for various patterns that may exist in the data. The heuristics are linearly ordered, ranging from simple to complex. The ordering assumes that if more than one descriptive answer can be obtained for a query, it is sensible to produce the "simplest" one. The equality heuristic determines whether all data values appearing for a particular attribute A in T_qual are the same (say, α). If so, and if no tuple in T_unqual has the same value for the attribute A, the general formulation of the response is: "All tuples having the value α for attribute A." The particular value under consideration must be one of the designated "distinguishing values" for the attribute. Response S1-2 (above) is an example of what this heuristic would do.</Paragraph>
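The equality heuristic can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: tuples are modelled as dictionaries, and the function name equality_heuristic, the parameter names and the sample data are all hypothetical.

```python
def equality_heuristic(t_qual, t_unqual, attribute, distinguishing_values):
    """Return a summary sentence if the equality heuristic applies, else None."""
    values = {row[attribute] for row in t_qual}
    if len(values) != 1:
        return None  # qualifying tuples do not share a single value
    (value,) = values
    if value not in distinguishing_values:
        return None  # the shared value is not a designated distinguishing value
    if any(row[attribute] == value for row in t_unqual):
        return None  # a non-qualifying tuple has the same value
    return f"All tuples having the value {value} for attribute {attribute}."


# Hypothetical sample data
eq_qual = [{"NATIONALITY": "Canadian"}, {"NATIONALITY": "Canadian"}]
eq_unqual = [{"NATIONALITY": "Indian"}, {"NATIONALITY": "British"}]
print(equality_heuristic(eq_qual, eq_unqual, "NATIONALITY", {"Canadian"}))
```

The three early returns correspond to the three conditions stated in the text: a single shared value, membership in the distinguishing values, and absence of that value among the non-qualifying tuples.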
    <Paragraph position="1"> The dual of the equality heuristic is the inequality heuristic: instead of looking for equalities, the system searches for inequalities. The inequality heuristic enables the system to produce responses such as: Q2: Which students are taking makeup courses? S2: All students with non-Computer Science undergraduate background.</Paragraph>
    <Paragraph position="2"> Here, the value "Computer Science" for the attribute UNIVERSITY-DEPARTMENT in the database under consideration may be considered a distinguishing value. If the equality or inequality heuristics are not applicable in their pure form and there are a "few" tuples in T_unqual which do not satisfy the requirement of the heuristic ("few" depends on the relative number of tuples in T_qual and T_unqual and on some other factors), a modification of the response produced by the heuristic may be presented to the user. An example of such a modification is seen in the following: Q3: Which students are receiving University scholarships? S3: All but one foreign students. In addition, two Canadian students are also receiving University scholarships.</Paragraph>
    <Paragraph position="3"> Another set of heuristics, the range heuristics, determines whether the data values for an attribute in the tuples in T_qual are within a particular well-defined range. There are two main types of range heuristics: one is concerned with maximum values and the other with minimum values. We will discuss only the maximum range heuristic here.</Paragraph>
    <Paragraph position="4"> The maximum heuristic determines whether the values of an attribute for all tuples in T_qual are below a particular limit while the values of the attribute in all tuples in T_unqual are not. An example response produced by the maximum heuristic is: Q4: Which students have been advised to discontinue studies at the University? S4: All students with a cumulative GPA of 2.0 or less.</Paragraph>
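A possible reading of the maximum heuristic is sketched below in Python. It is only an illustration: the function name maximum_heuristic, the dictionary representation of tuples and the GPA figures are assumptions, and the limit is taken to be the largest qualifying value, which matches the "2.0 or less" wording of the example response.

```python
def maximum_heuristic(t_qual, t_unqual, attribute):
    """Return a limit separating qualifying from non-qualifying tuples, or None."""
    if not t_qual:
        return None
    # Assumption: use the largest qualifying value as the candidate limit.
    limit = max(row[attribute] for row in t_qual)
    # The heuristic fails if any non-qualifying tuple is at or below the limit.
    if any(limit >= row[attribute] for row in t_unqual):
        return None
    return limit


# Hypothetical sample data
max_qual = [{"GPA": 1.2}, {"GPA": 2.0}, {"GPA": 1.8}]
max_unqual = [{"GPA": 2.5}, {"GPA": 3.4}]
print(maximum_heuristic(max_qual, max_unqual, "GPA"))
```

The minimum heuristic, which the text does not discuss, would presumably mirror this with the smallest qualifying value.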
    <Paragraph position="5"> In some cases, the maximum and minimum heuristics may be used together to define the end-points of a range of values (for some attribute) which the tuples in T_qual satisfy. This results in a range specification. If α is the minimum value and β is the maximum value of the attribute A in T_qual, then the corresponding response is: "All tuples with the value of attribute A ranging from α to β." An example of an answer with range specification is: Q5: Which students are in section 1 of CMPT110.3? S5: All students with surnames starting with 'A' through 'F'.</Paragraph>
    <Paragraph position="6"> There are several heuristic rules which the system follows in producing answers with range specification. For example, one of these rules limits the actual range specified in an answer to 75% or less of the potential range of the attribute values. This limitation of 75% is not sacrosanct; it is an arbitrary decision by the implementor of the knowledge base. In the current implementation it is believed that if the actual range is more than 75% of the potential range, no special meaning can be attributed to the occurrence of this range in T_qual.</Paragraph>
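The 75% rule can be sketched in Python as follows. This is a hedged illustration for numeric attributes only: the function name range_specification, its parameters and the sample numbers are hypothetical, and the other range rules mentioned in the text are not modelled.

```python
def range_specification(values, potential_low, potential_high, max_fraction=0.75):
    """Return (minimum, maximum) of the qualifying values, or None if too wide."""
    if not values:
        return None
    low, high = min(values), max(values)
    actual_range = high - low
    potential_range = potential_high - potential_low
    # Reject ranges covering more than max_fraction of the potential range.
    if actual_range > max_fraction * potential_range:
        return None
    return (low, high)


# Hypothetical attribute with potential values from 0 to 100
print(range_specification([10, 20, 30], 0, 100))
print(range_specification([5, 95], 0, 100))
```

Keeping max_fraction as a parameter reflects the remark that the 75% figure is an arbitrary choice by the implementor.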
    <Paragraph position="7"> Another rule requires that the actual range specified in an answer must not be so small as to identify the actual tuples which constitute the answer. For example, we should not produce a response such as: "All students with student-id-no between 821661 and 821663." In fact, such answers are not brief when compared to the size of the set of tuples which they qualify.</Paragraph>
    <Paragraph position="8"> A more complex heuristic is the conjunction heuristic. If all values of an attribute A in T_qual satisfy a relation R (in the mathematical sense) and there are some tuples in T_unqual in which the values of the attribute A satisfy this relation R, the system attempts to determine via the above heuristics whether there are some "interesting" distinguishing characteristic(s) which the set T_qual satisfies, but the set of tuples in T_unqual satisfying the relation R does not. Let us call the distinguishing characteristic(s) D. The general formulation of the response is "All tuples which satisfy the relation R for attribute A and have the characteristic(s) D." An example is: Q6: Which students are working as T.A. and R.A.? S6: Students who have completed at least two years at the University and who are not employed outside the University.</Paragraph>
    <Paragraph position="9"> If none of the above heuristics can be applied successfully, the disjunction heuristic attempts to divide the tuples in T_qual into a number of subsets and determine whether the above heuristics are appropriate for all of these subsets. The number of such subsets should be "small"; if too many subsets are identified, the response is no more elegant than a listing of the data, which we are trying to avoid. The number of allowable subsets partially depends upon the number of tuples in T_qual. An example showing three partitions based on the values of three different attributes is: Q7: Which graduate students are not receiving University scholarships? S7: Students who are receiving NSERC scholarships or have cumulative GPA less than 6.0 or have completed at least two years at the University.</Paragraph>
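One way to picture the disjunction heuristic is the simplified Python sketch below. It is not the method described in the paper in full: it only tries equality-style descriptions for each subset, it picks subsets greedily, and the names disjunction_heuristic and max_subsets as well as the sample data are assumptions made for illustration.

```python
def disjunction_heuristic(t_qual, t_unqual, attributes, max_subsets=3):
    """Cover t_qual with a few (attribute, value) descriptions, or return None."""
    # Descriptions that also match a non-qualifying tuple cannot be used.
    excluded = {(a, row[a]) for row in t_unqual for a in attributes}
    candidates = sorted({(a, row[a]) for row in t_qual for a in attributes} - excluded)
    uncovered = list(range(len(t_qual)))
    chosen = []
    while uncovered:
        if len(chosen) == max_subsets or not candidates:
            return None  # too many subsets would be needed
        # Greedy step: take the description covering the most uncovered tuples.
        best = max(candidates,
                   key=lambda c: sum(t_qual[i][c[0]] == c[1] for i in uncovered))
        covered = [i for i in uncovered if t_qual[i][best[0]] == best[1]]
        if not covered:
            return None  # some qualifying tuple cannot be described at all
        chosen.append(best)
        uncovered = [i for i in uncovered if i not in covered]
    return chosen


# Hypothetical sample data
dis_qual = [
    {"SCHOLARSHIP": "NSERC", "YEAR": "1"},
    {"SCHOLARSHIP": "NSERC", "YEAR": "2"},
    {"SCHOLARSHIP": "none", "YEAR": "3"},
]
dis_unqual = [
    {"SCHOLARSHIP": "University", "YEAR": "1"},
    {"SCHOLARSHIP": "University", "YEAR": "2"},
]
print(disjunction_heuristic(dis_qual, dis_unqual, ["SCHOLARSHIP", "YEAR"]))
```

The max_subsets bound stands in for the requirement that the number of subsets be "small"; the text indicates that the real bound depends on the size of the qualifying set.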
    <Paragraph position="10"> If none of the above heuristics produces a satisfactory response, the foreign-key heuristic searches other "related" relations. A related relation is one with which the relation under consideration has some common or join attribute(s). The names of such related relations and the attributes via which such a relation can be joined with the original target relation can be obtained from the knowledge base to be discussed later. An example of such a dialogue is: Q8: Which students are taking 880-level courses? S8: All second year students. In addition, two first year students are also taking 880-level courses.</Paragraph>
    <Paragraph position="11"> While attempting to answer Q8, the system finds that the question pertains to the relation COURSE-REGISTRATIONS. However, it fails to obtain any interesting descriptive pattern about the tuples in T_qual by considering this relation alone. Hence, the system consults the knowledge base and finds that the relation COURSE-REGISTRATIONS can be joined with the relation STUDENTS. It takes the join of all the tuples constituting T_qual with the relation STUDENTS and projects the resulting relation on the attributes of the relation STUDENTS. Let us call these tuples T_new-qual. Next, it attempts to discover the existence of some pattern in the tuples in T_new-qual. It succeeds in producing the response given in S8 by employing the modified equality heuristic.</Paragraph>
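The join-and-project step of the foreign-key heuristic can be sketched in Python as below. This is an assumption-laden illustration: relations are lists of dictionaries, the function name join_and_project is hypothetical, and the attribute values are invented. Projecting the join onto the attributes of the related relation amounts to keeping the related tuples whose join value occurs among the qualifying tuples.

```python
def join_and_project(t_qual, related_relation, join_attribute):
    """Join t_qual with related_relation and keep only the related attributes."""
    wanted = {row[join_attribute] for row in t_qual}
    return [dict(row) for row in related_relation if row[join_attribute] in wanted]


# Hypothetical tuples: qualifying COURSE-REGISTRATIONS rows and the STUDENTS relation
fk_qual = [
    {"STUDENT-ID-NO": 1, "COURSE": "CMPT880"},
    {"STUDENT-ID-NO": 3, "COURSE": "CMPT881"},
]
fk_students = [
    {"STUDENT-ID-NO": 1, "YEAR": 2},
    {"STUDENT-ID-NO": 2, "YEAR": 1},
    {"STUDENT-ID-NO": 3, "YEAR": 2},
]
print(join_and_project(fk_qual, fk_students, "STUDENT-ID-NO"))
```

The resulting tuples can then be handed to the earlier heuristics, which is how the text describes response S8 being produced.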
  </Section>
  <Section position="4" start_page="433" end_page="434" type="metho">
    <SectionTitle>
4. The Knowledge Base
</SectionTitle>
    <Paragraph position="0"> The knowledge base incorporates subjective perceptions of the user as to the nature and contents of the database. It consists of two types of frames - the relation and the attribute frames. These frames may be considered to be an extension of the database schema.</Paragraph>
    <Paragraph position="1"> The frames are created by the interface builder, and different sets of frames must be provided for different types of users and/or different databases.</Paragraph>
    <Paragraph position="2"> Each relation frame corresponds to an actual relation in the database; it provides the possible links with all other relations in the database. In other words, these frames define all permissible joins of two relations. If a direct join is not possible between two specific relations, the frame contains the name of a third relation which must be included in the join. The information in the relation frames is useful in the application of the foreign-key heuristic.</Paragraph>
    <Paragraph position="3"> The attribute frames play a role in our system similar to that played by McCoy's axioms [9]. Each attribute frame corresponds to an attribute in the relations in the database. In addition to a description of the attributes, these frames indicate the nature and range of the attribute's potential values. The expected range of values that an attribute may assume is helpful to the range heuristics. Information regarding the relative preferability of the various attributes is also included.</Paragraph>
    <Paragraph position="4"> Each attribute frame also contains a slot for "distinguishing values" which the attribute might take. This slot provides information for distinguishing a sub-class of an entity from other sub-classes. The contents of this field are useful in producing descriptive responses to users' queries. This slot contains one or more clauses, each of the following format ('[ ]' means optionality; '...' means an arbitrary number of repetitions of the immediately preceding clause):</Paragraph>
    <Paragraph position="6"> If all the values of the attribute in T_qual satisfy "applicable-operator-1-1" with respect to the contents of the list "list-of-distinguishing-values-1", the actual values may be termed "denomination-1-1" for producing responses. If the value of "denomination-1-1" is null, no names can be attached to the actual values of the attribute.</Paragraph>
    <Paragraph position="7"> The Distinguishing Values slot enables the implementor to specify classifications that he would a priori like to appear meaningfully in descriptive responses.</Paragraph>
    <Paragraph position="8"> This information enables the system to faithfully reflect the implementor's perceived notions regarding how a database entity class may be appropriately partitioned into subclasses for generating summary responses.</Paragraph>
    <Paragraph position="9"> It is often useful to provide descriptive answers on the basis of certain preferred attributes. For example, for the STUDENTS relation, it is more "meaningful" to provide answers on the basis of the attribute NATIONALITY or UG-MAJOR, rather than STUDENT-ID-NO or AMOUNT-OF-FINANCIAL-AID.</Paragraph>
    <Paragraph position="10"> However, it is impossible to give a concrete weight regarding each attribute's preferability. Therefore, we have classified the attributes into several groups; all attributes in a group are considered equally useful in producing meaningful qualitative answers to queries.</Paragraph>
    <Paragraph position="11"> This classification means that it is preferable and more useful to produce descriptive responses using the attributes in preference category 1 than the attributes in category 2, 3 or 4. This categorization is based on one's familiarity with the data. The decision is subjective, and hence it is bound to vary according to the judgement of the person building the interface. In the Preference Category slot, we have an entry corresponding to each relation the attribute occurs in. The information in this slot ensures that the system chooses a description based on the most salient attribute(s) for producing a response.</Paragraph>
    <Paragraph position="12"> A simple example of an attribute frame is the frame for the attribute NATIONALITY belonging to the STUDENTS relation.</Paragraph>
    <Paragraph position="13"> The attribute assumes character values. To be valid, the values must be members of a previously compiled list of countries. The attribute belongs to preference category 1 discussed above. Let us consider the clause ((Canadian)(=)(≠ foreign)) in the Distinguishing Values slot. The value "Canadian" is a distinguishing value in the domain of values which the attribute may take. The term "(=)" indicates that it is possible to identify a class of students using the descriptive expression "NATIONALITY = Canadian". If NATIONALITY ≠ "Canadian", the student may be referred to as a "FOREIGN" student. Similarly, if the value for a student under the attribute NATIONALITY is a member of the set (U.K., U.S.A., Australia, ...), he may be designated as coming from an English-speaking country.</Paragraph>
    <Paragraph position="14"> This information may be helpful in answering a query such as: Q9: Which students are taking the "Intensive English" course in the Fall term? S9: Most entering foreign students from non-English speaking countries.</Paragraph>
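An attribute frame with its Distinguishing Values slot might be modelled along the following lines in Python. The layout is hypothetical: the class and field names are invented for illustration and do not reproduce the clause format of the system, and only the three countries actually named in the text are listed for the English-speaking class.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DistinguishingClause:
    values: frozenset                          # list of distinguishing values
    name_if_member: Optional[str] = None       # denomination when the value is in the list
    name_if_not_member: Optional[str] = None   # denomination when it is not


@dataclass
class AttributeFrame:
    attribute: str
    relation: str
    preference_category: int
    clauses: List[DistinguishingClause] = field(default_factory=list)

    def denominations(self, value):
        """Collect every non-null denomination that applies to the value."""
        names = []
        for clause in self.clauses:
            if value in clause.values:
                name = clause.name_if_member
            else:
                name = clause.name_if_not_member
            if name is not None:
                names.append(name)
        return names


nationality_frame = AttributeFrame(
    attribute="NATIONALITY",
    relation="STUDENTS",
    preference_category=1,
    clauses=[
        DistinguishingClause(frozenset({"Canadian"}), None, "foreign"),
        DistinguishingClause(frozenset({"U.K.", "U.S.A.", "Australia"}),
                             "English-speaking", None),
    ],
)
print(nationality_frame.denominations("U.K."))
```

A null denomination is represented by None, echoing the remark that no name can be attached when the denomination is null.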
  </Section>
</Paper>