<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1171">
  <Title>A System for Generating Descriptions of Sets of Objects in a Rich Variety</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 The Best-First Procedure
</SectionTitle>
    <Paragraph position="0"> The basic mechanism of the best-first search algorithm is a generalization of the incremental version: instead of successively adding attributes to the full expression generated so far, all intermediate results remain accessible for this operation, so that the search produces an optimal solution if it runs to completion; see (Horacek 2003) for details. Assuming that conflation (e.g., the descriptors man and unmarried can be verbalized as "bachelor") is not possible, this algorithm uses two cut-off techniques:</Paragraph>
    <Paragraph position="1"> * A dominance cut-off is carried out locally for sibling nodes when two partial descriptions exclude the same set of potential distractors while the same set of descriptors is still available. The variant with the worse evaluation is discarded.</Paragraph>
    <Paragraph position="2"> * A value cut-off is carried out globally after a solution has been found. It applies to nodes whose most optimistic evaluation (including the minimal value of the description required for excluding the remaining potential distractors) surpasses the evaluation of that solution. Applying either of these cut-offs only serves to gain speed and does not change the final result.</Paragraph>
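    <Paragraph position="3"> The two cut-offs can be illustrated by a minimal sketch. It is not the implementation of (Horacek 2003): the Node class, its field names, and the assumptions that evaluations are additive costs and that lower is better are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical search node; every field name here is an assumption."""
    excluded: frozenset    # potential distractors ruled out so far
    available: frozenset   # descriptors still available for expansion
    cost: float            # evaluation of the partial description (lower is better)
    min_remaining: float   # optimistic cost of excluding the remaining distractors


def dominance_cutoff(siblings):
    """Among siblings with equal (excluded, available), keep only the cheapest."""
    best = {}
    for node in siblings:
        key = (node.excluded, node.available)
        if key not in best or best[key].cost > node.cost:
            best[key] = node
    return list(best.values())


def value_cutoff(nodes, best_solution_cost):
    """Drop nodes whose most optimistic evaluation surpasses the known solution."""
    return [n for n in nodes
            if not (n.cost + n.min_remaining > best_solution_cost)]


a = Node(frozenset({"x7"}), frozenset({"red"}), 2.0, 1.0)
b = Node(frozenset({"x7"}), frozenset({"red"}), 3.0, 1.0)   # dominated by a
c = Node(frozenset({"x2"}), frozenset({"red"}), 5.0, 1.0)
kept = dominance_cutoff([a, b, c])
survivors = value_cutoff(kept, best_solution_cost=4.0)
```

Both functions only discard nodes that cannot lead to a better result under the stated assumptions, which mirrors the remark that the cut-offs affect speed but not the final result.</Paragraph>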
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Efficiency-Enhancing Measures
</SectionTitle>
      <Paragraph position="0"> We have enhanced this repertoire by a complexity cut-off, carried out prior to further expanding a node if the boolean combination of descriptors built leads to a description that is more complex than a given threshold. For this threshold, we use the complexity of the description that identifies each referent individually, that is, an enumeration.</Paragraph>
      <Paragraph position="1"> The generation of boolean combinations is a critical part of the algorithm, since it is its most time-consuming component. Redundancies must be avoided, which requires more effort than in previous approaches due to our hierarchical organization of property values. This burden is split between a static representation of implications, compiled from the underlying knowledge base about specializations, and the function Generate-Next, which accesses these data. Four implications hold between properties and their negations:
      implies(p, ¬q) if incompatible(p, q) holds
      implies(p, q) if specializes(p, q) holds
      implies(¬p, q) if opposite(p, q) holds
      implies(¬p, ¬q) if generalizes(p, q) holds
      Then the predicates subsumes and redundant can be defined for properties (or their negations):</Paragraph>
      <Paragraph position="3"> The function Generate-Next (Figure 1) successively builds increasingly complex disjunctions of descriptors and their negations. To start with, the procedure Increment produces the next property combination with the given complexity, if one exists (1). Otherwise (2), that complexity is augmented (9) before generating the next combination, unless the complexity limit is reached (8), which causes a complexity cut-off. For each property combination, it is tested whether its properties are pairwise redundant (3); if so, the next combination is built. If a non-redundant combination is found, it must pass the following tests:</Paragraph>
      <Paragraph position="4"> 1. It subsumes the target set (4).
      2. It further reduces the set of distractors (5).
      3. The reduced set of distractors is neither equal to nor a superset of the distractor set associated with a sibling node already created; otherwise, a dominance cut-off applies (6).</Paragraph>
      <Paragraph position="5"> If all tests succeed, that combination is returned; otherwise, building combinations is resumed (7).</Paragraph>
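      <Paragraph position="6"> The control flow described in this subsection can be sketched as a generator. This is an illustration under stated assumptions, not the Generate-Next function of Figure 1: the toy knowledge base, the function names, the use of the number of literals as complexity, and in particular the reading of redundant as "one literal implies the other" are hypothetical, since the definitions of subsumes and redundant are not reproduced in this text.

```python
from itertools import combinations

# Hypothetical toy knowledge base; none of these facts come from the paper.
SPECIALIZES = {("sportscar", "car")}                 # sportscar is a kind of car
INCOMPATIBLE = {("car", "truck"), ("truck", "car")}
OPPOSITE = set()                                     # no opposites in this toy domain


def implies(a, b):
    """Literals are (name, positive) pairs; encodes the four implication rules."""
    (p, p_pos), (q, q_pos) = a, b
    if p_pos and q_pos:
        return (p, q) in SPECIALIZES      # implies(p, q)
    if p_pos:
        return (p, q) in INCOMPATIBLE     # implies(p, not q)
    if q_pos:
        return (p, q) in OPPOSITE         # implies(not p, q)
    return (q, p) in SPECIALIZES          # implies(not p, not q): p generalizes q


def redundant(a, b):
    """Assumed reading: two disjuncts are redundant if one implies the other."""
    return implies(a, b) or implies(b, a)


def extension(literal, objects):
    name, positive = literal
    return {o for o, props in objects.items() if (name in props) == positive}


def generate(objects, target, distractors, literals, limit, sibling_sets=()):
    """Yield (disjunction, remaining distractors) in order of complexity."""
    for size in range(1, limit + 1):                   # complexity limit (8), (9)
        for combo in combinations(literals, size):     # next combination (1)
            if any(redundant(a, b) for a, b in combinations(combo, 2)):
                continue                               # redundancy test (3)
            covered = set().union(*(extension(lit, objects) for lit in combo))
            if not target.issubset(covered):           # test (4)
                continue
            remaining = distractors.intersection(covered)
            if remaining == distractors:               # test (5)
                continue
            if any(remaining.issuperset(s) for s in sibling_sets):
                continue                               # dominance cut-off (6)
            yield combo, remaining


objects = {"x1": {"car"}, "x2": {"car", "sportscar"},
           "x3": {"truck"}, "x4": {"truck", "red"}}
literals = [("car", True), ("sportscar", True), ("truck", True), ("red", True)]
results = list(generate(objects, {"x1", "x2"}, {"x3", "x4"}, literals, limit=2))
```

In this toy domain the disjunction of car and sportscar is skipped as redundant, and the disjunction of car and truck is skipped because it excludes no distractor.</Paragraph>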
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Enhancing the Best-First Procedure
</SectionTitle>
      <Paragraph position="0"> We have incorporated a number of improvements over the original version of the procedure:
      * Treating linguistically motivated preferences as options rather than restrictions
      * Putting limitations on the complexity of specifications, to control comprehensibility
      * Enhancing the expressive repertoire by descriptions of subsets of referents and by descriptions of referents to be excluded
      * Producing a sequence of increasingly restrictive descriptions rather than a single one.</Paragraph>
      <Paragraph position="2"> In the following, we summarize each of these (see (Horacek 2004) for details).</Paragraph>
      <Paragraph position="3"> The following linguistically motivated preferences are treated as options: a boolean combination of descriptors that express the category of the object (by a head noun) is chosen first, other (attribute) descriptors later, since a category must be chosen anyway. Moreover, we reduce the set of potential solutions by excluding "mixed" boolean combinations, that is, disjunctions of a category and attributes, such as car ∨ red, which are unnatural and awkward to express verbally.</Paragraph>
      <Paragraph position="4"> To strengthen comprehensibility, we specify limitations on the surface form of descriptions, including places for the head noun, pre- and postnominal modifiers, and relative clauses.</Paragraph>
      <Paragraph position="5"> Maximum numbers for each of these positions can be given, and places can be specified as alternatives to one another, thus limiting the number of components in conjoined expressions. By associating descriptors with the surface positions they can take, these specifications allow one to control the surface structure of the descriptions during the search.</Paragraph>
      <Paragraph position="6"> For partial descriptions with multiple disjunctions, an attempt is made to recast the expression built so far so that it remains within the given limits. These descriptions are always of the form $\bigwedge_{i=1}^{n} \bigl( \bigvee_{j=1}^{m_i} P_{ij} \bigr)$, where each $P_{ij}$ is a positive or negative descriptor. Even in moderately complex instances of this conjoined expression, several elements may consist of disjunctions of more than one descriptor. In such a constellation, we pick one disjunction, for example $\bigvee_{j=1}^{m_k} P_{kj}$ for some k, and transform the expression by applying distributivity. This amounts to partitioning the set of intended referents into subsets, where each of the components of the new top-level disjunction describes one of these subsets.</Paragraph>
      <Paragraph position="7"> Consider, for example, "the sportscars that are not red and the small trucks", identifying x5, x7, x8, and x12 in two components rather than by the involved one-shot "the vehicles that are a sportscar or small, and either a truck or not red." In addition, descriptions may specify exceptions: describing some of the referents to be excluded may lead to shorter expressions than expanding the description of the intended referents, so we integrate this option into the expressive repertoire. An example is "the vehicles on the right, but not the red truck", identifying x1, x3, and x6 by excluding x7 in the locally restricted context.</Paragraph>
      <Paragraph position="8"> In accordance with these specifications, the best-first search is invoked to produce an identifying description. This may not always be possible in complex situations. If no identifying description is found, the best partial solution is taken, and the search is repeated within the restricted context defined by the descriptions generated so far. By this procedure, a sequence of descriptions is generated rather than a single one. Consider, for example, "one of the trucks and the sportscars, all not white. The truck stands on the right", identifying x6, x7, x11 and x12 out of all 12 vehicles (in Figure 1) in two passes.</Paragraph>
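      <Paragraph position="9"> Returning to the recasting step of this subsection, distributing over one chosen disjunction can be sketched as follows. The representation (a conjunction as a list of disjunctions, each a list of (name, positive) literals), the function names, and the four objects are hypothetical and serve only to show that the components of the new top-level disjunction partition the described set.

```python
def holds(literal, props):
    name, positive = literal
    return (name in props) == positive


def extension(conjuncts, objects):
    """Objects satisfying every disjunction of a conjunction of disjunctions."""
    return {o for o, props in objects.items()
            if all(any(holds(lit, props) for lit in disj) for disj in conjuncts)}


def distribute(conjuncts, k):
    """Split disjunction k; return one conjunction per disjunct of it."""
    rest = conjuncts[:k] + conjuncts[k + 1:]
    return [[[lit]] + rest for lit in conjuncts[k]]


# Hypothetical objects, not the vehicles of Figure 1.
objects = {
    "a": {"sportscar", "blue"},
    "b": {"sportscar", "red"},
    "c": {"truck", "small", "red"},
    "d": {"truck", "red"},
}

# (sportscar or small) and (truck or not red)
one_shot = [[("sportscar", True), ("small", True)],
            [("truck", True), ("red", False)]]

parts = distribute(one_shot, 0)
subsets = [extension(p, objects) for p in parts]
```

Here the first component describes the non-red sportscar and the second the small truck, so that each component could be verbalized separately within the surface limits.</Paragraph>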
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 An Example
</SectionTitle>
      <Paragraph position="0"> We illustrate the behavior of the system by a small example. Let {x1, x3, x6} in Figure 1 be the set of intended referents. Specifications for maximum complexity of surface forms allow head nouns, pre- and postnominal modifiers, at most one of them as a conjoined expression, and a relative clause or a "but"-modifier expressing an exception. Only two descriptors apply to all intended referents, vehicle and right. Even if vehicle is chosen first, subsequent searching only expands on the partial description with right, since it excludes a superset of the objects that vehicle excludes: only x7 remains as a distractor. The next simplest descriptor combination is car ∨ white, which would allow complete identification of the intended referents. Since it can only be expressed by a relative clause, for which conjoined expressions are not allowed, recasting the description is attempted. This yields (car ∧ right) ∨ (white ∧ right), which is a possible solution. Since the second part requires a head noun, which adds a further descriptor, an attempt is made to improve the solution by finding an alternative to car ∨ white. Describing the complement constitutes such an alternative, since identification is required for x7 only. This can be done by selecting truck and, afterwards, any of the descriptors red, small, and old (say we pick red). This yields right ∧ ¬(truck ∧ red) as an alternative solution, with vehicle being added to obtain a head noun. Altogether, a surface generator could then generate "the vehicles on the right, but not the red truck" or "the cars and the white vehicle, both on the right", the latter given a clever aggregation module.</Paragraph>
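      <Paragraph position="1"> The two solutions of this example can be checked against a small attribute table. Figure 1 is not reproduced here, so the table below is hypothetical: which of the intended referents are cars and which one is white is assumed, and the left-hand objects y1 and y2 are invented. Only the facts stated in the text (all four x-objects are on the right, x7 is a red, small, old truck) are taken from the example.

```python
# Hypothetical attribute table; see the assumptions named in the lead-in.
objects = {
    "x1": {"vehicle", "car", "right"},
    "x3": {"vehicle", "white", "right"},
    "x6": {"vehicle", "car", "right"},
    "x7": {"vehicle", "truck", "red", "small", "old", "right"},
    "y1": {"vehicle", "car", "left"},      # invented object
    "y2": {"vehicle", "white", "left"},    # invented object
}


def ext(pred):
    """Set of objects whose property set satisfies the predicate."""
    return {o for o, props in objects.items() if pred(props)}


target = {"x1", "x3", "x6"}

# Partial description: right leaves x7 as the only distractor.
right = ext(lambda p: "right" in p)
distractors_left = right - target

# Recast solution: (car and right) or (white and right).
recast = ext(lambda p: ("car" in p and "right" in p)
             or ("white" in p and "right" in p))

# Exception-based solution: right and not (truck and red).
exception = ext(lambda p: "right" in p
                and not ("truck" in p and "red" in p))
```

Under these assumptions both descriptions denote exactly the intended referents, which is what makes them interchangeable candidates for the surface generator.</Paragraph>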
    </Section>
  </Section>
</Paper>