<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1408">
  <Title>Generating References to Parts of Recursively Structured Objects Helmut Horacek Universität des Saarlandes</Title>
  <Section position="5" start_page="47" end_page="50" type="metho">
    <SectionTitle>
3 A Corpus with References to Formulas
</SectionTitle>
    <Paragraph position="0"> In this paper, we analyze phenomena related to references to mathematical formulas and their components, as observed in a corpus of simulated man-machine tutoring dialogs (Wolska et al., 2004). These dialogs resulted from Wizard-of-Oz experiments in teaching students mathematical theorem proving in naive set theory and mathematical relations.</Paragraph>
    <Paragraph position="1"> In these experiments, a human wizard took the role of the tutor, with constraints on tutoring strategy and on the use of natural language, although the constraints on natural language use were relaxed to encourage natural behavior on the part of the student. In the corpus obtained this way, a number of rather particular expressions appeared that refer to components of recursively structured objects, namely the formulas. Consequently, our goal is to automate the production of these kinds of referring expressions in a more elaborate version of the simulated tutoring system, with full-fledged natural language generation.</Paragraph>
    <Paragraph position="2"> Representative examples originating from our corpus appear in Figure 1. Each example consists of two parts: (1) a student utterance, mostly a formula, labeled (#a), which is the context for interpreting subsequent referring expressions, with the intended referent appearing in bold; and (2) a tutor response, labeled (#b), with at least one referring expression, in italics.
Figure 1:
1. Reference to the typographic order
(1a) (R∘S)⁻¹ = {(x,y) | (y,x) ∈ R∘S} = {(x,y) | ∃z (z ∈ M ∧ (x,z) ∈ R⁻¹ ∧ (z,y) ∈ S⁻¹)} = R⁻¹∘S⁻¹
(1b) Das geht ein wenig schnell. Woher nehmen Sie die zweite Gleichheit? (That was a little too fast. How did you find the second equality?)
(2a) Nach 9 = ((y,z) ∈ R ∧ (z,y) ∈ S)
(2b) Fast korrekt. Das zweite Vorkommen von y muß durch x ersetzt werden. (Almost correct. The second occurrence of y must be replaced by x.)
(3a) (R ∪ S)∘T ist dann {(x,y) | ∃z (z ∈ M ∧ ((x,y) ∈ R ∨ (x,y) ∈ S) ∧ (y,z) ∈ T)}
(3b) Nicht korrekt. Vermutlich liegt der Fehler nach der letzten 'und'-Verknüpfung. (Not correct. The mistake is probably located after the last 'and'-operation.)
2. Reference by exploiting default scope and metonymic relations
(4a) (R∘S)⁻¹ = {(x,y) | ∃z (z ∈ M ∧ (y,z) ∈ R⁻¹ ∧ (z,x) ∈ S⁻¹)} [?] S⁻¹∘R⁻¹
(4b) Nein, das ist nicht richtig! Vergleichen Sie den zweiten Term mit Ihrer vorhergehenden Aussage! (No, this is not correct! Compare the second term with your previous assertion!)
(5a) {(x,y) | (y,x) ∈ (R∘S)} = {(x,y) | (x,y) ∈ {(a,b) | ∃z (z ∈ M) ∧ (a,z) ∈ R ∧ (z,b) ∈ S}}
(5b) Das stimmt so nicht. Die rechte Seite wäre identisch mit R∘S. (This is not correct. The right side would be identical to R∘S.)
(6a) {(x,y) | ∃z (z ∈ M) ∧ ((x,z) ∈ R ∨ (x,z) ∈ S) ∧ (z,y) ∈ S} = {(x,y) | ∃z (z ∈ M) ∧ (z,y) ∈ S ∧ ((x,z) ∈ R ∨ (x,z) ∈ S)} = ((y,z) ∈ S ∧ (z,y) ∈ S)
(6b) Auf der rechten Seite ist z nicht spezifiziert. (On the right side, z is not specified.)
(7a) {(x,y) | ∃z (z ∈ M) ∧ ((x,z) ∈ R ∨ (x,z) ∈ S) ∧ (z,y) ∈ S} = {(x,y) | ∃z (z ∈ M) ∧ (z,y) ∈ S ∧ ((x,z) ∈ R ∨ (x,z) ∈ S)} = ∃z (z ∈ M ∧ ((y,z) ∈ S ∧ (z,y) ∈ S))
(7b) Diese Aussagen scheinen nicht gleichwertig zu sein. Ein z, das die Bedingung der rechten Aussage erfüllt, muß nicht die Bedingung der linken Menge erfüllen. (These assertions do not seem to be equivalent. A z that fulfills the condition of the right assertion does not necessarily fulfill the condition of the left set.)
3. Reference by exploiting default scope for building groups of objects
(8a) K((A [?] B) [?] (C [?] D)) = K(A [?] B) [?] K(C [?] D)
(8b) De Morgan Regel 2 auf beide Komplemente angewendet.</Paragraph>
    <Paragraph position="3"> (De Morgan Rule 2, applied to both complements.)</Paragraph>
    <Paragraph position="4"> (9a) (T⁻¹∘S⁻¹)⁻¹ [?] (T⁻¹∘R⁻¹)⁻¹ = {(x,y) | (y,x) ∈ (T⁻¹∘S⁻¹) ∧ (y,x) ∈ (T⁻¹∘R⁻¹)}
(9b) Dies würde dem Schnitt der beiden Mengen entsprechen.</Paragraph>
    <Paragraph position="5"> (This would correspond to the intersection of both sets.)</Paragraph>
    <Paragraph position="6"> 4. Reference to regions by expressions involving vagueness
(10a) Also ist (R ∪ S)∘T = {(x,z) | ∃v (((x,v) ∈ R ∨ (x,v) ∈ S) ∧ (z,v) ∈ T)}
(10b) Fast richtig. Am Ende der Formel ist ein Fehler.</Paragraph>
    <Paragraph position="7"> (Almost correct. At the end of the formula, there is a mistake.)</Paragraph>
    <Paragraph position="8"> (11a) Wegen der Formel für die Komposition folgt (R [?] T)∘(S [?] T) = {(x,z) | ∃z ((x,z) ∈ R ∧ (z,y) ∈ T) [?] ∃z ((x,z) ∈ R ∧ (z,y) ∈ T)}
(11b) Fast richtig. In der zweiten Hälfte der Formel ist ein Fehler. (Almost correct. In the second half of the formula, there is a mistake.)</Paragraph>
    <Paragraph position="9"> Texts are given in the original German version, accompanied by a translation into English.</Paragraph>
    <Paragraph position="10"> The examples are partitioned into four categories. The first category (examples 1 to 3) illustrates references by typographical position, from left to right. Items referred to in this manner are qualified by their formal category.</Paragraph>
    <Paragraph position="11"> (1) refers to an equality (two terms joined by an equal sign) in a sequence of three equalities. (2) refers to an instance of a variable, y, which must be further qualified by its position to distinguish it from another occurrence. (3) refers to the last occurrence of the 'and' operator. Distinct surface forms are used for objects referred to by category (second equality) and by name (second occurrence of y).</Paragraph>
    <Paragraph position="12"> The second category, the only one specific to recursively structured objects, comprises references that look similar to the previous ones but reflect structural embedding rather than typographical position. Objects referred to by this kind of expression are found on the top level of the embedding object or close to it. In most cases, the embedding level at which the intended referent is to be found is left unexpressed, which carries the implicit meaning that the referent appears at the topmost level at which the category referred to occurs. In (4), for example, the entire formula contains many terms as components, at various levels of embedding, so typographic position gives no clear orientation. On the top level of the inequation chain, however, there are only three terms, and the order among them is perfectly clear. (5) and (6) illustrate the role of incompleteness: only the right side is mentioned, leaving implicit the object whose right side is meant. Consequently, this must be the right side of the whole formula. The last example in this category, (7), shows references to different levels of embedding within one sentence. While right assertion refers to the expression on the right side of the top-level equivalence, left set refers to the left of the two sets in the equation on the left side of that equivalence.</Paragraph>
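The contrast between a typographic and a structural (default-scope) reading of an ordinal such as "the second term" can be sketched in Python, assuming a formula is represented as a tree of nodes labeled with a category. The Node class, both function names, and the toy tree (which only loosely mirrors the term structure of example (4)) are illustrative assumptions, not part of the described system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One component of a recursively structured object, such as a formula."""
    category: str  # e.g. "term", "set", "chain"
    label: str  # printable form, for inspection only
    children: list[Node] = field(default_factory=list)


def typographic_order(root: Node, category: str) -> list[Node]:
    """Nodes of a category in left-to-right reading order (pre-order)."""
    found = [root] if root.category == category else []
    for child in root.children:
        found.extend(typographic_order(child, category))
    return found


def default_scope_order(root: Node, category: str) -> list[Node]:
    """Nodes of a category at the topmost level at which that category occurs."""
    level = [root]
    while level:
        hits = [node for node in level if node.category == category]
        if hits:
            return hits
        level = [child for node in level for child in node.children]
    return []


# Hypothetical chain of three top-level terms; the first one embeds a subterm.
chain = Node("chain", "t1 = t2 [?] t3", [
    Node("term", "(R o S)^-1", [Node("term", "R o S")]),
    Node("term", "{(x,y) | ...}"),
    Node("term", "S^-1 o R^-1"),
])

print(typographic_order(chain, "term")[1].label)    # R o S
print(default_scope_order(chain, "term")[1].label)  # {(x,y) | ...}
```

Under the default-scope reading, the second term picks the middle term of the chain, whereas a purely typographic count lands on the embedded subterm.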
    <Paragraph position="13"> The third category, which features references to sets of objects, shows how number constraints determine the embedding level at which the intended referents are to be found. In precise terms, this is an instance of implicature (Grice 1975): if the number of objects that are on the top level of the embedding object and satisfy the description exceeds the specified cardinality, identification of the intended referents is transferred to one of the embedded substructures. In (8), three subexpressions satisfy the metonymic description complement, but the expression refers to only two. Consequently, the intended referents must be found in a substructure where the cardinality matches exactly: here, the right side of the equation. Due to the implicature, this additional qualification need not be expressed. A further complication arises from interference across referring expressions within one sentence. Without the context of the whole sentence, both sets in (9) would be resolved to the two sides of the equation. However, since this refers to the result of the preceding assertion, that is, the right side of the equation, that part is in some sense excluded from the context for resolving the subsequent referring expression.</Paragraph>
    <Paragraph position="14"> Hence, the interpretation is the two sets on the top level of the left side of the equation.</Paragraph>
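From the hearer's side, the cardinality implicature can be sketched as a breadth-first search for the shallowest group of sibling components in which exactly the stated number of items satisfies the description. This is a minimal Python sketch under assumptions of our own: the tree encoding, the function name, and the requirement that the matching group be unique at its depth are illustrative, and operators lost in the extraction of example (8) stay as [?] placeholders, which do not affect the count.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    op: Optional[str] = None  # top-level operator, e.g. "K" for complement
    children: list[Node] = field(default_factory=list)


def resolve_by_cardinality(root: Node, op: str, count: int) -> Optional[list[Node]]:
    """Referents of e.g. 'both complements': the items with operator `op`
    in the shallowest sibling group that holds exactly `count` such items."""
    parents = [root]
    while parents:
        matches = []
        for parent in parents:
            hits = [child for child in parent.children if child.op == op]
            if len(hits) == count:
                matches.append(hits)
        if matches:
            # Two equally shallow matching groups would leave the reference ambiguous.
            return matches[0] if len(matches) == 1 else None
        parents = [child for parent in parents for child in parent.children]
    return None


# Example (8): K((A [?] B) [?] (C [?] D)) = K(A [?] B) [?] K(C [?] D)
formula_8 = Node("equation", "=", [
    Node("K((A [?] B) [?] (C [?] D))", "K", [Node("(A [?] B) [?] (C [?] D)", "[?]")]),
    Node("K(A [?] B) [?] K(C [?] D)", "[?]", [
        Node("K(A [?] B)", "K"),
        Node("K(C [?] D)", "K"),
    ]),
])

print([n.label for n in resolve_by_cardinality(formula_8, "K", 2)])
# ['K(A [?] B)', 'K(C [?] D)']
```

With a cardinality of one, the same search stops at the top level and returns the complement on the left side, in line with the default-scope reading.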
    <Paragraph position="15"> The fourth category comprises references associated with vagueness. In references to formulas, we consider the end (example (10)), meaning the region towards the end, to be a vague expression, and likewise the second half (example (11)), since it is not entirely clear whether this expression must be interpreted structurally or typographically, and a precise typographical interpretation of half is pointless.</Paragraph>
    <Paragraph position="16"> In the following, we present methods for the automated generation of concise referring expressions of the kind illustrated in Figure 1. We address the following phenomena:
• implicit scope interpretation;
• incomplete or metonymic expressions;
• implicatures of category and cardinality.
We do, however, restrict our task to the generation of single referring expressions with precise references. Hence, we do not address vagueness, since the meaning of expressions such as those in (10) and (11) is not fully clear. Moreover, we do not accommodate the context established by previously generated referring expressions, as in (9); we assume this is handled by the embedding process.</Paragraph>
  </Section>
  <Section position="6" start_page="50" end_page="52" type="metho">
    <SectionTitle>
3 Operationalization
</SectionTitle>
    <Paragraph position="0"> In this section, we describe an operationalization of generating referring expressions of the kind discussed in the previous section. It is realized as a set of extensions to the algorithm by Dale and Reiter (1995). This algorithm assumes an environment with three interface functions: BasicLevelValue, which accesses basic-level categories of objects (Rosch 1978); MoreSpecificValue, which accesses incrementally specialized attribute values according to a taxonomic hierarchy; and UserKnows, which judges whether the user is familiar with the attribute value of an object. In a nutshell, MakeReferringExpression (Figure 3, including our extensions) iterates over the attributes P of an intended referent r (or a set of referents). In FindBestValue, a value is chosen that is known to the user and maximizes discrimination (RulesOut): this value describes the intended referent and rules out at least one potential distractor in C. Such values, if they exist, are collected in L until P is exhausted or a distinguishing description is found. The value V of an attribute A is chosen within an embedded iteration, starting with the basic-level value attributed to r; after that, more specific values that are also attributed to r and assumed to be known to the user are tested for their discriminatory power. Finally, the least specific value that excludes the largest number of potential distractors and is known to the user is chosen.</Paragraph>
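For reference, the core loop of the incremental algorithm can be sketched in Python. This is a simplified sketch, not Dale and Reiter's full procedure: it assumes flat attribute values (so BasicLevelValue is a plain lookup and MoreSpecificValue is omitted) and treats every value as known to the user (UserKnows always true). The function names and the toy domain are hypothetical.

```python
from __future__ import annotations

from typing import Optional

KB = dict[str, dict[str, str]]  # object -> attribute -> value


def rules_out(attr: str, value: str, distractors: set[str], kb: KB) -> set[str]:
    """Distractors that the pair (attr, value) does not describe."""
    return {d for d in distractors if kb[d].get(attr) != value}


def make_referring_expression(
    r: str, distractors: set[str], preferred: list[str], kb: KB
) -> Optional[list[tuple[str, str]]]:
    description: list[tuple[str, str]] = []
    remaining = set(distractors)
    for attr in preferred:  # the ordered attribute list P
        value = kb[r].get(attr)
        if value is None:
            continue
        excluded = rules_out(attr, value, remaining, kb)
        if excluded:  # keep only values that rule out some distractor
            description.append((attr, value))
            remaining -= excluded
        if not remaining:
            # The type attribute is always included, even if it does not discriminate.
            if all(a != "type" for a, _ in description):
                description.insert(0, ("type", kb[r]["type"]))
            return description
    return None  # no distinguishing description found


kb: KB = {
    "d1": {"type": "dog", "colour": "black", "size": "small"},
    "d2": {"type": "dog", "colour": "white", "size": "large"},
    "c1": {"type": "cat", "colour": "black", "size": "small"},
}
print(make_referring_expression("d1", {"d2", "c1"}, ["type", "colour", "size"], kb))
# [('type', 'dog'), ('colour', 'black')]
```

The extensions described below hook into exactly these places: the value selection per attribute, the RulesOut test, and the termination check once no distractors remain.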
    <Paragraph position="1"> The extensions that handle the particularities of our domain comprise several components:
• The knowledge representation of objects is enhanced by properties expressing positions in some context and by a meta-property about the use of descriptors: the metonymic use of a descriptor in relation to another one.</Paragraph>
    <Paragraph position="2"> • The value selection for context-dependent descriptors requires special treatment; moreover, metonymic expressions are built in a two-step process.</Paragraph>
    <Paragraph position="3"> • The discriminatory power in the subprocedure RulesOut is interpreted in local contexts for attributes expressing position.</Paragraph>
    <Paragraph position="4"> • Termination criteria include a test of whether a cardinality- or position-based implicature establishes a unique preference.</Paragraph>
    <Paragraph position="6"> To define the extensions precisely, we introduce some predicates and formal definitions for them (Figure 2). Composition in recursively structured objects is built on dominates(x,y), expressing that component y is part of component x; chains of dominates relations are acyclic. On that basis, groups of items are built according to local contexts. The Group to which an item x belongs is the set of items dominated by the same item as x, if such an item exists; otherwise, Group is empty. A special group is the set of items on the top level, the T-group-items, which are all dominated by the entire structure, the root item, which itself is not dominated by any item; these items also form a group. In contrast, the L1-items, which comprise the items one level below the T-group-items, do not all form one group. Intersecting them with the Group predicate yields subsets in which all elements are dominated by one and the same T-group-item (see the definition of L1-group-pref). A central definition is Group-pref (group preference), used for testing the effect of implicatures. It is defined for the set of items relevant within the algorithm (r ∪ C), that is, the intended referents and the remaining distractors, in relation to a Group, in the context of a cardinality N and a position V that apply to the set of items. For that group to be preferred, the relevant items falling into it must match the given cardinality and the position description (see the definition of Position below). On that basis, T-group-pref expresses a preference for top-group items, when bound to Group, and L1-group-pref expresses a preference for such a group with x one level below.</Paragraph>
    <Paragraph position="8"> Figure 3 (partially recovered): procedure MakeReferringExpression with the extensions labeled [#]
...
case Ai of [2]
  cardinality: V ← |r| [3]
  global-position: V ← Position(r, Ctotal, |r|) [4]
  local-position: V ← Position(r, Group(r), |r|) [5]
  other: V = FindBestValue(r, Ai, BasicLevelValue(r, Ai))
end case
if RulesOut(⟨Ai,V⟩, C) ≠ nil then
  if metonymic(Ai,X) and ⟨type,X⟩ ∈ L for some X [6]
...
if ⟨type,X⟩ ∈ L for some X
...
if V = no-value then return nil
else case Ai of [14]
  cardinality: return C ∩ ⋃ Group(c) over c ∈ C, where |Group(c) ∩ C| &lt; V [15]
  global-position: return {x : x ∈ C ∧ Position(x, Ctotal, |r|) ≠ V} [16]
  local-position: return {x : x ∈ C ∧ Position(x, Group(x), |r|) ≠ V}
  other: return {x : x ∈ C ∧ UserKnows(x, ⟨A,V⟩) = false}</Paragraph>
    <Paragraph position="10"> The knowledge representation of objects is enriched by some properties that are not intrinsic to an object itself. These properties comprise the descriptors cardinality and position, and the meta-property metonymic. The predicate metonymic(x,y) expresses the acceptability of a metonymic reference of a descriptor x for a category y (e.g., an operator for a formula, in mathematical domains). The descriptor cardinality fits easily into the standard schema of the procedure. However, it contributes to discrimination from potential distractors only through the effects of implicature. The most complex addition is the descriptor position, which expresses the relative position of an object within a set of comparable objects (e.g., first, second). There are two dimensions along which such descriptors are meaningful in the domain of mathematical formulas and in similar domains with recursively structured objects: (1) the typographical position within the entire object, referred to by the descriptor global-position, and (2) the position within the structural level where the object in question resides, referred to by the descriptor local-position. Moreover, the position also depends on the number of objects considered, if subgroups of objects are built before their position within the entire group is checked (e.g., the first two items).
This information is encapsulated in the function Position(x,y,n), where x denotes the object or set of objects whose position within group y is the value of that function, and subgroups of n objects are formed. To yield a proper result, x must be a subset of y, and the position value within y must be the same for all elements of x; otherwise, the value is undefined. For example, for a group G=&lt;1,2,3,4,5,6&gt;, Position({3},G,1) = 3, Position({3},G,2) = 2, and Position({2,3},G,2) = undefined. In some sense, this handling of positions generalizes the ordering for vague descriptors in (van Deemter 2006). Also in accordance with van Deemter, we separate descriptor selection from surface form determination, yielding, for example, left set for {&lt;type,set&gt;, &lt;local-position,first&gt;} (the first part of an equation) and second occurrence of x for {&lt;type,x&gt;, &lt;local-position,second&gt;}.</Paragraph>
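The Position function can be sketched in Python as follows, assuming a group is given as an ordered sequence of distinct items and that an undefined result is represented by None (both representation choices are ours). It reproduces the three values of the example above.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Optional


def position(x: set, group: Sequence, n: int) -> Optional[int]:
    """1-based position of the item set x within `group`, after `group` is split
    into consecutive subgroups of n items; None when the value is undefined."""
    if not x or not x.issubset(group):
        return None  # x must be a non-empty subset of the group
    # 1-based index of the subgroup that each element of x falls into
    slots = {group.index(item) // n + 1 for item in x}
    # Defined only if all elements of x share one subgroup position.
    return slots.pop() if len(slots) == 1 else None


G = [1, 2, 3, 4, 5, 6]
print(position({3}, G, 1), position({3}, G, 2), position({2, 3}, G, 2))
# 3 2 None
```

The surface form is chosen separately from this value; a local position of 1 among the two sides of an equation, for instance, may be realized as left.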
    <Paragraph position="11">  To process these enhanced representations adequately, we have incorporated modifications into the procedure MakeReferringExpression (labeled by [#] in Figure 3). First, the original set of potential distractors is stored for computations within a global context [1]. Then the value for the attribute currently considered is selected [2]; this differs from the usual call to FindBestValue for cardinality [3], global-position [4], and local-position [5], the latter two being realized by the function Position with appropriate instantiations of the group parameter. Next, the inclusion of metonymic properties in the description is addressed. If the metonymic descriptor fits the object category [6], and its discriminatory power [7] dominates that of the type descriptor [8], the descriptor values are conflated by overwriting the type value with that of the metonymic descriptor [9]. The two calls to RulesOut involved in this test ([7] and [8]) are the only calls in which effects on the original, entire set of distractors are tested. Therefore, the parameter C is added to the definition of RulesOut [13] and to all other places where that procedure is called [10], [12]. As with the inclusion of attribute-value pairs in the description, the exclusion tests in RulesOut are specific to non-intrinsic attributes [14]. For cardinality, distractors are excluded that belong to a group in which the number of still relevant distractors (those consistent with the partial description built so far) is below that cardinality [15]. Similarly, for position values, those distractors are ruled out whose values returned by the function Position, relative to the relevant scope (the group to which the intended referents belong), are inconsistent with the value of the attribute considered (global-position or local-position) [16].
Finally, the termination criterion [11] is enhanced to take into account the effect of implicatures through cardinality and position descriptors, by means of the function Preference-by-implicature [17]. In this function, the values of cardinality and of global-position or local-position are instantiated, provided they appear in the description L [18]. The return value is the result of a test of whether there is a preference for the top level, or for the level-1 group that contains the intended referents [19].</Paragraph>
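Two of the implicature-related pieces, the cardinality exclusion in RulesOut [15] and the group-preference test of Preference-by-implicature [17], [19], can be sketched in Python under simplifying assumptions of our own: the structure is a dictionary from each item to the ordered items it dominates, only the cardinality part of Group-pref is checked (position would be tested analogously with a Position function), and all function and item names are hypothetical. The item names encode example (8).

```python
from __future__ import annotations

Tree = dict[str, list[str]]  # item -> ordered list of items it dominates


def group(x: str, tree: Tree) -> set[str]:
    """Group(x): the items dominated by the same item as x (empty for the root)."""
    for kids in tree.values():
        if x in kids:
            return set(kids)
    return set()


def rules_out_cardinality(distractors: set[str], v: int, tree: Tree) -> set[str]:
    """[15]: exclude distractors whose group holds fewer than v distractors."""
    ruled_out: set[str] = set()
    for c in distractors:
        in_group = group(c, tree).intersection(distractors)
        if v > len(in_group):
            ruled_out |= in_group
    return ruled_out


def group_pref(relevant: set[str], grp: set[str], n: int) -> bool:
    """Cardinality part of Group-pref: exactly n relevant items in the group."""
    return len(grp.intersection(relevant)) == n


def preference_by_implicature(root: str, r: set[str], distractors: set[str],
                              n: int, tree: Tree) -> bool:
    relevant = r | distractors  # intended referents plus remaining distractors
    top = set(tree.get(root, []))
    if r.issubset(top):  # T-group-pref
        return group_pref(relevant, top, n)
    for item in top:  # L1-group-pref: the level-1 group holding the referents
        l1 = set(tree.get(item, []))
        if r.issubset(l1):
            return group_pref(relevant, l1, n)
    return False


# Example (8): eq dominates the two sides; KAB and KCD are the referents.
tree: Tree = {
    "eq": ["K_left", "right"],
    "K_left": ["inner"],
    "right": ["KAB", "KCD"],
}
print(rules_out_cardinality({"K_left", "KAB", "KCD"}, 2, tree))  # {'K_left'}
print(preference_by_implicature("eq", {"KAB", "KCD"}, {"K_left"}, 2, tree))  # True
```

The second call succeeds even though the complement on the left side is still a distractor, because the level-1 group under the right side contains exactly two relevant items.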
  </Section>
  <Section position="7" start_page="52" end_page="53" type="metho">
    <SectionTitle>
4 Examples
</SectionTitle>
    <Paragraph position="0"> In this section, we illustrate how particularities of our application domain are modeled and how the procedure behaves in generating the referring expressions observed in our corpus.</Paragraph>
    <Paragraph position="1"> The ordered list of attributes, P, consists of &lt;type, form, cardinality, global-order, local-order&gt; for atomic items and of &lt;type, operator, cardinality, local-order, dominated-by&gt; for the composed expressions -- dominated-by is the inverse of dominates. The meta-predicate metonymic is instantiated for pairs &lt;variable, form&gt;, &lt;expression, local-order&gt;, and &lt;term, operator&gt; for producing expressions such as x referring to variable x, left side referring to the left part of an assertion or equation, and complement referring to a term with complement as top level operator.</Paragraph>
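The metonymic conflation step ([6] to [9] in Figure 3) can be sketched in Python for the pairs listed above. Two assumptions are ours: an attribute's discriminatory power is taken to dominate that of the type when it rules out a superset of the items the type rules out over the full item set, and the item attributes below are a hypothetical encoding of example (8).

```python
from __future__ import annotations

from typing import Any

KB = dict[str, dict[str, Any]]  # item -> attribute -> value

# (category, descriptor) pairs for which metonymic(descriptor, category) holds
METONYMIC = {("variable", "form"), ("expression", "local-order"), ("term", "operator")}


def rules_out(attr: str, value: Any, items: set[str], kb: KB) -> set[str]:
    return {i for i in items if kb[i].get(attr) != value}


def add_value(description: list[tuple[str, Any]], attr: str, value: Any,
              all_items: set[str], kb: KB) -> list[tuple[str, Any]]:
    """Add (attr, value); if attr is metonymic for the current type and is at
    least as discriminating over all items, overwrite the type value instead."""
    types = [v for a, v in description if a == "type"]
    if types and (types[0], attr) in METONYMIC:
        by_type = rules_out("type", types[0], all_items, kb)  # cf. [8]
        by_attr = rules_out(attr, value, all_items, kb)  # cf. [7]
        if by_attr.issuperset(by_type):
            return [(a, value if a == "type" else v) for a, v in description]  # [9]
    return description + [(attr, value)]


kb: KB = {
    "K_left": {"type": "term", "operator": "complement"},
    "KAB": {"type": "term", "operator": "complement"},
    "KCD": {"type": "term", "operator": "complement"},
    "right": {"type": "term", "operator": "[?]"},
    "A": {"type": "variable", "form": "A"},
}
items = set(kb)
desc = add_value([("type", "term")], "operator", "complement", items, kb)
print(desc)  # [('type', 'complement')]
print(add_value(desc, "cardinality", 2, items, kb))  # [('type', 'complement'), ('cardinality', 2)]
```

Realized with the cardinality 2, the resulting description corresponds to both complements in (8).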
    <Paragraph position="2"> We show the generation of two examples.</Paragraph>
    <Paragraph position="3"> Example 1: left set in (7) in Figure 1.</Paragraph>
    <Paragraph position="4"> It is generated by choosing set as the type, followed by unsuccessful attempts to pick an operator attribute (none is defined for that set) and a cardinality (which yields no discrimination). Then first is chosen for local-order, yielding unique identification (the embedding is left implicit), and this value is realized as left on the surface.</Paragraph>
    <Paragraph position="5"> Example 2: both complements in (8).</Paragraph>
    <Paragraph position="6"> It is generated by choosing term as the type, followed by complement as the operator, which overwrites term due to its specification as metonymic with respect to that category. Then 2 is chosen for cardinality, which yields unique identification since a subgroup preference for level one is present.</Paragraph>
    <Paragraph position="7"> Altogether, the algorithm is able to generate the expressions occurring in our corpus, or quite similar ones, assisted by the application-specific list P. The exceptions are references to regions related to some formula component, such as (3) in Figure 1, effects of scope interference across several referring expressions, such as (9), and expressions involving vague region descriptors, such as (10) and (11). While the last group of examples involves more than referring expressions, the first two can be handled, although the generated expressions are typically somewhat cumbersome, such as the third term in the condition of the set instead of after the last 'and'-operation in (3), and both sets on the left side instead of simply both sets in (9).</Paragraph>
  </Section>
</Paper>