<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-1084">
  <Title>Bridging Textual Ellipses</Title>
  <Section position="3" start_page="496" end_page="496" type="metho">
    <SectionTitle>
2 Constraints on Conceptual Linkage
</SectionTitle>
    <Paragraph position="0"> This section provides a highly condensed exposition of the conceptual constraints underlying the resolution of textual ellipses. A much more detailed presentation is given by Hahn et al. (1996). The constraints we posit require a domain knowledge base to consist of concepts and conceptual roles linking these concepts. Concepts and roles are hierarchically ordered by subsumption (a terminological knowledge representation framework is assumed; cf. Woods &amp; Schmolze (1992)).</Paragraph>
    <Paragraph position="1"> In order to determine suitable conceptual links between an antecedent and an elliptic expression, we distinguish two modes of constraining the linkage between concepts via conceptual roles. In the process of path finding, an extensive unidirectional search is performed in the domain knowledge base, and formal well-formedness conditions holding for paths between two concepts are considered, viz. complete connectivity (compatibility of domains and ranges of the included relations), non-cyclicity (exclusion of inverses of relations) and non-redundancy (exclusion of including paths).</Paragraph>
    <Paragraph position="2"> The mode of path evaluation incorporates empirically plausible criteria in order to select the strongest of the ensuing paths. Based on analyses of approximately 60 product reviews from the information technology domain and experimental evidence from several (psycho)linguistic studies (e.g., Chaffin (1992)), we state certain predefined path patterns. From those general path patterns and by virtue of the hierarchical organization of conceptual relations, concrete conceptual role chains can automatically be derived in a terminological reasoning system. As a consequence, we may distinguish between a subset of all types of well-formed paths, which is labeled &quot;plausible&quot;, another subset which is labeled &quot;metonymic&quot;, and all remaining paths which are labeled &quot;implausible&quot;. Examples of plausible paths are all paths of length 1 -- they are explicitly encoded in the domain's concept descriptions and are therefore &quot;plausible&quot; by definition -- or any series of transitive relations, e.g. has-physical-part relations. Following well-established typologies of metonymies (Lakoff, 1987), we include producer-for-product, part-for-whole, and whole-for-part patterns among the metonymic paths.</Paragraph>
    <Paragraph position="3"> The computation of paths between an antecedent x and an elliptic expression y, however, may yield several types of well-formed paths, viz. &quot;plausible&quot;, &quot;metonymic&quot; or &quot;implausible&quot;. For proper selection we define a ranking on those path labels according to their intrinsic conceptual strength in terms of the relation &quot;$&gt;_{c\text{-}str}$&quot; (conceptually stronger than) (cf. Table 1).</Paragraph>
    <Paragraph position="4"> As a consequence of this ordering, &quot;metonymic&quot; paths will be excluded from a path list iff &quot;plausible&quot; paths already exist, while &quot;implausible&quot; paths will be excluded iff &quot;plausible&quot; or &quot;metonymic&quot; paths already exist. At the end of this selection process, only paths of the strongest type are retained in the final path list. Table 1: &quot;plausible&quot; $&gt;_{c\text{-}str}$ &quot;metonymic&quot; $&gt;_{c\text{-}str}$ &quot;implausible&quot;</Paragraph>
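    <Paragraph> The selection step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three labels come from the text, but the numeric encoding of the ordering, the function name retain_strongest and the representation of a path as a (role chain, label) pair are hypothetical, and the second role chain in the example is invented.

```python
# Rank of each path label; a larger number means conceptually stronger.
# The labels are from the text, the numeric encoding is an assumption.
STRENGTH = {"plausible": 2, "metonymic": 1, "implausible": 0}


def retain_strongest(paths):
    """Keep only the paths whose label is the strongest one present.

    `paths` is a list of (role_chain, label) pairs (hypothetical encoding).
    """
    if not paths:
        return []
    best = max(STRENGTH[label] for _, label in paths)
    return [(chain, label) for chain, label in paths if STRENGTH[label] == best]


# Hypothetical candidate paths between two concepts.
candidates = [
    (("role-a", "role-b"), "implausible"),   # invented role chain
    (("has-physical-part",), "plausible"),   # length-1 path, plausible by definition
]
final_list = retain_strongest(candidates)
```

    With these inputs only the plausible path survives, so every path left in the final list carries the same label.</Paragraph>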
    <Paragraph position="5"> All conceptual paths which meet the above linkage criteria for two concepts, x and y, are contained in the final list denoted by $CP_{x,y}$. As, in the case of textual ellipsis, we have to deal with paths leading from the elliptical expression to several alternative antecedents, we usually have to compare pairs of path lists $CP_{x,y}$ and $CP_{x,z}$, where x, y, z are concepts. Obviously, the criterion which ranks conceptual paths according to their associated path markers is applicable, as all paths in a single CP list have the same marker.</Paragraph>
    <Paragraph position="6"> A function, PathMarker($CP_{i,j}$), yields either &quot;plausible&quot;, &quot;metonymic&quot; or &quot;implausible&quot; depending on the type of paths the list contains. Hence, the same ordering of path markers as in Table 1 can be applied to compare two CP lists (cf. Table 2).</Paragraph>
    <Paragraph position="7"> Table 2: $[\ldots] :\Leftrightarrow \mathrm{PathMarker}(CP_{x,y}) &gt;_{c\text{-}str} \mathrm{PathMarker}(CP_{x,z})$; $\mathrm{asStrongAs}(CP_{x,y}, CP_{x,z}) :\Leftrightarrow \mathrm{PathMarker}(CP_{x,y}) = \mathrm{PathMarker}(CP_{x,z})$</Paragraph>
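    <Paragraph> A minimal sketch of how two CP lists might be compared through their path markers. The function names path_marker, stronger_than and as_strong_as, the numeric ranks and the list encoding are assumptions made for this illustration; the two example lists use the role chains that the paper later gives for CHARGE-TIME.

```python
# Numeric ranks for the marker ordering; the encoding is an assumption.
STRENGTH = {"plausible": 2, "metonymic": 1, "implausible": 0}


def path_marker(cp_list):
    """Return the single marker shared by all paths of a final CP list."""
    markers = {label for _, label in cp_list}
    if len(markers) != 1:
        raise ValueError("a final CP list must carry exactly one marker")
    return markers.pop()


def stronger_than(cp_xy, cp_xz):
    """True if the first list has the conceptually stronger marker."""
    return STRENGTH[path_marker(cp_xy)] > STRENGTH[path_marker(cp_xz)]


def as_strong_as(cp_xy, cp_xz):
    """True if both lists carry the same marker."""
    return path_marker(cp_xy) == path_marker(cp_xz)


# Paths from CHARGE-TIME to ACCUMULATOR and to 316LT, as described in Section 5.
cp_accumulator = [(("charge-time-of",), "plausible")]
cp_316lt = [(("charge-time-of", "accumulator-of"), "metonymic")]
accumulator_wins = stronger_than(cp_accumulator, cp_316lt)
```

    Because a final CP list is homogeneous, comparing two lists reduces to comparing two labels.</Paragraph>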
  </Section>
  <Section position="4" start_page="496" end_page="497" type="metho">
    <SectionTitle>
3 Constraints on Centers
</SectionTitle>
    <Paragraph position="0"> Conceptual criteria are of tremendous importance, but they are not sufficient for proper ellipsis resolution.</Paragraph>
    <Paragraph position="1"> Additional criteria have to be supplied in the case of equal strength of CP lists for alternative antecedents.</Paragraph>
    <Paragraph position="2"> We therefore incorporate into our model criteria which relate to the functional information structure of utterances using the methodological framework of the well-known centering model (Grosz et al., 1995). Accordingly, we distinguish each utterance's backward-looking center ($C_b(U_n)$) and its forward-looking centers ($C_f(U_n)$). The ranking imposed on the elements of the $C_f$ reflects the assumption that the most highly ranked element of $C_f(U_n)$ is the most preferred antecedent of an anaphoric (or elliptical) expression in $U_{n+1}$, while the remaining elements are ordered by decreasing preference for establishing referential links.</Paragraph>
    <Paragraph position="3"> The main difference between Grosz et al.'s work and our proposal (see Strube &amp; Hahn (1996)) concerns the criteria for ranking the forward-looking centers. While Grosz et al. assume that grammatical roles are the major determinant for the ranking on the $C_f$, we claim that for languages with relatively free word order (such as German), it is the functional information structure (IS) of the utterance in terms of the context-boundedness or unboundedness of discourse elements. The centering data structures and the notion of context-boundedness can be used to redefine Daneš' (1974) trichotomy between given information, theme and new information (or rheme). The $C_b(U_n)$, the most highly ranked element of $C_f(U_{n-1})$ realized in $U_n$, corresponds to the element which represents the given information. The theme of $U_n$ is represented by the preferred center $C_p(U_n)$, the most highly ranked element of $C_f(U_n)$. The theme/rheme hierarchy of $U_n$ is determined by the $C_f(U_{n-1})$: the most rhematic elements of $U_n$ are the ones not contained in $C_f(U_{n-1})$ (unbound discourse elements); they express the new information in $U_n$. The distinction between context-bound and unbound elements is important for the ranking on the $C_f$, since bound elements are generally ranked higher than any other non-anaphoric elements (cf. also Hajičová et al. (1992)).</Paragraph>
    <Paragraph position="4"> Table 3: bound element(s) $&gt;_{IS_{base}}$ unbound element(s); anaphora $&gt;_{IS_{bound}}$ elliptical antecedent $&gt;_{IS_{bound}}$ elliptical expression; nom head$_1$ $&gt;_{prec}$ nom head$_2$ $&gt;_{prec}$ ... $&gt;_{prec}$ nom head$_n$. The constraints holding for the ranking on the $C_f$ for German are summarized in Table 3. They are organized at three layers. At the top, &quot;$&gt;_{IS_{base}}$&quot; denotes the basic relation for the overall ranking of information structure (IS) patterns. The second relation, &quot;$&gt;_{IS_{bound}}$&quot;, denotes preference relations dealing exclusively with multiple occurrences of bound elements in the preceding utterance. Finally, &quot;$&gt;_{prec}$&quot; covers the preference order for multiple occurrences of the same type of any information structure pattern, e.g., the occurrence of two anaphora or two unbound elements (all nominal heads in an utterance are ordered by linear precedence in terms of their text position). Given these basic relations, we may formulate the composite relation &quot;$&gt;_{IS}$&quot; (Table 4). It summarizes the criteria for the ordering of the items on the $C_f$ (x and y denote lexical heads).</Paragraph>
    <Paragraph position="5"> Table 4: $&gt;_{IS}$ := { (x, y) | if x and y represent the same type of IS pattern then the relation $&gt;_{prec}$ applies to x and y, else if x and y represent different forms of bound elements then the relation $&gt;_{IS_{bound}}$ applies to x and y, else the relation $&gt;_{IS_{base}}$ applies to x and y }</Paragraph>
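    <Paragraph> The three-layer ordering can be read as a single sort key, sketched below in Python. The class Head, the kind labels and the function names are hypothetical and chosen only for this illustration; the sketch assumes that every lexical head has already been classified as one of the three bound forms or as unbound.

```python
from dataclasses import dataclass

# Order among bound elements (second layer); smaller means more preferred.
BOUND_RANK = {"anaphor": 0, "elliptical_antecedent": 1, "elliptical_expression": 2}


@dataclass(frozen=True)
class Head:
    word: str
    kind: str       # a key of BOUND_RANK, or "unbound"
    position: int   # text position of the nominal head in the utterance


def is_key(head):
    """Sort key: bound before unbound, then bound form, then linear precedence."""
    return (BOUND_RANK.get(head.kind, len(BOUND_RANK)), head.position)


def rank_cf(heads):
    """Return the heads ordered as a forward-looking center list."""
    return sorted(heads, key=is_key)


# Hypothetical utterance with three nominal heads.
heads = [
    Head("A", "unbound", 1),
    Head("B", "elliptical_expression", 2),
    Head("C", "anaphor", 3),
]
ranking = [h.word for h in rank_cf(heads)]
```

    Two heads of the same kind share the first key component, so their order falls back to text position, which mirrors the first case of the composite relation.</Paragraph>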
  </Section>
  <Section position="5" start_page="497" end_page="497" type="metho">
    <SectionTitle>
4 Predicates for Textual Ellipsis
</SectionTitle>
    <Paragraph position="0"> The grammar formalism we use (for a survey, cf. Hahn et al. (1994)) is based on dependency relations between lexical heads and modifiers.</Paragraph>
    <Paragraph position="1"> The dependency specifications allow a tight integration of linguistic (grammar) and conceptual knowledge (domain model), thus making powerful terminological reasoning facilities directly available for the parsing process [footnote 1]. The resolution of textual ellipses is based on two major criteria, a conceptual and a structural one. The conceptual strength criterion for role chains is already specified in Table 2. The structural condition is embodied in the predicate isPotentialEllipticAntecedent (cf. Table 5). A quasi-anaphoric relation between two lexical items in terms of textual ellipsis is here restricted to pairs of nouns. The elliptical phrase which occurs in the n-th utterance is restricted to be a definite NP and the antecedent must be one of the forward-looking centers of the preceding utterance.</Paragraph>
    <Paragraph position="2"> Table 5: $\mathrm{isPotentialEllipticAntecedent}(y, x, n) :\Leftrightarrow y\ \mathrm{isa}_C^{*}\ \mathrm{Nominal} \land x\ \mathrm{isa}_C^{*}\ \mathrm{Noun} \land \exists z: (x\ \mathrm{head}\ z \land z\ \mathrm{isa}_C^{*}\ \mathrm{DetDefinite}) \land x \in U_n \land y.r \in C_f(U_{n-1})$. The predicate PreferredConceptualBridge (cf. Table 6) combines both criteria. A lexical item y is determined as the proper antecedent of the elliptic expression x iff it is a potential antecedent and if there exists no alternative antecedent z whose conceptual strength relative to x exceeds that of y or, if their conceptual strength is equal, whose strength of preference under the IS relation is higher than that of y.</Paragraph>
    <Paragraph position="4"/>
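    <Paragraph> The prose definition of PreferredConceptualBridge can be sketched as a selection over the ranked candidates. This is a hedged reading of the description above, not the authors' formal definition: the function name preferred_bridge, the numeric ranks and the convention that None marks a candidate without any well-formed path are assumptions, as is the premise that the candidates are already filtered by isPotentialEllipticAntecedent and listed in $C_f$ order.

```python
# Numeric ranks for the marker ordering; the encoding is an assumption.
STRENGTH = {"plausible": 2, "metonymic": 1, "implausible": 0}


def preferred_bridge(ranked_candidates):
    """Pick the antecedent with the strongest path marker.

    `ranked_candidates` lists (antecedent, marker) pairs in Cf order, most
    preferred first; marker is None when no well-formed path exists.
    Ties in conceptual strength are resolved in favour of the earlier,
    i.e. more highly ranked, candidate.
    """
    best = None
    for antecedent, marker in ranked_candidates:
        if marker is None:
            continue
        # Strict comparison keeps the earlier candidate on equal strength.
        if best is None or STRENGTH[marker] > STRENGTH[best[1]]:
            best = (antecedent, marker)
    return best[0] if best else None


# Candidates for "Ladezeit" as discussed in Section 5; treating the last two
# as having no well-formed path is an assumption of this example.
candidates = [
    ("316LT", "metonymic"),
    ("ACCUMULATOR", "plausible"),
    ("TIME-UNIT-PAIR", None),
    ("POWER", None),
]
chosen = preferred_bridge(candidates)
```

    Under these assumptions ACCUMULATOR is chosen even though 316LT is ranked higher in the center list, because conceptual strength is compared before the IS ranking is consulted.</Paragraph>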
  </Section>
  <Section position="6" start_page="497" end_page="499" type="metho">
    <SectionTitle>
5 Resolution of Textual Ellipsis
</SectionTitle>
    <Paragraph position="0"> The actor computation model (Agha &amp; Hewitt, 1987) provides the background for the procedural interpretation of lexicalized grammar specifications, as those given in the previous section, in terms of so-called word actors. Word actors communicate via asynchronous message passing; an actor can only send messages to other actors it knows about, its so-called acquaintances. The arrival of a message at an actor triggers the execution of a method, a program composed of grammatical predicates.</Paragraph>
    <Paragraph position="1"> [Footnote 1] We assume the following conventions to hold: $C$ = {Word, Nominal, Noun, PronPersonal, ...} denotes the set of word classes, and $\mathrm{isa}_C$ = {(Nominal, Word), (Noun, Nominal), (PronPersonal, Nominal), ...} $\subset C \times C$ denotes the sub-class relation which yields a hierarchical ordering among these classes. Furthermore, object.r refers to the instance in the text knowledge base denoted by the linguistic item object, and object.c refers to the corresponding concept class c. Head denotes a structural relation within dependency trees, viz. x being the head of modifier y.</Paragraph>
    <Paragraph position="2"> The resolution of textual ellipses depends on the results of the foregoing resolution of nominal anaphors (Strube &amp; Hahn, 1995) and the termination of the semantic interpretation of the current utterance. It will only be triggered at the occurrence of the definite noun phrase NP when NP is not a nominal anaphor and (the referent of the) NP is only connected via certain types of relations (e.g., has-property, has-physical-part) [footnote 2] to referents denoted in the current utterance at the conceptual level.</Paragraph>
    <Paragraph position="3"> The protocol level of text analysis encompasses the procedural interpretation of the grammatical constraints from Section 4. We will illustrate the protocol for text ellipsis resolution (cf. Fig. 1), referring to the already introduced text fragment (1) which is repeated at the bottom line of Fig. 1.</Paragraph>
    <Paragraph position="4"> (1c) contains the definite noun phrase &quot;die Ladezeit&quot;. Since &quot;Ladezeit&quot; (charge time) does not subsume any word at the conceptual level in the preceding text, the anaphora test fails; the definite noun phrase &quot;die Ladezeit&quot; has also not been integrated in terms of a significant relation into the conceptual representation of the utterance as a result of its semantic interpretation. Consequently, a SearchTextEllipsisAntecedent message is created by the word actor for &quot;Ladezeit&quot;. Message passing consists of two phases: 1. In phase 1, the message is forwarded from its initiator &quot;Ladezeit&quot; to the forward-looking centers of the previous sentence (an acquaintance of that sentence's punctuation mark), where its state is set to phase 2. 2. In phase 2, the forward-looking centers of the previous sentence are tested for the predicate PreferredConceptualBridge.</Paragraph>
    <Paragraph position="5"> [Footnote 2] Associated with the set of conceptual roles is the set of their inverses. This distinction becomes crucial for already established relations like has-property (subsuming charge-time, etc.) or has-physical-part (subsuming has-accumulator, etc.). These relations do not block the triggering of the resolution procedure for textual ellipsis (e.g., ACCUMULATOR - charge-time - CHARGE-TIME), whereas instantiations of their inverses, which we here refer to as POF-type relations, e.g., property-of (subsuming charge-time-of, etc.) and physical-part-of (subsuming accumulator-of, etc.), do (e.g., ACCUMULATOR - accumulator-of - 316LT). This is simply due to the fact that the semantic interpretation of a phrase like &quot;the charge time of the accumulator&quot; already leads to the creation of the POF-type relation the resolution mechanism for textual ellipsis is supposed to determine. This is opposed to the interpretation of its elliptified counterpart &quot;the charge time&quot; in sentence (1c), where the genitive object &quot;[of the accumulator]&quot; is zeroed and, thus, the role charge-time-of remains uninstantiated.</Paragraph>
    <Paragraph position="6"> In this case, the instance 316LT (the conceptual referent of the nominal anaphor &quot;der Rechner&quot; (the computer), which has already been properly resolved) is related to CHARGE-TIME (the concept denoting &quot;Ladezeit&quot;) via a metonymic path, viz. (charge-time-of accumulator-of), indicating a whole-for-part metonymy, while the concept ACCUMULATOR is related to CHARGE-TIME via a plausible path (viz. charge-time-of). As plausible paths are the strongest type of conceptual paths, only an element which is more highly ranked in the centering list and is linked via a plausible path to the elliptical expression could be preferred as the elliptic antecedent of &quot;Ladezeit&quot; (charge time) over &quot;Akku&quot; (accumulator) according to the constraint from Table 6. As this can be excluded, the remaining concepts associated with the current forward-looking centers (namely, TIME-UNIT-PAIR and POWER) need no longer be considered.</Paragraph>
    <Paragraph position="7"> Hence, &quot;Akku&quot; is determined as the proper elliptical antecedent [footnote 3]. As a consequence, a TextEllipsisAntecedentFound message is sent from &quot;Akku&quot; to the initiator of the SearchAntecedent message, viz. &quot;Ladezeit&quot;.</Paragraph>
    <Paragraph position="8"> An appropriate role filler update links the corresponding concepts via the role charge-time-of and, thus, local coherence is established at the conceptual level of the text knowledge base.</Paragraph>
    <Paragraph position="9"> In order to illustrate our approach under slightly varying conditions, consider text fragment (2): (2) a. Der 316LT geht sparsam mit Energie um.</Paragraph>
    <Paragraph position="11"> b. Netzunabhängig wird er für ca. 2 Stunden mit Strom versorgt.</Paragraph>
    <Paragraph position="12"> (In a power supply-independent mode it is - for approximately 2 hours - with power - provided.) c. Wenn die Taktfrequenz herabgesetzt wird, reicht die Energie sogar für 3 Stunden.</Paragraph>
    <Paragraph position="13"> (When the clock frequency - is reduced - suffices - the power - even for 3 hours.) Here, the elliptical expression &quot;Taktfrequenz&quot; (clock frequency) can tentatively be related to three antecedents in the preceding sentence: &quot;er&quot; (it) (which is an anaphoric expression for &quot;316LT&quot;), &quot;Stunden&quot; (hours), and &quot;Strom&quot; (power). Thus, in the path finding mode, paths from CLOCK-MHZ-PAIR (the conceptual representation for &quot;Taktfrequenz&quot;) to [...] and POWER, respectively, are searched. As only a single well-formed role chain from CLOCK-MHZ-PAIR to 316LT can be determined (viz. (clock-mhz-pair-of cpu-of motherboard-of central-unit-of)), &quot;316LT&quot; is selected as the valid elliptic antecedent. Under these circumstances, conceptual linkage could not be established via a plausible path, but only via a metonymic path, corresponding to a whole-for-part metonymy.</Paragraph>
    <Paragraph position="14"> [Footnote 3, beginning missing] ... corresponding to the SearchTextEllipsisAntecedent message and of being tested as to whether they fulfill the required criteria for an elliptical relation.</Paragraph>
    <Paragraph position="15"> This is due to the fact that &quot;Taktfrequenz&quot; (clock frequency) (conceptualized as CLOCK-MHZ-PAIR) is a property of the CPU of COMPUTER-SYSTEM and, therefore, only a mediated property of computers as a whole (hence, the whole-for-part metonymy).</Paragraph>
    <Paragraph position="16"> Evaluation. A small-scale evaluation experiment was conducted on a test set of 109 occurrences of textual ellipses in 5 different texts taken from our corpus. The evaluation used our knowledge base from the information technology domain, which consists of 449 concepts and 334 relations. Among 46 (42.2%) false negatives (no resolution triggered though textual ellipsis occurs), the ellipsis handler encountered 42 (38.5%) cases of lacking concept specifications (half of which were gaps that can easily be closed, the other half constituted by &quot;soft&quot; concepts (e.g., referring to spatial knowledge) which are hard to get hold of). In 4 of the 46 cases the conceptual model was adequate but the triggering conditions were inappropriate.</Paragraph>
    <Paragraph position="17"> Among the 63 cases where the ellipsis handler started processing, 60 were correctly analyzed (recall rate of 55.05%), 2 modelling bugs were encountered in the knowledge base, and one case was due to incorrect conceptual constraints. Considering the performance of the criteria we propose -- disregarding effects that come from deficient knowledge engineering, i.e. restricting the evaluation to the 63 cases run by the ellipsis handler -- the precision rate amounts to 95.2%.</Paragraph>
    <Paragraph position="18"> With respect to accuracy, however, we still have to consider the actual number of textual ellipses processed including false positives, i.e., cases where the ellipsis resolution is carried out although no textual ellipsis actually occurs. Altogether, the ellipsis handler was triggered 82 times, thus it was incorrectly triggered in 19 cases (23.2%). 12 of these 19 false positives were due to insufficiencies of the parsing process (it failed to create suitable semantic/conceptual representations blocking the triggering of the ellipsis handler). In 4 cases the anaphora resolution process failed to resolve an anaphor, thus leading to an incorrect call of the ellipsis handler, and in the 3 remaining cases the triggering condition was not restrictive enough. This gives an overall accuracy score of 73.2%. [Figure caption fragment: ... for Text Ellipsis Resolution]</Paragraph>
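    <Paragraph> The reported rates can be re-derived from the counts given in the text. The short Python check below uses only those counts; the variable names are chosen for this illustration.

```python
# Counts as reported in the evaluation.
total_ellipses = 109     # occurrences of textual ellipsis in the test set
false_negatives = 46     # ellipses for which no resolution was triggered
correct = 60             # correctly analyzed cases
false_positives = 19     # handler triggered although no ellipsis occurs

handled = total_ellipses - false_negatives    # 63 cases run by the handler
triggered = handled + false_positives         # 82 handler invocations

recall = correct / total_ellipses             # share of all ellipses resolved
precision = correct / handled                 # share of handled cases resolved
accuracy = correct / triggered                # share of all invocations resolved

rates = {
    "recall": round(100 * recall, 2),
    "precision": round(100 * precision, 1),
    "accuracy": round(100 * accuracy, 1),
    "false_positive_share": round(100 * false_positives / triggered, 1),
}
```

    The three scores thus differ only in their denominator: all ellipses in the test set, the cases the handler actually ran on, and all invocations of the handler.</Paragraph>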
  </Section>
  <Section position="7" start_page="499" end_page="500" type="metho">
    <SectionTitle>
6 Comparison with Related Approaches
</SectionTitle>
    <Paragraph position="0"> As far as text-level processing is concerned, the framework of DRT (Kamp &amp; Reyle, 1993), at first sight, constitutes a particularly strong alternative to our approach. The machinery of DRT, however, might work well for (pro)nominal anaphora, but faces problems when elliptical text phenomena are to be interpreted (though Wada (1994) has recently made an attempt to deal with restricted forms of textual ellipsis in the DRT context). This shortcoming is simply due to the fact that DRT is basically a semantic theory, not a full-fledged model for text understanding. In particular, it lacks any systematic connection to well-developed reasoning systems accounting for conceptual domain knowledge. Actually, the sort of constraints we considered seem much more rooted in encyclopedic knowledge than they are of a primarily semantic nature anyway.</Paragraph>
    <Paragraph position="1"> As far as proposals for the analysis of textual ellipsis are concerned, none of the standard grammar theories (e.g., HPSG, LFG, GB, CG, TAG) covers this issue.</Paragraph>
    <Paragraph position="2"> This is not at all surprising, as their advocates pay almost no attention to the text level of linguistic description (with the exception of several forms of anaphora) and also do not seriously take conceptual criteria beyond semantic features into account. Hence their indetermination with respect to conceptually driven inferencing in the context of text understanding.</Paragraph>
    <Paragraph position="3"> Actually, only few systems exist which deal with textual ellipsis in a dedicated way. For example, the PUNDIT system (Palmer et al. (1986)) provides a fairly restricted solution in that only direct conceptual links between the concept denoted by the antecedent and the elliptical expression are considered (&quot;plausible&quot; paths of length 1, in our terminology). A pattern-based approach to inferencing (including textual ellipsis) has also been put forward by Norvig (1989).</Paragraph>
    <Paragraph position="4"> The main difference to our work lies in the fact that these path patterns do not take the compositional properties of relations into account (e.g., transitive relations). Furthermore, numerical constraints like path length restrictions are posited without motivating their origin, whereas we state formal well-formedness and empirical criteria the evidence for which is derived from psycholinguistic studies. The abduction-based approach to inferencing underlying the TACITUS system (Hobbs et al. (1993)) also refers to weights and costs and, thus, shares some similarity with Norvig's proposal (Hobbs et al., 1993, p. 122). Moreover, the crucial problem still unsolved in this logically very principled framework concerns a proper choice methodology for fixing appropriate costs for specific assumptions on which, among other factors, textual ellipsis resolution is primarily based. The approach reported in this paper also extends our own previous work on textual ellipses (Hahn, 1989) by the incorporation of an elaborated model of functional preferences on $C_f$ elements.</Paragraph>
  </Section>
</Paper>