<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1308"> <Title>Sydney, July 2006. (c) 2006 Association for Computational Linguistics Resolution of Referents Groupings in Practical Dialogues</Title> <Section position="4" start_page="0" end_page="54" type="metho"> <SectionTitle> 2 Groupings of Referents </SectionTitle> <Paragraph position="0"> Several kinds of clues can specify that referents should, or at least could, be grouped together. These clues may occur at several language levels, from the noun phrase level to the rhetorical structure level. We have not explored in detail the different ways of grouping entities together in a discourse or dialogue. What is described here are just some of the phenomena we encountered while developing a reference resolution module for a dialogue understanding system.</Paragraph> <Paragraph position="1"> * Explicit Coordination - The most basic way to explicitly express the grouping of two or more referents is to use a connector such as and, or, as well as, etc.</Paragraph> <Paragraph position="2"> &quot;Good afternoon, I would like to book a single room and a double room&quot;</Paragraph> <Section position="1" start_page="0" end_page="54" type="sub_section"> <SectionTitle> * Implicit Sentential Coordination </SectionTitle> <Paragraph position="0"> An implicit coordination occurs when two or more referents of the same kind are present in one sentence, without an explicit connector between them. &quot;Does the hotel de la gare have a restaurant, like the Holiday Inn?&quot; * Implicit Discursive Coordination - Such a coordination occurs when several referents are evoked in separate sentences.</Paragraph> <Paragraph position="1"> The grouping must be done based on rhetorical structuring. Here we consider short pieces of dialogue, admitting only one level of implicit discursive coordination. &quot;I would like an hotel close to the sea... I also need an hotel downtown...
And the hotels have to accept dogs.&quot; * Repetitions/Specifications - In some particular cases, a grouping makes a previous expression explicit. For instance &quot;Two rooms. A single room, a double room&quot;.</Paragraph> </Section> </Section> <Section position="5" start_page="54" end_page="54" type="metho"> <SectionTitle> 3 Reference Domain Theory </SectionTitle> <Paragraph position="0"> We pursue a pragmatic approach to reference resolution in practical multimodal dialogues (Gieselman, 2004). For example, we need to process frequent phenomena like ordinals for choosing from a list (discursive or visual) or otherness when re-evoking old referents. Hence, keeping track of how the context is modified when a referent is introduced or referred to is mandatory. The Reference Domains Theory (Salmon-Alt, 2001) supposes that every act of reference is related to a certain domain of interpretation. It endorses the cognitive grammar concept of domain, defined as a cognitive structure presupposed by the semantics of the expression (Kumar et al., 2003). In other words, a referring expression has to be interpreted in a given domain, highlighting and specifying a particular referent in this domain. A reference domain is composed of a group of entities in the hearer's memory, which can be discursive referents, visual objects, or concepts. It describes how each entity could be addressed through a referential expression.</Paragraph> <Paragraph position="1"> This theory views the referring process as a dynamic extraction of a referent from a domain instead of a binding between two entities (Salmon-Alt, 2000). Hence, performing an act of reference consists in isolating a particular entity from other rejected candidates, amongst all the accessible entities composing the domain (Olson, 1970).</Paragraph> <Paragraph position="2"> This dynamic discrimination relies on projecting an access structure focusing the referent in the domain.
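The extraction view described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names Domain and extract, and the representation of entities as dictionaries, are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical reference domain: a typed group of accessible entities."""
    type_name: str
    entities: list                      # accessible entities, as property dicts
    focused: list = field(default_factory=list)


def extract(domain, criterion):
    """Isolate the single entity satisfying `criterion`; reject the others.

    Returns (referent, rejected). Raises ValueError when the criterion
    does not single out exactly one entity (no match, or ambiguity).
    """
    matches = [e for e in domain.entities if criterion(e)]
    if len(matches) != 1:
        raise ValueError("criterion does not isolate exactly one entity")
    referent = matches[0]
    rejected = [e for e in domain.entities if e is not referent]
    domain.focused = [referent]         # the referent is now focused in its domain
    return referent, rejected


# Illustrative data only (assumed, not taken from a corpus).
hotels = Domain("Hotel", [{"id": "h1", "name": "Ibis"},
                          {"id": "h2", "name": "Lafayette"}])
referent, rejected = extract(hotels, lambda e: e["name"] == "Ibis")
print(referent["id"], [e["id"] for e in rejected])
```

In this sketch a reference is a selection inside a domain, so the rejected candidates stay available for later expressions instead of being discarded.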
The domain then becomes salient for further interpretations. The preferences for choosing a suitable domain are inspired by Relevance Theory (Sperber &amp; Wilson, 1986), taking into account such focalization and salience.</Paragraph> <Paragraph position="3"> Landragin &amp; Romary (2003) have also studied the usage of reference domains in order to model a visual scene. The grouping factors for visual objects are those given by the Gestalt theory: proximity, similarity, and good continuation.</Paragraph> <Paragraph position="4"> Each perceptual group, or group designated by a gesture, could be the base domain for an extraction. Referential expressions work the same way whether the domains are discursive, perceptual or gestural: they extract and highlight referents in these domains. See (Landragin et al., 2001) for a review of perceptual groupings.</Paragraph> </Section> <Section position="6" start_page="54" end_page="55" type="metho"> <SectionTitle> 4 Basic Type </SectionTitle> <Paragraph position="0"> A referential domain is defined by: * a set of entities accessible through this domain (ground of domain), * a description subsuming the description of all these entities (type of domain), * a set of access structures to these entities.</Paragraph> <Paragraph position="1"> For instance: &quot;the Ibis hotel (h1) and the hotel Lafayette (h2)&quot; forms a referential domain, whose type would be Hotel, and whose accessible entities would be h1 and h2, themselves defined as domains of type Hotel. These two hotels could be accessed later on by their names.</Paragraph> <Section position="1" start_page="54" end_page="54" type="sub_section"> <SectionTitle> 4.1 Access structures </SectionTitle> <Paragraph position="0"> We suppose that distinguishing the referent from the excluded alternatives requires highlighting a discrimination criterion that opposes them.
This criterion behaves like a partition of the accessible entities, grouping them together according to their similarities and their differences. A partition may have one of its parts focused. There are at least three kinds of discrimination criteria: * discrimination on description. Entities can be discriminated by their type, their properties, or by the relations they have with other entities. For example, the name of the hotel is a discrimination criterion in &quot;the Ibis hotel and the hotel Lafayette&quot;.</Paragraph> <Paragraph position="1"> * discrimination on focus. Entities can also be discriminated by the focus they have when they are mentioned in the discourse or designated by a gesture. For example, &quot;this room&quot; would select a focused referent in a domain, whereas &quot;the other room&quot; would select a non-focused one.</Paragraph> <Paragraph position="2"> * discrimination on time of occurrence.</Paragraph> <Paragraph position="3"> Entities can finally be discriminated by their order of occurrence in the discourse. For example, &quot;the second hotel&quot; would discriminate this hotel by its rank in the domain.</Paragraph> </Section> <Section position="2" start_page="54" end_page="55" type="sub_section"> <SectionTitle> 4.2 Classical resolution algorithm </SectionTitle> <Paragraph position="0"> Each activated domain belongs to a list of domains ordered by recency (the referential space). The resolution algorithm consists of two phases: 1. Searching for a suitable, preferred domain in the referential space when interpreting a referring expression. The suitability is defined by the minimal conditions the domain has to conform to in order to be the base of an interpretation (particular description, or presence of a particular access structure with focus or not).
The main preference factor is the minimization of the access cost (recency or salience); other criteria, such as thematic structure, could be taken into account and are left for future work. Each domain is tested according to the constraints given by the referential expression. We allow several layers of constraints for each type of expression: if the stronger constraints are not met, then weaker constraints are tried.</Paragraph> <Paragraph position="1"> 2. Extracting a referent and restructuring the referential space, taking into account this extraction. It not only focuses the referent in its domain, but also moves the domain itself to a more recent place. When one referent acquires the focus, the alternative members of the same partition lose it.</Paragraph> <Paragraph position="2"> This generic scheme is instantiated for each type of access mode (a modality plus an expression). For example, a definite &quot;the N&quot; will search for a domain in which a particular entity of type &quot;N&quot; can be discriminated, and the restructuring consists in focalizing in this domain the referent found. See (Landragin &amp; Romary, 2003) for a description of the different access modes.</Paragraph> <Paragraph position="3"> The algorithm highlights the two types of ambiguities, domain or referent ambiguities, which occur when there is no preference available to make a choice between multiple entities in the first or the second phase. We assume that natural ambiguities should eventually be resolved through the dialogue between the agents of the communication.</Paragraph> </Section> </Section> <Section position="7" start_page="55" end_page="56" type="metho"> <SectionTitle> 5 Super-Domains </SectionTitle> <Paragraph position="0"> In order to take groupings into account in the Reference Domains Theory, we introduce two constructs in our formal toolbox.
Indeed, having only one kind of domain construct does not allow for a correct distinction between different referent statuses.</Paragraph> <Paragraph position="1"> First, we distinguish plural and simple domains. The simple domains D serve as bases for profiling, or highlighting, a subpart or related part of a simple referent. For instance, if D = Room, then one can profile a Price from D. The plural domains D* serve either as a generic base or as a plural representative for profiling a simple domain D. A generic base is mandatory in our model to support the insertion of new extra-linguistic referents evoked with an indefinite construct (for instance &quot;I saw a black bird on the roof&quot;), while plural representatives are used for explicit groupings. A domain D*1 can also be profiled from a D*0, provided D*1 profiles a subset of the elements of D*0.</Paragraph> <Paragraph position="2"> Second, we introduce the notion of super-domain D+, from which a D* can be profiled. The relations allowed between domains are represented in Figure 1. A super-domain D+ is the domain of all groupings D*, including a special D*all grouping which is the representative of all evoked instances of a given category. This configuration is not intended to deal with long dialogues where several trans-sentential groupings occur, and where older groupings may become inaccessible. Doing this would require a rhetorically driven structuring of the D*all.</Paragraph> <Paragraph position="3"> As Reference Domain Theory is primarily targeted toward extra-linguistic referents occurring in practical dialogue, the construction of the domain trees, representing the supposed structuring of referent accessibility, is based on an ontology.
As a consequence, for each &quot;natural&quot; type and each subtype (for instance Room[?]Single), a domain tree is potentially created (actually, one can easily imagine how this creation may be driven 'on-demand').</Paragraph> <Paragraph position="4"> Another evolution from the initial Reference Domain Theory is the possibility of focalizing several items of a partition. Indeed, since the resolution algorithm can focalize a whole plural domain, all elements of this domain must be focalized in all the plural domains they occur in. In order to refer to plural entities, the idea is to build plural domains dynamically: when some sentence-level grouping, either implicit or explicit, occurs or when a plural extra-linguistic referent is evoked, a D* is created and focused in D+, with each of its components as children, when possible (that is, when each component is described). When new extra-linguistic referents (singular or plural) are evoked, they are individually profiled under the D*all corresponding to their types (that is, their &quot;natural&quot; type, and all the subtypes they are eligible for).</Paragraph> <Paragraph position="5"> In short, for all referents of type D: * they become subdomains of D</Paragraph> <Paragraph position="7"> up a focalized subdomain of D+ * all the referents of a given type are then grouped together under a new focalized subdomain of D+.</Paragraph> <Paragraph position="8"> Figure 2 illustrates the state of the Hotel+ domain tree after a scenario with three dialogue acts, the first one introducing Hotel1, the second one inserting a grouping of Hotel2 and Hotel3. is retrieved.</Paragraph> <Paragraph position="9"> One can see that Hotel*all is not accessible through a generic expression, such as a demonstrative without modifiers, but only through a special expression like &quot;all the hotels&quot;.
In our view, the reason is that the grouping Hotel*1 lowers the salience of Hotel*all.</Paragraph> </Section> <Section position="8" start_page="56" end_page="57" type="metho"> <SectionTitle> 6 Implementation </SectionTitle> <Paragraph position="0"> We used description logics for modelling domains and domain-reasoning. One has to deal with plural entities and can follow (Franconi, 93) by using collection theory, representing collections as individuals and membership by a role (plus plural quantifiers). However, we had to take another route, since the inference engine we use, Racer (Haarslev and Moller, 03), does not support ALCS. Hence we tried representing the domains by concepts, given that their semantics are sets of individuals. The domain D+ corresponds to the concept D, and the domain-subdomain relation is a subsumption. All basic manipulations of domains could be done using Tbox assertions.</Paragraph> <Paragraph position="1"> Additionally, a partition structure is simply a sequence of subdomains which are different from each other (disjoint concepts) and whose elements could be focused. The algorithm goes through the referential space and tests each domain in recency order against the constraints given by the referential expression. Conceptual tests on the description and partitional tests on the focus or possible discriminations are made to retrieve the domain and the referent. If none are found, they may be created by accommodation. Groupings are created only for explicit coordinations, implicit sentential coordinations (two referents could be grouped if they have the same basic type) and some kinds of specifications.</Paragraph> <Paragraph position="2"> Creating domains and groupings entails the creation of new concepts in the Tbox. Each concept insertion requires a costly reclassification; we therefore preferred an approximation in which only new groupings are asserted as primitive concepts. Other domains are concept terms, i.e.
descriptions which do not have to be asserted in the Tbox automatically.</Paragraph> <Paragraph position="3"> Implicit discursive groupings are not implemented, since they require a rhetorical structure (as in SDRT, Asher 93) or a mental space model. The following example shows the</Paragraph> <Paragraph position="5"> the Lafayette hotel (h3).</Paragraph> <Paragraph position="6"> Hotel h1 could hardly be grouped with h2 and h3, even by &quot;all these hotels&quot; (or maybe by a third speaker). We suppose that, among other factors, they belong to different levels of interpretation: h1 in the domain of the desires of the user, and the others in the domain of existing hotels. The link between the two domains is possible if one knows that S1 is an answer to U's request. Such discrimination criteria and high-level domains are not yet implemented.</Paragraph> <Paragraph position="7"> Instead, we concentrated on extra-linguistic referents which are assumed to be interpreted in the real/system world (like hotels, rooms). We are currently testing the approach to see if it could be extended to any type of entity, provided accurate discrimination criteria (like the predication) are available.</Paragraph> </Section> <Section position="9" start_page="57" end_page="57" type="metho"> <SectionTitle> 7 Example </SectionTitle> <Paragraph position="0"> A sample dialogue (Table 1) is analyzed through the preceding algorithm. This example shows how the referents introduced in an explicit coordination could be referenced as a whole (&quot;the two hotels&quot;), or extracted discriminately by an ordinal (&quot;the second one&quot;) or by an otherness expression (&quot;the other one&quot;). All the subdomains of H+ (i.e. the plural domains of hotels) are indicated after each interpretation using a simplified notation. Only the ordered list of accessible entities and their focalization (bold) are noted for each subdomain.
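As a rough illustration of this notation and of the ordinal and otherness extractions, the following sketch models a plural domain as an ordered list of entities plus a focused subset. The class PluralDomain, its methods, and the initial focus states are assumptions made for illustration; they are not the description-logic implementation described in this paper.

```python
class PluralDomain:
    """Hypothetical plural domain: ordered entities plus a focused subset."""

    def __init__(self, entities, focused=()):
        self.entities = list(entities)      # order of mention
        self.focused = set(focused)

    def ordinal(self, rank):
        """'the N-th one': discriminate by 1-based rank; None if no such rank."""
        if rank not in range(1, len(self.entities) + 1):
            return None
        referent = self.entities[rank - 1]
        self.focused = {referent}           # alternatives lose the focus
        return referent

    def other(self):
        """'the other one': the single unfocused entity opposed to a focused part."""
        unfocused = [e for e in self.entities if e not in self.focused]
        if not self.focused or len(unfocused) != 1:
            return None                     # no focus opposition, or no unique alternative
        referent = unfocused[0]
        self.focused = {referent}
        return referent


# Assumed states: a grouping of two hotels, and the domain of all hotels,
# with h1 and h2 focused in both.
pair = PluralDomain(["h1", "h2"], focused=["h1", "h2"])
whole = PluralDomain(["h1", "h2", "h3"], focused=["h1", "h2"])
results = (pair.ordinal(3), pair.other(), whole.other())
print(results)
```

Under these assumptions, a two-element grouping offers neither a third rank nor a focus opposition, so only the larger domain can serve as the base for an otherness expression.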
For instance H*all = (h1, h2, h3) means that the domain H*all is focalized in H+, and that h3 is focalized in H*all.</Paragraph> <Paragraph position="1"> In order to interpret U1, U2 or U3, one needs to rely on the previous structuring of H+. In U1, the previously focalized domain H*1 is preferred to be the base for interpreting &quot;the second one&quot; because of the order discrimination. This leads to extracting h1 hence focalizing it in H*1 but also in H*0 and in H*all. In U2, H*1 cannot be the base for interpreting &quot;the third one&quot; because no entity can be discriminated this way. Therefore the only suitable domain is H*all. It is also impossible to interpret U3, &quot;the other one&quot;, in H*1 because of the lack of a focus discrimination between h1 and h2.</Paragraph> <Paragraph position="2"> It is, however, possible to choose H*all for the domain of interpretation: the excluded referents h1 and h2 are unfocused while h3 gains focus.</Paragraph> <Paragraph position="3"> 8 Evaluation in progress This work is currently being evaluated in the MEDIA/EVALDA framework, a national understanding evaluation campaign (Devillers et al., 04). It aims to evaluate the semantic and referential abilities of systems with various approaches to natural language processing. The results of each system are compared to manually annotated utterances transcribed from a Woz corpus in a hotel reservation task. For the referential facet, referential expressions (excluding indefinites and proper names) are annotated by a semantic description of their referents.</Paragraph> <Paragraph position="4"> Our system, which relies on a symbolic approach using deep parsing and description logics for semantics, currently scores 64% (f-measure) in accurately identifying and describing the referents. We expect this evaluation to be an occasion to test different hypotheses on reference resolution using domains (for example, different criteria for grouping).
However, we do not yet have more precise results on plurals and ordinals specifically.</Paragraph> </Section> </Paper>