<?xml version="1.0" standalone="yes"?> <Paper uid="C94-1086"> <Title>REFERRING TO WORLD OBJECTS WITH TEXT AND PICTURES</Title> <Section position="2" start_page="0" end_page="530" type="intro"> <SectionTitle> 1 INTRODUCTION </SectionTitle> <Paragraph position="0"> From a speech act theoretical point of view, referring is a planned action to achieve certain goals (Appelt and Kronfeld, 1987). Although natural language may be the most conventional vehicle for referring, it has been widely accepted that pictures can be used as well. For example, Goodmann (1969) points out that pictures can be employed to refer to both an individual object and the type of which an object is an exemplar. Moreover, there are good reasons to include pictures in referring acts. Pictures effectively convey discriminating object properties such as surface attributes and shape. If an object can only be discriminated against alternatives through its location, a picture may provide the spatial context of the object. Since depictions are explicit material representations of the world objects to which they correspond, new attributes of the type 'being depicted as ...' are introduced which, in turn, provide an additional source for object discrimination (e.g., the knob which is represented by the black circle ...). Last but not least, several graphical focusing techniques can be applied to effectively constrain the set of alternatives (e.g., arrows, blinking). Unfortunately, there is also a dark side of the picture. An obvious drawback is that pictures do not provide syntactical devices to distinguish between a reference-specifying and a predication-specifying part since objects and their properties are hardly separable once depicted. Another difficulty is that pictures lack the means to distinguish definite from indefinite descriptions. Thus, it may remain unclear whether a particular object or an arbitrary exemplar of a class is depicted. 
The conclusion we can draw from these considerations is that it often makes sense to employ both text and pictures when referring to domain objects. Pictures may be used in order to simplify verbal reference expressions. On the other hand, ambiguities of pictures can be resolved by providing additional information through text. When analyzing illustrated documents such as assembly manuals and instructions for use, different kinds of referring expression can be found: Multimedia referring expressions refer to world objects via a combination of at least two media. Each medium conveys some discriminating attributes which in sum allow for a proper identification of the intended object. Examples are NL expressions that are accompanied by pointing gestures and text-picture combinations where the picture provides information about the appearance of an object and the text restricts the visual search space as in &quot;the switch on the frontside&quot;.</Paragraph> <Paragraph position="1"> Anaphoric referring expressions refer to world objects in an abbreviated form (Hirst, 1981) presuming that they are already explicitly or implicitly introduced in the discourse.</Paragraph> <Paragraph position="2"> The presentation part to which an anaphoric expression refers back is called the antecedent of the referring expression. In a multimedia discourse, we have not only to handle linguistic anaphora with linguistic antecedents, but also linguistic anaphora with pictorial antecedents, and pictorial anaphora with linguistic or pictorial antecedents. 
Examples, such as &quot;the hatched switch,&quot; show that the boundary between multimedia referring expressions and anaphora is indistinct. Here, we have to consider whether the user is intended to employ all parts of a presentation for object disambiguation or whether one wants him to infer anaphoric relations between them.</Paragraph> <Paragraph position="3"> Cross-media referring expressions do not refer to world objects, but to document parts in other presentation media (Wahlster et al., 1991). Examples of cross-media referring expressions are &quot;the upper left corner of the picture&quot; or &quot;Fig. x&quot;. In most cases, cross-media referring expressions are part of a complex multimedia referring expression where they serve to direct the reader's attention to parts of a document that also have to be employed in order to find the intended referent.</Paragraph> <Paragraph position="4"> When viewing referring as a planned action, we have to specify which goals underlie the use of different types of referring expressions. Appelt and Kronfeld (1987) distinguish between the literal goal and the discourse purpose of a reference act. Whereas the literal goal is to establish mutual belief between a speaker and a hearer that a particular object is being talked about, the discourse purpose is to make the hearer recognize what kind of identification is appropriate and to have him identify the referent accordingly. When addressing illustrated documents, the question arises of what identification means when domain objects are referred to via pictures (and text). As with language, this varies from discourse to discourse. For example, if the user is confronted with a picture showing how to insert the filter of a coffee machine, he has to recognize whether any object with the feature 'being a filter' can be inserted or whether a particular object is meant. 
In the first case, he has to identify the picture object as an exemplar of a certain class whereas, in the second case, he has to look for something in the world which fits the graphical depiction.</Paragraph> <Paragraph position="5"> In other situations, identification involves establishing a kind of cohesive link between document parts. If the user is confronted with a sequence of pictures showing an object from different angles, he has to recognize that in all pictures the same object is depicted (pictorial anaphor with pictorial antecedent). When reading an utterance, such as &quot;the resistor in the figure above,&quot; he has to recognize an anaphoric relationship between the textual description and the graphical depiction (linguistic anaphor with pictorial antecedent).</Paragraph> <Paragraph position="6"> Previous work on the generation of referring expressions in a multimedia environment has mainly concentrated on single reference phenomena, such as references to pictorial material via natural language and pointing gestures (Allgayer et al., 1989; Claassen, 1992; Stock et al., 1993) and the generation of cross-media references from text to graphics (McKeown et al., 1992; Wahlster et al., 1993). The aim of this paper is, however, to provide a more general model that explains which kinds of coreferential link between referring expressions, objects of the world and objects of the multimedia presentation have to be established to ensure the comprehensibility of a referring expression.</Paragraph> </Section></Paper>