<?xml version="1.0" standalone="yes"?> <Paper uid="J94-2003"> <Title>Japanese Discourse and the Process of Centering</Title> <Section position="6" start_page="201" end_page="205" type="metho"> <SectionTitle> SUBJ OBJ </SectionTitle> <Paragraph position="0"> In example 4, the use of TOPIC marking in the phrase Ziroo wa of utterance (c) means that (c) is interpreted as a RETAIN. 12 Ziroo becomes the most highly ranked discourse entity for (c), although Taroo is the Cb since Taroo was most highly ranked for utterance (b) (by Constraint 3). Then when we apply the Centering algorithm in (d), there are two candidates for the Cb(d) from the Cf(c), namely Ziroo and Taroo. However, this time when Constraint 3 applies, stipulating that the Cb must be the highest-ranked element of Cf(c) realized in 4d, Ziroo must be the highest-ranked entity realized, and therefore must be the Cb. [Marilyn Walker et al. Japanese Discourse] At this point it is clear that some kind of SHIFT is forced by the application of Constraint 3. The two candidates are a SMOOTH-SHIFT and a ROUGH-SHIFT. The SMOOTH-SHIFT interpretation corresponds to the reading Ziroo invited Taroo to a movie, whereas the ROUGH-SHIFT interpretation corresponds to the Taroo invited Ziroo reading. The SMOOTH-SHIFT interpretation is more highly ranked, thus considered more coherent, and so is the preferred interpretation (Z = 10.93, p &lt; .001).</Paragraph> <Section position="1" start_page="202" end_page="205" type="sub_section"> <SectionTitle> 2.4 The Centering Algorithm </SectionTitle> <Paragraph position="0"> The CENTERING ALGORITHM that was proposed by Brennan, Friedman, and Pollard incorporates the centering rules and constraints in addition to contra-indexing constraints on coreference (Reinhart 1976; Brennan, Friedman, and Pollard 1987; Iida 1993).</Paragraph> <Paragraph position="1"> These contra-indexing constraints specify that in a sentence such as He likes him, he and him cannot co-specify the same discourse entity. 
The algorithm applies centering theory to the problem of resolving anaphoric reference. Application of the algorithm requires three basic steps.</Paragraph> <Paragraph position="3"> (1) GENERATE possible Cb-Cf combinations. (2) FILTER by constraints, e.g. contra-indexing, sortal predicates, centering rules and constraints. (3) RANK by transition orderings. In order to apply this algorithm to Japanese, possible Cb-Cf combinations (GENERATE, step 1) must be constructed from the surface string and information from the subcategorization frame of the verb. First the verb subcategorization is examined, and if the verb subcategorizes for more entities than appear in the surface string, zeros are postulated as forward centers. These zeros are then treated just like pronouns in English by the rest of the algorithm. We use a different ranking for the Cf for Japanese than for English, but this has no effect on the actual algorithm itself since the Cf ranking is a declarative parameter.</Paragraph> <Paragraph position="4"> The steps of the algorithm given above can be interleaved to improve computational efficiency. A simple implementation is to: * Never propose a Cf that violates linguistic constraints on contra-indexing. (In other words, apply the contra-indexing filter as early as possible to avoid Cb-Cf combinations that will be eliminated by that filter.) * If there are pronouns in an utterance, only propose pronouns as possible Cbs. (Collect the pronouns from the proposed Cfs as Cbs, from Rule 1.) In addition, it is simple to add further filters to step (2) of the algorithm. For instance, any constraint that is lexically specified such as [±animacy] can be easily applied as a filter. It is also possible to pursue a 'best first' strategy by interleaving steps (1), (2), and (3) so that a CONTINUE will be found without extra processing if one exists.</Paragraph> <Paragraph position="5"> In example 5, we illustrate in more detail how the steps of the algorithm work and the difference between CONTINUE and RETAIN. 
Each utterance shows what the Cb and Cf would be for that utterance. We will mostly be concerned with the process of resolving the two zeros in utterance 5c.</Paragraph> <Paragraph position="6"> Example 5c has explained as the main verb, which requires an animate subject and object2. Since there are two animate zeros in 5c, which are also contra-indexed by syntactic constraints, both Ziroo and Taroo must be realized in 5c. Constraint (3) restricts the Cb to Taroo as the highest-ranked element from the Cf(5b). The interpretive process must also generate the possible candidates for the Cf. If no constraints applied, then all four candidates shown above as Cf1, Cf2, Cf3, and Cf4 would be possible. However, the contraindexing filter will rule out Cf3 and Cf4. As mentioned above, there is no reason that these filters cannot be applied at the GENERATE phase rather than later on.</Paragraph> <Paragraph position="7"> The only CONTINUE interpretation available, Taroo explained the newly equipped functions to John, corresponds to the forward centers Cf1. It is a CONTINUE interpretation because Cb(5c) = Cb(5b) and also Cb(5c) = Cp(5c). The RETAIN interpretation is less preferred and is defined by the fact that Cb(5c) = Cb(5b), but Cb(5c) ≠ Cp(5c). This example supports the claim that a CONTINUE is preferred over a RETAIN (Z = 13.24, p &lt; .001).</Paragraph> <Paragraph position="8"> In order to find this preferred CONTINUE interpretation in a 'best first' fashion, Taroo as the Cp(Ui-1) would be tried first as the Cb(Ui), and as the interpretation for the subject. Contraindexing rules out Taroo as the object, so John would be tried next as the object.</Paragraph> <Paragraph position="9"> In the next section, we examine further the application of centering to the interpretation of zeros in Japanese. 
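As an illustration of the procedure just described for example 5, the following Python sketch generates the candidate Cf orderings for an utterance whose relevant arguments are zeros, applies contra-indexing at generation time, and ranks the surviving readings by transition type. It is a minimal sketch under stated assumptions, not the implementation of Brennan, Friedman, and Pollard: the names classify_transition and resolve_zeros are hypothetical, only the zero arguments are modeled, every zero is assumed to resolve to an entity of Cf(Ui-1), and the zeros are assumed to be listed in Cf rank order. The SMOOTH-SHIFT and ROUGH-SHIFT tests follow the usual definitions (Cb(Ui) differs from Cb(Ui-1), with Cb(Ui) = Cp(Ui) for SMOOTH-SHIFT only), which are not restated in this section.

```python
from itertools import permutations

# Transition preference order: CONTINUE is the most preferred.
TRANSITION_ORDER = ["CONTINUE", "RETAIN", "SMOOTH-SHIFT", "ROUGH-SHIFT"]


def classify_transition(cb, cp, prev_cb):
    """Classify a transition from Cb(Ui), Cp(Ui) and Cb(Ui-1).

    prev_cb=None models an uninstantiated Cb (assumption: it is treated
    as compatible with any Cb(Ui)).
    """
    same_cb = prev_cb is None or cb == prev_cb
    if same_cb:
        return "CONTINUE" if cb == cp else "RETAIN"
    return "SMOOTH-SHIFT" if cb == cp else "ROUGH-SHIFT"


def resolve_zeros(prev_cf, prev_cb, n_zeros):
    """Generate, filter and rank readings for n_zeros zero arguments.

    prev_cf is Cf(Ui-1), most highly ranked entity first.
    Returns (transition, cf) pairs, most preferred transition first.
    """
    if n_zeros == 0:
        return []
    readings = []
    # GENERATE + FILTER: permutations never assign one entity to two
    # argument slots, so contra-indexing is enforced at generation time.
    for cf in permutations(prev_cf, n_zeros):
        # Constraint 3: Cb(Ui) is the highest-ranked element of
        # Cf(Ui-1) that is realized in Ui.
        cb = next(e for e in prev_cf if e in cf)
        cp = cf[0]
        readings.append((classify_transition(cb, cp, prev_cb), cf))
    # RANK by transition ordering (stable sort keeps generation order).
    readings.sort(key=lambda reading: TRANSITION_ORDER.index(reading[0]))
    return readings
```

Under these assumptions, resolve_zeros(["Taroo", "Ziroo"], "Taroo", 2) places the CONTINUE reading with Taroo as Cp ahead of the RETAIN reading, as in example 5c, and resolve_zeros(["Ziroo", "Taroo"], "Taroo", 2) yields a SMOOTH-SHIFT ahead of a ROUGH-SHIFT, as in example 4d.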
We will examine the ranking of forward centers that we have adopted for Japanese and explain how this is partially determined by the way the Japanese language allows a speaker to express discourse functions. We will also give some examples of the interpretation of zeros in cases involving Japanese discourse markers for TOPIC and EMPATHY.</Paragraph> <Paragraph position="10"> 3. Centering in Japanese The theory of centering is a formal specification that is intended to model attentional state and is defined by the rules and constraints given in Section 2.1. Attentional state in turn constrains the discourse participant's interpretation process; one aspect of attentional state is the notion of discourse salience. In the centering model, the ordering of the forward centers is an approximation of discourse salience. This in turn is the main determinant of discourse interpretation processes such as the resolution of zeros in Japanese. A crucial question then is what discourse factors must be considered to determine the ordering of the forward centers, Cf, in Japanese discourse. Being a subject has been shown to be an important factor for English; this is reflected in a Cf ordering by grammatical function (Prince 1981b; Brennan, Friedman, and Pollard 1987; Hudson-D'Zmura 1988; Brennan submitted). Aspects of surface order may also affect the interpretation (Di Eugenio 1990; Hajicova and Vrbova 1982). An interpretation algorithm can also use pronominalization as an indicator of what the speaker believes is salient (Grosz, Joshi, and Weinstein unpublished). Furthermore, zeros in Japanese are not realized syntactically so that there must be a way to distinguish zeros from other entities inferred to be part of a discourse situation. Consider: This sentence is not felicitous unless the addressee has already been given some information about the person that Taroo met, either in the current discourse or in previous discourses. 
In contrast, nonsubcategorized-for arguments such as adjuncts are not necessarily given a specific interpretation, but rather are given a nonspecific one.</Paragraph> <Paragraph position="11"> The sentence means that Taroo met Hanako at some time in some place: the temporal location of the meeting situation need not be specified. The speaker can utter this sentence even if the addressee does not know where and when Taroo met Hanako. Thus, in this work, we only represent obligatorily subcategorized arguments of the verb on the Cf, assuming that the salience of discourse entities is partially determined by virtue of filling a verb's argument role, and the information from the subcategorization frame is used to determine that a zero is present in an utterance. Zeros are then interpreted with reference to the current context. Prince has proposed that the current context should be categorized by ASSUMED FAMILIARITY (Prince 1981b; Horn 1986), with a concomitant goal of determining the correlation between the use of certain linguistic forms and the types of assumed familiarity. The first division of assumed familiarity is into the subtypes of NEW, INFERABLE, and EVOKED. Computational Linguistics Volume 20, Number 2 NEW can be divided into BRAND-NEW, discourse entities that are both new to the discourse and new to the hearer, and UNUSED, discourse entities old to the hearer but new to the discourse. The information status of EVOKED can be further divided into TEXTUALLY EVOKED, old in the discourse and therefore old to the hearer as well, and SITUATIONALLY EVOKED, entities in the current situation. INFERABLES are technically both hearer-new and discourse-new but depend on information that is old to the hearer and the discourse, and are often treated by speakers as though they were both hearer-old and discourse-old. 
There is a hierarchy of assumed familiarity in terms of discourse salience:</Paragraph> </Section> </Section> <Section position="7" start_page="205" end_page="209" type="metho"> <SectionTitle> Assumed Familiarity Hierarchy (Prince 1981b): TEXTUALLY EVOKED > SITUATIONALLY EVOKED > INFERABLE > UNUSED > BRAND-NEW </SectionTitle> <Paragraph position="0"> Zeros typically refer to EVOKED entities, 13 but there is a scale of relative salience among the EVOKED entities. In our theory this is modeled with Cf ranking. We repeat the proposed ranking of the Cf here and justify it in the following sections: 14 Cf Ranking for Japanese (GRAMMATICAL OR ZERO) TOPIC > EMPATHY > SUBJECT > OBJECT2 > OBJECT > OTHERS The relevance of the notions of TOPIC and speaker's EMPATHY to centering is that a discourse entity realized as the TOPIC or the EMPATHY LOCUS is more salient and should be ranked higher on the Cf. Whenever a discourse entity simultaneously fulfills multiple roles, the entity is usually ranked according to the highest ranked role.</Paragraph> <Paragraph position="1"> In the following sections we will discuss the motivation for this ranking. Section 3.1 discusses the role of the grammatical topic marker wa in Japanese. Section 3.2 explains the role of EMPATHY in Japanese discourse salience and shows that (GRAMMATICAL OR ZERO) TOPIC > EMPATHY and that EMPATHY > SUBJ. Section 3.2.1 shows how the centering algorithm handles utterances with empathy loci. Zero topics will not be discussed until Section 5.</Paragraph> <Section position="1" start_page="205" end_page="207" type="sub_section"> <SectionTitle> 3.1 Topic </SectionTitle> <Paragraph position="0"> Discourse entities that are EVOKED, INFERABLE, or UNUSED can be marked as the TOPIC.</Paragraph> <Paragraph position="1"> The speaker cannot mark an entity as the grammatical TOPIC unless the hearer is aware of the object that s/he is going to talk about (Prince 1978a; Kuno 1976b). 
For example: Example 8 Hutari wa paatii ni kimasita.</Paragraph> <Paragraph position="2"> two-person TOP/SUBJ party to came Speaking of two persons, they came to the party.</Paragraph> <Paragraph position="3"> 13 Under certain circumstances that we cannot explore here, it appears that zeros can at times be used to refer to inferable or unused entities, just as pronouns in English sometimes can be. 14 This ranking resembles Kuno's Empathy Hierarchy and Kameyama's Expected Center Order, but we distinguish two kinds of TOPIC and we posit that OBJECT2 is more salient than OBJECT. We continue Kuno's use of the term EMPATHY to represent the EMPATHY LOCUS, whereas Kameyama used the property IDENT for EMPATHY (Kameyama 1988).</Paragraph> <Paragraph position="4"> Example 8 is felicitous only when hutari ('two persons') is understood as meaning the two people under discussion. The sentence never means that the people who came to the party numbered two.</Paragraph> <Paragraph position="5"> That the wa-marked entity should be discourse-old is also shown by the fact that a wh-question cannot be answered with a wa-marked NP.</Paragraph> <Paragraph position="6"> What the question context shows is that even in a simple declarative sentence, the use of the topic marker wa contrasts with the subject marker ga in what is understood as already in the discourse context. For instance, as a discourse-initial utterance, 10a assumes either no shared information or that someone defended Ziroo, and asserts that that someone is Taroo. 
In 10b, the discourse-old proposition is that Taroo did something and what is asserted is that what he did was defend Ziroo.</Paragraph> <Paragraph position="7"> Tokyoo e wa Hanako ga itta. Tokyo to TOP Hanako SUBJ went To Tokyo, Hanako went.</Paragraph> <Paragraph position="8"> The assumption that the TOPIC is more salient than the SUBJECT, when the two are different, is supported by the fact that an indefinite NP in subject position such as who, which, or somebody cannot be regarded as the TOPIC: an indefinite NP is never marked by the topic marker wa, but by the subject marker ga. For example: It is clear from these examples that the grammatical topic, the wa-marked entity, in Japanese represents assumable shared information in an ongoing conversation. It has been taken to be the 'theme' or 'what the sentence is about' (Kuno 1973; Shibatani 1990). In our framework, this is the role of the Cb. We will provide evidence supporting this position in Section 4. However, we claim that this is just a default and that other factors can contribute to establishing or continuing an entity as the Cb. Kuno also claims that a zero subject is equivalent to a wa-marked entity, and we provide support for this claim in Section 5, showing that the property of having previously been the Cb, in combination with being realized by a zero, contributes to an entity being the Cp.</Paragraph> </Section> <Section position="2" start_page="207" end_page="209" type="sub_section"> <SectionTitle> 3.2 Empathy </SectionTitle> <Paragraph position="0"> Kuno (1976b) proposed a notion of EMPATHY in order to present the speaker's position or identification in describing a situation. In a hugging situation involving a man named Taroo and his son Saburoo, Kuno notes that this situation can be described in various ways, some of which are shown in example 15.</Paragraph> <Paragraph position="1"> Example 15 a. Taroo hugged Saburoo.</Paragraph> <Paragraph position="2"> b. 
Taroo hugged his son.</Paragraph> <Paragraph position="3"> c. Saburoo's father hugged him.</Paragraph> <Paragraph position="4"> These sentences differ from each other with respect to camera angle, the position that the speaker takes to observe and describe this situation. In 15a, the speaker is assumed to be describing the event objectively: the camera is placed at the same distance from both Taroo and Saburoo. On the other hand, the camera may be placed closer to Taroo in 15b and closer to Saburoo in 15c. This is shown by the use of relational terms such as son and father, respectively. The term EMPATHY is used for this camera angle, which indicates the speaker's position among the participants in the event described. 15 The speaker's position is not determined by his physical proximity, but rather is measured by the emotional or social relationship. In this sense, the term speaker's identification (Kuno 1976b) may be more suitable than the term speaker's position. Furthermore, the notion of EMPATHY is different from that of perspective (Iida 1993). Empathy is the speaker's identification with a discourse entity, but the speaker does not have to take the perspective of the person he empathizes with. For example, consider the following utterance: (i) Taroo wa Hanako ni migigawa no hon o totte-kureta.</Paragraph> <Paragraph position="5"> Taroo TOP/SUBJ Hanako OBJ2 right GEN book OBJ take-gave Taroo did Hanako a favor in taking a book on his/her right.</Paragraph> <Paragraph position="6"> In this example, the speaker empathizes with Hanako as indicated by the empathy verb kureru, yet he still can describe the given situation from Taroo's perspective, which is indicated by ambiguity in the interpretation of the deictic expression migigawa no ('right of').</Paragraph> <Paragraph position="7"> In Japanese the realization of speaker's empathy is especially important when describing an event involving giving or receiving. 
There is no way to describe a giving and receiving situation objectively (Kuno and Kaburaki 1977). In 16, the use of the verb kureru indicates the speaker's empathy with Ziroo, the discourse entity realized in object position, while in 17, the speaker's empathy with the subject Taroo is indicated by the use of the past tense form yatta of the verb yaru.</Paragraph> <Paragraph position="8"> identifies with. In other words, the verb kureru has the EMPATHY LOCUS on the object, while verbs like yaru place the EMPATHY LOCUS on the subject.</Paragraph> <Paragraph position="9"> The use of deictic verbs such as kuru ('come'), iku ('go'), okuru ('send to'), and yokosu ('send in') also encodes the speaker's empathy. For example, the speaker indicates empathy with Taroo by using the past tense form kita of the verb kuru in the following example.</Paragraph> <Paragraph position="10"> ductive verb-compounding operation by which these empathy-loaded verbs are used as the auxiliary verb, attaching to the main verb. 16 For example, kureru can be used as a suffix, to mark OBJ or OBJ2 as the EMPATHY LOCUS. The attachment of yaru marks SUBJECT as the EMPATHY LOCUS. The complex predicate made by this operation inherits the EMPATHY LOCUS of the suffixed verb. For example: Hanako did Taroo a favor in reading a book. EMPATHY = OBJ2 = TAROO In this case Taroo is interpreted as the EMPATHY LOCUS because of the auxiliary kureta attached to the main verb. Similarly in example 20, the speaker indicates empathy with Hanako by using the past tense form yatta of the verb yaru as an auxiliary verb to the main verb tazuneru.</Paragraph> <Paragraph position="11"> 16 Certain intransitive verbs cannot be made into empathy-loaded verbs since the empathy-loaded versions make no sense, e.g. moreru (leak). 
Taroo did a stranger a favor in lending him some money.</Paragraph> <Paragraph position="12"> The contrast between 21, 22, and 23 demonstrates that the use of a BRAND-NEW entity in the EMPATHY LOCUS position of the verb give is not acceptable. Therefore an entity in the EMPATHY LOCUS position is ranked in a higher position on the Cf than the subject.</Paragraph> <Paragraph position="13"> model EMPATHY as a language-specific discourse factor by adding the EMPATHY-marked discourse entity to the Cf ranking. Then preferences for CONTINUE over RETAIN when EMPATHY is involved can be demonstrated, as in example 24 below: 17</Paragraph> </Section> </Section> <Section position="8" start_page="209" end_page="212" type="metho"> <SectionTitle> SUBJ OBJ </SectionTitle> <Paragraph position="0"> In 24c, the verb invited requires an animate subject and object, and these must be realized by different discourse entities because of the contraindexing constraint.</Paragraph> <Paragraph position="1"> Hanako is the most highly ranked entity from 24b that is realized in 24c, and therefore must be the Cb. The preferred interpretation is therefore she invited him to a movie (Z = 5.25, p &lt; .001). This corresponds to Cf1, the more highly ranked CONTINUE transition, in which Hanako is the preferred center, Cp. This interpretation can be found with minimal processing by trying the Cp(24b), Hanako, as the Cb(24c), by interpreting the subject zero as Hanako. This gives a CONTINUE transition. Then contraindexing constraints mean that Hanako cannot fill both argument positions, so the object position is interpreted as Taroo. This interpretation is found with minimal processing by interleaving the steps of the Centering algorithm proposed in Brennan et al. 
(1987).</Paragraph> <Paragraph position="2"> Note that nothing special needs to be said about the fact that EMPATHY is the discourse factor that made Hanako the Cp in 24b and thus predicted that Hanako would be the Cb at 24c (pace Brennan, Friedman, and Pollard 1987). The preference in the interpretation follows from the distinction between CONTINUE and RETAIN and the ranking of Cf. Thus, the centering framework is easily adapted to handle this language-specific feature.</Paragraph> <Section position="1" start_page="209" end_page="212" type="sub_section"> <SectionTitle> 3.3 Topic and Empathy </SectionTitle> <Paragraph position="0"> In general the assignment of the EMPATHY relationship is pragmatic. It is determined by the speaker's relation to the discourse participants in the discourse. In 24, for example, the EMPATHY relationship between the speaker and Hanako and between the speaker and Taroo is clear: the use of the empathy verb in the second sentence indicates that the speaker is closer to Hanako than to Taroo.</Paragraph> <Paragraph position="1"> However, besides cases where the speaker clearly expresses who s/he empathizes with, it is also possible for the context to provide some information about the speaker's proximity relationship with discourse participants in the given discourse, so that the hearer can determine the EMPATHY relation that the speaker has in mind. In this paper, we only consider cases where EMPATHY is syntactically marked by the use of empathy-loaded verbs.</Paragraph> <Paragraph position="2"> Kuno's notion of EMPATHY is more general. For instance, Kuno's EMPATHY HIERARCHY consists of different scales for EMPATHY that include notions such as TOPIC and SPEAKER (Kuno 1987). 
Kuno's Topic Empathy Hierarchy suggests that the discourse entity realized as the TOPIC will often coincide with the EMPATHY LOCUS: Topic Empathy Hierarchy: Discourse-Topic > Discourse-Nontopic Given an event or state that involves A and B such that A is coreferential with the topic of the present discourse and B is not, it is easier for the speaker to empathize with A than with B.</Paragraph> <Paragraph position="3"> In support of Kuno's claim, we have found that when no empathy relation is clearly indicated and no topic has been clearly established, it is difficult for a The TOPIC Mitiko is preferred as the unexpressed subject of the (b) sentence in example 27. 18 On the other hand, the subject Mitiko is not strongly preferred, as shown in example 28: the zero in the second sentence in 28 is understood as referring to either Mitiko or my wife. That is, the possible interpretation in these examples shows that the NP my wife, which is realized as the EMPATHY LOCUS, is not as salient as the TOPIC. 19 So why is it easier to empathize with a discourse entity that has been the topic as Kuno demonstrates? It seems important to keep the notions of TOPIC and EMPATHY separate, but in Section 5.1 we will demonstrate an effect where the topic entity is interpreted as the empathy locus. We claim that the ranking of the Cf and the potential for a CONTINUE interpretation determine whether this effect will hold. 
In other words, the tendency for the topic entity to be interpreted as the empathy locus follows from more general discourse processing factors, such as a hearer preferring CONTINUE transitions within a given local stretch of discourse.</Paragraph> </Section> <Section position="2" start_page="212" end_page="212" type="sub_section"> <SectionTitle> 3.4 Summary </SectionTitle> <Paragraph position="0"> To summarize, we have outlined the roles of discourse markers such as those for TOPIC and EMPATHY by which Japanese grammaticizes some aspects of discourse function, and we have argued that TOPIC and EMPATHY markers can only be used on entities that are already in the discourse context.</Paragraph> <Paragraph position="1"> One factor that hasn't been discussed is the role of pronominalization, but many researchers have argued that discourse entities realized by pronouns are more salient than other discourse entities (Clark and Haviland 1977; Grosz, Joshi, and Weinstein unpublished; Kuno 1976b, 1987). We take zeros in Japanese to be analogous to pronouns in English in this respect. Since pronominalization can apply at any position in the ranking of the Cf, the role of its contribution is particularly interesting when it is in conflict with some other factor such as grammatical function or topic marking. This will be discussed further in Section 5.</Paragraph> </Section> </Section> <Section position="9" start_page="212" end_page="221" type="metho"> <SectionTitle> 4. Initial Center Instantiation </SectionTitle> <Paragraph position="0"> INITIAL CENTER INSTANTIATION is a process by which a discourse entity introduced in a segment-initial utterance becomes the Cb. In our framework, this happens as a side effect of the Centering Algorithm. Typically, when an interpretation is found for the second utterance in a discourse segment, the Cb becomes instantiated. 
20 The Cb of an initial utterance Ui is treated as a variable that is then unified with whatever Cb is assigned to the subsequent utterance Ui+1.</Paragraph> <Paragraph position="1"> Typically, a discourse entity is introduced as a ga-marked subject, and then is referred to by a zero in a subsequent utterance (Clancy and Downing 1987). Consider example 29.</Paragraph> <Paragraph position="2"> 18 The zero may be interpreted as indirectly referring to the speaker. This interpretation is always possible when the verb kureru is used: the use of kureru implies that the speaker is closer to the beneficiary argument (i.e. the 0-marked NP in these examples), and the favor given to this person is understood as a benefit to the speaker as well. 19 Although it seems as though empathy isn't higher than subject, the confounding factor is that topic marking establishes a Cb, whereas in 28 no Cb has been established. This is explained in detail in Section 4. 20 In Walker, Iida, and Cote (1990) we called this Center Establishment. Henceforth we will refer to this process as Center Instantiation in order to avoid confusion with Kameyama's term center establishment, which is a different mechanism in her theory (Kameyama 1985).</Paragraph> <Paragraph position="3"> Using Taroo as the subject in example 29a is not enough to establish this discourse segment as being about Taroo. It is the use of the zero in example 29b that serves to instantiate Taroo as the Cb. By our definition of CONTINUE, 29b is a CONTINUE transition, because Cb(29b) = Cp(29b) and there was no Cb in 29a. However, Kuno argues that referring to a discourse entity with a zero is equivalent to marking it as the grammatical topic with wa (Kuno 1972). 
Our interpretation of this argument is that the use of wa in a discourse-initial utterance instantiates the wa-marked entity as the Cb in one utterance.</Paragraph> <Paragraph position="4"> This claim is supported by the contrast with the GA-WA alternation in examples 30 and 31, where there is a shift in interpretation depending on whether Taroo is marked with wa in the first sentence. 21 In example 30, Taroo is introduced by ga. In this case, it appears that there is a tendency due to lexical semantics to instantiate Ziroo as the Cb in the second utterance. 22 By the centering definitions, taking either Taroo or Ziroo to be the Cb can result in a CONTINUE interpretation. However, assuming that the Cf ordering at example 30a is correct, constraint 3 is violated by the preferred interpretation of 30b. Since both of the entities in Cf(30a) are realized, the Cb in example 30b should be the most highly ranked one. There are two possible conclusions here: (1) In discourse-initial utterances, when no clear indication of topic is given, the Cf ordering alone is not a strong constraint; (2) the ordering of the Cf should be partly determined by lexical semantics or other knowledge about the situation being described. However, compare example 30 with example 31.</Paragraph> <Paragraph position="5"> 21 These examples were tested by asking survey participants to indicate preference rankings. The numbers given here are only for those subjects who expressed strong preferences; some subjects expressed no preference.</Paragraph> <Paragraph position="6"> 22 The number of subjects here is too small to test statistically.</Paragraph> <Paragraph position="7"> The use of wa in example 31 seems to override the semantic preference that was exhibited in example 30, so that subjects now prefer an interpretation in which Taroo is the Cb. 
23 This shows that Taroo has not been instantiated as the Cb when it is time to interpret the two zeros in example 30b. We explain the contrast by assuming that the TOPIC instantiates the Cb when it is first introduced in a discourse-initial utterance, as in example 31a. Then the only way to get a CONTINUE interpretation for 31b is for Taroo to be the Cb at 31b.</Paragraph> <Paragraph position="8"> Furthermore, we can detect no differences in the interpretation of the final utterance between three-utterance sequences in which an entity is introduced by wa, and four-utterance sequences in which an entity is first introduced by ga and then realized by a zero in the second utterance. This provides further support for the claim that the status of discourse entities realized as grammatical topics and those realized as zero subjects is equivalent.</Paragraph> <Section position="1" start_page="214" end_page="215" type="sub_section"> <SectionTitle> 4.1 Summary </SectionTitle> <Paragraph position="0"> In sum, we have argued that the use of wa in a discourse-initial utterance instantiates the wa-marked entity as the Cb. Cb instantiation can equivalently be done with a two-utterance sequence in which the entity is first introduced as a ga-marked subject and then established as the Cb in the following utterance with a zero referring to that entity.</Paragraph> <Paragraph position="1"> In addition, the fact that the Cb is uninstantiated in discourse-initial utterances has the effect that the Cf ranking in a discourse-initial utterance is not a strong constraint, as it is once a Cb is established.</Paragraph> <Paragraph position="2"> The rule of ZERO TOPIC ASSIGNMENT defines our distinction between grammatical topic and zero topic. This rule allows a zero that has just been the Cb to continue as the Cp, even when it is not realized in a discourse-salient syntactic position such as subject. 
We will demonstrate this with examples that realize both grammatical and zero topics. In these cases, the discourse situation is such that the hearer may maintain multiple hypotheses about where the speaker's attention is directed, and must determine whether to apply the default that the grammatical topic is usually the Cp. 24</Paragraph> </Section> <Section position="2" start_page="215" end_page="217" type="sub_section"> <SectionTitle> Zero Topic Assignment </SectionTitle> <Paragraph position="0"> When a zero in Ui+1 represents an entity that was the Cb(Ui), and when no other CONTINUE transition is available, that zero may be interpreted as the ZERO TOPIC of Ui+1.</Paragraph> <Paragraph position="1"> What this means is that, in certain discourse environments, the entity that was previously the Cb is predicted to continue as the Cb. We conjecture that ZTA is applicable in all free word-order languages with zeros. 25 However, ZERO TOPIC ASSIGNMENT is optional; here we have suggested two constraints on when it applies. We will give examples below of cases where it doesn't apply.</Paragraph> <Paragraph position="2"> The option of ZERO TOPIC ASSIGNMENT (henceforth ZTA) has been overlooked in previous treatments of zeros in Japanese. ZTA explains why the discourse entity Hanako, which is realized as OBJECT2 in example 32c is interpreted as the SUBJECT of example 32d.</Paragraph> <Paragraph position="3"> The possibility of ambiguity as to the attentional state of the speaker is reflected in the fact that there are two possible Cfs for example 32c; Cf2 of 32c is the only Cf possible without ZTA, and represents a RETAIN rather than a CONTINUE. By the formulation of the ZTA rule above, ZTA is triggered by the fact that no CONTINUE transition is available.</Paragraph> <Paragraph position="4"> The availability of ZTA means that HANAKO can be the Cp even when MITIKO is realized as the subject. 
This leads to a potential ambiguity in example 32d, because it is possible for a hearer to simultaneously entertain both of the Cf(32c). In this case the ZTA interpretation is preferred (Z = 4.95, p < .001). The less preferred SMOOTH-SHIFT interpretation would result from the algorithm's application to Cf2 of 32c. 26 ZTA explains the contrast between the discourse segments in examples 32 above and 33 below. The only difference between 32 and 33 is that in 32c, MITIKO is a ga-marked subject, whereas in 33c, MITIKO is a wa-marked subject/grammatical topic. Utterances 32c and 33c have the same meaning. This minimal pair provides a test to see whether ZTA actually characterizes these discourse-related effects.</Paragraph> <Paragraph position="5"> The wa marking has the predicted effect. Using the grammatical topic marker wa in example 33c dampens ZTA and thus affects the interpretation of example 33d, which is now completely ambiguous (Z = 0.34, not significantly different from chance). Because the discourse entity realized as the grammatical topic and indicated by the wa-marked NP is the Cp by default, ten subjects who previously got an interpretation that depends on ZTA can no longer do so. It seems that the situation can be characterized as a case of competing defaults, so that in example 33, some hearers apply the default that the wa-marked entity is the Cp, and others apply the default that CONTINUE interpretations are preferred and that zeros realize discourse entities that are ranked highly on the Cf.</Paragraph> <Paragraph position="6"> The RETAIN interpretation in example 33c, Cf2, indicates that these hearers expect the conversation to shift to being about Mitiko; the fact that Mitiko is the Cp(33c), along with constraint 3, will force a shift. Given a SHIFT, the Mitiko invited Hanako to lunch interpretation is preferred because it is the more highly ranked SMOOTH-SHIFT transition. 
27 These examples clearly show that the wa-marked NP is not always the Cp and support Shibatani's claim that the interpretation of wa depends on the discourse context (Shibatani 1990). The astute reader will have noticed that in the cases where Hanako is a zero topic, the wa-marked Mitiko discourse entity is ranked according to grammatical function. We conjecture that an inference of contrast is supported when the grammatical topic is not the Cp.</Paragraph> <Paragraph position="7"> The following section discusses the interaction of ZTA with empathy. Then in Section 5.2, we discuss further the ramifications of our distinction between grammatical and zero topic.</Paragraph> </Section> <Section position="3" start_page="217" end_page="218" type="sub_section"> <SectionTitle> 5.1 Empathy and Zero Topic Assignment </SectionTitle> <Paragraph position="0"> This section investigates the interaction of EMPATHY and ZERO TOPIC ASSIGNMENT (ZTA).</Paragraph> <Paragraph position="1"> The discourse segment in example 34 is a minimal pair with that in example 35. In 34d the main verb is setumeisita ('explain') without any EMPATHY marking, whereas in 35d, the same sentence occurs with an auxiliary empathy verb as setumeisitekureta.</Paragraph> <Paragraph position="2"> Remember that kureta marks the OBJ or OBJ2 as the EMPATHY LOCUS.</Paragraph> <Paragraph position="3"> 27 If MITIKO could represent a topic object in 33d, there would be another equally ranked SMOOTH-SHIFT interpretation for 33d. However, according to the formulation of ZERO TOPIC ASSIGNMENT, MITIKO cannot be a zero topic because it was not the Cb of the previous utterance, 33c.</Paragraph> <Paragraph position="4"> The interpretations of example 34d show that it is possible for some subjects to interpret Taroo as the zero topic in example 34c. This is possible because Taroo was both the Cp and the Cb for 34a and 34b. The two Cfs of 34c reflect multiple possibilities in attentional state. 
28 The competing defaults consist of the assumption that ZTA applies, versus the assumption that subjects are more highly ranked than objects on the Cf. In this case no preference between the two interpretations can be demonstrated (Z = 1.79, not significant).</Paragraph> <Paragraph position="5"> Example 35 is a minimal pair with example 34. In 35d, the speaker provides more syntactic information by using the empathy verb kureta to indicate that the discourse entity realized as the OBJECT2 is the EMPATHY LOCUS.</Paragraph> <Paragraph position="6"> Empathy associates with the previous Cb to yield a CONTINUE transition, and the interpretation changes so that the utterance is no longer ambiguous (Z = 16.24, p < .001). In this case it is possible to interpret both example 35c and example 35d as CONTINUES by assuming ZTA at 35c. This example also validates ZTA because empathy associates with the zero topic (Kuno 1976b, 1987). Furthermore, this minimal pair highlights aspects of the interaction between syntax and inference. The fact that the empathy verb in 35d is the only difference between examples 34 and 35 shows that the preference in interpretation does not follow from inferences based on information about who is likely to explain what to whom, depending on who showed whom the data, or whether the data is new or old.</Paragraph> <Paragraph position="7"> Example 36 contrasts minimally with example 35 but on another dimension. In this case, 36c is a CONTINUE with Taroo realized in subject position, rather than a CONTINUE based on ZTA. The Ziroo explained to Taroo interpretation is again clearly preferred here as in 35d (Z = 3.638, p < .001).</Paragraph> <Paragraph position="8"> In 36 as in 35, EMPATHY associates with the previous Cb, i.e., Taroo. This follows from the ordering of the Cf and hearers' preferences for a CONTINUE interpretation. 
Note that the interpretation of the last utterance in example 36d remains the same as that in example 35d, although in this case it is Taroo that shows Ziroo some old data in example 36c; nevertheless Ziroo is the one who does the explaining. It seems that inference from world knowledge and domain information alone is unlikely to predict which interpretations hearers will prefer. Inferential processes and discourse structure are mutually constraining (Joshi and Weinstein 1981; Nadathur and Joshi 1983; Hudson-D'Zmura 1988).</Paragraph> </Section> <Section position="4" start_page="218" end_page="221" type="sub_section"> <SectionTitle> 5.2 Summary </SectionTitle> <Paragraph position="0"> We proposed a discourse rule of ZERO TOPIC ASSIGNMENT and showed that ZTA is conditioned by the rules and constraints of centering theory: (1) ZTA only applies to discourse entities that were previously the Cb; (2) ZTA is constrained to cases where the only transition available otherwise would be a RETAIN.</Paragraph> <Paragraph position="1"> ZTA arises from the interaction between preferences for CONTINUE transitions (Rule 2) and the fact that Cbs are often zeros (Rule 1). The interaction of these two factors leads to the speculation that when the Cb is realized by a pronoun in a lower-ranked Cf position, which gives rise to a RETAIN transition state, this type of transition is inherently ambiguous. Since different factors contribute to the salience of discourse entities, such as 'subjecthood' and 'pronominalization' (Grosz, Joshi, and Weinstein unpublished), conflicting defaults can arise when these are in conflict with one another.</Paragraph> <Paragraph position="2"> This may be especially true in Japanese since another factor that should contribute to Cf ranking, word order, is not present whenever zeros are involved.</Paragraph> <Paragraph position="3"> These examples highlight the relation between centering and global coherence in discourse. 
A RETAIN is proposed as a way for a speaker to mark a coordinated transition to a new topic; it predicts a shift (Grosz, Joshi, and Weinstein unpublished; Brennan, Friedman, and Pollard 1987). However, the way in which centering SHIFT transitions are related to larger structures in discourse has not been specified. If a shift functions as a boundary between segments (Walker 1993b), then the hearer's application of ZTA means that the hearer is assuming that the next utterance will be part of the same discourse segment. In contrast, a hearer's assumption that the current centering transition is a RETAIN means that the hearer assumes that the next utterance will begin a new discourse segment.</Paragraph> <Paragraph position="4"> The relationship between segmentation and hearer's preferences for ZTA or RETAIN interpretations may be affected by other discourse factors. Among these factors, intonation may indicate whether the current utterance should be taken as initiating a new segment and predicting a SHIFT, or continuing the previous one (Silverman 1987; Cahn 1992; Swerts and Geluykens 1992; Walker and Prince 1994). Another factor may be the inferred relationship that holds between adjacent utterances such as whether it is possible to interpret (d) as Ziroo's reason for having done (c) (Hobbs 1985b).</Paragraph> <Paragraph position="5"> However, this is clearly not the only factor, or even necessarily the dominant one, as we have demonstrated. Future research must provide additional constraints on when ZTA is applicable.</Paragraph> </Section> </Section> <Section position="10" start_page="221" end_page="223" type="metho"> <SectionTitle> 6. Related Research </SectionTitle> <Paragraph position="0"> Other researchers working on the interpretation of anaphors have focused on the role of inference from world knowledge (Hobbs 1985b, 1979). 
While it is important to elucidate the information needed for inference and the type of inferential process involved in discourse interpretation, it is clear from our examples that syntactic realization has a strong effect on the interpretive process and provides processing constraints on inferential processes. We have focused on the interaction between syntax and inference.</Paragraph> <Paragraph position="1"> Our treatment of Japanese discourse phenomena builds on earlier work by Kuno (Kuno 1972, 1973, 1987, 1989). Our Cf ranking is consistent with Kuno's Empathy and Topic Hierarchies and we incorporate a number of Kuno's observations on the function of the grammatical topic marker wa and the role of zeros. We have also incorporated Kuno's notion of EMPATHY by using EMPATHY in the Cf ranking (Kuno 1976a; Kuno and Kaburaki 1977).</Paragraph> <Paragraph position="2"> In recent work, Kuno proposes an algorithmic account of the interpretation of zeros. He claims that there are two types of zero pronouns, PSEUDO-ZERO-PRONOUNS and REAL-ZERO-PRONOUNS (Kuno 1989). REAL-ZERO-PRONOUNS are supposed to have a wa-marked NP or a presentational NP as an antecedent (Yoshimoto 1988). PSEUDO-ZERO-PRONOUNS are actually examples of deletion, and must follow the same order and the same syntactic function as their source NPs. They must obey constraints on deletion such as Kuno's Pecking Order of Deletion Principle: Delete less important information first and more important information last. According to Kuno, the position just to the left of the verb is the default focus position in Japanese, unless the verb itself is the focus. Therefore, since the verb in example 37b is the information focus, the zeros are assumed to be PSEUDO-ZERO-PRONOUNS.</Paragraph> <Paragraph position="3"> Example 37 a. Taroo ga Hanako ni nani o sita no desu ka.</Paragraph> <Paragraph position="4"> Taroo SUBJ Hanako to what OBJ do COMP COPULA Q What did Taroo do to Hanako? b. 
0 0 kisu o sita no desu.</Paragraph> <Paragraph position="5"> kiss OBJ did COMP COPULA (lit.) (Taroo) did a kissing (to Hanako).</Paragraph> <Paragraph position="6"> The Taroo dislikes Ziroo interpretation would be an example of ZTA. However, we would predict that the Ziroo dislikes Hanako interpretation would be dispreferred, but this does not seem to be the case. Kuno's analysis treats the zero in the second reading of example 39b as a PSEUDO-ZERO-PRONOUN, which means that it must be interpreted as Hanako since Hanako was the object of the previous utterance.</Paragraph> <Paragraph position="7"> The interpretation of 39b that we would predict as possible would be the Ziroo dislikes Taroo (RETAIN), which native speakers rarely get. However, Kuno's analysis does not block this reading either; the zero in 39b could also be a REAL-ZERO-PRONOUN, with Taroo as its antecedent. Kuno says that this interpretation is dispreferred because of a preference for parallel interpretation (Grober, Beardsley, and Caramazza 1978; Sidner 1979; Kameyama 1988; Kuno 1989). We have claimed here and elsewhere (Brennan, Friedman, and Pollard 1987; Walker, Iida, and Cote 1990) that the preference for parallelism is an epiphenomenon of the ordering of the Cf and the preference for CONTINUE interpretations.</Paragraph> <Paragraph position="8"> Our account cannot explain the contrast between examples 38 and 39. It seems that what is at issue here is the fact that a set of discourse entities plus an open proposition such as X likes Y is what is discourse-old in these examples and not just a discourse entity (Prince 1981a, 1986, 1992). Our conclusion is that these enumerated lists and question-answer discourse segments may need an account of discourse center that is broader than that needed for discourse entities realized as NPs. 
Kuno's constraints on deletion must also be integrated to fully explain when entities or propositions in the discourse may be unexpressed.</Paragraph> <Paragraph position="9"> Our analysis also builds on an earlier analysis put forth by Kameyama (Kameyama 1985, 1986, 1988). Although Kameyama uses the centering terminology, her account is not based on the constraints and rules of centering theory as developed here and presented in (Grosz, Joshi, and Weinstein 1983, unpublished; Brennan, Friedman, and Pollard 1987). Kameyama proposed that the interpretation of zeros in Japanese depends on a default preference hierarchy of syntactic properties to be shared between the antecedent and the zero (Grober, Beardsley, and Caramazza 1978). Kameyama's account of zero interpretation consists roughly of a PROPERTY-SHARING CONSTRAINT, henceforth PS, and an EXPECTED CENTER ORDER, henceforth ECO, which may be paraphrased as follows: PROPERTY-SHARING CONSTRAINT: Two zero-pronouns in adjacent utterances, which co-specify the same Cb-encoding discourse entity, should share one of the following properties (in descending order of preference): 1) both IDENT and SUBJECT, 2) IDENT alone, 3) SUBJECT alone, 4) both NONIDENT and NONSUBJECT, 5) NONSUBJECT alone, or 6) NONIDENT alone.</Paragraph> </Section> </Paper>