Identifying Zero Pronouns in Japanese Dialogue 
Kei YOSHIMOTO 
ATR Interpreting Telephony Research Labs
Twin 21 MID Tower, 2-1-61 Shiromi, Higashi-ku, Osaka 540, Japan
Abstract 
Japanese dialogue containing zero pronouns is analyzed
for the purpose of automatic Japanese-English conversation
translation. A Topic-driven Discourse Structure (TDS) is
formalized that identifies mainly non-human zero pronouns
as a by-product. Other zero pronouns are handled using
cognitive and sociolinguistic information in honorific,
deictic, speech-act and mental predicates. These methods
are integrated into a single model.
1. Introduction 
An approach is proposed to automatically analyze
Japanese dialogue containing zero pronouns, the most
frequent type of anaphora, which corresponds in function to
personal pronouns in English. A zero pronoun is defined as
an obligatory case noun phrase that is not expressed in the
utterance but can be understood through other utterances
in the discourse, the context, or out-of-context knowledge.
Gaps identifiable by syntactico-semantic means, such as
those in relative clauses and a certain type of subordinate
verb phrase, are excluded. The input discourse is
conversation carried out in Japanese by typing at computer
terminals, a type of conversation which has been shown to
have the fundamental characteristics common to telephone
conversation (Arita et al. 1987).
The key idea of the model is topic, that is, what is being
talked about in the discourse. This notion derives from the
study of theme and rheme by the Prague School (Firbas
1966). In the following, it is argued that mainly non-human
zero pronouns can be identified by means of topic,
and, to do so, a discourse structure on the basis of
recursively appearing topics is formalized. Other zero
pronouns, mainly human ones, are identified using
cognitive and sociolinguistic information conveyed by
honorific, deictic, and speech-act predicates as to how the
omitted cases are related to the speaker or hearer. The
co-occurrence restriction between the subject and a predicate
that expresses a mental activity is also utilized. Finally, the
interaction among these different factors in zero pronoun
identification is discussed, and a model integrating them is
proposed. This model is to constitute part of a machine
translation system being developed at ATR which deals
with Japanese-English telephone and inter-terminal
dialogue.
2. Zero pronoun's role in discourse
An investigation of simulated Japanese inter-terminal
dialogues (94 sentences, 2 dialogue sequences) and their
English translation has revealed that, out of 53 occurrences
of personal pronouns in the English translation, 51
correspond to zero pronouns in the original Japanese text.
Though the data are limited in size, this agrees well
with our intuition that Japanese zero anaphora
performs discourse-grammatical functions including those
played by personal pronouns in English (for a discussion to
the same effect, see Kameyama 1985).
In the same Japanese dialogue data, out of 15 zero
pronouns coreferent with non-human antecedents, 14 refer
to one of the current topics in the discourse. Out of 74 zero
pronouns corresponding to the first and second persons, 55
can be identified by means of cognitive and sociolinguistic
information in honorific, deictic, speech-act, and mental
predicates. The other 19 were either set phrases
for identifying the hearer, explaining one's intention,
responding, etc., or cases understandable only in terms of
the total context and situation. Apart from an approach based
on heuristic rules, the only possible solution for these would
be one using plans and/or scripts. I will here concentrate
on the major portion of zero anaphora cases, namely those
identifiable by topic continuity or by predicate information on
honorificity, deixis, speech act, or mental activity.
N.B. Unlike those of Italian, Spanish, etc., Japanese predicates do not
indicate grammatical information such as person, gender and number
morphologically. This is one of the reasons we must
emphasize pragmatic and discourse-grammatical factors in
retrieving information referred to by zero anaphora.
3. Topic-based identification 
3.1. PSG treatment of topic and zero pronoun 
The Japanese topic has the following major 
characteristics: (i) The topic is marked with a postposition 
wa and usually, but not always, preposed. (ii) More than 
one topic can appear in a simple sentence. (iii) With a
certain type of subordinate clause, the subordinate predicate is
controlled obligatorily by a topicalized matrix subject, but 
not by an untopicalized one. (iv) The topic represents what 
is being talked about in the discourse. 
In the following, an intrasentential treatment of (i) to
(iii), a modified version of Yoshimoto (1987), is explained.
It is based on Head-driven Phrase Structure Grammar
(HPSG) by Pollard & Sag (1987) and Japanese Phrase
Structure Grammar (JPSG) by Gunji (1987).
Topic is represented as a value in the TOPIC feature 
that corresponds to the semantics of topicalized NP(s). The 
TOPIC is a FOOT feature that derives from the lexical 
description of wa. To deal with multi-topic sentences, the 
value of TOPIC is a stack that enables embedding of topics. 
For the type of subordinate whose predicate is controlled by 
a topicalized matrix subject, the subordinate-head particle 
(to be more exact, ADV head) is given a feature 
specification to the effect that the subordinate subject 
unifies with a topicalized matrix subject, but not with an 
untopicalized one. 
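To make the stack encoding concrete, the following Python sketch builds a TOPIC value from wa-marked phrases, using the FIRST/REST/END encoding that appears in (1-1-b). It is an illustration under simplifying assumptions, not the parser's code: feature structures are plain dictionaries, phrases arrive as hypothetical (semantics, postposition) pairs, and neither FOOT-feature percolation nor unification is modeled.

```python
END = "END"  # atom closing a FIRST/REST stack, as in [REST END]


def push_topic(stack, topic):
    """Return a new stack with `topic` as FIRST and the old stack as REST."""
    return {"FIRST": topic, "REST": stack}


def topic_stack(phrases):
    """Build a TOPIC value from (semantics, postposition) pairs.

    Pairs are given in order of appearance; only wa-marked phrases are
    pushed, so the most recently introduced topic ends up as FIRST.
    """
    stack = END
    for sem, postposition in phrases:
        if postposition == "wa":
            stack = push_topic(stack, sem)
    return stack


def to_list(stack):
    """Flatten a FIRST/REST stack into a list, FIRST member first."""
    items = []
    while stack != END:
        items.append(stack["FIRST"])
        stack = stack["REST"]
    return items


# (1-1-a) Sightseeing tour wa arimasu ka?
print(topic_stack([("sightseeing_tour'", "wa")]))
```

With three topics t1, t2, t3 in order of appearance, `to_list` returns them as t3, t2, t1, the same ordering used for multi-topic sentences in Section 3.2.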
This topic description, along with other parts of the
fundamental grammar of Japanese, was implemented on a
unification-based parser built by my colleagues Kiyoshi
Kogure and Susumu Katō (Maeda et al. 1988).
The analysis of (1-1-a) is given as (1-1-b).
(1-1-a) Sightseeing tour wa arimasu ka?
        sightseeing-tour TOP exist-POL QUEST
        Is there a sightseeing tour?
(1-1-b)
[[HEAD [[POS(part-of-speech) V]
        [CTYPE(conjugation-type) NONC(nonconjugate)]
        [CFORM(conjugation-form) SENF(sentence-final)]]]
 [SUBCAT {}]
 [SEM [[RELN(relation) S(surface)-REQUEST]
       [AGEN(agent) ?SPEAKER]
       [RECP(recipient) ?HEARER]
       [OBJE(object)
        [[RELN INFORMIF]
         [AGEN ?HEARER]
         [RECP ?SPEAKER]
         [OBJE [[RELN EXIST-1]
                [OBJE ?TOP[[PARM(parameter) ?X]
                           [RESTR(restriction)
                            [[RELN SIGHTSEEING_TOUR-1]
                             [OBJE ?X]]]]]]]]]]]
 [TOPIC [[FIRST ?TOP]
         [REST END]]]]
N.B. "?" is a prefix for a tag-name representing a token identity of 
feature structures. 
Omitted obligatory case NPs, i.e. those which are 
specified in the lexical description of the predicate as 
SUBCAT values but are not found explicitly in the 
sentence, are represented as values in the SLASH, 
following HPSG and JPSG. The analysis result of (1-2-a) is 
(1-2-b). 
(1-2-a)   arimasu. 
exist-POL 
There is. 
(1-2-b)
[[HEAD [[POS V][CTYPE MASU][CFORM SENF]]]
 [SLASH {[[HEAD [[POS P(postposition)]
                 [FORM ga]
                 [GRF(grammatical-function) SUBJ(subject)]]]
          [SUBCAT {}]
          [SEM ?X]]}]
 [SEM [[RELN EXIST-1]
       [OBJE ?X]]]]
Here the SLASH feature represents that in (1-2-a) the
subject is a zero pronoun. Following JPSG, subcategorized-for
NPs are assigned to the category P (so, to
be more exact, they are PPs), because all (at least written)
Japanese case NPs are followed by postpositions.
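The following Python sketch illustrates the idea of computing SLASH as the obligatory SUBCAT members that no overt phrase fills. The class `PP`, the function `slash`, and the SUBCAT given for arimasu are hypothetical simplifications: SEM values and unification are omitted, and a slot counts as filled when a phrase with the same grammatical function is present (including a topicalized subject, as in (1-1-a)).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PP:
    """A subcategorized-for case phrase (category P in JPSG terms)."""
    form: str  # postposition, e.g. "ga", "o", "ni"
    grf: str   # grammatical function, e.g. "SUBJ", "OBJ", "OBJ2"


def slash(subcat, filled):
    """SLASH value: obligatory SUBCAT members whose function is not filled.

    SUBCAT is a set, since Japanese case phrases are unordered, so no
    positional matching is needed; `filled` holds the grammatical functions
    realized in the sentence.
    """
    return {pp for pp in subcat if pp.grf not in filled}


ARIMASU = {PP("ga", "SUBJ")}  # hypothetical SUBCAT of arimasu 'exist-POL'

print(slash(ARIMASU, set()))     # (1-2-a): the subject is a zero pronoun
print(slash(ARIMASU, {"SUBJ"}))  # (1-1-a): the wa-phrase fills the subject
```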
3.2. Topic-driven discourse structure 
Based on the intrasentential specification of topicalized 
sentences given in the previous section, a discourse-level 
topic structure is formalized, with zero anaphora being 
identified at the same time. 
In (1), the zero pronoun "φ" in A1-2 coincides with
sightseeing tour, a topic in Q1-1. However, a naive
algorithm that finds the most recent topic fails because of
the topics' recursive structure: the zero indirect object in
Q3-1 refers to the "higher" topic sightseeing tour in Q1-1, not
the "lower" one hiyō in Q2-1.
(1) Q1-1: Sightseeing tour wa arimasu ka?
          Is there a sightseeing tour?
    A1-1: Hai,
    A1-2: φ arimasu.
          Yes, there is.
    Q2-1: Hiyō wa ikura desu ka?
          expense TOP how-much COP-POL QUEST
          How much does it cost?
    A2-1: φ 5,000-en desu.
          5000-yen COP-POL
          (It costs) 5,000 yen.
    Q3-1: Dewa, φ sanka o mōsikomimasu.
          then participation OBJ reserve-POL
          Then I would like to make a reservation for the tour.
TDS (Topic-driven Discourse Structure), a discourse model with
recursively occurring topics which is based on the same unification
parser as the intrasentential grammar, identifies zero pronouns as a
by-product of structuring the discourse. Syntactically, TDS is
composed of the following single basic structure:
(2) C0 → C1 ... Cn (n ≥ 1)
The intrasentential analysis result of each sentence,
except a multi-topic one, unifies with a C. Each C has a
feature TOP that indicates a discourse-level topic value, as
distinct from TOPIC, the intrasentential topic feature.
N.B. A sentence with n topics unifies with an n-level vertical
tree, i.e. a chain in which each C immediately dominates a single C.
The leaf node is a C whose TOP value is a stack with all the topics
in the sentence, and each non-terminal node C has a TOP stack
containing that of the immediately dominated C minus the first
member. For example, a sentence with three topics t1, t2, t3 (in
order of appearance) corresponds to the tree:
C[TOP <t1>]
  |
C[TOP <t2, t1>]
  |
C[TOP <t3, t2, t1>]
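The vertical tree can be computed mechanically. The Python sketch below (the function name is hypothetical, and topics are plain strings) returns the TOP stacks from the highest C down to the leaf C and reproduces the three-topic example.

```python
def multi_topic_chain(topics):
    """TOP stacks of the vertical C chain for one multi-topic sentence.

    `topics` are given in order of appearance (t1, t2, ...). The result runs
    from the highest C down to the leaf C; each stack lists its first member
    leftmost, and every higher stack equals the one below it minus its first
    member.
    """
    chain = []
    stack = []
    for t in topics:
        stack = [t] + stack  # each lower C adds one more topic on top
        chain.append(stack)
    return chain


for level in multi_topic_chain(["t1", "t2", "t3"]):
    print("C[TOP <" + ", ".join(level) + ">]")
# prints C[TOP <t1>], then C[TOP <t2, t1>], then C[TOP <t3, t2, t1>]
```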
In (2), the value of the TOP of each of C1, ..., Cn on the
right-hand side is a concatenation of its TOPIC value and
the TOP value of the left-hand side C0:
<i TOP> = append(<i TOPIC>, <0 TOP>)   (1 ≤ i ≤ n)
N.B. The rule is stated in an extended version of PATR-II notation.
"< >" is used to denote a feature structure path, and "=" to denote
a token identity relation between two feature structures.
Between the first value of the TOP of C0 and that of Ci, a
whole-part relation holds. This is stipulated by the
knowledge base.
The value of TOP of Ci is set by default to that of Ci-1:
<i TOP> =d <i-1 TOP>   (2 ≤ i ≤ n)
C0
   C1[TOP <?t1 sightseeing tour'>]    (Q1-1)
   C2[TOP <?t1>]                      (A1-1)
   C3[TOP <?t1>]                      (A1-2)
   C4[TOP <?t1>]
      C5[TOP <?t2 hiyō', ?t1>]        (Q2-1)
      C6[TOP <?t2, ?t1>]              (A2-1)
   C7[TOP <?t1>]                      (Q3-1)
Figure 1. TDS of Discourse Example (1)
By "=d" it is denoted that whenever the value of the left-hand
side feature structure is unspecified, it is set to the
one on the right-hand side. The TOP value of the root C
unifies with any feature structure, i.e. it is ⊤ (top).
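The two TOP rules can be sketched in Python as follows. This is an illustration, not the parser's implementation: the class `C` and the function names are hypothetical, each sentence carries at most one TOPIC value, the root value ⊤ is represented by an empty list, and the tree shape (including where Q3-1 attaches) is given rather than parsed. A sentence without a topic leaves its TOPIC unspecified, so its TOP comes from the default rule; for a topicless first daughter, which the two rules do not cover, the sketch assumes that it takes its mother's TOP. The whole-part check between C0 and Ci is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class C:
    """A TDS node; `topic` is the sentence's TOPIC, None if unspecified."""
    label: str
    topic: Optional[str] = None
    children: List["C"] = field(default_factory=list)
    top: List[str] = field(default_factory=list)


def assign_top(mother: C) -> None:
    """Fill in TOP for the daughters of `mother`, whose TOP is already set."""
    previous: Optional[C] = None
    for daughter in mother.children:
        if daughter.topic is not None:
            # <i TOP> = append(<i TOPIC>, <0 TOP>)
            daughter.top = [daughter.topic] + mother.top
        elif previous is not None:
            # <i TOP> =d <i-1 TOP>: an unspecified TOP copies the left sister's
            daughter.top = list(previous.top)
        else:
            # Assumption: a topicless first daughter takes the mother's TOP
            daughter.top = list(mother.top)
        assign_top(daughter)
        previous = daughter


def figure1() -> C:
    """The TDS of discourse (1), with the attachment of Q3-1 given."""
    root = C("C0", children=[
        C("Q1-1", "sightseeing_tour'"), C("A1-1"), C("A1-2"),
        C("C4", children=[C("Q2-1", "hiyo'"), C("A2-1")]),
        C("Q3-1"),
    ])
    root.top = []  # stands in for the root value, which unifies with anything
    assign_top(root)
    return root


for node in figure1().children:
    print(node.label, node.top, [(c.label, c.top) for c in node.children])
```

Running `figure1()` gives TOP <t1> to Q1-1, A1-1, A1-2, C4, and Q3-1, and <t2, t1> to Q2-1 and A2-1, which matches Figure 1.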
Sentences with a SLASH value are related to TDS by
the following Topic Supplementation Principle (TSP).
Topic Supplementation Principle (1st Version)
1. For a C whose TOP value is a stack <t1, ..., tm> and
whose SLASH value is a set {P1, ..., Pn}, the SEM of each
of P1, ..., Pn is set to one of t1, ..., tm, without the SEMs of
two Ps being assigned to the same t, if the two are
unifiable. If none of the pairs are unifiable, then the
rule does not apply.
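A minimal Python sketch of TSP (1st version) follows. It assumes that unifiability can be reduced to a yes/no predicate supplied by the caller (below, a hypothetical sortal check for Q3-1) and that SLASH members are named by grammatical function. It enumerates one-to-one pairings of SLASH members with topics and keeps those that fill as many members as possible; returning every such candidate, rather than a single one, is a design choice of this sketch that leaves room for later filtering by other constraints.

```python
from itertools import combinations, permutations
from typing import Callable, Dict, List


def tsp_rule1(ps: List[str], top: List[str],
              unifiable: Callable[[str, str], bool]) -> List[Dict[str, str]]:
    """Candidate assignments licensed by TSP rule 1 (a sketch).

    Each SLASH member may take its SEM from a topic in TOP, and no topic
    serves two members. Only candidates that fill the largest possible
    number of members are returned; an empty list means the rule does not
    apply.
    """
    for k in range(min(len(ps), len(top)), 0, -1):
        found = [
            dict(zip(chosen_ps, chosen_ts))
            for chosen_ps in combinations(ps, k)
            for chosen_ts in permutations(top, k)  # distinct topics only
            if all(unifiable(p, t) for p, t in zip(chosen_ps, chosen_ts))
        ]
        if found:
            return found
    return []


# Hypothetical sortal check for Q3-1: the subject of mosikomu must be human,
# so it cannot take the non-human topic, while the indirect object can.
def q3_unifiable(p: str, t: str) -> bool:
    return p != "SUBJ"


print(tsp_rule1(["SUBJ", "OBJ2"], ["sightseeing_tour'"], q3_unifiable))
```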
The analysis tree of discourse example (1) is shown as
Figure 1. Sentences Q1-1, A1-1, A1-2, and Q3-1 share the
common topic sightseeing tour, and Q2-1 and A2-1 share
hiyō (expense). The latter is a subtopic of the former.
There are two syntactic possibilities for Q3-1's location:
it can be in coordination either with Q1-1, A1-1, and A1-2, or
with Q2-1 and A2-1. Here the former are chosen as its
coordinates because the knowledge base provides the
information that Q3-1's predicate mōsikomu (reserve) is
compatible with sightseeing tour, but not with hiyō
(expense). Note that, while discourse (1) is being analyzed,
the zero pronouns in A1-2, A2-1, and Q3-1 are also identified.
(The other zero pronoun in Q3-1, i.e. the subject of the
sentence, is left unspecified here. Its identification requires
speech act categorization of sentences.)
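The attachment decision for Q3-1 can be pictured with the Python sketch below. The compatibility table is a hypothetical stand-in for the knowledge base, and the preference for the most deeply embedded compatible level is an assumption made for illustration; the text states only the compatibility facts themselves.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical knowledge-base facts: (predicate, topic) pairs judged compatible.
COMPATIBLE: Set[Tuple[str, str]] = {("mosikomu", "sightseeing_tour'")}


def choose_attachment(predicate: str,
                      open_levels: List[List[str]]) -> Optional[List[str]]:
    """Return the TOP stack of the level a topicless sentence joins.

    open_levels holds the TOP stacks of the coordination levels still open,
    most deeply embedded first (an assumed preference). The first level
    whose current topic (first stack member) is compatible with the
    predicate wins; None means that no level is compatible.
    """
    for top in open_levels:
        if top and (predicate, top[0]) in COMPATIBLE:
            return top
    return None


# Q3-1 after A2-1: join the hiyo' level or the sightseeing-tour level?
print(choose_attachment("mosikomu", [["hiyo'", "sightseeing_tour'"],
                                     ["sightseeing_tour'"]]))
```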
This topic-based approach contrasts with Kameyama's
Japanese version (Kameyama 1985, Kameyama 1986) of the
focus-based approach to anaphora by Grosz et al. (1983). In
her framework, subjecthood and predicate deixis play the
principal role, and the fact that topic provides the most
important clue to anaphora identification in actual spoken
Japanese discourse is not utilized explicitly.
3.3. Extension of topic introduction
One of the problems with the topic-based approach is
that topics referred to by zero pronouns are not always
explicitly marked by the topic postposition wa. Sometimes
the NPs are not found in the discourse in exactly the same
forms as they are recovered. To deal with all possible cases,
further elaboration is needed in the area where semantics,
pragmatics, and discourse grammar intersect. Here I will
limit my attention to cases analyzable by extending the
current method.
First, word sequences whose function is, like that of wa, to
introduce topics into the discourse, such as no hō ga,
ni tuite desu ga, no ken desu ga, and no koto desu ga, are
handled in the same way as wa, both syntactically and
discourse-grammatically.
Second, more complicated topic-introducing sentence
patterns are also treated.
(3) Watasi no yūzin de sanka o kibō-site iru
    I GEN friend COP participation OBJ want-PROGR
    mono ga iru n desu ga...
    person SBJ exist EXPL-POL INTRD
    A friend of mine wants to participate in the conference. (He ...)
As illustrated in (3), the sentence pattern <NP ga
V_EXISTENTIAL n/no desu ga> is employed to implicitly
introduce the NP as a topic into the discourse. To handle
such cases, the lexical description of the topic-introductory
ADV head ga is specified so that the SEM value of the
subject of the subcategorized-for existential verb unifies
with the (implicit) topic of the whole sentence.
4. Identification by means of predicate information 
4.1. Honorific predicate 
Japanese has a rich grammatical system of honorifics.
Among them, the expressions related to the discussion here are
subject-honorific and object-honorific predicates. A
subject-honorific predicate is a form of predicate used to express
respect to the person referred to by the subject of the
predicate. An object-honorific predicate is used to express
respect to the direct or indirect object of a predicate whose
subject-agent is the speaker or his/her in-group member.
In conversation, the omitted subject of a subject-honorific
predicate is typically the hearer. Conversely, the
subject of this type of predicate is usually omitted when
it refers to the hearer, as in (4). This is evidently to avoid
redundancy: when there is no one else worth paying
respect to, the predicate's honorific information already
virtually limits the subject to the hearer, so indicating the
hearer explicitly as subject would be redundant. Likewise, the
direct or indirect object of object-honorific predicates is
typically the hearer and the subject is typically the speaker,
and the two NPs are usually omitted when this holds, as in
example (5).
(4) φ kaigi ni sanka-sarenai no nara,
    conference OBJ2 participate-SBJHONR-NEG COND
    muryō de kekkō desu.
    free COP all-right COP-POL
    If you don't attend the conference, it will be free.
(5) φ φ tōzitu uketuke de siryōsyū o o-watasi simasu.
    that-day reception LOC proceedings OBJ give-OBJHONR-POL
    Proceedings will be given to you on the first day of the conference
    at the reception.
However, Japanese honorific predicate forms do not
correspond to grammatical persons as rigidly as
verb inflection does in European languages. The omitted subject
of (4) and the omitted indirect object of (5) may be someone
else worthy of respect, and the omitted subject of (5) may be
the speaker's in-group member. A mechanism is needed
which, by default, identifies the omitted subject of a
subject-honorific predicate and the object of an object-honorific
predicate with the hearer, and the omitted subject of an
object-honorific predicate with the speaker, and
otherwise (when specific information is given) identifies
them with a person explicitly given in the context.
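A minimal Python sketch of the required mechanism follows. It assumes that roles are named SUBJ, OBJ and OBJ2 and that any person explicitly identified in the context arrives as a role-to-referent mapping; the table and the function name are hypothetical, while the defaults themselves are the ones stated above.

```python
from typing import Dict, Iterable, Mapping

# Default fillers of omitted roles by honorific predicate type (Section 4.1).
HONORIFIC_DEFAULTS: Dict[str, Dict[str, str]] = {
    "subject-honorific": {"SUBJ": "HEARER"},
    "object-honorific": {"SUBJ": "SPEAKER", "OBJ": "HEARER", "OBJ2": "HEARER"},
}


def fill_honorific_roles(pred_type: str, omitted: Iterable[str],
                         context: Mapping[str, str]) -> Dict[str, str]:
    """Resolve omitted roles: a person given in the context wins; otherwise
    the speaker/hearer default for this type of predicate applies."""
    defaults = HONORIFIC_DEFAULTS[pred_type]
    return {role: context.get(role, defaults[role]) for role in omitted}


print(fill_honorific_roles("subject-honorific", ["SUBJ"], {}))         # (4)
print(fill_honorific_roles("object-honorific", ["SUBJ", "OBJ2"], {}))  # (5)
```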
Lexical descriptions of honorific verbs and auxiliaries
must meet the condition above. For example, the lexical
description of the subject-honorific auxiliary reru is as follows
(the feature specification is based on that for honorifics by
Maeda et al. 1988):
(DEFLEX re VSTEM ()
 [[HEAD [[POS V]
         [CTYPE VOW(vowel-stem-type, i.e. itidan)]
         [CFORM STEM]
         [MODL(modal) [[DEAC(deactive) SHON(sbj-honorific)]]]]]
  [SUBCAT {[[HEAD [[POS P][FORM ga][GRF SUBJ]]]
            [SUBCAT {}]
            [SEM ?X]]
           [[HEAD [[POS V]
                   [CTYPE (:OR CONS-UV CONS-V SURU)]
                   [CFORM VONG(voice-negative, i.e. mizen-kei)]
                   [MODL [[DEAC ...]]]]]
            [SUBCAT {[[HEAD [[POS P][FORM ga][GRF SUBJ]]]
                      [SUBCAT {}]
                      [SEM ?X]]}]
            [SEM ?SEM]]}]
  [SEM ?SEM]
  [PRAG(pragmatics)
   [[SPEAKER ?SPEAKER]
    [HEARER ?HEARER]
    [RESTRS(restrictions) {[[RELN RESPECT]
                            [AGEN ?SPEAKER]
                            [OBJE ?X]]}]]]]
 (?X =d ?HEARER))
N.B. The feature structure of the verbal stem of the auxiliary is
given above. Conjugational endings are specified separately and
are utilized in analyzing the auxiliary. The CTYPE value in the
SUBCAT specifies the conjugation type of the subcategorized V, i.e.
consonant-stem type and suru type (Vs with other conjugation
types are subcategorized for by rareru, an allomorph of reru). The
MODL is used to impose conditions on the possibility of mutual
subcategorization between different kinds of Vs. To reflect
the unorderedness of Japanese case phrases, the value of the
SUBCAT feature is a set (Gunji 1987) instead of the ordered list
adopted in the HPSG English grammar (Pollard & Sag 1987). The
set is expanded by a rule reader into its corresponding possible
ordered-list descriptions.
The semantic value of the subject (?X) is restricted by
the PRAG feature (the feature for describing pragmatic
constraints) to be someone respected by the speaker.
When it is not filled by the analysis on the basis of explicit
information, it defaults to the hearer by means of "=d".
This lexical description is embedded into the total zero
pronoun identification mechanism by revising TSP:
Topic Supplementation Principle (2nd Version)
1. For a C whose TOP value is a stack <t1, ..., tm> and
whose SLASH value is a set {P1, ..., Pn}, the SEM of each
of P1, ..., Pn is set to one of t1, ..., tm, without the SEMs of
two Ps being assigned to the same t, if the two are unifiable. If
none of the pairs are unifiable, then the rule does not
apply.
2. Non-specified SEM values of obligatory case NPs of
honorific, deictic, speech-act, and mental predicates are
set to their default values, i.e. to the speaker or the
hearer.
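Put together, the two rules can be sketched in Python as below; rule 2 only touches members that rule 1 left unspecified. The `defaults` mapping is assumed to come from the predicate's lexical description (for itadaku, the subject defaults to the speaker and the indirect object to the hearer, as discussed in Section 5). The function name, the yes/no unifiability predicate, and the choice to keep only the rule-1 candidates that fill the most members are assumptions of this sketch.

```python
from itertools import combinations, permutations
from typing import Callable, Dict, List


def tsp2(ps: List[str], top: List[str],
         unifiable: Callable[[str, str], bool],
         defaults: Dict[str, str]) -> List[Dict[str, str]]:
    """Readings licensed by TSP (2nd version), as a sketch.

    Rule 1: fill as many SLASH members as possible from distinct topics.
    Rule 2: any member still unfilled takes its lexical default (speaker or
    hearer) when the predicate supplies one.
    """
    readings: List[Dict[str, str]] = []
    for k in range(min(len(ps), len(top)), -1, -1):  # k = 0: rule 1 fills none
        for chosen_ps in combinations(ps, k):
            for chosen_ts in permutations(top, k):
                if all(unifiable(p, t) for p, t in zip(chosen_ps, chosen_ts)):
                    reading = dict(zip(chosen_ps, chosen_ts))
                    for p in ps:
                        if p not in reading and p in defaults:
                            reading[p] = defaults[p]
                    readings.append(reading)
        if readings:
            return readings
    return readings


# Second half of A1 in (7): one topic, two zero pronouns, itadaku's defaults.
for r in tsp2(["SUBJ", "OBJ2"], ["syusyo'"], lambda p, t: True,
              {"SUBJ": "SPEAKER", "OBJ2": "HEARER"}):
    print(r)
```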
Descriptions of other subject-honorific and object-honorific
auxiliaries and verbs are given likewise, and
their zero pronouns are identified by means of TSP.
N.B. For object-honorific auxiliaries and verbs, the empathy degree is
also specified. See Sections 4.2 and 5.
4.2. Deictic predicate
One of the major features of spoken Japanese discourse
is its frequent use of deictic predicates, i.e. forms of
predicates which change according to the empathic relation
between the persons involved. The most easily understood
examples are go and come in English. Besides their
counterparts iku and kuru, Japanese has a trichotomous
system of donatory verbs, i.e. yaru (give), kureru (give), and
morau (receive). Kureru is used when the receiver is the
speaker or his/her in-group member (e.g. his/her family).
Otherwise yaru is used to express give. These forms are
also employed as auxiliaries under the same deictic condition
when the action expressed by the main verb involves giving
or receiving a favor. They appear frequently in spoken
Japanese dialogue as constituents of speech-act-related
complex predicates. For example,
(6) φ φ hotel no tehai wa site kureru no desu ka?
    hotel GEN reservation TOP do-RECFAV EXPL-POL QUEST
    Could you reserve a hotel for me?
As in (6), the subject and indirect object of the auxiliary
are typically the hearer and speaker, respectively, and
when this is the case, the subject and indirect object are
usually omitted. However, like those in honorific
predicates, the omitted subject and indirect object of deictic
auxiliaries have no fixed values. They may be some
in-group member of the speaker or somebody other than the
hearer. For example, the subject (the person(s) who make
the reservation) of (6) may be the congress office exclusive of the
hearer, and its indirect object (the person who receives the
favor of the reservation) may be the speaker's student.
To deal with default and non-default cases of omitted
subjects and indirect objects, the SEM values of these NPs
in kureru's lexical description are restricted by the
empathy values in the PRAG features, and their default
values are given by means of "=d". The latter are dealt
with in connection with TSP.
(DEFLEX kure VSTEM ()
 [[HEAD [[POS V][CTYPE VOW][CFORM STEM][MODL [[DONT BEN]]]]]
  [SUBCAT {[[HEAD [[POS P][FORM ga][GRF SUBJ]]][SUBCAT {}][SEM ?X]]
           [[HEAD [[POS P][FORM ni][GRF OBJ2]]][SUBCAT {}][SEM ?Y]]
           [[HEAD [[POS V]
                   [CFORM TE(te-form)]
                   [MODL [[DEAC PASS][ASPC PROG] ...]]]]
            [SUBCAT {[[HEAD [[POS P][FORM ga][GRF SUBJ]]]
                      [SUBCAT {}]
                      [SEM ?X]]}]
            [SEM ?SEM]]}]
  [SEM [[RELN GIVE-FAVOR]
        [AGEN ?X]
        [RECP ?Y]
        [OBJE ?SEM]]]
  [PRAG [[SPEAKER ?SPEAKER]
         [HEARER ?HEARER]
         [RESTRS {[[RELN EMPATHY-DEGREE]
                   [MORE ?Y]
                   [LESS ?X]]}]]]]
 (?X =d ?HEARER)
 (?Y =d ?SPEAKER))
N.B. Like reru in Section 4.1, the verbal stem is specified. The
PRAG feature stipulates that the speaker empathizes more
with ?Y than with ?X.
The other deictic auxiliaries and verbs are treated similarly.
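The effect of the PRAG restriction and the two defaults in kureru's entry can be sketched in Python. The numeric empathy scale (speaker above in-group members, above the hearer, above others) and all names are hypothetical simplifications introduced for illustration; the entry itself states only that the speaker empathizes more with ?Y than with ?X.

```python
from typing import Dict, Optional, Tuple

# Hypothetical empathy scale from the speaker's point of view (higher = more).
EMPATHY: Dict[str, int] = {"SPEAKER": 3, "IN-GROUP": 2, "HEARER": 1, "OTHER": 0}


def resolve_kureru(giver: Optional[str] = None,
                   receiver: Optional[str] = None,
                   classes: Optional[Dict[str, str]] = None
                   ) -> Optional[Tuple[str, str]]:
    """Fill ?X (subject, giver) and ?Y (indirect object, receiver) of kureru.

    Unfilled roles take the defaults (?X =d ?HEARER) and (?Y =d ?SPEAKER).
    The result must satisfy EMPATHY-DEGREE [MORE ?Y][LESS ?X]; if it does
    not, None is returned. `classes` maps other referents to "IN-GROUP" or
    "OTHER"; unlisted referents count as "OTHER".
    """
    classes = classes or {}
    x = giver if giver is not None else "HEARER"
    y = receiver if receiver is not None else "SPEAKER"

    def score(ref: str) -> int:
        if ref in EMPATHY:
            return EMPATHY[ref]
        return EMPATHY[classes.get(ref, "OTHER")]

    return (x, y) if score(y) > score(x) else None


print(resolve_kureru())  # (6) with both roles omitted
print(resolve_kureru("congress_office'", "student'", {"student'": "IN-GROUP"}))
```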
4.3. Speech act
Another important type of information in predicates is
speech act. The type of speech act found to be pervasive in
Japanese dialogue is request. In all the examples in the
collected data of request expressions such as NP o o-negai
simasu (give me ...), V-te moraemasu ka? (can I ask you to ...?)
and V-te kudasai (please ...), the omitted subject was the
speaker and the omitted indirect object was the hearer.
Because these zero pronouns can be, depending on the
situation, other than the first and second persons, the
default treatment adopted so far is needed. For example, in
the feature structure specification of the verb negai (in NP
o o-negai simasu), the default value for the subject is set to
the speaker and that for the indirect object to the hearer.
4.4. Mental predicate
The last factor in identifying zero pronouns is the
condition in Japanese grammar that, with the sentence-final
conjugation form (syūsi-kei) of predicates indicating
mental activities such as belief, hope, desire, request, and
feeling, only the speaker is admitted as the referent of the
omitted subject. This condition is easily specified in the
lexical descriptions of the constituents of the predicates.
An important related phenomenon is that, even with
conjugation forms whose subject can grammatically be
other than the speaker, the examples in the collected data
mentioned in Section 2 had the speaker as the
omitted subject, with very few exceptions. For example, all
cases in the data of the auxiliary tai (want to), when
followed by the complex particle no desu ga for moderating
the desiderative expression, had the speaker as their
subject, though the subject of this form can
grammatically be other than the speaker.
For such usages of mental predicates, a default value
treatment like that for honorific and deictic predicates is
effective:
(DEFLEX ta VSTEM ()
 [[HEAD [[POS V]
         [CTYPE X]
         [CFORM STEM]
         [COH [[POS N][FORM no]]]]]
  [SUBCAT {[[HEAD [[POS P][FORM ga][GRF SUBJ]]]
            [SUBCAT {}]
            [SEM ?X]]
           [[HEAD [[POS V]]]
            [SUBCAT {[[HEAD [[POS P][GRF SUBJ]]]
                      [SUBCAT {}]
                      [SEM ?X]]}]
            [SEM ?Y]]}]
  [SEM [[RELN DESIRE]
        [EXPR(experiencer) ?X]
        [OBJE ?Y]]]
  [PRAG [[SPEAKER ?SPEAKER]
         [HEARER ?HEARER]]]]
 (?X =d ?SPEAKER))
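The difference between the hard restriction on sentence-final forms and the default used for forms such as tai followed by no desu ga can be sketched in Python. The function name, the CFORM label used for non-final forms, and the idea of passing in a referent proposed by other evidence are assumptions for illustration.

```python
from typing import Optional


def omitted_mental_subject(cform: str, proposed: Optional[str] = None) -> str:
    """Referent of the omitted subject of a mental predicate (a sketch).

    cform: "SENF" for the sentence-final form (syusi-kei); any other value
    stands for forms such as tai + no desu ga.
    proposed: a referent suggested by other evidence, if any (hypothetical).
    """
    if cform == "SENF":
        # Grammatical restriction: only the speaker is admitted.
        return "SPEAKER"
    # Default treatment, as in (?X =d ?SPEAKER): other evidence may override.
    return proposed if proposed is not None else "SPEAKER"


print(omitted_mental_subject("SENF", "friend'"))
print(omitted_mental_subject("no-desu-ga", "friend'"))
```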
5. Integration of the methods
Let us see how discourse (7), with zero pronouns
identifiable by either the topic or the honorific and deictic
predicates, is analyzed using the integrated model of TSP.
(7) Q1: Syoniti no kinen kōen o syusyō ga suru
        first-day GEN commemorative address OBJ premier SBJ do
        to φSBJ o-kiki sita no desu ga hontō desu ka?
        QUO hear-OBJHONR-PST INTRD be-true-POL QUEST
        I have heard that a commemorative address is given by the
        Prime Minister on the first day. Is it true?
    A1: Iie, syusyō ni wa φSBJ o-kosi itadakemasen ga,
        no premier OBJ2 TOP come-RECFAV-OBJHONR-POL-NEG ADVS
        φSBJ φOBJ2 message o itadaku koto ni natte imasu.
        message OBJ receive-OBJHONR be-arranged-POL
        No, unfortunately, the Prime Minister does not come.
        However, we will receive a message from him.
Now, the semantic/pragmatic representation corresponding
to the second half of A1, with the object-honorific and deictic
verb itadaku, is:
(8) [[SEM [[RELN RESULTATIVE]
           [OBJE [[RELN ARRANGED]
                  [OBJE [[RELN RECEIVE-1]
                         [AGEN ?X1]
                         [RECP ?X2]
                         [OBJE MESSAGE']]]]]]]
     [SLASH {[[HEAD [[POS P][FORM ga][GRF SUBJ]]]
              [SUBCAT {}]
              [SEM ?X1]]
             [[HEAD [[POS P][FORM ni][GRF OBJ2]]]
              [SUBCAT {}]
              [SEM ?X2]]}]
     [PRAG [[SPEAKER ?SPEAKER]
            [HEARER ?HEARER]
            [RESTRS {[[RELN POLITE]
                      [AGEN ?SPEAKER]
                      [OBJE ?HEARER]]
                     [[RELN RESPECT]
                      [AGEN ?SPEAKER]
                      [OBJE ?X2]]
                     [[RELN EMPATHY-DEGREE]
                      [MORE ?X1]
                      [LESS ?X2]]}]]]]
Let us see how the unspecified values ?X1 and ?X2 are specified
(i.e. how the zero pronouns are identified) while maintaining the
appropriateness of the PRAG feature structure. There are
two possibilities for this: (1) ?X1 is identified with the topic
syusyō (Prime Minister) according to the first rule of TSP;
(2) ?X2 is identified with syusyō. Of these, only (2) can
fill both ?X1 and ?X2 consistently. If ?X2 unifies with syusyō
and ?X1 unifies with ?SPEAKER (this is further to be set to a
global variable *ANSWERER* at the discourse
representation level) by the default rule deriving from the
lexical description of itadaku (see Sections 4.1 and 4.2),
nothing is wrong with the PRAG features.
On the other hand, if (1) is chosen, ?X1 is set to
syusyō and ?X2 unifies with ?HEARER by default (as
stipulated by the lexical description of itadaku). Then the
PRAG has as one of its RESTRS members
[[RELN EMPATHY-DEGREE]
 [MORE syusyō']
 [LESS ?HEARER]]
which is not unifiable with the following part of the
knowledge base
[[RELN EMPATHY-DEGREE]
 [MORE ?HEARER]
 [LESS syusyō']]
because of the stipulation
[[RELN EMPATHY-DEGREE][MORE ?X][LESS ?Y]] ∧ [[RELN EMPATHY-DEGREE][MORE ?Y][LESS ?X]] = ⊥.
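The check that rules out possibility (1) can be sketched in Python, assuming that EMPATHY-DEGREE restrictions are reduced to (MORE, LESS) pairs and that the relevant part of the knowledge base is a set of such pairs. The names are hypothetical; the conflict test encodes the stipulation just given, so a reading is rejected when the knowledge base already contains the reverse pair.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical knowledge-base fact as a (MORE, LESS) pair: the speaker
# empathizes more with the hearer than with the Prime Minister.
KB: Set[Tuple[str, str]] = {("HEARER", "syusyo'")}


def conflicts(more: str, less: str, kb: Set[Tuple[str, str]]) -> bool:
    """True if [MORE more][LESS less] meets its reverse in kb (bottom)."""
    return (less, more) in kb


def filter_itadaku(readings: List[Dict[str, str]],
                   kb: Set[Tuple[str, str]] = KB) -> List[Dict[str, str]]:
    """Keep readings whose EMPATHY-DEGREE [MORE ?X1][LESS ?X2] is consistent."""
    return [r for r in readings if not conflicts(r["X1"], r["X2"], kb)]


READINGS = [
    {"X1": "syusyo'", "X2": "HEARER"},   # possibility (1)
    {"X1": "SPEAKER", "X2": "syusyo'"},  # possibility (2)
]

print(filter_itadaku(READINGS))
```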
Likewise, the zero pronouns "φSBJ" in Q1 and "φSBJ" of
o-kosi itadakemasen in A1 are identified with the speaker.
The integration of the different approaches is
illustrated in Figure 2. The figure reflects the ordered
relation among the three components: what intrasentential
syntax cannot disambiguate is handled by the topic
structure, and the rest then goes to the predicate
information component.
N.B. Anaphora identification (of both zero and explicit anaphora) can
be made more effective and wide-ranging if a model of the objects
appearing in the discourse, with their linguistically expressed and
default PRAG features, is formalized. This was partly done by Maeda
et al. (1988) by means of Discourse Representation Theory.
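The ordering in Figure 2 can be sketched as a simple pipeline in Python: each stage proposes referents for zero pronouns, and a later stage may only fill slots that earlier stages left open. This is a simplification under stated assumptions. The stage functions are hypothetical stand-ins, and the interaction shown in Section 5, where the topic stage yields two candidates that predicate information then filters, is collapsed here into a single topic choice.

```python
from typing import Callable, Dict, List, Optional

Slots = Dict[str, Optional[str]]
Stage = Callable[[Slots], Dict[str, str]]


def run_pipeline(slots: Slots, stages: List[Stage]) -> Slots:
    """Apply identification stages in order: syntax, topic, predicate."""
    result = dict(slots)
    for stage in stages:
        for role, referent in stage(result).items():
            if result.get(role) is None:  # earlier decisions are kept
                result[role] = referent
    return result


# Hypothetical stages for the second half of A1 in (7).
def syntax_stage(slots: Slots) -> Dict[str, str]:
    return {}  # no gap here is resolvable intrasententially


def topic_stage(slots: Slots) -> Dict[str, str]:
    return {"OBJ2": "syusyo'"}  # reading (2) of Section 5


def predicate_stage(slots: Slots) -> Dict[str, str]:
    return {"SUBJ": "SPEAKER", "OBJ2": "HEARER"}  # itadaku's defaults


print(run_pipeline({"SUBJ": None, "OBJ2": None},
                   [syntax_stage, topic_stage, predicate_stage]))
```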
6. Conclusion 
TDS (Topic-driven Discourse Structure), a Japanese 
dialogue discourse structure that resolves zero anaphora 
reference, was proposed on the basis of topic structure. 
Information carried by predicates on honorificity, deixis,
speech act and mental activities is also utilized in
connection with TDS. The method conforms well with the 
way zero anaphora actually functions in spoken Japanese 
discourse. Of the zero pronouns in the inter-terminal 
conversation data, 79.8% were cases identifiable by this 
approach. 
Acknowledgment 
I would like to thank Dr. Akira Kurematu, president of
ATR Interpreting Telephony Research Labs, Dr. Teruaki 
Aizawa, head of Natural Language Understanding 
Department, and my other colleagues for their encourage- 
ment and thought-provoking discussions. 
Figure 2. Integration of the zero anaphora identification methods
[Diagram: the zero pronouns (φSBJ, φOBJ2) of Q1 and A1 in discourse (7), identified by three components built on a UNIFICATION-BASED GRAMMAR: intrasentential identification based on SYNTAX; intersentential identification based on TOPIC STRUCTURE; identification based on PREDICATE INFORMATION ON HONORIFICITY, DEIXIS, etc.]

References

Arita, H. et al., 1987, "Media ni izonsuru kaiwa no yōsiki" [Media-dependent
conversation manners]. WGNL Meeting Report C1-5,
Information Processing Society of Japan.

Firbas, J., 1966, "On defining the theme in functional sentence
analysis." Travaux Linguistiques de Prague 1. Klincksieck.

Grosz, B., A. Joshi & S. Weinstein, 1983, "Providing a unified account
of definite noun phrases in discourse." Proceedings of the 21st
Annual Meeting of the Association for Computational Linguistics.

Gunji, T., 1987, Japanese Phrase Structure Grammar. Reidel.

Kameyama, M., 1985, "Zero anaphora: the case of Japanese." Stanford
University Ph.D. Dissertation.

Kameyama, M., 1986, "A property-sharing constraint in centering."
Proceedings of the 24th Annual Meeting of the Association for
Computational Linguistics.

Kogure, K. et al., 1988, "A method of analyzing Japanese speech act
types." The 2nd International Conference on Theoretical and
Methodological Issues in Machine Translation of Natural
Languages.

Maeda, H. et al., 1988, "Parsing Japanese honorifics in unification-based
grammar." Proceedings of the 26th Annual Meeting of the
Association for Computational Linguistics.

Pollard, C. & I. Sag, 1987, Information-Based Syntax and Semantics,
vol. 1. CSLI Lecture Notes 13.

Yoshimoto, K., 1987, "Identification of zero pronouns in Japanese."
The XIVth International Congress of Linguists. Aug. 10, Berlin.
