<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-2159">
  <Title>Identifying Zero Pronouns in Japanese Dialogue</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3. Topic-based identification
</SectionTitle>
    <Paragraph position="0"> 3.1. PSG treatment of topic and zero pronoun The Japanese topic has the following major characteristics: (i) The topic is marked with a postposition wa and usually, but not always, preposed. (ii) More than one topic can appear in a simple sentence. (iii) With a certain type of subordinates, the subordinate predicate is controlled obligatorily by a topicalized matrix subject, but not by an untopicalized one. (iv) The topic represents what is being talked about in the discourse.</Paragraph>
    <Paragraph position="1"> In the following, an intrasentential treatment of (i) to (iii), a modified version of Yoshimoto (1987), is explained. It is based on Head-driven Phrase Structure Grammar (HPSG) by Pollard &amp; Sag (1987) and Japanese Phrase Structure Grammar (JPSG) by Gunji (1987).</Paragraph>
    <Paragraph position="2"> Topic is represented as a value in the TOPIC feature that corresponds to the semantics of topicalized NP(s). The TOPIC is a FOOT feature that derives from the lexical description of wa. To deal with multi-topic sentences, the value of TOPIC is a stack that enables embedding of topics. For the type of subordinate whose predicate is controlled by a topicalized matrix subject, the subordinate-head particle (to be more exact, ADV head) is given a feature specification to the effect that the subordinate subject unifies with a topicalized matrix subject, but not with an untopicalized one.</Paragraph>
    <Paragraph position="3"> This topic description, along with other parts of the fundamental grammar of Japanese, was implemented on a unification-based parser built up by my colleagues Kiyoshi Kogure and Susumu Katô (Maeda et al. 1988).</Paragraph>
    <Paragraph position="4"> The analysis of (1-1-a) is given as (1-1-b).</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
[TOPIC [[FIRST ?TOP]
[REST END]]]]
</SectionTitle>
    <Paragraph position="0"> N.B. &quot;?&quot; is a prefix for a tag-name representing a token identity of feature structures.</Paragraph>
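    <Paragraph> The FIRST/REST/END notation in the fragment above suggests that the TOPIC stack is encoded as nested pairs. The following is a minimal sketch of such an encoding, assuming plain Python dictionaries stand in for feature structures; the function names are hypothetical and only FIRST, REST and END are taken from the text.

```python
# Hypothetical helpers; only the labels FIRST, REST and END come from the text.
END = "END"

def to_first_rest(topics):
    """Encode a Python list of topics as nested FIRST/REST pairs."""
    structure = END
    for topic in reversed(topics):
        structure = {"FIRST": topic, "REST": structure}
    return structure

def from_first_rest(structure):
    """Decode nested FIRST/REST pairs back into a Python list."""
    topics = []
    while structure != END:
        topics.append(structure["FIRST"])
        structure = structure["REST"]
    return topics
```

Under this encoding a single-topic sentence yields one FIRST/REST pair whose REST is END, and a multi-topic sentence nests one pair per topic.</Paragraph>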
    <Paragraph position="1"> Omitted obligatory case NPs, i.e. those which are specified in the lexical description of the predicate as SUBCAT values but are not found explicitly in the sentence, are represented as values in the SLASH, following HPSG and JPSG. The analysis result of (1-2-a) is</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="795" type="metho">
    <SectionTitle>
[SEM [[RELN EXIST-1]
[OBJE ?X]]]]
</SectionTitle>
    <Paragraph position="0"> Here the SLASH feature represents that in (1-2-a) the subject is a zero anaphora. Following JPSG, subcategorized-for NPs are assigned to the category P (therefore, to be more exact, they are PPs), because all (at least written) Japanese case NPs are followed by postpositions.</Paragraph>
    <Paragraph position="1"> 3.2. Topic-driven discourse structure Based on the intrasentential specification of topicalized sentences given in the previous section, a discourse-level topic structure is formalized, with zero anaphora being identified at the same time.</Paragraph>
    <Paragraph position="2"> In (1), the zero pronoun &quot;φ&quot; in A1-2 coincides with sightseeing tour, a topic in Q1-1. However, a naive algorithm of finding the most recent topic fails because of the topics' recursive structure: the zero indirect object in Q3-1 refers to the &quot;higher&quot; topic sightseeing tour in Q1-1, not the &quot;lower&quot; one hiyô in Q2-1.</Paragraph>
    <Paragraph position="3"> (1) Q1-1: Sightseeing tour wa arimasu ka? Is there a sightseeing tour? A1-1: Hai, A1-2: φ arimasu.</Paragraph>
    <Paragraph position="4"> Yes, there is.</Paragraph>
    <Paragraph position="5"> Q2-1: Hiyô wa ikura desu ka? expense TOP how-much COP-POL QUEST How much does it cost? A2-1: φ 5,000-en desu.</Paragraph>
    <Paragraph position="6"> 5000-yen COP-POL (It costs) 5,000 yen.</Paragraph>
    <Paragraph position="7"> Q3-1: Dewa, φ sanka o môsikomimasu.</Paragraph>
    <Paragraph position="8">  then participation OBJ reserve-POL Then I would like to make a reservation for the tour. TDS, a discourse model with recursively occurring topics which is based on the same unification parser as the intrasentential grammar, identifies zero pronouns as a by-product of structuring the discourse. Syntactically, TDS is composed of the following single basic structure: (2) C0 → C1 ... Cn (n &gt;= 1) The intrasentential analysis result of each sentence, except a multi-topic one, unifies with a C. Each C has a feature TOP that indicates a discourse-level topic value in distinction from TOPIC, an intrasentential topic feature. N.B. A sentence with n topics unifies with an n-level deep vertical tree in which a single C is dominated by another. The leaf node is a C whose TOP value is a stack with all the topics in the sentence, and each non-terminal node C has a TOP stack containing that of the immediately dominated C minus the first member. For example, a sentence with three topics t1, t2, t3 (in order of appearance) corresponds to a tree in which a C with TOP &lt;t3&gt; dominates a C with TOP &lt;t2, t3&gt;, which in turn dominates the leaf C with TOP &lt;t1, t2, t3&gt;.</Paragraph>
    <Paragraph position="10"> In (2), the value of the TOP of each of C1, ..., Cn on the right-hand side is a concatenation of its TOPIC value and the TOP value of the left-hand side C0.</Paragraph>
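    <Paragraph> The concatenation stated for rule (2), together with the N.B. on multi-topic sentences, can be sketched as follows. This is an illustration only, assuming that stacks are Python lists with the most recent topic first; the function names are hypothetical and no unification is modelled.

```python
def child_top(topic_stack, parent_top):
    """TOP of a daughter C: its own TOPIC stack followed by the mother's TOP."""
    return list(topic_stack) + list(parent_top)

def multi_topic_chain(topics):
    """TOP stacks, from the leaf C upward, for a sentence with several topics."""
    return [list(topics[i:]) for i in range(len(topics))]
```

For discourse (1), calling child_top(["expense"], ["sightseeing tour"]) gives the stack with expense on top of sightseeing tour, which is the sense in which the former is a subtopic of the latter. multi_topic_chain(["t1", "t2", "t3"]) lists the three stacks of the vertical tree, each one dropping the first member of the stack below it.</Paragraph>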
    <Paragraph position="12"> N.B. The rule is stated in an extended version of PATR-II notation.</Paragraph>
    <Paragraph position="13"> &quot;&lt; &gt;&quot; is used to denote a feature structure path, and &quot;=&quot; to denote a token identity relation between two feature structures.</Paragraph>
    <Paragraph position="14"> Between the first value of the TOP of C0 and that of Ci a whole-part relation holds. This is stipulated by the knowledge base.</Paragraph>
    <Paragraph position="15"> The value of TOP of Ci is set as default to that of Ci-1. By &quot;=d&quot; it is denoted that whenever the value of the left-hand side feature structure is unspecified, it is set to the one on the right-hand side. The TOP value of the root C unifies with any feature structure, i.e. it is ⊤, the most general feature structure.</Paragraph>
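    <Paragraph> The default relation described above (written "=d" here) can be approximated in a few lines. This is a rough sketch, assuming that None stands for an unspecified value and that each C inherits only from its immediate left sibling; the names are hypothetical and real feature-structure unification is not modelled.

```python
UNSPECIFIED = None  # stand-in for an unspecified feature value

def default_assign(left, right):
    """Sketch of "=d": keep left if it is specified, otherwise take right."""
    return right if left is UNSPECIFIED else left

def propagate_top(tops):
    """Apply the default from each C(i-1) to C(i), going left to right."""
    result = []
    previous = UNSPECIFIED
    for top in tops:
        current = default_assign(top, previous)
        result.append(current)
        previous = current
    return result
```

With the TOP values of Q1-1, A1-1 and A1-2 given as [["tour"], None, None], propagate_top fills the two answers with the topic of the question, which mirrors how the answers in (1) share the topic of Q1-1.</Paragraph>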
    <Paragraph position="16"> Sentences with a SLASH value are related to TDS by the following Topic Supplementation Principle (TSP).</Paragraph>
    <Paragraph position="17"> Topic Supplementation Principle (1st Version) 1. For a C whose TOP value is a stack &lt;t1, ..., tm&gt; and whose SLASH value is a set {P1, ..., Pn}, the SEM of each of P1, ..., Pn is set to one of t1, ..., tm, without the SEM of two Ps being assigned to the same t, if the two are unifiable. If none of the pairs are unifiable, then the rule does not apply.</Paragraph>
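    <Paragraph> One possible reading of the first version of TSP is sketched below. It assumes that topics and SLASH elements are represented by strings, that the knowledge-base check is supplied as a function unifiable(p, t), and that an element with no unifiable topic is simply left unfilled (None). All names are hypothetical, and the exhaustive search is a convenience of the sketch, not a claim about the parser described in the text.

```python
def supplement_topics(top_stack, slash, unifiable):
    """Return {slash element: topic or None}, using each topic at most once."""
    best = {}

    def search(index, used, current):
        nonlocal best
        if index == len(slash):
            filled = sum(1 for t in current.values() if t is not None)
            best_filled = sum(1 for t in best.values() if t is not None)
            # Keep the first assignment found that fills the most elements.
            if not best or filled > best_filled:
                best = dict(current)
            return
        p = slash[index]
        for t in top_stack:  # topics are tried in stack order
            if t not in used and unifiable(p, t):
                current[p] = t
                search(index + 1, used | {t}, current)
        current[p] = None  # also consider leaving p unfilled
        search(index + 1, used, current)

    search(0, frozenset(), {})
    return best
```

If the only compatible pair is the indirect object of reserve with sightseeing tour, then a stack holding expense above sightseeing tour still yields sightseeing tour for that object, because expense fails the compatibility check. In the text the same effect is obtained structurally, by attaching Q3-1 at the higher level of the tree.</Paragraph>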
    <Paragraph position="18"> The analysis tree of discourse example (1) is shown as Figure 1. Sentences Q1-1, A1-1, A1-2, and Q3-1 share the common topic sightseeing tour, and Q2-1 and A2-1 share hiyô (expense). The latter is a subtopic of the former. There are two syntactic possibilities for Q3-1's location: it can be in coordination either with Q1-1, A1-1, and A1-2, or with Q2-1 and A2-1. Here the former are chosen as its coordinates because the knowledge base presents the information that Q3-1's predicate môsikomu (reserve) is compatible with sightseeing tour, but not with hiyô (expense). Note that, while discourse (1) is being analyzed, zero pronouns in A1-2, A2-1, and Q3-1 are also identified. (The other zero pronoun in Q3-1, i.e. the subject of the sentence, is left unspecified here. Its identification needs speech act categorization of sentences.) This topic-based approach is in contrast to Kameyama's Japanese version (Kameyama 1985, Kameyama 1986) of the focus-based approach to anaphora by Grosz et al. 1983. In her framework, subjecthood and predicate deixis play the principal role, and the fact that topic provides the most important clue to anaphora identification in actual spoken Japanese discourse is not utilized explicitly.</Paragraph>
    <Paragraph position="19"> 3.3. Extension of topic introduction One of the problems with the topic-based approach is that topics referred to by zero pronouns are not always explicitly marked by the topic postposition wa. Sometimes, the NPs are never found in discourse in strictly the same forms as they are recovered. To deal with all possible cases, further elaboration in the inter-field domain of semantics, pragmatics, and discourse grammar is needed. Here I will limit my attention to cases analyzable by extending the current method.</Paragraph>
    <Paragraph position="20"> First, a certain type of series of words whose function is, like wa, to introduce topics into the discourse, such as no hô ga, ni tuite desu ga, no ken desu ga, and no koto desu ga, are handled in the same way as wa both syntactically and discourse-grammatically.</Paragraph>
    <Paragraph position="21"> Second, more complicated cases of topic introduction sentence patterns are also treated.</Paragraph>
    <Paragraph position="22"> (3) Watasi no yûzin de sanka o kibô-site iru I GEN friend COP participation OBJ want-PROGR mono ga iru n desu ga...</Paragraph>
    <Paragraph position="23"> person SBJ exist EXPL-POL INTRD A friend of mine wants to participate in the conference. (He ...) As illustrated in (3), the sentence pattern &lt;NP ga V-EXISTENTIAL n/no desu ga&gt; is employed to implicitly introduce the NP as a topic into the discourse. To meet such cases, the lexical description of the topic-introductory ADV head ga is specified so that the SEM value of the subject of the subcategorized-for existential verb unifies with the (implicit) topic of the whole sentence.</Paragraph>
    <Paragraph position="24"> 4. Identification by means of predicate information 4.1. Honorific predicate Japanese has a rich grammatical system of honorifics.</Paragraph>
    <Paragraph position="25"> Among them, expressions related to the discussion here are subject-honorific and object-honorific predicates. A subject-honorific predicate is a form of predicate used to express respect to the person referred to by the subject of the predicate. An object-honorific predicate is used to express respect to the direct or indirect object of the predicate whose subject-agent is the speaker or his/her in-group member. In conversation, the omitted subject of a subject-honorific predicate is typically the hearer. Conversely, the subject of this type of predicate is usually omitted when referring to the hearer, as in (4). This is evidently in order to avoid the redundancy, in case there is no one else worth paying respect to, of the hearer being explicitly indicated as subject while at the same time the subject identity is virtually limited to the hearer by the predicate's honorific information. Likewise, the direct or indirect object of object-honorific predicates is typically the hearer and the subject is typically the speaker, and the two NPs are usually omitted when this holds, as in example (5).</Paragraph>
    <Paragraph position="26"> (4) φ kaigi ni sanka-sarenai no nara, conference OBJ2 participate-SBJHONR-NEG COND muryô de kekkô desu.</Paragraph>
    <Paragraph position="27"> free COP all-right COP-POL If you don't attend the conference, it will be free.</Paragraph>
    <Paragraph position="28"> (5) φ φ tôzitu uketuke de siryôsyû o o-watasi simasu. that day reception LOC proceedings OBJ give-OBJHONR-POL Proceedings will be given to you on the first day of the conference at the reception.</Paragraph>
    <Paragraph position="29"> However, Japanese honorific predicate forms do not correspond to grammatical persons as rigidly as the European languages' verb inflection. The omitted subject of (4) and the omitted indirect object of (5) may be someone else worthy of respect, and the omitted subject of (5) may be the speaker's in-group member. A mechanism is needed which identifies the omitted subject of the subject-honorific predicate and the object of the object-honorific predicate with the hearer, and the omitted subject of the object-honorific predicate with the speaker by default, and otherwise (when specific information is given) identifies them with a person explicitly given in the context.</Paragraph>
    <Paragraph position="30"> Lexical descriptions of honorific verbs and auxiliaries must meet the condition above. For example, the lexical description of a subject-honorific auxiliary reru is as follows (the feature specification depends on that for honorifics by Maeda et al. 1988)  N.B. The feature structure of the verbal stem of the auxiliary is given above. Conjugational endings are specified separately and are utilized in analyzing the auxiliary. The CTYPE value in the SUBCAT specifies the conjugation type of the subcategorized V, i.e. consonant-stem-type and suru-type (Vs with other conjugation types are subcategorized for by rareru, an allomorph of reru). The MODL is used to impose conditions on the possibility of mutual subcategorization between different kinds of Vs. In order to meet the unorderedness of Japanese case phrases, the value of the SUBCAT feature is a set (Gunji 1987) instead of an ordered list adopted in the HPSG English grammar (Pollard &amp; Sag 1987). The set is expanded by a rule reader into its corresponding possible ordered list descriptions.</Paragraph>
    <Paragraph position="31"> The semantic value of the subject (?X) is restricted by the PRAG feature (the feature for describing the pragmatic constraint) to be someone being respected by the speaker. When it is not filled by the analysis dependent on explicit information, it defaults to the hearer by means of &quot;=d&quot;. This lexical description is embedded into the total zero pronoun identification mechanism by revising TSP: Topic Supplementation Principle (2nd Version) 1. For a C whose TOP value is a stack &lt;t1, ..., tm&gt; and whose SLASH value is a set {P1, ..., Pn}, the SEM of each of P1, ..., Pn is set to one of t1, ..., tm, without the SEM of two Ps being assigned to the same t, if the two are unifiable. If none of the pairs are unifiable, then the rule does not apply.</Paragraph>
    <Paragraph position="32"> 2. Non-specified SEM values of obligatory case NPs of honorific, deictic, speech-act, and mental predicates are set to their default values, i.e. to the speaker or the hearer.</Paragraph>
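    <Paragraph> The second clause of the revised TSP can be sketched as a table lookup. The table below is an assumption distilled from the defaults stated in Sections 4.1 to 4.4; the predicate-type labels, role names and function name are hypothetical, and a None value stands for a SEM value left unfilled by the first clause.

```python
SPEAKER, HEARER = "?SPEAKER", "?HEARER"

# Hypothetical default table; the role labels are illustrative only.
DEFAULTS = {
    "subject-honorific": {"subject": HEARER},
    "object-honorific": {"subject": SPEAKER, "object": HEARER},
    "kureru": {"subject": HEARER, "indirect object": SPEAKER},
    "request": {"subject": SPEAKER, "indirect object": HEARER},
    "mental": {"subject": SPEAKER},
}

def apply_defaults(predicate_type, sem_values):
    """Fill SEM values that are still None with the predicate's defaults."""
    defaults = DEFAULTS.get(predicate_type, {})
    filled = dict(sem_values)
    for role, default in defaults.items():
        # Only roles that are present but unfilled receive a default.
        if role in filled and filled[role] is None:
            filled[role] = default
    return filled
```

Values already supplied by explicit information or by the topic stack are left untouched, so the defaults only apply as a last step.</Paragraph>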
    <Paragraph position="33"> Descriptions of other subject-honorific and object-honorific auxiliaries and verbs are likewise given, and their zero pronouns are identified by means of TSP.</Paragraph>
    <Paragraph position="34"> N.B. For object-honorific auxiliaries and verbs, empathy degree is also specified. See Sections 4.2 and 5.</Paragraph>
    <Paragraph position="35"> 4.2. Deictic predicate One of the major features of spoken Japanese discourse is its frequent use of deictic predicates, i.e. forms of predicates which change according to the empathic relation between the persons involved. The most easily understood examples are go and come in English. Besides their counterparts iku and kuru, Japanese has a trichotomous system of donatory verbs, i.e. yaru (give), kureru (give), and morau (receive). Kureru is used when the receiver is the speaker or his/her in-group member (e.g. his/her family). Otherwise yaru is used to express give. These forms are also employed as auxiliaries on the same deictic condition when the action expressed by the main verb involves giving or receiving of favor. They appear frequently in spoken Japanese dialogue as constituents of speech-act-related complex predicates. For example, (6) φ φ hotel no tehai wa site kureru no desu ka? hotel GEN reservation TOP do-RECFAV EXPL-POL QUEST Could you reserve a hotel for me? As in (6), the subject and indirect object of the auxiliary are typically the hearer and speaker, respectively, and when this is the case, the subject and indirect object are usually omitted. However, like those in honorific predicates, the omitted subject and indirect object of deictic auxiliaries have no fixed case values. They may be some in-group member of the speaker or somebody other than the hearer. For example, the subject (the person(s) that reserves) of (6) may be the congress office exclusive of the hearer, and its indirect object (the person that receives favor by the reservation) may be the speaker's student.
To deal with default and non-default cases of omitted subjects and indirect objects, the SEM values of these NPs in kureru's lexical description are restricted by the empathy values in the PRAG features, and their default values are given by means of &quot;=d&quot;. The latter are dealt with in connection with TSP.</Paragraph>
    <Paragraph position="36">  PRAG's feature stipulates that the speaker empathizes more with ?Y than with ?X.</Paragraph>
    <Paragraph position="37"> The other deictic auxiliaries and verbs are similarly treated. 4.3. Speech Act Another important type of information in predicates is speech act. The type of speech act found to be pervasive in Japanese dialogue is request. For all the examples in the collected data of request expressions such as NP o o-negai simasu (give me ...), V negaemasu ka? (can I ask you to ...?) and V te kudasai (please ...), the omitted subject was the speaker and the omitted indirect object was the hearer. Because these zero pronouns can be, depending on situations, other than the first and second persons, the default treatment adopted so far is needed. For example, in the feature structure specification of the verb negai (in NP o o-negai simasu), the default value for the subject is set to the speaker and that for the indirect object to the hearer. 4.4. Mental predicate The last factor in identifying zero pronouns is the condition in Japanese grammar that, with the sentence-final conjugation form (syûsi-kei) of predicates indicating mental activities such as belief, hope, desire, request, and feeling, only the speaker is admitted as the referent of the omitted subject. This condition is easily specified in the lexical descriptions of the constituents of the predicates. An important related phenomenon is that, even with conjugation forms whose subject can grammatically be other than the speaker, examples in the collected data that was mentioned in Section 2 were with speakers being omitted subjects with very few exceptions.
For example, all cases in the data of an auxiliary tai (want to), when followed by a complex particle no desu ga for moderating the desiderative expression, were with speakers being their subjects, though the subject of this form can be grammatically other than the speaker.</Paragraph>
    <Paragraph position="38"> For such usages of mental predicates, default value treatment like that for honorific and deictic predicates is applied. Let us see how discourse (7), with zero pronouns identifiable by either the topic or the honorific and deictic predicates, is analyzed using the integrated model of TSP. (7) Q1: Syoniti no kinen kôen o syusyô ga suru first day GEN commemorative address OBJ premier SBJ do to φSBJ o-kiki sita no desu ga hontô desu ka? QUO hear-OBJHONR-PST INTRD be-true-POL QUEST I have heard that a commemorative address is given by the Prime Minister on the first day. Is it true? A1: Iie, syusyô ni wa φSBJ o-kosi itadakemasen ga, no premier OBJ2 TOP come-RECFAV-OBJHONR-POL-NEG ADVS φSBJ φOBJ2 message o itadaku koto ni natte imasu.</Paragraph>
    <Paragraph position="39"> message OBJ receive-OBJHONR be-arranged-POL No, unfortunately, the Prime Minister does not come.</Paragraph>
    <Paragraph position="40"> However, we will receive a message from him.</Paragraph>
    <Paragraph position="41"> Now, the semantic/pragmatic representation corresponding to the second half of A1 with the object-honorific and deictic  Let us see how unspecified values ?X1 and ?X2 are specified (i.e. zero pronouns are identified) while maintaining the appropriateness of the PRAG feature structure. There are two possibilities for this: (1) ?X1 is identified with the topic syusyô (Prime Minister) according to the first rule of TSP. (2) ?X2 is identified with syusyô. Among these, only (2) can fill both ?X1 and ?X2. That is, ?X2 unifies with syusyô and ?X1 with ?SPEAKER (this is further to be set to a global variable *ANSWERER* at the discourse representation level) by the default rule deriving from the lexical description of itadaku (see Sections 4.1 and 4.2). Here, there is nothing wrong with the PRAG features.</Paragraph>
    <Paragraph position="42"> On the other hand, if (1) is chosen and ?X1 is set to syusyô and ?X2 unifies with ?HEARER as default (as is stipulated by the lexical description of itadaku), then the PRAG has as one of its RESTRS members  Likewise, the zero pronouns &quot;φSBJ&quot; in Q1 and &quot;φSBJ&quot; of o-kosi itadakemasen in A1 are identified with the speaker. The integration of the different approaches is illustrated in Figure 2. The figure reflects the ordered relation among the three components: what intrasentential syntax cannot disambiguate is handled by the topic structure, and then the rest goes to the predicate information component.</Paragraph>
    <Paragraph position="43"> N.B. Anaphora identification (both zero and explicit anaphora) is made more effectively and widely if a model of objects appearing in the discourse with their linguistically expressed and default PRAG features is formalized. This was partly done by Maeda et al. 1988 by means of Discourse Representation Theory.</Paragraph>
  </Section>
</Paper>