<?xml version="1.0" standalone="yes"?> <Paper uid="C88-2159"> <Title>Identifying Zero Pronouns in Japanese Dialogue</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> An approach is proposed to automatically analyze Japanese dialogue containing zero pronouns, the most frequent type of anaphora, which corresponds in function to personal pronouns in English. A zero pronoun is defined as an obligatory case noun phrase that is not expressed in the utterance but can be understood through other utterances in the discourse, context, or out-of-context knowledge.</Paragraph> <Paragraph position="1"> Gaps identifiable by syntactico-semantic means, such as those in relative clauses and a certain type of subordinate verb phrase, are excluded. The input discourse is conversation carried out in Japanese by typing at computer terminals, a type of conversation that has been shown to have the fundamental characteristics common to telephone conversation (Arita et al. 1987).</Paragraph> <Paragraph position="2"> The key idea of the model is topic, something being talked about in the discourse. This notion derives from the study of theme and rheme by the Prague School (Firbas 1966). In the following, it is argued that mainly non-human zero pronouns can be identified by means of topic, and, to do so, a discourse structure based on recursively appearing topics is formalized. Other zero pronouns, mainly human ones, are identified using cognitive and sociolinguistic information conveyed by honorific, deictic, and speech-act predicates as to how the omitted cases are related to the speaker or hearer. The co-occurrence restriction between a subject and a predicate that expresses a mental activity is also utilized. Finally, the interaction among these different factors in zero pronoun identification is discussed, and a model integrating them is proposed. 
This is to constitute part of a machine translation system being developed at ATR that deals with Japanese-English telephone and inter-terminal dialogue.</Paragraph> <Paragraph position="3"> 2. Zero pronoun's role in discourse An investigation of simulated Japanese inter-terminal dialogues (94 sentences, 2 dialogue sequences) and their English translation has revealed that, out of 53 occurrences of personal pronouns in the English translation, 51 correspond to zero pronouns in the original Japanese text.</Paragraph> <Paragraph position="4"> Though the size of the data is limited, this coincides well with our intuition that Japanese zero anaphora performs discourse-grammatical functions including those played by personal pronouns in English (for a discussion to the same effect, see Kameyama 1985).</Paragraph> <Paragraph position="5"> In the same Japanese dialogue data, out of 15 zero pronouns coreferent with non-human antecedents, 14 refer to one of the current topics in the discourse. Out of 74 zero pronouns corresponding to the first and second persons, 55 can be identified by means of cognitive and sociolinguistic information in honorific, deictic, speech-act, and mental predicates. The other 19 examples were either set phrases for identifying the hearer, explaining one's intention, responding, etc., or cases understandable only in terms of the total context and situation. Apart from an approach based on heuristic rules, the only possible solution to these would be one using planning and/or scripts. I will here concentrate on the major portion of zero anaphora cases that are identifiable by topic continuity or by predicate information as to honorificity, deixis, speech act, or mental activity.</Paragraph> <Paragraph position="6"> N.B. Unlike in Italian, Spanish, etc., grammatical information such as person, gender, and number is not indicated morphologically in Japanese predicates. 
This is one of the reasons we must emphasize pragmatic and discourse-grammatical factors in retrieving information referred to by zero anaphora.</Paragraph> </Section> </Paper>