<?xml version="1.0" standalone="yes"?> <Paper uid="W00-0206"> <Title>An Application of the Interlingua System ISS for Spanish-English Pronominal Anaphora Generation</Title> <Section position="2" start_page="42" end_page="42" type="metho"> <SectionTitle> 1 System Architecture </SectionTitle> <Paragraph position="0"> The complete approach, which resolves the anaphor and generates it in the target language, follows the scheme of Figure 1.</Paragraph> <Paragraph position="1"> Translation is carried out in two stages: from the source language into the Interlingua, and from the Interlingua into the target language. The analysis modules are independent of the generation modules. In this paper we have only studied Spanish and English. However, the approach can easily be extended to other languages (i.e. a multilingual system), in the sense that any analysis module can be linked to any generation module.</Paragraph> <Paragraph position="2"> As can be observed in Figure 1, the process involves three independent modules: the Analysis, Interlingua and Generation modules.</Paragraph> </Section> <Section position="3" start_page="42" end_page="42" type="metho"> <SectionTitle> 2 Analysis module </SectionTitle> <Paragraph position="0"> The analysis is carried out by the SUPAR (Slot Unification Parser for Anaphora Resolution) system, presented in Ferrández et al. (2000). SUPAR is a computational system focused on anaphora resolution. It can deal with several kinds of anaphora, such as pronominal anaphora, one-anaphora, surface-count anaphora and definite descriptions. In this paper, we focus on pronominal anaphora resolution and on the generation of pronouns in the target language. For pronominal anaphora resolution, the system has achieved an accuracy of 84% in Spanish and 87% in English.</Paragraph> <Paragraph position="1"> A grammar defined in the SUG (Slot Unification Grammar) formalism is used as input to SUPAR. 
A translator that transforms SUG rules into Prolog clauses has been developed. It produces a Prolog program that parses each sentence. SUPAR can carry out either full or partial parsing of the text with the same parser and grammar. Here, partial parsing techniques have been used because the grammar is unavoidably incomplete and the input consists of unrestricted texts (corpora). The words in these corpora are tagged with the grammatical categories produced by a part-of-speech (POS) tagger. For each word, three items are supplied: its surface form as it appears in the corpus, its lemma, and its POS tag (with morphological information). The corpus is split into sentences before parsing.</Paragraph> <Paragraph position="2"> The output of the parsing module is the Slot Structure (SS), which stores the information² needed to resolve Natural Language Processing (NLP) problems. This SS is the input to the next module, in which NLP problems (anaphora, extraposition, ellipsis, etc.) are treated and solved.</Paragraph> <Paragraph position="3"> Ferrández et al. (1998) present a partial parsing³ strategy that provides all the information necessary for resolving anaphora. 
This partial parsing shows that only the following constituents are necessary for anaphora resolution: co-ordinated prepositional and noun phrases, pronouns, conjunctions and verbs, regardless of the order in which they appear in the text. The free words are the constituents that this partial parsing does not cover (e.g. adverbs).</Paragraph> </Section> <Section position="4" start_page="42" end_page="43" type="metho"> <SectionTitle> 2 The SS stores for each constituent the following </SectionTitle> <Paragraph position="0"> information: constituent name (NP, PP, etc.), semantic and morphological information, discourse marker (identifier of the entity or discourse object) and the SS of its subconstituents.</Paragraph> <Paragraph position="1"> 3 It is important to emphasize that the system can also carry out full parsing of the text. In this paper, partial parsing with no semantic information is used in the evaluation of our approach.</Paragraph> <Paragraph position="3"> After the anaphora resolution module has been applied, a new Slot Structure (SS') is obtained. In this new structure, the correct antecedent (chosen from the possible candidates) of each anaphoric expression is stored together with its morphological and semantic information. SS' is the input to the Interlingua system.</Paragraph> </Section> <Section position="5" start_page="43" end_page="45" type="metho"> <SectionTitle> 3 Interlingua system (ISS) </SectionTitle> <Paragraph position="0"> As mentioned above, the Interlingua system takes as input the SS of the sentence obtained after applying the anaphora resolution module. This system, named Interlingua Slot Structure (ISS), generates an interlingua representation from the SS of the sentence.</Paragraph> <Paragraph position="1"> SUPAR generates one SS for each sentence of the text and resolves both intrasentential and intersentential anaphora. ISS then generates the interlingua representation of the whole text. 
This is one of the main advantages of ISS, because it makes it possible to generate intersentential pronominal anaphora.</Paragraph> <Paragraph position="2"> To begin with, ISS splits sentences into clauses. To identify a new clause when partial parsing has been carried out, the following heuristic has been applied: H1: We assume that a new clause begins when a verb has been parsed and a free conjunction is parsed after it. Here, a free conjunction does not include the conjunctions that join co-ordinated noun and prepositional phrases. It refers to the conjunctions that are parsed in our partial parsing scheme.</Paragraph> <Paragraph position="3"> Once the text has been split into clauses, the next stage is to generate the interlingua representation of each clause. A complex feature structure is used for this. As can be observed in Figure 2⁵, the interlingua is a frame composed of semantic roles and features extracted from the SS of the clause. The semantic roles used in this approach are ACTION, AGENT, THEME and MODIFIER. They correspond to the verb, subject, object and prepositional phrases of the clause, respectively. Our notation is based on the representation used in the KANT interlingua. To identify these semantic roles when partial parsing has been carried out and no semantic knowledge is used, the following heuristic has been applied: H2: We assume that the NP parsed before the verb is the agent of the clause and that the NP parsed after the verb is its theme. Finally, all the PPs found in the clause are its modifiers.</Paragraph> <Paragraph position="4"> 5 Only the relevant attributes of each semantic role are shown, in simplified form, in the figure. 
Additional attributes are added to the semantic roles in order to complete all the information needed for the interlingua representation.</Paragraph> <Paragraph position="5"> In Figure 2, the following elements have been found: ACTION = 'were', AGENT = 'the boys of the mountains', THEME = Ø (no NP has been found after the verb) and MODIFIER = 'in the garden'. These elements are represented by a simple feature structure, in which features are represented as attributes with their corresponding values.</Paragraph> <Paragraph position="6"> The semantic role ACTION has the following attributes: Verb, whose value is the lemma of the verb; Number, Person and Tense (grammatical features); and Type, the type of the verb (impersonal, transitive, etc.).</Paragraph> <Paragraph position="7"> The semantic role AGENT has the following attributes: Cat, the syntactic category of the constituent; Identifier, the value of the discourse marker; Head, the lemma of the constituent's head; Number, Gender and Person, the grammatical features of the constituent; MODIFIER, which contains all the information about the modifiers (PPs) of the NP; and Sem_Ref, which contains semantic information about the constituent's head, if available. The semantic role THEME has the same attributes as AGENT. The only difference is that THEME is the object of the clause and AGENT is its subject.</Paragraph> <Paragraph position="8"> Finally, the semantic role MODIFIER has the following attributes: Cat, the syntactic category of the constituent; Identifier, the value of the discourse marker; Prep, the preposition of the constituent; and ENTITY, which is the object of the PP and has the same attributes as THEME.</Paragraph> <Paragraph position="9"> A clause can have more than one MODIFIER, depending on the number of PPs it contains. 
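The clause frame described above can be sketched with plain Python dataclasses. Here it is filled in for the elements listed for Figure 2. This is a minimal illustration, not the ISS implementation. The class names, the Python types, the use of None for an empty THEME, and all identifier and feature values (e.g. 'e1', 'neuter', 'intransitive') are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """Attributes shared by AGENT, THEME and the ENTITY inside a MODIFIER."""
    cat: str                       # Cat: syntactic category, e.g. "NP"
    identifier: str                # Identifier: discourse marker
    head: str                      # Head: lemma of the head
    number: str                    # Number, Gender, Person: grammatical features
    gender: str
    person: str
    modifiers: List["Modifier"] = field(default_factory=list)  # PPs of the NP
    sem_ref: Optional[str] = None  # Sem_Ref: semantic info on the head, if any


@dataclass
class Modifier:
    cat: str          # Cat, normally "PP"
    identifier: str   # Identifier: discourse marker
    prep: str         # Prep: the preposition
    entity: Entity    # ENTITY: object of the PP


@dataclass
class Action:
    verb: str         # Verb: lemma of the verb
    number: str
    person: str
    tense: str
    verb_type: str    # Type: impersonal, transitive, etc.


@dataclass
class Clause:
    action: Action
    agent: Optional[Entity] = None
    theme: Optional[Entity] = None    # None stands for an empty THEME
    modifiers: List[Modifier] = field(default_factory=list)


# Elements listed for Figure 2; identifiers and feature values are illustrative.
mountains = Entity("NP", "e2", "mountain", "plural", "neuter", "third")
boys = Entity("NP", "e1", "boy", "plural", "masculine", "third",
              modifiers=[Modifier("PP", "m1", "of", mountains)])
garden = Entity("NP", "e3", "garden", "singular", "neuter", "third")
clause = Clause(
    action=Action("be", "plural", "third", "past", "intransitive"),
    agent=boys,
    modifiers=[Modifier("PP", "m2", "in", garden)],
)
print(clause.agent.head, clause.theme, clause.modifiers[0].prep)  # boy None in
```

AGENT, THEME and the object of a MODIFIER all share one Entity type. This mirrors the statement that they carry the same attributes.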
It is important to emphasize that all this information is extracted from the SS of the constituents parsed in the clause.</Paragraph> <Paragraph position="10"> As mentioned above, rather than representing clauses independently, we are interested in an interlingua representation of the whole input text. With this global representation, we will be able to generate both intrasentential and intersentential anaphora. It will also be possible to resolve and generate coreference chains. The scheme of Figure 2 is therefore extended to represent the whole discourse, using clauses as the main units of the representation. Figure 3 shows the interlingua representation of the whole text of the example. On the left side of Figure 3, the new objects or entities of the discourse are represented.</Paragraph> <Paragraph position="11"> These objects are named ENTITIES and have the following attributes: Cat, Identifier, Head, Number, Gender, Person and Sem_Ref, since they can represent an AGENT, a THEME or the object of a MODIFIER.</Paragraph> <Paragraph position="12"> On the right side, the CLAUSES of the text are represented in simplified form. Each CLAUSE contains the semantic role ACTION with its attributes, together with the semantic roles AGENT, THEME and MODIFIER that have appeared in the clause. These semantic roles are linked to the ENTITIES they refer to. A CLAUSE also contains the identifier of the sentence in which it appears (Sentence ID) and the Conjunction that joins two or more CLAUSES in a sentence.</Paragraph> <Paragraph position="13"> In the figure, four ENTITIES and two CLAUSES can be distinguished. The ENTITIES are as follows: ENTITY 1 ('boy'), [...] PRONOUN and contains the link to ENTITY 1) and THEME ('Z', the link to ENTITY 4).</Paragraph> <Paragraph position="14"> The coreference chain can be identified because the AGENTS of CLAUSE 1 and CLAUSE 2 ('the boys' and 'they') are linked to the same ENTITY. 
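Every semantic role points to an ENTITY, so coreference chains fall out of a simple grouping step. The sketch below assumes a flattened, hypothetical encoding of the Figure 3 links as (sentence, clause, role, text, entity) tuples. The sentence numbers and the 'the garden' mention are invented for illustration. The function names are also illustrative.

```python
from collections import defaultdict

# Hypothetical flattened links: (sentence id, clause id, role, surface text, ENTITY id).
# Sentence ids and the "the garden" mention are invented for illustration.
MENTIONS = [
    (1, "CLAUSE 1", "AGENT", "the boys", "ENTITY 1"),
    (1, "CLAUSE 1", "MODIFIER", "the garden", "ENTITY 3"),
    (2, "CLAUSE 2", "AGENT", "they", "ENTITY 1"),
]


def coreference_chains(mentions):
    """Group mentions by the ENTITY they link to; a chain needs two or more."""
    by_entity = defaultdict(list)
    for sentence_id, clause_id, role, text, entity_id in mentions:
        by_entity[entity_id].append((sentence_id, clause_id, role, text))
    return {entity: ms for entity, ms in by_entity.items() if len(ms) > 1}


def is_intersentential(chain):
    """True if the chain links constituents from more than one sentence."""
    return len({sentence_id for sentence_id, _, _, _ in chain}) > 1


for entity, chain in coreference_chains(MENTIONS).items():
    print(entity, [text for _, _, _, text in chain], is_intersentential(chain))
# ENTITY 1 ['the boys', 'they'] True
```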
As can be seen, these links can occur between constituents of different clauses or of different sentences. Hence, the global system is able to generate intersentential anaphora and to identify the coreference chains of the text.</Paragraph> </Section> <Section position="6" start_page="45" end_page="48" type="metho"> <SectionTitle> 4 Generation module </SectionTitle> <Paragraph position="0"> The Generation module takes the interlingua representation of the text as input and generates the text in the target language. In this paper, we only describe the generation of pronouns. The generation phase is split into two modules, syntactic generation and morphological generation. Both are described in detail in the next two subsections. Although the approach presented here is multilingual, we have focused on generation into Spanish and English.</Paragraph> <Section position="1" start_page="45" end_page="47" type="sub_section"> <SectionTitle> 4.1 Syntactic generation </SectionTitle> <Paragraph position="0"> In syntactic generation, the interlingua representation is converted by 'transformational rules' into an ordered surface-structure tree. The leaves of this tree are labeled with the appropriate target-language grammatical functions and features.</Paragraph> <Paragraph position="1"> The basic task of syntactic generation is to order constituents in the correct sequence for the target language. However, the aim of this work is only the generation of pronominal anaphora in the target language. We have therefore focused only on the differences between Spanish and English in the generation of the pronoun.</Paragraph> <Paragraph position="2"> These differences are what we have named discrepancies (a study of Spanish-English-Spanish discrepancies is presented in Peral et al. (1999)). 
In syntactic generation, two kinds of discrepancies can be found: syntactic discrepancies and Spanish elliptical zero-subject constructions.</Paragraph> <Paragraph position="3"> Syntactic discrepancies arise because the surface structures of Spanish sentences are more flexible than English ones: the constituents of a Spanish sentence do not have to follow a fixed order. To generate correct English, we must therefore first reorganize the Spanish sentence.</Paragraph> <Paragraph position="4"> In English-Spanish translation, by contrast, this reorganization is generally not necessary. Consider the Spanish sentence (2) A Pedro lo vi ayer. (I saw Peter yesterday.)</Paragraph> <Paragraph position="6"> In (2), the object of the verb, A Pedro (to Peter), appears before the verb (in the position theoretically occupied by the subject), and the subject is omitted. This phenomenon is common in Spanish and is explained below. The PP A Pedro (to Peter) functions as an indirect object of the verb, because it is introduced by the preposition A (to). The subject can be recovered from the verb, which is in the first person singular, so the subject must be the pronoun Yo (I). Moreover, the pronoun lo (him) functions as a complement of the verb vi (saw). In Spanish, this pronoun refers to the object of the verb, Peter, when the object is moved from its canonical position after the verb, as happens in this sentence.</Paragraph> <Paragraph position="7"> As explained above, the semantic roles (AGENT, ACTION, etc.) of these constituents in the CLAUSE can be identified by applying a series of heuristics. Once the semantic roles of the constituents have been established, they are stored in the interlingua representation. 
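Heuristics H1 and H2 can be sketched over the output of a partial parser, together with the English ordering AGENT, ACTION, THEME, MODIFIERS. The sketch rests on several assumptions:

- Each constituent arrives as a (category, text) pair, using the hypothetical labels NP, PP, PRON, CONJ, VERB and FREE (free words).
- Any CONJ the partial parser leaves standing alone is a free conjunction.
- Pronouns count as NPs for H2.
- The nearest NP on each side of the verb is chosen.

The function names are illustrative, not taken from SUPAR or ISS. The example input reuses the English sentence about Ross cited in this section.

```python
# Hypothetical category labels for partial-parse constituents.
NOMINAL = {"NP", "PRON"}  # assumption: pronouns count as NPs for H2


def split_clauses(constituents):
    """H1: a new clause begins at a free conjunction parsed after a verb."""
    clauses, current, seen_verb = [], [], False
    for cat, text in constituents:
        if cat == "CONJ" and seen_verb:
            clauses.append(current)          # close the clause that owns the verb
            current, seen_verb = [], False
        current.append((cat, text))          # the conjunction opens the new clause
        if cat == "VERB":
            seen_verb = True
    if current:
        clauses.append(current)
    return clauses


def assign_roles(clause):
    """H2: NP before the verb -> AGENT, NP after it -> THEME, every PP -> MODIFIER."""
    pps = [text for cat, text in clause if cat == "PP"]
    verb_index = next((i for i, (cat, _) in enumerate(clause) if cat == "VERB"), None)
    if verb_index is None:
        return {"ACTION": None, "AGENT": None, "THEME": None, "MODIFIER": pps}
    before = [text for cat, text in clause[:verb_index] if cat in NOMINAL]
    after = [text for cat, text in clause[verb_index + 1:] if cat in NOMINAL]
    return {
        "ACTION": clause[verb_index][1],
        "AGENT": before[-1] if before else None,  # nearest NP to the left
        "THEME": after[0] if after else None,     # nearest NP to the right
        "MODIFIER": pps,
    }


def english_order(roles):
    """Linearize in the usual English order: AGENT, ACTION, THEME, MODIFIERS."""
    parts = [roles["AGENT"], roles["ACTION"], roles["THEME"], *roles["MODIFIER"]]
    return " ".join(p for p in parts if p)


sentence = [
    ("NP", "Ross"), ("FREE", "carefully"), ("VERB", "folded"),
    ("NP", "his trousers"), ("CONJ", "and"), ("VERB", "climbed"), ("PP", "into bed"),
]
clauses = split_clauses(sentence)
roles = [assign_roles(c) for c in clauses]
# The second clause has no AGENT: its subject has been omitted.
print(len(clauses), roles[1]["AGENT"])   # 2 None
print(english_order(roles[0]))           # Ross folded his trousers
```

Free words such as carefully are not placed by this sketch, so a full generator would need a separate rule for them.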
The English output is a new clause in which the constituents follow the usual English order: AGENT, ACTION, THEME and MODIFIERS.</Paragraph> <Paragraph position="8"> Elliptical zero-subject constructions (zero-pronouns). As mentioned above, Spanish allows the pronominal subject of a sentence to be omitted. These omitted pronouns are usually called zero-pronouns.</Paragraph> <Paragraph position="9"> In some other languages (e.g. Japanese), zero-pronouns may appear in either the subject or the object position. In Spanish texts, they only appear in the subject position. In English texts, zero-pronouns occur far less frequently, since the use of overt pronouns is generally compulsory in the language. Nevertheless, some examples can be found: &quot;Ross carefully folded his trousers and Ø climbed into bed&quot; (the symbol Ø marks the position of the omitted pronoun). Italian, Thai, Chinese and Japanese are examples of target languages with typical elliptical (zero) constructions corresponding to English source pronouns.</Paragraph> <Paragraph position="10"> To generate Spanish zero-pronouns in English, they must first be located in the text (ellipsis detection) and then resolved (anaphora resolution).</Paragraph> <Paragraph position="11"> At the ellipsis detection stage, information about the zero-pronoun (e.g. person, gender and number) must first be obtained from the verb of the clause. This information is then used to identify the antecedent of the zero-pronoun (resolution stage). The detection process relies on knowledge about the structure of the language itself, which gives clues to the use of each type of zero-pronoun.</Paragraph> <Paragraph position="12"> The resolution of zero-pronouns has been implemented in SUPAR. We may work on unrestricted texts to which partial parsing is applied. Zero-pronouns must therefore also be detected when full syntactic information is not available. 
Once the input text has been split into clauses by applying heuristic H1, the next problem is to detect whether the subject has been omitted from each clause.</Paragraph> <Paragraph position="13"> When partial parsing techniques have been applied, the following heuristic can be used to detect the omission of the subject from each clause: H3: After the sentence has been divided into clauses, for each clause a noun phrase or a pronoun is sought among the clause constituents to the left of the verb, unless the verb is imperative or impersonal. This noun phrase or pronoun must agree in person and number with the verb of the clause. If no such constituent is found, the subject is considered to have been omitted.</Paragraph> <Paragraph position="14"> Sometimes gender information about the pronoun can be obtained when the verb is copulative. For example, in: (3) Pedro_j vio a Ana_k en el parque. Ø_k Estaba muy guapa.</Paragraph> <Paragraph position="15"> (Peter_j saw Ann_k in the park. She_k was very beautiful.) In this example, the verb estaba (was) is copulative. Its subject must therefore agree in gender and number with its complement whenever the complement has both a masculine and a feminine form (guapo: masc., guapa: fem.). We can therefore obtain the gender of the subject from the complement guapa (&quot;beautiful&quot; in its feminine form). Since guapa is feminine, the omitted pronoun must be she rather than he.</Paragraph> <Paragraph position="16"> After a zero-pronoun has been detected, SUPAR inserts a pronoun (with its person, gender and number information) in the position where the subject was omitted. This pronoun is then resolved by the anaphora resolution module. After that, ISS generates the interlingua representation of the text.</Paragraph> <Paragraph position="17"> In example (3), two CLAUSES are identified. 
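Heuristic H3, together with the copulative gender clue, can be sketched as follows and checked on both clauses of example (3). This is an illustration under stated assumptions, not SUPAR's code:

- The Constituent class, its field and flag names, and detect_zero_subject are hypothetical.
- The tagger is assumed to supply person, number and gender for each constituent.
- The verb's type (imperative, impersonal, copulative) is assumed to be known.
- For a copulative verb, the sketch simply takes the first gender-marked constituent after the verb as the complement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Constituent:
    cat: str                      # "NP", "PRON", "VERB", "PP", "ADJ", ...
    text: str
    person: Optional[int] = None
    number: Optional[str] = None  # "sing" or "plur"
    gender: Optional[str] = None  # "masc", "fem", or None if unmarked
    imperative: bool = False      # the three flags only matter for verbs
    impersonal: bool = False
    copulative: bool = False


def detect_zero_subject(clause: List[Constituent]) -> Optional[Constituent]:
    """H3: return the pronoun to insert if the subject is omitted, else None."""
    verb_index = next((i for i, c in enumerate(clause) if c.cat == "VERB"), None)
    if verb_index is None:
        return None
    verb = clause[verb_index]
    if verb.imperative or verb.impersonal:
        return None
    for c in clause[:verb_index]:           # constituents left of the verb
        if c.cat in ("NP", "PRON") and (c.person, c.number) == (verb.person, verb.number):
            return None                     # an agreeing subject is present
    gender = None
    if verb.copulative:
        # the subject of a copula agrees in gender with a gender-marked complement
        gender = next((c.gender for c in clause[verb_index + 1:] if c.gender), None)
    return Constituent("PRON", "Ø", verb.person, verb.number, gender)


# Example (3): "Pedro vio a Ana en el parque. Estaba muy guapa."
clause1 = [
    Constituent("NP", "Pedro", 3, "sing", "masc"),
    Constituent("VERB", "vio", 3, "sing"),
    Constituent("PP", "a Ana"),
    Constituent("PP", "en el parque"),
]
clause2 = [
    Constituent("VERB", "Estaba", 3, "sing", copulative=True),
    Constituent("ADV", "muy"),
    Constituent("ADJ", "guapa", number="sing", gender="fem"),
]
print(detect_zero_subject(clause1))             # None
zero = detect_zero_subject(clause2)
print(zero.person, zero.number, zero.gender)    # 3 sing fem
```

The returned constituent plays the role of the pronoun that SUPAR inserts. Linking it to Ann is left to the anaphora resolution module.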
In the second CLAUSE, the zero-pronoun is detected (third person, singular, feminine: she) and resolved (its antecedent is Ann, third person, singular, feminine). The AGENT of this CLAUSE is therefore the PRONOUN she, which is linked to the ENTITY Ann (the chosen antecedent).</Paragraph> <Paragraph position="18"> The generation of Spanish zero-pronouns into English is then straightforward, because all the information needed is available in the interlingua representation. The information for the English pronoun is obtained as follows: number and person come from the PRONOUN, and gender comes from the Head of its antecedent.</Paragraph> </Section> <Section position="2" start_page="47" end_page="48" type="sub_section"> <SectionTitle> 4.2 Morphological generation </SectionTitle> <Paragraph position="0"> In morphological generation, we mainly have to handle number and gender discrepancies in the generation of pronouns.</Paragraph> <Paragraph position="1"> Number discrepancies arise between words of different languages that express the same concept. Such a word may be referred to by a singular pronoun in the source language and by a plural pronoun in the target language. For example, the concept people is plural in English, whereas in Spanish it is singular. (4) The stadium was full of people_i. They_i were very angry with the referee.</Paragraph> <Paragraph position="2"> (5) El estadio estaba lleno de gente_i. Ésta_i estaba muy enfadada con el árbitro.</Paragraph> <Paragraph position="3"> In (4), the noun people is replaced by the plural pronoun they in English, whereas in Spanish (5) the noun gente is replaced by the singular pronoun ésta (it). 
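A discrepancy like the one between (4) and (5) can be handled by morphological rules. Such rules map the pronoun's interlingua features, including its antecedent, to a target-language pronoun. The sketch below assumes these features are available as a dictionary and encodes the rules as a first-match table. Each left-hand side lists the features it requires, and any feature it does not mention is unconstrained. The two number rules are derived from examples (4) and (5). The feature names, the 'sem_ref' values and the function apply_rules are hypothetical. The same matcher also holds the English gender rules of Figure 4, written out with one rule per antecedent class.

```python
from typing import Dict, List, Optional, Tuple

Rule = Tuple[Dict[str, object], str]   # (left-hand side features, target pronoun)

# Number rules derived from examples (4) and (5).
NUMBER_RULES_ES: List[Rule] = [
    ({"person": 3, "number": "plural", "antecedent": "people"}, "ésta"),
]
NUMBER_RULES_EN: List[Rule] = [
    ({"person": 3, "number": "singular", "antecedent": "gente"}, "they"),
]

# Figure 4 (English, subject role); "sem_ref" stands for the antecedent's Sem_Ref class.
GENDER_RULES_EN: List[Rule] = [
    ({"person": 3, "gender": "masculine", "number": "singular", "sem_ref": "person"}, "he"),
    ({"person": 3, "gender": "masculine", "number": "singular", "sem_ref": "animal"}, "it"),
    ({"person": 3, "gender": "masculine", "number": "singular", "sem_ref": "thing"}, "it"),
    ({"person": 3, "gender": "feminine", "number": "singular", "sem_ref": "person"}, "she"),
    ({"person": 3, "gender": "feminine", "number": "singular", "sem_ref": "animal"}, "it"),
    ({"person": 3, "gender": "feminine", "number": "singular", "sem_ref": "thing"}, "it"),
    ({"person": 3, "number": "plural"}, "they"),
]


def apply_rules(rules: List[Rule], pronoun: Dict[str, object]) -> Optional[str]:
    """Return the right-hand side of the first rule whose features all match."""
    for lhs, target in rules:
        if all(pronoun.get(key) == value for key, value in lhs.items()):
            return target
    return None   # no rule fired


they = {"person": 3, "number": "plural", "antecedent": "people"}
print(apply_rules(NUMBER_RULES_ES, they))    # ésta
esta = {"person": 3, "number": "singular", "gender": "feminine", "antecedent": "gente"}
print(apply_rules(NUMBER_RULES_EN, esta))    # they
```

One plausible design, not stated in the text, is to try the number rules first and fall back to the gender rules only when no number rule fires.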
Gender discrepancies also exist in translation between other languages, for example in German-English translation.</Paragraph> <Paragraph position="4"> To take number discrepancies into account when generating the pronoun in the target language, a set of morphological (number) rules is constructed.</Paragraph> <Paragraph position="5"> When generating the pronoun they of example (4) into Spanish, the interlingua representation contains a PRONOUN ('they', third person, plural) that is linked to the ENTITY ('police', plural). For correct generation into Spanish, the following morphological rule is constructed: pronoun + third_person + plural + antecedent ('police') → ésta (pronoun, third person, feminine, singular). The left-hand side of the rule contains the interlingua representation of the pronoun, and the right-hand side contains the pronoun in the target language.</Paragraph> <Paragraph position="6"> In the same way, a set of morphological rules is constructed to generate English pronouns, for example: pronoun + third_person + singular + antecedent ('policía') → they (pronoun, third person, plural). Turning to gender discrepancies, English has less morphological information than Spanish. Among the plural personal pronouns, we can be translated as nosotros (masculine) or nosotras (feminine); you as ustedes (masculine/feminine), vosotros (masculine) or vosotras (feminine); and they as ellos or ellas. Furthermore, the singular personal pronoun it can be translated as él/éste (masculine) or ella/ésta (feminine). For example: (6) Women_i were in the shop. They_i were buying gifts for their husbands.</Paragraph> <Paragraph position="7"> (7) Las mujeres_i estaban en la tienda. Ellas_i estaban comprando regalos para sus maridos. 
In Spanish, the plural noun mujeres (women) is feminine and is replaced by the personal pronoun ellas (feminine plural) in (7). In English, by contrast, they is valid for both masculine and feminine antecedents (6). However, these discrepancies do not mean that Spanish anaphors always carry more information than English ones. For example, Spanish possessive adjectives (su casa) do not carry gender information, whereas English possessive adjectives do (his/her house).</Paragraph> <Paragraph position="8"> Similar discrepancies can be found between other languages. For example, in French-German translation, gender is assigned arbitrarily in both languages (although less arbitrarily in French than in German). English-German translation, like English-Spanish, involves translating from a language with neutral gender into a language that assigns gender grammatically.</Paragraph> <Paragraph position="9"> As mentioned above, the omission of the pronominal subject is very common in Spanish. The pronominal subject has to be written only when we want to stress the subject of a clause or to distinguish between different possible subjects. Otherwise, it can be omitted. However, since we are interested in the correct generation of pronouns, our system never omits them.</Paragraph> <Paragraph position="10"> Since our system resolves only third-person personal pronouns, we have only studied gender discrepancies in the generation of third-person pronouns. The study is divided into pronouns with a subject role and pronouns with a complement role. a) Pronouns with a subject role. These pronouns can be identified in the interlingua representation because they have the semantic role of AGENT in a CLAUSE. Their antecedents are established through the links to the ENTITIES.</Paragraph> <Paragraph position="11"> The main problem in generating pronouns into English is the generation of the pronoun it. 
Suppose the interlingua representation contains a pronoun with the attributes masculine, singular and third person. This pronoun can be generated as the English pronoun he or it. If the antecedent of the pronoun refers to a person, we generate he. If the antecedent is an animal or a thing, we generate it. These characteristics of the antecedent can be obtained from the semantic information stored in its Sem_Ref attribute. A similar strategy is used to generate she or it. Plural personal pronouns (masculine/feminine, plural, third person) are generated as the English pronoun they.</Paragraph> <Paragraph position="12"> Figure 4 shows the set of morphological rules that handles gender discrepancies in the English generation of pronouns:
pron + third_person + masculine + sing + antec (person) → he
pron + third_person + masculine + sing + antec (animal or thing) → it
pron + third_person + feminine + sing + antec (person) → she
pron + third_person + feminine + sing + antec (animal or thing) → it
pron + third_person + feminine/masculine + plural → they
In Spanish generation, the main problem is the translation of the pronoun it. The set of morphological rules that handles this case is shown in Figure 5:
pron + third_person + sing + antec (animal with masculine gender) → él
b) Pronouns with a complement role. These pronouns can be identified in the interlingua representation because they have the semantic role of THEME or appear inside a MODIFIER in a CLAUSE.</Paragraph> <Paragraph position="13"> In pronoun generation into English, the set of morphological rules of Figure 6 is applied:
pron + third_person + sing + antec (person with masculine gender) → him
pron + third_person + sing + antec (person with feminine gender) → her
pron + third_person + sing + antec (animal or thing) → it
pron + third_person + plural + antec (person) → them</Paragraph> <Paragraph position="14"> When a pronoun with the semantic role of THEME is generated into Spanish, the set of morphological rules of Figure 7 is applied:
pron + third_person + plural → les</Paragraph> <Paragraph position="15"> On the other hand, if the pronoun is in a MODIFIER, the Spanish generation rules are those shown in Figure 8:
pron + third_person + sing + antec (masculine gender) → él
pron + third_person + sing + antec (feminine gender) → ella
pron + third_person + plural + antec (masculine gender) → ellos
pron + third_person + plural + antec (feminine gender) → ellas</Paragraph> </Section> </Section></Paper>