<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2143">
  <Title>A Computational Model for Generating Referring Expressions in a Multilingual Application Domain</Title>
  <Section position="3" start_page="0" end_page="848" type="metho">
    <SectionTitle>
2 Referring expressions in multilingual pension forms
</SectionTitle>
    <Paragraph position="0"> Our work on the specification for the referring expressions component started from an analysis of the collected multilingual (English, German, Italian) corpus of texts containing instructions on how to fill out pension forms. From this study, a general typology of the entities referred to in the domain emerged, together with an indication of how such entities are typically expressed in the different languages (see figure 1). The classification includes: Specific entities. These are extensional entities: individuals or collections of individuals (plurals). In KL-One knowledge representation languages they are represented as instances in the A-box.</Paragraph>
    <Paragraph position="1"> Generic entities. These entities are intensional descriptions of classes of individuals and are often mentioned in administrative documents, since the entities (persons or inanimate objects) addressed in this kind of text are not usually specific individuals in the mind of the public administrator but rather all the individuals that belong to a certain class, as in the following example: (1) Married women should send their marriage certificate.</Paragraph>
    <Paragraph position="2"> In KL-One knowledge representation languages generic entities are represented as concepts in the T-box.</Paragraph>
    <Paragraph position="3"> Anchored entities. These are entities that, although generic in nature, can be interpreted as specific when they are considered in the specific communicative situation in which the actual applicant reads the instructions to complete the pension form he has in his hands. Consider for example the following text: &quot;The applicant has to provide all the requested information&quot;. In this situation, the specific person who is reading the form instantiates the generic applicant. All the entities directly related to the applicant or to the form itself can be considered anchored, as for example: the applicant's name, the applicant's spouse, any previous job of the applicant, section 3 of the form, ... The plausibility of this anchoring operation is confirmed by the fact that the linguistic realization choices made for anchored entities (definite forms, singular indefinite forms, ...) closely resemble the linguistic choices made for specific entities.</Paragraph>
    <Paragraph position="4"> Further investigation of the corpus texts was conducted to identify language-dependent referring phenomena and general heuristics for the choice of the most appropriate linguistic realization. In general, we found that language style has great influence on the realization choices. When an informal style is used (as in most English documents and in some recent Italian/German forms), the personal distance between the interlocutors (the citizen and the public institution) is reduced by referring to the interlocutors directly, by means of personal pronouns (&quot;you&quot;, &quot;we&quot;). When the language is more formal, impersonal forms or indirect references are preferred (&quot;the applicant&quot;, &quot;INPS&quot;, &quot;DSS&quot;).</Paragraph>
    <Paragraph position="5"> Apart from style differences, there are also differences in realization that depend on the output language. For example, in full sentences in administrative forms, English and German typically use possessive noun phrases for entities anchored to the reader (like &quot;your spouse&quot;), whereas Italian prefers simple definite forms (e.g. &quot;il coniuge&quot; [the spouse]).</Paragraph>
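    <Paragraph> As a minimal illustrative sketch (not part of the GIST system), the entity typology and the language-dependent preference for reader-anchored entities described in this section can be written down as a small Python lookup. The names EntityType, READER_ANCHORED_FORM and reader_anchored_form are hypothetical, and the table covers only the full-sentence case mentioned in the text.
<![CDATA[
```python
from enum import Enum


class EntityType(Enum):
    """Entity typology from the corpus analysis (names are illustrative)."""
    SPECIFIC = "specific"   # extensional: an instance in the A-box
    GENERIC = "generic"     # intensional: a concept in the T-box
    ANCHORED = "anchored"   # generic, but instantiated by the reading situation


# Assumed table: preferred noun phrase form for an entity anchored to the
# reader, in a full sentence, keyed by output language.
READER_ANCHORED_FORM = {
    "english": "possessive",   # e.g. "your spouse"
    "german": "possessive",
    "italian": "definite",     # e.g. "il coniuge"
}


def reader_anchored_form(language):
    """Return the preferred form label for a reader-anchored entity."""
    return READER_ANCHORED_FORM[language.lower()]


print(reader_anchored_form("English"))  # possessive
print(reader_anchored_form("Italian"))  # definite
```
]]>
 Keeping the language-specific preference in a table rather than in the control flow mirrors the idea that the same choice points can be customized per output language.</Paragraph>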
  </Section>
  <Section position="4" start_page="848" end_page="852" type="metho">
    <SectionTitle>
3 The adopted approach
</SectionTitle>
    <Paragraph position="0"> The linguistic expressions that refer in the text to the domain entities have to fulfill several properties:
      * they must allow the non-ambiguous identification of the entities [2];
      * they should avoid redundancies that could hamper fluency;
      * they should contribute to the cohesion of the text by signaling semantic links between portions of text;
      * they should conform to the formality and politeness requirements imposed on the output texts.</Paragraph>
    <Paragraph position="1"> When we choose to realize a referring expression with an anaphora we fulfill a double function: we introduce some form of economy for the reference, avoiding the repetition of a long linguistic expression, and we enhance the coherence of the text since we signal meaning relations (cohesive ties) between portions of the discourse.</Paragraph>
    <Paragraph position="2"> The choice of the correct referring expression depends on two major factors: (A) the cohesive ties that we want to signal to improve the cohesion of the text; (B) the semantic features that allow the identification of the object in the domain (distinguishing semantic features).</Paragraph>
    <Paragraph position="3"> Another relevant factor is the pragmatic setting of the discourse (formality and politeness).</Paragraph>
    <Paragraph position="4"> To decide on (A), data structures are maintained that keep track of the evolving textual context (discourse structure and focus history) and record the socio-cultural background of the reader. [2] In some genres the use of ambiguous references may be possible or desirable, for example in jokes, but in the administrative genre clarity and unambiguity are the primary goals.</Paragraph>
    <Paragraph position="5"> [...] are performed to verify whether the identity of the entity can be recovered from the context or whether there exist semantic relations with other cited entities that are worth signaling (e.g. comparative relations).</Paragraph>
    <Paragraph position="6"> Once the ties have been determined, the distinguishing semantic features are identified. These semantic features depend on the entity type (generic, anchored or specific) and on the relationship between the entity and the context: whether the entity is new with respect to the context (presenting) or its identity can be recovered (presuming). Figure 2 illustrates a fine-grained distinction of semantic features whose combinations specify how a referring expression can be built. This network of choices is an adaptation to the GIST application domain of the results presented in (Martin, 1992).</Paragraph>
    <Paragraph position="7"> The total/partial opposition is used to distinguish references to sets of elements from references to portions of sets. The linguistic form of the expression also varies according to the type of speech act that is to be realized, and this justifies the asserting/questioning distinction.</Paragraph>
    <Paragraph position="8"> Entities may be presented as new in the discourse context through references composed of a nominal expression or a pronoun (presenting).</Paragraph>
    <Paragraph position="9"> A presupposed element (presuming) may belong to the cultural/social context, and therefore be described with a unique reference, or it may belong to the textual context. The presuming-variable option corresponds to a textual anaphora. In this case a pronoun or a definite expression can be used. In our system, pronominalization is decided according to new rules extending the Centering Model, as explained in section 3.3. Definite expressions are built by selecting the appropriate determiner (the, this, that, ...) and the information (head, modifiers) to put in the noun phrase. This latter information is determined through the algorithm explained in section 3.2.</Paragraph>
    <Section position="1" start_page="849" end_page="850" type="sub_section">
      <SectionTitle>
3.1 The global algorithm
</SectionTitle>
      <Paragraph position="0"> The submodule for the generation of referring expressions is called during the final stage of the text planning process, when the so-called micro-planning (or sentence planning) takes place (Not and Pianta, 1995). The global algorithm implemented has been derived from the network of choices presented above, as emerging from the corpus analysis. The formal approach adopted proved to be particularly suitable for coping with multilinguality issues, since the tests performed at the various choice points can be easily customized according to the output language. The algorithm is activated on each entity to be referred to and accesses the following contextual information:
        Background - the cultural and social context of the reader. At present this is represented by a list of all the entities the reader is supposed to know (e.g. the Department for Social Security, the anchored entities);
        RT - the rhetorical tree, specifying how the selected content units will be organized in the final text and which semantic relations between text spans will be signaled to enhance the coherence;
        AlreadyMentioned - the history of already mentioned entities;
        StylePars - the parameters that define the style of the output text;
        FocusState - the state of the attention of the reader, organized as detailed in section 3.3.</Paragraph>
      <Paragraph position="1"> To model the rhetorical structure of discourse we adopt Rhetorical Structure Theory as developed in (Mann and Thompson, 1987). According to this theory, each text can be seen as a sequence of clauses linked together by (semantic) relations. These relations may be grammatically, lexically or graphically signaled. About 20 such relations have been identified by (Mann and Thompson, 1987), e.g. ELABORATION, which occurs when one clause provides more details for a topic presented in the previous clause, CONTRAST, which links two clauses describing similar situations differing in few respects, and so on.</Paragraph>
      <Paragraph position="2"> Here follows a sketch of the global algorithm implemented (Not, 1995). To make the reading easier, labels (shown here in square brackets) have been introduced to identify the steps of the algorithm corresponding to the main choice points in figure 2.</Paragraph>
      <Paragraph position="3"> Preliminary step:
        * (For English) if e is an anchored entity, treat it as if it were a specific entity in Background
        * (For Italian and German) if e is an anchored entity inside a concept description
            then treat it as a presenting of a generic entity with a nominal expression (goto presenting-nominal-generic)
            else treat it as if it were a specific entity in Background
        [...]
            else use a bare noun phrase
            (Italian) use a bare noun phrase
        * e is referred to in a title (but is not anchored to the reader) or in a label:
            use a bare noun phrase (singular or plural according to the number of e)
        * e ∈ AlreadyMentioned ∪ Background then [presuming]:
            if e ∈ Background then [unique]:
              if e is-a interlocutor then [interlocutor]:
                if *formality* = informal then use a pronoun
                else use a proper noun (if it exists) or a definite description
              else [non-interlocutor]:
                (English, German) if *formality* = informal and e is anchored to the reader
                  then use a noun phrase with the possessive adjective &quot;your&quot;
                  else use a proper name or a definite description
                (Italian) use a proper name or a definite description
            else [variable]: attempt pronominalization using the algorithm described in section 3.3, accessing FocusState and RT.
              If e is pronominalizable then [pronominal]: use a pronoun
              else [nominal]: build an anaphoric expression. Test FocusState to identify the most appropriate determiner for the noun phrase. Compute the head and the modifiers using the algorithm described in section 3.2.
          else [presenting]:
            if e stands for a generic person (collection of persons) without any specific property
              then [pronominal]: use an indefinite pronoun
              else [nominal]: build a noun phrase, choosing the appropriate linguistic form. If e is a:
                - specific entity, build an indefinite singular description or an indefinite plural description according to the number of e
                - generic entity, in case:
                    * e is a concept whose meaning is being defined by synthesis, use the bare singular term
                    * e is a concept being defined through a listing of its components, use a definite singular noun phrase
                    * e appears in a list inside a concept definition, (German, Italian) use a bare singular or bare plural noun phrase; (English) use a definite singular or definite plural noun phrase
                    * e is in a question, use a singular indefinite noun phrase
                    * e is used in procedural descriptions, (Italian, German) use a definite plural description; (English) use a bare plural.</Paragraph>
    </Section>
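    <Paragraph> The presuming branch of the global algorithm can be sketched in Python as follows. This is only an illustration under stated assumptions: the classes Entity and Context, the function choose_presuming_form and the returned string labels are hypothetical names, the pronominalization test of section 3.3 is reduced to a boolean argument, and the presenting branch is not covered.
<![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    is_interlocutor: bool = False
    anchored_to_reader: bool = False


@dataclass
class Context:
    language: str                      # "english", "german" or "italian"
    formality: str                     # "informal" or "formal"
    background: set = field(default_factory=set)
    already_mentioned: set = field(default_factory=set)


def choose_presuming_form(e, ctx, pronominalizable=False):
    """Return a form label for a presumed entity, or None if e is new."""
    if e.name not in ctx.already_mentioned | ctx.background:
        return None  # [presenting]: handled elsewhere in the full algorithm
    if e.name in ctx.background:  # [unique]
        if e.is_interlocutor:  # [interlocutor]
            if ctx.formality == "informal":
                return "pronoun"
            return "proper-noun-or-definite"
        # [non-interlocutor]
        if (ctx.language in ("english", "german")
                and ctx.formality == "informal"
                and e.anchored_to_reader):
            return "possessive-your"
        return "proper-name-or-definite"
    # [variable]: textual anaphora
    return "pronoun" if pronominalizable else "anaphoric-noun-phrase"


spouse = Entity("spouse", anchored_to_reader=True)
english = Context("english", "informal", background={"spouse"})
italian = Context("italian", "informal", background={"spouse"})
print(choose_presuming_form(spouse, english))  # possessive-your
print(choose_presuming_form(spouse, italian))  # proper-name-or-definite
```
]]>
 The language test sits at a single choice point, which is the property that makes the approach easy to customize per output language.</Paragraph>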
    <Section position="3" start_page="850" end_page="851" type="sub_section">
      <SectionTitle>
3.2 Generating nominal expressions
</SectionTitle>
      <Paragraph position="0"> In this section we focus on the choice of the head and the modifiers for noun phrases. (Dale and Reiter, 1995) contains the following list of requirements that a referring expression must meet to obey Grice's Maxims of conversational implicature: [...] level and other lexically preferred classes whenever possible (Lexical Preference).</Paragraph>
      <Paragraph position="1"> Requirement (4) suggests that the head of the noun phrase should be chosen among terms of common use or, more generally, among terms that the user is likely to know. In our domain, however, technical terms often cannot be avoided, since the precise type of document or legal requirement has to be specified. Therefore, for the choice of the head of non-anaphoric expressions the GIST system adopts the strategy of using the most specific superconcept of the entity that has a meaningful lexical item associated (e.g. the specific term &quot;decree absolute&quot; is used instead of the more basic term &quot;certificate&quot;).</Paragraph>
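      <Paragraph> The head selection strategy just described can be illustrated with a short Python sketch. It assumes, for simplicity, a single-inheritance concept hierarchy stored as a child-to-parent dictionary and a lexicon mapping concepts to lexical items; the function choose_head, the concept identifiers and the instance name are hypothetical and do not come from the GIST knowledge base.
<![CDATA[
```python
def choose_head(concept, parent_of, lexicon):
    """Walk up the hierarchy and return the lexical item of the most
    specific concept that has one, or None if no ancestor is lexicalized."""
    current = concept
    while current is not None:
        if current in lexicon:
            return lexicon[current]
        current = parent_of.get(current)  # None once the root is passed
    return None


# Toy hierarchy: an unlexicalized instance-level node under two concepts.
parent_of = {
    "decree-absolute-instance": "decree-absolute",
    "decree-absolute": "certificate",
}
lexicon = {
    "decree-absolute": "decree absolute",
    "certificate": "certificate",
}

print(choose_head("decree-absolute-instance", parent_of, lexicon))
# decree absolute
```
]]>
 Because the walk starts from the entity and moves upward, the specific term is found before the more basic one, matching the preference stated in the text.</Paragraph>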
      <Paragraph position="2"> Requirements (1)-(3) suggest that the modifiers in the noun phrase should not introduce unnecessary information that can hamper the text fluency and yield false implications. Selecting the correct modifiers for a non-anaphoric expression is not easy, since in the Knowledge Base attributive and distinguishing (restrictive) properties are mixed. In GIST, the semantic relations that are relevant in the definition of distinguishing descriptions have been identified through an accurate domain analysis.</Paragraph>
      <Paragraph position="3"> For example, we have chosen relations like has-partnership, owned-by or attribute-of, characterizing distinguishing descriptions like &quot;the applicant's spouse&quot; or &quot;the applicant's estate&quot;. When an anaphora occurs but a pronoun cannot be used, a nominal anaphoric expression is built. The head and the modifiers included in the noun phrase have to allow the identification of the entity among all the ones active in the reader's attention (potential distractors). In GIST we adopt an algorithm which is a simplified variation of the one Dale and Reiter call the &quot;Incremental Algorithm&quot; (Dale and Reiter, 1995): whenever a new nominal anaphoric expression has to be built, discriminant modifiers are added to the expression until the set of the potential distractors (contrast set) is reduced to an empty set.</Paragraph>
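      <Paragraph> The modifier selection loop can be sketched as follows. This is a generic, simplified rendering of the incremental idea (add a property only if it rules out at least one remaining distractor, stop when the contrast set is empty), not the GIST implementation; entities are modelled as plain attribute dictionaries, and build_description and the attribute order are assumptions made for the example.
<![CDATA[
```python
def build_description(target, distractors, preferred_attributes):
    """Select discriminating (attribute, value) pairs for `target`.

    Returns the chosen pairs and the distractors that could not be
    ruled out (an empty list means the description is distinguishing).
    """
    remaining = list(distractors)
    chosen = []
    for attr in preferred_attributes:
        if not remaining:
            break  # contrast set is empty: nothing left to rule out
        value = target.get(attr)
        if value is None:
            continue
        survivors = [d for d in remaining if d.get(attr) == value]
        if len(survivors) < len(remaining):  # attribute discriminates
            chosen.append((attr, value))
            remaining = survivors
    return chosen, remaining


target = {"type": "certificate", "owned-by": "applicant"}
distractors = [
    {"type": "certificate", "owned-by": "spouse"},
    {"type": "form"},
]
chosen, left = build_description(target, distractors, ["type", "owned-by"])
print(chosen)  # [('type', 'certificate'), ('owned-by', 'applicant')]
print(left)    # []
```
]]>
 A non-empty second return value signals that the available properties do not single out the entity, in which case a caller would need some fallback strategy.</Paragraph>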
    </Section>
    <Section position="4" start_page="851" end_page="852" type="sub_section">
      <SectionTitle>
3.3 Generating pronouns
</SectionTitle>
      <Paragraph position="0"> For the generation of pronouns an extension to the Centering Model (Grosz et al., 1995) has been defined that captures how the rhetorical evolution of the discourse influences the flow of attention of the reader. The choice of this solution has emerged from the observation that anaphora plays two roles in the discourse: it is not sufficient for a pronoun to identify its referent unambiguously; it also has to reinforce the coherence of the text, supporting the user's expectations.</Paragraph>
      <Paragraph position="1"> In the Centering Model each utterance Un is associated with a list of forward-looking centers, Cf(Un), made up of all the entities realized in the utterance. This list is ordered according to the likelihood of each element being the primary focus of the following discourse. The first element in the list is called the preferred center, Cp(Un). Among the centers another significant entity is identified: the backward-looking center, Cb(Un). This represents the primary focus of Un and links the current sentence with the previous discourse.</Paragraph>
      <Paragraph position="2"> The basic constraint on center realization is formulated in the following rules: RULE 1: If any element of Cf(Un) is realized by a pronoun in Un+1 then the Cb(Un+1) must be realized by a pronoun also. (Grosz et al., 1995) RULE 1': If an element in Cf(Un+1) is coreferent with Cp(Un) then it can be pronominalized. (Kehler, 1993) These rules can be used to constrain pronominalization in the text generation process.</Paragraph>
      <Paragraph position="3"> The Centering Model was first conceived for English, a language where pronouns are always made explicit. But as soon as we consider languages that allow null pronominalization (like Italian), new extensions to the original model have to be designed in order to deal with pronouns with no phonetic content. For Italian, we defined the following rule (Not and Zancanaro, 1996), which is compatible with the results of empirical research presented in (Di Eugenio, 1995): RULE 1'': If the Cb of the current utterance (Cb(Un+1)) is the same as the Cp of the previous utterance (Cp(Un)) then a null pronoun should be used. If, instead, Cb(Un+1) ≠ Cp(Un) and Cb(Un+1) = Cb(Un) then a strong pronoun should be used.</Paragraph>
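      <Paragraph> Rule 1'' for Italian can be transcribed almost literally into a small Python function. The sketch below is illustrative only: the function name italian_pronoun_type and the string labels are hypothetical, centers are represented as plain strings, and the case not covered by the rule is returned as None, leaving the decision to other parts of the algorithm.
<![CDATA[
```python
def italian_pronoun_type(cb_next, cp_prev, cb_prev):
    """Apply Rule 1'' to the backward-looking center of the new utterance.

    cb_next: Cb(Un+1); cp_prev: Cp(Un); cb_prev: Cb(Un).
    Returns "null", "strong", or None when the rule does not apply.
    """
    if cb_next is None:
        return None
    if cb_next == cp_prev:
        return "null"      # same as the preferred center: null pronoun
    if cb_next == cb_prev:
        return "strong"    # Cb kept, but it was not the preferred center
    return None


print(italian_pronoun_type("spouse", "spouse", "applicant"))     # null
print(italian_pronoun_type("applicant", "spouse", "applicant"))  # strong
print(italian_pronoun_type("form", "spouse", "applicant"))       # None
```
]]>
 The order of the two tests matters: the strong-pronoun case is only reached when Cb(Un+1) differs from Cp(Un), as the rule requires.</Paragraph>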
      <Paragraph position="4"> 3.3.1 The proposed extension to the Centering Model
        Unfortunately, the Centering Model does not completely capture the reader's flow of attention, since it fails to account for the expectations raised by the role the clause plays in the discourse. For example, consider the following sentences: (2) a. If you are separated, b. [your spouse]i should send us [this part of the form]j properly filled in.</Paragraph>
      <Paragraph position="5"> c. [They]i should use [the enclosed envelope]k. d. ek does not need a stamp.</Paragraph>
      <Paragraph position="6"> According to the Centering rules it would not be possible to use a pronoun to realize ek, since the main center of utterance d. (the envelope) is different from the main center of utterance c. (the spouse). But the use of a definite noun phrase to refer back to the envelope would sound rather odd to a native speaker.</Paragraph>
      <Paragraph position="7"> However, the rhetorical structure of the text, providing information on the semantic links between utterances, helps in understanding how the content presentation progresses. Therefore, we claim that it can be used to explain exceptions to the Centering rules and to define repair strategies (Not and Zancanaro, 1996). The advantage of this solution is that it allows us to treat with a uniform approach different types of exceptions that in the literature are solved with separate ad-hoc solutions (e.g. parallelism, empathy).</Paragraph>
      <Paragraph position="8"> For example, in (2) above sentence d. is an evident ELABORATION on the envelope that appears in sentence c. When elaborating the description of an object, the focus of attention moves onto the object itself. Therefore, the rhetorical relation that links c. and d. signals that among the elements in Cf(c) the envelope is the best candidate to be the primary focus of the following sentence d. This means that the rhetorical information can &quot;project&quot; the default ordering of the elements in the potential focus list Cf(c) onto a new order that reflects more closely the content progression.</Paragraph>
      <Paragraph position="9"> From a computational point of view, the resulting algorithm for pronominalization can be sketched as follows. The reader's attentional state is recorded in two stacks: the Centers History Stack and the Backward Centers Stack, collecting respectively the Cf and the Cb of the already produced utterances. Whenever a new utterance is processed, the corresponding Cf and Cb are pushed on the top of the two stacks. The Cf list is ranked according to the default ranking strategy: clause theme &gt; actor &gt; benefic. &gt; actee &gt; others, possibly modified by a &quot;projection&quot; imposed by the rhetorical relation. Rule 1' (for English and German) and Rule 1'' (for Italian) are then used to decide whether a pronoun can be used or not.</Paragraph>
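      <Paragraph> The ranking, projection and Rule 1' test can be put together in a short Python sketch, applied here to example (2). Everything in it is an assumption made for illustration: the role names (with "beneficiary" expanding the abbreviation "benefic."), the functions rank_cf, project and can_pronominalize, the choice to model the Centers History Stack as a Python list, and the way the elaborated entity is passed in explicitly rather than read from the rhetorical tree.
<![CDATA[
```python
DEFAULT_RANKING = ["theme", "actor", "beneficiary", "actee", "other"]


def rank_cf(entities_with_roles):
    """Order (entity, role) pairs by the default ranking; return entities."""
    ordered = sorted(entities_with_roles,
                     key=lambda pair: DEFAULT_RANKING.index(pair[1]))
    return [entity for entity, _ in ordered]


def project(cf, relation=None, elaborated=None):
    """Reorder Cf according to the rhetorical relation.

    Only ELABORATION is modelled: the elaborated entity moves to the front.
    """
    if relation == "ELABORATION" and elaborated in cf:
        return [elaborated] + [e for e in cf if e != elaborated]
    return list(cf)


def can_pronominalize(entity, centers_history, relation=None, elaborated=None):
    """Rule 1': entity may be a pronoun if it is Cp of the previous
    utterance, taken after the projection has been applied."""
    if not centers_history:
        return False
    cf_prev = project(centers_history[-1], relation, elaborated)
    return bool(cf_prev) and cf_prev[0] == entity


centers_history = []  # Centers History Stack: one Cf list per utterance
# Utterance c: "[They]i should use [the enclosed envelope]k."
centers_history.append(rank_cf([("envelope", "actee"), ("spouse", "actor")]))

print(can_pronominalize("envelope", centers_history))
# False
print(can_pronominalize("envelope", centers_history, "ELABORATION", "envelope"))
# True
```
]]>
 Without the projection the envelope cannot be pronominalized in utterance d.; with the ELABORATION projection it becomes the preferred center and the pronoun is licensed, which is the behaviour argued for in the text.</Paragraph>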
    </Section>
  </Section>
</Paper>