<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1019">
  <Title>Bilingual Hebrew-English Generation of Possessives and Partitives: Raising the Input Abstraction Level</Title>
  <Section position="2" start_page="0" end_page="145" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> One of the first issues to address when selecting a syntactic realization component is whether its input specification language fits the desired application. Traditionally, syntactic realization components have attempted to raise the abstraction level of input specifications for two reasons: (1) to preserve the possibility of paraphrasing and (2) to make it easy for the sentence planner to map from semantic data to syntactic input. As new applications appear that cannot start generation from a semantic input because such an input is not available (for example, re-generation of sentences from syntactic fragments to produce summaries (Barzilay et al., 1999) or generation of complex NPs in a hybrid template system for business letters (Gedalia, 1996)), this motivation has lost some of its strength. Consequently, &quot;shallow surface generators&quot; have recently appeared (Lavoie and Rambow, 1997; Busemann and Horacek, 1998) that require an input considerably less abstract than those required by more traditional realization components such as SURGE (Elhadad and Robin, 1996) or KPML (Bateman, 1997).</Paragraph>
    <Paragraph position="1"> In this paper, we contribute to the debate on selecting an appropriate level of abstraction by considering the case of bilingual generation. We present results obtained while developing the HUGG syntactic realization component for Hebrew (Dahan-Netzer, 1997). One of the goals of this system is to design a generator with an input specification language as similar as possible to that of an English generator, SURGE in our case.</Paragraph>
    <Paragraph position="2"> The ideal scenario for bilingual generation is illustrated in Figure 1. It consists of the following steps: 1. Prepare an input specification in one language; 2. Translate all the lexical entries (function words do not appear); 3. Generate with any grammar. In the example, the same input structure is used, and the generator can produce sentences in both languages if only the lexical items are translated.</Paragraph>
    <Paragraph position="3"> Consider the following paraphrase in English for the same input: John gave Mary a book.</Paragraph>
    <Paragraph position="4"> The Hebrew grammar does not produce such a paraphrase, as there is no equivalent in Hebrew to the dative move alternation.</Paragraph>
    <Paragraph position="5"> In this case, we conclude that the input abstraction level is appropriate. In contrast, if the input had specified a structure such as indirect-object(prep=to/le, np=Mary), then it would not have been abstract enough to serve as a bilingual input structure.</Paragraph>
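    <Paragraph>The three-step scenario and the dative example above can be illustrated with a minimal sketch. It is not the SURGE or HUGG input language: the role names (process, agent, possessed, recipient), the toy lexicon, the transliterations, and the two tiny "grammars" are all hypothetical and chosen only for illustration. The point it tries to show is that the abstract input carries no preposition or word order, so each grammar is free to decide how many paraphrases it offers.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: English lemma to transliterated Hebrew lemma.
# Function words (to / le) do not appear, as in step 2 of the scenario.
LEXICON = {"give": "natan", "book": "sefer", "John": "John", "Mary": "Mary"}


@dataclass(frozen=True)
class Transfer:
    """Abstract input: participant roles only (hypothetical role names)."""
    process: str
    agent: str
    possessed: str
    recipient: str


def translate(spec, lexicon):
    """Step 2: translate the lexical entries; the structure is unchanged."""
    return Transfer(
        process=lexicon[spec.process],
        agent=lexicon[spec.agent],
        possessed=lexicon[spec.possessed],
        recipient=lexicon[spec.recipient],
    )


def english(spec):
    """Toy English grammar: offers both variants of the dative alternation."""
    past = {"give": "gave"}[spec.process]  # toy morphology table
    return [
        f"{spec.agent} {past} a {spec.possessed} to {spec.recipient}",
        f"{spec.agent} {past} {spec.recipient} a {spec.possessed}",
    ]


def hebrew(spec):
    """Toy Hebrew grammar: only the prepositional variant exists."""
    return [f"{spec.agent} {spec.process} {spec.possessed} le-{spec.recipient}"]


# Step 1: prepare one input specification.
spec = Transfer(process="give", agent="John", possessed="book", recipient="Mary")

# Step 3: generate with either grammar.
for sentence in english(spec) + hebrew(translate(spec, LEXICON)):
    print(sentence)
```

Had the input already fixed a preposition, as in the indirect-object structure mentioned above, the English grammar in this sketch could no longer offer the second variant.</Paragraph>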
    <Paragraph position="6"> Similarly, the English possessive marker is very close to the Hebrew &quot;construct state&quot; (smixut): The King's palace / Armon ha-melex (gloss: Palace-cs the-king). The following input structure seems, therefore, appropriate for both languages: [cat common, lex &quot;palace&quot;/&quot;armon&quot;, possessor [lex &quot;king&quot;/&quot;melex&quot;, definite yes]]. There are, however, divergences between the use of smixut in Hebrew and of the possessive marker in English: Segovia's pupil / The pupil of Segovia; * talmyd segovyah / talmyd Sel segovyah; ? The house's windows / The windows of the house; Halonot ha-bayit / ha-Halonot Sel ha-bayit. Our goal, therefore, is to design an input structure that is abstract enough to let the grammar decide whether to use a possessive marker vs. an of-construct in English or a Sel-construct vs. a smixut-construction in Hebrew.</Paragraph>
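    <Paragraph>A minimal sketch of what such an abstract possessive input could look like is given below. The class and function names are hypothetical, not those of SURGE or HUGG, and the sketch ignores construct-state morphology (the head noun is emitted unchanged). It only enumerates the candidate constructions each grammar could choose from; the rule that selects among them is deliberately left out, since that decision belongs to the grammar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    en: str  # English lexical entry
    he: str  # transliterated Hebrew lexical entry


@dataclass(frozen=True)
class Possessive:
    """Abstract input: a head, a possessor, and the possessor's definiteness."""
    head: Noun
    possessor: Noun
    possessor_definite: bool = True


def english_candidates(np):
    """Both English realizations of the same abstract relation."""
    owner = ("the " if np.possessor_definite else "") + np.possessor.en
    return {
        "possessive": f"{owner}'s {np.head.en}",
        "of": f"the {np.head.en} of {owner}",
    }


def hebrew_candidates(np):
    """Both Hebrew realizations; construct-state morphology is not modeled."""
    owner = ("ha-" if np.possessor_definite else "") + np.possessor.he
    return {
        "smixut": f"{np.head.he} {owner}",
        "Sel": f"ha-{np.head.he} Sel {owner}",
    }


palace = Possessive(head=Noun("palace", "armon"), possessor=Noun("king", "melex"))
print(english_candidates(palace))
print(hebrew_candidates(palace))
```

Because the input names neither the possessive marker nor Sel, the same structure serves both languages, and the acceptability contrasts listed above become a matter of choosing among the candidates.</Paragraph>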
    <Paragraph position="7"> A similar approach has been adopted in generation (Bateman, 1997; Bateman et al., 1991) and in machine translation, most notably in (Dorr, 1994). Dorr focuses on divergences at the clause level, as illustrated by the following example: I like Mary / Maria me gusta a mi (Mary pleases me). Dorr selects a representation structure based on Jackendoff's Lexical Conceptual Structures (LCS) (Jackendoff, 1990).</Paragraph>
    <Paragraph position="8"> In the KPML system, the proposed solution is based on the systemic notion of &quot;delicacy&quot;: the assumption is that low-delicacy input features (the most abstract ones) remain common to the two target languages, whereas high-delicacy features differ. In this paper, we focus on the input specification for complex NPs. The main reason for this choice is that the input for NPs in SURGE has remained close to English syntax (low abstraction). It consists of the following main sub-constituents: head, classifier, describer, qualifier and determiner. In previous work (Elhadad, 1996), we discussed how to map a more abstract domain-specific representation to the SURGE input structure within a sentence planner. When moving to a bilingual generator, we have found the need for a higher level of abstraction to avoid encoding language-specific knowledge in the sentence planners. We specifically discuss here the following decisions: (1) how to realize a possessive relation (John's shirt vs. the shirt of John) and (2) how to realize a partitive relation (all the kids vs. all of the kids). In the rest of the paper, we first present basic contrastive data and existing analyses about possessives and partitives in Hebrew and English. We then present the input features we have designed to cover possessives and partitives in both languages and discuss how these features are used to account for the main decisions required of the realizer. We conclude with an evaluation of the bilingual input structure on a set of 100 sample input structures for complex NPs in the two languages and of the divergences that remain in the generated NPs. Overall, this bilingual analysis has helped us identify important abstractions that lead to more fluent generation in both languages.</Paragraph>
  </Section>
</Paper>