File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/06/e06-2008_metho.xml
<?xml version="1.0" standalone="yes"?> <Paper uid="E06-2008"> <Title>ELLEIPO: A module that computes coordinative ellipsis for language generators that don't</Title> <Section position="4" start_page="115" end_page="115" type="metho"> <SectionTitle> 2 Some theoretical background </SectionTitle> <Paragraph position="0"> ELLEIPO is loosely based on Kempen's (subm.) psycholinguistically motivated syntactic theory of clausal coordination and coordinative ellipsis. It starts from the assumption that the generator's strategic (conceptual, pragmatic) component is responsible for selecting the concepts and conceptual structures that enable identification of discourse referents (except in the case of syntactically conditioned pronominalization).</Paragraph> <Paragraph position="1"> The strategic component may conjoin two or more clauses into a coordination and deliver as output a non-reduced sequence of conjuncts (footnote 1). The concepts in these conjuncts are adorned with reference tags, and identical tags express coreferentiality (footnote 2). Structures of this kind serve as input to the (syn)tactical component of the generator, where they are grammatically encoded (lexicalized and given syntactic form) without any form of coordinative ellipsis. The resulting non-elliptical structures are input to ELLEIPO, which computes and executes options for coordinative ellipsis.</Paragraph> <Paragraph position="2"> ELLEIPO's functioning is based on the assumption that coordinative ellipsis does not result from the application of declarative grammar rules for clause formation but from a procedural component that interacts with the sentence generator and may block the overt expression of certain constituents. Due to this feature, ELLEIPO can be combined, at least in principle, with various grammar formalisms. 
However, this advantage is not entirely gratis: the module needs a formalism-dependent interface that converts generator output to a (simple) canonical form. Footnote 1: The strategic component is also supposed to apply rules of logical inference yielding the conceptual structures that underlie &quot;respectively coordinations.&quot; Hence, the conversion of clausal into NP coordination (such as &quot;Anne likes biking and Susi likes skating&quot; into &quot;Anne and Susi like biking and skating, respectively&quot;) is supposed to arise in the strategic, not the (syn)tactical, component of the generator. This also applies to simpler cases without &quot;respectively,&quot; such as &quot;John is skating and Peter is skating&quot; versus &quot;John and Peter are skating.&quot; The module presented here does not handle these conversions (see Reiter &amp; Dale (2000, pp. 133-139) for examples and possible solutions). Footnote 2: Coordinative ellipsis is insensitive to the distinction between &quot;strict&quot; and &quot;sloppy&quot; (token- vs. type-) identity.</Paragraph> </Section> <Section position="5" start_page="115" end_page="117" type="metho"> <SectionTitle> 3 A sketch of the algorithm </SectionTitle> <Paragraph position="0"> This sketch presupposes and-coordinations of only n=2 conjuncts. In fact, ELLEIPO also handles and-coordinations with n&gt;2 conjuncts if, in every pair of conjuncts, the major constituents embody the same pattern of coreferences and contrasts.</Paragraph> <Paragraph position="1"> ELLEIPO takes as input a non-elliptical syntactic structure that should meet the following four canonical form criteria (see Fig. 1 for the input tree corresponding to example (7)).</Paragraph> <Paragraph position="2"> (7) Susi hörte dass Hans einen Unfall hatte
Susi heard that Hans an accident had
und dass_f Hans_f sterben konnte
and that Hans die might
'Susi heard that Hans had an accident and might die'
* Categorial (phrasal and lexical) nodes -- bolded in Fig. 1 -- carry reference tags (presumably propagated from the generator's strategic component). 
E.g., the tag &quot;7&quot; is attached to the root and head nodes of both tokens of the NP Hans in Fig. 1, indicating their coreferentiality. For the sake of computational uniformity, we also attach reference tags to non-referring lexical elements. In such cases, the tags denote lexical instead of referential identity. For instance, the fact that the two tokens of the subordinating conjunction dass 'that' in Fig. 1 carry the same tag is interpreted by ELLEIPO as indicating lexical identity. In combination with other properties, this licenses elision of the second dass (see (7)).
* The conjuncts are sister nodes separated by coordinating conjunctions; we call these configurations coordination domains. The order of the conjuncts and their constituents is defined.</Paragraph> <Paragraph position="3"> * Every categorial node of the input tree is immediately dominated by a functional node.</Paragraph> <Paragraph position="4"> * Each clausal conjunct is rooted in an S-node whose daughter nodes (immediate constituents) are grammatical functions. Within a clausal conjunct, all functions are represented at the same hierarchical level. Hence, the trees are &quot;flat,&quot; as illustrated in Fig. 1, and similar to the trees in German treebanks (NEGRA-II, TIGER).</Paragraph> <Paragraph position="5"> ELLEIPO starts by demarcating &quot;superclauses.&quot; Kempen (subm.) introduced this notion in his treatment of Gapping and LDG. An S-node dominates a superclause iff it dominates the entire sentence or a clause beginning with a subordinating conjunction (CNJ). In Fig. 1, the strings dominated by S1, S5 and S12 are superclauses. Note that S12 includes clause S13, which is not a superclause.
Figure 1. Slightly simplified canonical form of the non-elliptical input tree underlying sentence (7).</Paragraph> <Paragraph position="6"> Then, ELLEIPO checks all coordination domains for elision options, as follows:
* Testing for forward ellipsis: Gapping (including LDG), FCR, or SGF. 
This involves inspecting (recursively for every S-node) the set of immediate constituents (grammatical functions) of the two conjuncts, and their reference tags.</Paragraph> <Paragraph position="7"> Complete constituents of the right-hand conjunct may get marked for elision, depending on the specific conditions listed in the Appendix.</Paragraph> <Paragraph position="8"> * Testing for BCR. ELLEIPO checks -- word-by-word, going from right to left -- the coreference tags of the conjuncts. As a result, complete or partial constituents in the right-hand periphery of the left conjunct may get marked for elision.</Paragraph> <Paragraph position="9"> The final step of the module is ReadOut. After all coordination domains have been processed, a (possibly empty) subset of the terminal nodes of the input tree has been marked for elision. In the examples below, this is indicated by subscript marks. E.g., the subscript &quot;g&quot; attached to esst 'eat' in (9b) indicates that Gapping is allowed. ReadOut interprets the elision marks and, in 'standard mode,' produces the shortest elliptical string(s) as output (e.g. (9c)). In 'demo mode,' it shows individual and combined elliptical options on user request. Furthermore, auch 'too' is added in case of &quot;Stripping,&quot; i.e. when Gapping leaves only one constituent as remnant.</Paragraph> <Paragraph position="10"> Example (10) illustrates a combination of Gapping and BCR, with the three licensed elliptical output strings shown in (10c). In (11), Gapping combines with BCR in the subordinate clauses. Here, in contrast with (10), the subordinate clauses do not start their own superclauses, and this licenses LDG. However, ReadOut prevents LDG from combining with BCR, which would have yielded the unintended string</Paragraph> </Section> </Paper>
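The superclause demarcation step described in Section 3 can be illustrated with a minimal Python sketch. It is not ELLEIPO's implementation. The `Node` class, the node names, and the toy tree are hypothetical, and the sketch assumes that a clause "begins with a subordinating conjunction" when the first daughter of its S-node is labeled CNJ. The functional nodes required by the canonical form are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str  # hypothetical node identifier, e.g. "S_main"
    cat: str   # category label, e.g. "S", "CNJ", "NP", "V"
    children: List["Node"] = field(default_factory=list)


def superclause_roots(root: Node) -> List[str]:
    """Return the names of S-nodes that dominate a superclause.

    An S-node qualifies if it is the root of the whole sentence or if
    its first daughter is a subordinating conjunction (assumption: CNJ
    appears directly as the first child of the S-node).
    """
    found: List[str] = []

    def visit(node: Node, is_root: bool) -> None:
        if node.cat == "S":
            starts_with_cnj = bool(node.children) and node.children[0].cat == "CNJ"
            if is_root or starts_with_cnj:
                found.append(node.name)
        for child in node.children:
            visit(child, False)

    visit(root, True)
    return found


# Toy tree (hypothetical): a main clause containing a subordinate clause
# introduced by CNJ, which in turn contains an embedded clause without CNJ.
inner = Node("S_inf", "S", [Node("v2", "V")])
sub = Node("S_sub", "S", [Node("c1", "CNJ"), Node("np2", "NP"), inner, Node("v3", "V")])
main = Node("S_main", "S", [Node("np1", "NP"), Node("v1", "V"), sub])

print(superclause_roots(main))  # ['S_main', 'S_sub']
```

As with S12 and S13 in Fig. 1, the embedded clause `S_inf` lies inside the superclause rooted in `S_sub` but does not start a superclause of its own.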