<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1033"> <Title>Building Parallel LTAG for French and Italian Marie-Hélène Candito</Title> <Section position="2" start_page="0" end_page="211" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Lexicalized Tree Adjoining Grammar (LTAG) is a formalism integrating lexicon and grammar (Joshi, 87; Schabes et al, 88): its description units are lexicalized syntactic trees, the elementary trees. The formalism is associated with a tree-rewriting process that links sentences with syntactic structures (in either direction) by combining the elementary trees with two operations, adjunction and substitution.</Paragraph> <Paragraph position="1"> We assume the following linguistic features for LTAG elementary trees (Kroch &amp; Joshi, 85; Abeillé, 91; Frank, 92): * lexicalization: elementary trees are anchored by at least one lexical item.</Paragraph> <Paragraph position="2"> * semantic coherence: the set of lexical items on the frontier of an elementary tree forms exactly one semantic unit 1.</Paragraph> <Paragraph position="3"> * large domain of locality: the elementary trees anchored by a predicate contain positions for the arguments of the predicate.</Paragraph> <Paragraph position="4"> This last feature is known as the predicate-argument cooccurrence principle (PACP). Trees anchored by a predicate represent the minimal structure that includes positions for all of its arguments. These argument positions are extended either by receiving substitution or by adjoining at a node.</Paragraph> <Paragraph position="5"> Adjunction is used to factor out recursion.</Paragraph> <Paragraph position="6"> Figure 1 shows two elementary trees anchored by the French verbal form mange (eat-pres-sg), whose arguments in the active voice are a subject NP and a direct object NP 2. The first tree shows all arguments in canonical position. 
The second tree shows a relativized subject and a pronominal object (accusative clitic). The argument nodes are numbered according to their obliqueness order, by an index starting at 0 in the unmarked case (active). So, for instance, in passive trees the subject is number 1, not 0.</Paragraph> <Paragraph position="7"> Though the LTAG units used during derivation are lexicalized trees, the LTAG internal representation makes use of &quot;pre-lexicalized&quot; structures, which we will call tree sketches, whose anchor is not instantiated and which are shared by several lexicalized trees. The set of tree sketches thus forms a syntactic database, from which lexical items pick the structures they can anchor.</Paragraph> <Paragraph position="8"> Families group together tree sketches that are likely to be selected by the same lexeme: the tree sketches may show different surface realizations of the arguments (pronominal clitic realization, extraction of an argument, subject inversion...), different diatheses, i.e. matchings between semantic arguments and syntactic functions (active, passive, middle...), or both.</Paragraph> <Paragraph position="9"> 1 Thus semantically void lexical forms (functional words) do not anchor elementary trees on their own, and words composing an idiomatic expression are multiple anchors of the same elementary tree. 2 The trees are examples from a French LTAG (Abeillé, 91), with no VP node (but this is irrelevant here). The ↓ means the node must receive substitution. The * means the node must adjoin into another tree.</Paragraph> <Paragraph position="10"> The lexical forms select their tree sketches by indicating one or several families, and features. The features may rule out some tree sketches of the selected family, either because of a morphological clash (e.g. the passive trees are only selected by past participles) or because of idiosyncrasies. 
For instance, the French verb peser (to weigh) can roughly be encoded as selecting the transitive family, but it disallows the passive diathesis.</Paragraph> <Paragraph position="11"> Tree sketches nevertheless remain large linguistic units. Each represents a combination of linguistic descriptions that are encoded separately in other formalisms. For instance, a tree sketch is in general of depth > 1, and thus corresponds to a piece of derivation in a formalism using CF rewrite rules (cf (Kasper et al, 95) for the presentation of an LTAG as a compiled HPSG).</Paragraph> <Paragraph position="12"> This causes redundancy in the set of tree sketches, which makes it difficult to write or maintain an LTAG. Several authors (Vijay-Shanker et al, 92, hereafter (VSS92); Becker, 93; Evans et al, 95) have proposed practical solutions for representing an LTAG compactly. The idea is to represent canonical trees using an inheritance network and to derive marked syntactic constructions from base tree sketches using lexico-syntactic rules.</Paragraph> <Paragraph position="13"> (Candito, 96), building on (VSS92), defines an additional layer of linguistic description, called the metagrammar (MG), that imposes a general organization for syntactic information and formalizes the well-formedness of lexicalized structures. MG not only provides a general overview of the grammar, but also makes it possible for a tool to automatically combine smaller linguistic units into a tree sketch.</Paragraph> <Paragraph position="14"> This process of tree sketch building is comparable to a context-free derivation, in the generation direction, that would build a minimal clause. A first difference is that a CF derivation is performed for each sentence to be generated, while the tree sketches are built out of an MG at compile time. 
Another difference is that while CF derivation uses very local units (CF rules), MG uses partial descriptions of trees (Rogers and Vijay-Shanker, 94), which are more suitable for expressing syntactic generalizations.</Paragraph> <Paragraph position="15"> MG offers a common, principle-based frame for syntactic description, to be filled in for different languages or domains. In section 2 we present the linguistic and formal characteristics of MG (in a slightly modified version), in section 3 its compilation into an LTAG, and in section 4 the instantiation of the MG for French and Italian. Finally, we give some possible applications in section 5.</Paragraph> </Section> </Paper>