<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1044">
<Title>Polarization and abstraction of grammatical formalisms as methods for lexical disambiguation</Title>
<Section position="2" start_page="0" end_page="0" type="abstr">
<SectionTitle> Abstract </SectionTitle>
<Paragraph position="0"> In the context of lexicalized grammars, we propose general methods for lexical disambiguation based on polarization and abstraction of grammatical formalisms. Polarization makes their resource sensitivity explicit, and abstraction aims at keeping essentially the mechanism of neutralization between polarities. Parsing with the simplified grammar in the abstract formalism can be used efficiently for filtering lexical selections.

Introduction

There is a complexity issue if one considers exact parsing with large-scale lexicalized grammars. Indeed, the number of ways of associating with each word of a sentence a corresponding elementary structure (a tagging of the sentence) is the product of the number of lexical entries for each word. The procedure may therefore have exponential complexity in the length of the sentence. In order to filter taggings, we can use probabilistic methods (Joshi and Srinivas, 1994) and keep only the most probable ones; but if we want to keep all successful taggings, we must use exact methods. Among these, one consists in abstracting the information that is relevant for the filtering process from the formalism F used for representing the grammar G concerned. In this way, we obtain a new formalism $F_{abs}$, which is a simplification of F, and the grammar G is translated into a grammar abs(G) in the abstract framework $F_{abs}$. Disambiguating with G then consists in parsing with abs(G).
The abstraction is relevant if parsing eliminates a maximum of bad taggings at a minimal cost.</Paragraph>
<Paragraph position="1"> (Boullier, 2003) uses such a method for Lexicalized Tree Adjoining Grammars (LTAG) by abstracting a tree adjoining grammar into a context-free grammar and further abstracting the latter into a regular grammar. We also propose to apply abstraction, but after a preprocessing polarization step.</Paragraph>
<Paragraph position="2"> The notion of polarity comes from Categorial Grammars (Moortgat, 1996), which ground syntactic composition on the resource sensitivity of natural languages, and it is highlighted in Interaction Grammars (Perrier, 2003), which result from refining Categorial Grammars and making them more flexible.</Paragraph>
<Paragraph position="3"> Polarization of a grammatical formalism F consists in adding polarities to its syntactic structures to obtain a polarized formalism $F_{pol}$ in which neutralization of polarities is used for controlling syntactic composition. In this way, the resource sensitivity of syntactic composition is made explicit.
(Kahane, 2004) shows that many grammatical formalisms can be polarized by generalizing the system of polarities used in Interaction Grammars.</Paragraph>
<Paragraph position="4"> To abstract a grammatical formalism, it is useful to polarize it first, because polarities allow original methods of abstraction.</Paragraph>
<Paragraph position="5"> The validity of our method is based on a concept of morphism (two instances of which are polarization and abstraction), which characterizes how one should transport one formalism into another.</Paragraph>
<Paragraph position="6"> In sections 1 and 2, we present the conceptual tools of grammatical formalism and morphism, which are used in what follows.</Paragraph>
<Paragraph position="7"> In section 3, we define the operation of polarizing grammatical formalisms, and in section 4, we describe how polarization is then used for abstracting these formalisms.</Paragraph>
<Paragraph position="8"> In section 5, we show how abstraction of grammatical formalisms grounds methods of lexical disambiguation, which reduce to parsing in simplified formalisms. We illustrate our purpose with an incremental and a bottom-up method.</Paragraph>
<Paragraph position="9"> In section 6, we present some experimental results which illustrate the flexibility of the approach.

1 Characterization of a grammatical formalism

Taking a slightly modified characterization of polarized unification grammars introduced by (Kahane, 2004), we define a grammatical formalism F (not necessarily polarized) as a quadruple $\langle \mathrm{Struct}_F, \mathrm{Sat}_F, \mathrm{Phon}_F, \mathrm{Rules}_F \rangle$:
1. 
$\mathrm{Struct}_F$ is a set of syntactic structures, which are graphs¹ in which each edge and vertex may be associated with a label representing morpho-syntactic information; we assume that the set of labels associated with F is equipped with subsumption, a partial order denoted $\sqsubseteq$, and with unification, an operation denoted $\sqcup$, such that, for any labels $l$ and $l'$, either $l \sqcup l'$ is not defined, which is denoted $l \sqcup l' = \bot$, or $l \sqcup l'$ is the least upper bound of $l$ and $l'$ ²;
2. $\mathrm{Sat}_F$ is a subset of $\mathrm{Struct}_F$, which represents the saturated syntactic structures of grammatical sentences;
3. $\mathrm{Phon}_F$ is a function that projects every element of $\mathrm{Sat}_F$ onto the sentence that has this element as its syntactic structure.</Paragraph>
<Paragraph position="10"> 4. $\mathrm{Rules}_F$ is a set of composition rules between syntactic structures. Every element of $\mathrm{Rules}_F$ is a specific method for superposing parts of syntactic structures; this method defines the characteristics of the parts to be superposed and the unification operation between their labels. Note that we do not require rules to be deterministic.</Paragraph>
<Paragraph position="11"> The composition rules of syntactic structures, viewed as superposition rules, have the fundamental property of monotonicity: they add information without removing it.
Hence, the definition above applies only to formalisms that can be expressed as constraint systems, as opposed to transformational systems.</Paragraph>
<Paragraph position="12"> Let us give some examples of grammatical formalisms that comply with the definition above, and examine how they do so.</Paragraph>
<Paragraph position="13"> In LTAG, $\mathrm{Struct}_{LTAG}$ represents the set of derived trees, and $\mathrm{Sat}_{LTAG}$ the set of derived trees with a root in the category sentence and without non-terminal leaves.</Paragraph>
<Paragraph position="14"> the same time, $l \sqcup l'$ be not defined; if the operation of unification is defined everywhere, the set of labels is a semi-lattice.</Paragraph>
<Paragraph position="15"> The projection $\mathrm{Phon}_{LTAG}$ is the canonical projection of a locally ordered tree on its leaves. Finally, $\mathrm{Rules}_{LTAG}$ is made up of two rules: substitution and adjunction. To view adjunction as a superposition rule, we resort to the monotone presentation of LTAG with quasi-trees introduced by (Vijay-Shanker, 1992).</Paragraph>
<Paragraph position="16"> In Lambek Grammars (LG), $\mathrm{Struct}_{LG}$ is the set of partial proofs, and these proofs can be represented in the form of incomplete Lambek proof nets labelled with phonological terms (de Groote, 1999).</Paragraph>
<Paragraph position="17"> $\mathrm{Sat}_{LG}$ represents the set of complete proof nets with the category sentence as their conclusion and with syntactic categories labelled with words as their hypotheses.</Paragraph>
<Paragraph position="18"> The projection $\mathrm{Phon}_{LG}$ returns the label of the conclusion of complete proof nets.</Paragraph>
<Paragraph position="19"> $\mathrm{Rules}_{LG}$ is made up of two rules: a binary rule that consists in identifying two dual atomic formulas of two partial proof nets by means of an axiom link, and a unary rule that consists in the same operation but inside a single partial proof net.</Paragraph>
<Paragraph position="20"> Now, inside a formalism defined as above, we can consider particular grammars: A grammar G of a formalism F is a 
subset $G \subseteq \mathrm{Struct}_F$ of its elementary syntactic structures.</Paragraph>
<Paragraph position="21"> A grammar is lexicalized if every element of G is anchored by a word in a lexicon. In LTAG, G consists of its initial and auxiliary trees. In LG, G consists of the syntactic trees of the formulas representing syntactic categories of words as hypotheses, plus a partial proof net anchored by the period and including a conclusion in the category sentence.</Paragraph>
<Paragraph position="22"> From a grammar G defined in a formalism F, we build the set $D(G)$ of its derived syntactic structures by applying the rules of $\mathrm{Rules}_F$ recursively, starting from the elements of G. The language generated by the grammar is the projection $L(G) = \mathrm{Phon}_F(\mathrm{Sat}_F \cap D(G))$.</Paragraph>
</Section>
</Paper>