<?xml version="1.0" standalone="yes"?> <Paper uid="H89-1036"> <Title>Literal derivation s NP VP N V NP</Title> <Section position="2" start_page="0" end_page="211" type="metho"> <SectionTitle> 1 Lexicalized Tree Adjoining Grammar </SectionTitle> <Paragraph position="0"> Most current linguistic theories give lexical accounts of several phenomena that used to be considered purely syntactic. The information put in the lexicon is thereby increased in both amount and complexity: for example, lexical rules in LFG (Kaplan and Bresnan, 1983), GPSG (Gazdar, Klein, Pullum and Sag, 1985), HPSG (Pollard and Sag, 1987), Combinatory Categorial Grammars (Steedman, 1988), Karttunen's version of Categorial Grammar (Karttunen, 1986, 1988), some versions of GB theory (Chomsky, 1981), and Lexicon-Grammars (Gross, 1984).</Paragraph> <Paragraph position="1"> We say that a grammar is 'lexicalized' if it consists of:1

* a finite set of structures associated with each lexical item, which is intended to be the head of these structures;
* an operation or operations for composing the structures.

The finite set of structures defines the domain of locality over which constraints are specified, and these constraints are local with respect to their lexical heads. Context-free grammars cannot in general be lexicalized. However, TAGs are 'naturally' lexicalized because they use an extended domain of locality (Schabes, Abeillé and Joshi, 1988). TAGs were first introduced by Joshi, Levy and Takahashi (1975), Joshi (1983, 1985) and Kroch and Joshi (1985). It is known that Tree Adjoining Languages (TALs) are mildly context-sensitive. TALs properly contain context-free languages.2 A basic component of a TAG is a finite set of elementary trees, each of which defines a domain of locality and can be viewed as a minimal linguistic structure. The elementary structures are projections of lexical items which serve as heads. We recall that tree structures in TAGs correspond to linguistically minimal but complete structures: the complete argument structure in the case of a predicate, the maximal projection of a category in the case of an argument or an adjunct. If a structure has only one terminal, the terminal is the head of the structure; if there are several terminals, the choice of the head for a given structure is linguistically determined, e.g. by the principles of X-bar theory if the structure is of X-bar type. The head of NP is N, that of AP is A. S also has to be considered as the projection of a lexical head, usually V. As is obvious, the head must always be lexically present in all of the structures it produces.</Paragraph> <Paragraph position="2"> In the TAG lexicon each item is associated with a structure (or a set of structures), and that structure can be regarded, linguistically speaking, as its category. Each lexical item has as many entries in the lexicon as it has possible category or argument structures. We will now give some examples of structures that appear in this lexicon.</Paragraph> <Paragraph position="3"> Some examples of initial trees are given below (for simplicity, we have omitted the constraints associated with the trees). [Trees 1-7 omitted.] They are reduced to a pre-terminal node in the case of simple categories such as COMP or DET (trees 1, 2 and 3) and are expanded into more complex structures in the case of categories taking arguments (tree 4). They correspond to the maximal projection of a category in the case of simple phrases and to trees which will be systematically substituted for one of the argument positions of one of the elementary structures.
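To make the notion of a lexicalized grammar concrete, here is a minimal sketch in Python of how such a lexicon might be encoded. All class, function and tree names are our own illustrative inventions, not the paper's; the transitive tree mirrors the 'NP0 saw NP1' structure discussed below.

```python
# Hypothetical sketch: elementary trees are projections of their lexical
# heads, and the lexicon maps each item to the finite set of structures
# it anchors.

from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                       # e.g. "S", "NP", "V"
    children: list = field(default_factory=list)
    anchor: str | None = None        # lexical head attached at this node
    subst: bool = False              # argument slot, filled by substitution

def transitive_tree(verb: str) -> Node:
    """S-type initial tree for a transitive verb: NP0 <verb> NP1."""
    return Node("S", [
        Node("NP", subst=True),                  # NP0 argument position
        Node("VP", [
            Node("V", anchor=verb),              # the lexical head
            Node("NP", subst=True),              # NP1 argument position
        ]),
    ])

# Each lexical item selects its own structures; an item with several
# possible category or argument structures simply has several entries.
LEXICON = {
    "saw":  [transitive_tree("saw")],
    "John": [Node("NP", [Node("N", anchor="John")])],
    "Mary": [Node("NP", [Node("N", anchor="Mary")])],
}
```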
Trees 6-7 are examples of S-type initial trees: they are usually considered as projections of a verb and usually take nominal complements. The NP-type tree 'Mary' (tree 5) and the NP-type tree 'John' (similar to tree 5), for example, will be inserted by substitution in tree 6, corresponding to 'NP0 saw NP1', to produce 'John saw Mary'.</Paragraph> </Section> <Section position="3" start_page="211" end_page="212" type="metho"> <SectionTitle> 2 Parsing Lexicalized TAGs </SectionTitle> <Paragraph position="0"> We assume that the input sentence is finite and that it cannot be syntactically infinitely ambiguous.</Paragraph> <Paragraph position="1"> 'Lexicalization' simplifies the task of a parser in the following sense. The first pass of the parser filters the grammar down to a grammar corresponding to the input string. It also puts constraints on the way that adjunctions or substitutions can be performed, since each structure has a head whose position in the input string is recorded. The 'grammar' of the parser is reduced to a subset of the entire grammar. Furthermore, since each rule can be used only once, recursion does not lead to the usual non-termination problem. Once a structure has been chosen for a given token, the other possible structures for the same token do not participate in the parse. Of course, if the sentence is ambiguous, there may be more than one choice.</Paragraph> <Paragraph position="2"> If one adopts an off-line parsing algorithm, the parsing problem is reduced to the following two steps (see the code sketch below): * In the first step the parser selects the set of structures corresponding to each word in the sentence.</Paragraph> <Paragraph position="3"> Each structure can be considered as an encoding of a set of 'rules'.</Paragraph> <Paragraph position="4"> * Then the parser tries to determine whether these structures can be combined to obtain a well-formed structure. In particular, it puts the structures corresponding to arguments into the structures corresponding to predicates, and adjoins, where needed, the auxiliary structures corresponding to adjuncts to what they select for (or are selected by).</Paragraph> <Paragraph position="5"> In principle, any parsing strategy can be applied in the second step: since the number of structures produced is finite, and since each of them corresponds to a token in the input string, the search space is finite and termination is guaranteed. One can proceed inside out, left to right, or in any other way. Of course, standard parsing algorithms can be used too. In particular, we can use a top-down parsing strategy without encountering the usual problems due to recursion.</Paragraph> <Paragraph position="6"> By assuming that the number of structures associated with a lexical item is finite, since each structure has a lexical item attached to it, we implicitly make the assumption that an input string of finite length cannot be syntactically infinitely ambiguous.</Paragraph> <Paragraph position="7"> An Earley-type parser for TAGs has been investigated by Schabes and Joshi (1988). The algorithm has linear best-case behavior and O(n^9) worst-case behavior. It is the first practical parser for TAGs because, as is well known for CFGs, the average behavior of Earley-type parsers is superior to their worst-case behavior. We extended it to deal with substitution and with feature structures for TAGs.
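The following sketch, reusing the hypothetical LEXICON above, shows the shape of the two-step strategy and of the head-position constraint just described. The names are again our own, and the second step is deliberately left abstract: any standard search over the filtered structures would do.

```python
# Hypothetical sketch of the off-line, two-step parsing strategy.

def select_structures(tokens):
    """First pass: filter the grammar down to the input string,
    recording the position of each structure's head."""
    return [(pos, word, tree)
            for pos, word in enumerate(tokens)
            for tree in LEXICON.get(word, [])]

def heads_in_order(host_head_pos, arg_head_pos, arg_is_left_of_head):
    """Head positions must appear in increasing order in a combined
    structure; this filters out substitutions and adjunctions early."""
    if arg_is_left_of_head:
        return arg_head_pos < host_head_pos
    return arg_head_pos > host_head_pos

def parse(tokens):
    structures = select_structures(tokens)   # step 1: per-input grammar
    # Step 2: combine the structures by substitution and adjunction.
    # Any strategy terminates: the set of structures is finite and each
    # is tied to one input token, so recursion cannot loop.
    raise NotImplementedError("combination search not shown in this sketch")
```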
With this extension, we have built a system that parses unification formalisms that have a CFG skeleton as well as those that have a TAG skeleton.</Paragraph> <Paragraph position="8"> The Earley-type parser for TAGs can be extended to take advantage of lexicalized TAGs. Once the first pass has been performed, a subset of the grammar is selected. The structures encode the values and positions of their heads. Structures with the same head value are merged together, and the list of head positions is recorded.4 This enables us to use the head-position information while processing the structures efficiently. For example, given the sentence The1 man2 who3 saw4 the5 woman6 who7 saw8 John9 is10 happy11, the following trees are selected after the first pass (among others):5 [selected trees omitted]. Structures are not distinguished by their head positions at this stage, since that increases unnecessarily the number of states of the Earley parser: by factoring recursion, the Earley parser enables us to process only once parts of a tree that are associated with several lexical items of the same value but different positions. However, if termination is required for a pure top-down parser, it is necessary to distinguish each structure by its head position. Notice that there is only one tree for the relative clauses introduced by saw, but that its head position can be 4 or 8. Similarly for who and the.</Paragraph> <Paragraph position="9"> The head positions of each structure impose constraints on the way that the structures can be combined (the head positions must appear in increasing order in the combined structure). This helps the parser filter out predictions or completions for adjunction or substitution. For example, the tree corresponding to man will not be predicted for substitution in any of the trees corresponding to saw, since the head positions would not be in the right order.</Paragraph> </Section> <Section position="4" start_page="212" end_page="212" type="metho"> <SectionTitle> 3 Lexicalized TAG for English </SectionTitle> <Paragraph position="0"> A lexicalized TAG for English is under development (Bishop, Cote and Abeillé, 1988). Trees are gathered into tree families when an element of a certain type (e.g. a verb) is associated with more than one tree. We have 55 such tree families, which correspond to most of the basic argument structures: 10 tree families for simple verb sentences, 17 for sentences with verbs taking sentential complements, 11 for light verb-noun constructions, 7 for verb-particle combinations and 10 for light verb-adjective constructions. A tree family consists on average of 12 trees, which makes approximately 700 trees in total.</Paragraph> <Paragraph position="1"> The grammar covers subcategorization (strictly lexicalized), wh-movement and unbounded dependencies, light verb constructions, some idioms, transitivity alternations (such as dative shift or the so-called ergative alternation), subjacency and some island violations.</Paragraph> <Paragraph position="2"> The current size of the lexicon is approximately 1200 words: 750 verbs, 350 nouns, 50 adjectives, and 25 prepositions, adverbs and determiners.</Paragraph> <Paragraph position="3"> Subsets are being extracted and interfaced to the parser. Each subset is being incrementally augmented as it is debugged.
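Purely as an illustration of this organization, the sketch below shows how tree families factor the grammar: a family groups all the trees realizing one argument structure, and verbs sharing a subcategorization share the family rather than duplicating its trees. The family and tree names here are invented, not the grammar's actual ones.

```python
# Hypothetical sketch of tree families as a factoring of the grammar.

TRANSITIVE_FAMILY = [
    "declarative",        # NP0 V NP1
    "wh_subject",         # who V NP1
    "wh_object",          # what did NP0 V
    "object_relative",    # ... on average about 12 trees per family
]

VERB_FAMILIES = {
    "saw":   TRANSITIVE_FAMILY,
    "liked": TRANSITIVE_FAMILY,   # same family, no duplication of trees
}
```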
A similar lexicalized TAG for French is also under development.</Paragraph> </Section> <Section position="5" start_page="212" end_page="213" type="metho"> <SectionTitle> 4 Parsing Idioms in Lexicalized TAGs </SectionTitle> <Paragraph position="0"> In lexicalized TAGs, idioms fall under the same grammar as 'free' sentences (Abeillé and Schabes, 1989). We assign them regular syntactic structures while representing them semantically as single entries. Transformations and modifiers can thus apply to them. Unlike in previous approaches, their variability becomes the general case and their being totally frozen the exception.</Paragraph> <Paragraph position="1"> Idioms are represented by extended elementary trees with multicomponent heads. When an idiomatic tree is selected by its head, lexical items are attached to some nodes in the tree. Idiomatic trees are selected by a single head node; however, the head value imposes the lexical values of other nodes in the tree. This operation of attaching the head item of an idiom together with its lexical parts is called lexical attachment. The resulting tree has the lexical items corresponding to the pieces of the idiom already attached to it.</Paragraph> <Paragraph position="2"> The parser must be able to conjecture both the idiomatic and the literal interpretation of an idiom. We propose to parse idioms in two steps, which merge naturally into the two-step parsing strategy that we use. The first step, performed during the lexical pass, selects trees corresponding to the literal and idiomatic interpretations. However, the idiomatic trees are not always selected as possible candidates: we require that all the basic pieces building the minimal idiomatic expression be present in the input string (in the right order). This condition is necessary for the idiomatic reading but of course it is not sufficient. The second step performs the syntactic analysis as in the usual case; during this step, the idiomatic reading might be rejected. Idioms are thus parsed like any 'free' sentence. Except during the selection process, idioms do not require any special parsing mechanism.</Paragraph> <Paragraph position="3"> Furthermore, our representation allows us to recognize discontinuities in idioms that come from internal structures and from the insertion of modifiers.</Paragraph> <Paragraph position="4"> Take as an example the sentential idiom NP0 kicked the bucket. We have, among others, the following entries in the lexicon:6

kicked, V: αtn1 (simple transitive verb) (a)
kicked, V: αtdn1[D1 = the, N1 = bucket] (idiom: kicked the bucket) (b)
the, D: αD (one-node tree rooted by D) (c)
bucket, N: αNPdn (NP tree expecting a determiner) (d)
John, N: αNP (NP tree for proper nouns) (e)

Suppose that the input sentence is John kicked the bucket. In the first pass, the trees in Figure 1 are selected (among others).</Paragraph> <Paragraph position="5"> The first entry for kicked (a) specifies that kicked can be attached under the V node in the tree αtn1 (see the tree αtn1[kicked] in Figure 1). The second entry for kicked (b) specifies that kicked can be attached under the V node, that the must be attached under the node labeled D1, and that bucket must be attached under the node labeled N1 in the tree αtdn1 (see the tree αtdn1[kicked-the-bucket] in Figure 1).</Paragraph> <Paragraph position="6"> The sentence can be parsed in two different ways.
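Before listing the two derivations, here is a minimal sketch (with an invented function name) of the necessary-but-not-sufficient condition applied during the lexical pass: every lexical piece of the idiom must occur in the input string, in the right order.

```python
# Hypothetical sketch of the idiom-selection condition.

def idiom_candidate(pieces, tokens):
    """True iff `pieces` occur as an ordered subsequence of `tokens`.
    Necessary for the idiomatic reading, not sufficient: the second
    (syntactic) pass may still reject it."""
    it = iter(tokens)
    return all(piece in it for piece in pieces)

# Both readings of "John kicked the bucket" pass this filter ...
assert idiom_candidate(["kicked", "the", "bucket"],
                       "John kicked the bucket".split())
# ... but for "John kicked a bucket" the idiomatic tree is not selected:
assert not idiom_candidate(["kicked", "the", "bucket"],
                           "John kicked a bucket".split())
```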
One derivation is built with the trees αtn1[kicked] (transitive verb), αNPn[John], αD[the] and αNPdn[bucket]; it corresponds to the literal interpretation. The other derivation is built with the trees αtdn1[kicked-the-bucket] (idiomatic tree) and αNPn[John]. However, both derivations have the same derived tree. [Derived tree diagram omitted.]</Paragraph> </Section> </Paper>