File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/06/w06-1522_metho.xml
Size: 18,906 bytes
Last Modified: 2025-10-06 14:10:40
<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1522"> <Title>Modeling and Analysis of Elliptic Coordination by Dynamic Exploitation of Derivation Forests in LTAG parsing</Title> <Section position="4" start_page="0" end_page="147" type="metho"> <SectionTitle> 2 Linguistic Motivations : a parallelism </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="147" type="sub_section"> <SectionTitle> of Derivation </SectionTitle> <Paragraph position="0"> The LTAG formalism provides a derivation tree, which is strictly the history of the operations needed to build a constituent structure, the derived tree. To be fully appropriate for semantic inference (footnote 1), the derivation tree should display every syntactico-semantic argument and should therefore be a graph. However, obtaining this kind of dependency structure is significantly more complicated when it is not possible to rely on lexical information, as opposed to (Seddah and Gaiffe, 2005a). Elliptic coordination provides an example of this.</Paragraph> <Paragraph position="1"> Consider the sentences in Figure 3. They can all be analyzed as coordinations of S categories (footnote 2) with one side lacking one mandatory argument. In (4), one could argue for VP coordination, because the two predicates share the same continuum (same subcategorization frame and semantic space). However, the S hypothesis generalizes better and more easily supports the analysis of coordination of unlike categories (&quot;John is a republican and proud of it&quot; becomes &quot;Johni isj a republican and ei ej proud of it&quot;).</Paragraph> <Paragraph position="2"> The main difficulty is to separate the cases where a true co-indexation occurs ((2) and (4)) from the cases of partial duplication (in (1), the predicate is not shared and its feature structures could differ in aspect, tense or number (footnote 3)). In an elliptic construction, some words are unrealized. 
Therefore, their associated syntactic structures are also non-realized, at least to some extent. However, our aim is to obtain, as a result of the parsing process, the full constituency and dependency structures of the sentence, including erased semantic items (or units) and their (empty) syntactic positions. Since their syntactic realizations have been erased, the construction of the dependency structure cannot be anchored to lexical items. Instead, it has to be anchored on non-realized lexical items and guided by the dependency structure of the reference phrase. Indeed, it is because of the parallelism between the reference phrase and the elliptical phrase that an ellipsis can be interpreted.</Paragraph> <Paragraph position="3"> Footnote 1: As elementary trees are lexicalized and must have a minimal semantic meaning (Abeillé, 1991), the derivation tree can be seen as a dependency tree with respect to the restrictions defined by (Rambow and Joshi, 1994) and (Candito and Kahane, 1998), to cite a few.</Paragraph> </Section> </Section> <Section position="5" start_page="147" end_page="147" type="metho"> <SectionTitle> 3 The Fusion Operation </SectionTitle> <Paragraph position="0"> In this research, we assume that every coordinator which occurs in elided sentences anchors an initial tree aconj rooted by P and with two substitution nodes of category P (Figure 1). [FIG. 1 - Initial Tree aconj]</Paragraph> <Paragraph position="2"> The fusion operation replaces the missing derivations on either side of the coordinator with the corresponding ones from the other side. Note that fusion provides proper node sharing when it is syntactically decidable (cf. Section 6.4). The implementation relies on non-lexicalized trees (i.e. tree schemes) called ghost trees. Their purpose is to support the partial derivations which are used to rebuild the derivation walk in the elided part. We call these partial derivations ghost derivations. 
The incomplete derivations from the tree g are shown as a broken tree in Figure 2. The ghost derivations are induced by the inclusion of the ghost tree a', which must be the scheme of the tree a. When the two derivation structures from g and a' are processed by the fusion operation, a complete derivation structure is obtained.</Paragraph> <Paragraph position="3"> Let us go back to the following sentences: (1) Jean aimei Marie et Paul ei Virginie (John loves Mary and Paul Virginia); (2) Pauli aime Virginie et ei deteste Marie (Paul loves Virginia and hates Mary). Obviously, (1) can have as its logical formula aime'(jean',marie') ∧ aime'(paul',virginie'), whereas (2) is rewritten as aime'(paul',virginie') ∧ deteste'(paul',marie'). The question is how to differentiate the two occurrences of aime' in (1) from the two occurrences of paul' in (2). The latter should be represented as the sharing of one and the same argument, whereas the former involves a copy of the predicate aime'. Therefore, to represent sharing we use the same node in the dependency graph, while a ghosted node (noted ghost(g) in our figures) is used in the other case. This leads to the analysis in Figure 4. Exactly what level of information should be copied is outside the scope of this paper, but our intuition is that a state between a pure anchored tree and a tree schema is probably the correct answer. As we said, aspect, tense and in most cases diathesis (footnote 4) are shared, as shown by the following sentences: (3) *Paul killed John and Bill by Rodger; (4) *Paul ate apple and Mary will pears. As opposed to (4), we believe &quot;Paul ate apples and Mary will do pears&quot; to be correct, but in this case we do not strictly have an ellipsis but a semi-modal verb which is subsumed by its co-referent. 
Although our proposition focuses on the syntax-semantics interface, mainly missing syntactic arguments.</Paragraph> </Section> <Section position="6" start_page="147" end_page="148" type="metho"> <SectionTitle> 5 Ghost Trees and Logical Abstractions </SectionTitle> <Paragraph position="0"> Looking at the approaches proposed by (Dalrymple et al., 1991) and (Steedman, 1990) for the treatment of sentences with gaps, we note that in both cases (footnote 5) one wants to abstract the realized element on one side of the coordination in order to instantiate it in the other conjunct, using the coordinator as the pivot of this process. In our analysis, the role of ghost trees is exactly to support such abstraction (whether one speaks of higher-order variables or of λ-abstraction). In this regard, the fusion operation only has to check that the derivations induced by the ghost tree superimpose well with the derivations of the realized side.</Paragraph> <Paragraph position="1"> This is where our approach differs strongly from (Sarkar and Joshi, 1996). Using the fusion operation involves inserting partial derivations, which are linked to already existing ones (the realized derivations), into the shared forest, whereas using the conjoin operation defined in (Sarkar and Joshi, 1996) involves merging nodes from different trees while the tree anchored by a coordinator acts similarly to an auxiliary tree with two foot nodes.</Paragraph> <Paragraph position="2"> Footnote 4: With respect to the examples of (Dalrymple et al., 1991), i.e. &quot;It is possible that this result can be derived (..) but I know of no theory that does so.&quot; Footnote 5: Footnote no. 3, page 5 for (Dalrymple et al., 1991), and pages 41-42 for (Steedman, 1990).</Paragraph> <Paragraph position="3"> This may make it difficult to derive a linear string from the resulting DAG. In our approach, we use empty lexical items in order to leave traces in the derivation forest and to obtain a syntactically motivated derived tree (cf. fig. 
5) if we extract only the regular LTAG &quot;derivation item&quot; from the forest.</Paragraph> </Section> <Section position="7" start_page="148" end_page="149" type="metho"> <SectionTitle> 6 LTAG implementation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="148" end_page="148" type="sub_section"> <SectionTitle> 6.1 Working on shared forest </SectionTitle> <Paragraph position="0"> A shared forest is a structure which combines all the information coming from derivation trees and from derived trees. Following (Vijay-Shanker and Weir, 1993; Lang, 1991), each tree anchored by the elements of the input sentence is described by a set of rewriting rules. We use the fact that each rule which validates a derivation can infer a derivation item and has access to the whole chart in order to prepare the inference process.</Paragraph> <Paragraph position="1"> The goal is to use the shared forest as a guide for synchronizing the derivation structures from both parts of the coordinator.</Paragraph> <Paragraph position="2"> This forest is represented by a context-free grammar augmented by a stack containing the current adjunctions (Seddah and Gaiffe, 2005a), which looks like a Linear Indexed Grammar (Aho, 1968).</Paragraph> <Paragraph position="3"> Each part of a rule corresponds to an item à la Cocke-Kasami-Younger, as described by (Shieber et al., 1995), whose form is < N,POS,I,J,STACK >, with N a node of an elementary tree and POS the situation relative to an adjunction, marked differently depending on whether an adjunction is still possible or not. In Figure 5 this is shown with a bold dot in high position or a bold dot in low position. I and J are the start and end indices of the string dominated by the node N. STACK is the stack containing all the calls of the subtrees which have started an adjunction and which must be recognized by the foot recognition rules. We use S as the starting symbol of the grammar, and n is the length of the initial string. 
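As a purely illustrative sketch (not the authors' implementation), the item form described above can be written as a small Python record. The class name Item, the field names, the boolean encoding of POS, and the goal test are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Item:
    """Hypothetical encoding of an item N, POS, I, J, STACK."""
    node: str                      # N: a node of an elementary tree
    adjunction_possible: bool      # POS: True while an adjunction is still possible
    i: int                         # I: start index of the dominated string
    j: int                         # J: end index of the dominated string
    stack: Tuple[str, ...] = ()    # STACK: subtrees whose adjunction is still pending


def spans_whole_input(item: Item, start_symbol: str, n: int) -> bool:
    """Assumed goal test: the start symbol covers positions 0..n with an empty stack."""
    return (
        item.node == start_symbol
        and item.i == 0
        and item.j == n
        and not item.stack
    )
```

Frozen dataclasses are hashable, which makes such items convenient to store in a chart implemented as a set or a dictionary.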
Only the rules which prove a derivation are shown in Figure 6.</Paragraph> <Paragraph position="4"> The form of a derivation item is</Paragraph> <Paragraph position="6"> where Name is the derivation, typed Type (footnote 6), of the tree gfrom to the node Node of gto (footnote 7).</Paragraph> </Section> <Section position="2" start_page="148" end_page="149" type="sub_section"> <SectionTitle> 6.2 Overview of the process </SectionTitle> <Paragraph position="0"> We call ghost derivation any derivation which occurs in a tree anchored by an empty element, and ghost tree a tree anchored by this empty element. As can be seen in Figure 5, we assume that the proper ghost tree has been selected. The remaining problem is to know which structure to use in order to synchronize our derivation process.</Paragraph> <Paragraph position="1"> Elliptic substitution of an initial ghost tree on a tree aconj:</Paragraph> <Paragraph position="2"> Consider a tree aconj (see Fig. 1) anchored by a coordinator and an initial tree a1 of root P to be substituted in the leftmost P node of aconj. The rule corresponding to the traversal of the leftmost P node would then be</Paragraph> <Paragraph position="4"> If this rule is validated, we infer a derivation item called D1 : <PaconjG,a1,aconj,subst,-> .</Paragraph> <Paragraph position="5"> Now, let us assume that the node situated to the right of the coordinating conjunction dominates a phrase whose verb has been erased (as in et Paul _ Virginie) and that there exists a tree of root P with two argument positions (a quasi-tree like N0VN1 in the LTAG literature, for example). This ghost tree is anchored by an empty element and is called aghost. We have a rule, called Call-subst-ghost, describing the traversal of this node:</Paragraph> <Paragraph position="7"> where the non-instantiated variable, ? 
, indicates the missing information in the synchronized tree.</Paragraph> <Paragraph position="8"> If our hypothesis is correct, this tree will be anchored by the anchor of a1. So we have to prepare this anchoring by performing a synchronization with existing derivations. This leads us to infer a ghost substitution derivation of the tree a1 on the node PaconjD. The inference rule which produces the</Paragraph> <Paragraph position="10"> The process, which is almost the same for the remaining derivations, is described in Section 6.4.</Paragraph> </Section> <Section position="3" start_page="149" end_page="149" type="sub_section"> <SectionTitle> 6.3 Ghost derivation and Item retrieving </SectionTitle> <Paragraph position="0"> In the last section, we described a ghost derivation as a derivation which deals with a tree anchored by an empty element, whether it is the source tree or the destination tree. In fact, we need to keep marks in the shared forest to distinguish what we are really traversing during the parsing process from what we are synchronizing, which is why we need access to all the required information.</Paragraph> <Paragraph position="1"> But the only rule which really knows which tree will be either co-indexed or duplicated is the rule describing the substitution of the realized tree.</Paragraph> <Paragraph position="2"> So we have to get this information by accessing the corresponding derivation item. If we are in a two-phase generation process of a shared forest (footnote 8), we can simultaneously generate the substitution rules for the leftmost and rightmost nodes of the tree anchored by a coordination, and then easily get the right synchronized derivation from the start. 
Here we have to fetch this item from the chart, using unification variables through the path of the derivations leading to it.</Paragraph> <Paragraph position="3"> Let us call &quot;climbing&quot; the process of going from a leaf node N of a tree g to the node which belongs to the tree anchored by a coordinator (aconj) and which dominates this node.</Paragraph> <Paragraph position="4"> Footnote 8: The first phase is the generation of the set of rules (Vijay-Shanker and Weir, 1993), and the second one is the forest traversal (Lang, 1992). See (Seddah and Gaiffe, 2005b) for a way to generate a shared derivation forest where each derivation rule infers its own derivation item, directly prepared during the generation phase. Footnote 9: The form of a derivation item is defined in Section 6.1.</Paragraph> <Paragraph position="5"> This &quot;climbing&quot; gives us a list of linked derivations (i.e. [< gx(N),gy,gx,Type,IsGhost > ,< gz(N),gx,gz,Type1,IsGhost1 >,..] where g(N) is the node of the tree g where the derivation takes place (footnote 9)). The last returned item is the one which has an exact counterpart in the other conjunct, and which is easy to recover, as shown by the inference rule in the previous section. Given this item, we start the opposite process, called &quot;descent&quot;, which uses the available data gathered by the climbing (the derivation starting nodes, the argumental position marked by an index on nodes in TAG grammars, etc.) to follow a parallel path. 
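The climbing step can be pictured with a short Python sketch. This is only one possible reading of the description above: the Derivation tuple, the function climb, the tree names, and the simplifying assumption that each tree attaches to a single parent (a plain tree rather than a shared forest) are all hypothetical.

```python
from collections import namedtuple

# Hypothetical derivation item: node name, source tree, target tree, type, ghost flag.
Derivation = namedtuple("Derivation", "name g_from g_to type is_ghost")


def climb(derivations, start_tree, conj_tree):
    """Collect the derivations linking start_tree up to conj_tree.

    Returns an empty list when conj_tree does not dominate start_tree.
    """
    parent = {d.g_from: d for d in derivations}
    path, current, seen = [], start_tree, set()
    while current != conj_tree:
        step = parent.get(current)
        if step is None or current in seen:
            return []              # no path, or a cycle in malformed input
        seen.add(current)
        path.append(step)
        current = step.g_to
    return path


# "Jean aime Marie": Marie is substituted into aime, itself substituted
# into the coordination tree (tree names are illustrative only).
derivs = [
    Derivation("P_left", "a_aime", "a_conj", "subst", False),
    Derivation("N1", "a_Marie", "a_aime", "subst", False),
]
chain = climb(derivs, "a_Marie", "a_conj")
```

In this sketch the last element of chain is the derivation attached to the coordination tree, which corresponds to the item said above to have a counterpart in the other conjunct.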
Our algorithm can be seen as taking the two resulting lists as parameters to produce the correct derivation item.</Paragraph> <Paragraph position="6"> If we apply a two-step generation process (shared forest generation, then extraction), the &quot;descent&quot; and &quot;climbing&quot; phases can be done in parallel, in the same time-efficient way as in (Seddah and Gaiffe, 2005a).</Paragraph> </Section> <Section position="4" start_page="149" end_page="149" type="sub_section"> <SectionTitle> 6.4 Description of inference rules </SectionTitle> <Paragraph position="0"> In this section we describe all of the inferences relative to the derivations in the right part (resp. the left part) of the coordination, as seen in Figure 5.</Paragraph> <Paragraph position="1"> In the remainder of this paper, we describe the inference rules involved in so-called predicative derivations (substitutions and ghost substitutions).</Paragraph> <Paragraph position="2"> Indeed, the status of adjunction is ambiguous. In the general case, when an adjunct is present on only one side of the coordination, there are two possible readings: one with an erased (co-indexed) modifier on the other side, and one with no such modifier at all on the other side. In the reading with erasing, there is an additional question, which arises in the substitution case as well: in the derivation structure, should we co-index the erased node with its reference node, or should we perform a (partial) copy, hence creating two (partially co-indexed) nodes? The answer to this question is non-trivial, and an appropriate heuristic is needed. A first guess could be the following: any fully erased node (which spans an empty range) is fully co-indexed, and any partially erased node is copied (with partial co-indexation). 
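The first guess stated above is simple enough to write down directly. In this Python sketch, the function name and the representation of a node by the start and end indices of the string it spans are assumptions made for illustration.

```python
def sharing_mode(start: int, end: int) -> str:
    """Apply the first-guess heuristic to an erased node spanning start..end.

    A fully erased node covers an empty range and is co-indexed;
    a partially erased node still covers realized material and is copied.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    return "co-index" if start == end else "copy"
```

For instance, sharing_mode(4, 4) gives "co-index" and sharing_mode(4, 6) gives "copy".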
In particular, erased verbs are always copied, since they cannot occur without non-erased arguments (or modifiers).</Paragraph> <Paragraph position="3"> Elliptic substitution of an initial tree a on a ghost tree gghost: If a tree a is substituted in a node Ni of a ghost tree gghost (i.e. derivation g-Der2' in Figure 5), where i is the traditional index of an argumental position (N0, N1...) of this tree; and if there exists a ghost derivation of a substitution of the tree gghost into a coordination tree aconj (Der. g-Der1); and if this ghost derivation therefore pertains to a tree aX in which a substitution derivation exists on node Ni (Der. Der2),</Paragraph> <Paragraph position="4"> then we infer a ghost derivation indicating the substitution of a on the forwarded tree aX through the node Ni of the ghost tree gghost (Der.</Paragraph> <Paragraph position="5"> This is the mechanism seen in the analysis of &quot;Jean aime Marie et Pierre Virginie&quot; to provide the derivation tree.</Paragraph> <Paragraph position="6"> Elliptic substitution of an initial ghost tree aghost on a tree g substituted on a tree aconj: Here we are in a kind of opposite situation: we have a realized subtree which lacks one of its arguments, as in Jeani dormit puis oi mourut (Johni slept then oi died). So we first have to leave a mark in the shared forest, then fetch the tree substituted on the left part of the coordination and get the tree which has been substituted on its ith node; we will then be able to infer the proper substitution. We want to create a real link because, as opposed to the previous case, it really is a link, so the resulting structure will be a graph with two links out of the tree anchored by Jean, one to [dormir] (to sleep) and one to [mourir] (to die).</Paragraph> <Paragraph position="7"> If a ghost tree aghost is substituted on a node Ni of a tree a (Der. g-Der1'), if this tree a has been substituted on a substitution node, PconjD, in the rightmost part of a tree aconj (Der. 
Der1) anchored by a coordinating conjunction, if the leftmost part node, PconjL, of aconj has received a substitution of a tree as (Der. Der2), and if this tree has a substitution of a tree afinal on its ith node (Der. Der3), then we infer an item indicating a derivation between the tree afinal and the tree a on its node Ni (Der. g-Der1) (footnote 10).</Paragraph> </Section> </Section> </Paper>