<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1146">
  <Title>Optimal Constituent Alignment with Edge Covers for Semantic Projection</Title>
  <Section position="5" start_page="1162" end_page="1163" type="metho">
    <SectionTitle>
3 Globally optimal constituent alignment
</SectionTitle>
    <Paragraph position="0"> We model constituent alignment as a minimum weight bipartite edge cover problem. A bipartite graph is a graph G = (V,E) whose node set V is partitioned into two nonempty sets V1 and V2 in such a way that every edge in E joins a node in V1 to a node in V2. In a weighted bipartite graph a weight is assigned to each edge. An edge cover is a subgraph of a bipartite graph in which each node is linked to at least one node of the other partition. A minimum weight edge cover is an edge cover with the least possible sum of edge weights.</Paragraph>
    <Paragraph position="1"> In our projection application, the two partitions are the sets of source and target sentence constituents, Us and Ut, respectively. Each source node is connected to all target nodes and each target node to all source nodes; these edges can be thought of as potential constituent alignments. The edge weights, which represent the (dis)similarity between nodes us and ut, are set to 1 − sim(us,ut).2 The minimum weight edge cover then represents the alignment with the maximal similarity between source and target constituents. Below, we present details on graph edge covers and a more restricted kind, minimum weight perfect bipartite matchings. We also discuss their computation.</Paragraph>
    <Paragraph position="2"> Edge covers Given a bipartite graph G, a minimum weight edge cover Ae can be defined as:</Paragraph>
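    <Paragraph> The formula that followed this sentence is not preserved in this extract. A formulation consistent with the definitions above (an edge cover with the least possible sum of edge weights, with weights set to 1 − sim) might read as follows; the symbol w for the edge weight is an assumption introduced here for readability.

```latex
A_e = \operatorname*{argmin}_{A \text{ is an edge cover of } G}
      \; \sum_{(u_s, u_t) \in A} w(u_s, u_t),
\qquad w(u_s, u_t) = 1 - \mathrm{sim}(u_s, u_t)
```
</Paragraph>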
    <Paragraph position="4"> An example edge cover is illustrated in Figure 2 (middle). Edge covers are somewhat more constrained compared to the local model described above: all source and target nodes have to take part in some alignment. We argue that this is desirable in modelling constituent alignment, since important linguistic units will not be ignored. As can be seen, edge covers allow one-to-many alignments which are common when translating from one language to another. For example, an English constituent might be split into several German constituents or alternatively two English constituents might be merged into a single German constituent.</Paragraph>
    <Paragraph position="5"> In Figure 2, the source nodes (3) and (4) correspond to target node (4). Since each node of either side has to participate in at least one alignment, edge covers cannot account for insertions arising when constituents in the source language have no counterpart in their target language, or vice versa, as is the case for deletions.</Paragraph>
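    <Paragraph> To make the edge cover constraint concrete, the following is a minimal sketch that finds a minimum weight edge cover by exhaustive search on a tiny complete bipartite graph. The function name, the list-of-lists weight representation and the toy weights are assumptions made for illustration; the search is exponential and is not the algorithm used in the paper.

```python
from itertools import combinations


def min_weight_edge_cover(weights):
    """Exhaustive minimum weight edge cover of a complete bipartite graph.

    weights[i][j] is the non-negative weight of the edge between source
    node i and target node j, e.g. 1 - sim(us, ut).
    Returns (cover, cost), where cover is a list of (source, target) pairs.
    """
    n_src, n_tgt = len(weights), len(weights[0])
    edges = [(i, j) for i in range(n_src) for j in range(n_tgt)]
    all_src, all_tgt = set(range(n_src)), set(range(n_tgt))
    best_cover, best_cost = None, float("inf")
    # A cover needs at least max(n_src, n_tgt) edges; with non-negative
    # weights some optimal cover uses fewer than n_src + n_tgt edges.
    for size in range(max(n_src, n_tgt), n_src + n_tgt):
        for subset in combinations(edges, size):
            covers_src = {i for i, _ in subset} == all_src
            covers_tgt = {j for _, j in subset} == all_tgt
            if not (covers_src and covers_tgt):
                continue
            cost = sum(weights[i][j] for i, j in subset)
            if best_cost > cost:
                best_cover, best_cost = list(subset), cost
    return best_cover, best_cost


# Three source constituents, two target constituents (toy weights).
weights = [[0.1, 0.9], [0.8, 0.2], [0.9, 0.3]]
cover, cost = min_weight_edge_cover(weights)
# Sources 1 and 2 are both linked to target 1, i.e. a merge.
```

In this toy case every node takes part in at least one alignment, and the two-to-one link illustrates the merging behaviour described above.</Paragraph>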
    <Paragraph position="6"> [Figure 2 caption (partly lost): (Us,Ut: sets of source and target constituents; r1,r2: two semantic roles). Left: local forward alignment; middle: edge cover; right: perfect matching with dummy nodes.] Weighted perfect bipartite matchings Perfect bipartite matchings are a more constrained version of edge covers, in which each node has exactly one adjacent edge. This restricts constituent alignment to a bijective function: each source constituent is linked to exactly one target constituent, and vice versa. Analogously, a minimum weight perfect bipartite matching Am is a minimum weight edge cover obeying the one-to-one constraint:</Paragraph>
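    <Paragraph> The formula for Am is likewise not preserved in this extract. Under the definitions given in the text, one consistent way to write it would be the following; as before, the weight symbol w (standing for 1 − sim) is an assumption introduced here.

```latex
A_m = \operatorname*{argmin}_{A \text{ is a perfect matching of } G}
      \; \sum_{(u_s, u_t) \in A} w(u_s, u_t)
```
</Paragraph>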
    <Paragraph position="8"> An example of a perfect bipartite matching is given in Figure 2 (right), where each node has exactly one adjacent edge. Note that the target side contains two nodes labelled (d), a shorthand for &quot;dummy&quot; node. Since sentence pairs will often differ in length, the resulting graph partitions will have different sizes as well. In such cases, dummy nodes are introduced in the smaller partition to enable perfect matching. Dummy nodes are assigned a similarity of zero with all other nodes.</Paragraph>
    <Paragraph position="9"> Alignments to dummy nodes (such as for source nodes (3) and (6)) are ignored during projection.</Paragraph>
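    <Paragraph> The dummy-node padding described above can be sketched as follows. This is an illustrative brute-force search over permutations, not the cubic-time solver cited in the text; the function name and the toy similarity values are assumptions.

```python
from itertools import permutations


def perfect_matching_with_dummies(sim):
    """Minimum weight perfect matching with dummy padding (brute force).

    sim[i][j] is the similarity between source i and target j.
    The smaller partition is padded with dummy nodes whose similarity
    to every node is 0, i.e. whose edge weight 1 - sim is 1.
    Returns the matched (source, target) pairs without dummy links.
    """
    n_src, n_tgt = len(sim), len(sim[0])
    size = max(n_src, n_tgt)

    def is_real(i, j):
        return n_src > i and n_tgt > j

    def weight(i, j):
        return 1.0 - sim[i][j] if is_real(i, j) else 1.0

    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(size)):
        cost = sum(weight(i, perm[i]) for i in range(size))
        if best_cost > cost:
            best_perm, best_cost = perm, cost
    # Alignments to dummy nodes are ignored during projection.
    return [(i, best_perm[i]) for i in range(size) if is_real(i, best_perm[i])]


# Three source constituents, two target constituents (toy similarities).
sim = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.7]]
pairs = perfect_matching_with_dummies(sim)
# Source 2 is matched to the dummy target and therefore dropped.
```

For realistic sentence lengths the factorial search would be replaced by an assignment solver with cubic runtime.</Paragraph>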
    <Paragraph position="10"> Perfect matchings are more restrictive models of constituent alignment than edge covers. Being bijective, the resulting alignments cannot model splitting or merging operations at all. Insertions and deletions can be modelled only indirectly by aligning nodes in the larger partition to dummy nodes on the other side (see the source side in Figure 2 where nodes (3) and (6) are aligned to (d)).</Paragraph>
    <Paragraph position="11"> Section 5 assesses if these modelling limitations impact the quality of the resulting alignments.</Paragraph>
    <Paragraph position="12"> Algorithms Minimum weight perfect matchings in bipartite graphs can be computed efficiently in cubic time using algorithms for network optimisation (Fredman and Tarjan, 1987; time O(|Us|^2 log|Us| + |Us|^2 |Ut|)) or algorithms for the equivalent linear assignment problem (Jonker and Volgenant, 1987; time O(max(|Us|,|Ut|)^3)).</Paragraph>
    <Paragraph position="13"> Their complexity is a linear factor slower than the quadratic runtime of the local optimisation methods presented in Section 2.</Paragraph>
    <Paragraph position="14"> The computation of (general) edge covers has been investigated by Eiter and Mannila (1997) in the context of distance metrics for point sets. They show that edge covers can be reduced to minimum weight perfect matchings of an auxiliary bipartite graph with two partitions of size |Us|+|Ut|. This allows the computation of general minimum weight edge covers in time O((|Us|+|Ut|)^3).</Paragraph>
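    <Paragraph> The reduction can be illustrated with one standard auxiliary-graph construction, sketched below. It assumes non-negative edge weights and may differ in detail from the construction given by Eiter and Mannila; the function names are hypothetical, and the matching itself is found by brute force only to keep the sketch self-contained.

```python
from itertools import permutations

INF = float("inf")


def auxiliary_cost_matrix(weights):
    """Build a square cost matrix of size |Us| + |Ut|.

    Rows: real sources, then one copy per target.
    Columns: real targets, then one copy per source.
    Matching a real node to its own copy costs as much as its cheapest
    incident edge; copy-to-copy links are free.
    """
    n, m = len(weights), len(weights[0])
    cheapest_src = [min(row) for row in weights]
    cheapest_tgt = [min(weights[i][j] for i in range(n)) for j in range(m)]
    size = n + m
    cost = [[INF] * size for _ in range(size)]
    for i in range(n):
        for j in range(m):
            cost[i][j] = weights[i][j]   # real source i, real target j
            cost[n + j][m + i] = 0.0     # copy of target j, copy of source i
        cost[i][m + i] = cheapest_src[i]  # source i covered by cheapest edge
    for j in range(m):
        cost[n + j][j] = cheapest_tgt[j]  # target j covered by cheapest edge
    return cost


def edge_cover_weight_via_matching(weights):
    """Weight of a minimum edge cover, read off a perfect matching."""
    cost = auxiliary_cost_matrix(weights)
    size = len(cost)
    return min(
        sum(cost[row][perm[row]] for row in range(size))
        for perm in permutations(range(size))
    )


weights = [[0.1, 0.9], [0.8, 0.2], [0.9, 0.3]]
total = edge_cover_weight_via_matching(weights)
```

Replacing the permutation search with a cubic assignment solver on this (|Us|+|Ut|)-sized matrix gives the overall cubic bound stated above.</Paragraph>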
  </Section>
  <Section position="6" start_page="1163" end_page="1164" type="metho">
    <SectionTitle>
4 Filtering via Tree Pruning
</SectionTitle>
    <Paragraph position="0"> We introduce two filtering techniques which effectively remove constituents from source and target trees before alignment takes place. Tree pruning as a preprocessing step is more general and more efficient than our original post-processing filter (Pado and Lapata, 2005) which was embedded into the similarity function. Not only does tree pruning not interfere with the similarity function, but it also reduces the size of the graph, thus speeding up the algorithms discussed in the previous section.</Paragraph>
    <Paragraph position="1"> We present two instantiations of tree pruning: word-based filtering, which subsumes our earlier method, and argument-based filtering, which eliminates unlikely argument candidates. Word-based filtering This technique removes terminal nodes from parse trees according to certain linguistic or alignment-based criteria. We apply two word-based filters in our experiments. The first removes non-content words, i.e., all words which are not adjectives, adverbs, verbs, or nouns, from the source and target sentences. The second is a novel filter which removes all words which remain unaligned in the automatic word alignment. Non-terminal nodes whose terminals are removed by these filters are also pruned.</Paragraph>
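    <Paragraph> Word-based pruning can be sketched as a recursive filter over a parse tree. The tree encoding (a terminal is a string, a non-terminal is a label paired with a list of children), the function name and the toy example are assumptions made for illustration.

```python
def prune(tree, keep):
    """Remove terminals rejected by keep(), then empty non-terminals.

    tree is either a terminal (a string) or a (label, children) pair.
    Returns the pruned tree, or None if nothing is left.
    """
    if isinstance(tree, str):
        return tree if keep(tree) else None
    label, children = tree
    pruned = [prune(child, keep) for child in children]
    kept = [child for child in pruned if child is not None]
    # A non-terminal whose terminals were all removed is pruned as well.
    return (label, kept) if kept else None


# Toy tree; the set of aligned words is hypothetical.
tree = ("S", [("NP", ["the", "charter"]),
              ("VP", ["is", ("PP", ["of"]), ("NP", ["an", "opportunity"])])])
aligned_words = {"charter", "is", "opportunity"}
pruned_tree = prune(tree, lambda word: word in aligned_words)
```

The same function covers both filters: only the keep() criterion changes (aligned words versus content words).</Paragraph>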
    <Paragraph position="2"> Argument filtering Previous work in shallow semantic parsing has demonstrated that not all nodes in a tree are equally probable as semantic roles for a given predicate (Xue and Palmer, 2004). In fact, assuming a perfect parse, there is a &quot;set of likely arguments&quot;, to which almost all semantic roles should be assigned. This set of likely arguments consists of all constituents which are a child of some ancestor of the predicate, provided that (a) they do not dominate the predicate themselves and (b) there is no sentence boundary between a constituent and its predicate.</Paragraph>
    <Paragraph position="3"> This definition covers long-distance dependencies such as control constructions for verbs, or support constructions for nouns and adjectives, and can be extended slightly to accommodate coordination.</Paragraph>
    <Paragraph position="4"> This argument-based filter reduces target trees to a set of likely arguments. In the example in Figure 3, all tree nodes are removed except Kim and pünktlich zu kommen.</Paragraph>
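    <Paragraph> The definition of the set of likely arguments can be sketched as a walk from the predicate up to the root. The Node class, the is_boundary hook and the toy tree are hypothetical; in particular, how a sentence boundary is detected is not specified in this section, so the default here simply climbs to the root.

```python
class Node:
    """Minimal parse tree node with parent pointers (illustrative only)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def likely_arguments(predicate, is_boundary=lambda node: False):
    """Collect children of the predicate's ancestors that do not dominate it."""
    candidates = []
    below = predicate            # node on the path from predicate to root
    ancestor = predicate.parent
    while ancestor is not None:
        for child in ancestor.children:
            # The only child that dominates (or is) the predicate is the
            # one on the path; all other children are candidates.
            if child is not below:
                candidates.append(child)
        if is_boundary(ancestor):
            break                # assumed stand-in for condition (b)
        below, ancestor = ancestor, ancestor.parent
    return candidates


# Toy tree: (S (NP Kim) (VP (V promises) (S-inf to be on time)))
verb = Node("V")
tree = Node("S", [Node("NP"), Node("VP", [verb, Node("S-inf")])])
labels = [node.label for node in likely_arguments(verb)]
# labels == ["S-inf", "NP"]
```

Passing a stricter is_boundary function would stop the climb earlier; the appropriate criterion depends on the parser's label set.</Paragraph>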
  </Section>
  <Section position="7" start_page="1164" end_page="1165" type="metho">
    <SectionTitle>
5 Evaluation Set-up
</SectionTitle>
    <Paragraph position="0"> Data For evaluation, we used the parallel corpus3 from our earlier work (Pado and Lapata, 2005). It consists of 1,000 English-German sentence pairs from the Europarl corpus (Koehn, 2005). The sentences were automatically parsed (using Collins' 1997 parser for English and Dubey's 2005 parser for German), and manually annotated with FrameNet-like semantic roles (see Pado and Lapata 2005 for details). [Footnote 3: ... coli.uni-saarland.de/~pado/projection/]</Paragraph>
    <Paragraph position="1"> Word alignments were computed with the GIZA++ toolkit (Och and Ney, 2003), using the entire English-German Europarl bitext as training data (20M words). We used the GIZA++ default settings to induce alignments for both directions (source-target, target-source). Following common practice in MT (Koehn et al., 2003), we considered only their intersection (bidirectional alignments are known to exhibit high precision). We also produced manual word alignments for all sentences in our corpus, using the GIZA++ alignments as a starting point and following the Blinker annotation guidelines (Melamed, 1998).</Paragraph>
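    <Paragraph> Intersecting the two alignment directions amounts to keeping only the links found in both. The sketch below assumes each direction is available as a set of index pairs; this representation and the function name are illustrative and do not correspond to the GIZA++ output format.

```python
def intersect_alignments(source_to_target, target_to_source):
    """Keep only word links that appear in both alignment directions.

    source_to_target: set of (source_index, target_index) pairs.
    target_to_source: set of (target_index, source_index) pairs.
    Returns a set of (source_index, target_index) pairs.
    """
    flipped = {(src, tgt) for tgt, src in target_to_source}
    return source_to_target.intersection(flipped)


forward = {(0, 0), (1, 1), (1, 2)}
backward = {(0, 0), (1, 1), (3, 2)}
links = intersect_alignments(forward, backward)
# links == {(0, 0), (1, 1)}
```

Links proposed by only one direction are discarded, which trades recall for precision.</Paragraph>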
    <Paragraph position="2"> Method and parameter choice The constituent alignment models we present are unsupervised in that they do not require labelled data for inferring correct alignments. Nevertheless, our models have three parameters: (a) the similarity measure for identifying semantically equivalent constituents; (b) the filtering procedure for removing noise in the data (e.g., wrong alignments); and (c) the decision procedure for projection.</Paragraph>
    <Paragraph position="3"> We retained the similarity measure introduced in Pado and Lapata (2005) which computes the overlap between a source constituent and its candidate projection, in both directions. Let y(cs) and y(ct) denote the yield of a source and target constituent, respectively, and al(T) the union of all word alignments for a token set T:</Paragraph>
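    <Paragraph> The formula introduced by this sentence is not preserved in this extract. The sketch below shows one way an overlap measure computed in both directions could look. The normalisation (each overlap is divided by the size of the yield it is projected from) and the product of the two directions are assumptions, as are all function and parameter names.

```python
def aligned(tokens, links):
    """al(T): all tokens linked to some token in T."""
    return {other for token, other in links if token in tokens}


def overlap_similarity(yield_s, yield_t, links):
    """Bidirectional overlap between a source and a target constituent.

    yield_s, yield_t: sets of token indices (the yields y(cs), y(ct)).
    links: set of (source_index, target_index) word alignment pairs.
    """
    if not yield_s or not yield_t:
        return 0.0
    reverse = {(tgt, src) for src, tgt in links}
    # Share of the source yield whose aligned words fall inside y(ct).
    forward = len(yield_t.intersection(aligned(yield_s, links))) / len(yield_s)
    # Share of the target yield whose aligned words fall inside y(cs).
    backward = len(yield_s.intersection(aligned(yield_t, reverse))) / len(yield_t)
    return forward * backward


links = {(0, 0), (1, 1), (2, 5)}
score = overlap_similarity({0, 1, 2}, {0, 1}, links)
# Two of three source tokens project into the target yield: 2/3 * 1.
```

Under these assumptions the edge weight used for alignment would be 1 minus this score.</Paragraph>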
    <Paragraph position="5"> We examined three filtering procedures (see Section 4): removing non-aligned words (NA), removing non-content words (NC), and removing unlikely arguments (Arg). These were combined with three decision procedures: local forward alignment (Forward), perfect matching (PerfMatch), and edge cover matching (EdgeCover) (see Section 3). We used Jonker and Volgenant's (1987) solver4 to compute weighted perfect matchings.</Paragraph>
    <Paragraph position="6"> In order to find optimal parameter settings for our models, we split our corpus randomly into a development and test set (both 50% of the data) and examined the parameter space exhaustively on the development set. The performance of the best models was then assessed on the test data.</Paragraph>
    <Paragraph position="7"> The models had to predict semantic roles for German, using English gold standard roles as input, and were evaluated against German gold standard roles. To gauge the extent to which alignment errors are harmful, we present results both on intersective and manual alignments.</Paragraph>
    <Paragraph position="8"> Upper bound and baseline In Pado and Lapata (2005), we assessed the feasibility of semantic role projection by measuring how well annotators agreed on identifying roles and their spans. We obtained an inter-annotator agreement of 0.84 (F-score), which can serve as an upper bound for the projection task. As a baseline, we use a simple word-based model (WordBL) from the same study. The units of this model are words, and the span of a projected role is the union of all target terminals aligned to a terminal of the source role.</Paragraph>
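    <Paragraph> The word-based baseline can be sketched in a few lines: the projected span is the set of target terminals aligned to any terminal of the source role. The function name and the index-pair representation of the word alignment are assumptions for illustration.

```python
def project_role_span(source_role_tokens, links):
    """Project a role span word by word.

    source_role_tokens: set of source token indices covered by the role.
    links: set of (source_index, target_index) word alignment pairs.
    Returns the sorted target token indices of the projected span.
    """
    return sorted({tgt for src, tgt in links if src in source_role_tokens})


links = {(0, 0), (1, 2), (3, 1)}
span = project_role_span({0, 1}, links)
# span == [0, 2]
```

Because no constituent information is used, the projected span need not be contiguous, as the toy example shows.</Paragraph>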
  </Section>
  <Section position="8" start_page="1165" end_page="1166" type="metho">
    <SectionTitle>
6 Results
</SectionTitle>
    <Paragraph position="0"> Development set Our results on the development set are summarised in Table 1. We show how performance varies for each model according to different filtering procedures when automatically produced word alignments are used. No filtering is applied to the baseline model (WordBL).</Paragraph>
    <Paragraph position="1"> Without filtering, local and global models yield comparable performance. Models based on perfect bipartite matchings (PerfMatch) and edge covers (EdgeCover) obtain slight F-score improvements over the forward alignment model (Forward). It is worth noticing that PerfMatch yields a significantly higher precision (using a χ² test, p &lt; 0.01) than Forward and EdgeCover. This indicates that, even without filtering, PerfMatch delivers rather accurate projections, albeit with low recall.</Paragraph>
    <Paragraph position="2"> Model performance seems to increase with tree pruning. When non-aligned words are removed (Table 1, NA Filter), PerfMatch and EdgeCover reach an F-score of 67.2 and 66.5, respectively.</Paragraph>
    <Paragraph position="3"> This is an increase of approximately 3% over the local Forward model. Although the latter model yields high precision (74.1%), its recall is significantly lower than that of PerfMatch and EdgeCover (p &lt; 0.01). This demonstrates the usefulness of filtering for the more constrained global models which, as discussed in Section 3, can only represent a limited set of alignment possibilities.</Paragraph>
    <Paragraph position="4"> The non-content words filter (NC filter) yields smaller improvements. In fact, for the Forward model, results are worse than applying no filtering at all. We conjecture that NC is an overly aggressive filter which removes projection-critical words. This is supported by the relatively low recall values. In comparison to NA, recall drops by 8.3% for Forward and by almost 6% for PerfMatch and EdgeCover. Nevertheless, both PerfMatch and EdgeCover outperform the local Forward model. PerfMatch is the best performing model reaching an F-score of 64.0%.</Paragraph>
    <Paragraph position="5"> We now consider how the models behave when the argument-based filter is applied (Arg, Table 1, bottom). As can be seen, the local model benefits most from this filter, whereas PerfMatch is worst affected; it obtains its highest precision (80.4%) as well as its lowest recall (48.1%). This is somewhat expected since the filter removes the majority of nodes in the target partition causing a proliferation of dummy nodes. The resulting edge covers are relatively &quot;unnatural&quot;, thus counterbalancing the advantages of global optimisation.</Paragraph>
    <Paragraph position="6"> To summarise, we find on the development set that PerfMatch in the NA Filter condition obtains the best performance (F-score 67.2%), followed [...] condition. In general, PerfMatch seems less sensitive to the type of filtering used; it yields best results in three out of four filtering conditions (see boldface figures in Table 1). Our results further indicate that Arg boosts the performance of the local model by guiding it towards linguistically appropriate alignments.5 A comparative analysis of the output of PerfMatch and EdgeCover revealed that the two models make similar errors (85% overlap). Disagreements, however, arise with regard to misparses. Consider as an example the sentence pair: The Charter is [NP an opportunity to bring the EU closer to the people.] Die Charta ist [NP eine Chance], [S die EU den Bürgern näherzubringen.] An ideal algorithm would align the English NP to both the German NP and S. EdgeCover, which can model one-to-many relationships, acts &quot;confidently&quot; and aligns the NP to the German S to maximise the overlap similarity, incurring both a precision and a recall error. PerfMatch, on the other hand, cannot handle one-to-many relationships, acts &quot;cautiously&quot; and aligns the English NP to a dummy node, leading to a recall error. Thus, even though EdgeCover's analysis is partly right, it will come out worse than PerfMatch, given the current dataset and evaluation method.</Paragraph>
    <Paragraph position="7"> Test set We now examine whether our results carry over to the test data. [Footnote 5: Experiments using different filter combinations did not lead to performance gains over individual filters and are not reported here due to lack of space.]</Paragraph>
    <Paragraph position="8"> Table 2 shows the performance of the best models (Forward (Arg), PerfMatch (NA), and EdgeCover (NA)) on automatic (Intersective) and manual (Manual) alignments.6 All models perform significantly better than the baseline but significantly worse than the upper bound (both in terms of precision and recall, p &lt; 0.01). PerfMatch and EdgeCover yield better F-scores than the Forward model. In fact, PerfMatch yields a significantly better precision than Forward (p &lt; 0.01).</Paragraph>
    <Paragraph position="9"> Relatively small performance gains are observed when manual alignments are used. The F-score increases by 2.9% for Forward, 2.2% for PerfMatch, and 1.9% for EdgeCover. Also note that this better performance is primarily due to a significant increase in recall (p &lt; 0.01), but not precision. This is an encouraging result indicating that our filters and graph-based algorithms eliminate alignment noise to a large extent. Analysis of the models' output revealed that the remaining errors are mostly due to incorrect parses (none of the parsers employed in this work were trained on the Europarl corpus) but also to modelling deficiencies. Recall from Section 3 that our global models cannot currently capture one-to-zero correspondences, i.e., deletions and insertions.</Paragraph>
  </Section>
  <Section position="9" start_page="1166" end_page="1167" type="metho">
    <SectionTitle>
7 Related work
</SectionTitle>
    <Paragraph position="0"> Previous work has primarily focused on the projection of grammatical (Yarowsky and Ngai, 2001) and syntactic information (Hwa et al., 2002). An exception is Fung and Chen (2004), who also attempt to induce FrameNet-style annotations in Chinese. Their method maps English FrameNet entries to concepts listed in HowNet7, an on-line ontology for Chinese, without using parallel texts.</Paragraph>
    <Paragraph position="1"> The present work extends our earlier projection framework (Pado and Lapata, 2005) by proposing global methods for automatic constituent alignment. Although our models are evaluated on the semantic role projection task, we believe they also show promise in the context of statistical machine translation, especially for systems that use syntactic information to enhance translation quality. For example, Xia and McCord (2004) exploit constituent alignment for rearranging sentences in the source language so as to make their word order similar to that of the target language. They learn tree reordering rules by aligning constituents heuristically using a naive local optimisation procedure analogous to forward alignment. A similar approach is described in Collins et al. (2005); however, the rules are manually specified and the constituent alignment step reduces to inspection of the source-target sentence pairs. The global optimisation models presented in this paper could be easily employed for the reordering task common to both approaches.</Paragraph>
    <Paragraph position="2"> Other approaches treat rewrite rules not as a preprocessing step (e.g., for reordering source strings), but as a part of the translation model itself (Gildea, 2003; Gildea, 2004). Constituent alignments are learnt by estimating the probability of tree transformations, such as node deletions, insertions, and reorderings. These models have a greater expressive power than our edge cover models; however, this implies that approximations are often used to make computation feasible.</Paragraph>
  </Section>
</Paper>