<?xml version="1.0" standalone="yes"?> <Paper uid="W05-1612"> <Title>Explorations in Sentence Fusion</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Data collection and Annotation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 General approach </SectionTitle> <Paragraph position="0"> Alignment has become standard practice in data-driven approaches to machine translation (e.g. [Och and Ney, 2000]).</Paragraph> <Paragraph position="1"> Initially, work focused on word-based alignment, but more recent research also addresses alignment at higher levels (substrings, syntactic phrases or trees), e.g., [Gildea, 2003].</Paragraph> <Paragraph position="2"> The latter approach seems most suitable for our current purposes, where we want to express that a sequence of words in one sentence is related to a non-identical sequence of words in another sentence (a paraphrase, for instance). However, if we allow alignment of arbitrary substrings of two sentences, then the number of possible alignments grows exponentially with the number of tokens in the sentences, and the process of alignment, whether manual or automatic, may become infeasible. An alternative, which seems to occupy the middle ground between word alignment on the one hand and alignment of arbitrary substrings on the other, is to align syntactic analyses. Here, following [Barzilay, 2003], we will align sentences at the level of dependency structures. Unlike [Barzilay, 2003], however, we are interested in a number of different alignment relations between sentences, and pay special attention to the feasibility of this alignment task.</Paragraph> <Paragraph position="3"> Zo heb ik in de loop van mijn leven heel veel contacten gehad met heel veel serieuze personen. (lit. 
'Thus have I in the course of my life very many contacts had with very many serious persons').</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Corpus </SectionTitle> <Paragraph position="0"> For evaluation and parameter estimation we have developed a parallel monolingual corpus consisting of two different Dutch translations of the French book &quot;Le petit prince&quot; (The Little Prince) by Antoine de Saint-Exupéry (published 1943), one by Laetitia de Beaufort-van Hamel (1966) and one by Ernst Altena (2000). The texts were automatically tokenized and split into sentences, after which errors were manually corrected. Corresponding sentences from both translations were manually aligned; in most cases this was a one-to-one mapping, but occasionally a single sentence in one version mapped onto two sentences in the other. Next, the Alpino parser for Dutch (e.g., [Bouma et al., 2001]) was used for part-of-speech tagging and lemmatizing all words, and for assigning a dependency analysis to all sentences. The POS labels indicate the major word class (e.g. verb, noun, pron, and adv). The dependency relations hold between tokens and are the same as those used in the Spoken Dutch Corpus (see e.g., [van der Wouden et al., 2002]). These include dependencies such as head/subject, head/modifier and coordination/conjunction. See Figure 1 for an example. If a full parse could not be obtained, Alpino produced partial analyses collected under a single root node. Errors in lemmatization, POS tagging, and syntactic dependency parsing were not subject to manual correction.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 Task definition </SectionTitle> <Paragraph position="0"> A dependency analysis of a sentence S yields a labeled directed graph D = ⟨V,E⟩, where V (vertices) are the nodes, and E (edges) are the dependency relations. 
For each node v in the dependency structure of a sentence S, we define STR(v) as the string of all tokens under v (i.e., the concatenation of the tokens of all nodes reachable from v). For example, the string associated with node persoon in Figure 1 is heel veel serieuze personen ('very many serious persons').</Paragraph> <Paragraph position="1"> An alignment between sentences S and S′ pairs nodes from the dependency graphs of both sentences. Aligning node v from the dependency graph D of sentence S with node v′ from the graph D′ of S′ indicates that there is a relation between STR(v) and STR(v′), i.e., between the respective substrings associated with v and v′. We distinguish five potential, mutually exclusive relations between nodes (with illustrative examples):
1. v equals v′ iff STR(v) and STR(v′) are literally identical (abstracting from case and word order). Example: &quot;a small and a large boa-constrictor&quot; equals &quot;a large and a small boa-constrictor&quot;;
2. v restates v′ iff STR(v) is a paraphrase of STR(v′) (same information content but different wording). Example: &quot;a drawing of a boa-constrictor snake&quot; restates &quot;a drawing of a boa-constrictor&quot;;
3. v specifies v′ iff STR(v) is more specific than STR(v′). Example: &quot;the planet B 612&quot; specifies &quot;the planet&quot;;
4. v generalizes v′ iff STR(v′) is more specific than STR(v). Example: &quot;the planet&quot; generalizes &quot;the planet B 612&quot;;
5. 
v intersects v′ iff STR(v) and STR(v′) share some informational content, but each also expresses some information not expressed in the other. Example: &quot;Jupiter and Mars&quot; intersects &quot;Mars and Venus&quot;.
Note that there is an intuitive relation with entailment here: both equals and restates can be understood as mutual entailment (i.e., if the root nodes of the analyses corresponding to S and S′ stand in an equals or restates relation, S entails S′ and S′ entails S); if S specifies S′ then S also entails S′, and if S generalizes S′ then S is entailed by S′.</Paragraph> <Paragraph position="2"> An alignment between S and S′ can now be formally defined on the basis of the respective dependency graphs D = ⟨V,E⟩ and D′ = ⟨V′,E′⟩ as a graph A = ⟨VA,EA⟩, such that EA = {⟨v,l,v′⟩ | v ∈ V, v′ ∈ V′, and l(STR(v), STR(v′))}, where l is one of the five relations defined above. The nodes of A are those nodes from D and D′ which are aligned, formally defined as VA = {v | ⟨v,l,v′⟩ ∈ EA} ∪ {v′ | ⟨v,l,v′⟩ ∈ EA}.</Paragraph> <Paragraph position="4"> A complete example alignment can be found in the Appendix.</Paragraph> <Paragraph position="6"> Table 1: Interannotator agreement with respect to alignment between annotators 1 and 2 before (A1,A2) and after (A1′,A2′) revision, and between the consensus and annotator 1 (Ac,A1′) and annotator 2 (Ac,A2′) respectively.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.4 Alignment tool </SectionTitle> <Paragraph position="0"> For creating manual alignments, we developed a special-purpose annotation tool called Gadget ('Graphical Aligner of Dependency Graphs and Equivalent Tokens'). It shows two sentences side by side, as well as their respective dependency graphs. When the user clicks on a node v in a graph, the corresponding string STR(v) is shown at the bottom. 
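The labeled alignment defined in section 2.3, which Gadget lets annotators build by hand, can be sketched as a small data structure. This is a minimal illustration; the class and function names are hypothetical, not from the paper:

```python
# Sketch of STR(v) and a labeled alignment edge <v, l, v'> (section 2.3).
# Names and the token ordering are illustrative, not the paper's.

class Node:
    def __init__(self, token, children=None):
        self.token = token
        self.children = children or []

def STR(v):
    """Concatenation of the tokens of all nodes reachable from v
    (here in a simple depth-first order)."""
    tokens = [v.token]
    for child in v.children:
        tokens.extend(STR(child).split())
    return " ".join(tokens)

# The five mutually exclusive alignment relations:
RELATIONS = {"equals", "restates", "specifies", "generalizes", "intersects"}

def align(v, label, v_prime):
    """One edge of the alignment graph A = <VA, EA>."""
    assert label in RELATIONS
    return (v, label, v_prime)

# Example from Figure 1: 'heel veel serieuze personen'
personen = Node("personen",
                [Node("heel"), Node("veel"), Node("serieuze")])
```

With this encoding, the equals relation of section 2.3 reduces to comparing STR(v) and STR(v′) after normalizing case and word order.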
The tool enables the user to manually construct an alignment graph on the basis of the respective dependency graphs. This is done by focusing on a node in the structure of one sentence and selecting a corresponding node (if possible) in the other structure, after which the user selects the relevant alignment relation. The tool offers additional support for folding parts of the graphs, highlighting unaligned nodes and hiding dependency relation labels. See Figure 4 in the Appendix for a screenshot of Gadget.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2.5 Results </SectionTitle> <Paragraph position="0"> All text material was aligned by the two authors. They started by doing the first ten sentences of chapter one together in order to get a feel for the task. They continued with the remaining sentences of chapter one individually. The total number of nodes in the two translations of the chapter was 445 and 399 respectively. Inter-annotator agreement was calculated for two aspects: alignment and relation labeling. With respect to alignment, we calculated precision, recall and F-score:</Paragraph> <Paragraph position="2"> precision = |Apred ∩ Areal| / |Apred|, recall = |Apred ∩ Areal| / |Areal|, and F-score = 2 · precision · recall / (precision + recall), where Areal is the set of all real alignments (the reference or gold standard), Apred is the set of all predicted alignments, and Apred ∩ Areal is the set of all correctly predicted alignments.</Paragraph> <Paragraph position="3"> For the purpose of calculating inter-annotator agreement, one of the annotations (A1) was considered the 'real' alignment, the other (A2) the 'predicted' one. The results are summarized in Table 1 in column (A1,A2).</Paragraph> <Paragraph position="4"> Next, both annotators discussed the differences in alignment, and corrected mistaken or forgotten alignments. This improved their agreement as shown in column (A1′,A2′). 
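The precision, recall, and F-score used here reduce to simple set operations over alignments. A minimal sketch, with a hypothetical encoding of an alignment as a set of (node-in-D, node-in-D′) id pairs:

```python
# Sketch of the agreement scores of Table 1: one annotation is treated
# as 'real', the other as 'predicted'. The pair encoding is hypothetical.

def precision_recall_f(a_real, a_pred):
    """Precision, recall and F-score of a predicted alignment against
    a reference alignment, both given as sets of aligned node pairs."""
    correct = a_real & a_pred                      # correctly predicted pairs
    p = len(correct) / len(a_pred) if a_pred else 0.0
    r = len(correct) / len(a_real) if a_real else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Toy example: the annotators agree on two of their three alignments.
a1 = {(1, 1), (2, 3), (4, 5)}    # annotator 1, taken as 'real'
a2 = {(1, 1), (2, 3), (4, 6)}    # annotator 2, taken as 'predicted'
```

On this toy pair, precision, recall, and F-score are all 2/3; correcting mistaken or forgotten alignments, as the annotators did during revision, raises all three directly.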
In addition, they agreed on a single consensus annotation (Ac).</Paragraph> <Paragraph position="6"> Table 2: Interannotator agreement with respect to relation labeling between annotators 1 and 2 before (A1,A2) and after (A1′,A2′) revision, and between the consensus and annotator 1 (Ac,A1′) and annotator 2 (Ac,A2′) respectively.</Paragraph> <Paragraph position="8"> The last two columns of Table 1 show the results of evaluating each of the revised annotations against this consensus annotation. The F-score of .96 can therefore be regarded as the upper bound on the alignment task.</Paragraph> <Paragraph position="9"> In a similar way, agreement was calculated for the task of labeling the alignment relations. Results are shown in Table 2, where the measures are weighted precision, recall and F-score. For instance, the precision is the weighted sum of the separate precision scores for each of the five relations.</Paragraph> <Paragraph position="10"> The table also shows the κ (kappa) score, another commonly used measure of inter-annotator agreement [Carletta, 1996].</Paragraph> <Paragraph position="11"> Again, the F-score of .97 can be regarded as the upper bound on the relation labeling task.</Paragraph> <Paragraph position="12"> We think these numbers indicate that the labeled alignment task is well defined and can be accomplished with a high level of inter-annotator agreement.</Paragraph> </Section> <Section position="5" start_page="0" end_page="5" type="metho"> <SectionTitle> 3 Automatic alignment </SectionTitle> <Paragraph position="0"> In this section, we describe the alignment algorithm that we use (section 3.1) and evaluate its performance (section 3.2).</Paragraph> <Section position="1" start_page="0" end_page="5" type="sub_section"> <SectionTitle> 3.1 Tree alignment algorithm </SectionTitle> <Paragraph position="0"> The tree alignment algorithm is based on [Meyers et al., 1996], and is similar to that used in [Barzilay, 2003]. 
It calculates a match score for each node in dependency tree D against each node in dependency tree D′. The score for each pair of nodes depends only on the similarity of the words associated with the nodes and, recursively, on the scores of the best matching pairs of their descendants. For an efficient implementation, dynamic programming is used to build up a score matrix, which guarantees that each score is calculated only once.</Paragraph> <Paragraph position="1"> Given two dependency trees D and D′, the algorithm builds up a score function S(v,v′) for matching each node v in D against each node v′ in D′, which is stored in a matrix M. The value S(v,v′) is the score for the best match between the two subtrees rooted at v in D and at v′ in D′. When a value for S(v,v′) is required and is not yet in the matrix, it is recursively computed by the following formula:</Paragraph> <Paragraph position="2"> S(v,v′) = max( TREEMATCH(v,v′), max_{i=1..n} S(vi,v′), max_{j=1..m} S(v,v′j) )</Paragraph> <Paragraph position="3"> where v1,...,vn denote the children of v and v′1,...,v′m denote the children of v′. The three terms correspond to the three ways that nodes can be aligned: (1) v can be directly aligned to v′; (2) any of the children of v can be aligned to v′; (3) v can be aligned to any of the children of v′. Notice that the last two options imply skipping one or more edges, and leaving one or more nodes unaligned. The function TREEMATCH(v,v′) is a measure of how well the subtrees rooted at v and v′ match: TREEMATCH(v,v′) = NODEMATCH(v,v′) + max_{p ∈ P(v,v′)} Σ_{(i,j) ∈ p} ( RELMATCH(ρ(v,vi), ρ(v′,v′j)) + S(vi,v′j) )   (5) Here ρ(v,vi) denotes the dependency relation from v to vi. P(v,v′) is the set of all possible pairings of the n children of v against the m children of v′, which is the power set of {1,...,n} x {1,...,m}. The summation in (5) ranges over all pairs, denoted by (i,j), which appear in a given pairing p ∈ P(v,v′). Maximizing this summation thus amounts to finding the optimal alignment of children of v to children of v′.</Paragraph> <Paragraph position="4"> NODEMATCH(v,v′) ≥ 
0 is a measure of how well the label of node v matches the label of v′.</Paragraph> <Paragraph position="5"> RELMATCH(ρ(v,vi), ρ(v′,v′j)) ≥ 0 is a measure of how well the dependency relation between node v and its child vi matches the dependency relation between node v′ and its child v′j.</Paragraph> <Paragraph position="6"> Since the dependency graphs delivered by the Alpino parser were usually not trees, they required some modification in order to be suitable input for the tree alignment algorithm. We first determined a root node, defined as a node from which all other nodes in the graph can be reached. In the rare case of multiple root nodes, an arbitrary one was chosen. Starting from this root node, any cyclic edges were temporarily removed during a depth-first traversal of the graph. The resulting directed acyclic graphs may still contain some structure sharing, but this poses no problem for the algorithm.</Paragraph> </Section> <Section position="2" start_page="5" end_page="5" type="sub_section"> <SectionTitle> 3.2 Evaluation of automatic alignment </SectionTitle> <Paragraph position="0"> We evaluated the automatic alignment of nodes, abstracting from relation labels, as we have no algorithm for automatically labeling these relations yet. The baseline is achieved by aligning those nodes which stand in an equals relation to each other, i.e., a node v in D is aligned to a node v′ in D′ iff STR(v) = STR(v′). This alignment can be constructed relatively easily.</Paragraph> <Paragraph position="1"> The alignment algorithm is tested with a NODEMATCH function that distinguishes three levels of word similarity.</Paragraph> <Paragraph position="3"> It reserves the highest value for a literal string match, a somewhat lower value for matching lemmas, and an even lower value in case of a synonym, hyperonym or hyponym relation.</Paragraph> <Paragraph position="4"> The latter relations are retrieved from the Dutch part of EuroWordnet [Vossen, 1998]. 
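The recursion of section 3.1 can be sketched compactly. In this sketch the memo dict plays the role of the score matrix M, the child search enumerates one-to-one pairings, and the NODEMATCH/RELMATCH values are illustrative placeholders, not the tuned values used in the paper:

```python
# Sketch of the tree alignment recursion (section 3.1).
# T: a node with a label, the dependency relation to its parent, and children.
class T:
    def __init__(self, label, rel=None, children=()):
        self.label, self.rel, self.children = label, rel, list(children)

def nodematch(v, w):
    # Illustrative: a high value for a literal label match, else 0.
    return 10 if v.label == w.label else 0

def relmatch(vi, wj):
    # 1 for identical dependency relations, 0 otherwise.
    return 1 if vi.rel == wj.rel else 0

def S(v, w, memo=None):
    """Best match score between the subtrees rooted at v and w."""
    memo = {} if memo is None else memo          # plays the role of matrix M
    key = (id(v), id(w))
    if key not in memo:
        candidates = [treematch(v, w, memo)]             # align v directly to w
        candidates += [S(c, w, memo) for c in v.children]  # skip an edge in D
        candidates += [S(v, c, memo) for c in w.children]  # skip an edge in D'
        memo[key] = max(candidates)
    return memo[key]

def treematch(v, w, memo):
    """NODEMATCH plus the best one-to-one pairing of children."""
    def best(i, used):
        if i == len(v.children):
            return 0
        vi = v.children[i]
        score = best(i + 1, used)                # leave vi unaligned
        for j, wj in enumerate(w.children):
            if j not in used:
                score = max(score, relmatch(vi, wj) + S(vi, wj, memo)
                            + best(i + 1, used | {j}))
        return score
    return nodematch(v, w) + best(0, frozenset())
```

For two toy trees sharing a head and one identically related child, S returns the head's NODEMATCH plus the RELMATCH and subtree score of the shared child, as in equation (5).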
For the RELMATCH function, we simply used a value of 1 for identical dependency relations, and 0 otherwise. These values were found to be adequate in a number of test runs on two other, manually aligned chapters (these chapters were not used for the actual evaluation). In the future we intend to experiment with automatic optimization.</Paragraph> <Paragraph position="5"> We measured the alignment accuracy, defined as the percentage of correctly aligned node pairs, where the consensus alignment of the first chapter served as the gold standard. The results are summarized in Table 3. In order to test the contribution of synonym and hyperonym information to node matching, performance is measured with and without the use of EuroWordnet. The results show that the algorithm improves substantially on the baseline. The baseline already achieves a relatively high score (an F-score of .56), which may be attributed to the nature of our material: the translated sentence pairs are relatively close to each other and may show a sizeable amount of literal string overlap. The alignment algorithm (without the use of EuroWordnet) loses a few points on precision, but improves considerably on recall (a 200% increase with respect to the baseline), which in turn leads to a substantial improvement in the overall F-score. The use of EuroWordnet leads to a small increase (two points) in both precision and recall (and thus a small increase in F-score). Yet, in comparison with the gold standard human score for this task (.95), there is clearly room for further improvement.</Paragraph> </Section> </Section> </Paper>