XML Viewer - e06-1036

File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/06/e06-1036_metho.xml
Size: 29,131 bytes
Last Modified: 2025-10-06 14:10:03
<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-1036">
  <Title>Recognizing Textual Parallelisms with edit distance and similarity degree</Title>
  <Section position="3" start_page="0" end_page="281" type="metho">
    <SectionTitle>
2 Textual parallelism
</SectionTitle>
    <Paragraph position="0"> Our notion of parallelism is based on similarities between syntagmatic and paradigmatic representations of (constituents of) textual units. These similarities concern various dimensions from shallow to deeper description: layout, typography, morphology, lexicon, syntax, and semantics. This account is not limited to the semantic dimension as defined by (Hobbs and Kehler, 1997) who consider text fragments as parallel if the same predicate can be inferred from them with coreferential or similar pairs of arguments.</Paragraph>
    <Paragraph position="1">  Weobserve parallelism atvarious structural levels of text: among heading structures, VP ellipses and others, enumerations of noun phrases in a sentence, enumerations with or without markers such as frame introducers (e.g. &amp;quot;In France, ...In Italy, ...&amp;quot;) or typographical and layout markers. The underlying assumption is that parallelism between some textual units accounts for a rhetorical coordination relation. It means that these units can be regarded as equally important.</Paragraph>
    <Paragraph position="2"> By describing textual units in a two-tier framework composed of a paradigmatic level and syntagmatic level, we argue that, depending on the description granularity we consider (potentially at the character level for item numbering), we can detect a wide variety of parallelism phenomena.</Paragraph>
    <Paragraph position="3"> Among parallelism properties, we note that the parallelism of a given number of textual units is based on the parallelism of their constituents. We also note that certain semantic classes of constituents, such as item numbering, are more effective in marking parallelism than others.</Paragraph>
    <Section position="1" start_page="281" end_page="281" type="sub_section">
      <SectionTitle>
2.1 An example of parallelism
</SectionTitle>
      <Paragraph position="0"> The following example is extracted from our corpus (see section 4.1). In this case, we have an enumeration without explicit markers.</Paragraph>
      <Paragraph position="1"> For the purposes of chaining, each type of link between WordNet synsets is assigned a direction of up, down, or horizontal.</Paragraph>
      <Paragraph position="2"> Upward links correspond to generalization: for example, an upward link from apple to fruit indicates that fruit is more general than apple.</Paragraph>
      <Paragraph position="3"> Downward links correspond to specialization: forexample, alinkfromfruittoapplewould have a downward direction.</Paragraph>
      <Paragraph position="4"> Horizontal links are very specific specializations. The parallelism pattern of the first two items is described as follows: [JJ + suff =ward] links correspond to [NN + suff = alization] : for example , X link from Y to Z . This pattern indicates that several item constituents can be concerned by parallelism and that similarities can be observed at the typographic, lexical and syntactic description levels. Tokens (words or punctuation marks) having identical shallow descriptions are written in italics. The X, Y and Z variables stand for matching any non-parallel textareas between contiguous parallel textual units. Some words are parallel based on their syntactic category (&amp;quot;JJ&amp;quot; / adjectives, &amp;quot;NN&amp;quot; / nouns) or suffix specifications (&amp;quot;suff&amp;quot; attribute). The third item is similar to the first two items but with a simpler pattern: JJ links U [NN + suff =alization] W .</Paragraph>
      <Paragraph position="5"> Parallelism is distinguished by these types of similarities between sentences.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="281" end_page="284" type="metho">
    <SectionTitle>
3 Methods
</SectionTitle>
    <Paragraph position="0"> Three methods were used in this study. Given a pair of sentences, they all produce a score of similarity between these sentences. We first present the preprocessing to be performed on the texts.</Paragraph>
    <Section position="1" start_page="281" end_page="281" type="sub_section">
      <SectionTitle>
3.1 Prior processing applied on the texts
</SectionTitle>
      <Paragraph position="0"> The texts were automatically cut into sentences.</Paragraph>
      <Paragraph position="1"> The first two steps hereinafter have been applied for all the methods. The last third was not applied for the tree editing distance (see 3.3). Punctuation marks and syntactic labels were henceforward considered as words.</Paragraph>
      <Paragraph position="2">  1. Text homogenization: lemmatization together with a semantic standardization. Lexical chains are built using WordNet relations, then words are replaced by their most representative synonym: Horizontal links are specific specializations.</Paragraph>
      <Paragraph position="3"> horizontal connection be specific specialization . 2. Syntactic analysis by (Charniak, 1997)'s parser:</Paragraph>
    </Section>
    <Section position="2" start_page="281" end_page="282" type="sub_section">
      <SectionTitle>
3.2 Wagner &amp; Fischer's string edit distance
</SectionTitle>
      <Paragraph position="0"> This method is based on Wagner &amp; Fischer's string edit distance algorithm (Wagner and Fischer, 1974), applied to sentences viewed as strings of words. It computes a sentence edit distance, using edit operations on these elementary entities.</Paragraph>
      <Paragraph position="1"> The idea is to use edit operations to transform sentence S1 into S2. Similarly to(Wagner and Fischer, 1974), we considered three edit operations:  operations, and the distance between S1 and S2 is the cost of the least cost transformation of S1 into S2. Wagner &amp; Fischer's method provides a simple and effective way (O(|S1||S2|)) to compute it. To reduce size effects, we normalized by |S1|+|S2|2 .</Paragraph>
    </Section>
    <Section position="3" start_page="282" end_page="284" type="sub_section">
      <SectionTitle>
3.3 Zhang &amp; Shasha's algorithm
</SectionTitle>
      <Paragraph position="0"> Zhang &amp; Shasha's method (Zhang and Shasha, 1989; Dulucq and Tichit, 2003) generalizes Wagner &amp; Fischer's edit distance to trees: given two trees T1 and T2, it computes the least-cost sequence of edit operations that transforms T1 into T2. Elementary operations have unitary costs and apply to nodes (labels and words in the syntactic trees). These operations are depicted below: substitution of node c by node g (top figure), insertion of node d (middle fig.), and deletion of node d (bottom fig.), each read from left to right.</Paragraph>
      <Paragraph position="1"> Tree edit distance d(T1,T2) is determined after aseries ofintermediate calculations involving special subtrees of T1 and T2, rooted in keyroots.</Paragraph>
      <Paragraph position="2"> 3.3.1 Keyroots, special subtrees and forests Given a certain node x, L(x) denotes its left-most leaf descendant. L is an equivalence relation overnodes and keyroots (KR)are bydefinition the equivalence relation representatives of highest postfix index. Special subtrees (SST) are the subtrees rooted in these keyroots. Consider a tree T postfix indexed (left figure hereinafter) and its three SSTs (right figure).</Paragraph>
      <Paragraph position="4"> T[1,2,3] is the subtree containing nodes a, b, d.</Paragraph>
      <Paragraph position="5"> A forest of SST(k1) is defined as: T[L(k1),L(k1) + 1,...,x], where x is a node of SST(k1). E.g: SST(3) has 3 forests : T[1] (node a), T[1,2] (nodes a and b) and itself. Forests are ordered sequences of subtrees.</Paragraph>
      <Paragraph position="6"> 3.3.2 An idea of how it works The algorithm computes the distance between all pairs of SSTs taken in T1 and T2, rooted in increasingly-indexed keyroots. In the end, the last SSTs being the full trees, we have d(T1,T2).</Paragraph>
      <Paragraph position="7"> In the main routine, an N1 x N2 array called TREEDIST is progressively filled with values TREEDIST(i,j)equal to the distance between the subtree rooted in T1's ith node and the subtree rooted in T2's jth node. The bottom right-hand cell of TREEDIST is therefore equal to d(T1,T2). Each step of the algorithm determines the edit distance between two SSTs rooted in keyroots (k1,k2) [?] (T1 x T2). An array FDIST is initialized for this step and contains as many lines and columns as the two given SSTs have nodes.</Paragraph>
      <Paragraph position="8"> The array is progressively filled with the distances between increasing forests of these SSTs, similarly to Wagner &amp; Fischer's method. The bottom right-hand value of FDIST contains the distance between the SSTs, which is then stored in TREEDIST in the appropriate cell. Calculations in FDIST and TREEDIST rely on the double recurrence formula depicted below: The first formula is used to compute the distance between two forests (a white one and a black one), each of which is composed of several trees.</Paragraph>
      <Paragraph position="9"> The small circles stand for the nodes of highest postfix index. Distance between two forests is definedastheminimumcostoperation between three possibilities: replacing the rightmost white tree by the rightmost black tree, deleting the white node, or inserting the black node.</Paragraph>
      <Paragraph position="10"> Thesecond formula isanalogous tothefirstone, in the special case where the forests are reduced to a single tree. The distance is defined as the minimum cost operation between: replacing the white node with the black node, deleting the white node, or inserting the black node.</Paragraph>
      <Paragraph position="11">  It is important to notice that the first formula takes the left context of the considered subtrees into account3: ancestor and left sibling orders are preserved. It is not possible to replace the white node with the black node directly, the whole sub-tree rooted in the white node has to be replaced. The good thing is, the cost of this operation has already been computed and stored in TREEDIST.</Paragraph>
      <Paragraph position="12"> Let's see why all the computations required at a given step of the recurrence formula have already been calculated. Let two SSTs of T1 and T2 be rooted in pos1 and pos2. Considering the symmetry of the problem, let's only consider what happens with T1. When filling FDIST(pos1,pos2), all nodes belonging to SST(pos1) are run through, according to increasing postfix indexes. Consider x [?] T[L(pos1),...,pos1]: If L(x) = L(pos1), then x belongs to the left-most branch of T[L(pos1),...,pos1] and forest T[L(pos1),...,x] is reduced to a single tree. By construction, allFDIST(T[L(pos1),...,y],[?])for y [?] x [?] 1 have already been computed. If things are the same for the current node in SST(pos2), then TREEDIST(T[L(pos1),...,x],[?]) can be calculated directly, using the appropriate FDIST values previously computed.</Paragraph>
      <Paragraph position="13"> If L(x) negationslash= L(pos1), then x does not belong to the leftmost branch of T[L(pos1),...,pos1] and therefore x has a non-empty left context T[L(pos1),...,L(x)[?]1]. Let's see why computing FDIST(T[L(pos1),...,x],[?]) requires values which have been previously obtained.</Paragraph>
      <Paragraph position="14">  * If x is a keyroot, since the algorithm runs through keyroots by increasing order, TREEDIST(T[L(x),...,x],[?]) has already been computed.</Paragraph>
      <Paragraph position="15"> * If x is not a keyroot, then there exists a node z such that : x &lt; z &lt; pos1, z is a keyroot and L(z) = L(x). Therefore x belongs to the leftmost branch of T[L(z),...,z], which means TREEDIST(T[L(z),...,x],[?]) has already been computed.</Paragraph>
      <Paragraph position="16">  This final method computes a degree of similarity between two sentences, considered as lists of syntactic (labels) and lexical (words) constituents. Becausesomeconstituents aremorelikely toindicate parallelism than others (e.g: the list item marker is more pertinent than the determiner &amp;quot;a&amp;quot;), a crescent weight function p(x) [?] [0,1] w.r.t. pertinence is assigned to all lexical and syntactic constituents x. A set of special subsentences is then generated: the greatest common divisor of S1 and S2, gcd(S1,S2), is defined as the longest list of words common to S1 and S2. Then for each sentence Si, the set of special subsentences is computed using the words of gcd(S1,S2) according to their order of appearance in Si. For example, if S1 = cabcad and S2 = acbae, gcd(S1,S2) = {c,a,b,a}. The set of subsentences for S1 is {caba,abca} and the set for S2 is reduced to {acba}. Note that any generated sub-sentence is exactly the size of gcd(S1,S2).</Paragraph>
      <Paragraph position="17"> For any two subsentences s1 and s2, we define a degree of similarity D(s1,s2), inspired from string edit distances:</Paragraph>
      <Paragraph position="19"> &gt;&gt;: n size of all subsentences xi ith constituent of s1 dmax max possible dist. between any xi [?] s1 and its parallel constituent in s2, i.e. dmax = n[?]1 d(xi) distance between current constituent xi in s1 and its parallel constituent in s2 p(xi) parallelism weight of xi The further a constituent from s1 is from its symmetric occurrence in s2, the more similar the compared subsentences are. Eventually, the degree of similarity between sentences S1 and S2 is defined as:</Paragraph>
      <Paragraph position="21"> with their subsentences s1 = caba and sprime1 = abca for S1, and s2 = acba for S2. The degrees of parallelism between s1 and s2, and between sprime1 and s2 are computed. The mapping between the parallel constituents is shown below.</Paragraph>
      <Paragraph position="23"/>
    </Section>
  </Section>
  <Section position="5" start_page="284" end_page="286" type="metho">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> This section describes the methodology employed to evaluate performances. Then, after a preliminary study ofour corpus, results arepresented successively for each method. Finally, the behavior of the methods is analyzed at sentence level.</Paragraph>
    <Section position="1" start_page="284" end_page="284" type="sub_section">
      <SectionTitle>
4.1 Methodology
</SectionTitle>
      <Paragraph position="0"> Our parallelism detection is an unsupervised clustering application: given a set of pairs of sentences, it automatically classifies them into the class of the parallelisms and the remainders class. Pairs were extracted from 5 scientific articles written in English, each containing about 200 sentences: Green (ACL'98), Kan (Kan et al. WVLC'98), Mitkov (Coling-ACL'98), Oakes (IRSG'99) and Sand (Sanderson et al. SIGIR'99).</Paragraph>
      <Paragraph position="1"> The idea was to compute for each pair a parallelism score indicating the similarity between the sentences. Then the choice of a threshold determined which pairs showed a score high enough to be classified as parallel.</Paragraph>
      <Paragraph position="2"> Evaluation was based on a manual annotation we proceeded over the texts. In order to reduce computational complexity, we only considered the parallelism occurring between consecutive sentences. For each sentence, we indicated the index of its parallel sentence. We assumed transitivity of parallelism : if S1//S2 and S2//S3, then S1//S3.</Paragraph>
      <Paragraph position="3"> It was thus considered sufficient to indicate the index of S1 for S2 and the index of S2 for S3 to account for a parallelism between S1, S2 and S3.</Paragraph>
      <Paragraph position="4"> We annotated pairs of sentences where textual parallelism led us to rhetorically coordinate them.</Paragraph>
      <Paragraph position="5"> The decision was sometimes hard to make. Yet we annotated it each time to get more data and to study the behavior of the methods on these examples, possibly penalizing our applications. In the end, 103 pairs were annotated.</Paragraph>
      <Paragraph position="6"> We used the notions of precision (correctness) and recall (completeness). Because efforts in improving one often result in degrading the other, the F-measure (harmonic mean) combines them into a unique parameter, which simplifies comparisons of results. Let P be the set of the annotated parallelisms and Q the set of the pairs automatically classified in the parallelisms after the use of athreshold. Then the associated precision p, recall r and F-measure f are defined as:</Paragraph>
      <Paragraph position="8"> As we said, the unique task of the implemented methods was to assign parallelism scores to pairs of sentences, which are collected in a list. We manually applied various thresholds to the list and computed their corresponding F-measure. We keptasaperformance indicator thebestF-measure found. This was performed for each method and on each text, as well as on the texts all gathered together.</Paragraph>
    </Section>
    <Section position="2" start_page="284" end_page="284" type="sub_section">
      <SectionTitle>
4.2 Preliminary corpus study
</SectionTitle>
      <Paragraph position="0"> This paragraph underlines some of the characteristics of the corpus, in particular the distribution of the annotated parallelisms in the texts for adjacent sentences. The following table gives the percentage of parallelisms for each text:  Greenand Oakes show significantly moreparallelisms than the other texts. Therefore, if we consider a lazy method that would put all pairs in the class of parallelisms, Green and Oakes will yield a priori better results. Precision is indeed directly related tothepercentage ofparallelisms inthetext.</Paragraph>
      <Paragraph position="1"> In this case, it is exactly this percentage, and it gives us a minimum value of the F-measure our methods should at least reach:</Paragraph>
    </Section>
    <Section position="3" start_page="284" end_page="285" type="sub_section">
      <SectionTitle>
4.3 A baseline: counting words in common
</SectionTitle>
      <Paragraph position="0"> We first present the results of a very simple and thus very fast method. This baseline counts the  words sentences S1 and S2 have in common, and normalizes the result by |S1|+|S2|2 in order to reduce size effects. No syntactic analysis nor lexical homogenization was performed on the texts.</Paragraph>
      <Paragraph position="1"> Results for this method are summarized in the following table. The last column shows the loss (%) in F-measure after applying a generic threshold (the optimal threshold found when all texts are gathered together) on each text.</Paragraph>
      <Paragraph position="2"> F-meas. Prec. Recall Thres. Loss  We first note that results are twice as good as with the lazy approach, with Green and Oakes far above the rest. Yet this is not sufficient for a real application. Furthermore, the optimal threshold is very different from one text to another, which makes the learning of a generic threshold able to detect parallelisms for any text impossible. The only advantage here is the simplicity of the method: no prior treatment was performed on the texts before the search, and the counting itself was very fast.</Paragraph>
    </Section>
    <Section position="4" start_page="285" end_page="285" type="sub_section">
      <SectionTitle>
4.4 String edit distance
</SectionTitle>
      <Paragraph position="0"> We present the results for the 1st method below: F-meas. Prec. Recall Thres. Loss  Green and Oakes still yield the best results, but the other texts have almost doubled theirs. Results for Oakes are especially good: an F-measure of 82% guaranties high precision and recall.</Paragraph>
      <Paragraph position="1"> In addition, the use of a generic threshold on each text had little influence on the value of the F-measure. The greatest loss is for Sand and only corresponds to the adjunction of four pairs of sentences intheclass ofparallelisms. Theselection of a unique generic threshold to predict parallelisms should therefore be possible.</Paragraph>
    </Section>
    <Section position="5" start_page="285" end_page="285" type="sub_section">
      <SectionTitle>
4.5 Tree edit distance
</SectionTitle>
      <Paragraph position="0"> The algorithm was applied using unitary edit costs. Since it did not seem natural to establish mappings between different levels of the sentence, edit operations between two constituents of different nature (e.g: substitution of a lexical by a syntactic element) were forbidden by a prohibitive cost (1000). However, this banning only improved the results shyly, unfortunately.</Paragraph>
      <Paragraph position="1"> F-meas. Prec. Recall Thres. Loss  As illustrated in the table above, results are comparable to those previously found. We note an especially good F-measure for Sand: 52%, against 47% for the string edit distance. Optimal thresholds were quite similar from one text to another.</Paragraph>
    </Section>
    <Section position="6" start_page="285" end_page="286" type="sub_section">
      <SectionTitle>
4.6 Degree of similarity
</SectionTitle>
      <Paragraph position="0"> Because of the high complexity of this method, a heuristic was applied. The generation of the sub-sentences is indeed inproducttextCkini, ki being the number of occurrences of the constituent xi in gcd, and ni the number of xi in the sentence. We chose to limit the generation to a fixed amount of subsentences. The constituents that have a great Ckini bring too much complexity: we chose to eliminate their (ni [?] ki) last occurrences and to keep their ki firstoccurrences only togenerate subsequences.</Paragraph>
      <Paragraph position="1"> An experiment was conducted in order to determine the maximum amount of subsentences that could be generated in a reasonable amount of time without significant performance loss and 30 was a sufficient number. In another experiment, different parallelism weights were assigned to lexical constituents and syntactic labels. The aim was to understand their relative importance for parallelisms detection. Results show that lexical constituents have a significant role, but conclusions are more difficult to draw for syntactic labels. It was decided that, from now on, the lexicalweight should begiven the maximum value, 1.</Paragraph>
      <Paragraph position="2"> Finally, we assigned different weights to the syntactic labels. Weights were chosen after counting the occurrences of the labels in the corpus. In fact, we counted for each label the percentage of occurrences that appeared in the gcd of the parallelisms with respect to those appearing in the gcd of the other pairs. Percentages were then rescaled from 0 to 1, in order to emphasize differences  between labels. The obtained parallelism values measured the role of the labels in the detection of parallelism. Results for this experiment appear in the table below.</Paragraph>
      <Paragraph position="3"> F-meas. Prec. Recall Thres. Loss  those obtained in 4.4 and the corresponding thresholds were similar from one text to another.</Paragraph>
      <Paragraph position="4"> This section showed how the three proposed methods outperformed the baseline. Each of them yielded comparable results.</Paragraph>
      <Paragraph position="5"> The next section presents the results at sentence level, together with a comparison of these three methods.</Paragraph>
    </Section>
    <Section position="7" start_page="286" end_page="286" type="sub_section">
      <SectionTitle>
4.7 Analysis at sentence level
</SectionTitle>
      <Paragraph position="0"> The different methods often agreed but sometimes reacted quite differently.</Paragraph>
      <Paragraph position="1"> Well retrieved parallelisms Some parallelisms were found by each method with no difficulty: they were given a high degree of parallelism by each method. Typically, such sentences presented a strong lexical and syntactic similarity, as in the example in section 2.</Paragraph>
      <Paragraph position="2"> Parallelisms hard to find Other parallelisms received very low scores from each method. This happened when the annotated parallelism was lexically and syntactically poor and needed either contextual information or external semantic knowledge to find keywords (e.g: &amp;quot;first&amp;quot;, &amp;quot;second&amp;quot;, ...), paraphrases or patterns (e.g: &amp;quot;X:Y&amp;quot;inthefollowing example(Kan)): Rear: a paragraph in which a link just stopped occurring the paragraph before.</Paragraph>
      <Paragraph position="3"> No link: any remaining paragraphs.</Paragraph>
      <Paragraph position="4"> Different methods, different results Eventually, we present some parallelisms that obtained very different scores, depending on the method.</Paragraph>
      <Paragraph position="5"> First, it seems that a different ordering of the parallel constituents in the sentences alter the performances of the edit distance algorithms (3.2; 3.3). The following example (Green) received a low score with both methods: When we consider AnsV as our dependent variable, the model for the High Web group is still not significant, and there is still a high probability that the coefficient of LI is 0.</Paragraph>
      <Paragraph position="6"> For our Low Web group, who followed significantly more intra-article links than the High Web group, the model that results is significant and has the following equation: &lt;EQN/&gt;.</Paragraph>
      <Paragraph position="7"> This is due to the fact that both algorithms do not allow the inversion of two constituents and thus are unable to find all the links from the first sentence to the other. The parallelism measure is robust to inversion.</Paragraph>
      <Paragraph position="8"> Sometimes, the syntactic parser gave different analyses for the same expression, which made mapping between thesentences containing this expression more difficult, especially for the tree edit distance. The syntactic structure has less importance for the other methods, which are thus more insensitive to an incorrect analysis.</Paragraph>
      <Paragraph position="9"> Finally, the parallelism measure seems more adapted to a diffuse distribution of the parallel constituents in the sentences, whereas edit distances seem more appropriate when parallel constituents are concentrated in a certain part of the sentences, in similar syntactic structures. The following example (Green) obtained very high scores with the edit distances only: Strong relations are also said to exist between words that have synsets connected by a single horizontal link or words that have synsets connected by a single IS-A or INCLUDES relation.</Paragraph>
      <Paragraph position="10"> A regular relation is said to exist between two words when there is at least one allowable path between a synset containing the first word and a synset containing the second word in the Word-Net database.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="286" end_page="287" type="metho">
    <SectionTitle>
5 Related work
</SectionTitle>
    <Paragraph position="0"> Experimental work in psycholinguistics has shown the importance of the parallelism effect in human language processing. Due to some kind of priming (syntactic, phonetic, lexical, etc.), the comprehension and the production of a parallel utterance is made faster (Dubey et al., 2005).</Paragraph>
    <Paragraph position="1"> So far, most of the works were led in order to acquire resources and to build systems to retrieve specific parallelism phenomena. In the field of information structure theories, (Kruijff-Korbayov'a and Kruijff, 1996) implemented an ad-hoc system  to identify thematic continuity (lexical relation between the subject parts of consecutive sentences). (Luc et al., 1999) described and classified markers (lexical clues, layout and typography) occurring in enumeration structures. (Summers, 1998) also described the markers required for retrieving heading structures. (Charolles, 1997) was involved in the description of frame introducers.</Paragraph>
    <Paragraph position="2"> Integration of specialized resources dedicated to parallelism detection could be an improvement to our approach. Let us not forget that our final aim remains the detection of discourse structures. Parallelism should be considered as an additional feature which among other discourse features (e.g. connectors).</Paragraph>
    <Paragraph position="3"> Regarding the use of parallelism, (Hernandez and Grau, 2005) proposed an algorithm to parse the discourse structure and to select pairs of sentences to compare.</Paragraph>
    <Paragraph position="4"> Confronted to the problem of determining textual entailment4 (the fact that the meaning of one expression can be inferred from another) (Kouylekov and Magnini, 2005) applied the (Zhang and Shasha, 1989)'s algorithm on the dependency trees of pairs of sentences (they did not consider syntactic tags as nodes but only words).</Paragraph>
    <Paragraph position="5"> They encountered problems similar to ours due to pre-treatment limits. Indeed, the syntactic parser sometimes represents in a different way occurrences of similar expressions, making it harder to apply edit transformations. A drawback concerning the tree-edit distance approach is that it is not able to observe the whole tree, but only the subtree of the processed node.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML