<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0205">
  <Title>Automatic Knowledge Representation using a Graph-based Algorithm for Language-Independent Lexical Chaining</Title>
  <Section position="4" start_page="36" end_page="38" type="metho">
    <SectionTitle>
2 Building a Similarity Matrix
</SectionTitle>
    <Paragraph position="0"> In order to build the lexico-semantic knowledge base, the Pole-Based Overlapping Clustering Algorithm needs as input a similarity matrix that gathers the similarities between all the words in the corpus.</Paragraph>
    <Paragraph position="1"> For that purpose, we propose a contextual analysis of each nominal unit (nouns and compound nouns) in the corpus. In particular, each nominal unit is associated to a word context vector and the similarity between nominal units is calculated by the informative similarity measure proposed by (Dias and Alves, 2005).</Paragraph>
    <Section position="1" start_page="36" end_page="37" type="sub_section">
      <SectionTitle>
2.1 Data Preparation
</SectionTitle>
      <Paragraph position="0"> The context corpus is first pre-processed in order to extract nominal units from it. The TnT tagger (Brants, 2000) is first applied to our context corpus to morpho-syntactically mark all the words in it. Once all words have been morpho-syntactically tagged, we apply the statistically-based multiword unit extractor SENTA (Dias et al., 1999) that extracts multiword units based on any input text3. For example, multiword units are compound nouns (free kick), compound determinants (an amount of), verbal locutions (to put forward), adjectival locutions (dark blue) or institutionalized phrases (con carne).</Paragraph>
      <Paragraph position="1"> Finally, we use a set of well-known heuristics (Daille, 1995) to retrieve compound nouns using the idea that groups of words that correspond to a priori defined syntactical patterns such as Adj+Noun, Noun+Noun, Noun+Prep+Noun can be identified as compound nouns. Indeed, nouns usually convey most of the information in a written text. They are the main contributors to the &amp;quot;aboutness&amp;quot; of a text. For example, free kick, city hall, operating system are compound nouns which sense is not compositional i.e. the sense of the multiword unit can 2Of course, other similarity measures (Resnik, 1995; Jiang and Conrath, 1997; Leacock and Chodorow, 1998) could be implemented and should be evaluated in further work. However, we used (Lin, 1998) similarity measure as it has shown improved results for Lexical Chains construction.</Paragraph>
      <Paragraph position="2"> 3By choosing both the TnT tagger and the multiword unit extractor SENTA, we guarantee that our architecture remains as language-independent as possible.</Paragraph>
      <Paragraph position="3">  not be expressed by the sum of its constituents senses. So, identifying lexico-semantic connections between nouns is an adequate means of determining cohesive ties between textual units4.</Paragraph>
    </Section>
    <Section position="2" start_page="37" end_page="37" type="sub_section">
      <SectionTitle>
2.2 Word Context Vectors
</SectionTitle>
      <Paragraph position="0">  Thesimilaritymatrixisamatrixwhereeachcellcorresponds to a similarity value between two nominal units5. In this paper, we propose a contextual analysis of nominal units based on similarity between word context vectors.</Paragraph>
      <Paragraph position="1"> Word context vectors are an automated method for representing information based on the local context of words in texts. So, for each nominal unit in the corpus, we associate an N-dimension vector consisting of its N most related words6.</Paragraph>
      <Paragraph position="2"> In order to find the most relevant co-occurrent nominal units, we implement the Symmetric Conditional Probability (Silva et al., 1999) which is defined in Equation 1 where p(w1,w2), p(w1) and p(w2) are respectively the probability of co-occurrence of the nominal units w1 and w2 and the marginal probabilities of w1 and w2.</Paragraph>
      <Paragraph position="4"> In particular, the window context for the calculation of co-occurrence probabilities is settled to F=20 words. In fact, we count, in all the texts of the corpus, the number of occurrences of w1 and w2 appearing together in a window context of F [?] 2 words. So, p(w1,w2) represents the density function computed as follows: the number of times w1 and w2 co-occur divided by the number of words in the corpus7. In the present work, the values of the SCP(.,.) are not used as a factor of importance between words in the word context vector i.e. no differentiation is made in terms of relevance between the words within the word context vector. This issue will be tackled in future work8.</Paragraph>
      <Paragraph position="5"> 4However, we acknowledge that verbs and adjectives should also be tackled in future work.</Paragraph>
      <Paragraph position="6">  as when they are identified (e.g. President of the United States), they are re-written in the corpus by linking all single words with an underscore (e.g. President of the United States)</Paragraph>
    </Section>
    <Section position="3" start_page="37" end_page="38" type="sub_section">
      <SectionTitle>
2.3 Similarity between Context Vectors
</SectionTitle>
      <Paragraph position="0"> The closeness of vectors in the space is equivalent to the closeness of the subject content. Thus, nominal unitsthatareusedinasimilarlocalcontextwillhave vectors that are relatively close to each other. However, in order to define similarities between vectors, we must transform each word context vector into a high dimensional vector consisting of real-valued components. As a consequence, each co-occurring word of the word context vector is associated to a weight which evaluates its importance in the corpus.</Paragraph>
      <Paragraph position="1">  The weighting score of any word in a document can be directly derived from an adaptation of the score proposed in (Dias and Alves, 2005). In particular, we consider the combination of two main heuristics: the well-known tf.idf measure proposed by (Salton et al., 1975) and a new density measure (Dias and Alves, 2005).</Paragraph>
      <Paragraph position="2"> tf.idf: Given a word w and a document d, the tf.idf(w,d) is defined in Equation 2 wheretf(w,d) is the number of occurrences of w in d, |d |corresponds to the number of words in d, N is the number of documents in the corpus and df(w) stands for the number of documents in the corpus in which the word w occurs.</Paragraph>
      <Paragraph position="4"> (2) density: The basic idea of the word density measure is to evaluate the dispersion of a word within a document. So, very disperse words will not be as relevant as dense words. This density measure dens(.,.) is defined in Equation 3.</Paragraph>
      <Paragraph position="6"> For any given word w, its density dens(w,d) is calculated from all the distances between all its occurrences in document d, tf(w,d). So, dist(o(w,k),o(w,k+1)) calculates the distance that separates two consecutive occurrences of w in terms of words within the document. In particular, e is the obtained by the Symmetric Conditional Probability measure compared to the Pointwise Mutual Information for instance (Cleuziou et al., 2003)  base of the natural logarithm so that ln(e) = 1. This argument is included into Equation 3 as it will give a density value of 1 for any word that only occurs once in the document. In fact, we give this word a high density value.</Paragraph>
      <Paragraph position="7"> final weight: The weighting score weight(w) of any word w in the corpus can be directly derived from the previous two heuristics. This score is defined in Equation 4 where tf and dens are respectively the average of tf(.,.) and dens(.,.) over all the documents in which the word w occurs i.e. Nw.</Paragraph>
      <Paragraph position="9"> The next step aims at determining the similarity between all nominal units. Theoretically, a similarity measure can be defined as follows. Suppose that Xi = (Xi1,Xi2,Xi3,,Xip) is a row vector of observations on p variables associated with a label i.</Paragraph>
      <Paragraph position="10"> The similarity between two words i and j is defined as Sij = f(Xi,Xj) where f is some function of the observed values. In the context of our work, Xi and Xj are 10-dimension word context vectors.</Paragraph>
      <Paragraph position="11"> In order to avoid the lexical repetition problem of similarity measures, (Dias and Alves, 2005) have proposed an informative similarity measure called infoSimBA, which basic idea is to integrate into the Cosine measure, the word co-occurrence factor inferred from a collection of documents with the Symmetric Conditional Probability (Silva et al., 1999). See Equation 5.</Paragraph>
      <Paragraph position="13"> and any Xzv corresponds to the word weighting factor weight(wzv), SCP(wik,wjl) is the Symmetric ConditionalProbabilityvaluebetweenwik, theword that indexes the word context vector i at position k and wjl, the word that indexes the word context vector j at position l.</Paragraph>
      <Paragraph position="14"> In particular, this similarity measure has proved to lead to better results compared to the classical similarity measure (Cosine) and shares the same idea as  the Latent Semantic Analysis (LSA) but in a different manner. Let's consider the following two sentences. null (1) Ronaldo defeated the goalkeeper once more.</Paragraph>
      <Paragraph position="15"> (2) Real_Madrid_striker scored again.</Paragraph>
      <Paragraph position="16">  It is clear that both sentences (1) and (2) are similar although they do not share any word in common. Such a situation would result in a null Cosine value so evidencing no relationship between (1) and (2). To solve this problem, the InfoSimBA(.,.) function would calculate for each word in sentence (2), the product of its weight with each weight of all the words in sentence (1), and would then multiply this product by the degree of cohesiveness existing between those two words calculated by the Symmetric Conditional Probability measure. For example, Real Madrid striker would give rise to the sum of 6 products i.e. Real Madrid striker with Ronaldo, Real Madrid striker with defeated and so on and so forth. As a consequence, sentence (1) and (2) wouldshowahighsimilarityasReal Madrid striker is highly related to Ronaldo.</Paragraph>
      <Paragraph position="17"> Once the similarity matrix is built based on the infoSimBA between all word context vectors of all nominal units in the corpus, we give it as input to the Pole-Based Overlapping Clustering Algorithm (Cleuziou et al., 2004) to build a hierarchy of concepts i.e. our lexico-semantic knowledge base.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="38" end_page="41" type="metho">
    <SectionTitle>
3 Hierarchy of Concepts
</SectionTitle>
    <Paragraph position="0"> Clustering is the task that structures units in such a way it reflects the semantic relations existing betweenthem. Inourframeworknominalunitsarefirst grouped into overlapping clusters (or soft-clusters) such that final clusters correspond to conceptual classes (called &amp;quot;concepts&amp;quot; in the following). Then, concepts are hierarchically structured in order to capture semantic links between them.</Paragraph>
    <Paragraph position="1"> Many clustering methods have been proposed in the data analysis research fields. Few of them propose overlapping clusters as output, in spite of the interest it represents for domains of application  such as Natural Language Processing or Bioinformatics. PoBOC (Pole-Based Overlapping Clustering) (Cleuziou et al., 2004) and CBC (Clustering By Committees) (Pantel and Lin, 2002) are two clustering algorithms suitable for the word clustering task. They both proceed by first constructing tight clusters9 and then assigning residual objects to their most similar tight clusters.</Paragraph>
    <Paragraph position="2"> A recent comparative study (Cicurel et al., 2006) shows that CBC and PoBOC both lead to relevant results for the task of word clustering. Nevertheless CBC requires parameters hard to tune whereas PoBOC is free of any parametrization. The last argument encouraged us to use the PoBOC algorithm.</Paragraph>
    <Paragraph position="3"> Unlike most of commonly used clustering algorithms, the Pole-Based Overlapping Clustering Algorithm shows the following advantages among others : (1) it requires no parameters i.e. input is restricted to a single similarity matrix, (2) the number of final clusters is automatically found and (3) it provides overlapping clusters allowing to take into account the different possible meanings of lexical units.</Paragraph>
    <Section position="1" start_page="39" end_page="40" type="sub_section">
      <SectionTitle>
3.1 A Graph-based Approach
The Pole-Based Overlapping Clustering Algorithm
</SectionTitle>
      <Paragraph position="0"> is based on a graph-theoretical framework. Graph formalism is often used in the context of clustering (graph-clustering). It first consists in defining a graph structure which illustrates the data (vertices) with links (edges) between them and then in proposing a graph-partitioning process.</Paragraph>
      <Paragraph position="1"> Numerous graph structures have been proposed (Estivill-Castro et al., 2001). They all consider the data set as set of vertices but differ on the way to decide that two vertices are connected. Some methodologies are listed below where V is the set of vertices, E the set of edges, G(V,E) a graph and d a  distance measure: * Nearest Neighbor Graph (NNG) : each vertex is connected to its nearest neighbor, * Minimum Spanning Tree (MST) : [?](xi,xj) [?] V xV a path exists between xi and xj in G witha80 (xi,xj)[?]E d(xi,xj) minimized, 9The tight clusters are called &amp;quot;committees&amp;quot; in CBC and &amp;quot;poles&amp;quot; in PoBOC.</Paragraph>
      <Paragraph position="2"> * Relative Neighborhood Graph (RNG) : xi and xj are connected iff [?]xk [?] V \ {xi,xj}, d(xi,xj) [?] max{d(xi,xk),d(xj,xk)} * Gabriel Graph (GG) : xi and xj are connected iff the circle with diameter xixj is empty, * Delaunay Triangulation (DT) : xi and xj are connected iff the associated Voronoi cells are adjacent.</Paragraph>
      <Paragraph position="3">  In particular, an inclusion order exists on these graphs. One can show that NNG [?] MST [?] RNG [?] GG [?] DT.</Paragraph>
      <Paragraph position="4"> Thechoiceofthesuitablegraphstructuredepends on the expressiveness we want an edge to capture and the partitioning process we plan to perform. The  at retrieving dense subsets in a graph where two similar data are connected and two dissimilar ones are disconnected. Noticing that previous structures do not match with this definition of a proximitygraph10, a new variant is proposed with the Pole-Based Overlapping Clustering Algorithm in definition 3.1.</Paragraph>
      <Paragraph position="5"> Definition 3.1 Given a similarity measure s on a data set X, the graph (denoted Gs(V,E)) is defined by the set of vertices V = X and the set of edges E such that (xi,xj) [?] E = xi [?] N(xj)[?]xj [?] N(xi).</Paragraph>
      <Paragraph position="7"> where the notation s(xi,I) denotes the average similarity of xi with the set of objects I i.e.</Paragraph>
      <Paragraph position="9"> This definition of neighborhood is a way to avoid requiringtoaparameterthatwouldbetoodependent of the similarity used. Furthermore, the use of local neighborhoods avoids the use of arbitrary thresholds which mask the variations of densities. Indeed, clusters are extracted from a similarity graph which differs from traditional proximity graphs (Jaromczyk and Toussaint, 1992) in the definition of local 10Indeed, for instance, all of these graphs connect an outlier with at least one other vertex. This is not the case with PoBOC.  neighborhoods which condition edges in the graph.</Paragraph>
      <Paragraph position="10"> Neighborhood is different for each object and is computed on the basis of similarities with all other objects. Finally, an edge connects two vertices if they are both contained in the neighborhood of the otherone. Figure1illustratestheneighborhoodconstraint above. In this case, as xi and xj are not both in the intersection, they would not be connected.</Paragraph>
      <Paragraph position="11">  in the intersection.</Paragraph>
    </Section>
    <Section position="2" start_page="40" end_page="40" type="sub_section">
      <SectionTitle>
3.2 Discovery of Poles
</SectionTitle>
      <Paragraph position="0"> The graph representation helps to discover a set of fully-connected subgraphs (cliques) highly separated, denoted as Poles. Because Gs(V,E) is built such that two vertices xi and xj are connected if and only if they are similar11, a clique has the required properties to be a good cluster. Indeed, such a cluster guarantees that all its constituents are similar.</Paragraph>
      <Paragraph position="1"> The search of maximal cliques in a graph is an NP-complete problem. As a consequence, heuristics are used in order to (1) build a great clique around a starting vertex (Bomze et al., 1999) and (2) choose the starting vertices in such a way cliques are as distant as possible.</Paragraph>
      <Paragraph position="2"> Given a starting vertex x, the first heuristic consists in adding iteratively the vertex xi which satisfies the following conditions: * xi is connected to each vertex in P (with P the clique/Pole in construction), * among the connected vertices, xi is the nearest one in average (s(xi,P)).</Paragraph>
      <Paragraph position="3"> 11In the sense that xi (resp. xj) is more similar to xj (resp. xi) than to other data on average.</Paragraph>
      <Paragraph position="4"> As a consequence, initialized with P = {x}, the clique then grows until no vertex can be added.</Paragraph>
      <Paragraph position="5"> The second heuristic guides the selection of the starting vertices in a simple manner. Given a set of Poles P1,...,Pm already extracted, we select the vertex x as in Equation 8.</Paragraph>
      <Paragraph position="7"> A new Pole is then built from x if and only if x satisfies the following conditions:</Paragraph>
      <Paragraph position="9"> Poles are thus extracted while P1 [?] *** [?] Pm negationslash= X and the next starting vertex x is far enough from the previous Poles. In particular, as Poles represent the seeds of the further final clusters, this heuristic gives no restriction on the number of clusters. The first Pole is obtained from the starting point x[?] that checks Equation 9.</Paragraph>
      <Paragraph position="11"/>
    </Section>
    <Section position="3" start_page="40" end_page="41" type="sub_section">
      <SectionTitle>
3.3 Multi-Assignment
</SectionTitle>
      <Paragraph position="0"> Once the Poles are built, the Pole-Based Overlapping Clustering algorithm uses them as clusters representatives. Membership functions m(.,.) are defined in order to assign each object to its nearest Poles as shown in Equation 10.</Paragraph>
      <Paragraph position="1"> [?]xi [?] X, Pj [?] {P1,...,Pm} : m(xi,Pj) = s(xi,Pj) (10) For each object xi to assign, the set of poles is ordered (P1(xi),...,Pm(xi)) such that P1(xi) denotes the nearest pole12 for xi, P2(xi) the second nearest pole forxi and so on. We first assignxi to its closest Pole (P1(xi)). Then, for each pole Pk(xi)(in the order previously defined) we decide to assign xi to Pk(xi) if it satisfies to the following two condi-</Paragraph>
      <Paragraph position="3"> This methodology results into a coverage of the starting data set with overlapping clusters (extended</Paragraph>
      <Paragraph position="5"/>
    </Section>
    <Section position="4" start_page="41" end_page="41" type="sub_section">
      <SectionTitle>
3.4 Hierarchical Organization
</SectionTitle>
      <Paragraph position="0"> A final step consists in organizing the obtained clusters into a hierarchical tree. This structure is useful to catch the topology of a set of a priori disconnected groups. The Pole-Based Overlapping Clustering algorithm integrates this stage and proceeds by successive merging of the two nearest clusters like for usual agglomerative approaches (Sneath and Sokal, 1973). In this process, the similarity between two clusters is obtained by the average-link</Paragraph>
      <Paragraph position="2"> To deal with overlapping clusters we considere in Equation 11 the similarity between an object and itself to be equal to 1 : s(xi,xi) = 1.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="41" end_page="43" type="metho">
    <SectionTitle>
4 Lexical Chaining Algorithm
</SectionTitle>
    <Paragraph position="0"> Once the lexico-semantic knowledge base has been built, it is possible to use it for Lexical Chaining.</Paragraph>
    <Paragraph position="1"> In this section, we propose a new greedy algorithm which can be seen as an extension of (Hirst and St-Onge, 1997) and (Barzilay and Elhadad, 1997) algorithms as it allows polysemous words to belong to different chains thus breaking the &amp;quot;one-word/oneconcept per document&amp;quot; paradigm (Gale et al., 1992). Indeed, multi-topic documents like web news stories may introduce different topics in the same document/url and do not respect the &amp;quot;one sense per discourse&amp;quot; paradigm. As we want to deal with real-world applications, this characteristic may show interesting results for the specific task of Text Summarization for Web documents. Indeed, comparatively to the experiments made by (Gale et al., 1992) that deal with &amp;quot;well written discourse&amp;quot;, web documents show unusual discourse structures. In some way, our algorithm follows the idea of (Krovetz, 1998).</Paragraph>
    <Paragraph position="2"> Finally, it implements (Lin, 1998)'s information-theoretic definition of similarity as the relatedness criterion for the attribution of words to Lexical Chains.</Paragraph>
    <Section position="1" start_page="41" end_page="41" type="sub_section">
      <SectionTitle>
4.1 Algorithm
</SectionTitle>
      <Paragraph position="0"> Our chaining algorithm is based on both approaches of (Barzilay and Elhadad, 1997) and (Hirst and St-Onge, 1997). So, our chaining model is developed according to all possible alternatives of word senses.</Paragraph>
      <Paragraph position="1"> In fact, all senses of a word are defined by the clusters the word appears in13. We present our algorithm below.</Paragraph>
      <Paragraph position="2"> Begin with no chain.</Paragraph>
      <Paragraph position="3"> For all distinct nominal units in text order do For all its senses do a) - among present chains find the sense which satisfies the relatedness criterion and link the new word to this chain.</Paragraph>
      <Paragraph position="4"> - Remove unappropriate senses of the new word and the chain members.</Paragraph>
      <Paragraph position="5"> b)if no sense is close enough, start a new chain.</Paragraph>
    </Section>
    <Section position="2" start_page="41" end_page="43" type="sub_section">
      <SectionTitle>
4.2 Assignment of a word to a Lexical Chain
</SectionTitle>
      <Paragraph position="0"> In order to assign a word to a given Lexical Chain, we need to evaluate the degree of relatedness of the given word to the words in the chain. This is done byevaluatingtherelatednessbetweenalltheclusters present in the Lexical Chain and all the clusters in which the word appears.</Paragraph>
      <Paragraph position="1">  In order to determine if two clusters are semantically related, we use our lexico-semantic knowledge base and apply (Lin, 1998)'s measure of semantic similarity defined in Equation 12.</Paragraph>
      <Paragraph position="3"> The computation of Equation 12 is illustrated below using the fragment of WordNet in Figure 2.</Paragraph>
      <Paragraph position="4">  13From now on, for presentation purposes, we will take as synonymous the words clusters and senses  In this case, it would be easy to compute the similarity between the concepts of hill and coast where the number attached to each node C is P(C). It is shown in Equation 13.</Paragraph>
      <Paragraph position="6"> However, in our taxonomy, as in any knowledge base computed by hierarchical clustering algorithms, onlyleavescontainwords. So, upperclusters (i.e. nodes)inthetaxonomygatheralldistinctwords that appear in the clusters they subsume. We present this situation in Figure 3.</Paragraph>
      <Paragraph position="7">  In particular, clusters C305 and C306 of our hierarchical tree, for the domain of Economy, are represented by the following sets of words C305 ={life, effort, stability, steps, negotiations} and C306 ={steps, restructure, corporations, abuse,</Paragraph>
      <Paragraph position="9"> The relatedness criterion is the threshold that needs to be respected in order to assign a word to a Lexical Chain. In fact, it works like a threshold.</Paragraph>
      <Paragraph position="10"> In this case, it is based on the average semantic similarity between all the clusters present in the taxonomy. So, ifallsemanticsimilaritiesbetweenacandidate word cluster Ck and all the clusters in the chain [?]l,Cl respect the relatedness criterion, the word is 14The value 2843 in Figure 3 is the total number of distinct words in our concept hierarchy.</Paragraph>
      <Paragraph position="11"> assigned to the Lexical Chain. This situation is definedinEquation15wherecisaconstanttobetuned null and n is the number of words in the taxonomy. So, if Equation 15 is satisfied, the word w with cluster Ck is agglomerated to the Lexical Chain.</Paragraph>
      <Paragraph position="12">  In the following section, we present an example of our algorithm.</Paragraph>
      <Paragraph position="13"> 4.2.3 Example of the Lexical Chain algorithm The example below illustrates our Lexical Chain algorithm. Let's consider that a node is created for the first nominal unit encountered in the text i.e. crisis with its sense (C31). The next appearing candidate word is recession which has two senses (C29 and C34). Considering a relatedness criterion equal to 0.81 and the following similarities, simLin(C31,C29) = 0.87, simLin(C31,C34) = 0.82 , the choice of the sense for recession splits the Lexical Chain into two different interpretations as shown in Figure 4, as both similarities overtake the given  The next candidate word trouble has also two senses (C29 and C32). As all the words in a Lexical Chain influence each other in the selection of the respective senses of the new word considered, we have the following situation in Figure 5.</Paragraph>
      <Paragraph position="14"> So, three cases can happen: (1) all similarities overtake the threshold and we must consider both representations, (2) only the similarities related to one representation overtake the threshold and we  only consider this representation or (3) none of the similarities overtake the threshold and we create a new Lexical Chain. So, we proceed with our algorithm for both interpretations.</Paragraph>
      <Paragraph position="15"> Interpretation 1 shows the following similarities simLin(C31,C29) = 0.87, simLin(C31,C32) = 0.75, simLin(C29,C29) = 1.0, simLin(C29,C32) = 0.78 and interpretation 2 the following ones,</Paragraph>
      <Paragraph position="17"> By computing the average similarities for interpretations 1 and 2, we reach the following results: average(Interpretation1) = 0.85 &gt; 0.81 and</Paragraph>
      <Paragraph position="19"> As a consequence, the word trouble is inserted in the Lexical Chain with the appropriate sense (C29) as it maximizes the overall similarity of the chain and the chain members senses are updated. In this example, the interpretation with (C32) is discarded as is the cluster (C34) for recession. This processing is described in Figure 6.</Paragraph>
      <Paragraph position="20">  Once all chains have been computed, only the high-scoring ones must be picked up as representing the important concepts of the original document. Therefore, onemustfirstidentifythestrongest chains. Like in (Barzilay and Elhadad, 1997), we define a chain score which is defined in Equation 16 where |chain |is the number of words in the chain.</Paragraph>
      <Paragraph position="22"> As all chains will be scored, the ones with higher scores will be extracted. Of course, a threshold will have to be defined by the user. In the next section, we will show some qualitative and quantitative results of our architecture.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>