<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2011">
  <Title>Knowledge Extraction Using Dynamical Updating of Representation</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Implementation of the Kintsch-Ericsson model
</SectionTitle>
    <Paragraph position="0"> The network-of-propositions approach raised two design problems: the creation of the LTWM and the activation of the LTM nodes, i.e. the creation of the retrieval cues.</Paragraph>
    <Paragraph position="1"> Kintsch has developed two methods for the definition of the LTWM.</Paragraph>
    <Paragraph position="2"> The first, defined with Van Dijk (T.A. van Dijk, W. Kintsch, 1983), is a manual technique that starts from the propositions present in the text (micropropositions) and, using a set of organizing rules, arrives at the definition of macropropositions and macrostructures, and ultimately at the definition of the LTWM.</Paragraph>
    <Paragraph position="3"> The second is based on latent semantic analysis (LSA) (T.K. Landauer, P.W. Foltz, D. Laham, 1998). This technique infers, from the matrix of co-occurrence rates of the words, a semantic space that reflects the semantic relations between words and phrases. This space typically has 300-400 dimensions and makes it possible to represent words, phrases and entire texts in vector form. The semantic relation between two vectors can then be estimated by their cosine (a measure that, according to Kintsch, can be interpreted as a correlation coefficient).</Paragraph>
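    <Paragraph> The cosine comparison just described can be sketched in a few lines of Python. The words and vector values below are invented for illustration; real LSA vectors would have 300-400 dimensions.

```python
# Cosine between two vectors of a semantic space, used to estimate
# the semantic relation between the words they represent.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical low-dimensional vectors for two related words.
doctor = [0.8, 0.1, 0.3, 0.0]
nurse = [0.7, 0.2, 0.4, 0.1]
similarity = cosine(doctor, nurse)
```

A cosine close to 1 indicates a strong semantic relation between the two words.</Paragraph>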
    <Paragraph position="4"> This latter solution to the problem of the definition of the LTWM raises a significant and unavoidable technical problem: how many objects must be retrieved from the semantic space for every word present in the text? In some cases, when the textbase, i.e. the representation obtained directly from the text, is sufficiently expressive, the retrieval of knowledge from the LTM is not necessary. In other cases a correct comprehension of the text (or of the relative situation model) requires the retrieval of knowledge from the LTM.</Paragraph>
    <Paragraph position="5"> After the creation of the LTWM the integration process begins, i.e. the activation of the nodes corresponding to the meaning of the phrase.</Paragraph>
    <Paragraph position="6"> Kintsch uses a diffusion-of-activation procedure that is a simplified version of the one developed by McClelland and Rumelhart (J.L. McClelland, D.E. Rumelhart, 1986).</Paragraph>
    <Paragraph position="7"> First, an activation vector is defined whose elements are indexed over the nodes of the LTWM. Each element's value is "1" or "0" depending on the presence or absence of the corresponding node in the analyzed phrase (i.e. in the STWM). This vector is multiplied by the matrix of correlation rates (the weights of the links of the LTWM) and the resulting vector is normalized. This becomes the new activation vector, which is multiplied again by the matrix of correlation rates. The procedure continues until the activation vector becomes stable. After the integration process, the irrelevant nodes are deactivated and only those that represent the situation model remain activated.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 An alternative representation of the
Kintsch-Ericsson model
</SectionTitle>
      <Paragraph position="0"> The adoption of a network of propositions for knowledge representation certainly presents great advantages in comparison with the classic formalisms. While semantic networks, frames and scripts organize knowledge in a more ordered and logical way, networks of propositions are definitely more disorganized and chaotic, but they offer the considerable advantage of being able to vary dynamically, not only over time on the basis of past experience, but also on the basis of the perceived context.</Paragraph>
      <Paragraph position="1"> The technique worked out by Kintsch and Ericsson for the definition of the LTWM, however, has some limitations. Retrieving knowledge from the semantic space is only the first. Another problem is the evolution of the LTWM. The position occupied by a word in the LTWM is determined by experience, i.e. by its past use, and this should be a lifetime experience. Since this kind of knowledge cannot practically be collected, Kintsch resorts to a dictionary for the definition of the semantic space that represents the LTWM.</Paragraph>
      <Paragraph position="2"> Furthermore the construction-integration process does not always assure the semantic disambiguation of the analysed phrase (W.Kintsch, 1998).</Paragraph>
      <Paragraph position="3"> The use of an external dictionary such as WordNet (G. A. Miller, 1993) and of suitable disambiguation procedures can overcome the last two limits.</Paragraph>
      <Paragraph position="4"> The first problem, instead, can be fully solved only by dropping the intermediate representation of the semantic space and by developing new methods for the direct formation of networks of concepts and propositions.</Paragraph>
      <Paragraph position="5"> Let us now describe the system for the automatic acquisition of knowledge that we developed on the basis of the LTWM model of Kintsch and Ericsson. The lack of adequate textual parsers able to convert the paragraphs of a text into the corresponding atomic propositions has driven us to develop, at least in this initial phase of our project, simple dynamic models of associative networks.</Paragraph>
      <Paragraph position="6"> [Figure: the system for the dynamical acquisition of knowledge from a repository of documents.]</Paragraph>
      <Paragraph position="7"> The part of the document being analysed (the content of the buffer) must be codified on the basis of the context before being elaborated by the working memory block. The context represents the theme, the subject of the processed text; for its correct characterization, not only the information present in the document must be considered, but also the information that can be retrieved from the structure representing the knowledge accumulated during the analysis of the previous documents presented to the system (the Long Term Memory).</Paragraph>
      <Paragraph position="8"> For the implementation of the working memory block, self-organizing networks with suitable procedures for the labeling of their nodes could be used, but this solution requires a lot of computational time, especially for the analysis of entire repositories of documents.</Paragraph>
      <Paragraph position="9"> So we considered alternative models, based on the theory of scale-free graphs (R. Albert, A.L. Barabasi, 2001), for the implementation of an associative network.</Paragraph>
      <Paragraph position="10"> Graph theory dealt with regular graphs until the 1950s. Subsequently random graphs were introduced (P. Erdos, A. Renyi, 1959). They were the first simple forms of complex graphs ever studied.</Paragraph>
      <Paragraph position="11"> Their model starts with a network made of N isolated nodes. Each pair of nodes is then connected with probability p, leading to a graph having approximately pN(N-1)/2 links.</Paragraph>
      <Paragraph position="12"> But this model was still far from the real networks present in nature and in artificial systems, so scientists defined other models characterized by a higher level of complexity.</Paragraph>
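      <Paragraph> A minimal sketch of the Erdos-Renyi model described above, with invented parameters N = 200 and p = 0.1:

```python
# Erdos-Renyi random graph: N isolated nodes, then each pair is
# connected with probability p, giving about p*N*(N-1)/2 links.
import random

def random_graph(n, p, seed=0):
    rng = random.Random(seed)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if p > rng.random():
                edges.add((i, j))
    return edges

edges = random_graph(200, 0.1)   # about 0.1 * 200 * 199 / 2 = 1990 links
```
</Paragraph>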
      <Paragraph position="13"> The actual models have three main features.</Paragraph>
      <Paragraph position="14"> First, their "small world" structure: there is a relatively short path between any two nodes (D.J. Watts, S.H. Strogatz, 1998).</Paragraph>
      <Paragraph position="15"> Second, their inherent tendency to cluster, which is quantified by a coefficient introduced by Watts and Strogatz. Given a node i of degree ki, i.e. having ki edges which connect it to ki other nodes, those neighbours can establish at most ki(ki-1)/2 edges among themselves. The ratio between the actual number of edges and this maximum number gives the clustering coefficient of node i. The clustering coefficient of the whole network is the average of all the individual clustering coefficients. In a random graph the clustering coefficient is C = p; in real networks it is much larger than p.</Paragraph>
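      <Paragraph> The clustering coefficient just defined can be computed as follows on a small invented graph:

```python
# Watts-Strogatz clustering coefficient: the ratio between the edges
# actually present among the neighbours of a node and the maximum
# possible number ki*(ki-1)/2.

def clustering(adj, i):
    """adj maps each node to the set of its neighbours."""
    neigh = list(adj[i])
    k = len(neigh)
    if 2 > k:
        return 0.0
    links = sum(1 for a in range(k) for b in range(a + 1, k)
                if neigh[b] in adj[neigh[a]])
    return 2.0 * links / (k * (k - 1))

# Toy graph: a triangle 0-1-2 plus a pendant node 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
c0 = clustering(adj, 0)    # neighbours 1, 2, 3; only edge 1-2 exists
avg = sum(clustering(adj, n) for n in adj) / len(adj)
```

Here node 0 has coefficient 1/3, and the network coefficient is the average over all four nodes.</Paragraph>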
      <Paragraph position="16"> Actual graph models are also characterized by a particular degree distribution. While in a random graph the majority of the nodes have approximately the same degree, close to the average degree, the degree distribution P(k) of a real network has a power-law tail P(k) ~ k^-γ. For this reason these networks are called "scale free" (R. Albert, A.L. Barabasi, 2000).</Paragraph>
      <Paragraph position="17"> Recently it has been found that human knowledge seems to be structured as a scale free graph (M.Steyvers, J.Tenenbaum, 2001).</Paragraph>
      <Paragraph position="18"> Representing words and concepts as nodes, some of them (the hubs) establish many more links than the others.</Paragraph>
      <Paragraph position="19"> Table 2 reports the average shortest path length, the clustering coefficient and the power-law exponent of two different types of semantic networks.</Paragraph>
      <Paragraph position="20"> [Table 2: properties of two types of semantic networks.]</Paragraph>
      <Paragraph position="21"> This particular conformation seems to optimize the communication between nodes. Thanks to the presence of the hubs, every pair of nodes can be connected through a low number of links in comparison with a random network of the same dimensions. The definition and the possible updating of a scale-free network do not require much time, and the execution of particular processes, such as the diffusion of the activation signal, is very fast.</Paragraph>
      <Paragraph position="22"> The textual analysis is performed through the following steps.</Paragraph>
      <Paragraph position="23"> The new text is analysed paragraph by paragraph. The buffer contains not only the words of the analysed paragraph, but also words retrieved from the long term memory using the diffusion-of-activation procedure (the activation signal starts from the nodes of the LTM that represent the words in the paragraph). Theoretically, the buffer should also contain the words activated during the analysis of the previous paragraph, but this aspect has not been considered because of its computational complexity. The buffer, the working memory and the activated part of the LTM block can be compared (but they are not the same structure) to the LTWM defined by Kintsch and Ericsson.</Paragraph>
      <Paragraph position="24"> During the acquisition of the content of the paragraph, a stoplist of words that must not be considered (such as articles, pronouns, etc.) is used. For any word in the text, the paragraphs where it has appeared (or where it has been inserted by the retrieval procedure) are stored. When the entire text has been parsed and the data of all the N unfiltered words have been memorized, the formation of the network of concepts in the working memory begins. The model adopted is similar to the one defined by Bianconi and Barabasi (G. Bianconi, A. Barabasi, 2001). The process starts with a net consisting of N disconnected nodes.</Paragraph>
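      <Paragraph> The acquisition step can be sketched as follows; the stoplist and the sample text are invented for illustration.

```python
# Parse a text paragraph by paragraph, filter words through a
# stoplist, and record for each surviving word the set of the
# paragraphs in which it appears.

STOPLIST = {"the", "a", "of", "and", "is", "in", "it"}

def index_paragraphs(paragraphs):
    occurrences = {}
    for p, text in enumerate(paragraphs):
        for word in text.lower().split():
            word = word.strip(".,;:!?")
            if word and word not in STOPLIST:
                occurrences.setdefault(word, set()).add(p)
    return occurrences

doc = ["The writer published a story.",
       "The story of the writer is famous."]
occ = index_paragraphs(doc)
```

The resulting occurrence sets are what the fitness computation below operates on.</Paragraph>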
      <Paragraph position="25"> At every step t = 1..N each node (associated with one of the N words) establishes links with M other units (M = 5). If j is the selected unit, the probability that this node establishes a link with the unit i is:</Paragraph>
      <Paragraph position="27"> P(i) = Ui ki / Σl Ul kl, where ki is the degree of the unit i 1, i.e. the number of links established by it, while Ui is the fitness value associated with the node, computed as the ratio between the number of paragraphs that contain both i and j and the number of paragraphs that contain either i or j.</Paragraph>
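      <Paragraph> A sketch of this link-probability computation, under the assumption, consistent with the text, that the probability is proportional to Ui ki and normalized over the candidate units. The occurrence sets and degrees below are invented.

```python
# Fitness of a pair (i, j): paragraphs containing both words divided
# by paragraphs containing either (a Jaccard-style ratio over the
# occurrence sets stored during parsing).

def fitness(occ, i, j):
    both = len(occ[i].intersection(occ[j]))
    either = len(occ[i].union(occ[j]))
    return both / either if either else 0.0

def link_probability(occ, degree, j, candidates):
    """Probability that unit j links to each candidate i,
    proportional to fitness(i, j) times the degree of i."""
    scores = {i: fitness(occ, i, j) * degree[i] for i in candidates}
    total = sum(scores.values()) or 1.0
    return {i: s / total for i, s in scores.items()}

occ = {"writer": {0, 1}, "story": {0, 1, 2}, "industry": {3}}
degree = {"writer": 4, "story": 2, "industry": 1}
probs = link_probability(occ, degree, "writer", ["story", "industry"])
```

Since "industry" never co-occurs with "writer", its fitness and therefore its link probability are zero.</Paragraph>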
      <Paragraph position="28"> LTM is an associative network that is updated with the content of the WM. Whenever a link of the WM corresponds to a link present in the LTM, the weight of the latter is increased by 1.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Example :
</SectionTitle>
      <Paragraph position="0"> The WM links "Hemingway" to "writer".</Paragraph>
      <Paragraph position="1"> In the LTM "Hemingway" is linked to "writer" with weight 7 and to "story" with weight 4. In the updated LTM "Hemingway" is linked to "writer" with weight 8 and to "story" with weight 4 (unchanged).</Paragraph>
      <Paragraph position="2"> To perform the diffusion of the activation signal, all the weights must be normalized. In this case "Hemingway" must be linked to "writer" with weight 8/(8+4) and to "story" with weight 4/(8+4).</Paragraph>
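      <Paragraph> The update and normalization illustrated by the example can be reproduced directly:

```python
# Each WM link increments the weight of the corresponding LTM link
# by 1; for the diffusion of the activation signal, the outgoing
# weights of a node are normalized so that they sum to 1.

def update_ltm(ltm, wm_links):
    for a, b in wm_links:
        ltm.setdefault(a, {})
        ltm[a][b] = ltm[a].get(b, 0) + 1

def normalized(ltm, node):
    total = sum(ltm[node].values())
    return {b: w / total for b, w in ltm[node].items()}

ltm = {"Hemingway": {"writer": 7, "story": 4}}
update_ltm(ltm, [("Hemingway", "writer")])
weights = normalized(ltm, "Hemingway")   # writer: 8/12, story: 4/12
```
</Paragraph>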
      <Paragraph position="3"> Since the scale-free network that represents the content of the WM is used to update the content of the LTM, this associative network should take the form of a scale-free graph. Unfortunately, the modalities of evolution of the LTM do not allow the definition of a simple equivalent mathematical model, which would be necessary to make useful predictions about its evolution.</Paragraph>
      <Paragraph position="4"> In the scale-free graph models proposed in the literature, at each temporal step M new nodes are added to the graph, with M defined beforehand.</Paragraph>
      <Paragraph position="5"> These M nodes generally establish M links with M old units of the network. In the system that we have developed, after the analysis of a new document the links related to an unknown number of nodes of the LTM network are updated on the basis of the content of the WM. This number depends on the analysed document, because it is the number of the words that have not been filtered out by the stoplist.</Paragraph>
      <Paragraph position="6"> Another important difference from the other scale-free models presented in the literature (S.N. Dorogovtsev, J.F.F. Mendes, 2001) is the particular fitness function that is used. This function does not depend on a single node but on the considered pair of nodes.</Paragraph>
      <Paragraph position="7"> If this value is chosen proportional to the weights of the LTM associative network, the fitness value of a word is not constant but depends on the other word to which it could be linked. For example, the noun "house" should present a greater fitness value for the link with "door" than for the links with "person" and "industry". (1 Each node is connected to itself by a loop.)</Paragraph>
      <Paragraph position="8"> 3 Evaluation of the WM block
To test the validity of the scale-free graph model adopted for the WM, we gave 100 files of the Reuters Corpus2 as input to the system, disabling the retrieval of information from the LTM.</Paragraph>
      <Paragraph position="9"> Two versions of the model have been tested, one with bidirectional links and the other with directed links (in this case we considered ki = ki(IN) + ki(OUT)). In fig. 2 (http://www.deit.univpm.it/~dragoni/downloads/scale_free.jpg) an example of a network with bidirectional links is represented.</Paragraph>
      <Paragraph position="10"> Please notice that the economic bias of the articles justifies the presence of hubs such as "interest rate", "economy", etc., while other frequent words such as "child", "restaurant", etc. establish fewer links. [Figure: degree distribution of the network, plotted in logarithmic coordinates.]</Paragraph>
      <Paragraph position="11"> The degree distribution decays as P(k) ~ k^-γ with γ = 3.2657.</Paragraph>
      <Paragraph position="12"> The degree distribution of the graph with directed links is reported below.</Paragraph>
      <Paragraph position="13"> In order to evaluate the learning capabilities of the system, we applied it to a medical article. The sections of the paper have been presented separately, as independent texts regarding the same topic. This choice was imposed by the need to also enable the retrieval of information from the LTM.</Paragraph>
      <Paragraph position="14"> As expected, the resulting LTM network was a typical scale-free graph (tab. 2).</Paragraph>
      <Paragraph position="15"> The analysis has been repeated 30 times, examining the coherence rate of each resulting LTM representation.</Paragraph>
      <Paragraph position="16"> The coherence measure is based on a kind of transitivity assumption, i.e. if two concepts have similar relationships with other concepts, then the two concepts should be similar.</Paragraph>
      <Paragraph position="17"> The coherence rate is obtained by correlating the LTM ratings given for each item in a pair with all of the other concepts3. Its value can be correctly computed only by producing symmetric versions of the LTM data.</Paragraph>
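      <Paragraph> The actual coherence computation is performed by external software; the following is only a sketch of our reading of the description above, on an invented symmetric relatedness matrix: the coherence of a pair is taken as the Pearson correlation between the ratings of the two concepts against all the other concepts, averaged over all pairs.

```python
# Coherence rate sketch: correlate, for each pair (i, j), the ratings
# row of i with the ratings row of j over the remaining concepts.
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def coherence(m):
    n = len(m)
    scores = []
    for i in range(n):
        for j in range(i + 1, n):
            others = [k for k in range(n) if k not in (i, j)]
            scores.append(pearson([m[i][k] for k in others],
                                  [m[j][k] for k in others]))
    return sum(scores) / len(scores)

# Invented symmetric relatedness matrix over five concepts.
m = [[1.0, 0.9, 0.8, 0.2, 0.1],
     [0.9, 1.0, 0.7, 0.3, 0.2],
     [0.8, 0.7, 1.0, 0.4, 0.3],
     [0.2, 0.3, 0.4, 1.0, 0.9],
     [0.1, 0.2, 0.3, 0.9, 1.0]]
c = coherence(m)
```
</Paragraph>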
      <Paragraph position="18"> The average coherence rate was 0.45, indicating that the system has conceptualized the terms according to a precise inner schema.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
Footnote
</SectionTitle>
    <Paragraph position="0"> 3 All the operations described in this section are performed by the software PCKNOT 4.3, a product of Interlink Inc.</Paragraph>
    <Paragraph position="1"> To evaluate the correctness of this schema we are going to compare the obtained LTM representations with experimental data collected from a group of human subjects. The subjects will be asked to read the same medical article examined by the system, assigning a similarity rate to each pair of words that has been considered by the system. A Pathfinder analysis (R.W. Schvaneveldt, F.T. Durso, D.W. Dearholt, 1985) will be performed on the relatedness matrices provided by the human subjects and on the LTM matrices in order to extract the so-called "latent semantics", i.e. other implicit relations between words. The obtained matrices will be compared using a similarity rate determined by the correspondence of links in the two types of networks.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Future work
</SectionTitle>
    <Paragraph position="0"> Some important considerations can be made on the overall structure of the system.</Paragraph>
    <Paragraph position="1"> The absence of an external feedback does not guarantee the correspondence between the LTM and the form of representation that must be modelled (the knowledge of an organization, of a working group, or of a single user). A possible external feedback could be based on the evaluation of the performance of the system in the execution of particular tasks, such as the retrieval or the filtering of documents. For example, the acceptance or the rejection of the documents selected by the system could be reflected in the updating modality of the LTM: in the first case the content of the WM could be used to strengthen the links in the LTM or to create new ones (as explained previously); in the second case the content of the WM could be used to weaken or delete the links in the LTM.</Paragraph>
    <Paragraph position="2"> During the formation of the network in the WM, the information about the weights of the links in the LTM is not considered explicitly. Although the weights can condition the retrieval of information from the LTM, they could also modify the value of the fitness function used for the computation of the probability of the creation of new links in the WM.</Paragraph>
    <Paragraph position="3"> Furthermore, associating an age with the links of the LTM could guarantee more plasticity to its structure. The ages could also be used in the computation of the fitness values, for example in accordance with the modalities suggested by Dorogovtsev (S.N. Dorogovtsev, J.F.F. Mendes, 2000).</Paragraph>
    <Paragraph position="4"> We think that our knowledge acquisition system can be effectively used for semantic disambiguation, which is the first phase of the analysis in the most recent systems for the extraction of ontologies from texts (R. Navigli, P. Velardi, A. Gangemi, 2003).</Paragraph>
    <Paragraph position="5"> As a further development, we are thinking of extracting from our representation form a simple taxonomy of concepts using techniques for the extraction of subsumption and equivalence relations. These techniques are based on the elaboration of the correlations between concepts expressed as fuzzy relations. A taxonomical representation can be considered as an important step towards the creation of an ontological representation. In this way our system could be used to model the user knowledge representing it in an ontological form.</Paragraph>
  </Section>
</Paper>