<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1659">
  <Title>Unsupervised Information Extraction Approach Using Graph Mutual Reinforcement</Title>
  <Section position="5" start_page="501" end_page="502" type="metho">
    <SectionTitle>
3 General Notation
</SectionTitle>
    <Paragraph position="0"> In graph theory, a graph is a set of objects called vertices joined by links called edges. A bipartite graph, also called a bigraph, is a special graph where the set of vertices can be divided into two disjoint sets with no two vertices of the same set sharing an edge.</Paragraph>
    <Paragraph position="1"> The Hypertext Induced Topic Selection (HITS) algorithm is an algorithm for rating, and therefore ranking, web pages. The HITS algorithm makes use of the following observation: when a page (hub) links to another page (authority), the former confers authority over the latter. HITS uses two values for each page, the &amp;quot;authority value&amp;quot; and the &amp;quot;hub value&amp;quot;. &amp;quot;Authority value&amp;quot; and &amp;quot;hub value&amp;quot; are defined in terms of one another in a mutual recursion. An authority value is computed as the sum of the scaled hub values that point to that authority. A hub value is the sum of the scaled authority values of the authorities it points to.</Paragraph>
    <Paragraph position="2"> A template, as we define it for this work, is a sequence of generic forms that could generalize over the given instances. An example template is:</Paragraph>
  </Section>
  <Section position="6" start_page="502" end_page="502" type="metho">
    <SectionTitle>
PERSON: PERSON Entity
</SectionTitle>
    <Paragraph position="0"> This template could match the sentence: &amp;quot;France's President Jacque Chirac...&amp;quot;. This template is derived from the representation of the Named Entity tags, Part-of-Speech (POS) tags, and semantic tags. The choice of the template representation here is for illustration purposes only; any combination of tags, representations, and tagging styles might be used.</Paragraph>
    <Paragraph position="1"> A pattern is more specific than a template. A pattern specifies the role played by the tags (first entity, second entity, or relation). An example of a pattern is:</Paragraph>
    <Paragraph position="3"> This pattern indicates that the word(s) tagged GPE in the sentence represent the second entity (Entity 2) in the relation, while the word(s) tagged PERSON represent the first entity (Entity 1) in this relation. The &amp;quot;+&amp;quot; symbol means that the (PERSON) entity is repetitive (i.e.</Paragraph>
    <Paragraph position="4"> may consist of several tokens).</Paragraph>
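    <Paragraph> The application of such a pattern to a tagged token sequence can be sketched as follows. This is a minimal illustration under stated assumptions: pattern elements are modeled as hypothetical (tag, role, repeatable) triples, and the function and data are illustrative, not the paper's implementation.</Paragraph>

```python
# Illustrative sketch (hypothetical representation): apply a role-annotated
# pattern to a tagged token sequence to extract an (Entity 1, relation,
# Entity 2) tuple. Each pattern element is (tag, role, repeatable), where
# repeatable models the "+" marker by greedily absorbing repeated tags.

def apply_pattern(pattern, tagged_tokens):
    """Return a role-to-text dict if the pattern matches, else None."""
    roles = {"E1": [], "E2": [], "REL": []}
    i = 0
    for tag, role, repeatable in pattern:
        if i == len(tagged_tokens) or tagged_tokens[i][1] != tag:
            return None
        matched = [tagged_tokens[i][0]]
        i += 1
        # greedily absorb repeated tokens with the same tag ("+" elements)
        while repeatable and i != len(tagged_tokens) and tagged_tokens[i][1] == tag:
            matched.append(tagged_tokens[i][0])
            i += 1
        if role:
            roles[role].extend(matched)
    if i != len(tagged_tokens):
        return None
    return {r: " ".join(words) for r, words in roles.items() if words}

# "France 's President Jacque Chirac" with mixed mention/POS tags
tokens = [("France", "GPE"), ("'s", "POS"),
          ("President", "NN"), ("Jacque", "PERSON"), ("Chirac", "PERSON")]
pattern = [("GPE", "E2", False), ("POS", None, False),
           ("NN", "REL", False), ("PERSON", "E1", True)]
print(apply_pattern(pattern, tokens))
```

    <Paragraph> Applied to the tagged example sentence, the sketch collects the repeated PERSON tokens into Entity 1 and the GPE token into Entity 2, yielding a tuple-like dictionary.</Paragraph>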
    <Paragraph position="5"> A tuple, in our notation throughout this paper, is the result of applying a pattern to unstructured text. In the above example, one result of applying the pattern to raw text is the following tuple:</Paragraph>
  </Section>
  <Section position="7" start_page="502" end_page="504" type="metho">
    <SectionTitle>
Relation: EMP-Executive
4 The Approach
</SectionTitle>
    <Paragraph position="0"> The unsupervised graph-based mutual reinforcement approach we propose depends on the construction of generalized &amp;quot;extraction patterns&amp;quot; that could match many instances. The patterns are then weighted according to their importance by deploying graph-based mutual reinforcement techniques. The duality between patterns and extracted information (tuples) can be stated as follows: patterns could match different tuples, and tuples in turn could be matched by different patterns. The proposed approach is composed of two main steps, namely initial pattern construction and pattern weighting, or induction. Both steps are detailed in the next subsections.</Paragraph>
    <Section position="1" start_page="502" end_page="503" type="sub_section">
      <SectionTitle>
4.1 Initial Patterns Construction
</SectionTitle>
      <Paragraph position="0"> As shown in Figure 1, several syntactic, lexical, and semantic analyzers could be applied to the unstructured text. The resulting analyses could be employed in the construction of extraction patterns. It is worth mentioning that the proposed approach is general enough to accommodate any pattern design; the introduced pattern design is for illustration purposes only.</Paragraph>
      <Paragraph position="1"> Initially, we need to start with some templates and patterns to proceed with the induction process. A relatively large amount of text data is tagged with different taggers to produce the previously mentioned pattern styles. An n-gram language model is built on this data and used to construct weighted finite state machines.</Paragraph>
      <Paragraph position="2"> Paths with low cost (high language model probabilities) are chosen to construct the initial set of templates; the intuition is that paths with low cost (high probability) are frequent and could represent potential candidate patterns.</Paragraph>
      <Paragraph position="3"> The resulting initial set of templates is applied to a very large amount of text data to produce all possible patterns. The number of candidate initial patterns could be reduced significantly by specifying the candidate types of entities; for example, we might specify that the first entity could be PERSON or PEOPLE while the second entity could be ORGANIZATION, LOCATION, COUNTRY, etc.</Paragraph>
      <Paragraph position="4"> The candidate patterns are then applied to the tagged stream and the unstructured text to collect a set of patterns and matched tuples pairs.</Paragraph>
      <Paragraph position="5"> The following procedure summarizes the initial pattern construction step: * Apply various taggers on the text data and construct template-style data.</Paragraph>
      <Paragraph position="6"> * Build an n-gram language model on the template-style data.</Paragraph>
      <Paragraph position="7"> * Construct weighted finite state machines from the n-gram language model.</Paragraph>
      <Paragraph position="8"> * Choose the n-best paths in the finite state machines. * Use the best paths as initial templates.</Paragraph>
      <Paragraph position="9"> * Apply initial templates on large text data.</Paragraph>
      <Paragraph position="10"> * Construct initial patterns and associated tuple sets.</Paragraph>
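      <Paragraph> The template-selection idea in the procedure above can be illustrated with a toy sketch: a bigram model (standing in for the paper's n-gram language model and weighted finite state machines) scores candidate tag sequences, and the lowest-cost (highest-probability) sequences are kept as initial templates. All data, names, and the smoothing choice here are hypothetical.</Paragraph>

```python
# Toy sketch of template selection: train a bigram model over tag sequences,
# score candidates by negative log probability (cost), keep the n-best.
import math
from collections import Counter

def train_bigram(tag_sequences):
    unigrams, bigrams = Counter(), Counter()
    for seq in tag_sequences:
        padded = ["BOS"] + seq
        unigrams.update(padded)
        bigrams.update(zip(padded, seq))
    return unigrams, bigrams

def cost(seq, unigrams, bigrams, vocab_size):
    # negative log probability with add-one smoothing
    total, padded = 0.0, ["BOS"] + seq
    for prev, cur in zip(padded, seq):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        total -= math.log(p)
    return total

# hypothetical tagged data: the frequent sequence should become a template
tagged_data = [["GPE", "POS", "NN", "PERSON"],
               ["GPE", "POS", "NN", "PERSON"],
               ["PERSON", "VBD", "GPE"]]
uni, bi = train_bigram(tagged_data)
candidates = [["GPE", "POS", "NN", "PERSON"], ["PERSON", "VBD", "GPE"],
              ["NN", "NN", "GPE"]]
vocab = len(uni)
n_best = sorted(candidates, key=lambda s: cost(s, uni, bi, vocab))[:2]
print(n_best)
```

      <Paragraph> Real templates would be chosen from weighted finite state machines built over much larger tagged corpora; the bigram model only illustrates the cost-based n-best selection.</Paragraph>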
    </Section>
    <Section position="2" start_page="503" end_page="503" type="sub_section">
      <SectionTitle>
4.2 Pattern Induction
</SectionTitle>
      <Paragraph position="0"> The inherent duality in the pattern-tuple relation suggests that the problem could be interpreted as a hub-authority problem. This problem could be solved by applying the HITS algorithm to iteratively assign authority and hub scores to patterns and tuples, respectively.</Paragraph>
      <Paragraph position="1"> Patterns and tuples are represented by a bipartite graph as illustrated in Figure 2. Each pattern or tuple is represented by a node in the graph.</Paragraph>
      <Paragraph position="2"> Edges represent matching between patterns and tuples. The pattern induction problem can be formulated as follows: given a very large set of data D containing a large set of patterns P which match a large set of tuples T, the problem is to identify P~, the set of patterns that match the set of the most correct tuples T~. The intuition is that tuples matched by many different patterns tend to be correct, and patterns matching many different tuples tend to be good patterns. In other words, we want to choose, among the large space of patterns in the data, the most informative, highest-confidence patterns that could identify correct tuples; i.e., to choose the most &amp;quot;authoritative&amp;quot; patterns in analogy with the hub-authority problem. However, both P~ and T~ are unknown. The induction process proceeds as follows: each pattern p in P is associated with a numerical authority weight a_p which expresses how many tuples match that pattern. Similarly, each tuple t in T has a numerical hub weight h_t which expresses how many patterns were matched by this tuple. The weights are calculated iteratively as follows:</Paragraph>
      <Paragraph position="4"> where T(p) is the set of tuples matched by p, P(t) is the set of patterns matching t, a^(i+1)(p) is the authority weight of pattern p at iteration (i+1), and h^(i+1)(t) is the hub weight of tuple t at iteration (i+1). H^(i) and A^(i) are normalization factors defined as:</Paragraph>
      <Paragraph position="6"> Highly weighted patterns are identified and used for extracting relations.</Paragraph>
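      <Paragraph> A minimal sketch of these HITS-style update rules, assuming the normalization factors are the sums of the unnormalized weights at each iteration; the toy data and all names are illustrative, not the paper's implementation.</Paragraph>

```python
# Sketch of the induction step: iteratively assign authority weights to
# patterns and hub weights to tuples over the bipartite matching graph,
# normalizing by the total weight at each step (HITS-style updates).

def induce_weights(matches, iterations=10):
    """matches: dict mapping each pattern to the set of tuples it matched."""
    patterns = list(matches)
    all_tuples = sorted({t for ts in matches.values() for t in ts})
    patterns_of = {t: [p for p in patterns if t in matches[p]]
                   for t in all_tuples}
    a = {p: 1.0 for p in patterns}    # authority weight per pattern
    h = {t: 1.0 for t in all_tuples}  # hub weight per tuple
    for _ in range(iterations):
        # authority update: sum the hub weights of matched tuples, normalize
        new_a = {p: sum(h[t] for t in matches[p]) for p in patterns}
        norm_a = sum(new_a.values())
        a = {p: w / norm_a for p, w in new_a.items()}
        # hub update: sum the authority weights of matching patterns, normalize
        new_h = {t: sum(a[p] for p in patterns_of[t]) for t in all_tuples}
        norm_h = sum(new_h.values())
        h = {t: w / norm_h for t, w in new_h.items()}
    return a, h

# toy data: p1 matches many tuples and should end up weighted highest
matches = {"p1": {"t1", "t2", "t3"}, "p2": {"t1"}, "p3": {"t4"}}
a, h = induce_weights(matches)
print(max(a, key=a.get))  # highest-weighted (most authoritative) pattern
```

      <Paragraph> On this toy data, the pattern matching the most tuples receives the highest authority weight, mirroring the intuition stated above.</Paragraph>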
    </Section>
    <Section position="3" start_page="503" end_page="504" type="sub_section">
      <SectionTitle>
4.3 Tuple Clustering
</SectionTitle>
      <Paragraph position="0"> The tuple space should be reduced to allow more matching between pattern-tuple pairs. This space reduction could be accomplished by seeking a tuple similarity measure and constructing a weighted undirected graph of tuples. Two tuples are linked with an edge if their similarity measure exceeds a certain threshold. Graph clustering algorithms could be deployed to partition the graph into a set of homogeneous communities or clusters. To reduce the space of tuples, we seek a matching criterion that groups similar tuples together. Using WordNet, we can measure the semantic similarity or relatedness between a pair of concepts (or word senses), and by extension, between a pair of sentences. We use the similarity</Paragraph>
      <Paragraph position="2"> measure described in (Wu and Palmer, 1994), which finds the path length to the root node from the least common subsumer (LCS) of the two word senses, i.e., the most specific word sense they share as an ancestor. The similarity score of two tuples, ST, is calculated as follows:</Paragraph>
      <Paragraph position="4"> where SE1 and SE2 are the similarity scores of the first entities in the two tuples and of their second entities, respectively.</Paragraph>
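      <Paragraph> A toy sketch of this computation: Wu-Palmer similarity over a small hypothetical taxonomy (standing in for WordNet), with the tuple score ST assumed here to be the average of the two entity similarities. The taxonomy, names, and the averaging combination are illustrative assumptions, not the paper's exact setup.</Paragraph>

```python
# Illustrative Wu-Palmer similarity on a toy taxonomy (a parent map standing
# in for WordNet): sim = 2 * depth(LCS) / (depth(s1) + depth(s2)), where the
# LCS is the deepest ancestor the two senses share. The tuple score ST is
# assumed here to average the two entity similarities.

parent = {"entity": None, "person": "entity", "leader": "person",
          "president": "leader", "minister": "leader",
          "location": "entity", "country": "location"}

def path_to_root(w):
    path = []
    while w is not None:
        path.append(w)
        w = parent[w]
    return path

def depth(w):
    return len(path_to_root(w))

def wu_palmer(w1, w2):
    ancestors1 = set(path_to_root(w1))
    common = [w for w in path_to_root(w2) if w in ancestors1]
    lcs = max(common, key=depth)  # least common subsumer: deepest shared ancestor
    return 2.0 * depth(lcs) / (depth(w1) + depth(w2))

def tuple_similarity(t1, t2):
    se1 = wu_palmer(t1[0], t2[0])  # SE1: first entities
    se2 = wu_palmer(t1[1], t2[1])  # SE2: second entities
    return (se1 + se2) / 2.0       # assumed combination of SE1 and SE2

print(wu_palmer("president", "minister"))
print(tuple_similarity(("president", "country"), ("minister", "country")))
```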
      <Paragraph position="5"> The tuple matching procedure assigns a similarity measure to each pair of tuples in the dataset. Using this measure we can construct an undirected graph G. The vertices of G are the tuples. Two vertices are connected with an edge if the similarity measure between their underlying tuples exceeds a certain threshold. It was noticed that the constructed graph consists of a set of semi-isolated groups as shown in Figure 3. Those groups have a very large number of intra-group edges and a rather small number of inter-group edges. This implies that using a graph clustering algorithm would eliminate those weak inter-group edges and produce separate groups or clusters representing similar tuples. We used the Markov Cluster Algorithm (MCL) for graph clustering (Dongen, 2000). MCL is a fast and scalable unsupervised clustering algorithm based on simulating stochastic flow in graphs. An example of a couple of tuples that could be matched by this technique is:</Paragraph>
      <Paragraph position="7"> A bipartite graph of patterns and tuple clusters is constructed. Weights are assigned to patterns and tuple clusters by iteratively applying the HITS algorithm and the highly ranked patterns are then used for relation extraction.</Paragraph>
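      <Paragraph> A rough sketch of MCL-style clustering on a small tuple graph: alternate expansion (matrix squaring) and inflation (elementwise power followed by column renormalization) until the flow separates into clusters. The adjacency matrix, inflation parameter, and iteration count below are illustrative, not the paper's settings.</Paragraph>

```python
# MCL-style clustering sketch (after Dongen, 2000): expansion spreads flow
# along paths, inflation strengthens strong currents and starves weak ones,
# so weak inter-group edges die out and clusters emerge.
import numpy as np

def mcl(adjacency, inflation=2.0, iterations=20):
    m = adjacency.astype(float) + np.eye(len(adjacency))  # add self-loops
    m = m / m.sum(axis=0)                                 # column-stochastic
    for _ in range(iterations):
        m = np.linalg.matrix_power(m, 2)  # expansion
        m = np.power(m, inflation)        # inflation
        m = m / m.sum(axis=0)             # renormalize columns
    # rows with surviving mass are cluster attractors; their nonzero
    # columns give the cluster members
    clusters = []
    for row in np.round(m, 6):
        members = frozenset(np.flatnonzero(row))
        if members:
            clusters.append(members)
    return sorted(set(clusters), key=min)

# two semi-isolated groups: nodes 0-1-2 densely linked, nodes 3-4 densely
# linked, and one weak bridge edge (1-3) between the groups
adj = np.array([[0, 1, 1, 0, 0],
                [1, 0, 1, 1, 0],
                [1, 1, 0, 0, 0],
                [0, 1, 0, 0, 1],
                [0, 0, 0, 1, 0]])
print(mcl(adj))
```

      <Paragraph> The bridge edge is eliminated by the inflation step, leaving two separate clusters, as the intuition above suggests.</Paragraph>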
    </Section>
  </Section>
  <Section position="8" start_page="504" end_page="505" type="metho">
    <SectionTitle>
5 Experimental Setup
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="504" end_page="504" type="sub_section">
      <SectionTitle>
5.1 ACE Relation Detection and Characterization
</SectionTitle>
      <Paragraph position="0"> In this section, we describe the Automatic Content Extraction (ACE) evaluation. ACE is an evaluation conducted by NIST to measure Entity Detection and Tracking (EDT) and Relation Detection and Characterization (RDC). The EDT task is concerned with detecting mentions of entities and grouping them together by identifying their coreference. The RDC task detects relations between entities identified by the EDT task. We choose the RDC task to show the performance of the graph-based unsupervised approach we propose. To this end we need to introduce the notions of mentions and entities. Mentions are any instances of textual references to objects like people, organizations, geopolitical entities (countries, cities, etc.), locations, or facilities. On the other hand, entities are objects containing all mentions of the same object. Here, we present some examples of ACE entities and relations:</Paragraph>
    </Section>
    <Section position="2" start_page="504" end_page="504" type="sub_section">
      <SectionTitle>
Spain's Interior Minister
</SectionTitle>
      <Paragraph position="0"> announced this evening the arrest of separatist organization Eta's presumed leader Ignacio Garcia Arregui. Arregui, who is considered to be the Eta organization's top man, was arrested at 17h45 Greenwich. The Spanish judiciary suspects Arregui of ordering a failed attack on King Juan Carlos in 1995.</Paragraph>
      <Paragraph position="1"> In this fragment, all the underlined phrases are mentions of the &amp;quot;Eta&amp;quot; organization or of &amp;quot;Garcia Arregui&amp;quot;. There is a management relation between &amp;quot;leader&amp;quot;, which refers to &amp;quot;Garcia Arregui&amp;quot;, and &amp;quot;Eta&amp;quot;.</Paragraph>
    </Section>
    <Section position="3" start_page="504" end_page="505" type="sub_section">
      <SectionTitle>
5.2 Patterns Construction and Induction
</SectionTitle>
      <Paragraph position="0"> We used the LDC English Gigaword Corpus, AFE source, from January to August 1996 as a source of unstructured text. This provides a total of 99,475 documents containing 36M words. In the performed experiments, we focus on two types of relations, EMP-ORG and GPE-AFF, which together represent almost 50% of all relations in the ACE RDC task.</Paragraph>
      <Paragraph position="2"> A part-of-speech (POS) tagger and a mention tagger were applied to the data. The pattern design used consists of a mix of the POS tags and the mention tags for the words in the unsupervised data: we use the mention tag if it exists; otherwise we use the POS tag.</Paragraph>
      <Paragraph position="3"> An example of the analyzed text and the presumed associated pattern is shown: Text: Eta's presumed leader. An n-gram language model (a 5-gram model with back-off to lower-order n-grams) was built on the data tagged with the described pattern style. Weighted finite state machines were constructed with the language model probabilities. The n-best paths (20k paths) were identified and deployed as the initial template set. Sequences that do not contain the entities of interest, and hence cannot represent relations, were automatically filtered out. This resulted in an initial template set of around 3000 elements. This initial template set was applied to the text data to establish initial pattern and tuple pairs. The graph-based mutual reinforcement technique was deployed for 10 iterations on the pattern and tuple pairs to weight the patterns.</Paragraph>
      <Paragraph position="4"> We conducted two groups of experiments, the first with simple syntactic tuple matching and the second with semantic tuple clustering, as described in Section 4.3.</Paragraph>
    </Section>
  </Section>
</Paper>