<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1656">
  <Title>Boosting Unsupervised Relation Extraction by Using NER</Title>
  <Section position="5" start_page="474" end_page="475" type="metho">
    <SectionTitle>
3 Description of URES
</SectionTitle>
    <Paragraph position="0"> The goal of URES is extracting instances of relations from the Web without human supervision. Accordingly, the input of the system is limited to (reasonably short) definition of the target relations (composed of the relation's schema and a few keywords that enable gathering relevant sentences). For example, this is the description of the acquisition relation: Acquisition(ProperNP, ProperNP) ordered keywords={&amp;quot;acquired&amp;quot; &amp;quot;acquisition&amp;quot;} The word ordered indicates that Acquisition is not a symmetric relation and the order of its arguments matters. The ProperNP tokens indicate the types of the attributes. In the regular mode, there are only two possible attribute types - ProperNP and CommonNP, meaning proper and common noun phrases, respectively. When using the NER Filter component described in the section 4.1 we allow further subtypes of ProperNP, and the predicate definition becomes: acquisition(Company, Company) ...</Paragraph>
    <Paragraph position="1"> The keywords are used for gathering sentences from the Web and for instantiating the generic patterns for seeds generation.</Paragraph>
    <Paragraph position="2"> Additional keywords (such as &amp;quot;acquire&amp;quot;, &amp;quot;purchased&amp;quot;, &amp;quot;hostile takeover&amp;quot;, etc), which can be used for gathering more sentences, are added automatically by using WordNet [18].</Paragraph>
    <Paragraph position="3"> URES consists of several largely independent components; their layout is shown on the Figure 1. The Sentence Gatherer generates (e.g., downloads from the Web) a large set of sentences that may contain target instances. The Seeds Generator, which is essentially equal to the KnowItAll-baseline system, uses a small set of generic patterns instantiated with the predicate keywords to extract a small set of high-confidence instances of the target relations. The Pattern Learner uses the seeds to learn likely patterns of relation occurrences. Then, the Instance Extractor uses the patterns to extracts the instances from the sentences. Those instances can be filtered by a NER Filter, which is an optional part of the system. Finally, the Classifier assigns the confidence score to each extraction.</Paragraph>
    <Section position="1" start_page="475" end_page="475" type="sub_section">
      <SectionTitle>
3.1 Pattern Learner
</SectionTitle>
      <Paragraph position="0"> The task of the Pattern Learner is to learn the patterns of occurrence of relation instances.</Paragraph>
      <Paragraph position="1"> This is an inherently supervised task, because at least some occurrences must be known in order to be able to find patterns among them.</Paragraph>
      <Paragraph position="2"> Consequently, the input to the Pattern Learner includes a small set (10 instances in our experiments) of known instances for each target relation. Our system assumes that the seeds are a part of the target relation definition. However, the set of seeds need not be created manually. Instead, the seeds can be taken automatically from the top-scoring results of a high-precision low-recall unsupervised extraction system, such as KnowItAll. The seeds for our experiments were produced in exactly this way: we used two generic patterns instantiated with the relation name and keywords. Those patterns have a relatively high precision (although low recall), and the top-confidence results, which are the ones extracted many times from different sentences, have close to 100% probability of being correct.</Paragraph>
      <Paragraph position="3"> The Pattern Learner proceeds as follows: first, the gathered sentences that contain the seed instances are used to generate the positive and negative sets. From those sets the patterns are learned. Finally, the patterns are post-processed and filtered. We shall now describe those steps in detail.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="475" end_page="476" type="metho">
    <SectionTitle>
PREPARING THE POSITIVE AND NEGATIVE SETS
</SectionTitle>
    <Paragraph position="0"> The positive set of a predicate (the terms predicate and relation are interchangeable in our work) consists of sentences that contain a known instance of the predicate, with the instance attributes changed to &amp;quot;&lt;AttrN&gt;&amp;quot;, where N is the attribute index. For example, assuming there is a seed instance Acquisition(Oracle, PeopleSoft), the sentence The Antitrust Division of the U.S. Department of Justice evaluated the likely competitive effects of Oracle's proposed acquisition of PeopleSoft.</Paragraph>
    <Paragraph position="1"> will be changed to The Antitrust Division... ...of &lt;Attr1&gt;'s proposed acquisition of &lt;Attr2&gt;.</Paragraph>
    <Paragraph position="2"> The positive set of a predicate P is generated straightforwardly, using substring search. The negative set of a predicate consists of sentences with known false instances of the predicate similarly marked (with &lt;AttrN&gt; substituted for attributes). The negative set is used by the pattern learner during the scoring and filtering step, to filter out the patterns that are overly general. We generate the negative set from the sentences in the positive set by  changing the assignment of one or both attributes to other suitable entities in the sentence. In the shallow parser based mode of operation, any suitable noun phrase can be assigned to an attribute.</Paragraph>
  </Section>
  <Section position="7" start_page="476" end_page="477" type="metho">
    <SectionTitle>
GENERATING THE PATTERNS
</SectionTitle>
    <Paragraph position="0"> The patterns for the predicate P are generalizations of pairs of sentences from the positive set of P. The function Generalize(s  from the positive set of the predicate. The function generates a pattern that is the best (according to the objective function defined below) generalization of its two arguments. The following pseudocode shows the process of generating the patterns for the  The patterns are sequences of tokens, skips (denoted *), limited skips (denoted *?) and slots. The tokens can match only themselves, the skips match zero or more arbitrary tokens, and slots match instance attributes. The limited skips match zero or more arbitrary tokens, which must not belong to entities of the types equal to the types of the predicate attributes. In the shallow parser based mode, there are only two different entity types ProperNP and CommonNP, standing for proper and common noun phrases.</Paragraph>
    <Paragraph position="1">  ) function takes two sentences and generates the least (most specific) common generalization of both. The function does a dynamical programming search for the best match between the two patterns (Optimal String Alignment algorithm), with the cost of the match defined as the sum of costs of matches for all elements. The exact costs of matching elements are not important as long as their relative order is maintained. We use the following numbers: two identical elements match at cost 0, a token matches a skip or an empty space at cost 10, a skip matches an empty space at cost 2, and different kinds of skip match at cost 3. All other combinations have infinite cost. After the best match is found, it is converted into a pattern by copying matched identical elements and adding skips where non-identical elements are matched. For example, assume the sentences are Toward this end, &lt;Attr1&gt; in July acquired  at total cost = 80. Assuming that &amp;quot;X&amp;quot; belongs to the same type as at least one of the attributes while the other tokens are not entities, the match will be converted to the pattern *? this *? , &lt;Attr1&gt; *? acquired &lt;Attr2&gt;</Paragraph>
    <Section position="1" start_page="476" end_page="477" type="sub_section">
      <SectionTitle>
3.2 Classifying the Extractions
</SectionTitle>
      <Paragraph position="0"> The goal of the final classification stage is to filter the list of all extracted instances, keeping the correct extractions and removing mistakes that would always occur regardless of the quality of the patterns. It is of course impossible to know which extractions are correct, but there exist properties of patterns and pattern matches that increase or decrease the confidence in the extractions that they produce. Thus, instead of a binary classifier, we seek a real-valued confidence function c, mapping the set of extracted instances into the [0, 1] segment.</Paragraph>
      <Paragraph position="1"> Since confidence value depends on the properties of particular sentences and patterns, it is more properly defined over the set of single pattern matches. Then, the overall confidence of an instance is the maximum of the confidence values of the matches that produce the instance.</Paragraph>
      <Paragraph position="2"> Assume that an instance E was extracted from a match of a pattern P at a sentence S.</Paragraph>
      <Paragraph position="3">  The following set of binary features may influence the confidence c(E, P, S): f1(E, P, S) = 1, if the number of sentences producing E is greater than one.</Paragraph>
      <Paragraph position="4"> f2(E, P, S) = 1, if the number of sentences producing E is greater than two.</Paragraph>
      <Paragraph position="5"> f3(E, P, S) = 1, if at least one slot of the pattern P is adjacent to a non-stop-word token.</Paragraph>
      <Paragraph position="7"> between the slots of the match M that were matched to skips of the pattern P is 0 (f10), 1 or less (f11), 2 or less (f12) , 3 or less(f13), 5 or less (f14), and 10 or less (f15).</Paragraph>
      <Paragraph position="8"> Utilizing the NER In the URES-NER version the entities of each candidate instance are passed through a simple rule-based NER filter, which attaches a score (&amp;quot;yes&amp;quot;, &amp;quot;maybe&amp;quot;, or &amp;quot;no&amp;quot;) to the argument(s) and optionally fixes the arguments boundaries. The NER is capable of identifying entities of type PERSON and COMPANY (and can be extended to identify additional types).</Paragraph>
      <Paragraph position="9"> The scores mean: &amp;quot;yes&amp;quot; - the argument is of the correct entity type.</Paragraph>
      <Paragraph position="10"> &amp;quot;no&amp;quot; - the argument is not of the right entity type, and hence the candidate instance should be removed.</Paragraph>
      <Paragraph position="11"> &amp;quot;maybe&amp;quot; - the argument type is uncertain, can be either correct or no.</Paragraph>
      <Paragraph position="12"> If &amp;quot;no&amp;quot; is returned for one of the arguments, the instance is removed. Otherwise, an additional binary feature is added to the instance's vector: f16 = 1 iff the score for both arguments is &amp;quot;yes&amp;quot;.</Paragraph>
      <Paragraph position="13"> For bound predicates, only the second argument is analyzed, naturally.</Paragraph>
      <Paragraph position="14"> As can be seen, the set of features above is small, and is not specific to any particular predicate. This allows us to train a model using a small amount of labeled data for one predicate, and then use the model for all other predicates: Training: The patterns for a single model predicate are run over a relatively small set of sentences (3,000-10,000 sentences in our experiments), producing a set of extractions (between 150-300 extractions in our experiments).</Paragraph>
      <Paragraph position="15"> The extractions are manually labeled according to whether they are correct or not. For each pattern match Mk = (Ek, Pk, Sk), the value of the feature vector fk = (f1(Mk), ..., f15(Mk)) is calculated, and the label Lk = +-1 is set according to whether the extraction Ek is correct or no.</Paragraph>
      <Paragraph position="16"> A regression model estimating the function L(f) is built from the training data {(fk, Lk)}. For our classifier we used the BBR (Genkin, Lewis et al. 2004), but other models, such as SVM or NaiveBayes are of course also possible.</Paragraph>
      <Paragraph position="17"> Confidence estimation: For each pattern match M, its score L(f(M)) is calculated by the trained regression model. Note that we do not threshold the value of L, instead using the raw probability value between zero and one.</Paragraph>
      <Paragraph position="18"> The final confidence estimates c(E) for the extraction E is set to the maximum of L(f(M)) over all matches M that produced E.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>