<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0110">
  <Title>Corpus Based Statistical Generalization Tree in Rule Optimization *</Title>
  <Section position="1" start_page="0" end_page="88" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> A corpus-based statistical Generalization Tree model is described to achieve rule optimization for the information extraction task. First, the user creates specific rules for the target information from the sample articles through a training interface. Second, WordNet is applied to generalize noun entities in the specific rules. The degree of generalization is adjusted to fit the user's needs by use of the statistical Generalization Tree model. Finally, the optimally generalized rules are applied to scan new information. The results of experiments demonstrate the applicability of our Generalization Tree method.</Paragraph>
    <Paragraph position="1"> Introduction Research on corpus-based natural language learning and processing is rapidly accelerating following the introduction of large on-line corpora, faster computers, and cheap storage devices. Recent work involves novel ways to employ annotated corpus in part of speech tagging (Church 1988) (Derose 1988) and the application of mutual information statistics on the corpora to uncover lexical information (Church 1989). The goal of the research is the construction of robust and portable natural language processing systems.</Paragraph>
    <Paragraph position="2"> The wide range of topics available on the Internet calls for an easily adaptable information extraction system for different domains. Adapting an extraction system to a new domain is a tedious process. In the traditional customization process, the given corpus must be studied carefully in order to get all the possible ways to express target information. Many research groups are implementing the efficient customization of information extraction systems, such as BBN (Weischedel 1995), NYU (Grishman 1995), SRI (Appelt, Hobbs, et al 1995), SRA (Krupka 1995), MITRE (Aberdeen, Burger, et al 1995), and UMass (Fisher, Soderland, et al 1995).</Paragraph>
    <Paragraph position="3"> *This work has been supported by a Fellowship from IBM Corporation.</Paragraph>
    <Paragraph position="4"> We employ a rule optimization approach and implement it in our trainable information extraction system. The system allows the user to train on a small amount of data in the domain and creates the specific rules.</Paragraph>
    <Paragraph position="5"> Then it automatically extracts a generalization from the training corpus and makes the rule general for the new information, depending on the user's needs. In this way, rule generalization makes the customization for a new domain easier.</Paragraph>
    <Paragraph position="6"> This paper specifically describes the automated rule optimization method and the usage of WordNet (Miller 1990). A Generalization Tree (GT) model based on the training corpus and WordNet is presented, as well as how the GT model is used by our system to automatically learn and control the degree of generalization according to the user's needs.</Paragraph>
    <Section position="1" start_page="0" end_page="81" type="sub_section">
      <SectionTitle>
System Overview
</SectionTitle>
      <Paragraph position="0"> The system contains three major subsystems which, respectively, address training, rule optimization, and the scanning of new information. The overall structure of the system is shown in Figure 1. First, each article is partially parsed and segmented into Noun Phrases, Verb Phrases and Prepositional Phrases. An IBM LanguageWare English Dictionary and Computing Term Dictionary, a Partial Parser [1], a Tokenizer and a Preprocessor are used in the parsing process.</Paragraph>
      <Paragraph position="1"> The Tokenizer and the Preprocessor are designed to identify some special categories such as e-mail address, phone number, state and city, etc. In the training process, the user, with the help of a graphical user interface (GUI), scans a parsed sample article and indicates a series of semantic net nodes and transitions that he or she would like to create to represent the information of interest. Specifically, the user designates those noun phrases in the article that are of interest and uses the interface commands to translate them ([1] We wish to thank Jerry Hobbs of SRI for providing us with the finite-state rules for the parser.)</Paragraph>
      <Paragraph position="2"> into semantic net nodes. Furthermore, the user designates verb phrases and prepositions that relate the noun phrases and uses commands to translate them into semantic net transitions between nodes. In the process, the user indicates the desired translation of the specific information of interest into semantic net form that can easily be processed by the machine. For each headword in a noun phrase, WordNet is used to provide sense information. Usually 90% of words in the domain are used in sense one (the most frequently used sense) as defined in WordNet. However, some words might use other senses. For example, &amp;quot;opening&amp;quot; often appears in the job advertisement domain. But instead of using the first sense as {opening, gap}, it uses the fourth sense as {opportunity, chance}. Based on this scenario, for headwords with senses other than sense one, the user needs to identify the appropriate senses, and the Sense Classifier will keep the record of these headwords and their most frequently used senses.</Paragraph>
      <Paragraph position="3"> When the user takes the action to create the semantic transitions, a Rule Generator keeps track of the user's moves and creates the rules automatically. These rules are specific to the training articles and they need to be generalized in order to be applied to other unseen articles in the domain. According to different requirements from the user, the Rule Optimization Engine, based on WordNet, generalizes the specific rules created in the training process and forms a set of optimized rules for processing new information. This rule optimization process will be explained in the later sections. During the scanning of new information, with the help of a rule matching routine, the system applies the optimized rules on a large number of unseen articles from the domain. For most headwords in the phrases, if they are not in the Sense Classifier table, sense one in WordNet will be assigned; otherwise, the Sense Classifier will provide the system with their most frequently used senses in the domain. The output of the system is a set of semantic transitions for each article that specifically extract information of interest to the user. Those transitions can then be used by a Post-processor to fill templates, answer queries, or generate abstracts (Bagga, Chai 1997).</Paragraph>
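The sense lookup described above can be sketched with NLTK's WordNet interface. This is only an illustration, not the system's code; the override table, the function name, and the particular sense numbers are assumptions, and WordNet's sense ordering has changed since the version used in the paper.

```python
from nltk.corpus import wordnet as wn

# Headwords whose domain sense differs from WordNet sense one, as recorded
# by the Sense Classifier during training (illustrative entry only).
SENSE_OVERRIDES = {"opening": 4}   # e.g. the {opportunity, chance} sense

def headword_synset(headword, pos=wn.NOUN):
    """Return the synset assumed for a headword in this domain:
    sense one unless the Sense Classifier recorded another sense."""
    synsets = wn.synsets(headword, pos=pos)
    if not synsets:
        return None
    sense = SENSE_OVERRIDES.get(headword, 1)
    return synsets[min(sense, len(synsets)) - 1]

print(headword_synset("programmer"))   # defaults to sense one
print(headword_synset("opening"))      # uses the recorded domain sense
```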
    </Section>
    <Section position="2" start_page="81" end_page="81" type="sub_section">
      <SectionTitle>
Rule Operations
</SectionTitle>
      <Paragraph position="0"> Our trainable information extraction system is a rule-based system, which involves three aspects of rule operations: rule creation, rule generalization and rule application.</Paragraph>
    </Section>
    <Section position="3" start_page="81" end_page="82" type="sub_section">
      <SectionTitle>
Rule Creation
</SectionTitle>
      <Paragraph position="0"> In a typical information extraction task, the most interesting part is the events and relationships holding among the events (Appelt, Hobbs, et al 1995). These relationships are usually specified by verbs and prepositions. Based on this observation, the left hand side (LHS) of our meaning extraction rules is made up of three entities. The first and the third entities are the target objects in the form of noun phrases; the second entity is the verb or prepositional phrase indicating the relationship between the two objects. The right hand side (RHS) of the rule consists of the operations</Paragraph>
      <Paragraph position="2"> required to create a semantic transition: ADD_NODE, ADD_RELATION.</Paragraph>
      <Paragraph position="3"> For example, during the training process, as shown in Figure 2, the user trains on the sentence &amp;quot;DCR Inc. is looking for C programmers...&amp;quot;, and would like to designate the noun phrases (as found by the parser) to be semantic net nodes and the verb phrase to represent a transition between them. The training interface provides the user with the ADD_NODE, ADD_RELATION GUI commands to accomplish this. ADD_NODE is to add an object in the semantic transition.</Paragraph>
      <Paragraph position="4"> ADD_RELATION is to add a relationship between two objects. The specific rule is created automatically by the rule generator according to the user's moves.</Paragraph>
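As a rough illustration of the kind of rule the Rule Generator produces for the training sentence above, one possible in-memory representation is sketched below. The field values (phrase categories, sense numbers, semantic types) and the Python encoding itself are assumptions for illustration, not the system's actual format; the entity quadruples follow the (headword, category, sense, type) form described in the Rule Generalization section.

```python
# Sketch of a specific rule for "DCR Inc. is looking for C programmers ...".
# LHS: the three trained entities; RHS: the semantic-transition operations.
specific_rule = {
    "lhs": [
        ("DCR Inc.",   "NG", 1, "company"),    # first entity: noun phrase
        ("look for",   "VG", 1, "none"),       # second entity: verb phrase
        ("programmer", "NG", 1, "abst_type"),  # third entity: noun phrase
    ],
    "rhs": [
        ("ADD_NODE", "DCR Inc."),
        ("ADD_NODE", "programmer"),
        ("ADD_RELATION", "look for", "DCR Inc.", "programmer"),
    ],
}
```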
    </Section>
    <Section position="4" start_page="82" end_page="83" type="sub_section">
      <SectionTitle>
Rule Generalization
</SectionTitle>
      <Paragraph position="0"> The rule created by the rule generator as shown in Figure 2 is very specific, and can only be activated by the training sentence. It will not be activated by other sentences such as &amp;quot;IBM Corporation seeks job candidates in Louisville...&amp;quot;. Semantically speaking, these two sentences are very much alike. Both of them express that a company is looking for professional people. However, without generalization, the second sentence will not be processed. So the use of the specific rule is very limited. In order to make the specific rules applicable to a large number of unseen articles in the domain, a comprehensive generalization mechanism is necessary. We use the power of WordNet to achieve generalization.</Paragraph>
      <Paragraph position="1"> Introduction to WordNet WordNet is a large-scale on-line dictionary developed by George Miller and colleagues at Princeton University (Miller, et al 1990a).</Paragraph>
      <Paragraph position="2"> The most useful feature of WordNet to the Natural Language Processing community is its attempt to organize lexical information in terms of word meanings, rather than word forms. Each entry in WordNet is a concept represented by a synset. A synset is a list of synonyms, such as {engineer, applied scientist, technologist}. The information is encoded in the form of semantic networks. For instance, in the network for nouns, there are &amp;quot;part of&amp;quot;, &amp;quot;is_a&amp;quot;, &amp;quot;member of&amp;quot;, ... relationships between concepts. Philip Resnik wrote that &amp;quot;...it is difficult to ground taxonomic representations such as WordNet in precise formal terms, the use of the WordNet taxonomy makes reasonably clear the nature of the relationships being represented...&amp;quot; (Resnik 1993). The hierarchical organization of WordNet by word meanings (Miller 1990) provides the opportunity for automated generalization. With the large amount of information in semantic classification and taxonomy provided in WordNet, many ways of incorporating WordNet semantic features with generalization are foreseeable. At this stage, we only concentrate on the Hypernym/Hyponym feature.</Paragraph>
      <Paragraph position="3"> A hyponym is defined in (Miller, et al 1990a) as follows: &amp;quot;A noun X is said to be a hyponym of a noun Y if we can say that X is a kind of Y. This relation generates a hierarchical tree structure, i.e., a taxonomy.</Paragraph>
      <Paragraph position="4"> A hyponym anywhere in the hierarchy can be said to be &amp;quot;a kind of&amp;quot; all of its superordinates....&amp;quot; If X is a hyponym of Y, then Y is a hypernym of X.</Paragraph>
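The hyponym/hypernym relation defined above can be checked directly with NLTK's WordNet interface. A small sketch follows; the helper name is ours, and the exact synsets and paths depend on the WordNet version, which is newer than the one used in the paper.

```python
from nltk.corpus import wordnet as wn

def is_kind_of(x, y):
    """True if synset x is a hyponym of synset y, i.e. y appears
    somewhere among x's superordinates (or x equals y)."""
    return any(y in path for path in x.hypernym_paths())

programmer = wn.synsets("programmer", pos=wn.NOUN)[0]
person = wn.synsets("person", pos=wn.NOUN)[0]
print(is_kind_of(programmer, person))   # True: a programmer is a kind of person
print(is_kind_of(person, programmer))   # False
```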
      <Paragraph position="5"> Generalization From the training process, the specific rules contain three entities on the LHS as shown in Figure 3. Each entity (sp) is a quadruple, in the form of (w, c, s, t), where w is the headword of the trained phrase; c is the part of speech of the word; s is the sense number representing the meaning of w; t is the semantic type identified by the preprocessor for w.</Paragraph>
      <Paragraph position="6"> For each sp = (w, c, s, t), if w exists in WordNet, then there is a corresponding synset in WordNet. The hyponym/hypernym hierarchical structure provides a way of locating the superordinate concepts of sp. By following additional hypernymy, we will get more and more generalized concepts and eventually reach the most general concept, such as {entity}. As a result, for each concept, different degrees of generalization can be achieved by adjusting the distance between this concept and the most general concept in the WordNet hierarchy (Chai, Biermann 1997). The function to accomplish this task is Generalize(sp, h), which returns a hypernym h levels above the concept sp in the hierarchy. Generalize(sp, 0) returns the synset of sp. For example, in Figure 4, the concept {programmer} is generalized at various levels based on the WordNet hierarchy. WordNet is an acyclic structure, which suggests that a synset might have more than one hypernym. However, this situation doesn't happen often. In case it happens, the system selects the first hypernym path.
[Figure 3: Sample Rules. 1. An Abstract Specific Rule: (w1, c1, s1, t1), (w2, c2, s2, t2), (w3, c3, s3, t3) --&gt; ADD_NODE(w1), ADD_NODE(w3), ADD_RELATION(w2, w1, w3). 2. A Generalized Rule: (W1, C1, S1, T1) ∈ Generalize(sp1, h1), (W2, C2, S2, T2) ∈ Generalize(sp2, h2), (W3, C3, S3, T3) ∈ Generalize(sp3, h3) --&gt; ADD_NODE(W1), ADD_NODE(W3), ADD_RELATION(W2, W1, W3)]
[Figure 4: Various generalization degrees for sp = (programmer, NG, 1, abst_type): Generalize(sp, 1) = {engineer, applied scientist, technologist}; Generalize(sp, 2) = {person, individual, someone, ...}; Generalize(sp, 3) = {life form, organism, being, ...}; Generalize(sp, 4) = {entity}]</Paragraph>
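Under the description above, Generalize(sp, h) can be sketched with NLTK as follows. This is an assumption-laden illustration: sense numbering and the concepts reached at each level differ between current WordNet and the 1990s version behind Figure 4.

```python
from nltk.corpus import wordnet as wn

def generalize(headword, sense=1, h=0, pos=wn.NOUN):
    """Return the synset h levels above the headword's synset in WordNet,
    following the first hypernym path; h = 0 returns the synset itself."""
    synset = wn.synsets(headword, pos=pos)[sense - 1]
    for _ in range(h):
        hypernyms = synset.hypernyms()
        if not hypernyms:        # already at the most general concept
            break
        synset = hypernyms[0]    # first hypernym path, as the system does
    return synset

for h in range(5):
    print(h, generalize("programmer", h=h).lemma_names())
```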
      <Paragraph position="7"> The process of generalizing rules consists of replacing each sp = (w, c, s, t) in the specific rules by a more general superordinate synset from its hypernym hierarchy in WordNet by performing the Generalize(sp, h) function. The degree of generalization for rules varies with the variation of h in Generalize(sp, h).</Paragraph>
      <Paragraph position="8"> Figure 3 shows an abstract generalized rule. The ∈ symbol signifies the subsumption relationship. Therefore, a ∈ b signifies that a is subsumed by b, or concept b is a superordinate concept of concept a.</Paragraph>
      <Paragraph position="9"> Optimization Rules with different degrees of generalization on their different constituents will have a different behavior when processing new texts. A set of generalized rules for one domain might be sufficient; but in another domain, they might not be. Within a particular rule, the user might expect one entity to be relatively specific and the other entity to be more general. For example, if a user is interested in finding all DCR Inc. related jobs, he/she might want to hold the first entity as specific as that in Figure 2, and generalize the third entity. The rule optimization process is to automatically control the degree of generalization in the generalized rules to meet the user's different needs.</Paragraph>
      <Paragraph position="10"> Optimization will be described in later sections.</Paragraph>
    </Section>
    <Section position="5" start_page="83" end_page="84" type="sub_section">
      <SectionTitle>
Rule Application
</SectionTitle>
      <Paragraph position="0"> The optimally generalized rules are applied to unseen articles to achieve information extraction in the form of semantic transitions. The generalized rule states that the RHS of the rule gets executed if all of the following conditions are satisfied: * A sentence contains three phrases (not necessarily contiguous) with headwords W1, W2, and W3.</Paragraph>
      <Paragraph position="1"> * The quadruples corresponding to these headwords are (W1, C1, S1, T1), (W2, C2, S2, T2), and (W3, C3, S3, T3).</Paragraph>
      <Paragraph position="2"> * The synsets, in WordNet, corresponding to the quadruples, are subsumed by Generalize(sp1, h1), Generalize(sp2, h2), and Generalize(sp3, h3), respectively. Figure 5 shows an example of rule matching and creating a semantic transition for the new information. In the example, the most general rule is created by generalizing the first and the third entities in the specific rule to their top hypernyms in the hierarchy. Since verbs usually have only one level in the hierarchy, they are generalized to the synset at the same level.</Paragraph>
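A hedged sketch of this matching test: a sentence activates a generalized rule when the synset of each headword is subsumed by the corresponding generalized concept. The function names are ours, only noun entities are handled, and generalize() refers to the sketch given earlier.

```python
from nltk.corpus import wordnet as wn

def subsumed_by(synset, general_synset):
    """True if general_synset is the synset itself or one of its hypernyms."""
    return any(general_synset in path for path in synset.hypernym_paths())

def noun_entities_match(headwords, generalized_concepts):
    """headwords: e.g. (W1, W3) for the noun entities of a rule;
    generalized_concepts: the synsets returned by Generalize(sp_i, h_i)."""
    for word, concept in zip(headwords, generalized_concepts):
        synsets = wn.synsets(word, pos=wn.NOUN)
        if not synsets or not subsumed_by(synsets[0], concept):
            return False
    return True

# Example: with the third entity generalized to {person}, the headword
# "candidate" from "IBM Corporation seeks job candidates ..." matches.
person = wn.synsets("person", pos=wn.NOUN)[0]
print(noun_entities_match(["candidate"], [person]))   # True
```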
    </Section>
    <Section position="6" start_page="84" end_page="86" type="sub_section">
      <SectionTitle>
Rule Optimization
</SectionTitle>
      <Paragraph position="0"> The specific rule can be generalized to the most general rule as in Figure 5. When we apply this most general rule again to the training corpus, a set of semantic transitions are created. Some transitions are relevant, while the others are not. Users are expected to select the relevant transitions through a user interface. We need a mechanism to determine the level of generalization that achieves the best result in extracting the relevant information and ignoring the irrelevant information. Therefore, the Generalization Tree (GT) is designed to accomplish this task. While maintaining the semantic relationship of objects as in WordNet, GTs collect the relevancy information of all activating objects and automatically find the optimal level of generalization to fit the user's needs. A database is used to maintain the relevancy information for all the objects which activate each most general concept in the most general rule. This database is transformed to the GT structure, which keeps the statistical information of relevancy for each activating object and the semantic relations between the objects from WordNet. The system automatically adjusts the generalization degrees for each noun entity in the rules to match the desires of the user. The idea of this optimization process is to first keep recall as high as possible by applying the most general rules, then adjust the precision by tuning the rules based on the user's specific inputs.</Paragraph>
      <Paragraph position="1"> An Example of GT Suppose we apply the most general rule in Figure 5 to the training corpus, and entity three in the rule is activated by a set of objects shown in Table 1. From a user interface and a statistical classifier, the relevancy_rate (rel_rate) for each object can be calculated: rel_rate(obj) = (count of obj being relevant) / (total count of occurrences of obj). As shown in Table 1, for example, rel_rate({analyst...}) = 80%, which indicates that when {entity} in the most general rule is activated by analyst, 80% of the time it hits relevant information and 20% of the time it hits irrelevant information. On the other hand, it suggests that if {entity} is replaced by the concept {analyst...}, a roughly 80% precision could be achieved in extracting the relevant information. The corresponding GT for Table 1 is shown in Figure 6.</Paragraph>
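The relevancy rate is a simple ratio over the training activations and could be tallied as below; the counts are illustrative, chosen only to reproduce the 80% figure for {analyst...} quoted above.

```python
from collections import Counter

relevant = Counter()   # activations judged relevant by the user
total = Counter()      # all activations of the entity by this object

# Illustrative tallies: "analyst" activates {entity} five times, four relevant.
for obj, ok in [("analyst", True)] * 4 + [("analyst", False)]:
    total[obj] += 1
    relevant[obj] += int(ok)

def rel_rate(obj):
    """rel_rate(obj) = count of obj being relevant / total occurrences of obj."""
    return relevant[obj] / total[obj]

print(rel_rate("analyst"))   # 0.8
```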
      <Paragraph position="2"> In GT, each activating object is a leaf node in the tree, with an edge to its immediate hypernym (parent). For each hypernym list in the database, there is a corresponding path in GT. Besides the ordinary hard edge represented by the solid line, there is a soft edge represented by the dotted line. Concepts connected by the soft edge are the same concepts. The only difference between them is that the leaf node is the actual activating object, while the internal node is the hypernym for other activating objects. Hard edges and soft edges only differ representationally; for the calculation, they are treated the same. [Table 1: objects activating {entity}, with their occurrence counts and relevancy rates, e.g. professional: 4 occurrences, 100%; software: 5 occurrences, 0%.]</Paragraph>
      <Paragraph position="4"> Each node has two other fields: count of occurrence and relevancy_rate. For the leaf nodes, those fields can be filled from the database directly. For internal nodes, the instantiation of these fields depends on their children (hyponym) nodes. The calculation will be described later.</Paragraph>
      <Paragraph position="5"> If the relevancy_rate for the root node {entity} is 82.3%, it indicates that, with the probability 82.3%, objects which activate {entity} are relevant. If the user is satisfied with this rate, then it's not necessary to alter the most general concept in the rule. If the user feels the estimated precision is too low, the system will go down the tree, and check the relevancy_rate in the next level. For example, if 87.5% is good enough, then the concept {life form, organism...} will substitute {entity} in the most general rule. If the precision is still too low, the system will go down the tree, find {adult..}, {applicant..}, and replace the concept {entity} in the most general rule with the union of these two concepts.</Paragraph>
      <Paragraph position="6"> Generalization Tree Model Let's suppose z_n is a noun entity in the most general rule, and z_n is activated by q concepts e_1^0, e_2^0, ..., e_q^0; the times of activation for each e_i^0 are represented by c_i. Since e_i^0 (i &lt;= q) activates z_n, there exists a hypernym list e_i^0, e_i^1, ..., z_n in WordNet, where e_i^j is the immediate hypernym of e_i^(j-1). The system maintains a database of activation information as shown in Table 2, and transforms the database to a GT model automatically.</Paragraph>
      <Paragraph position="7"> GT is an n-ary branching tree structure with the following properties: * Each node represents a concept, and each edge represents the hypernym relationship between the concepts. If e_i is the immediate hypernym of e_j, then there is an edge between node e_i and e_j. e_i is one level above e_j in the tree.</Paragraph>
      <Paragraph position="8"> * The root node z_n is the most general concept from the most general rule.</Paragraph>
      <Paragraph position="10"> * The leaf nodes e_1^0, e_2^0, ..., e_q^0 are the concepts which activate z_n. The internal nodes are the concepts e_j^i (i != 0 and 1 &lt;= j &lt;= q) from the hypernym paths for the activating concepts.</Paragraph>
      <Paragraph position="11"> For a leaf node e_i^0, the count of occurrence and the relevancy_rate are filled directly from the activation database; for an internal node, these fields are computed from its children nodes.</Paragraph>
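A sketch of a GT node and the bottom-up filling of its fields, assuming that an internal node's count is the sum of its children's counts and its relevancy_rate is the count-weighted average of its children's rates (an assumption consistent with the worked example above); the class and method names are illustrative only.

```python
class GTNode:
    """Generalization Tree node: a WordNet concept plus its occurrence count
    and relevancy_rate; leaves carry values taken directly from the database."""

    def __init__(self, concept, count=0, rel_rate=0.0, children=None):
        self.concept = concept
        self.count = count
        self.rel_rate = rel_rate
        self.children = children or []

    def aggregate(self):
        """Fill internal nodes bottom-up: count is the sum of the children's
        counts, rel_rate their count-weighted average (assumed calculation)."""
        if not self.children:
            return self.count, self.rel_rate
        totals = [child.aggregate() for child in self.children]
        self.count = sum(c for c, _ in totals)
        self.rel_rate = (sum(c * r for c, r in totals) / self.count
                         if self.count else 0.0)
        return self.count, self.rel_rate
```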
      <Paragraph position="13"> Optimized Rule For each noun entity in the most general rule, the system keeps a GT from the training set. Depending on the user's different needs, a threshold θ is pre-selected. For each GT, the system will start from the root node, go down the tree, and find all the nodes e_i such that relevancy_rate(e_i) &gt;= θ. If a node's relevancy_rate is higher than θ, its children nodes will be ignored. In this way, the system maintains a set of concepts whose relevancy_rate is higher than θ, which is called Optimized-Concepts. By substituting z_n in the most general rule with Optimized-Concepts, an optimized rule is created to meet the user's needs.</Paragraph>
      <Paragraph position="14"> The searching algorithm is basically breadth-first: 1. Initialize Optimized-Concepts to be the empty set. Pre-select the threshold θ. If the user wants to get the most precise information and particularly cares about precision, θ should be set high; if the user wants to extract as much information as possible and does not care much about the precision, θ should be set low.</Paragraph>
      <Paragraph position="15"> 2. Starting from the root node z, perform the Recursive-Search algorithm, which is defined as the following: let m denote the number of children nodes of z; let z_i denote a child of z (0 &lt; i &lt;= m); if relevancy_rate(z) &gt;= θ, add z to Optimized-Concepts; otherwise, perform Recursive-Search on each child z_i.</Paragraph>
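Under the description above, the threshold search can be sketched as a breadth-first walk over the GT; GTNode is the illustrative class sketched earlier, and theta is the pre-selected threshold.

```python
from collections import deque

def optimized_concepts(root, theta):
    """Collect the shallowest GT nodes whose relevancy_rate reaches theta;
    the subtree below an accepted node is not explored further."""
    accepted, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        if node.rel_rate >= theta:
            accepted.append(node.concept)    # keep this concept, prune below it
        else:
            queue.extend(node.children)      # descend to more specific concepts
    return accepted
```

Substituting z_n in the most general rule with the returned concepts then yields the optimized rule, as described above.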
    </Section>
    <Section position="7" start_page="86" end_page="87" type="sub_section">
      <SectionTitle>
Experiment and Discussion
</SectionTitle>
      <Paragraph position="0"> In this section we present and discuss results from an experiment. The experimental domain is the triangle.jobs USENET newsgroup. We trained our system on 24 articles for the extraction of six facts of interest, as follows: Company Name. Examples: IBM, Metro Information Services, DCR Inc.</Paragraph>
      <Paragraph position="1"> Position/Title. Examples: programmer, financial analyst, software engineer.</Paragraph>
      <Paragraph position="2"> Experience/Skill. Example: 5 years experience in Oracle.</Paragraph>
      <Paragraph position="3"> Location. Examples: Winston-Salem, North Carolina. Benefit. Examples: company matching funds, comprehensive health plan.</Paragraph>
      <Paragraph position="4"> Contact Info. Examples: Fax is 919-660-6519, e-mail address.</Paragraph>
      <Paragraph position="5"> The testing set contained 162 articles from the same domain as the system was trained on. Out of the 162 articles, 21 articles were unrelated to the domain due to misplacement by the people who posted them. Those unrelated articles were about jobs wanted, questions answered, ads for web sites, etc. First, we compared some of the statistics from the training set and testing set. The percentage of representation of each fact in the articles for both the training and testing domains is shown in Table 3, which is the number of articles containing each fact out of the total number of articles. The distribution of the number of facts presented in each article is shown in Figure 7.</Paragraph>
      <Paragraph position="6"> The mean number of facts in each article from the training set is 4.39, with a standard deviation of 1.2; the mean number of facts in each article from the testing set is 4.35, with a standard deviation of 1. Although these statistics are not strong enough to indicate that the training set is absolutely a good training corpus for this information extraction task, they suggest that, as far as the facts of interest are concerned, the training set is a reasonable set to be trained and learned from.</Paragraph>
    </Section>
    <Section position="8" start_page="87" end_page="88" type="sub_section">
      <SectionTitle>
Article
</SectionTitle>
      <Paragraph position="0"> The evaluation process consisted of the following steps: first, each unseen article was studied to see if there was any fact of interest presented; second, the semantic transitions produced by the system were examined to see if they correctly extracted the fact of interest. Precision is the number of transitions correctly extracting facts of interest out of the total number of transitions produced by the system; recall is the number of facts which have been correctly extracted out of the total number of facts of interest. The overall performance of recall and precision is defined by the F-measurement F = ((β² + 1) · P · R) / (β² · P + R), where P is precision, R is recall, and β = 1 if precision and recall are equally important.</Paragraph>
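For reference, the F-measurement above is straightforward to compute; a trivial sketch with β as a parameter (β = 1 weights precision and recall equally):

```python
def f_measure(precision, recall, beta=1.0):
    """F = ((beta^2 + 1) * P * R) / (beta^2 * P + R)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    return ((beta ** 2 + 1) * precision * recall) / (beta ** 2 * precision + recall)

print(round(f_measure(0.8, 0.6), 3))   # 0.686 with beta = 1
```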
      <Paragraph position="1"> First, we tested on single fact extraction, namely the position/title fact. The purpose of this experiment was to test whether the different θ values lead to the expected recall and precision statistics. From the results over the 141 related testing articles, the recall, precision, and F-measurement curves are shown in Figure 8. Recall is 51.6% when θ = 1.0, which is lower than 75% at</Paragraph>
      <Paragraph position="2"> As mentioned earlier, 21 articles from the testing corpus are unrelated to the job advertisement domain. The interesting question arising here is whether we can use the GT rule optimization method to achieve information retrieval, in this particular case, to identify those unrelated articles. Certainly, we would hope that optimized rules won't produce any transitions from the unrelated articles. The result is shown in Figure 9. The precision on unrelated articles is the number of articles without any transitions created out of the total of 21 articles. We can see that, when θ = 0.8, 1.0, precision is 95.7%. Only one article out of 21 articles is mis-identified. But when θ = 0, 0.2, the precision rate is very low, only 28.6% and 38.1%. If we use the traditional way of keyword matching to do this information retrieval, the precision won't be as high as 95.7%, since a few resume and job-wanted postings will pass the keyword matching and be mis-identified as related articles.</Paragraph>
      <Paragraph position="3"> The system performance on extracting the six facts is shown in Figure 10. The overall performance F-measurement gets to its peak at 70.2% when θ = 0.8. When θ = 1.0, the precision does not get to what we expected. One explanation is that, unlike the extraction of the position/title fact, for extracting the six facts from the domain, the training data is quite small. It is not sufficient to support the user's requirement for a strong estimate of precision. θ = 0.8 is the best choice when the training corpus is small.</Paragraph>
      <Paragraph position="4"> Some problems were also detected which prevent better performance of the system. The current domain is a newsgroup, where anyone can post anything which he/she believes is relevant to the newsgroup. It is inevitable that some typographical errors and some abbreviations occur in the articles. And the format of an article is sometimes unpredictable. If we can incorporate into the system a spelling checker, and build a database for the commonly used abbreviations, the system performance is expected to be enhanced. Some problems came from the use of WordNet as well. For example, the sentence &amp;quot;DCR Inc. is looking for Q/A people&amp;quot; won't activate the most general rule in Figure 5. The reason is that people is subsumed by the concept {group, grouping}, but not the concept {entity}. This problem can be fixed by adding one more rule with {group, grouping} substituting {entity} in the most general rule in Figure 5. WordNet has very refined senses for each concept, including some rarely used ones, which sometimes causes problems too. This kind of problem certainly hurts the performance, but it's not easy to correct because of the nature of WordNet.</Paragraph>
      <Paragraph position="5"> However, the use of WordNet generally provides a good method to achieve generalization in this domain of job advertisement.</Paragraph>
      <Paragraph position="6"> Conclusion and Future Work This paper describes a rule optimization approach using Generalization Tree and WordNet. Our information extraction system learns the necessary knowledge by analyzing sample corpora through a training process. The rule optimization makes it easier for the information extraction system to be customized to a new domain. The Generalization Tree algorithm provides a way to make the system adaptable to the user's needs. The idea of first achieving the highest recall with low precision, then adjusting precision to satisfy the user's needs has been successful. We are currently studying how to enhance the system performance by further refining the generalization approach.</Paragraph>
    </Section>
  </Section>
</Paper>