<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0809">
  <Title>The Use of Lexical Semantics in Information Extraction *</Title>
  <Section position="3" start_page="0" end_page="61" type="metho">
    <SectionTitle>
2 Lexical Acquisition
</SectionTitle>
    <Paragraph position="0"> One way to achieve lexical acquisition is to use the existing repositories of lexical knowledge, such as knowledge base, dictionaries and thesauruses. The key issue is whether those repositories can be effectively applied for the computational purpose. Many researchers have taken steps toward successful extraction of computationally useful lexical information from machine readable dictionaries and convert it into formal representation (Montemagnia and Vanderwende, 1993) (Byrd et al., 1987) (Jensen and Binot, 1987). Sparck Jones's pioneering re- null search (Jones, 1985), done in early 1960, proposed a lexical representation by synonym list. Very close to that proposal, George Miller and colleagues at Princeton University constructed a large-scale resource for lexical information-WordNet.</Paragraph>
    <Paragraph position="1"> The most useful feature of WordNet to Natural Language Processing community is the organization of lexical information in terms of word meanings, rather than word forms. It is organized by parts of speech-nouns, verbs, adjectives, and adverbs. Each entry in the WordNet is a concept represented by a list of synonyms (synset). The information is represented in the form of semantic networks. For instance, in the network for nouns, there are &amp;quot;part of&amp;quot;, &amp;quot;is_a', &amp;quot;member of&amp;quot; .... relationships between concepts. Philip Resnik has studied the lexical relationship by use of a WordNet taxonomy. He wrote that &amp;quot;...it is difficult to ground taxonomic representations such as WordNet in precise formal terms, the use of the WordNet taxonomy makes reason- ~ ably clear the nature of the relationships being represented. &amp;quot; (Resnik, 1993).</Paragraph>
    <Paragraph position="2"> Some early work of applying WordNet for the lex- ~ ical semantic acquisition can be found in NYU's MUC-4 system (Grishman et al., 1992), which c used WordNet hierarchies for semantic classification.</Paragraph>
    <Paragraph position="3"> However, they ran into the problem of automated sense disambiguation because the WordNet hierarchy is sense dependent. Ralph Grishman and his group at NYU reached the conclusion that &amp;quot;Word-Net may be a good source of concepts, but that it will not be of net benefit unless manually reviewed with respect to a particular application&amp;quot; (Grishman et al., 1992). Other research concerns using WordNet senses to tag large corpus with the lexical semantics for automated word sense disambiguation (Ng, 1997) (Wiebe et al. , 1997)</Paragraph>
  </Section>
  <Section position="4" start_page="61" end_page="61" type="metho">
    <SectionTitle>
3 Application of WordNet in the System
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="61" end_page="61" type="sub_section">
      <SectionTitle>
System
</SectionTitle>
      <Paragraph position="0"> Our system contains three major processes which, respectively, address training, rule generalization, and the scanning of new information. WordNet is used in all three processes as shown in figure 1.</Paragraph>
      <Paragraph position="1"> During the training process, each article is partially parsed and segmented into Noun Phrases, Verb Phrases and Prepositional Phrases. An IBM LanguageWare English Dictionary and Computing Term Dictionary, a Partial Parser, a Tokenizer and a Pre-processor are used in the parsing process. The Tokenizer and the Preprocessor are designed to identify some special categories such as e-mail address, phone number, state and city etc. The user, with</Paragraph>
    </Section>
    <Section position="2" start_page="61" end_page="61" type="sub_section">
      <SectionTitle>
Training Process
</SectionTitle>
      <Paragraph position="0"/>
    </Section>
  </Section>
  <Section position="5" start_page="61" end_page="62" type="metho">
    <SectionTitle>
Scanning Process
</SectionTitle>
    <Paragraph position="0"> the help of a graphical user intefface(GUI) scans a parsed sample article and indicates a series of semantic net nodes and transitions that he or she would like to create to represent the information of interest. Specifically, the user designates those noun phrases in the article that are of interest and uses the interface commands to translate them into semantic net nodes. Furthermore, the user designates verb phrases and prepositions that relate the noun phrases and uses commands to translate them into semantic net transitions between nodes. In the process, the user indicates the desired translation of the specific information of interest into semantic net form that can easily be processed by the machine.</Paragraph>
    <Paragraph position="1"> When the user takes the action to create the semantic transitions, a Rule Generator keeps track of the user's moves and creates the rules automatically.</Paragraph>
    <Paragraph position="2"> WordNet is used to provide the sense information during the training. For each headword in a noun/verb phrase, many senses are available in WordNet. We trained 24 articles with 1129 head- null words from &amp;quot;triangle.job&amp;quot; domain, and found that 91.7% of headwords were used as sense one in Word-Net. The sense distribution is shown in figure 2.</Paragraph>
    <Paragraph position="3"> Based on this observation, by default, the system assigns sense one to every headword, while providing the user the option to train the sense other than one. For example, &amp;quot;opening&amp;quot; often appears in the job advertisement domain. But instead of using the first sense as {opening, gap}, it uses the fourth sense as {opportunity, chance}. The user needs to train &amp;quot;opening&amp;quot; to be sense four during the training process. The Sense Table keeps the record of these head-words and their most frequently used senses (other than one).</Paragraph>
    <Paragraph position="4"> Rules created from the training process are specific to the training articles and must be generalized before being applied on other articles in the domain.</Paragraph>
    <Paragraph position="5"> According to different requirements from the user, in the rule generalization process, a rule optimization engine, based on WordNet, generalizes the specific rules and forms a set of optimized rules for processing new information. This rule generalization process will be described in the later sections.</Paragraph>
    <Paragraph position="6"> During the scanning of new information, with the help of a rule matching routine, the system applies the optimized rules on a large number of unseen articles from the domain. If headwords are not in the Sense Table, sense one in WordNet will be assigned; otherwise, the Sense Table provides them their most frequently used senses in the domain. The output of the system is a set of semantic transitions for each article that specifically extract information of interest to the user. Those transitions can then be used by a Postprocessor to fill templates, answer queries, or generate abstracts.</Paragraph>
  </Section>
  <Section position="6" start_page="62" end_page="64" type="metho">
    <SectionTitle>
4 Rule Generalization
</SectionTitle>
    <Paragraph position="0"> The Rule Generalization engine is crucial to the whole system because it makes the customizing process easier. The user only needs to train on a comparably small amount of data from the domain, and the system will automatically revise the rules to make them applicable for large amount of new information. null</Paragraph>
    <Section position="1" start_page="62" end_page="62" type="sub_section">
      <SectionTitle>
4.1 Rules
</SectionTitle>
      <Paragraph position="0"> In a typical information extraction task, the most interesting part is the events and relationships holding among the events (Appelt et al., 1995). These relationships are usually specified by verbs and prepositions. Based on this observation, the left hand side (LHS) of our information extraction rules is made up of three entities. The first and the third entities are the target objects in the form of noun phrases, the second entity is the verb or prepositional phrase indicating the relationship between the two objects.</Paragraph>
      <Paragraph position="1"> The right hand side (RHS) of the rule consists of the operations required to create a semantic transition-ADD.NODE, ADD_RELATION. ADD.NODE is to add an object in the transitions. ADD.RELATION is to add a relationship between two objects. The specific rule generated from the training process is shown in figure 3 rule 1.</Paragraph>
      <Paragraph position="2"> Rule 1 in figure 3 is very specific, and it can be activated only by a sentence with the same pattern as &amp;quot;DCR Inc. is looking for C programmers... &amp;quot;. It will not be activated by other Sentences such as &amp;quot;IBM Corporation seeks job candidates in Louisville, KY with HTML experience&amp;quot;. Semantically speaking, these two sentences are very much alike. Both of them are about a company that seeks some kind of person. However, without generalization, the second sentence will not be processed. So the use of the specific rule is very limited.</Paragraph>
      <Paragraph position="3"> In order to make the specific rules applicable to a large number of unseen articles in the domain, a comprehensive generalization mechanism is necessary. We are not only interested in the generalization itself, but also in a strategy to control the degree of generalization for various applications in different domains.</Paragraph>
    </Section>
    <Section position="2" start_page="62" end_page="64" type="sub_section">
      <SectionTitle>
4.2 Generalization Scheme
</SectionTitle>
      <Paragraph position="0"> The hierarchical organization of WordNet by word meanings (Miller, 1990) provides the opportunity for automated generalization. With the large amount of information in semantic classification and taxonomy provided in WordNet, many ways of incorporating WordNet semantic features with generalization are foreseeable. At this stage, we only concentrate on the Hypemym/Hyponym feature.</Paragraph>
      <Paragraph position="1"> A hyponym is defined in (Miller et al., 1990a) as follows: &amp;quot; A noun X is said to be a hyponym of a noun Y if we can say that X is a kind of Y. This relation generates a hierarchical tree structure, i.e., a taxonomy. A hyponym anywhere in the hierarchy can be said to be &amp;quot;a kind of&amp;quot; all of its superordinateds .... &amp;quot; If X is a hyponym of Y, then Y is a hypernym of X.</Paragraph>
      <Paragraph position="2"> From the training process, the specific rules contain three entities on the LHS. An abstract specific rule is shown in rule 2 in figure 3. Each entity (sp) is a quadruple, in the form of (w, c, s, t), where w is the headword of the trained phrase; c is the part of the speech of the word; s is the sense number representing the meaning of w; t is the semantic type identified by the preprocessor for w.</Paragraph>
      <Paragraph position="3"> For each sp = (w,c,s,t), if w exists in WordNet,  1. An Example of the Specific Rule: \[DCR Inc, NG, 1,company\], \[look.for, VG, 1, other_type\], \[programmer, NG, 1, other_type\] ADD.NODE(DCR Inc.), ADD_NODE(programmer), ADD_RELATION(look.for, DCR Inc., programmer) 2. An Abstract Specific Rule: (Wl, el, 81, tl), (W2, C2, 82, t2),(w3, c3, 83, t3) ADD_NODE(w1), ADD_NODE(w2), ADD_RELATION(w~, w2, w3) 3. An Abstract Generalized Rule: (W1, C1, $1, T1) E Generalize( spl , hl ) , (W2 , C2, $2, T2 ) E Generalize( sp2, h2 ), (W~, C3, Ss, T3) E Generalize(sp3, h3) ADD_NODE(W1), ADD_NODE(Ws), ADD_RELATION(W2,Wi, W3) 4. An Example of the Most General Rule:</Paragraph>
      <Paragraph position="5"> then there is a corresponding synset in WordNet.</Paragraph>
      <Paragraph position="6"> The hyponym/hypernym hierarchical structure provides a way of locating the superordinate concepts of sp. By following additional Hypernymy, we will get more and more generalized concepts and eventually reach the most general concept, such as {entity}.</Paragraph>
      <Paragraph position="7"> Based on this scenario, for each concept, different degrees of generalization can be achieved by adjusting the distance between this concept and the most general concept in the WordNet hierarchy (Bagga et al., 1997). The function to accomplish this task is Generalize(x,h), which returns a synset list h levels above the concept z in the hierarchy.</Paragraph>
      <Paragraph position="8"> WordNet is an acyclic structure, which suggests that a synset might have more than one hypernym.</Paragraph>
      <Paragraph position="9"> However, This situation doesn't happen often. We tested on 150 randomly chosen articles from &amp;quot;triangle.job&amp;quot; newsgroup. Totally there were 12115 phrases including 1829 prepositions, 1173 phrases with headwords not in WordNet and 9113 phrases with headwords in WordNet. Within 9113 headwords, 722 headwords (7.9%), either themselves or their hypernym had more than one superordinate.</Paragraph>
      <Paragraph position="10"> Furthermore, 90% of 722 cases came from two superordinates of {person, individual, someone, moral, human soul}, which are {life_form, organism, being, living thing}, and {causal agent, cause, causal agency}. Certainly, in some cases, {person...} is a kind of {causal agent...}, but identifying it as hyponym of {life_form...} also makes the sense. Based on this scenario, for the sake of simplicity, the system selects the first superordinate if more than one are presented.</Paragraph>
      <Paragraph position="11"> The process of generalizing rules consists of replacing each sp = (w,c,s,t) in the specific rules by a more general superordinate synset from its hypernym hierarchy in WordNet by performing the Generalize(s, h) function. The degree of generalization for rules varies with the variation of h in Generalize( sp, h ).</Paragraph>
      <Paragraph position="12"> Rule 3 in figure 3 shows an abstract generalized rule. The E symbol signifies the subsumption relationship. Therefore, a E b signifies that a is subsumed by b, or, in WordNet terms, concept b is a superordinate concept of concept a. The generalized rule states that the RHS of the rule gets executed if</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="64" end_page="68" type="metho">
    <SectionTitle>
5 Generalization Tree
</SectionTitle>
    <Paragraph position="0"> The generalization degree is adjustable by the user.</Paragraph>
    <Paragraph position="1"> Rules with different degrees of generalization on their different constituents will have a different behavior when processing new texts. Within a particular rule, the user might expect one entity to be relatively specific and the other entity to be more general. For example, if a user is interested in finding all DCR Inc. related jobs, he/she might want to hold the first entity as specific as that in rule 1 in figure 3, and generalize the third entity. We have designed a Generalization Tree (GT) to control the generalization degree.</Paragraph>
    <Paragraph position="2"> The rule generalization process with the help of GT is illustrated in figure 4. Each specific rule(as shown in rule 1 in figure 3) is generalized to its most general form(as shown in rule 4 in figure 3) by a generalization engine based on WordNet. Specifically, the generalization engine generalizes noun entities in the specific rule to their top hypernym in the hierarchies. The most general rule is applied again to the training corpus and some transitions are created. Some transitions are relevant, while others are not. Then the user employs our system to classify the created transitions as either acceptable or not. The statistical classifier calculates the relevancy_rate for each object, which will be described later. A database is maintained to keep the relevancy information for all the objects which activate the most general concept in the most general rule.</Paragraph>
    <Paragraph position="3"> This database is later automatically transformed to the GT structure. While maintaining the semantic relationships of objects as in WordNet, GTs collect the relevancy information of all activating objects and find the optimal level of generalization to fit the user's needs. The system will automatically adjust the generalization levels for each noun entity to match the desires of the user. The idea of this optimization process is to first keep recall as high as possible by applying the most general rules, then adjust the precision by tuning the rules based on the user's specific inputs.</Paragraph>
    <Section position="1" start_page="64" end_page="65" type="sub_section">
      <SectionTitle>
5.1 An Example of GT
</SectionTitle>
      <Paragraph position="0"> Suppose we apply rule 4 in figure 3 to the training corpus, and the entity three in the rule is activated by a set of objects shown in table 1. From a user interface and a statistical classifier, the relevancy_rate(re0 for each object can be calculated.</Paragraph>
      <Paragraph position="1"> rel(obj) = count of obj being relevant total count of occurence of obj As shown in table 1, for example, rel({analyst...}) = 80%, which indicates that when (entity} in the most general rule is activated by analyst, 80% of time it hits relevant information and 20% of time it hits irrelevant information. On the other hand, it suggests that if { entity} is replaced by the concept (analyst...}, a roughly 80% precision could be achieved in extracting the relevant information. The corresponding GT for table 1 is shown in figure 5.</Paragraph>
      <Paragraph position="2"> In GT, each activating object is the leaf node in the tree, with an edge to its immediate hypernym (parent). For each hypernym list in the database,</Paragraph>
      <Paragraph position="4"> there is a corresponding path in GT. Besides the ordinary hard edge represented by the solid line, there is a soft edge represented by the dotted line. Concepts connected by the soft edge are the same concepts. The only difference between them is that the leaf node is the actual activating object, while the internal node is the hypernym for other activating objects. Hard edges and soft edges only have representational difference, as to the calculation, they are treated the same. Each node has two other fields counts of occurrence and relevancy_rate. For the leaf nodes, those fields can be filled from the database directly. For internal nodes, the instantiation of these fields depends on its hyponym (children) nodes. The calculation will be described later.</Paragraph>
      <Paragraph position="5"> If the relevancy.xate for the root node { entity} is 82.3%, it indicates that, with the probability 82.3%, objects which activate { entity} are relevant. If the user is satisfied with this rate, then it's not necessary to alter the most general concept in the rule.</Paragraph>
      <Paragraph position="6"> If the user feels the estimated precision is too low, the system will go down the tree, and check the relevancy.xate in the next level. For example, if 87.5% is good enough, then the concept {life form, organism...} will substitute {entity} in the most general rule. If the precision is still too low, the system will go down the tree, find {adult..}, {applicant..}, and replace the concept { entity} in the most general rule with the union of these two concepts.</Paragraph>
    </Section>
    <Section position="2" start_page="65" end_page="66" type="sub_section">
      <SectionTitle>
5.2 Generalization Tree Model
</SectionTitle>
      <Paragraph position="0"> For the sake of simplicity, let's use x,, y,, zl to represent the rule constituents- object one, relation, ob- null ject two respectively. As shown in table 2, xo, yo, z0 are the concepts from the specific rule. At the moment, we only consider the generalization on the objects, zs and z~ are more general concepts than x0 and z0. x~ is the hypemym of xz-1 (i _&lt; n); z: is the hypernym of za-1 (j _&lt; m). Xn and Zm are the most general concepts for object one and object two respectively.</Paragraph>
      <Paragraph position="1"> For each object concept, a corresponding GT is created. Let's suppose xn is activated by q concepts el deg, e2 deg, .... edegq; the times of activation for each e~ deg are represented by c~. Since e~deg(i _&lt; q) activates xn, there o ~ e~ =~ .... =~ xn in Word- exists a hypernym list e, Net, where e~ is the immediate hypernym of e~ -1.</Paragraph>
      <Paragraph position="2"> The system maintains a database of activation information as shown in table 3, and builds GT from this database automatically.</Paragraph>
      <Paragraph position="3"> GT is an n-ary branching tree structure with the following properties: * Each node represents a concept, and each edge represents the hypernym relationship between the concepts. If e~ is the immediate hypernym of ca, then there is an edge between node e~ and ej. e~ is one level above ea in the tree.</Paragraph>
      <Paragraph position="4"> * The root node xn is the most general concept from the most general rule.</Paragraph>
      <Paragraph position="5"> * The leaf nodes .o ~0 .o el, ~2,...~q are the concepts which activate xn. The internal nodes are the concepts e~ (i ~ 0 and 1 &lt; j &lt; q) from the hypernym paths for the activating concepts.</Paragraph>
      <Paragraph position="6"> o has three fields-concept it- , Every leaf node e~ self e~ deg , counts and relevancy_rate, which can be obtained from the database:</Paragraph>
      <Paragraph position="8"> * Every internal node e has three fields-concept itself e, relevancy_rate and counts(e).</Paragraph>
      <Paragraph position="9"> For an internal node e, if it has n hyponyms eo, ...en then:</Paragraph>
      <Paragraph position="11"/>
    </Section>
    <Section position="3" start_page="66" end_page="66" type="sub_section">
      <SectionTitle>
5.3 Searching GT
</SectionTitle>
      <Paragraph position="0"> Depending on user's different needs, a threshold 9 is pre-selected. The system will start from the root node, go down the tree, and find all the nodes e, such that relevancy.rate(e~) &gt; 0. If a node relevancy_rate is higher than 9, its hyponym (children) nodes will be ignored. In this way, the system maintains a set of concepts whose relevancy_rate is higher than 8. By substituting xn in the most general rule with this set of concepts, an optimized rule is created to meet the user's needs.</Paragraph>
      <Paragraph position="1"> The searching algorithm is basically the breadth-first search as follows: 1. Initialize Optimal-Concepts to be empty set.</Paragraph>
      <Paragraph position="2"> Pre-select the threshold 0. If the user wants to get the relevant information and particularly cares about the precision, 0 should be set high; if the user wants to extract as much as information possible and does not care about the precision, 0 should be set low.</Paragraph>
      <Paragraph position="3"> 2. Starting from the root node x, perform the Recursive-Seareh algorithm, which is defined as the following:</Paragraph>
      <Paragraph position="5"> put x into Optimal-Concepts set; exit; ) else { let m denote the number of children nodes of x; let x, denote the child of x (0 &lt; i _&lt; m);</Paragraph>
      <Paragraph position="7"/>
    </Section>
    <Section position="4" start_page="66" end_page="68" type="sub_section">
      <SectionTitle>
5.4 Experiment and Discussion
</SectionTitle>
      <Paragraph position="0"> An experiment is conducted to test the applicability of GT in automatic information extraction. We trained our system on 24 articles from the triangle.jobs USENET newsgroups, and created 25 specific rules concerning the job position/title information. For example, in &amp;quot;DCR. is looking for software engineer&amp;quot;, software engineer is the position name.</Paragraph>
      <Paragraph position="1"> The specific rules then were generalized to their most general forms, and were applied again to the training set. After the user's selection of the relevant transitions, the system automatically generated a GT for each most general concept in the most general rule. We predefined the threshold to be 0.2, 0.4, 0.5, 0.6, 0.8, 0.9 and 1.0. Based on the different thresholds, the system generated different sets of optimized  rules. Those rules were then applied on 85 unseen articles from the domain.</Paragraph>
      <Paragraph position="2"> The evaluation process consists of the following step: first, each unseen article is studied to see if there is position/title information presented in the article; second, the semantic transitions produced by the system are examined to see if they correctly extract the position/title information. Precision is the number of transitions created which containing position/title information out of the total number of transitions produced by the system; recall is the number of articles which have been correctly extracted position/title information out of the total number of articles with position/title information.</Paragraph>
      <Paragraph position="3"> The overall performance of recall and precision is</Paragraph>
      <Paragraph position="5"> where P is precision, R is recall,/~ = 1 if precision and recall are equally important. The precision , recall and F-measurement curves with respect to the threshold for relevancy_rate are shown in figure 6.</Paragraph>
      <Paragraph position="6"> The detailed result is shown in table 4.</Paragraph>
      <Paragraph position="7">  The recall achieves the highest at 81.3% when 0 = 0.2. It gradually declines and reaches 66.7% when 8 = 1.0. As expected, the precision increases when 0 goes up. It ranges from very low at 33.3% (0 = 0.2) to very high at 94.6%( 0 = 1.0). The overall performance F-measurement goes up from 4?.2% to 78.7% when 0 increases. The result is consistent with our expectation. When the threshold is high, more tuning of the rules needs to be done, and the system is expected to perform better.</Paragraph>
      <Paragraph position="8"> Some problems were detected which prevent better performance of the system. The current domain is a newsgroup, where anyone can post anything which he/she believes is relevant to the newsgroup. It is inevitable that some typographical errors and some abbreviations occur in the articles. And the format of the article sometimes is unpredictable. The system performance is also hurt by the error in the partial parsing.</Paragraph>
      <Paragraph position="9"> In the experiment, we found that WordNet has about 90% coverage of verbs and nouns in this domain. Most nouns not in WordNet are proper nouns, and in this domain, mostly are company names, software names. This problem is solved by our Preprocessor, which identifies the proper nouns to be several semantic types, such as company name, software name, city name, and so on. However Some important domain specific nouns may not exist in WordNet. It would be nice if WordNet could provide the friendly interface for users to add the new words and create the links for their own applications. As to computational purpose, WordNet is well developed. Finding hypernym, synonyms...etc is very efficient. Training senses at the training process solves the most problems of sense disambiguation. However, some problems still remain. For example, if &amp;quot;better&amp;quot; is not trained in the training process, then by default, it will be assigned sense one, which is a subtype of a person. The hypernym list of &amp;quot;better&amp;quot; with sense one is {better} =~ {superior} {religion} ~ {Religionist} =~ {person}. But in the sentence &amp;quot;This position requires experience with 5.0 or better&amp;quot;, &amp;quot;better&amp;quot; should be used as sense two as in the hypernym list {better} =~ {good, goodness} :-~ {asset, plus) ~ {quality} {attribute} ~ {abstraction} . Despite occasional sense disambiguation problem, generally, WordNet provides a good method to achieve generalization in</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>