<?xml version="1.0" standalone="yes"?> <Paper uid="N03-1011"> <Title>Learning Semantic Constraints for the Automatic Discovery of Part-Whole Relations</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Lexico-syntactic patterns expressing meronymy </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Variety of meronymy expressions </SectionTitle> <Paragraph position="0"> Since there are many ways in which something can be part of something else, there is a variety of lexico-syntactic structures that can express the meronymy semantic relation. Expressions that reflect semantic relations are either explicit or implicit. The explicit ones are further broken down into unambiguous and ambiguous.</Paragraph> <Paragraph position="1"> A. Explicit part-whole constructions. There are unambiguous lexical expressions that always convey a part-whole relation. For example: The substance consists of two ingredients. The cloud was made of dust.</Paragraph> <Paragraph position="2"> Iceland is a member of NATO.</Paragraph> <Paragraph position="3"> In these cases the simple detection of the patterns leads to the discovery of part-whole relations.</Paragraph> <Paragraph position="4"> On the other hand, there are many ambiguous expressions that are explicit but convey part-whole relations only in some contexts. These expressions can be detected only with complex semantic constraints.</Paragraph> <Paragraph position="5"> Examples are: The horn is part of the car.</Paragraph> <Paragraph position="6"> (whereas &quot;He is part of the game&quot; is not meronymic).</Paragraph> <Paragraph position="7"> B. Implicit part-whole constructions. In addition to the explicit patterns, there are other patterns that express part-whole relations implicitly. Examples are: girl's mouth, eyes of the baby, door knob, oxygen-rich water, high heel shoes.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 An algorithm for finding lexico-syntactic patterns </SectionTitle> <Paragraph position="0"> In order to identify lexical forms that express part-whole relations, the following algorithm was used: Step 1. Pick pairs of WordNet concepts C1 and C2 among which there is a part-whole relation.</Paragraph> <Paragraph position="1"> We selected 100 pairs of part-whole concepts that were evenly distributed over all nine WordNet noun hierarchies. Step 2. Extract lexico-syntactic patterns that link the two selected concepts of each pair by searching a collection of texts.</Paragraph> <Paragraph position="2"> For each pair of part-whole concepts determined above, we search a collection of documents and retain only the sentences containing that pair. We chose two distinct text collections: SemCor 1.7 and the LA Times collection from TREC-9. From each collection 10,000 sentences were selected randomly. We manually inspected these sentences and picked only those in which the pairs referred to meronymy.</Paragraph>
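<Paragraph> The following is a minimal illustrative sketch of Steps 1 and 2, not the authors' implementation: it assumes NLTK's WordNet interface and a plain Python list of sentences standing in for the SemCor and LA Times collections, and it omits the even sampling over the nine noun hierarchies. The function names are our own.
  from nltk.corpus import wordnet as wn

  def part_whole_pairs(max_pairs=100):
      # Step 1: collect (part, whole) lemma pairs from WordNet meronymy links.
      pairs = []
      for whole in wn.all_synsets(pos=wn.NOUN):
          for part in whole.part_meronyms():
              pairs.append((part.lemma_names()[0], whole.lemma_names()[0]))
              if len(pairs) == max_pairs:
                  return pairs
      return pairs

  def sentences_with_pair(sentences, part, whole):
      # Step 2: retain only the sentences that mention both concepts of a pair.
      part_w = part.replace('_', ' ')
      whole_w = whole.replace('_', ' ')
      return [s for s in sentences if part_w in s.lower() and whole_w in s.lower()]
</Paragraph>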
<Paragraph position="3"> The result of this step is a list of lexico-syntactic expressions that reflect meronymy. From a syntactic point of view, these patterns can be classified into two major categories: (1) Phrase-level patterns, where the part and whole concepts are included in the same phrase. For example, in the pattern &quot;NP_X PP_Y&quot;, the noun phrase that contains the part (X) and the prepositional phrase that contains the whole (Y) form a noun phrase (NP). Throughout this paper, X represents the part, and Y represents the whole.</Paragraph> <Paragraph position="4"> (2) Sentence-level patterns, where the part-whole relation is intrasentential. A frequent example is the pattern &quot;NP_Y verb NP_X&quot;.</Paragraph> <Paragraph position="5"> From the 20,000 SemCor and LA Times sentences, 535 part-whole occurrences were detected. Of these, 493 (92.15%) were phrase-level patterns and only 42 were sentence-level patterns. There were 54 distinct meronymic lexico-syntactic patterns, of which 36 were phrase-level patterns and 18 were sentence-level patterns. The most frequent phrase-level patterns were &quot;NP_X of NP_Y&quot;, occurring 173 of 493 times (35%), and &quot;NP_Y 's NP_X&quot;, occurring 71 of 493 times (14%). The most frequent sentence-level pattern was &quot;NP_Y Verb NP_X&quot;, occurring 18 of 42 times (43%). These observations are consistent with the results in (Evens et al., 1980). Based on these statistics, we decided to focus in this paper only on the three patterns above. The problem, however, is that these are some of the most ambiguous part-whole relation patterns. For example, in addition to meronymic relations, the genitives can express POSSESSION (Mary's toy), KINSHIP (Mary's brother), and many other relations. The same is true for the &quot;NP_Y Verb NP_X&quot; patterns (&quot;Kate has green eyes&quot; is meronymic, while &quot;Kate has a cat&quot; is POSSESSION). As can be seen, the genitives and the have-verb patterns are ambiguous. Thus we need some semantic constraints to differentiate the part-whole relations from the other possible meanings these patterns may have.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Learning Semantic Constraints </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Approach </SectionTitle> <Paragraph position="0"> The learning procedure proposed here is supervised, since the learning algorithm is provided with a set of inputs along with the corresponding set of correct outputs.</Paragraph> <Paragraph position="1"> Based on a set of positive and negative meronymic training examples provided and annotated by the user, the algorithm creates a decision tree and a set of rules that classify new data. The rules produce constraints on the noun constituents of the lexical patterns.</Paragraph> <Paragraph position="2"> For the discovery of the semantic constraints we used C4.5 decision tree learning (Quinlan, 1993). The learned function is represented by a decision tree, or a set of if-then rules. The decision tree learning searches a complete hypothesis space from simple to complex hypotheses until it finds a hypothesis consistent with the data. Its bias is a preference for shorter trees that place high information gain attributes closer to the root. Errors in the training examples can be overcome by using separate training and test corpora, or by cross-validation techniques. C4.5 receives in general two input files: the NAMES file, defining the names of the attributes, the attribute values and the classes, and the DATA file, containing the examples.</Paragraph>
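<Paragraph> As a rough illustration (not taken from the paper), the two C4.5 input files for this task might look roughly as follows; the file names are hypothetical, and the attribute values are simplified to the names of roughly the nine WordNet 1.7 noun hierarchies rather than the sense-annotated classes used by the authors.
  meronymy.names:
    Yes, No.                  | the two target classes
    part_class:  entity, psychological_feature, abstraction, state, event,
                 act, group, possession, phenomenon.
    whole_class: entity, psychological_feature, abstraction, state, event,
                 act, group, possession, phenomenon.

  meronymy.data:
    entity, entity, Yes
    entity, abstraction, Yes
    abstraction, abstraction, No
</Paragraph>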
<Paragraph position="3"> The output of C4.5 consists of two types of files: the TREE file, containing the decision tree and some statistics, and the RULES file, containing the rules extracted from the decision tree and some statistics for the training and test data. This last file also contains a default rule that is usually used to classify unseen instances when no other rule applies.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Preprocessing Part-Whole Lexico-Syntactic Patterns </SectionTitle> <Paragraph position="0"> Since our constraint learning procedure is based on the semantic information provided by WordNet, we need to preprocess the noun phrases (NPs) extracted and identify the part and the whole. For each NP we keep only the largest word sequence (from left to right) that is defined in WordNet as a concept.</Paragraph> <Paragraph position="1"> For example, from the noun phrase &quot;brown carving knife&quot; the procedure retains only &quot;carving knife&quot;, as it is the WordNet concept with the largest number of words in the noun phrase. Each such concept is manually annotated with its corresponding sense in WordNet; for example, carving knife#1 denotes sense number 1.</Paragraph>
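<Paragraph> A minimal sketch of this preprocessing step, assuming NLTK's WordNet interface rather than the tools used by the authors; the function name is our own.
  from nltk.corpus import wordnet as wn

  def longest_wordnet_concept(np_tokens):
      # Return the longest word sequence in the NP that is a WordNet noun
      # concept, e.g. ['brown', 'carving', 'knife'] yields 'carving knife'.
      n = len(np_tokens)
      for length in range(n, 0, -1):              # longest spans first
          for start in range(0, n - length + 1):  # then left to right
              candidate = '_'.join(np_tokens[start:start + length])
              if wn.synsets(candidate, pos=wn.NOUN):
                  return candidate.replace('_', ' ')
      return None
</Paragraph>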
</Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Building the Training Corpus and the Test Corpus </SectionTitle> <Paragraph position="0"> In order to learn the constraints, we used the SemCor 1.7 and TREC-9 text collections. From the first two sets of the SemCor collection, 19,000 sentences were selected.</Paragraph> <Paragraph position="1"> Another 100,000 sentences were extracted from the LA Times articles of TREC-9. A corpus &quot;A&quot; was thus created from the selected sentences of each text collection. Each sentence in this corpus was then parsed using the syntactic parser developed by Charniak (Charniak, 2000).</Paragraph> <Paragraph position="2"> Focusing only on the sentences containing relations indicated by the three patterns considered, we manually annotated all the noun phrases in the 53,944 relationships matched by these patterns with their corresponding senses in WordNet (with the exception of those from SemCor). 6,973 of these relationships were part-whole relations, while 46,971 were not meronymic relations.</Paragraph> <Paragraph position="3"> For training we used a corpus of 34,609 positive examples (the 6,973 pairs of NPs in a part-whole relation extracted from corpus &quot;A&quot; and 27,636 pairs extracted from WordNet) and 46,971 negative examples (the non-part-whole relations extracted from corpus &quot;A&quot;).</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.4 Learning Algorithm </SectionTitle> <Paragraph position="0"> Input: positive and negative meronymic examples of pairs of concepts.</Paragraph> <Paragraph position="1"> Output: semantic constraints on concepts.</Paragraph> <Paragraph position="2"> Step 1. Generalize the training examples. Initially, the training corpus consists of examples that have the following format: (part#sense; whole#sense; target), where target can be either &quot;Yes&quot; or &quot;No&quot;, depending on whether the relation between the part and the whole is meronymy or not.</Paragraph> <Paragraph position="3"> For example, (oasis#1; desert#1; Yes) indicates that between oasis and desert there is a meronymic relation.</Paragraph> <Paragraph position="4"> From this initial set of examples an intermediate corpus was created by expanding each example to the following format: (part#sense, class_part; whole#sense, class_whole; target), where class_part and class_whole correspond to the WordNet semantic classes of the part and whole concepts, respectively. For instance, the initial example (aria#1; opera#1; Yes) becomes (aria#1, entity#1; opera#1, abstraction#6; Yes), since the part concept aria#1 belongs to the entity#1 hierarchy in WordNet and the whole concept opera#1 is part of the abstraction#6 hierarchy.</Paragraph> <Paragraph position="5"> From this intermediate corpus a generalized set of training examples was built, retaining only the semantic classes and the target value. At this point, the generalized training corpus contains three types of examples: (class_part; class_whole; Yes), (class_part; class_whole; No), and the ambiguous (class_part; class_whole; Yes/No).</Paragraph> <Paragraph position="6"> The third situation occurs when the training corpus contains both positive and negative examples for the same hierarchy types. For example, a negative relation (a POSSESSION) and a positive, truly meronymic relation whose concepts belong to the same hierarchies are both generalized to the more general type (entity#1; entity#1; Yes/No).</Paragraph>
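<Paragraph> A hedged sketch of Step 1, assuming NLTK's WordNet interface; in current WordNet releases all nouns share the single root entity#1, so the root hypernym only approximates the nine semantic classes of WordNet 1.7 used here. The function names and tuple encoding are our own.
  from nltk.corpus import wordnet as wn

  def semantic_class(concept, sense):
      # Map a sense-annotated concept such as ('oasis', 1) to the top of its
      # WordNet noun hierarchy (its furthest ancestor).
      synset = wn.synsets(concept.replace(' ', '_'), pos=wn.NOUN)[sense - 1]
      return synset.root_hypernyms()[0].lemma_names()[0]

  def generalize(examples):
      # examples: (part, part_sense, whole, whole_sense, target) tuples.
      # Returns {(class_part, class_whole): set of targets}; {'Yes'} or {'No'}
      # is an unambiguous example, {'Yes', 'No'} an ambiguous one.
      generalized = {}
      for part, p_sense, whole, w_sense, target in examples:
          key = (semantic_class(part, p_sense), semantic_class(whole, w_sense))
          generalized.setdefault(key, set()).add(target)
      return generalized

  # e.g. generalize([('oasis', 1, 'desert', 1, 'Yes')])
</Paragraph>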
<Paragraph position="7"> Step 2. Learning constraints for unambiguous examples. For the unambiguous examples in the generalized training corpus (those that are either positive or negative), constraints are determined using C4.5. In this context, the features are the components of the relation (the part and the whole, respectively) and the values of the features are their corresponding WordNet semantic classes (the furthest ancestor in WordNet of the corresponding concept). With the first two types of examples, the unambiguous ones, a new training corpus was created, on which we applied C4.5 using 10-fold cross-validation. The output is represented by 10 sets of rules generated from these unambiguous examples.</Paragraph> <Paragraph position="8"> The rules in each set were ranked according to their frequency of occurrence and the average accuracy obtained for that particular set. In order to use the best rules, we decided to keep only the ones that had a frequency above a threshold (occurring in at least 7 of the 10 sets of rules) and an average accuracy greater than 50%.</Paragraph> <Paragraph position="9"> Step 3. Specialize the ambiguous examples. A part of the generalized training corpus contains ambiguous examples. These examples refer to the same semantic classes in WordNet, but their target value is in some cases &quot;Yes&quot; and in others &quot;No&quot;. Since C4.5 cannot be applied in this situation, we recursively specialize these examples to eliminate the ambiguity.</Paragraph> <Paragraph position="10"> The specialization procedure is based on the IS-A information provided by WordNet. Initially, each semantic class represented the root of one of the noun hierarchies in WordNet. By specialization, a semantic class is replaced with its first hyponym, i.e., the concept immediately below it in the hierarchy. For this task, we considered again the intermediate training corpus of examples.</Paragraph> <Paragraph position="11"> For instance, ambiguous examples such as (apartment#1; ...; No) and (...; ...; Yes), which in the generalized corpus both map to (entity#1; entity#1; Yes/No), are rewritten as the less ambiguous examples (whole#2; causal agent#1; No) and (part#7; causal agent#1; Yes). This way, we specialize the ambiguous examples with more specific values for the attributes. The specialization process for this particular example is shown in Figure 1 (specialization of the semantic classes).</Paragraph> <Paragraph position="12"> Although this specialization procedure eliminates a part of the ambiguous examples, there is no guarantee it will work for all the ambiguous examples of this type. This is because the specialization splits the initial hierarchy into smaller distinct subhierarchies, and thus the examples are distributed over this new set of subhierarchies. For the examples described above, the procedure eliminates the ambiguity through specialization of the semantic classes into two new ones: whole - causal agent and part - causal agent, respectively. However, if the training corpus contained conflicting examples involving, e.g., leg#2, the procedure specializes them into the ambiguous example (part#7; organism#1; Yes/No) and the ambiguity still remains.</Paragraph> <Paragraph position="13"> Steps 2 and 3 are repeated until there are no more ambiguous examples. The general architecture of this procedure is shown in Figure 2.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.5 The Constraints </SectionTitle> <Paragraph position="0"> Table 1 summarizes the constraints learned by the program. The meaning of a constraint with the part Class-X, the whole Class-Y and the value 1 is &quot;if Part is a Class-X and Whole is a Class-Y then it is a part-whole relation&quot;, and for the value 0 it is &quot;if Part is a Class-X and Whole is a Class-Y then it is not a part-whole relation&quot;. For example, &quot;if Part is an entity#1 and the Whole is a whole#2 then it is not a part-whole relation&quot; (whole#2 is the WordNet concept meaning &quot;an assemblage of parts that is regarded as a single entity&quot;).</Paragraph> <Paragraph position="1"> When forming larger, more complex rules, if the part and the whole contain more than one value, one of these values is negated (preceded by !). For example, for the part object#1 and the whole organism#1 the constraint is &quot;if the Part is object#1 and not substance#1 and not natural object#1, and the Whole is organism#1 and not plant#2 and not animal#1, then NO part-whole relation&quot;.</Paragraph>
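<Paragraph> A hedged sketch of how constraints of this form could be applied to a new pair of concepts; the rule encoding (lists of required and negated classes along each concept's hypernym path) is our own illustration, not the format produced by C4.5, and only the two rules quoted above are shown.
  def matches(concept_classes, required, negated):
      # concept_classes: WordNet classes found on the concept's hypernym path.
      return (all(c in concept_classes for c in required)
              and all(c not in concept_classes for c in negated))

  RULES = [
      # if Part is entity#1 and Whole is whole#2, then not a part-whole relation
      ((['entity#1'], []), (['whole#2'], []), 'No'),
      # if Part is object#1 and not substance#1 and not natural object#1, and
      # Whole is organism#1 and not plant#2 and not animal#1, then No
      ((['object#1'], ['substance#1', 'natural object#1']),
       (['organism#1'], ['plant#2', 'animal#1']),
       'No'),
  ]

  def classify(part_classes, whole_classes):
      for (p_req, p_neg), (w_req, w_neg), target in RULES:
          if matches(part_classes, p_req, p_neg) and matches(whole_classes, w_req, w_neg):
              return target
      return None   # no learned constraint applies
</Paragraph>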
</Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Results for discovering part-whole relations </SectionTitle> <Paragraph position="0"> To validate the constraints for extracting part-whole relations, a new test corpus &quot;B&quot; was created from another 10,000 sentences of TREC-9 LA Times news articles. This corpus was parsed and disambiguated using a Word Sense Disambiguation system that has an accuracy of 81% when disambiguating nouns in open-domain text (Mihalcea and Moldovan, 2001). The results provided by the part-whole relation discovery procedure were validated by a human annotator.</Paragraph> <Paragraph position="1"> Let us define the precision and recall performance metrics in this context: precision is the number of correctly retrieved part-whole relations divided by the total number of relations retrieved by the system, and recall is the number of correctly retrieved part-whole relations divided by the number of part-whole relations in the test corpus. On the test corpus there were 119 meronymy relations expressed by the three patterns considered. The system retrieved 140 relations, of which 117 were meronymy relations and 23 were non-meronymy relations, yielding a precision of 83% (117/140) and a recall of 98% (117/119). Table 2 shows the results obtained for each of the three patterns and for all of them combined.</Paragraph> <Paragraph position="2"> However, there were another 43 meronymy relations in the corpus expressed by lexico-syntactic patterns other than the three considered in this paper, yielding a global meronymy coverage (recall) of 72% (117/162). The errors are explained mostly by the fact that the genitives and the verb have are very ambiguous. These lexico-syntactic patterns encode numerous relations which are very difficult to disambiguate based only on the nouns they connect. The errors were also caused by the incorrect parsing of a few s-genitives, the use of rules with smaller accuracy (e.g., 50%), the wrong word sense disambiguation of some concepts, and the lack of named entity coverage in WordNet (e.g., proper names of persons, places, etc.).</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Application to Question Answering </SectionTitle> <Paragraph position="0"> The part-whole semantic relation occurs with high frequency in open text. Its discovery is paramount for many applications. In this section we mention only Question Answering. For many questions, such as &quot;What parts does General Electric manufacture?&quot;, &quot;What are the components of X?&quot;, &quot;What is Y made of?&quot;, etc., the discovery of part-whole relations is necessary to assemble the right answer.</Paragraph> <Paragraph position="1"> The concepts and part-whole relations acquired from a collection of documents can be useful in answering difficult questions that normally cannot be handled based solely on keyword matching and proximity. As the level of difficulty increases, Question Answering systems need richer semantic resources, including ontologies and larger knowledge bases. Consider the question: What does the AH-64A Apache helicopter consist of? For questions like this, the system must extract all the components the helicopter has. Unless an ontology of such army attack helicopter parts exists in the knowledge base, which in an open-domain situation is highly unlikely, the system must first acquire from the document collection all the direct and indirect pieces the helicopter is made of. These parts can be scattered all over the text collection, so the Question Answering system has to gather these partial answers into a single, concise hierarchy of parts. This technique is called Answer Fusion.</Paragraph>
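<Paragraph> A small sketch of the fusion step: gathering the extracted part-whole relations for a question focus into a nested hierarchy of direct and indirect parts. The relation pairs in the usage example are taken from the text below; the function name is our own, and the sketch assumes the extracted relations form an acyclic hierarchy.
  def build_part_hierarchy(relations, focus):
      # relations: iterable of (whole, part) pairs extracted from the collection.
      children = {}
      for whole, part in relations:
          children.setdefault(whole, []).append(part)
      def expand(concept):
          # recursively attach the parts of each part (indirect parts)
          return {p: expand(p) for p in children.get(concept, [])}
      return {focus: expand(focus)}

  # e.g. build_part_hierarchy(
  #     [('AH-64A Apache helicopter', 'Hellfire air-to-surface missile'),
  #      ('AH-64A Apache helicopter', 'rotating turret'),
  #      ('AH-64A Apache helicopter', 'tandem cockpit')],
  #     'AH-64A Apache helicopter')
</Paragraph>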
<Paragraph position="2"> Using a state-of-the-art Question Answering system (Harabagiu et al., 2001) adapted for Answer Fusion (Girju, 2001) and including a meronymy module, the question presented above was answered by searching the Internet at the website for Defence Industries - Army (www.army-technology.com). The system started with the question focus AH-64A Apache helicopter and extracted and disambiguated all the meronymy relations using the part-whole module. The following taxonomic ontology was created for this question; it includes, among other parts: Longbow millimetre wave fire control radar, integrated radar frequency interferometer, rotating turret, tandem cockpit, and Kevlar seats. For example, the relation &quot;AH-64A Apache helicopter has part Hellfire air-to-surface missile&quot; was determined from the sentence &quot;AH-64A Apache helicopter has a Longbow-millimeter wave fire control radar and a Hellfire air-to-surface missile&quot;. For validation, only the heads of the noun phrases were considered, as they occur in WordNet (i.e., helicopter and air-to-surface missile, respectively).</Paragraph> </Section> </Paper>