<?xml version="1.0" standalone="yes"?> <Paper uid="P03-1029"> <Title>An Improved Extraction Pattern Representation Model for Automatic IE Pattern Acquisition</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Subtree model </SectionTitle> <Paragraph position="0"> Our research on improved representation models for extraction patterns is motivated by the limitations of prior extraction pattern representations. In this section, we review two of the previous models in detail, namely the Predicate-Argument model (Yangarber et al., 2000) and the Chain model (Sudo et al., 2001).</Paragraph> <Paragraph position="1"> The main difficulty in finding entities with extraction patterns is that the participating entities can appear not only as arguments of the predicate that describes the event type, but also elsewhere in the sentence or in the prior text. In the MUC-3 terrorism scenario, WEAPON entities occur in many different relations to event predicates in the documents. Even when WEAPON entities appear in the same sentence as the event predicate, they rarely serve as a direct argument of that predicate (e.g., &quot;One person was killed as the result of a bomb explosion.&quot;).</Paragraph> <Paragraph position="2"> Predicate-Argument model The Predicate-Argument model is based on a direct syntactic relation between a predicate and its arguments[1] (Yangarber et al., 2000). In general, a predicate provides a strong context for its arguments, which leads to good accuracy. However, this model has two major limitations on its coverage: clausal boundaries and entities embedded inside a predicate's arguments.</Paragraph> <Paragraph position="3"> Figure 1[2] shows an example of an extraction task in the terrorism domain, where the event template consists of perpetrator, date, location and victim.</Paragraph> <Paragraph position="4"> With extraction patterns based on the Predicate-Argument model, only the perpetrator and victim can be extracted. The location (downtown Jerusalem) is embedded as a modifier of the noun (heart) within the prepositional phrase, which is an adjunct of the main predicate, triggered.[3] Furthermore, it is not clear whether the extracted entities are related to the same event, because of the clausal boundaries.[4]</Paragraph> <Paragraph position="5"> [1] Since the case marking for a nominalized predicate differs significantly from that of a verbal predicate, which makes it hard to regularize nominalized predicates automatically, the constraint for the Predicate-Argument model requires the root node to be a verbal predicate. [2] Throughout this paper, extraction patterns are defined as one or more word classes with their context in the dependency tree, where the actual word matched with the class is associated with one of the slots in the template. The notation of the patterns in this paper is based on a dependency tree, where (h (a_1-r_1)..(a_n-r_n)) denotes that h is the head and, for each i in 1..n, a_i is its argument and the relation between h and a_i is labeled r_i. The labels introduced in this paper are SBJ (subject), OBJ (object), ADV (adverbial adjunct), REL (relative), APPOS (apposition) and prepositions (IN, OF, etc.). Also, we assume that the order of the arguments does not matter. Symbols beginning with C- represent NE (Named Entity) types.</Paragraph>
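Footnote [2]'s notation is easy to misread, so here is a minimal Python sketch of it (ours, not the authors'; `Node` and `matches` are illustrative names): a pattern is a head with labeled, order-insensitive argument subtrees, matched top-down against a dependency tree whose named entities have already been replaced by their NE classes.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One dependency-tree node: a word or an NE class (e.g. 'C-PERSON'),
    with arguments reached over labeled relations (SBJ, OBJ, ADV, ...)."""
    label: str
    args: dict = field(default_factory=dict)  # relation label -> list of child Nodes

def matches(pattern: Node, tree: Node) -> bool:
    """True if `pattern` matches at the root of `tree`: the heads agree and
    every pattern argument matches some tree argument under the same
    relation; argument order is ignored, per footnote [2]."""
    if pattern.label != tree.label:
        return False
    return all(
        any(matches(p, t) for t in tree.args.get(rel, []))
        for rel, p_args in pattern.args.items()
        for p in p_args
    )

# The Section 2 pattern (triggered (explosion-OBJ)(<C-DATE>-ADV)):
pattern = Node("triggered", {"OBJ": [Node("explosion")], "ADV": [Node("C-DATE")]})
```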
<Paragraph position="6"> Figure 1 (caption, partially recoverable): (a) JERUSALEM, March 21 - A smiling Palestinian suicide bomber triggered a massive explosion in the heavily policed heart of downtown Jerusalem today, killing himself and three other people and injuring scores. (b) [the dependency tree of (a), in which the contributing nodes] are shaded in the tree. (c) Predicate-Argument patterns and Chain-model patterns that contribute to the extraction task. (d) Subtree-model patterns that contribute to the extraction task.</Paragraph> <Paragraph position="7"> Chain model Our previous work, the Chain model (Sudo et al., 2001),[5] attempts to remedy the limitations of the Predicate-Argument model. The extraction patterns generated by the Chain model are any chain-shaped paths in the dependency tree.[6] Thus it successfully avoids the clausal-boundary and embedded-entity limitations. We reported a 5% gain in recall at the same precision level on the MUC-6 management succession task compared to the Predicate-Argument model.</Paragraph> <Paragraph position="8"> However, the Chain model also has its own weakness in terms of accuracy, due to its lack of context. For example, in Figure 1(c), (triggered (⟨C-DATE⟩-ADV)) is needed to extract the date entity. However, the same pattern is likely to apply to texts in other domains as well, such as &quot;The Mexican peso was devalued and triggered a national financial crisis last week.&quot;</Paragraph> <Paragraph position="9"> Subtree model The Subtree model is a generalization of the previous models, such that any subtree of a dependency tree in the source sentence can be regarded as an extraction pattern candidate. As shown in Figure 1(d), the Subtree model, by definition, contains all the patterns permitted by either the Predicate-Argument model or the Chain model. It is also capable of providing more relevant context, such as (triggered (explosion-OBJ)(⟨C-DATE⟩-ADV)). The obvious advantage of the Subtree model is the flexibility it affords in creating suitable patterns, spanning multiple levels and multiple branches. Pattern coverage is further improved by relaxing the constraint that the root of the pattern tree be a predicate node. However, this flexibility can also be a disadvantage, since it means that a very large number of pattern candidates -- all possible subtrees of the dependency tree of each sentence in the corpus -- must be considered, and an efficient procedure is required to select the appropriate patterns from among the candidates.</Paragraph> <Paragraph position="10"> Also, as the number of pattern candidates increases, the amount of noise and complexity increases. In particular, many of the pattern candidates overlap one another. For a given set of extraction patterns, if pattern A subsumes pattern B (say, A is (shoot (⟨C-PERSON⟩-OBJ)(to death)) and B is (shoot (⟨C-PERSON⟩-OBJ))), then pattern matching with A contributes nothing further to extraction, since every match of pattern A is already covered by pattern B (see the sketch below).</Paragraph> <Paragraph position="11"> [4] ...as &quot;triggering an explosion is related to killing or injuring and therefore constitutes one terrorism action.&quot; [5] Originally we called it the &quot;Tree-Based Representation of Patterns&quot;; we renamed it to avoid confusion with the proposed approach, which is also based on dependency trees. [6] (Sudo et al., 2001) required the root node of the chain to be a verbal predicate, but we have relaxed that constraint for our experiments.</Paragraph>
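The subsumption relation between overlapping patterns (pattern A versus pattern B above) can be made concrete by continuing the sketch from earlier in this section: A subsumes B exactly when B is embedded in A at the shared root, so any sentence matched by the richer pattern A is already matched by B. The TO relation label below is our guess for the &quot;to death&quot; adjunct.

```python
def subsumes(a: Node, b: Node) -> bool:
    """True if pattern `a` subsumes pattern `b`: `b` is embedded in `a`
    at the root, so every match of `a` is already covered by `b` and
    `a` adds context but no extraction coverage."""
    return matches(b, a)  # treat a's own tree as text and try to match b on it

a = Node("shoot", {"OBJ": [Node("C-PERSON")], "TO": [Node("death")]})
b = Node("shoot", {"OBJ": [Node("C-PERSON")]})
assert subsumes(a, b) and not subsumes(b, a)
```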
<Paragraph position="12"> Therefore, we need to pay special attention to the ranking function for pattern candidates, so that patterns with more relevant context receive higher scores.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Acquisition Method </SectionTitle> <Paragraph position="0"> This section discusses an automatic procedure for learning extraction patterns. Given a narrative description of the scenario and a set of source documents, the following three stages yield the relevant extraction patterns for the scenario: preprocessing, document retrieval, and ranking of pattern candidates.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Stage 1: Preprocessing </SectionTitle> <Paragraph position="0"> Morphological analysis and Named Entity (NE) tagging are performed at this stage.[7] Then all the sentences are converted into dependency trees by an appropriate dependency analyzer.[8] The NE tagging replaces named entities by their class, so the resulting dependency trees contain some NE class names as leaf nodes. This is crucial to identifying common patterns, and to applying these patterns to new text. [8] ..., from lexicalized dependency to chunk-level dependency. For the following experiment in Japanese, we define a node in the dependency tree as a bunsetsu (phrasal unit).</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Stage 2: Document Retrieval </SectionTitle> <Paragraph position="0"> The procedure retrieves a set of documents that describe the events of the scenario of interest, the relevant document set. A set of narrative sentences describing the scenario is selected to create a query for the retrieval. Any IR system of sufficient accuracy can be used at this stage. For this experiment, we retrieved the documents using CRL's stochastic-model-based IR system (Murata et al., 1999).</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Stage 3: Ranking Pattern Candidates </SectionTitle> <Paragraph position="0"> Given the dependency trees of the parsed sentences in the relevant document set, all possible subtrees are candidates for extraction patterns. The ranking of pattern candidates is inspired by TF/IDF scoring in the IR literature: a pattern is more relevant when it appears more often in the relevant document set and less often across the entire collection of source documents.</Paragraph> <Paragraph position="1"> The rightmost-expansion-based subtree discovery algorithm (Abe et al., 2002) was implemented to calculate the term frequency (raw frequency of a pattern) and the document frequency (the number of documents in which a pattern appears) for each pattern candidate. The algorithm finds the subtrees appearing more frequently than a given threshold by constructing the subtrees level by level, while keeping track of their occurrences in the corpus. Thus, it efficiently avoids constructing duplicate patterns and runs almost linearly in the total size of the maximal tree patterns contained in the corpus.</Paragraph> <Paragraph position="2"> The following ranking function was used to rank each pattern candidate. The score of subtree t is

score(t) = ( f_R(t) / Σ_{u∈T_R} f_R(u) ) · ( log(N / df(t)) )^β    (1)

where f_R(t) is the number of times that subtree t appears across the documents in the relevant document set R, T_R is the set of subtrees that appear in R, df(t) is the number of documents in the collection containing t, and N is the total number of documents in the collection. The first term roughly corresponds to the term frequency and the second term to the inverse document frequency in TF/IDF scoring. β is used to control the weight on the IDF portion of this scoring function.</Paragraph>
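Equation (1) as written above is our reconstruction from the surviving variable definitions, so take the exact normalization as an assumption; under that reading, the score is a one-liner once the discovery step has produced the counts. A sketch:

```python
import math

def score(f_r: int, sum_f_r: int, df: int, n: int, beta: float) -> float:
    """Equation (1), as reconstructed: relative frequency of the subtree
    in the relevant set R, times an IDF term raised to beta (beta
    controls how scenario-specific a pattern must be to rank high)."""
    tf = f_r / sum_f_r      # f_R(t) / sum of f_R(u) over u in T_R
    idf = math.log(n / df)  # log(N / df(t))
    return tf * idf ** beta

# Illustrative numbers only: a pattern seen 42 times in R, occurring in
# 12 of 117,109 collection documents, with beta = 2.0:
print(score(42, 10_000, 12, 117_109, 2.0))
```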
</Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.4 Parameter Tuning for Ranking Function </SectionTitle> <Paragraph position="0"> The β in Equation (1) parameterizes the weight on the IDF portion of the ranking function. As we pointed out in Section 2, we need to pay special attention to overlapping patterns: the more relevant context a pattern contains, the higher it should be ranked. The weight β serves to focus on how specific a pattern is to a given scenario. Therefore, for a high β value, (triggered (explosion-OBJ)(⟨C-DATE⟩-ADV)) is ranked higher than (triggered (⟨C-DATE⟩-ADV)) in the terrorism scenario, for example. Figure 2 shows the improvement in extraction performance obtained by tuning β on the entity extraction task, which will be discussed in the next section.</Paragraph> <Paragraph position="1"> For unsupervised tuning of β, we used a pseudo-extraction task instead of held-out data for supervised learning. We used an unsupervised version of the text classification task to optimize β, assuming that all the documents retrieved by the IR system are relevant to the scenario and that a pattern set which performs well on the text classification task also works well on the entity extraction task.</Paragraph> <Paragraph position="2"> The unsupervised text classification task measures how closely a pattern matching system, given a set of extraction patterns, simulates the document retrieval of the same IR system as in the previous subsection. The β value is optimized so that the cumulative performance of the precision-recall curve over the entire range of recall for the text classification task is maximized.</Paragraph> <Paragraph position="3"> The document set for text classification is composed of the documents retrieved by the same IR system as in Section 3.2, plus the same number of randomly chosen documents, where all the documents are taken from a document set different from the one used for pattern learning. The pattern matching system, given a set of extraction patterns, classifies a document as retrieved if any of the patterns match any portion of the document, and as random otherwise. Thus, we can obtain the text classification performance of the pattern matching system in the form of a precision-recall curve, without any supervision.</Paragraph> <Paragraph position="4"> Next, the area under the precision-recall curve is computed by connecting every point in the curve from 0 to the maximum recall the pattern matching system reached, and we compare the area for each possible β value. Finally, the β value that yields the greatest area under the precision-recall curve is used for extraction.</Paragraph> <Paragraph position="5"> A comparison with the same procedure based on the precision-recall curve of the actual extraction performance shows that this tuning correlates highly with extraction performance (Spearman correlation coefficient, significant at the 2% level).</Paragraph>
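The area comparison in this tuning loop reduces to a trapezoidal integral over the (recall, precision) points of the text-classification curve; a sketch, where `pr_curve_for` is an assumed callback that runs the pattern matcher for one β and returns its curve:

```python
def pr_area(points: list[tuple[float, float]]) -> float:
    """Area under a precision-recall curve, connecting every (recall,
    precision) point from recall 0 up to the maximum recall reached
    (the first precision value is extended back to recall 0)."""
    pts = sorted(points)
    area, last_r, last_p = 0.0, 0.0, pts[0][1]
    for r, p in pts:
        area += (r - last_r) * (p + last_p) / 2.0  # trapezoid slice
        last_r, last_p = r, p
    return area

def tune_beta(betas: list[float], pr_curve_for) -> float:
    """Return the beta whose pattern set maximizes the curve area."""
    return max(betas, key=lambda b: pr_area(pr_curve_for(b)))
```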
</Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.5 Filtering </SectionTitle> <Paragraph position="0"> For efficiency, and to eliminate low-frequency noise, we filtered out the pattern candidates that appear in fewer than 3 documents across the entire collection. Also, since patterns with too much context are unlikely to match new text, we added another filtering criterion based on the number of nodes in a pattern candidate: the maximum number of nodes is 8.</Paragraph> <Paragraph position="1"> Since all the slot-fillers in the extraction task of our experiment are assumed to be instances of the 150 classes in the extended Named Entity hierarchy (Sekine et al., 2002), further filtering was done by requiring a pattern candidate to contain at least one Named Entity class.</Paragraph>
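The three filters of Section 3.5 compose into a single predicate over pattern candidates; a sketch reusing the `Node` type from Section 2 (the NE class set is an illustrative stand-in for the 150-class extended hierarchy):

```python
NE_CLASSES = {"C-PERSON", "C-ORG", "C-POST", "C-DATE"}  # illustrative subset

def nodes(t: Node):
    """Iterate over all nodes of a pattern tree."""
    yield t
    for children in t.args.values():
        for child in children:
            yield from nodes(child)

def keep(pattern: Node, df: int) -> bool:
    """Section 3.5 filters: document frequency at least 3, at most
    8 nodes, and at least one extended-NE class among the nodes."""
    return (df >= 3
            and sum(1 for _ in nodes(pattern)) <= 8
            and any(nd.label in NE_CLASSES for nd in nodes(pattern)))
```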
</Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Experiment </SectionTitle> <Paragraph position="0"> The experiment in this study focuses on comparing the performance of the earlier extraction pattern models with that of the proposed Subtree model (SUBT). The compared models are the direct Predicate-Argument model (PA)[9] and the Chain model (CH) of (Sudo et al., 2001).</Paragraph> <Paragraph position="1"> The task for this experiment is entity extraction: identifying all the entities participating in relevant events in a set of given Japanese texts. Note that all NEs in the test documents were identified manually, so that the task measures only how well extraction patterns can distinguish the participating entities from entities that are not related to any event. This task does not involve grouping entities associated with the same event into a single template, in order to avoid any effect of merging failures on the extraction performance for entities. We accumulated a test set of documents for two scenarios: the Management Succession scenario of (MUC-6, 1995), with a simpler template structure, where corporate managers assumed and/or left their posts; and the Murderer Arrest scenario, where a law enforcement organization arrested a murder suspect.</Paragraph> <Paragraph position="2"> The source document set from which the extraction patterns are learned consists of 117,109 Mainichi Newspaper articles from 1995. All the sentences were morphologically analyzed by JUMAN (Kurohashi, 1997) and converted into dependency trees by KNP (Kurohashi and Nagao, 1994).</Paragraph> <Paragraph position="3"> Regardless of the extraction pattern model, pattern acquisition follows the procedure described in Section 3. We retrieved 300 documents as the relevant document set.</Paragraph> <Paragraph position="4"> The association between NE classes and slots in the template is made automatically: Person, Organization and Post (slots) correspond to C-PERSON, C-ORG and C-POST (NE classes), respectively, in the Succession scenario, and Suspect, Arresting Agency, ... correspond to ... in the Arrest scenario.[10]</Paragraph> <Paragraph position="5"> Table 1 (caption partially recoverable: &quot;... the Murderer Arrest scenario&quot;): Succession - [A relevant document must describe a succes]sion at the level of executives of a company. The topic of interest should not be limited to promotions inside the company mentioned, but also includes hiring executives from outside the company and their resignations. Arrest - A relevant document must describe the arrest of a murder suspect. The document should be regarded as interesting if it discusses a suspect under suspicion for multiple crimes including murder, such as murder-robbery.</Paragraph> <Paragraph position="6"> For each model, after filtering we get a list of the pattern candidates ordered by the ranking function discussed in Section 3.3. The performance is shown (Figure 3) as a precision-recall graph for each subset of the top-n ranked patterns, where n ranges from 1 to the number of pattern candidates. The test set was accumulated from Mainichi Newspaper in 1996 by a simple keyword search, with some additional irrelevant documents (see Table 1 for details).</Paragraph> <Paragraph position="7"> Figure 3(a) shows the precision-recall curve of the top-n relevant extraction patterns for each model on the Succession scenario. At lower recall levels (up to 35%), all the models performed similarly. However, the precision of Chain patterns dropped suddenly by 20% at the 38% recall level, while the SUBT patterns kept their precision significantly higher than that of Chain patterns until reaching 58% recall. Even after SUBT hit its drop at 56%, SUBT remained a few percent higher in precision than Chain patterns at most recall levels. Figure 3(a) also shows that although PA keeps high precision at low recall levels, it has a significantly lower recall ceiling (52%) than the other models.</Paragraph> <Paragraph position="8"> Figure 3(b) shows the extraction performance on the Arrest scenario task. Again, the Predicate-Argument model has a much lower recall ceiling (25%). The difference in performance between the Subtree model and the Chain model is not as pronounced as in the Succession task. However, it is still observable that the Subtree model gains a few percent in precision over the Chain model at recall levels around 40%. A possible explanation for the subtler performance difference in this scenario is the smaller number of contributing patterns compared to the Succession scenario.</Paragraph> <Paragraph position="9"> [9] ...constrained to have a single place-holder for each pattern, while (Yangarber et al., 2000) allowed more than one place-holder. However, the difference does not matter for the entity extraction task, which does not require merging entities into a single template. [10] Since there is no subcategory of C-PERSON to distinguish Suspect and victim (which is not extracted in this experiment) for the Arrest scenario, the learned pattern candidates may extract victims as Suspect entities by mistake.</Paragraph> </Section> </Paper>