
<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2094">
  <Title>On-Demand Information Extraction</Title>
  <Section position="5" start_page="732" end_page="733" type="metho">
    <SectionTitle>
3 Details of Components
</SectionTitle>
    <Paragraph position="0"> In this section, four important components will be described in detail. Prior work related to each component is explained and the techniques used in our system are presented.</Paragraph>
    <Section position="1" start_page="732" end_page="732" type="sub_section">
      <SectionTitle>
3.1 Pattern Discovery
</SectionTitle>
      <Paragraph position="0"> The pattern discovery component is responsible for discovering salient patterns for the topic. The patterns will be extracted from the documents relevant to the topic which are gathered by an IR system.</Paragraph>
      <Paragraph position="1"> Several unsupervised pattern discovery techniques have been proposed, e.g. (Riloff 96), (Agichtein and Gravano 00) and (Yangarber et al. 00). Most recently we (Sudo et al. 03) proposed a method which is triggered by a user query to discover important patterns fully automatically. In this work, three different representation models for IE patterns were compared, and the sub-tree model was found more effective compared to the predicate-argument model and the chain model. In the sub-tree model, any connected part of a dependency tree for a sentence can be considered as a pattern. As it counts all possible sub-trees from all sentences in the retrieved documents, the computation is very expensive. This problem was solved by requiring that the sub-trees contain a predicate (verb) and restricting the number of nodes. It was implemented using the sub-tree counting algorithm proposed by (Abe et al. 02).</Paragraph>
      <Paragraph position="2"> The patterns are scored based on the relative frequency of the pattern in the retrieved documents</Paragraph>
      <Paragraph position="4"> ) and in the entire corpus (f all ). The formula uses the TF/IDF idea (Formula 1). The system ignores very frequent patterns, as those patterns are so common that they are not likely to be important to any particular topic, and also very rare patterns, as most of those patterns are noise.</Paragraph>
      <Paragraph position="6"> The scoring function sorts all patterns which contain at least one extended NE and the top 100 patterns are selected for later processing. Figure 2 shows examples of the discovered patterns for the &amp;quot;merger and acquisition&amp;quot; topic. Chunks are shown in brackets and extended NEs are shown in upper case words. (COM means &amp;quot;company&amp;quot; and MNY</Paragraph>
    </Section>
    <Section position="2" start_page="732" end_page="733" type="sub_section">
      <SectionTitle>
3.2 Paraphrase Discovery
</SectionTitle>
      <Paragraph position="0"> The role of the paraphrase discovery component is to link the patterns which mean the same thing for the task. Recently there has been a growing amount of research on automatic paraphrase discovery. For example, (Barzilay 01) proposed a method to extract paraphrases from parallel translations derived from one original document. We proposed to find paraphrases from multiple newspapers reporting the same event, using shared Named Entities to align the phrases (Shinyama et al. 02). We also proposed a method to find paraphrases in the context of two Named Entity instances in a large un-annotated corpus (Sekine 05).</Paragraph>
      <Paragraph position="1"> The phrases connecting two NEs are grouped based on two types of evidence. One is the identity of the NE instance pairs, as multiple instances of the same NE pair (e.g. Yahoo! and Overture) are likely to refer to the same relationship (e.g.</Paragraph>
      <Paragraph position="2"> acquisition). The other type of evidence is the keywords in the phrase. If we gather a lot of phrases connecting NE's of the same two NE types (e.g. company and company), we can cluster these phrases and find some typical expressions (e.g. merge, acquisition, buy). The phrases are clustered based on these two types of evidence and sets of paraphrases are created.</Paragraph>
      <Paragraph position="3"> Basically, we used the paraphrases found by the approach mentioned above. For example, the expressions in Figure 2 are identified as paraphrases by this method; so these three patterns will be placed in the same pattern set.</Paragraph>
      <Paragraph position="4">  Note that there is an alternative method of paraphrase discovery, using a hand crafted synonym dictionary like WordNet (WordNet Home page). However, we found that the coverage of WordNet for a particular topic is not sufficient.</Paragraph>
      <Paragraph position="5"> For example, no synset covers any combinations of the main words in Figure 2, namely &amp;quot;buy&amp;quot;, &amp;quot;acquire&amp;quot; and &amp;quot;merger&amp;quot;. Furthermore, even if these words are found as synonyms, there is the additional task of linking expressions. For example, if one of the expressions is &amp;quot;reject the merger&amp;quot;, it shouldn't be a paraphrase of &amp;quot;acquire&amp;quot;.</Paragraph>
    </Section>
    <Section position="3" start_page="733" end_page="733" type="sub_section">
      <SectionTitle>
3.3 Extended NE tagging
</SectionTitle>
      <Paragraph position="0"> Named Entities (NE) were first introduced by the MUC evaluations (Grishman and Sundheim 96).</Paragraph>
      <Paragraph position="1"> As the MUCs concentrated on business and military topics, the important entity types were limited to a few classes of names and numerical expressions. However, along with the development of Information Extraction and Question Answering technologies, people realized that there should be more and finer categories for NE. We proposed one of those extended NE sets (Sekine 02). It includes 140 hierarchical categories. For example, the categories include Company, Company group, Military, Government, Political party, and International Organization as subcategories of Organization. Also, new categories are introduced such as Vehicle, Food, Award, Religion, Language, Offense, Art and so on as subcategories of Product, as well as Event, Natural Object, Vocation, Unit, Weight, Temperature, Number of people and so on. We used a rule-based tagger developed to tag the 140 categories for this experiment.</Paragraph>
      <Paragraph position="2"> Note that, in the proposed method, the slots of the final table will be filled in only with instances of these extended Named Entities. Most common nouns, verbs or sentences can't be entries in the table. This is obviously a limitation of the proposed method; however, as the categories are designed to provide good coverage for a factoid type QA system, most interesting types of entities are covered by the categories.</Paragraph>
    </Section>
    <Section position="4" start_page="733" end_page="733" type="sub_section">
      <SectionTitle>
3.4 Table Construction
</SectionTitle>
      <Paragraph position="0"> Basically the table construction is done by applying the discovered patterns to the original corpus.</Paragraph>
      <Paragraph position="1"> The discovered patterns are grouped into pattern set using discovered paraphrase knowledge. Once the pattern sets are built, a table is created for each pattern set. We gather all NE instances matched by one of the patterns in the set. These instances are put in the same column of the table for the pattern set. When creating tables, we impose some restrictions in order to reduce the number of meaningless tables and to gather the same relations in one table. We require columns to have at least three filled instances and delete tables with fewer than three rows. These thresholds are em-</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="733" end_page="735" type="metho">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="733" end_page="734" type="sub_section">
      <SectionTitle>
4.1 Data and Processing
</SectionTitle>
      <Paragraph position="0"> We conducted the experiments using the 1995 New York Times as the corpus. The queries used for system development and threshold tuning were created by the authors, while queries based on the set of event types in the ACE extraction evaluations were used for testing. A total of 31 test queries were used; we discarded several queries which were ambiguous or uncertain. The test queries were derived from the example sentences for each event type in the ACE guidelines queries are shown in the Appendix.</Paragraph>
      <Paragraph position="1"> At the moment, the whole process takes about 15 minutes on average for each query on a Pentium 2.80GHz processor running Linux. The corpus was analyzed in advance by a POS tagger, NE tagger and dependency analyzer. The processing  and counting of sub-trees takes the majority (more than 90%) of the time. We believe we can easily make it faster by programming techniques, for ple, using distributed puting.</Paragraph>
    </Section>
    <Section position="2" start_page="734" end_page="735" type="sub_section">
      <SectionTitle>
4.2 Result and Evaluation
</SectionTitle>
      <Paragraph position="0"> Out of 31 queries, the system is unable to build any tables for 11 queries. The major reason is that the IR component can't find enough newspaper articles on the topic. It retrieved only a few articles for topics like &amp;quot;born&amp;quot;, &amp;quot;divorce&amp;quot; or &amp;quot;injure&amp;quot; from The New York Times. For the moment, we will focus on the 20 queries for which tables were built. The Appendix shows some examples of queries and the generated tables. In total, 127 tables are created for the 20 topics, with one to thirteen tables for each topic. The number of columns in a table ranges from 2 to 10, including the document ID column, and the average number of columns is 3.0. The number of rows in a table range from 3 to 125, and the average number of rows is 16.9. The created tables are y filled; the average rate is 20.0%.</Paragraph>
      <Paragraph position="1"> In order to measure the potential and the usefulness of the proposed method, we evaluate the result based on three measures: usefulness, argument role coverage, and correctness. For the usefulness evaluation, we manually reviewed the tables to determine whether a useful table is included or not. This is inevitably subjective, as the user does not specify in advance what table rows and columns are expected. We asked a subject to judge usefulness in three grades; A) very useful for the query, many people might want to use this table for the further investigation of the topic, B) useful - at least, for some purpose, some people might want to use this table for further investigation and C) not useful - no one will be interested in using this table for further investigation. The argument role coverage measures the percentage of the roles specified for each ACE event type which appeared as a column in one or more of the created tables for that event type. The correctness was measured based on whether a row of a table reflects the correct information. As it is impossible to evaluate all th ected randomly.</Paragraph>
      <Paragraph position="2"> Table 1 shows the usefulness evaluation result.</Paragraph>
      <Paragraph position="3"> Out of 20 topics, two topics are judged very useful and twelve are judged useful. The very useful topics are &amp;quot;fine&amp;quot; (Q4 in the appendix) and &amp;quot;acquit&amp;quot; (not shown in the appendix). Compared to the results in the 'useful' category, the tables for these two topics have more slots filled and the NE types of the fillers have fewer mistakes. The topics in the &amp;quot;not useful&amp;quot; category are &amp;quot;appeal&amp;quot;, &amp;quot;execute&amp;quot;, &amp;quot;fired&amp;quot;, &amp;quot;pardon&amp;quot;, &amp;quot;release&amp;quot; and &amp;quot;trial&amp;quot;. These are again topics with very few relevant articles. By increasing the corpus size or improving the IR component, we may be able to improve the performance for these topics. The majority category, &amp;quot;useful&amp;quot;, has 12 topics. Five of them can be found in the appendix (all those besides Q4). For these topics, the number of relevant articles in the corpus is relatively high and interesting relations are found. The examples in the appendix are selected from larger tables with many columns. Although there are columns that cannot be filled for every event instance, we found that the more columns that are filled in, th  For the 14 &amp;quot;very useful&amp;quot; and &amp;quot;useful&amp;quot; topics, the role coverage was measured. Some of the roles in the ACE task can be filled by different types of Named Entities, for example, the &amp;quot;defendant&amp;quot; of a &amp;quot;sentence&amp;quot; event can be a Person, Organization or GPE. However, the system creates tables based on NE types; e.g. for the &amp;quot;sentence&amp;quot; event, a Person column is created, in which most of the fillers are defendants. In such cases, we regard the column as covering the role. Out of 63 roles for the 14 event types, 38 are found in the created tables, for a role coverage of 60.3%. Note that, by lowering the thresholds, the coverage can be increased to as much as 90% (some roles can't be found because of Extended NE limitations or the rare appearance of roles) but with some sacrifice of precision.</Paragraph>
      <Paragraph position="4"> Table 2 shows the correctness evaluation results. We randomly select 100 table rows among the topics which were judged &amp;quot;very useful&amp;quot; or &amp;quot;useful&amp;quot;, and determine the correctness of the information by reading the newspaper articles the information was extracted from. Out of 100 rows, 84 rows have correct information in all slots. 4  rows have some incorrect information in some of the columns, and 12 contain wrong information.</Paragraph>
      <Paragraph position="5"> Most errors are due to NE tagging errors (11 NE errors out of 16 errors). These errors include instances of people which are tagged as other categories, and so on. Also, by looking at the actual articles, we found that co-reference resolution could help to fill in more information. Because the important information is repeatedly mentioned in newspaper articles, referential expressions are often used. For example, in a sentence &amp;quot;In 1968 he was elected mayor of Indianapolis.&amp;quot;, we could not extract &amp;quot;he&amp;quot; at the moment. We plan to add coreference resolution in the near future. Other * e entity is confused, i.e. victim * query (as both of them * He was sentenced 3 ears and fined $1,000&amp;quot;.</Paragraph>
      <Paragraph position="6"> orrectness n Numb sources of error include: The role of th and murderer Different kinds of events are found in one table, e.g., the victory of Jack Nicklaus was found in the political election use terms like &amp;quot;win&amp;quot;) An unrelated but often collocate entity was included. For example, Year period expressions are found in &amp;quot;fine&amp;quot; events, as there are many expressions like &amp;quot; y</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="735" end_page="735" type="metho">
    <SectionTitle>
5 Related Work
</SectionTitle>
    <Paragraph position="0"> As far as the authors know, there is no system similar to ODIE. Several methods have been proposed to produce IE patterns automatically to facilitate IE knowledge creation, as is described in Section 3.1. But those are not targeting the fully automatic creation of a complete IE system for a new vent detection follow thi e a country and ial where an ODIE-type system can be beneficial.</Paragraph>
    <Paragraph position="1"> topic.</Paragraph>
    <Paragraph position="2"> There exists another strategy to extend the range of IE systems. It involves trying to cover a wide variety of topics with a large inventory of relations and events. It is not certain if there are only a limited number of topics in the world, but there are a limited number of high-interest topics, so this may be a reasonable solution from an engineering point of view. This line of research was first proposed by (Aone and Ramos-Santacruz 00) and the ACE evaluations of e s line (ACE Home Page).</Paragraph>
    <Paragraph position="3"> An unsupervised learning method has been applied to a more restricted IE task, Relation Discovery. (Hasegawa et al. 2004) used large corpora and an Extended Named Entity tagger to find novel relations and their participants. However, the results are limited to a pair of participants and because of the nature of the procedure, the discovered relations are static relations lik its presidents rather than events.</Paragraph>
    <Paragraph position="4"> Topic-oriented summarization, currently pursued by the DUC evaluations (DUC Home Page), is also closely related. The systems are trying to create summaries based on the specified topic for a manually prepared set of documents. In this case, if the result is suitable to present in table format, it can be handled by ODIE. Our previous study (Sekine and Nobata 03) found that about one third of randomly constructed similar newspaper article clusters are well-suited to be presented in table format, and another one third of the clusters can be acceptably expressed in table format. This suggests there is a big potent</Paragraph>
  </Section>
  <Section position="8" start_page="735" end_page="736" type="metho">
    <SectionTitle>
6 Future Work
</SectionTitle>
    <Paragraph position="0"> We demonstrated a new paradigm of Information Extraction technology and showed the potential of this method. However, there are problems to be solved to advance the technology. One of them is the coverage of the extracted information. Although we have created useful tables for some topics, there are event instances which are not found. This problem is mostly due to the inadequate performance of the language analyzers (information retrieval component, dependency analyzer or Extended NE tagger) and the lack of a coreference analyzer. Even though there are possible applications with limited coverage, it will be essential to enhance these components and add coreference in order to increase coverage. Also, there are basic domain limitations. We made the system &amp;quot;on-demand&amp;quot; for any topic, but currently only within regular news domains. As configured, the system would not work on other domains such as a medical, legal, or patent domain, mainly due to the design of the extended NE hierarchy.</Paragraph>
    <Paragraph position="1"> While specific hierarchies could be incorporated  for new domains, it will also be desirable to integrate bootstrapping techniques for rapid incremental additions to the hierarchy. Also at the would like to investigate this problem in the future.</Paragraph>
  </Section>
class="xml-element"></Paper>