<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1806">
  <Title>Automatically Inducing Ontologies from Corpora</Title>
  <Section position="6" start_page="74" end_page="74" type="relat">
    <SectionTitle>
4 Related Work
</SectionTitle>
    <Paragraph position="0"> and 'chiropractic medicine' are missed by H. This highlights a problem with human-generated ontologies: substantial errors of omission.</Paragraph>
    <Paragraph position="1"> The existing approaches to ontology induction include those that start from structured data, merging ontologies or database schemas (Doan et al. 2002). Other approaches use natural language data, sometimes just by analyzing the corpus (Sanderson and Croft 1999), (Caraballo 1999) or by learning to expand WordNet with clusters of terms from a corpus, e.g., (Girju et al. 2003).</Paragraph>
    <Paragraph position="2"> Information extraction approaches that infer labeled relations either require substantial hand-created linguistic or domain knowledge, e.g., (Craven and Kumlien 1999) (Hull and Gomez 1993), or require human-annotated training data with relation information for each domain (Craven et al. 1998).</Paragraph>
    <Paragraph position="3"> The number of relations in H that our system missed (relations that were more than distance 1 away in the system ontology), is 3493. However, of these 3493 relations, 2955 involved at least 1 term that was not included in M, leaving 538 relations that we could calculate the distance for in M. These 538 relations in H include relations between 'acid indigestion medicine' and 'maalox', and 'alternative medicine' and 'acupuncture' (a majority of the misses involved relations between a disease and the name of a specific drug for it, which aren't part-of or kind-of relations).</Paragraph>
    <Paragraph position="4">  Many, though not all, domain-independent approaches (Evans et al. 1991) (Grefenstette 1997) have restricted themselves to discovering termassociations, rather than labeled relations. A notable exception is (Sanderson and Croft 1995), which (unlike our approach) assumes the existence of a query that was used to originally retrieve the documents (so that terms can be extracted from the query and then expanded to generate additional terms for the ontology). Their approach also is restricted to one method to discover relations, while we use several.</Paragraph>
    <Paragraph position="5"> Our approach is complementary to approaches aimed at automatically enhancing existing resources for a particular domain, e.g. (Moldovan et al. 2000). Finally, the prior methods, while they often carry out evaluation, lack standard criteria for ontology evaluation. Although ontology evaluation remains challenging, we have discussed several evaluation methods in this paper.</Paragraph>
    <Paragraph position="6">  These observations lead to a metric for comparing one ontology with another one serving as a reference ontology. Given two ontologies A and B, define Relation Precision (A, B, D) as the proportion of the distance 1 relations in A that are at most a distance D apart in B. This measure can be plotted for different values of D. In Figure 4, we show the Relation Precision(H, M, D), and Relation Precision(M, H, D), for our machine ontology M and human ontology H. Both curves show Relation Precision(H, M, D) growing faster than Relation Precision(M, H, D), with 70% of the area being below the former curve and 54% being below the latter curve. The graph shows that while 22% of distance 1 relations in M are at most 3 apart in H (but keep in mind the errors of omission in H), 40% of distance 1 relations in H are at most</Paragraph>
  </Section>
class="xml-element"></Paper>