<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2180"> <Title>MindNet: acquiring and structuring semantic information from text</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Full automation </SectionTitle> <Paragraph position="0"> MindNet is produced by a fully automatic process, based on the use of a broad-coverage NL parser. A fresh version of MindNet is built regularly as part of a normal regression process. Problems introduced by daily changes to the underlying system or parsing grammar are quickly identified and fixed.</Paragraph> <Paragraph position="1"> Although there has been much research on the use of automatic methods for extracting information from dictionary definitions (e.g., Vossen 1995, Wilks et al. 1996), hand-coded knowledge bases, e.g. WordNet (Miller et al. 1990), continue to be the focus of ongoing research. The EuroWordNet project (Vossen 1996), although continuing in the WordNet tradition, includes a focus on semi-automated procedures for acquiring lexical content.</Paragraph> <Paragraph position="2"> Outside the realm of NLP, we believe that automatic procedures such as MindNet's provide the only credible prospect for acquiring world knowledge on the scale needed to support common-sense reasoning. At the same time, we acknowledge the potential need for the hand vetting of such information to ensure accuracy and consistency in production-level systems.</Paragraph> </Section> <Section position="5" start_page="0" end_page="1098" type="metho"> <SectionTitle> 3 Broad-coverage parsing </SectionTitle> <Paragraph position="0"> The extraction of the semantic information contained in MindNet exploits the very same broad-coverage parser used in the Microsoft Word 97 grammar checker. This parser produces syntactic parse trees and deeper logical forms, to which rules are applied that generate corresponding structures of semantic relations. 
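The last stage of this pipeline can be sketched as follows. This is a hedged, minimal illustration: the logical-form encoding, the rule table, and the function name are hypothetical stand-ins, not the actual grammar or representation used by the parser.

```python
# Hedged sketch of the final pipeline stage: rules applied to a parser's
# logical form to yield labeled semantic relations. The logical-form
# encoding and the single rule table here are hypothetical.

# A toy logical form for "a motorist drives a car":
logical_form = {
    "pred": "drive",
    "Dsub": "motorist",   # deep subject
    "Dobj": "car",        # deep object
}

# Each rule maps a logical-form attribute to a semantic relation label.
RULES = {"Dsub": "Tsub", "Dobj": "Tobj"}

def extract_semrels(lf):
    """Apply attribute-to-relation rules, producing labeled triples."""
    return [(lf["pred"], RULES[attr], value)
            for attr, value in lf.items() if attr in RULES]

# extract_semrels(logical_form)
# → [("drive", "Tsub", "motorist"), ("drive", "Tobj", "car")]
```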
The parser has not been specially tuned to process dictionary definitions. All enhancements to the parser are geared to handle the immense variety of general text, of which dictionary definitions are simply a modest subset.</Paragraph> <Paragraph position="1"> There have been many other attempts to process dictionary definitions using heuristic pattern matching (e.g., Chodorow et al. 1985), specially constructed definition parsers (e.g., Wilks et al. 1996, Vossen 1995), and even general coverage syntactic parsers (e.g., Briscoe and Carroll 1993). However, none of these has succeeded in producing the breadth of semantic relations across entire dictionaries that has been produced for MindNet.</Paragraph> <Paragraph position="2"> Vanderwende (1996) describes in detail the methodology used in the extraction of the semantic relations comprising MindNet. A truly broad-coverage parser is an essential component of this process, and it is the basis for extending it to other sources of information such as encyclopedias and text corpora.</Paragraph> <Paragraph position="3"> 4 Labeled, semantic relations The different types of labeled, semantic relations extracted by parsing for inclusion in MindNet are given in the table below:</Paragraph> <Section position="1" start_page="1098" end_page="1098" type="sub_section"> <SectionTitle> MindNet </SectionTitle> <Paragraph position="0"> These relation types may be contrasted with simple co-occurrence statistics used to create network structures from dictionaries by researchers including Veronis and Ide (1990), Kozima and Furugori (1993), and Wilks et al. (1996). 
Labeled relations, while more difficult to obtain, provide greater power for resolving both structural attachment and word sense ambiguities.</Paragraph> <Paragraph position="1"> While many researchers have acknowledged the utility of labeled relations, they have been at times either unable (e.g., for lack of a sufficiently powerful parser) or unwilling (e.g., focused on purely statistical methods) to make the effort to obtain them. This deficiency limits the characterization of word pairs such as river/bank (Wilks et al. 1996) and write/pen (Veronis and Ide 1990) to simple relatedness, whereas the labeled relations of MindNet specify precisely the relations river--Part-->bank and write--Means-->pen.</Paragraph> </Section> </Section> <Section position="6" start_page="1098" end_page="1098" type="metho"> <SectionTitle> 5 Semantic relation structures </SectionTitle> <Paragraph position="0"> The automatic extraction of semantic relations (or semrels) from a definition or example sentence for MindNet produces a hierarchical structure of these relations, representing the entire definition or sentence from which they came. Such structures are stored in their entirety in MindNet and provide crucial context for some of the procedures described in later sections of this paper. The semrel structure for a definition of car is given in the figure below.</Paragraph> <Paragraph position="1"> car: &quot;a vehicle with 3 or usu. 4 wheels and driven by a motor, esp. one for carrying people&quot; Early dictionary-based work focused on the extraction of paradigmatic relations, in particular Hypernym relations (e.g., car--Hypernym-->vehicle). Almost exclusively, these relations, as well as other syntagmatic ones, have continued to take the form of relational triples (see Wilks et al. 1996). The larger contexts from which these relations have been taken have generally not been retained. 
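A hierarchical semrel structure of this kind can be sketched as nested maps. This is an illustrative sketch only: MindNet's internal representation is not specified in this paper, and the structure and helper function below are hypothetical.

```python
# Illustrative sketch only: MindNet's internal representation is not
# published here; this nested-map encoding is a hypothetical stand-in.

# A hierarchical semrel structure for a definition of "car",
# modeled as nested {relation_label: substructure} maps.
car_semrel = {
    "word": "car",
    "Hypernym": {
        "word": "vehicle",
        "Part": {"word": "wheel"},
        "Means": {"word": "motor"},
        "Purpose": {"word": "carry", "Tobj": {"word": "person"}},
    },
}

def relations(node):
    """Flatten a semrel structure into labeled (head, relation, dependent) triples."""
    for label, sub in node.items():
        if label == "word":
            continue
        yield (node["word"], label, sub["word"])
        yield from relations(sub)

triples = list(relations(car_semrel))
# Labeled triples such as ("car", "Hypernym", "vehicle") carry information
# that a bare co-occurrence pair ("car", "vehicle") cannot, while the nested
# structure preserves the larger context that flat triples discard.
```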
For labeled relations, only a few researchers (recently, Barrière and Popowich 1996) have appeared to be interested in entire semantic structures extracted from dictionary definitions, though they have not reported extracting a significant number of them.</Paragraph> </Section> <Section position="7" start_page="1098" end_page="1099" type="metho"> <SectionTitle> 6 Full inversion of structures </SectionTitle> <Paragraph position="0"> After semrel structures are created, they are fully inverted and propagated throughout the entire MindNet database, being linked to every word that appears in them. Such an inverted structure, produced from a definition for motorist and linked to the entry for car (appearing as the root of the inverted structure), is shown in the figure below: motorist: &quot;a person who drives, and usu. owns, a car&quot; (inverted) Researchers who produced spreading activation networks from MRDs, including Veronis and Ide (1990) and Kozima and Furugori (1993), typically only implemented forward links (from headwords to their definition words) in those networks. Words were not related backward to any of the headwords whose definitions mentioned them, and words co-occurring in the same definition were not related directly. In the fully inverted structures stored in MindNet, however, all words are cross-linked, no matter where they appear. 
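The difference between forward-only networks and full inversion can be sketched as follows. The triple representation and index names are assumptions for illustration, not taken from the paper.

```python
# Sketch (representation assumed): full inversion of a semrel structure,
# linking the whole structure to every word that appears in it.
from collections import defaultdict

# Hypothetical labeled triples extracted from the definition of motorist:
# "a person who drives, and usu. owns, a car"
motorist_triples = [
    ("motorist", "Hypernym", "person"),
    ("drive", "Tsub", "motorist"),
    ("drive", "Tobj", "car"),
    ("own", "Tsub", "motorist"),
    ("own", "Tobj", "car"),
]

# Forward-only networks index a structure under its headword alone:
forward_index = {"motorist": motorist_triples}

# Full inversion stores the whole structure under every word appearing in
# it, so "car", "drive", and "person" all link back to the motorist entry.
inverted_index = defaultdict(list)
for word in {w for head, _, dep in motorist_triples for w in (head, dep)}:
    inverted_index[word].append(("motorist", motorist_triples))
```

With the inverted index, the entry for car reaches the motorist structure directly, which the forward-only index cannot do.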
The massive network of inverted semrel structures contained in MindNet invalidates the criticism leveled against dictionary-based methods by Yarowsky (1992) and Ide and Veronis (1993) that LKBs created from MRDs provide spotty coverage of a language at best.</Paragraph> <Paragraph position="1"> Experiments described elsewhere (Richardson 1997) demonstrate the comprehensive coverage of the information contained in MindNet.</Paragraph> <Paragraph position="2"> Some statistics indicating the size (rounded to the nearest thousand) of the current version of MindNet and the processing time required to create it are provided in the table below. The definitions and example sentences are from the Longman Dictionary of Contemporary English (LDOCE) and the American Heritage Dictionary, 3rd Edition (AHD3).</Paragraph> </Section> <Section position="8" start_page="1099" end_page="1099" type="metho"> <SectionTitle> 7 Weighted paths </SectionTitle> <Paragraph position="0"> Inverted semrel structures facilitate access to direct and indirect relationships between the root word of each structure, which is the headword for the MindNet entry containing it, and every other word contained in the structures. These relationships, consisting of one or more semantic relations connected together, constitute semrel paths between two words.</Paragraph> <Paragraph position="1"> For example, the semrel path between car and person in Figure 2 above is: car&lt;--Tobj--drive--Tsub--&gt;motorist--Hyp--&gt;person.</Paragraph> <Paragraph position="2"> An extended semrel path is a path created from sub-paths in two different inverted semrel structures. For example, car and truck are not related directly by a semantic relation or by a semrel path from any single semrel structure. However, if one allows the joining of the semantic relations car--Hyp--&gt;vehicle and vehicle&lt;--Hyp--truck, each from a different semrel structure, at the word vehicle, the semrel path car--Hyp--&gt;vehicle&lt;--Hyp--truck results. 
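The joining step for extended semrel paths can be sketched as follows. This is a minimal illustration under an assumed path encoding, not MindNet's actual one.

```python
# Sketch: joining two semrel paths at a shared word to form an extended
# path. The list encoding and "invHyp" label are hypothetical.

def join_paths(path_a, path_b):
    """Join two semrel paths at a shared word to form an extended path.

    A path is a list alternating words and relation labels, e.g.
    ["car", "Hyp", "vehicle"]. Joining succeeds only when path_a ends
    on the word that path_b starts from.
    """
    if path_a[-1] != path_b[0]:
        return None
    return path_a + path_b[1:]

# car--Hyp-->vehicle from one structure and vehicle<--Hyp--truck from
# another, joined at "vehicle" ("invHyp" marks the inverted direction):
extended = join_paths(["car", "Hyp", "vehicle"],
                      ["vehicle", "invHyp", "truck"])
assert extended == ["car", "Hyp", "vehicle", "invHyp", "truck"]
```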
Adequately constrained, extended semrel paths have proven invaluable in determining the relationship between words in MindNet that would not otherwise be connected.</Paragraph> <Paragraph position="3"> Semrel paths are automatically assigned weights that reflect their salience. The weights in MindNet are based on the computation of averaged vertex probability, which gives preference to semantic relations occurring with middle frequency, and are described in detail in Richardson (1997). Weighting schemes with similar goals are found in work by Braden-Harder (1993) and Bookman (1994).</Paragraph> </Section> </Paper>