<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1191">
  <Title>Inferring parts of speech for lexical mappings via the Cyc KB</Title>
  <Section position="3" start_page="0" end_page="1" type="intro">
    <SectionTitle>
2 Cyc knowledge base
</SectionTitle>
    <Paragraph position="0"> In development since 1984, the Cyc knowledge base (Lenat, 1995) is the world's largest formalized representation of commonsense knowledge, containing over 120,000 concepts and more than a million axioms.</Paragraph>
    <Paragraph position="1">  Cyc's upper ontology describes the most general and fundamental of distinctions (e.g., tangibility versus intangibility). The lower ontology contains facts useful for particular applications, such as web searching, but not necessarily required for commonsense reasoning (e.g., that &quot;Dubya&quot; refers to President George W. Bush). The KB also includes a broad-coverage English lexicon mapping words and phrases to terms throughout the KB. A subset of the Cyc KB, including parts of the English lexicon, has been made freely available as part of OpenCyc (www.opencyc.org).</Paragraph>
    <Paragraph position="2"> Footnote: The figures above and the results discussed later are based on Cyc KB version 576 and OpenCyc KB version 567.</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
2.1 Ontology
</SectionTitle>
      <Paragraph position="0"> Central to the Cyc ontology is the concept collection, which corresponds to the familiar notion of a set, but with membership intensionally defined (so distinct collections can have identical members, which is impossible for sets). Every object in the Cyc ontology is a member (or instance, in Cyc parlance) of one or more collections. Collection membership is expressed using the predicate (i.e., relation-type) isa, whereas collection subsumption is expressed using the transitive predicate genls (i.e., generalization). These predicates correspond to the set-theoretic notions element of and subset of, respectively, and thus are used to form a partially ordered hierarchy of concepts. For the purposes of this discussion, the isa and genls assertions on a Cyc term constitute its type definition.</Paragraph>
      <Paragraph position="1"> Figure 1 shows the type definition for PhysicalDevice, a prototypical denotatum term for count nouns. The type definition of PhysicalDevice indicates that it is a collection that is a specialization of Artifact, etc. As is typical for terms referred to by count nouns, it is an instance of the collection ExistingObjectType.</Paragraph>
      <Paragraph position="2"> Figure 2 shows the type definition for Water, a prototypical denotation for mass nouns. Although the asserted type information for Water does not convey any properties that would suggest a mass noun lexicalization, the genls hierarchy of collections does. In particular, the collection ChemicalCompoundTypeByChemicalSpecies is known to be a specialization of the collection ExistingStuffType, via the transitive properties of genls. Thus, by virtue of being an instance of ChemicalCompoundTypeByChemicalSpecies, Water is known to be an instance of ExistingStuffType. This illustrates that the decision procedure for the lexical mapping speech parts needs to consider not only asserted, but also inherited collection membership.</Paragraph>
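      <Paragraph> The point about inherited membership can be illustrated with a minimal Python sketch. It is not Cyc's inference engine: the dictionaries, the function names, and the intermediate collection HypotheticalStuffSubtype are assumptions made for illustration. Only the facts about Water, ChemicalCompoundTypeByChemicalSpecies, ExistingStuffType, PhysicalDevice, Artifact, and ExistingObjectType come from the text.</Paragraph>
      <Paragraph><![CDATA[
```python
from collections import deque

# Toy genls (subsumption) links. The intermediate collection is hypothetical;
# the text only says the link to ExistingStuffType holds transitively.
GENLS = {
    "ChemicalCompoundTypeByChemicalSpecies": {"HypotheticalStuffSubtype"},
    "HypotheticalStuffSubtype": {"ExistingStuffType"},
    "PhysicalDevice": {"Artifact"},
}

# Toy asserted isa (membership) links taken from the two examples in the text.
ISA = {
    "Water": {"ChemicalCompoundTypeByChemicalSpecies"},
    "PhysicalDevice": {"ExistingObjectType"},
}


def all_genls(collection, genls):
    """Return the collection plus every collection reachable through genls."""
    seen = {collection}
    queue = deque([collection])
    while queue:
        current = queue.popleft()
        for parent in genls.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def all_isa(term, isa, genls):
    """Return asserted plus inherited collection memberships of a term."""
    memberships = set()
    for collection in isa.get(term, ()):
        memberships |= all_genls(collection, genls)
    return memberships


if __name__ == "__main__":
    print(sorted(all_isa("Water", ISA, GENLS)))
```
]]></Paragraph>
      <Paragraph> In this sketch, ExistingStuffType is absent from the asserted memberships of Water but present in the result of all_isa, which is the distinction a decision procedure over type definitions would need to respect.</Paragraph>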
    </Section>
    <Section position="2" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
2.2 English lexicon
</SectionTitle>
      <Paragraph position="0"> Natural language lexicons are integrated directly into the Cyc KB (Burns and Davis, 1999). Though several lexicons are included in the KB, the English lexicon is the only one with general coverage. The mapping from nouns to concepts is done using one of two general strategies, depending on whether the mapping is from a name or a common noun phrase.</Paragraph>
      <Paragraph position="1"> Several different binary predicates indicate name-to-term mappings, with the name represented as a string. For example,
        (nameString HEBCompany &quot;HEB&quot;)
      A denotational assertion maps a phrase into a concept, usually a collection. The phrase is specified via a lexical word unit (i.e., lexeme concept) with optional string modifiers. The part of speech is specified via one of Cyc's SpeechPart constants. Syntactic information, such as the wordform variants and their speech parts, is stored with the Cyc constant for the word unit. For example, Device-TheWord, the Cyc constant for the word 'device,' has a single syntactic mapping since the plural form is inferable:
        Constant: Device-TheWord
        Microtheory: GeneralEnglishMt
        isa: EnglishWord
        posForms: CountNoun
        singular: &quot;device&quot;
      The simplest type of denotational mapping associates a particular sense of a word with a concept via the denotation predicate. For example,
        (denotation Device-TheWord CountNoun 0 PhysicalDevice)</Paragraph>
      <Paragraph position="3"> This indicates that sense 0 of the count noun 'device' refers to PhysicalDevice via the associated wordforms &quot;device&quot; and &quot;devices.&quot; To account for phrasal mappings, three additional predicates are used, depending on the location of the headword in the phrase. These are compoundString, headMedialString, and multiWordString for phrases with the headword at the beginning, the middle, and the end, respectively.</Paragraph>
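      <Paragraph> How a word unit and a denotational mapping combine to resolve surface strings can be sketched in Python as follows. This is an illustration under stated assumptions, not the Cyc lexicon interface: the WordUnit class, the tuple layout of DENOTATIONS, and the naive rule that a count noun's plural is the singular plus 's' are all hypothetical. The only data taken from the text is that sense 0 of the count noun 'device' refers to PhysicalDevice.</Paragraph>
      <Paragraph><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordUnit:
    """Hypothetical stand-in for a Cyc word unit such as Device-TheWord."""
    name: str
    pos_forms: tuple
    singular: str


def wordforms(unit, pos):
    """Return the stored singular plus a naively inferred plural.

    Assumption: count-noun plurals are formed by appending 's'; real
    inflection rules are more involved.
    """
    if pos == "CountNoun":
        return [unit.singular, unit.singular + "s"]
    return [unit.singular]


DEVICE = WordUnit("Device-TheWord", ("CountNoun",), "device")

# Each entry: (word unit, speech part, sense number, concept).
DENOTATIONS = [(DEVICE, "CountNoun", 0, "PhysicalDevice")]


def build_index(denotations):
    """Map every wordform to the (concept, speech part, sense) it denotes."""
    index = {}
    for unit, pos, sense, concept in denotations:
        if pos not in unit.pos_forms:
            continue  # skip mappings whose speech part the word unit lacks
        for form in wordforms(unit, pos):
            index.setdefault(form, []).append((concept, pos, sense))
    return index


if __name__ == "__main__":
    print(build_index(DENOTATIONS))
```
]]></Paragraph>
      <Paragraph> Storing only the singular on the word unit and deriving the plural at lookup time mirrors the remark that the plural form is inferable, at the cost of needing exceptions for irregular nouns.</Paragraph>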
      <Paragraph position="4"> [Table caption fragment, truncated in extraction: ... denotational assertions. The other entry covers 20 infrequently used cases.]</Paragraph>
      <Paragraph position="5"> For example, a compoundString assertion whose headword is the verb 'buy', followed by the string &quot;down&quot;, states that &quot;buy down&quot; refers to BuyDown, as do &quot;buys down,&quot; &quot;buying down,&quot; and &quot;bought down&quot; based on the inflections of the verb 'buy.' Table 1 shows the frequency of the various predicates used in the denotational assertions, excluding lexicalizations that involve technical, informal, or slang terms. Table 2 shows the most frequent speech parts from these assertions: nearly 50% of the cases use CountNoun for the headword speech part, and about 25% use MassNoun. This subset of the denotational assertions forms the basis of the training data used in the mass versus count noun classifier, as discussed later. Twenty other speech parts used in the lexicon are not shown. Several of these are quite specialized (e.g., QuantifyingIndexical) and not very common, mainly occurring in fixed phrases. The full speech part classifier handles all categories.</Paragraph>
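      <Paragraph> The phrasal mappings described earlier in this subsection can be sketched in Python as follows. The function names and the representation of modifiers as tuples of strings are assumptions for illustration, and the list of inflected forms is written out by hand rather than derived from a word unit. The sketch shows how the headword position selects among compoundString, headMedialString, and multiWordString, and how the inflections of the headword yield the phrase variants for &quot;buy down.&quot;</Paragraph>
      <Paragraph><![CDATA[
```python
# Inflected forms of the verb 'buy', copied from the example in the text.
BUY_FORMS = ["buy", "buys", "buying", "bought"]


def predicate_for(before, after):
    """Pick the phrasal predicate from where the headword sits.

    before/after are the fixed strings preceding/following the headword.
    """
    if after and not before:
        return "compoundString"    # headword at the beginning
    if before and after:
        return "headMedialString"  # headword in the middle
    if before:
        return "multiWordString"   # headword at the end
    raise ValueError("a phrasal mapping needs at least one modifier string")


def phrase_variants(head_forms, before=(), after=()):
    """Combine each headword form with the fixed modifier strings."""
    return [" ".join([*before, form, *after]) for form in head_forms]


if __name__ == "__main__":
    print(predicate_for((), ("down",)))
    print(phrase_variants(BUY_FORMS, after=("down",)))
```
]]></Paragraph>
      <Paragraph> Keeping the modifiers as fixed strings and inflecting only the headword is what lets a single assertion cover all four surface forms of the phrase.</Paragraph>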
    </Section>
  </Section>
</Paper>