<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1022">
  <Title>Supersense Tagging of Unknown Nouns in WordNet</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Unknown Words and Semantic
Classification
</SectionTitle>
    <Paragraph position="0"> Language processing systems make use of &quot;dictionaries&quot;, i.e., lists that associate words with useful information such as the word's frequency or syntactic category. In tasks that also involve inferences about world knowledge, it is useful to know something about the meaning of the word. This lexical semantic information is often modeled on what is found in normal dictionaries, e.g., that &quot;irises&quot; are flowers or that &quot;exane&quot; is a solvent.</Paragraph>
    <Paragraph position="1"> This information can be crucial in tasks such as question answering - e.g., to answer a question such as &quot;What kind of flowers did Van Gogh paint?&quot; (Pasca and Harabagiu, 2001) - or the identification of co-referential expressions, as in the passage &quot;... the prerun can be performed with exane ... this solvent can be considered ...&quot; (Pustejovsky et al., 2002).</Paragraph>
    <Paragraph position="2"> Lexical semantic information can be extracted from existing dictionaries such as WordNet. However, these resources are incomplete, and systems that rely on them often encounter unknown words, even if the dictionary is large. As an example, in the Bllip corpus (a very large corpus of Wall Street Journal text) the relative frequency of common nouns that are unknown to WordNet 1.6, which lists 95,000 noun types, is approximately 0.0054; an unknown noun occurs, on average, every eight sentences. For this reason, the importance of automatically building, extending or customizing lexical resources has been recognized for some time in computational linguistics (Zernik, 1991).</Paragraph>
    <Paragraph position="3"> Solutions to this problem were first proposed in AI in the context of story understanding, cf. (Granger, 1977). The goal is to label words using a set of semantic labels specified by the dictionary.</Paragraph>
    <Paragraph position="5"> Several studies have addressed the problem of expanding one semantic category at a time, such as &quot;vehicle&quot; or &quot;organization&quot;, where the category is chosen for its relevance to a particular task (Hearst, 1992; Roark and Charniak, 1998; Riloff and Jones, 1999). In named-entity classification, a large set of named entities (proper nouns) is classified using a comprehensive set of semantic labels such as &quot;organization&quot;, &quot;person&quot;, &quot;location&quot; or &quot;other&quot; (Collins and Singer, 1999). This latter approach assigns a semantic label to every named entity in the data set. We extend this approach to the classification of common nouns using a suitable set of semantic classes.</Paragraph>
  </Section>
</Paper>