<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0903">
  <Title>Constructing Text Sense Representations</Title>
  <Section position="2" start_page="0" end_page="2" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Many important tasks in the field of Natural Language Processing (NLP), such as text categorization, text summarization, and (semi-)automatic translation, require a certain amount of world knowledge and knowledge about text meaning and sense (Allen, 1995; R. Cole et al., 1995).</Paragraph>
    <Paragraph position="1"> Handling the amount of textual data on the World Wide Web also increasingly requires advanced automatic text and language processing techniques: successful search engines like Google (Google, Inc., 2004) already employ text retrieval and information extraction methods based on shallow semantic information. There are many methodologies to generate word sense representations, but the efficiency and effectiveness of fully automated techniques tend to be low (Diana Zaiu Inkpen and Graeme Hirst, 2003). Furthermore, formalisation and quantification of evaluation methods is difficult because, in general, word sense related techniques are only verifiable through theoretical examination, application to language, or human judges (Alexander Budanitsky and Graeme Hirst, 2001), i.e. there is no inherent validation because there is no direct connection to the world as perceived by humans. In the case of frequency-based word sense representations, corpus-related difficulties arise (number of tagged entities, corpus quality, etc.). In order to overcome these limitations, we developed a methodology to generate and use explicit computer-usable representations of text senses.</Paragraph>
    <Paragraph position="2"> A common understanding is that the &quot;sense&quot; of a word is defined by the ways the word is used in context, i.e. the interpretation of the word that is consistent with the text meaning, as summarized by S. G. Pulman in (R. Cole et al., 1995, Section 3.5).</Paragraph>
    <Paragraph position="3"> Extending this definition onto full texts, we introduce our notion of &quot;Text Sense Representation&quot; (TSR) as &quot;the set of possible computer usable interpretations of a text without respect to a particular linguistic context&quot;.</Paragraph>
    <Paragraph position="4"> TSR Trees provide detailed answers to questions like &quot;how closely are these n words topically related to each other?&quot;, &quot;are these m sentences really about the same topic?&quot; or &quot;how much does paragraph x contribute to topic y?&quot;. They cannot tell, e.g., that a telephone is a physical artifact, that its purpose is to enable distant communication, etc.</Paragraph>
    <Paragraph position="5"> TSR Trees are not meant to substitute meaning acquired through conceptual or linguistic analysis but are rather aimed at (1) augmenting deeper (linguistic or conceptual) methodologies by providing additional analysis clues, and (2) standalone usage in generic shallow methods (e.g. in shallow text categorization) and specific applications (e.g. anti-spam functionality).</Paragraph>
  </Section>
</Paper>