File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/86/c86-1089_abstr.xml

Size: 4,961 bytes

Last Modified: 2025-10-06 13:46:19

<?xml version="1.0" standalone="yes"?>
<Paper uid="C86-1089">
  <Title>Learning the Space of Word Meanings for Information Retrieval Systems</Title>
  <Section position="2" start_page="0" end_page="374" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> There have been no theories of semantics we can rely on for building a large information retrieval system. The defect in the existing theories is that they do not explain the mechanism by which the formal symbolic systems they use are adjusted to the real world; the only thing they explain is the relation between natural language and the formal system.</Paragraph>
    <Paragraph position="1"> Those theories assume the existence of fixed and universal one-to-one relations between the basic elements in the formal system and the entities in the real world. For example, both Montague semantics and situation semantics assume that we can represent the dog named Morris in the real world as some symbol like MORRIS in the formal system and that the relation between Morris and MORRIS is fixed and universal [3,2].</Paragraph>
    <Paragraph position="2"> However, when we consider an information retrieval system, especially in the field of literature studies, we encounter problems where the assumption does not hold. One problem is that there are entities that do not have a universal symbolic representation. For example, when a researcher discovers a new entity (or notion) in literature and writes a paper on that entity, the paper must be stored in the database, but we do not have appropriate key words for that entity. When the entity becomes well known in later years, it may be named, for example, 'overthereism'. However, at the time the entity is discovered and does not yet have the name 'overthereism', we must represent the entity by a fixed set of symbols, which is not easy. Another problem is that the range of what is meant by a symbol differs among the users of an information retrieval system. For example, we cannot identify a fixed meaning of 'romanticism'. Every user assumes a different meaning of 'romanticism' and it is not easy to control the meaning. The latter problem has been considered in the studies of fuzzy meanings, but, so far, the former problem has not been considered in the studies on semantics.</Paragraph>
    <Paragraph position="3"> In order to solve the above-mentioned problems, we propose a notion of semantic space and a learning mechanism for the space. Our assumption is that the entities which could not be represented by a fixed set of symbols can be identified in some semantic space by the location where the entity should be settled. Although it is open to question whether this assumption is universally valid, we have proved that this assumption is effective in information retrieval systems in the field of studies on literature. We believe that the field of literature includes essential problems and has just enough complexity to give us evidence for a general discussion on semantics.</Paragraph>
    <Paragraph position="4"> The semantic space is a Euclidean space where entities and words are scattered. The crucial point of our idea is that the axes of the space are not given beforehand but are generated through learning from the interaction between a user and the information retrieval system. Since the axes of the space are not given beforehand, the system can adjust the configuration of the space for absorbing new entities. In chapter 2, we describe in detail the notion of semantic space, explaining what the entities and words are in an information retrieval system for literature studies, and we show how the meanings of words are represented in the space.</Paragraph>
    <Paragraph position="5"> In chapter 3, we describe the learning mechanism of the semantic space. Generally speaking, in the studies on machine learning, it has been revealed that the mechanism for controlling the learning process is important; without such mechanisms, the result of the learning becomes too general or too specific. In the learning process proposed in this paper, we use a user's satisfaction as the controlling criterion for learning. The result of the learning is a semantic space that just mirrors the world of literature existing in the user's mind. The reason we use the term 'learning' instead of 'acquiring' is that the information the system gets is not the direct expression of the meanings of words a user has in his mind but indirect and partial information given through the interaction between a user and the information retrieval system.</Paragraph>
    <Paragraph position="6"> In chapter 4, we evaluate the effectiveness of the proposed ideas through an experiment. It is shown that entities that could not be retrieved by conventional key words can be retrieved in our system. In chapter 5, we refer to related works and summarize our contribution.</Paragraph>
  </Section>
</Paper>