<?xml version="1.0" standalone="yes"?> <Paper uid="C86-1089"> <Title>Learning the Space of Word Meanings for Information Retrieval Systems</Title> <Section position="4" start_page="376" end_page="376" type="intro"> <SectionTitle> 3. Learning the Space </SectionTitle> <Paragraph position="0"> The outline for the use and learning of the semantic space in an information retrieval system is as follows: 1. A user gives a query to the information retrieval system. The query is recognized as a sequence of words; the query sentence is not parsed, so users generally give what they think are key words for the search. For example, a user who wants a paper on the influence of Goethe on modern Japanese literature asks the system 'Goethe modern Japanese literature'.</Paragraph> <Paragraph position="1"> 2. The system searches the semantic space for the words the user gave. If such words are found, the system presents to the user the neighbor spaces of the words. If no such words are found, the system presents an overview of the whole space (mainly the middle space, i.e. the space for writers and works).</Paragraph> <Paragraph position="2"> so that the papers selected in step 5 are located at a shorter distance and the sub-spaces selected in step 3 are located at a shorter distance, and then the system puts the query words in the location above the selected papers.</Paragraph> <Paragraph position="3"> We have implemented a system called ML0 (Model Learner version 0) that realizes the above-mentioned steps. Fig.2 shows the configuration of the system. The system is written in Lisp and PL/I.</Paragraph> <Paragraph position="4"> The monitor monitors all the functions of the system. It has special variables named *inconsistent and *attention.</Paragraph> <Paragraph position="5"> *inconsistent is the variable for storing a pair of entities for which the distance in the semantic space is different from the estimated distance. 
The estimation of the distance is done as follows. When the initial semantic space is built, the distance between two papers is estimated, with some normalization, by the inverse of the number of words that occur in the titles of both papers, and the distance between two</Paragraph> <Paragraph position="7"> entities (other than papers) is estimated by the inverse of the number of papers which include both entities in the title. When the semantic space is reconstructed, the distance between entities which a user selected is estimated as some fixed small value, and the distance between entities which the system presented, but only one of which the user selected, is estimated as some fixed large value. The monitor judges that a user is satisfied if the real distance in the semantic space is the same as the estimated distance. When the monitor detects the user's dissatisfaction, i.e. a difference between the real distance and the estimated distance, it registers in *inconsistent the pair of entities which caused the problem. [Fig.2 Configuration of the system. Legend: flow of control.]</Paragraph> <Paragraph position="8"> *attention is the variable for limiting the space under consideration. The monitor monitors the space only within the scope of *attention. This improves the efficiency of search and reconstruction.</Paragraph> <Paragraph position="9"> The monitor triggers the space reconstructor after one session of query and answer if *inconsistent has a value.</Paragraph> <Paragraph position="10"> The space reconstructor plays the role of reconstructing the semantic space so that the user can be satisfied. It uses the heuristic procedure for space reconstruction described below.</Paragraph> <Paragraph position="11"> 1. Select one pair from *inconsistent. (In the current version of the system, the pair which caused the largest inconsistency is selected.) 
2. Inspect the density of the neighbor space for each entity in the pair, and decide to move the entity with the less dense neighbors.</Paragraph> <Paragraph position="12"> 3. Enumerate the possible new positions for the moving entity. (In the current version of the system, there are eight candidate positions around the other entity at which the distance between the two entities is equal to the estimated value.) 4. Select from among them the one position which causes the least new inconsistency.</Paragraph> <Paragraph position="13"> 5. Check for new inconsistencies and register them in *inconsistent.</Paragraph> <Paragraph position="14"> 6. Go to step 1.</Paragraph> <Paragraph position="15"> The monitor monitors the whole reconstruction process and, when it judges that the reconstruction is taking too much time, stops the process by raising the threshold for judging inconsistency.</Paragraph> <Paragraph position="16"> Fig.3 shows an example of the process of space reconstruction. In Fig.3(a), the distance between the entities A and B was 10. Let us assume that the new estimate for the distance is 5. The reconstructor looks around the neighbors of both entities, and decides to move the entity B because the neighbors of B are less dense than those of A. The reconstructor selects, from among the eight positions around A, the one position for B that causes the least new inconsistency. In Fig.3(b), B is placed to the left of A. New inconsistencies within the scope of *attention, such as the inconsistency about B and G, are checked and registered in *inconsistent. After a few trial loops to decrease inconsistency, the space settles in the configuration shown in Fig.3(c), which includes no inconsistency.</Paragraph> <Paragraph position="17"> Of course we can use more mathematical methods (e.g. matrix transformation of distance) for space reconstruction. 
However, the above-mentioned heuristic procedure works more efficiently than mathematical methods, because not many pairs causing inconsistency are detected at once, owing to the limited scope of attention and the rather low density of the world of literature.</Paragraph> </Section> </Paper>