<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1115">
  <Title>Temporal Ranking for Fresh Information Retrieval</Title>
  <Section position="3" start_page="0" end_page="4" type="intro">
    <SectionTitle>
2. Temporal Information Retrieval
</SectionTitle>
    <Paragraph position="0"> The value of information is determined by the ratio of the number of information consumers who want the information to the number of information providers who have the information. If the number of information providers increases then the information value decreases. Information that is known to everyone is called common knowledge.</Paragraph>
    <Paragraph position="1"> According to Shannon's information theory, information is entropy. In other words, information creates a system from chaos, although the system is temporary and will soon diffuse. Information value is at its highest when the system is first created. Therefore, the freshest information is the most valuable. Information retrieval is the process of finding valuable information, and in this sense, fresh information retrieval is extremely important.</Paragraph>
    <Paragraph position="2"> Fresh information retrieval is clearly a special type of temporal information retrieval. Temporal information retrieval is the process of extracting time-varying information. A document may be modified at any time after it is created, and hence a document consists of time-varying information. For example, a word that appears in a document before a modification may no longer appear in it afterwards. Therefore, time-varying information must be retrieved with the time specified. This is quite natural, and such temporal information retrieval is available for digital libraries.</Paragraph>
    <Section position="1" start_page="0" end_page="4" type="sub_section">
      <SectionTitle>
2.1 Temporal Database
</SectionTitle>
      <Paragraph position="0"> Although information retrieval is not data retrieval, the theoretical background of temporal information retrieval lies in temporal databases. A temporal database is a database in which a time interval can be specified as part of a query. The time interval is based on the temporal interval logic proposed by J. F. Allen [14].</Paragraph>
      <Paragraph position="1"> Therefore, temporal information retrieval must support time intervals as part of a query. In a temporal database, the unit of time is the chronon.</Paragraph>
      <Paragraph position="2"> The granularity of a chronon is selected from year, month, day, hour, minute, and second.</Paragraph>
      <Paragraph position="3"> Assume that there are time points t . The following relations exist among time points X and Y, and time intervals A and B.</Paragraph>
      <Paragraph position="5"> In a temporal database, there are two kinds of time: valid time and transaction time. Valid time concerns facts that are true in the modeled reality. Transaction time concerns facts that are current in the database.</Paragraph>
      <Paragraph position="6"> In general, a valid time DB stores only fresh data, whereas a transaction time DB stores the complete history of the data. A bitemporal DB supports both kinds of data.</Paragraph>
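      <Paragraph position="7"> The relations among time intervals used in this subsection come from Allen's interval logic. As a rough illustration, the thirteen standard Allen relations between two intervals can be sketched in Python as follows. The names Interval and allen_relation are hypothetical, and the use of integer chronons is an assumption made only for this sketch; it is not the notation of this paper.</Paragraph>
      <Paragraph position="8"><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A time interval with start < end, measured in integer chronons (assumption)."""
    start: int
    end: int

    def __post_init__(self):
        if self.start >= self.end:
            raise ValueError("an interval needs start < end")


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the name of the Allen relation that holds from a to b."""
    # Disjoint or touching intervals.
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if b.end < a.start:
        return "after"
    if b.end == a.start:
        return "met-by"
    # Intervals sharing one or both end points.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    # Strict containment in either direction.
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    # Only partial overlap remains.
    return "overlaps" if a.start < b.start else "overlapped-by"
```
]]></Paragraph>
      <Paragraph position="9"> For instance, under these assumptions allen_relation(Interval(1, 5), Interval(5, 8)) evaluates to "meets", because the first interval ends exactly where the second begins.</Paragraph>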
    </Section>
    <Section position="2" start_page="4" end_page="4" type="sub_section">
      <SectionTitle>
2.2 The Concept of Temporal Information Retrieval
</SectionTitle>
      <Paragraph position="0"> In this paper, temporal information retrieval is defined as determining whether or not a document exists at a time point or in a time interval. This is in contrast to determining whether or not the content of a document mentions the specified time. For example, assume that a document containing the text "In 2002, the FIFA World Cup will be held in Korea and Japan" was written in 1998. In the former case, this document would be retrieved with the query "1998 and (Korea or Japan)". In the latter case, this document would be retrieved with the query "2002 and (Korea or Japan)". The number 1998 in the former case is the modified time of the document.</Paragraph>
      <Paragraph position="1"> The number 2002 in the latter case is a keyword in the text of the document. This latter type of retrieval is classified as a query expansion or a numerical query. We discuss temporal information retrieval in the former sense.</Paragraph>
      <Paragraph position="2"> Assume that a document always contains facts. In this case, a fact in temporal information retrieval means the existence of the document. Valid time is the time when the document exists in the real world, and transaction time denotes the time when the document is indexed.</Paragraph>
      <Paragraph position="3"> The lifetime of a document depends on the document model, and there are two kinds of models. The first is the immutable model, in which the lifetime of a document is equivalent to the lifetime of the information. The information is the content of the document, and when a document is modified, the information is also changed.</Paragraph>
      <Paragraph position="4"> Therefore, an old document is deleted and a new document is created at every modification time. The second type of model is the mutable model, in which the modification of a document is allowed. In this model, when a document is modified, the content of the document is changed but the document itself is not changed. Thus, in the mutable model, a document exists from the time it is created to the time it is deleted, although its content may change multiple times. In the immutable model, a document exists only from one modification time to the next. From the viewpoint of the users, the retrieval result, with the exception of time, does not depend on the document model. However, in the immutable model, the retrieval result is based on the modification time, whereas in the mutable model, it is based on the creation time.</Paragraph>
      <Paragraph position="5"> There are several possible interpretations of created time, modified time and deleted time. Assume that someone had information at time t_1, wrote it into a document at t_2, published the document at t_3, and that the document was indexed by a search engine at t_4. It is important to determine which time corresponds to the origin of the information. In principle, the information is created at t_1. However, it is hard to prove this fact and it is impossible to retrieve it.</Paragraph>
      <Paragraph position="7"> The time t_2 is determined by outside factors. In addition, it may not be possible for everyone to publish a web document without changing the timestamp, so t_2 is not a good measure. The time t_3 is the published time, when the document becomes available on the web. However, it is difficult to retrieve the document at precisely t_3. In fact, we can retrieve the document only after t_4. Therefore, t_4 is used for the purpose of temporal information retrieval. In such a case, the valid time is equivalent to the transaction time. There are two kinds of temporal queries in temporal information retrieval. One is an interval query, which retrieves documents existing in an interval of time. The other is a point query, which retrieves documents existing at a certain time point. An interval query is also called a time slice query. A temporal query is used in conjunction with a keyword query. The retrieval results include not only the content of the documents, but also the created time and the modified time.</Paragraph>
      <Paragraph position="8"> The targets of a temporal query are the lifetime interval and the modified time point of the document. In a temporal query, temporal relations mentioned in section 2.1 may be specified.</Paragraph>
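      <Paragraph position="9"> The combination of a keyword query with a point query or an interval query, as described in this subsection, can be illustrated with a minimal Python sketch. It assumes the mutable document model, a chronon of one year, and simple substring matching for keywords. The names Document, exists_at, exists_during and search are hypothetical and are not part of the system described in this paper.</Paragraph>
      <Paragraph position="10"><![CDATA[
```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Document:
    doc_id: str
    text: str
    created: int                   # creation time, in chronons (years here)
    modified: int                  # last modification time
    deleted: Optional[int] = None  # None means the document still exists


def exists_at(doc: Document, t: int) -> bool:
    """Point query: does the document exist at time point t?"""
    return doc.created <= t and (doc.deleted is None or t < doc.deleted)


def exists_during(doc: Document, start: int, end: int) -> bool:
    """Interval query: does the document lifetime overlap [start, end]?"""
    return doc.created <= end and (doc.deleted is None or start < doc.deleted)


def search(docs: List[Document], keyword: str, start: int,
           end: Optional[int] = None) -> List[Tuple[str, int, int]]:
    """Keyword query plus a point query (end is None) or an interval query.

    Each hit carries the created time and the modified time, as the
    retrieval results in the text do.
    """
    hits = []
    for doc in docs:
        if keyword.lower() not in doc.text.lower():
            continue
        if end is None:
            alive = exists_at(doc, start)
        else:
            alive = exists_during(doc, start, end)
        if alive:
            hits.append((doc.doc_id, doc.created, doc.modified))
    return hits


docs = [
    Document("d1",
             "In 2002, the FIFA World Cup will be held in Korea and Japan",
             created=1998, modified=1998),
]
point_hits = search(docs, "Korea", 1998)           # point query at 1998
early_hits = search(docs, "Korea", 1997)           # before the document existed
interval_hits = search(docs, "Japan", 1990, 1999)  # interval query
```
]]></Paragraph>
      <Paragraph position="11"> In this sketch the World Cup document is found by the point query at 1998 and by the interval query from 1990 to 1999, but not by the point query at 1997, because the query time refers to the existence of the document and not to the year mentioned in its text.</Paragraph>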
    </Section>
    <Section position="3" start_page="4" end_page="4" type="sub_section">
      <SectionTitle>
2.3 Fresh Information Retrieval
</SectionTitle>
      <Paragraph position="0"> To realize fully temporal information retrieval, it is necessary to store the complete modification history of every document; however, this requires a huge amount of storage.</Paragraph>
      <Paragraph position="1"> So instead, we introduce fresh information retrieval as a practical substitute, which retrieves the last modified versions of current documents.</Paragraph>
      <Paragraph position="2"> Temporal information retrieval is the retrieval of documents that exist during a time interval. Fresh information retrieval is not the retrieval of documents that have current content, but the retrieval of current documents that exist, with content, during a time interval. With fresh information retrieval, huge storage is unnecessary because only the last modified version of a document is stored. Fresh information retrieval also supports all the functions of temporal information retrieval, except that the retrieved document is always the current version. As described in Section 2.1, a valid time DB stores only fresh data, that is, only the current versions of documents. In this sense, fresh information retrieval is valid time information retrieval.</Paragraph>
      <Paragraph position="3"> We illustrate 3 kinds of information retrieval in Fig. 1. In this figure, there are 3 documents D  , and the black dots represent modification events. In non-temporal information retrieval, documents which exist at the current point in time are retrieved. In Fig. 1, D  are retrieved in the same way as in non-temporal information retrieval. However, D  is retrieved with the temporal query shown as the dashed rectangle in Fig. 1. Non-temporal information retrieval does not support such a query. Finally, in fully temporal information retrieval, all  may be retrieved with any temporal query. For example, D  exists as 3 versions separated by two modifications.</Paragraph>
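      <Paragraph position="4"> The storage trade-off between fully temporal and fresh information retrieval discussed in this subsection can be sketched in Python as follows. This is only an illustration under stated assumptions: a version lives from its modification time until the next modification (the immutable model), the fresh query targets the last modified time point, and keywords are matched as substrings. The names History, to_fresh_index, temporal_query and fresh_query are hypothetical.</Paragraph>
      <Paragraph position="5"><![CDATA[
```python
from typing import Dict, List, Tuple

# Fully temporal store: every version is kept as (modification_time, content).
History = Dict[str, List[Tuple[int, str]]]


def to_fresh_index(history: History) -> Dict[str, Tuple[int, int, str]]:
    """Keep only (created, last_modified, current_content) per document."""
    fresh = {}
    for doc_id, versions in history.items():
        ordered = sorted(versions)
        created = ordered[0][0]
        last_modified, content = ordered[-1]
        fresh[doc_id] = (created, last_modified, content)
    return fresh


def temporal_query(history: History, keyword: str,
                   start: int, end: int) -> List[Tuple[str, int]]:
    """Fully temporal: every stored version alive in [start, end] can match."""
    hits = []
    for doc_id, versions in history.items():
        ordered = sorted(versions)
        for i, (t, content) in enumerate(ordered):
            # A version lives until the next modification, if there is one.
            next_t = ordered[i + 1][0] if i + 1 < len(ordered) else None
            alive = t <= end and (next_t is None or start < next_t)
            if alive and keyword in content:
                hits.append((doc_id, t))
    return hits


def fresh_query(fresh: Dict[str, Tuple[int, int, str]], keyword: str,
                start: int, end: int) -> List[Tuple[str, int]]:
    """Fresh: only current versions last modified within [start, end] can match."""
    hits = []
    for doc_id, (created, last_modified, content) in fresh.items():
        if start <= last_modified <= end and keyword in content:
            hits.append((doc_id, last_modified))
    return hits


history = {
    "d1": [(1, "alpha"), (5, "alpha beta"), (9, "beta")],
    "d2": [(3, "alpha")],
}
fresh = to_fresh_index(history)
```
]]></Paragraph>
      <Paragraph position="6"> Here the fresh index holds one entry per document regardless of how often the document was modified, whereas the fully temporal store grows with every modification. The price is that old versions, such as the first two versions of d1 in this sketch, can no longer be retrieved.</Paragraph>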
    </Section>
  </Section>
</Paper>