<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-2005">
  <Title>A Novel Method for Content Consistency and Efficient Full-text Search for P2P Content Sharing Systems</Title>
  <Section position="3" start_page="25" end_page="26" type="relat">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> Two lines of research are related to our work: content consistency maintenance and information search in a P2P environment.</Paragraph>
    <Paragraph position="1"> In this paper, we refer to a hybrid P2P system that uses a central server, such as Napster, as a P2P system, although it is not entirely decentralized. We do so because even a hybrid P2P system has an important advantage for content sharing: it can distribute large amounts of content while consuming less bandwidth on the service provider's side.</Paragraph>
    <Section position="1" start_page="25" end_page="25" type="sub_section">
      <SectionTitle>
2.1 Content Consistency Maintenance
</SectionTitle>
      <Paragraph position="0"> Since the contents are stored on clients in a P2P content sharing system, malicious clients can tamper with the contents if no protection method against tampering is provided.</Paragraph>
      <Paragraph position="1"> The Napster protocol [4] uses the MD5 hash function: a content publisher sends the hash value of a content item to the central server when publishing it. Freenet [2] prevents tampering by using the hash value of a content item as its key. This technique is effective for static content such as movies or music. However, when it is applied to frequently updated content, each version is treated as a separate item because different versions have different keys. To handle frequently updated content, Freenet introduced indirect files, which store the hash values of the contents. By first retrieving an indirect file, a user can retrieve the latest version of the content in two steps. To share frequently updated content, we need a mechanism that associates the content ID with the hash value of a particular version of the content, as Freenet does.</Paragraph>
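      <Paragraph> The two-step retrieval described above can be illustrated with a minimal sketch. It is not Freenet's actual protocol or API: the class Store, its method names, the in-memory dictionaries, and the choice of SHA-256 as the hash function are all assumptions made for illustration.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Return the hex digest used as the content's key (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class Store:
    """Toy in-memory store; a hypothetical stand-in for the network."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}    # content key -> content bytes
        self.indirect: dict[str, str] = {}   # content ID -> key of latest version

    def publish(self, content_id: str, data: bytes) -> str:
        key = content_key(data)
        self.blobs[key] = data
        self.indirect[content_id] = key      # repoint the ID at the new version
        return key

    def fetch_by_key(self, key: str) -> bytes:
        data = self.blobs[key]
        # Tampered bytes no longer hash to the key they are stored under.
        if content_key(data) != key:
            raise ValueError("content does not match its key")
        return data

    def fetch_latest(self, content_id: str) -> bytes:
        # Step 1: read the indirect entry; step 2: fetch the content by its hash.
        return self.fetch_by_key(self.indirect[content_id])
```

      In this sketch every version keeps its own key, so older versions remain retrievable by key, while the content ID always resolves to the most recently published version.</Paragraph>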
      <Paragraph position="2"> Another problem of P2P content sharing systems is that the provider of a content sharing service cannot trace the exchange of contents among users.</Paragraph>
      <Paragraph position="3"> Napster, which is a centralized P2P content sharing system similar to ours, uses a download protocol in which a client sends a download request to the central server before downloading the content from another client. After that, the central server does not participate in the download. With this protocol, the central server cannot determine whether a download has completed successfully. A malicious client can send the same information to the central server and pretend that a download request has been made by another client. It is also possible to send tampered content to another client without being detected by the central server.</Paragraph>
    </Section>
    <Section position="2" start_page="25" end_page="26" type="sub_section">
      <SectionTitle>
2.2 Information Search in P2P Environment
</SectionTitle>
      <Paragraph position="0"> Two types of search technique are widely used in P2P content sharing systems: a central search server [4] and flooding of search requests [6].</Paragraph>
      <Paragraph position="1"> The problems of using a central search server, such as poor scalability and the vulnerability of a single point of failure, are widely known. Flooding of search requests also has scalability problems: as the number of nodes in a network increases, more search requests are flooded, and they consume a major part of the bandwidth. To reduce the number of search requests, many systems that use flooding limit the search range with heuristic methods. As a result, these systems cannot guarantee that all existing content in the network will be found.</Paragraph>
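      <Paragraph> The trade-off between flooding cost and search coverage can be sketched as follows. The hop limit (ttl) used here is one common way to bound the search range and is an assumption of this sketch, as are the function name flood_search and the five-peer chain topology; the cited systems may use other heuristics.

```python
from collections import deque


def flood_search(graph, holders, start, ttl):
    """Breadth-first flood from start, forwarding requests at most ttl hops.

    graph maps each node to a list of neighbour nodes; holders is the set of
    nodes that hold the wanted content. Returns a pair: the holders that were
    reached, and the number of request messages sent.
    """
    seen = {start}
    queue = deque([(start, 0)])
    found = set()
    messages = 0
    while queue:
        node, hops = queue.popleft()
        if node in holders:
            found.add(node)
        if hops == ttl:
            continue  # hop limit reached: do not forward any further
        for peer in graph[node]:
            messages += 1  # every forwarded request consumes bandwidth
            if peer not in seen:
                seen.add(peer)
                queue.append((peer, hops + 1))
    return found, messages


# A chain of five peers, 0 - 1 - 2 - 3 - 4; only peer 4 holds the content.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

      With a hop limit of 2, a search started at peer 0 never reaches peer 4 and returns nothing; raising the limit to 4 finds the content but sends more messages.</Paragraph>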
      <Paragraph position="2"> To solve the problems of the above-mentioned techniques, several search methods based on distributed hash tables (DHTs) have been proposed [5] [7]. These methods are scalable to a considerable extent. A characteristic of these methods is that an exact-match key search can be done in O(log n) or O(n^a) hops.</Paragraph>
      <Paragraph position="3"> Reynolds and Vahdat proposed a method for implementing full-text search by distributing the reverse index (inverted index) on a DHT. In this method, a key in the hash table corresponds to a keyword in a document, and a value corresponds to a document that contains the keyword. A client that publishes a document notifies the nodes that correspond to the keywords contained in the document and updates the reverse indexes on these nodes. In this way, the load of full-text search can be distributed among the nodes. We can also expect the reverse indexes on the nodes to be updated rapidly, because a client pushes the latest keywords in its contents.</Paragraph>
      <Paragraph position="4"> On the other hand, this method has several limitations. For example, when an AND search is performed, the search results must be transferred between the nodes. Li estimated the resources needed to implement a full-text search engine based on this method and pointed out that it is difficult to implement a large-scale search engine, such as Google, in this way [8].</Paragraph>
      <Paragraph position="5"> Furthermore, if this method were applied to a P2P content sharing system, the problem of low node availability would arise, because the users' PCs would serve as nodes in such a system. To store reverse indexes on such nodes, we would have to replicate them to ensure the availability of the indexes. This would require more resources than Li estimated.</Paragraph>
      <Paragraph position="6"> For the above-mentioned reasons, we believe that a full-text search technique using a central search server that manages the reverse indexes is more feasible than a distributed reverse index technique for implementing a full-text search engine in a P2P environment.</Paragraph>
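      <Paragraph> The keyword-partitioned index and the cost of an AND search can be sketched as follows. This is a simplified illustration, not the method of Reynolds and Vahdat as published: the class KeywordIndexDHT, its method names, and the mapping of keywords to nodes by a SHA-1 hash modulo the node count are assumptions, and real DHT routing is omitted.

```python
import hashlib


class KeywordIndexDHT:
    """Toy keyword index partitioned across nodes by keyword hash."""

    def __init__(self, num_nodes: int) -> None:
        # One dict per node: keyword -> set of document IDs (a posting list).
        self.nodes: list[dict[str, set[str]]] = [{} for _ in range(num_nodes)]
        self.transferred = 0  # document IDs shipped between nodes so far

    def node_for(self, keyword: str) -> int:
        # Hypothetical placement rule: hash the keyword, take it modulo nodes.
        digest = hashlib.sha1(keyword.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % len(self.nodes)

    def publish(self, doc_id: str, text: str) -> None:
        # The publisher notifies the node responsible for each keyword.
        for keyword in set(text.lower().split()):
            node = self.nodes[self.node_for(keyword)]
            node.setdefault(keyword, set()).add(doc_id)

    def lookup(self, keyword: str) -> set[str]:
        return set(self.nodes[self.node_for(keyword)].get(keyword, set()))

    def and_search(self, first: str, second: str) -> set[str]:
        postings = self.lookup(first)
        # If the keywords live on different nodes, the first posting list
        # must be sent to the second node before it can be intersected.
        if self.node_for(first) != self.node_for(second):
            self.transferred += len(postings)
        return postings.intersection(self.lookup(second))
```

      The counter transferred grows with the size of the first posting list whenever the two keywords are stored on different nodes, which illustrates why AND searches over common keywords are costly in this design.</Paragraph>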
    </Section>
  </Section>
</Paper>