Efficient Search in Unstructured Peer-to-Peer Content Distribution Systems
The search mechanism of peer-to-peer networks is widely criticized due to its flooding nature. Distributed Hash Table (DHT) algorithms have been proposed to improve the search efficiency by mapping the index of a file to a unique peer based on predefined hash functions. However, the tight coupling between indices and hosting peers incurs high maintaining cost in a highly dynamic network. To well balance the tradeoff between the cost of indexing and searching, we have proposed the distributed caching and adaptive search mechanism, where indices are passively cached in a group of peers based on a predefined hash function. Guided by the same function, adaptive search selectively forwards queries to matched peers with a high probability of caching the desired indices. We have further proposed the differentiated search to utilize the file sharing heterogeneity of peer-to-peer networks. The differentiated search gives higher priority to peers that share more files than others. We have evaluated both approaches with trace analysis and trace-driven simulations.

Software:
  • Index Cache-enabled Gutella Servant I have modified Gnutella LimeWire servant and developed an index Cache-enabled Gnutella Client (CGC). Compared with a normal Gnutella client, The CGC peer is able to create and maintain a local index cache by overhearing query responses in Gnutella network. As a result, other peers neighboring to the CGC peer will have the opportunity to utilize the index cache for future search. CGC implementation can be used to investigate the performance impact of index caching in a real, large scale peer-to-peer network.
  • Peer-to-peer trace monitoring tool The tool has been modified from Gnutella servant and hooked to the real Gnutella peer-to-peer network. I used this tool to collect traces of queries and responses of Gnutella network for trace analysis and simulation.
  • Trace driven simulator for unstructured peer-to-peer networks: Implemented with JAVA, this simulator uses traces collected from Gnutella network to simulate file distribution, searching, and caching activities of peer-to-peer networks that can be scaled to thousands of nodes.

  

Papers:
  • C. Wang and L. Xiao, "An Effective P2P Search Scheme to Exploit File Sharing Heterogeneity," IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 18, no. 2, pp. 145-157, 2007.
  • C. Wang, L. Xiao, Y. Liu, and P. Zheng, "DiCAS: an Efficient Distributed Caching Mechnism for P2P Systems," IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 17, no. 10, pp. 1097-1109, October 2006.
  • C. Wang, L. Xiao, and P. Zheng, "Prioritized Search in Hierarchical Peer to Peer Networks," in 34th International Conference on Parallel Processing (ICPP), Oslo, Norway, June 2005, pp. 269-276.
  • C. Wang, L. Xiao, Y. Liu, and P. Zheng, "Distributed Caching and Adaptive Search in Multilayer P2P Networks," in 24th International Conference on Distributed Computing Systems (ICDCS), Tokyo, Japan, 2004, pp. 219-226.
>