
Looking Up Data in P2P Systems
by Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica
Public comments
#1 posted on Mar 22 2008, 23:04 in collection CMU 15-744: Computer Networks -- Spring 08
This paper traces the evolution of P2P systems and compares many of the proposed DHT designs. I liked the way it provides a summary and comparison, but there is little about the open questions that need to be addressed.

The IRIS project page points to many useful papers in this area.

The paper below gives a sense of how DHTs may behave under highly dynamic conditions.
"Comparing the performance of distributed hash tables under churn"
http://pdos.csail.mit.edu/papers/dhtcomparison:iptps04/paper.pdf
#2 posted on Mar 23 2008, 09:23 in collection CMU 15-744: Computer Networks -- Spring 08
This article is a good introduction to P2P and the lookup problem. It describes several algorithms based on Distributed Hash Tables, and points out the strong and weak points of each one.
However, the paper does not give much insight into how some of the features of the protocols (proximity routing; symmetric, unidirectional distance functions; etc.) actually affect their performance.
Of all the algorithms, Chord is, in my opinion, the most interesting due to its simplicity and the fact that it is fairly robust against failures. Unfortunately, I only know the details of Chord, so my opinion may be biased. On the other hand, CAN is almost as simple and has the advantage of using proximity routing (although its lookup time may grow faster than Chord's).
#3 posted on Mar 23 2008, 11:16 in collection CMU 15-744: Computer Networks -- Spring 08
This article is really useful for understanding what a DHT is and for taking a quick look at its current implementations. Even though DHTs address several problems of conventional P2P protocols very nicely, I'm skeptical about their practicality. As mentioned in one of the open questions, a DHT doesn't provide keyword search. Without keyword search, it seems impossible for DHTs to replace the conventional protocols used by today's popular P2P apps.
#4 posted on Mar 23 2008, 12:36 in collection CMU 15-744: Computer Networks -- Spring 08
A nice, easy-to-read article on DHTs. It seems to me that DHT lookup shares some interesting features with routing -- we've got a lot of nodes with associated numbers, and we want to contact one of them. Of course, some things are easier (we're more or less on a complete graph, so we just have to find which link to use) and some are harder (backbone routers don't just decide to leave every few minutes). In this sense, DHTs are another branch of the routing-protocol design space.

As to layering keyword search on top of a distributed hash table:

As a first cut, hash your keyword and look it up in one table. This table has keyword->id mappings. Look up the id in a second table.

To go further, map keywords to hash keys in a not-entirely-random way -- keyword -> [REP(keyword);HASH(keyword)]. Make 'REP' something such that bit distance begins to look like 'keyword-closeness' distance. This can allow some sort of fuzzy lookup while (you hope) HASH() still keeps things load-balanced.

Of course you'll probably want some sort of multiple-word search. It isn't terribly elegant, but you might just have to do single-keyword searches and intersect the results yourself.
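The two-table idea plus client-side intersection can be sketched in a few lines. This is a toy, with plain dicts standing in for the two DHTs; the names (`keyword_table`, `content_table`, `publish`, `search`) are made up for illustration, not from the paper.

```python
# Toy sketch: keyword search layered on a key->value store.
# Plain dicts stand in for the DHTs (a real system would route each
# get/put through the overlay).
import hashlib

def h(s):
    # Stand-in for the DHT's consistent hash function.
    return hashlib.sha1(s.encode()).hexdigest()

keyword_table = {}   # hash(keyword) -> set of content ids
content_table = {}   # content id -> content (e.g., a host address)

def publish(content_id, content, keywords):
    content_table[content_id] = content
    for kw in keywords:
        keyword_table.setdefault(h(kw), set()).add(content_id)

def search(*keywords):
    # Single-keyword lookups, intersected client-side.
    sets = [keyword_table.get(h(kw), set()) for kw in keywords]
    ids = set.intersection(*sets) if sets else set()
    return {cid: content_table[cid] for cid in ids}

publish("id1", "node-a", ["p2p", "chord"])
publish("id2", "node-b", ["p2p", "can"])
print(search("p2p", "chord"))   # -> {'id1': 'node-a'}
```

The intersection cost is the obvious weak spot: each single-keyword set may be huge, and it has to be shipped to whoever does the intersecting.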
#5 posted on Mar 23 2008, 15:11 in collection CMU 15-744: Computer Networks -- Spring 08
1) The schemes in general seem to apply a one-to-one mapping between keys and nodes. How well does this work if everybody's lookups land on a single node? Clearly you want some kind of replication scheme here.

2) Optimizing path metrics. Some of the schemes, like CAN, simply assign a random point when a new node joins. Can something more intelligent be done during the join process so that neighbors (or successors, etc.) have lower overall RTTs?
#6 posted on Mar 23 2008, 15:23 in collection CMU 15-744: Computer Networks -- Spring 08
This paper provided a nice summary of how various P2P protocols implement a distributed hash table (DHT) for mapping a key to a host. However, there are a few crucial things that the paper does not discuss:

  • How do we replicate data stored on hosts? If a node dies, we don't want its data to die with it.
  • If a certain key is very popular, then it can't simply be associated with a single host. Instead, it must be associated with a load-balanced set of hosts.
  • As others have pointed out, keyword searching is another important functionality of P2P networks.

Also, although security was briefly discussed, I think it is an important topic that needs further research. Certain entities (e.g., the RIAA) are not particularly fond of P2P networks and may attempt to infiltrate and disrupt such networks by deploying ill-behaved hosts.
#7 posted on Mar 23 2008, 15:44 in collection CMU 15-744: Computer Networks -- Spring 08
This is a good introduction to the lookup problem and DHTs. The reading also got me thinking about applications in wireless sensor networks. Regarding one of the open issues, indexing and keyword search: I don't quite understand why the authors put a question mark over the efficiency of indexing. What would be inefficient about a distributed indexing and keyword-lookup system implemented as a layer on top of the distributed hash system, as in BitTorrent?
#8 posted on Mar 23 2008, 16:19 in collection CMU 15-744: Computer Networks -- Spring 08
I thought this was a very interesting survey of work on distributed hashing. As mentioned above, keyword search might be an interesting problem: As James mentioned, you can of course hash keywords, and do intersections yourself, but it would be nice to have a more elegant solution. It seems related to the problem of performing queries on encrypted data -- I know there has been some work on this, I wonder if there are connections?
#9 posted on Mar 23 2008, 16:21 in collection CMU 15-744: Computer Networks -- Spring 08
This paper is a good introductory summary on lookup algorithms in P2P systems. I have a minor question: What are the advantages and disadvantages of "routing in multiple dimensions" compared to "routing in one dimension"?
#10 posted on Mar 23 2008, 16:27 in collection CMU 15-744: Computer Networks -- Spring 08
On page 2 they state: "One might believe P2P systems are mainly used for illegal music-swapping and little else, but this would be a rather hasty conclusion." They suggest that there are other important uses, but I don't find any discussion of them (maybe I missed it). Or do they mean that in theory it is really a general-purpose construction that they expect will support more applications in the future?
#11 posted on Mar 23 2008, 16:36 in collection CMU 15-744: Computer Networks -- Spring 08
I enjoyed reading this paper. Firstly, because it gives a brief and concise overview of the benefits of P2P systems and the challenges in designing them. Secondly, because it clearly explains and identifies the trade-offs of the various previously proposed solutions to the lookup problem, which makes the introduction of DHTs seem like a very natural and obvious choice. I also liked the symmetry- and direction-based taxonomy for the distance function metric.

The use of a d-dimensional space in CAN was interesting, because it makes it easier to pick the right trade-off point when deploying a P2P system. Choosing a larger number of dimensions forces each node to maintain state about more neighbors, but at the same time leads to shorter lookup paths.
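The trade-off is easy to eyeball with the rough figures from the CAN design (2d neighbors per node, about (d/4) * n^(1/d) expected routing hops); the little helper below just plugs numbers into those formulas.

```python
# Back-of-envelope CAN trade-off: per-node state vs. lookup path length.
# Formulas: 2d neighbors per node, ~ (d/4) * n^(1/d) expected hops.
def can_cost(n, d):
    neighbors = 2 * d                 # routing state per node
    hops = (d / 4) * n ** (1 / d)     # expected lookup path length
    return neighbors, hops

n = 1_000_000
for d in (2, 4, 8, 16):
    neighbors, hops = can_cost(n, d)
    print(f"d={d:2}: {neighbors:2} neighbors, ~{hops:.1f} hops")
```

For a million nodes, going from d=2 to d=8 quadruples the neighbor state but cuts the expected path from roughly 500 hops to roughly 11, so the returns diminish quickly after that.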
#12 posted on Mar 23 2008, 17:51 in collection CMU 15-744: Computer Networks -- Spring 08
I am not sure whether any DHT protocol considers this situation. Most protocols try to distribute keys uniformly among nodes in order to balance the workload. However, in reality, users do not access all keys uniformly. Some keys are hotter (accessed more often) than others. So even though the keys are uniformly distributed, the workload might still become unbalanced.
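One common mitigation, sketched here purely as an assumption (the surveyed paper does not prescribe it): store a hot key under several derived keys, so that reads for that key spread across several nodes. The names (`replica_keys`, `get_hot`) are invented for the sketch, and a dict stands in for the DHT.

```python
# Sketch: spread a hot key's load by storing it under k derived keys.
import hashlib
import random

def replica_keys(key, k):
    # Derive k distinct DHT keys for one logical key.
    return [hashlib.sha1(f"{key}#{i}".encode()).hexdigest()
            for i in range(k)]

def get_hot(key, k, dht):
    # A reader picks one replica at random, spreading the requests.
    return dht[random.choice(replica_keys(key, k))]

dht = {}
for rk in replica_keys("top-song", 4):
    dht[rk] = "value"          # a writer publishes to all replicas
print(get_hot("top-song", 4, dht))   # -> value
```

The catch, of course, is write amplification and keeping the replicas consistent, which is exactly the kind of open question the paper leaves unanswered.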
#13 posted on Mar 24 2008, 12:05 in collection CMU 15-744: Computer Networks -- Spring 08
This is a good survey for comparing the different DHT protocols.

To Wittawat:
There is a very interesting paper discussing replication strategies under "non-uniform" file popularity. Although the analysis is for unstructured P2P systems (using flooding), it can still provide some insight into structured systems (e.g., DHTs).

http://portal.acm.org/citation.cfm?id=964725.633043