Distributed hash table
Overview
Distributed hash tables (DHTs) are a class of decentralized distributed systems that provide a lookup service similar to a hash table: (name, value) pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given name. Responsibility for maintaining the mapping from names to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows DHTs to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.
DHTs form an infrastructure that can be used to build more complex services, such as distributed file systems, p2p file sharing and content distribution systems, cooperative web caching, multicast, anycast, domain name services, and instant messaging. Applications that use DHTs include BitTorrent, Overnet, YaCy, and the Coral Content Distribution Network.
The Kademlia DHT
Kademlia is a distributed hash table for decentralized peer to peer networks designed by Petar Maymounkov and David Mazières. It specifies the structure of the network and how the exchange of information has to take place through network node lookups. Kademlia nodes communicate among themselves using the UDP. Over an existing network, a new virtual or overlay network is created in which each node is identified by a number or node ID. The node ID serves not only as a nodes identification, but the Kademlia algorithm uses it to locate values (usually file hashes or keywords). In fact, the node ID provides a direct map to file hashes.
When searching for some value, the algorithm explores the network in several steps. Each step approaches the key until the contacted node returns the value or no more closer nodes are found. Like many other DHTs, Kademlia contacts only [math]O(\log n)[/math] (see Big O notation) nodes during the search out of a total of [math]n[/math] nodes in the system.
Further advantages are found particularly in the decentralized structure, which clearly increases the resistance against a denial of service attack. Even if a whole set of nodes are flooded, this will have limited effect on network availability, which will recover itself by knitting the network around these "holes".
- Kademlia - they use a wiki too
- Kademlia.pdf
- AMule - confluence of the current range of p2p protocols eg eDonkey2K etc
- infoanarchy article
Tapestry
The first generation of peer-to-peer applications, including Napster and Gnutella, had restricting limitations such as a central directory for Napster and scoped broadcast queries for Gnutella limiting scalability. To address these problems a second generation of p2p applications were developed including Tapestry, Chord, Pastry, and CAN. These overlays implement a basic key-based routing mechanism. This allows for deterministic routing of messages and adaptation to node failures in the overlay network.
Tapestry is an extensible infrastructure that provides decentralized object location and routing focusing on efficiency and minimizing message latency. This is achieved since Tapestry constructs locally optimal routing tables from initialization and maintains them in order to reduce routing stretch. Furthermore, Tapestry allows object distribution determination according to the needs of a given application. Similarly Tapestry allows applications to implement multicasting in the overlay network.
Pastry
Pastry is a generic, scalable and efficient substrate for peer-to-peer applications. Pastry nodes form a decentralized, self-organizing and fault-tolerant overlay network within the Internet. Pastry provides efficient request routing, deterministic object location, and load balancing in an application-independent manner. Furthermore, Pastry provides mechanisms that support and facilitate application-specific object replication, caching, and fault recovery.