Distributed hash table
Overview
Distributed hash tables (DHTs) are a class of decentralized distributed systems that provide a lookup service similar to a hash table: (name, value) pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given name. Responsibility for maintaining the mapping from names to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows DHTs to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.
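One common way to achieve this minimal-disruption property is consistent hashing: a key belongs to the first node clockwise from the key's hash on a ring, so a join or leave only moves the keys in one segment. The toy Python sketch below illustrates this; the node names, key strings, and 8-byte ID truncation are illustrative assumptions, not part of any particular DHT.

```python
import hashlib
from bisect import bisect_left

def node_id(name: str) -> int:
    """Hash a string to an integer ID (first 8 bytes of its SHA-1 digest)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:8], "big")

class Ring:
    """Toy consistent-hashing ring: a key is owned by the first node
    clockwise from the key's hash position on the ring."""
    def __init__(self, nodes):
        self.ids = sorted(node_id(n) for n in nodes)
        self.name_of = {node_id(n): n for n in nodes}

    def responsible(self, key: str) -> str:
        i = bisect_left(self.ids, node_id(key)) % len(self.ids)  # wrap around
        return self.name_of[self.ids[i]]

ring3 = Ring(["nodeA", "nodeB", "nodeC"])
ring4 = Ring(["nodeA", "nodeB", "nodeC", "nodeD"])  # one node joins
keys = ["alpha", "beta", "gamma", "delta"]
moved = [k for k in keys if ring3.responsible(k) != ring4.responsible(k)]
print(moved)  # only keys falling in nodeD's new segment change owners
```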
DHTs form an infrastructure that can be used to build more complex services, such as distributed file systems, p2p file sharing and content distribution systems, cooperative web caching, multicast, anycast, domain name services, and instant messaging. Applications that use DHTs include BitTorrent, Overnet, YaCy, and the Coral Content Distribution Network.
DHTs are scalable network infrastructures that support Internet-scale network applications built on a decentralized resource model. At their core, these overlays provide Key-Based Routing (KBR): a message addressed to any key is incrementally routed toward the overlay node responsible for that key. On top of the KBR layer, these overlays can support distributed storage using a DHT layer, or data location using a decentralized object location and routing (DOLR) layer.
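As a rough illustration of the KBR abstraction, the following sketch assumes node IDs and keys share one integer space and that "responsible" means numerically closest; these are assumptions of the sketch, and real protocols constrain the neighbor links so that routing still converges in a few hops.

```python
class Node:
    """Toy overlay node; IDs and keys share one integer space."""
    def __init__(self, node_id):
        self.id = node_id
        self.neighbors = []  # overlay links; a real protocol constrains these

    def route(self, key):
        """Greedy KBR: hand the message to the neighbor closest to the key,
        stopping at the node no neighbor can improve on (the "responsible"
        node). Putting self first makes ties stop here, so hops strictly
        shrink the distance and the recursion terminates."""
        best = min([self] + self.neighbors, key=lambda n: abs(n.id - key))
        return self if best is self else best.route(key)

nodes = [Node(i) for i in (3, 21, 47, 68, 90)]
for n in nodes:
    n.neighbors = [m for m in nodes if m is not n]  # fully connected toy mesh
print(nodes[0].route(65).id)  # -> 68, the node responsible for key 65
```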
Along with Pastry, Chord, and CAN, Tapestry was one of the first structured overlay networks, proposed in 2001. Since then, at least 20 structured protocols with varying properties have been proposed, including SkipNet/SkipGraph, Kademlia, Viceroy, and Z-Ring, among many others. On top of these overlays, researchers have proposed numerous distributed applications, including distributed storage and backup systems, multicast systems, resilient routing networks, distributed spam filters, mobility support, and anonymous routing networks.
The Kademlia DHT
Kademlia is a distributed hash table for decentralized peer-to-peer networks designed by Petar Maymounkov and David Mazières. It specifies the structure of the network and how the exchange of information takes place through network node lookups. Kademlia nodes communicate among themselves using UDP. Over an existing network, a new virtual or overlay network is created in which each node is identified by a number, or node ID. The node ID serves not only as the node's identification; the Kademlia algorithm also uses it to locate values (usually file hashes or keywords). In fact, the node ID provides a direct map to file hashes, since node IDs and keys are drawn from the same address space.
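Concretely, Kademlia measures closeness between IDs (and between an ID and a key) with the XOR metric: the distance between two identifiers is their bitwise exclusive or, interpreted as an integer. A minimal illustration in Python:

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance between two IDs: bitwise XOR, read as an integer."""
    return a ^ b

def closest(ids, key, k=3):
    """The k known IDs closest to `key` under the XOR metric."""
    return sorted(ids, key=lambda i: xor_distance(i, key))[:k]

print(xor_distance(0b1010, 0b0110))               # 12 (0b1100)
print(closest([0b0001, 0b1011, 0b1110], 0b1010))  # 0b1011 comes first
```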
When searching for some value, the algorithm explores the network in several steps. Each step moves closer to the key, until the contacted node returns the value or no closer nodes are found. Like many other DHTs, Kademlia contacts only [math]O(\log n)[/math] (see Big O notation) nodes during the search, out of a total of [math]n[/math] nodes in the system.
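The sketch below illustrates this iterative narrowing. The `find_node` callback and the small `k` are stand-ins for Kademlia's FIND_NODE RPC and its k-bucket parameters (assumptions of this sketch), and the parallel queries of real lookups are omitted.

```python
def iterative_lookup(start_id, key, find_node, k=3):
    """Sketch of a Kademlia-style lookup over integer IDs.

    `find_node(n)` stands in for the FIND_NODE RPC and returns the IDs that
    node `n` knows about. Each round queries the closest unqueried nodes,
    so the remaining XOR distance shrinks until no progress is made.
    """
    shortlist, queried = {start_id}, set()
    while True:
        best = sorted(shortlist, key=lambda n: n ^ key)[:k]
        fresh = [n for n in best if n not in queried]
        if not fresh:
            return best  # no closer nodes found: the lookup has converged
        for n in fresh:
            queried.add(n)
            shortlist.update(find_node(n))

# Toy topology: each node knows a few others (hypothetical data).
topology = {1: [5, 9], 5: [9, 12], 9: [12, 14], 12: [14], 14: [12]}
print(iterative_lookup(1, 13, lambda n: topology.get(n, [])))  # [12, 14, 9]
```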
A further advantage of the decentralized structure is a clear increase in resistance to denial-of-service attacks. Even if a whole set of nodes is flooded, the effect on network availability is limited: the network recovers by knitting itself back together around these "holes".
- CSpace is a P2P application platform written in Python that uses Kademlia
- Kademlia - they use a wiki too (running MediaWiki version 1.2!)
- Kademlia.pdf
- aMule - a confluence of the current range of P2P protocols, e.g. eDonkey2000
- infoanarchy article
Tapestry
The first generation of peer-to-peer applications, including Napster and Gnutella, had limitations that restricted scalability: Napster relied on a central directory, and Gnutella on scoped broadcast queries. To address these problems, a second generation of P2P applications was developed, including Tapestry, Chord, Pastry, and CAN. These overlays implement a basic key-based routing mechanism, which allows for deterministic routing of messages and adaptation to node failures in the overlay network.
Tapestry is an extensible infrastructure that provides decentralized object location and routing, with a focus on efficiency and on minimizing message latency. This is achieved because Tapestry constructs locally optimal routing tables at initialization and maintains them in order to reduce routing stretch. Furthermore, Tapestry allows applications to determine how objects are distributed according to their needs, and similarly allows them to implement multicast in the overlay network.
- Experiments show that Tapestry's efficiency increases with network size, so multiple applications sharing the same overlay network increase efficiency. A unique application identifier is used to differentiate between applications. Tapestry publishes and routes objects on a best-effort basis.
- OceanStore uses Tapestry for these reasons
- NOTE: Tapestry is written in Java; its development has stopped, and it has been replaced by Chimera (see below), a "light-weight" version in C.
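Tapestry (like Pastry, below) routes by resolving the destination ID one digit per hop. The following sketch shows a single forwarding step; the `(level, digit)` routing-table layout and the example IDs are simplifications assumed for illustration.

```python
def next_hop(my_id: str, key: str, routing_table: dict):
    """One prefix-routing step, as in Tapestry/Pastry-style overlays.

    `routing_table[(level, digit)]` names a node whose ID shares `level`
    digits with ours and has `digit` as its next digit (a hypothetical
    layout for this sketch). Each hop resolves at least one more digit,
    so a lookup takes O(log N) hops in a well-populated network.
    """
    level = 0
    while level < len(key) and my_id[level] == key[level]:
        level += 1  # digits we already share with the key
    if level == len(key):
        return None  # this node is the root for the key
    return routing_table.get((level, key[level]))  # None if the entry is empty

# Example: node "4227" forwarding a message for key "4598" (digit-string IDs)
table = {(1, "5"): "45xx-neighbor"}  # illustrative entry only
print(next_hop("4227", "4598", table))  # -> "45xx-neighbor"
```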
Pastry
Pastry is a generic, scalable and efficient substrate for peer-to-peer applications. Pastry nodes form a decentralized, self-organizing and fault-tolerant overlay network within the Internet. Pastry provides efficient request routing, deterministic object location, and load balancing in an application-independent manner. Furthermore, Pastry provides mechanisms that support and facilitate application-specific object replication, caching, and fault recovery.
- Current implementations are Java-based and Microsoft-based
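One distinctive Pastry mechanism is the leaf set: each node tracks the nodes numerically closest to its own ID, and a message whose key falls within that range is delivered directly to the numerically closest member. A rough sketch of that final-hop rule follows; ignoring the wraparound of the circular ID space is a simplification of this sketch.

```python
def leafset_delivery(my_id: int, key: int, leaf_set):
    """Sketch of Pastry's final-hop rule: if the key falls within the range
    of the leaf set, deliver to the numerically closest member; otherwise
    the caller falls back to the prefix routing table.
    """
    members = leaf_set + [my_id]
    if min(members) <= key <= max(members):
        return min(members, key=lambda n: abs(n - key))
    return None  # outside the leaf set: use prefix routing instead

print(leafset_delivery(50, 47, [40, 45, 55, 60]))  # -> 45, closest to 47
```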
Content Addressable Network
The Content Addressable Network (CAN) was one of the original four DHT proposals (Ratnasamy 2001), introduced concurrently with Chord, Pastry, and Tapestry. Although intended to be more general, the term content addressable network came to be associated with Ratnasamy et al.'s specific design.
Like other DHTs, CAN is a distributed, decentralized P2P infrastructure that provides hash table functionality on an Internet-like scale: it maps keys onto values. CAN is designed to be scalable, fault tolerant, and self-organizing. It is built around a virtual multi-dimensional Cartesian coordinate space on a multi-torus; this d-dimensional coordinate space is purely logical. The entire coordinate space is dynamically partitioned among all the peers in the system such that every peer owns its own distinct zone within the overall space. A CAN peer maintains a routing table that holds the IP address and virtual coordinate zone of each of its neighbors, and routes a message toward its destination by simple greedy forwarding to the neighbor closest to the destination coordinates.
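A sketch of that greedy forwarding step on the unit d-torus follows; representing each neighbor by its zone center, rather than by the full zone, is a simplification made for this illustration.

```python
def torus_distance(p, q):
    """Euclidean distance on the d-dimensional unit torus (coordinates wrap)."""
    return sum(min(abs(a - b), 1 - abs(a - b)) ** 2 for a, b in zip(p, q)) ** 0.5

def can_next_hop(my_center, dest, neighbors):
    """Greedy CAN forwarding: send toward the neighbor whose zone center is
    closest to the destination point. `neighbors` maps a neighbor's address
    to its zone center, standing in for the routing table described above.
    Returns None when no neighbor improves on us, i.e. the destination
    falls in (or nearest to) our own zone.
    """
    best_addr, best_dist = None, torus_distance(my_center, dest)
    for addr, center in neighbors.items():
        d = torus_distance(center, dest)
        if d < best_dist:
            best_addr, best_dist = addr, d
    return best_addr

hop = can_next_hop((0.1, 0.1), (0.6, 0.6),
                   {"peerA": (0.5, 0.5), "peerB": (0.9, 0.9)})
print(hop)  # -> "peerA", whose zone center is closest to (0.6, 0.6)
```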
Chimera
Chimera is a light-weight C implementation of a "next-generation" structured overlay that provides functionality similar to that of the prefix-routing protocols Tapestry and Pastry. Chimera gains simplicity and robustness from its use of Pastry's leaf sets, and efficient routing from Tapestry's locality algorithms. In addition, Chimera provides efficient detection of node and network failures, and re-routes messages around them to maintain connectivity and throughput.
Along with academic research, Chimera is currently being used by projects at industry labs, as part of DoD research, and by local startups. It also provides a research infrastructure for graduate and undergraduate networking class projects at UCSB; three MS graduates have used it for their MS projects. To get timely (but infrequent) notifications of updates, bug fixes, and new releases to Chimera, please join the Chimera-users GoogleGroup.
- Chimera is programmed in C and replaces Tapestry, which was written in Java
- chimera-1.10.zip