Distributed hash table

From Organic Design wiki

Overview

Distributed hash tables (DHTs) are a class of decentralized distributed systems (based on the hash algorithm) that provide a lookup service similar to a hash table: (name, value) pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given name. Responsibility for maintaining the mapping from names to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows DHTs to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.

DHTs form an infrastructure that can be used to build more complex services, such as distributed file systems, p2p file sharing and content distribution systems, cooperative web caching, multicast, anycast, domain name services, and instant messaging. Applications that use DHTs include BitTorrent, Overnet, YaCy, and the Coral Content Distribution Network.

DHTs are scalable network infrastructures that support Internet-scale network applications utilizing a decentralized resource model. At their core, these overlays provide key based routing (KBR), where messages addressed to any Key will incrementally route towards an overlay node responsible for that key. On top of the KBR layer, these overlays can support distributed storage using a DHT layer or data location using a DOLR layer.

Along with Pastry, Chord, and CAN, Tapestry was one of the first structured overlay networks proposed in 2001. Since then, at least 20 structured protocols have been proposed with varying properties, including SkipNet/SkipGraph, Kademlia, Viceroy, Z-Ring among many others. On top of these overlays, researchers have proposed numerous distributed applications, including distributed storage and backup systems, multicast systems, resilient routing networks, distributed spam filters, mobility support and anonymous routing networks.

PHT

Distributed Hash Tables are scalable, robust, and self-organizing peer-to-peer systems that support exact match lookups. This paper describes the design and implementation of a Prefix Hash Tree - a distributed data structure that enables more sophisticated queries over a DHT:

The Prefix Hash Tree uses the lookup interface of a DHT to construct a trie-based structure that is both efficient (updates are doubly logarithmic in the size of the domain being indexed), and resilient (the failure of any given node in the Prefix Hash Tree does not affect the availability of data stored at other nodes).

The Kademlia DHT

Kademlia is a distributed hash table for decentralized peer to peer networks designed by Petar Maymounkov and David Mazières. It specifies the structure of the network and how the exchange of information has to take place through network node lookups. Kademlia nodes communicate among themselves using the UDP. Over an existing network, a new virtual or overlay network is created in which each node is identified by a number or node ID. The node ID serves not only as a nodes identification, but the Kademlia algorithm uses it to locate values (usually file hashes or keywords). In fact, the node ID provides a direct map to file hashes.

When searching for some value, the algorithm explores the network in several steps. Each step approaches the key until the contacted node returns the value or no more closer nodes are found. Like many other DHTs, Kademlia contacts only [math]O(\log n)[/math] (see Big O notation) nodes during the search out of a total of [math]n[/math] nodes in the system.

Further advantages are found particularly in the decentralized structure, which clearly increases the resistance against a denial of service attack. Even if a whole set of nodes are flooded, this will have limited effect on network availability, which will recover itself by knitting the network around these "holes".

Tapestry

The first generation of peer-to-peer applications, including Napster and Gnutella, had restricting limitations such as a central directory for Napster and scoped broadcast queries for Gnutella limiting scalability. To address these problems a second generation of p2p applications were developed including Tapestry, Chord, Pastry, and CAN. These overlays implement a basic key-based routing mechanism. This allows for deterministic routing of messages and adaptation to node failures in the overlay network.

Tapestry is an extensible infrastructure that provides decentralized object location and routing focusing on efficiency and minimizing message latency. This is achieved since Tapestry constructs locally optimal routing tables from initialization and maintains them in order to reduce routing stretch. Furthermore, Tapestry allows object distribution determination according to the needs of a given application. Similarly Tapestry allows applications to implement multicasting in the overlay network.

From experiments it is shown that Tapestry efficiency increases with network size so multiple applications sharing the same overlay network increases efficiency. To differentiate between applications a unique application identifier is used. Tapestry uses best-effort to publish and route objects.

  • OceanStore uses Tapestry for these reasons
  • NOTE: Tapestry is written in Java and development has stopped and been replaced by Chimera (see below) which is a "light-weight" version in C.

Pastry

Pastry is a generic, scalable and efficient substrate for peer-to-peer applications. Pastry nodes form a decentralized, self-organizing and fault-tolerant overlay network within the Internet. Pastry provides efficient request routing, deterministic object location, and load balancing in an application-independent manner. Furthermore, Pastry provides mechanisms that support and facilitate application-specific object replication, caching, and fault recovery.

  • Current implementations are Java and MS-based

Content Addressable Network

The Content Addressable Network (CAN) was one of the original four DHT proposals (Ratnasamy 2001), introduced concurrently with Chord, Pastry, and Tapestry. Although intended to be more general, the term content addressable network came to be associated with Ratnasamy et al.'s specific design.

Like other DHTs, CAN is a distributed, decentralized P2P infrastructure that provides hash table functionality on an Internet-like scale. CAN is designed to be scalable, fault tolerant, and self-organizing. CAN is built around a virtual multi-dimensional Cartesian coordinate space on a multi-torus. This d-dimensional coordinate space is completely logical. The entire coordinate space is dynamically partitioned among all the peers (N number of peers) in the system such that every peer possesses its individual, distinct zone within the overall space. A CAN peer maintains a routing table that holds the IP address and virtual coordinate zone of each of its neighbor coordinates. A peer routes a message towards its destination using a simple greedy forwarding to the neighbor peer that is closest to the destination coordinates. CAN is a distributed system that maps keys onto values.

Chimera

Chimera is a C library that implements a structured, peer-to-peer system. This work is an attempt to provide a library that allows easy development of applications on top of a peer-to-peer routing infrastructure. The goals are twofold. First, we wanted to make a fast, lightweight, C implantation of a Tapestry-like system includes some of the optimizations provided by other systems. Second, we wanted to develop a system designed to export an API in line with existing work that describes how to effectively interface with such an overlay network.

The library implements a routing infrastructure much like those provided by Tapestry and Pastry. The system contains both a leaf set of neighbor nodes, which provides fault tolerance and a probabilistic invariant of constant routing progress. It also provides a PRR style routing table to improve routing time to a logarithmic factor of network size. Using this library, developers can build an application that creates an overlay network with a limited number of library calls. They can implement their own application by providing a series of up-calls that are called by the library in response to certain overlay network events.

From experiments it is shown that Tapestry (Chimera) efficiency increases with network size so multiple applications sharing the same overlay network increases efficiency. To differentiate between applications a unique application identifier is used. Tapestry uses best-effort to publish and route objects.

The library developed will serve as both a usable interface and a starting point for further research. This library implements a relatively complete version of a structured peer-to-peer system we described. It includes some of the current work in locality optimization and soft-state operations. It also provides an interface that can be used as is to develop applications, but that will allow for the infrastructure to be changed with little impact on existing application code.

Chimera is programmed in C and replaces Tapestry which was in Java. From what I can gather, development on Tapestry stopped sometime in 2003, and then the first useable release of Chimera (version 0.7) was late 2005. Chimera 1.0 was released in early 2006. The head of the development team is Ben Zhao, an Assistant Professor in the computer science department at the University of California. The code is all released under the GPL liscence.

Installation on Debian and Ubuntu

To install on Debian or Ubuntu, do the usual configure && make && make install after installing automake, g++ and libssl-dev.

GNUnet

GNUnet is a framework for secure peer-to-peer networking that does not use any centralized or otherwise trusted services. A first service implemented on top of the networking layer allows anonymous censorship-resistant file-sharing. GNUnet uses a simple, excess-based economic model to allocate resources. Peers in GNUnet monitor each others behavior with respect to resource usage; peers that contribute to the network are rewarded with better service.

GNUnet also has an easily accessible DHT which is based on the Kademlia algorithm. Secure DHT instances could be set up in GNUnet as a semantic overlay so that the content can be

TOR (The Onion Router)

Tor is a toolset for a wide range of organizations and people that want to improve their safety and security on the Internet. Using Tor can help you anonymize web browsing and publishing, instant messaging, IRC, SSH, and other applications that use the TCP protocol. Tor also provides a platform on which software developers can build new applications with built-in anonymity, safety, and privacy features.

Tor aims to defend against traffic analysis, a form of network surveillance that threatens personal anonymity and privacy, confidential business activities and relationships, and state security. Communications are bounced around a distributed network of servers called onion routers, protecting you from websites that build profiles of your interests, local eavesdroppers that read your data or learn what sites you visit, and even the onion routers themselves.

PIER

The database research community prides itself on scalable technologies. Yet database systems traditionally do not excel on one important scalability dimension: the degree of distribution. This limitation has hampered the impact of database technologies on massively distributed systems like the Internet. PIER is a massively distributed query engine based on structured overlay networks, which is intended to bring database query processing facilities to new, widely distributed environments.

Entropy

A p2p anonymous content publishing network like freenet, but in C. They're also working on a distributed SQL database.

Multi-protocol connectors

See also