Kademlia-Like Distributed Hash Tables
Last updated
Last updated
Green Meta Distributed Hash Table (DHT), similar to Kademlia, plays a crucial role in the network component of the Green Meta project, enabling the location of other nodes in the network.
For instance, clients who wish to submit transactions to a shard chain may want to find the validators or collators of that shard chain, or at least a node that can relay the client's transactions to a collator. This can be achieved by searching for specific keys within the Green Meta DHT. Another important application of the Green Meta DHT is its ability to quickly populate the neighbor tables of new nodes by searching for random keys or addresses of new nodes. If a node uses proxies and tunnels for its inbound datagrams, it publishes the tunnel identifier and its entry point (e.g., IP address and UDP port) in the Green Meta DHT. Consequently, all nodes wishing to send datagrams to that node will first obtain the contact information from the DHT. The Green Meta DHT is a member of a series of distributed hash tables similar to Kademlia.
1. Overlay Network
An overlay (sub)network is a simple network within a large-scale network. Typically, only a subset of nodes in a larger network participates in the overlay subnet, and these nodes (physical or virtual) are connected through a limited number of "links" that are part of the overlay subnet. In this manner, if the encompassing network is represented as a graph (in the case of datagram networks, a complete graph can be used, such as ADNL), any node can easily communicate with any other node. The overlay network is a subgraph of this graph. In most cases, overlay networks are constructed using protocols based on the larger network. It can either use the same addresses as the larger network or employ custom addresses.
2. DHT Values
The key-value mappings of the Green Meta DHT are stored on the nodes of the DHT, essentially all members of the Green Meta network. For this purpose, any node in the Green Meta network, except for some very lightweight nodes, possesses at least one "semi-permanent address" that identifies it as a member of the Green Meta DHT, in addition to any number of transient and permanent abstract addresses described earlier. This semi-permanent address or DHT address should not change frequently; otherwise, other nodes would be unable to locate the keys they are looking for. If a node does not wish to expose its "real" identity, it generates a separate abstract address solely for participating in the DHT. However, this abstract address must be public as it is associated with the node's IP address and port.
3. Kademlia Distance
We now have 256-bit keys and 256-bit (semi-permanent) node addresses. We introduce the XOR distance or Kademlia distance dK on the set of 256-bit sequences, defined by the following equation:
Here, x⊕y denotes the bitwise XOR of two bit sequences of the same length. The Kademlia distance introduces a metric on the set of all 256-bit sequences, 2^256. Specifically, dK(x, y) = 0 if and only if x = y, dK(x, y) = dK(y, x), and dK(x, z) ≤ dK(x, y) + dK(y, z). Another important property is that there is only one point at any given distance from x: dK(x, y) = dK(x, y′) implies y = y′.
4. Kademlia Routing Table
Each node participating in the Kademlia-like DHT typically maintains a Kademlia routing table. In the case of the Green Meta DHT, it consists of n = 256 buckets numbered from 0 to n-1. The ith bucket will contain information about some known nodes (a fixed number of "good" nodes, possibly with some additional candidates) at Kademlia distances between 2i and 2i+1 − 1 from the node's address. This information includes their (semi-permanent) addresses, IP addresses and UDP ports, and some availability information such as the time and delay since the last ping. When a Kademlia node learns about any other Kademlia node as a result of some query, it includes it in the appropriate bucket of its routing table, initially as a candidate. Then, if some "good" nodes in that bucket fail (e.g., do not respond to ping queries for a long time), they can be replaced by some candidates. In this manner, the Kademlia routing table remains populated.
5. Kademlia Network Queries
Kademlia nodes typically support the following network queries:
• Ping: Checks the availability of a node.
• Store(key, value): Requests a node to store the value as the value for the key. For the Green Meta DHT, the store query is slightly more complex.
• Find_Node(key, l): Requests a node to return the l Kademlia closest known nodes (from its Kademlia routing table) to the key.
• Find_Value(key, l): Same as above, but returns only the value if the node knows the value corresponding to the key.
When any node wants to find the value of a key K, it first creates a set S of s' nodes, where s' is a small value like s' = 5, that are closest to K in terms of Kademlia distance among all known nodes (coming from the Kademlia routing table).
It then sends a FindValue query to each of them, and any mentioned nodes in their responses are included in S. If not done previously, the s' nodes from S (closest to K) are also sent FindValue queries, and this process continues until a value is found or the set S stops growing. This is a "beam search" on nodes closest to Kademlia distance to K. For setting the value of a certain key K, the same process is run for s' ≥ s using FindNode queries instead of FindValue to find the nodes closest to K. Subsequently, Store queries are sent to all these nodes. There are some less important details in the implementation of a Kademlia-like DHT (e.g., every node should look for closest nodes, e.g., every hour, and republish all stored keys via Store queries). We will ignore them for now.
6. Bootstrapping Kademlia Nodes
When a Kademlia node comes online, it initially populates its Kademlia routing table by finding its own address. During this process, it identifies the nodes that are closest to itself. It can download all known (key, value) pairs from them to populate a portion of its DHT.