Adrien Béraud edited this page Apr 17, 2018 · 58 revisions

Introduction

OpenDHT offers the following features:

  • Distributed shared key->value data-store.
  • IPv4 and IPv6 support.
  • Storage of arbitrary binary values up to 64 KiB. Keys are 160 bits long.
  • Different values under the same key are distinguished by a 64-bit ID that is unique within that key.
  • Every value also has a "value type". Each value type defines potentially complex storage, editing and expiration policies, allowing, for instance, different value expiration times. The set of supported value types is hardcoded and known by every node.
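
The data model in this list can be pictured with a small in-memory sketch. It is not OpenDHT code: ToyStore, ToyKey, ToyBlob and kMaxValueSize are hypothetical names, a std::map stands in for the network, and treating exactly 64 KiB as still acceptable is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not the OpenDHT types.
using ToyKey = std::array<std::uint8_t, 20>;      // 160-bit key
using ToyBlob = std::vector<std::uint8_t>;        // arbitrary binary value
constexpr std::size_t kMaxValueSize = 64 * 1024;  // 64 KiB limit (assumed inclusive)

class ToyStore {
public:
    // Stores `data` under (key, id). Returns false if the blob is too large
    // or if a value with this ID already exists under the key.
    bool put(const ToyKey& key, std::uint64_t id, ToyBlob data) {
        if (data.size() > kMaxValueSize)
            return false;
        return values_[key].emplace(id, std::move(data)).second;
    }

    // Number of distinct values currently held under `key`.
    std::size_t count(const ToyKey& key) const {
        auto it = values_.find(key);
        return it == values_.end() ? 0 : it->second.size();
    }

private:
    // Each key maps to several values, told apart by their 64-bit ID.
    std::map<ToyKey, std::map<std::uint64_t, ToyBlob>> values_;
};
```

Two values put under the same key with different IDs are both kept, while an oversized blob is rejected before it is stored.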

Note that OpenDHT is not compatible with the Mainline BitTorrent DHT (which only stores IP addresses).

An optional public-key cryptography layer on top of the DHT allows signed or encrypted data to be put on the DHT. Signed values can then be edited, but only by their owner (as verified cryptographically). Signed values retrieved from the DHT are checked automatically and are only presented to the user if signature verification succeeds.

The identity layer also publishes a (usually self-signed) certificate on the DHT that can be used to encrypt data for other nodes. Encrypted values are always signed, and the signature is part of the encrypted data, to hide the signer identity during transmission. For this reason, like other non-signed values, encrypted values can't be edited (because storage nodes can't check the identity of the author).

The OpenDHT API

OpenDHT uses the dht C++ namespace and is composed of a few major classes:

  • InfoHash represents a key or a node ID, both of which are 20-byte (160-bit) bitstrings. InfoHash instances can be compared with the == operator. The user can compute hashes from strings or binary data using the static method InfoHash::get(); for instance, InfoHash::get("my_key") returns the SHA-1 hash of the string "my_key".
  • Value represents a value potentially stored on the DHT. dht::Value is the result type of get operations and the argument type of put operations. A dht::Value can be easily built from any binary object, for instance using the constructor dht::Value::Value(const std::vector<uint8_t>&) or C-style with dht::Value::Value(const uint8_t* ptr, size_t len).
  • ValueType defines how data is stored on the DHT: preservation time, storage and editing constraints, etc. Every stored Value has an associated value type. Note that a ValueType usually has no impact on data serialization.
  • Value::Filter is a class inheriting from std::function<bool(const Value&)>. It defines whether a value should be returned to the user. It also provides helper methods such as chain(Value::Filter&&) and chainOr(Value::Filter&&).
  • Query, much like a filter, lets you filter values, but also select fields within each value. It roughly corresponds to the SELECT and WHERE clauses of an SQL statement; in fact, one of its constructors takes an SQL-like formatted string as a parameter. The fields on which SELECT and WHERE operations are permitted are listed in Value::Fields, which is a subset of the fields a Value contains. The most meaningful distinction between a query and a filter is that the query is executed by the remote nodes, giving you better control over the network traffic triggered by your use of the library.
  • Dht is the class implementing the actual distributed hash table and providing basic operations. It requires an already-open UDP socket to send packets. When used alone, the Dht::periodic method must be called regularly and when a packet is received.
  • SecureDht is a child class of dht::Dht that exposes its APIs and will transparently check signed values (for get and listen operations), decrypt encrypted values (that we can decrypt), and provide additional methods to publish signed or encrypted values.
  • DhtRunner provides a thread-safe interface to SecureDht and manages UDP sockets. DhtRunner is what most applications implementing OpenDHT should use: the instance can be safely shared to be used independently by various components or threads, with networking managed transparently. DhtRunner can launch a dedicated thread or be integrated in the program main loop.

Callbacks

Get/listen operations take a callback argument of type GetCallback or GetCallbackSimple (both can be used):

using GetCallback = std::function<bool(const std::vector<std::shared_ptr<dht::Value>>& values)>;

using GetCallbackSimple = std::function<bool(const std::shared_ptr<dht::Value>& value)>;

Query operations take a callback argument of type QueryCallback, defined as:

using QueryCallback = std::function<bool(const std::vector<std::shared_ptr<dht::FieldValueIndex>>& fields)>;

Many operations also use an "operation completed" callback DoneCallback, defined as:

using DoneCallback = std::function<void(bool success)>;
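
The contract shared by these callbacks (keep receiving results while the callback returns true, then get one completion notice) can be sketched without the library. ToyValue, ToyGetCallback, ToyDoneCallback and deliver are hypothetical names, and always reporting success to the done callback is a simplification of this sketch.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for dht::Value, so the sketch compiles on its own.
struct ToyValue { std::string data; };

using ToyBatch = std::vector<std::shared_ptr<ToyValue>>;
using ToyGetCallback = std::function<bool(const ToyBatch& values)>;
using ToyDoneCallback = std::function<void(bool success)>;

// Delivers each batch to `cb` until the batches run out or `cb` returns false,
// then reports completion once through `done` (if one was given).
// Returns the number of batches delivered.
std::size_t deliver(const std::vector<ToyBatch>& batches,
                    const ToyGetCallback& cb,
                    const ToyDoneCallback& done) {
    std::size_t delivered = 0;
    for (const auto& batch : batches) {
        ++delivered;
        if (!cb(batch))
            break;  // the caller asked to stop receiving values
    }
    if (done)
        done(true);  // simplification: this sketch always reports success
    return delivered;
}
```

A callback that returns false after it has seen enough values stops the delivery early, and the done callback still runs exactly once.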

dht::Dht

This class provides the core API. Important methods are:

  • Constructor
Dht::Dht(int s, int s6, const InfoHash& id)

The constructor takes open IPv4 and IPv6 UDP sockets used to send packets, and the node ID. At least one open socket must be provided for the Dht instance to be considered running. For an address family with no valid socket, pass -1 instead.

Most apps implementing OpenDHT should use the class DhtRunner that will instantiate Dht, handle networking transparently and provide a thread-safe interface to the dht instance.

  • Get
void Dht::get(const InfoHash& key, GetCallback cb, DoneCallback donecb={}, Value::Filter f = {}, Query q = {});

Get initiates a search on the network for values associated with the provided key. Results are delivered during the search through the second argument, cb. The callback is called repeatedly with new values as they are found on the network, until the search ends or the callback returns false. An optional DoneCallback is called when the operation completes (success or failure), after which no further callback is called.
Filter: optional predicate to pre-filter values before they are passed to the callback.
Query: optional query to filter values on remote nodes.

Example using Dht::get:

//node is a running instance of dht::Dht
node.get(
    dht::InfoHash::get("some_key"),
    [](const std::vector<std::shared_ptr<dht::Value>>& values) {
        for (const auto& v : values)
            std::cout << "Got value: " << *v << std::endl;
        return true; // keep looking for values
    },
    [](bool success) {
        std::cout << "Get finished with " << (success ? "success" : "failure") << std::endl;
    }
);
  • Query
void Dht::query(const InfoHash& key, QueryCallback cb, DoneCallback done_cb = {}, Query&& q = {});

Query initiates a search on the network at the provided key for specific value fields. Results are delivered during the search through the second argument, cb. The callback is called repeatedly with new field value indexes as they are found on the network, until the search ends or the callback returns false. An optional DoneCallback is called when the operation completes (success or failure), after which no further callback is called.
Query: optional query to select fields and filter values on remote nodes.

Example using Dht::query:

//node is a running instance of dht::Dht
node.query(
    dht::InfoHash::get("some_key"),
    [](const std::vector<std::shared_ptr<dht::FieldValueIndex>>& fields) {
        for (const auto& i : fields)
            std::cout << "Got index: " << *i << std::endl;
        return true; // keep looking for field value index
    },
    [](bool success) {
        std::cout << "Query finished with " << (success ? "success" : "failure") << std::endl;
    }
);
  • Put
void Dht::put(const InfoHash& key, const std::shared_ptr<Value>& value, DoneCallback cb = {});

Put initiates publication of a value on the network at the provided key. See Data serialization for more information about how to build a dht::Value instance. An optional DoneCallback is called on operation completion (success or failure).
If the value ID is dht::Value::INVALID_ID (0) when put is called, the Value::id field is set during the operation to identify the value.
A value remains on the network for its lifetime (default 10 minutes). Use put with the same key and value to refresh the expiration deadline. Values can't be edited by default (with the exception of signed values). If a value with the same value ID exists on the network, the new value is by default ignored by the network.
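
The lifetime rules above can be modeled for a single key on a single storage node. This is a sketch under assumptions, not the OpenDHT storage code: ToyBucket, ToyClock and kLifetime are hypothetical names, values are plain strings, and time is passed in explicitly so the behavior is easy to follow.

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical names for illustration; not OpenDHT API.
using ToyClock = std::chrono::steady_clock;
constexpr std::chrono::minutes kLifetime{10};  // default lifetime quoted above

// Models the values one storage node holds for a single key.
class ToyBucket {
    struct Entry {
        std::string data;
        ToyClock::time_point expires;
    };
    std::map<std::uint64_t, Entry> entries_;

public:
    // Returns true if the value was stored or refreshed, false if ignored.
    bool put(std::uint64_t id, const std::string& data, ToyClock::time_point now) {
        auto it = entries_.find(id);
        if (it == entries_.end()) {
            entries_.emplace(id, Entry{data, now + kLifetime});
            return true;
        }
        if (it->second.data != data)
            return false;                      // a different value with a known ID is ignored
        it->second.expires = now + kLifetime;  // same value: refresh the deadline
        return true;
    }

    // Drops every entry whose deadline has passed.
    void expire(ToyClock::time_point now) {
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (it->second.expires <= now)
                it = entries_.erase(it);
            else
                ++it;
        }
    }

    bool contains(std::uint64_t id) const { return entries_.count(id) != 0; }
};
```

Putting the same value again before its deadline pushes the deadline back by a full lifetime, which is why periodic re-puts keep a value alive.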

Example using Dht::put:

const char* my_data = "42 cats";

//node is a running instance of dht::Dht
node.put(
    dht::InfoHash::get("some_key"),
    dht::Value((const uint8_t*)my_data, std::strlen(my_data))
);
  • Listen
size_t Dht::listen(const InfoHash& key, GetCallback cb, Value::Filter f = {}, Query q = {});

Listen initiates a search on the network for values associated with the provided key, and then keeps the caller informed of new values published at that key. The callback cb is called every time there is a new or changed value at key, until cb returns false or the operation is canceled with bool cancelListen(const InfoHash& key, size_t token), where token is the value returned by listen. Calling cancelListen has the same effect as returning false from the callback.

Example using Dht::listen:

auto key = dht::InfoHash::get("some_key");
auto token = node.listen(key,
    [](const std::vector<std::shared_ptr<dht::Value>>& values) {
        for (const auto& v : values)
            std::cout << "Found value: " << *v << std::endl;
        return true; // keep listening
    }
);

// later
node.cancelListen(key, std::move(token));
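
The token mechanism can be illustrated with a small registry for a single key. This is only a sketch of the idea: ToyListeners is a hypothetical class, values are plain strings, and nothing here reflects how OpenDHT tracks listeners internally.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical single-key listener registry; not the OpenDHT implementation.
class ToyListeners {
public:
    using Callback = std::function<bool(const std::string& value)>;

    // Registers `cb` and returns the token needed to cancel it later.
    std::size_t listen(Callback cb) {
        std::size_t token = ++last_token_;
        listeners_.emplace(token, std::move(cb));
        return token;
    }

    // Returns true if a listener with this token was registered and removed.
    bool cancelListen(std::size_t token) { return listeners_.erase(token) != 0; }

    // Notifies every listener; one that returns false is dropped,
    // exactly as if it had been cancelled.
    void publish(const std::string& value) {
        for (auto it = listeners_.begin(); it != listeners_.end();) {
            if (it->second(value))
                ++it;
            else
                it = listeners_.erase(it);
        }
    }

    std::size_t size() const { return listeners_.size(); }

private:
    std::size_t last_token_ = 0;
    std::map<std::size_t, Callback> listeners_;
};
```

A listener that returns false is removed on the spot, so cancelling it afterwards with its token finds nothing left to remove.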

Listen with type template for automatic deserialization:

struct Cloud {
    uint32_t altitude;
    double width, height;
    bool rainbow;
    MSGPACK_DEFINE_MAP(altitude, width, height, rainbow);
};
std::vector<Cloud> found_clouds;

auto key = dht::InfoHash::get("some_key");
auto token = node.listen<Cloud>(key, [](Cloud&& value) {
        // warning: called from another thread
        found_clouds.emplace_back(std::move(value));
        return true; // keep listening
    }
);

// later
node.cancelListen(key, token);

Filters and queries

Filters

A filter is an std::function<bool(const dht::Value&)> predicate to filter values.

auto coolValueFilter = [](const dht::Value& v) {
    return v.user_type == "cool" and v.data.size() < 64;
};
node.get(
    dht::InfoHash::get("coolKey"),
    [](const std::shared_ptr<dht::Value>& value) {
        std::cout << "That's a cool value: " << *value << std::endl;
        return true; // keep looking for values
    },
    [](bool success) {
        std::cout << "Op went " << (success ? "cool" : "not cool") << std::endl;
    },
    coolValueFilter);
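
Because a filter is just a predicate, two filters can be combined into a third. The sketch below shows the general idea behind chaining with plain std::function objects. ToyValue, ToyFilter, chainAnd and chainOrToy are hypothetical names and do not reproduce the signatures of Value::Filter::chain or chainOr.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for dht::Value with only the fields the sketch needs.
struct ToyValue {
    std::string user_type;
    std::size_t size;
};

using ToyFilter = std::function<bool(const ToyValue&)>;

// Accepts a value only if both filters accept it.
ToyFilter chainAnd(ToyFilter a, ToyFilter b) {
    return [a, b](const ToyValue& v) { return a(v) && b(v); };
}

// Accepts a value if at least one of the filters accepts it.
ToyFilter chainOrToy(ToyFilter a, ToyFilter b) {
    return [a, b](const ToyValue& v) { return a(v) || b(v); };
}
```

With one filter on the user type and another on the size, chainAnd keeps only small "cool" values, while chainOrToy keeps values that are either small or "cool".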

As you can see, the Value::Filter class is very flexible. However, this filtering is only performed on the local node, when values are received in a response. What if you know that the storage you're interested in hosts a large number of values, and you don't want to trigger heavy traffic? Use queries!

Queries

A similar get, this time filtered on the remote nodes with a query, looks as follows:

Where w;
w.id(5); /* the same as Where w("WHERE id=5"); */
node.get(
    dht::InfoHash::get("some_key"),
    [](const std::vector<std::shared_ptr<dht::Value>>& values) {
        for (const auto& v : values)
            std::cout << "This value has passed through the remote nodes' filters: " << *v << std::endl;
        return true; // keep looking for values
    },
    [](bool success) {
        std::cout << "Get finished with " << (success ? "success" : "failure") << std::endl;
    }, {}, w
);

All available fields are listed below:

Field:

  • Id
  • ValueType
  • OwnerPk
  • UserType

Note: when a query is initialized from a string, field names are written in snake case (for example, value_type and user_type).

A query can tell whether it is satisfied by another query. For example:

Query q1;
q1.where.id(5); // the whole value with id=5 will be sent

Query q2 {{"SELECT value_type"}};
// q2 the same as Query q("SELECT * WHERE value_type=10,user_type=foo_type");
q2.where.valueType(10).userType("foo_type");

Query q3("SELECT id WHERE id=5"); // only the id=5 will be sent

q1.isSatisfiedBy(q3); // false
q2.isSatisfiedBy(q1); // false
q3.isSatisfiedBy(q1); // true
q2.isSatisfiedBy(q3); // false
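
One plausible reading of these results is that a query is satisfied by another when the other query returns at least the same fields and applies no condition that the first one lacks. The sketch below implements that reading on a toy model and agrees with the four results above. ToyQuery and isSatisfiedBy are hypothetical, and the actual dht::Query logic may differ.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>

// Hypothetical model of a query; not the dht::Query class.
struct ToyQuery {
    std::set<std::string> select;              // empty means "SELECT *"
    std::map<std::string, std::string> where;  // field -> required value
};

// True when the answers to `other` contain everything needed to answer `q`.
bool isSatisfiedBy(const ToyQuery& q, const ToyQuery& other) {
    // `other` must return every field that `q` asks for.
    bool fields_ok = other.select.empty() ||
        (!q.select.empty() &&
         std::includes(other.select.begin(), other.select.end(),
                       q.select.begin(), q.select.end()));
    // `other` must not exclude values that `q` would accept:
    // each of its conditions has to appear in `q` as well.
    bool where_ok = std::includes(q.where.begin(), q.where.end(),
                                  other.where.begin(), other.where.end());
    return fields_ok && where_ok;
}
```

Under this model, q3 (only the id of values with id=5) is satisfied by q1 (whole values with id=5), but not the other way round, because q3 returns fewer fields than q1 needs.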

dht::SecureDht

This class extends dht::Dht, and provides the same API methods (get, put, listen). It adds a public-key cryptography layer on top of the DHT. A user-provided or generated Identity (RSA key pair) will be used for signing and decrypting.

Values returned to the user by ::get and ::listen are checked beforehand and filtered: signed values are dropped if their signature verification fails. Similarly, encrypted values that we can't decrypt are dropped, or provided decrypted to the user if we can.

The user can tell whether a value was encrypted by checking the recipient field of the Value (which should be our public key ID).

As a layer on top of Dht, SecureDht can also be used for plain values. Methods like get and put will behave the same as Dht for non-encrypted and non-signed values.

Additionally, SecureDht adds a few methods:

  • PutSigned
void putSigned(const InfoHash& hash, const std::shared_ptr<Value>& val, DoneCallback callback);
  • PutEncrypted
void putEncrypted(const InfoHash& hash, const InfoHash& to, std::shared_ptr<Value> val, DoneCallback callback);

dht::DhtRunner

DhtRunner provides thread-safe access to the running DHT instance and exposes all methods from SecureDht. For more information, see: Running a node in your program
