Unified Backend API Refactoring

Overview / Rationale

riak-client was started before the binary Protocol Buffers (“protobuffs” or “PBC”) interface to Riak was available. Hence, all client operations at higher levels of abstraction (Bucket, RObject, MapReduce, etc.) invoke HTTP-related methods directly on the client backend. There are also some incidental aspects of the library that use HTTP semantics.

In order to support both client APIs transparently, the means for accessing client API features needs to be pushed more into the backends, and any HTTP-specific semantics should be factored out.

Note: This refactoring was completed in version 0.9.0.

Unified Backend API

Each backend will need to support the following methods (default argument values shown); a minimal sketch of such an interface follows the list:

ping()
fetch_object(bucket, key, r=nil)
store_object(robject, returnbody=false, w=nil, dw=nil)
delete_object(bucket, key, rw=nil)
get_bucket_props(name)
set_bucket_props(name, props={})
list_keys(bucket, &block)
list_buckets()
mapred(mr, &block)
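
A rough sketch of what the unified interface might look like, assuming a hypothetical AbstractBackend base class that the concrete HTTP and PBC backends would subclass:

# Hypothetical base class; concrete backends (HTTP, PBC) override each
# operation with their protocol-specific implementation.
class AbstractBackend
  def ping
    raise NotImplementedError
  end

  def fetch_object(bucket, key, r=nil)
    raise NotImplementedError
  end

  def store_object(robject, returnbody=false, w=nil, dw=nil)
    raise NotImplementedError
  end

  # ... delete_object, get_bucket_props, set_bucket_props, list_keys,
  # list_buckets, and mapred follow the same pattern.
end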

The HTTP backends will also support these methods:

link_walk(robject, *walk_specs)
stats()

The PBC backend(s) will also support these methods:

get_client_id()
set_client_id(id)
server_info()

Challenges & TODOs

Inconsistent Client APIs

PBC and HTTP have some mutually-exclusive capabilities. For API-specific calls, we may want to invoke the owning backend explicitly, while routing generic calls through the “auto-selected” backend. Specific sub-challenges:
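
A sketch of that routing, assuming hypothetical backend, http, and pbc accessors on the client (names for illustration only):

class Client
  # Generic operations go to whichever backend was auto-selected.
  def fetch_object(bucket, key, r=nil)
    backend.fetch_object(bucket, key, r)
  end

  # API-specific operations name their backend explicitly.
  def link_walk(robject, *walk_specs)
    http.link_walk(robject, *walk_specs)  # HTTP-only feature
  end

  def server_info
    pbc.server_info                       # PBC-only feature
  end
end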

Link-walking

Under the covers, this is actually a MapReduce job. We could construct a series of jobs that emulates it over PBC, but the round-trips through Ruby might be expensive.
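
For reference, the emulation would boil down to a MapReduce job with one link phase per walk spec; the bucket and tag values below are placeholders, not part of the real API:

# Rough sketch of the job spec the PBC backend would submit. With
# "keep" => true the link phase returns [bucket, key, tag] triples,
# which the client would then fetch individually -- those extra
# round-trips are the expense noted above.
job = {
  "inputs" => [[robject.bucket.name, robject.key]],
  "query"  => [
    {"link" => {"bucket" => "people", "tag" => "friend", "keep" => true}}
  ]
}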

Bucket properties

PBC supports only two bucket properties, n_val and allow_mult, while HTTP handles arbitrary properties dynamically via JSON.
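
One possible mitigation, sketched here as an assumption rather than a decided design, is for the PBC backend to accept only the properties it can express and raise on anything else:

# Hypothetical guard in the PBC backend's set_bucket_props.
SUPPORTED_PROPS = %w{n_val allow_mult}

def set_bucket_props(name, props={})
  unsupported = props.keys.map { |k| k.to_s } - SUPPORTED_PROPS
  unless unsupported.empty?
    raise ArgumentError, "PBC cannot set properties: #{unsupported.join(', ')}"
  end
  # ... encode n_val/allow_mult into the request and send it
end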

Streaming MapReduce

HTTP streaming MapReduce results are delivered in multipart/mixed format, so a streaming parser will have to be written.
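
A minimal sketch of such a parser, assuming raw chunks arrive from the HTTP client as strings and the boundary has already been extracted from the Content-Type header:

# Accumulates raw chunks and yields each complete multipart section as
# soon as its trailing boundary has been seen.
class MultipartStream
  def initialize(boundary, &block)
    @delimiter, @buffer, @block = "--#{boundary}", "", block
  end

  def <<(chunk)
    @buffer << chunk
    while (index = @buffer.index(@delimiter))
      part = @buffer.slice!(0, index + @delimiter.size)
      part.chomp!(@delimiter)
      @block.call(part) unless part.strip.empty?
    end
  end
end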

Two connections?

Should we disable features that aren’t implemented by the preferred backend, or raise exceptions when they are used? Not all client applications will want to allow both APIs.

PBC connection management

HTTP connection management is handled by a higher-level client library, currently either Net::HTTP or curb. PBC talks over bare sockets and will need a connection pool for thread-safe or event-looped applications.
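
A thread-safe pool could be as simple as the sketch below; the block-based API and the ProtobuffsBackend name in the usage comment are assumptions for illustration:

require 'thread'

# Very small thread-safe pool: take a connection, yield it, put it back.
class ConnectionPool
  def initialize(size, &factory)
    @pool = Queue.new
    size.times { @pool << factory.call }
  end

  def take
    conn = @pool.pop            # blocks when the pool is exhausted
    yield conn
  ensure
    @pool << conn if conn
  end
end

# pool = ConnectionPool.new(5) { ProtobuffsBackend.new(host, port) }
# pool.take { |conn| conn.ping }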

PBC Native APIs

PBC uses Google’s protobuf library for the heavy lifting of (de-)serialization. The implementation is mostly done for MRI 1.8/1.9, and will be simple for JRuby. Rubinius support might require FFI, which is another beast entirely. Automatic selection of the appropriate native API, and building it at installation time, is still up in the air.

Error conditions

HTTP and PBC report errors (and expected responses) very differently. The FailedRequest class needs to be refactored, probably through subclassing, so that existing code is minimally affected.
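
One possibility, sketched here with assumed subclass names and attributes, is to keep FailedRequest as the common superclass that existing rescue clauses already match and push protocol-specific details down:

# Common superclass so existing `rescue FailedRequest` code keeps working.
class FailedRequest < StandardError
end

# HTTP keeps its expected-vs-actual status codes, headers, and body.
class HTTPFailedRequest < FailedRequest
  attr_reader :method, :expected, :code, :headers, :body
end

# PBC errors carry the message and numeric code from the error response.
class ProtobuffsFailedRequest < FailedRequest
  attr_reader :code, :message
end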

Base64 encoding

HTTP encodes the client ID and vclock with Base64, while PBC uses plain byte arrays.
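
Whichever representation wins internally, the translation itself is cheap; the sketch below assumes the client stores raw bytes and the HTTP backend converts at the wire boundary (helper names are illustrative):

require 'base64'

# HTTP backend: wrap/unwrap Base64 only when talking to the server.
def encode_for_http(raw_bytes)
  Base64.encode64(raw_bytes).delete("\n")
end

def decode_from_http(header_value)
  Base64.decode64(header_value)
end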