Tuesday, January 24, 2012

Thrift / c_glib and Cassandra

Thrift

Thrift is an Apache tool. It can generate client and server code from a file written in its own interface description language.

At first I was thrilled by how easy it would be to write a Cassandra client with it: "you just have to generate the C files, #include them, call a few functions and it's done".

Yeah. Like anything in the world works like that. And this particular thing is no exception.



Thrift comes with documentation that is... wait! It doesn't really come with any documentation at all. The stuff that's in the package and / or scattered around the Net in the form of blog posts and bug reports is outdated, and can only be used to prevent the enemy from using this great weapon.

I don't mean "great weapon" sarcastically: Thrift would be great if I could wield it correctly.

Cassandra

Cassandra is a NoSQL database. It doesn't really matter here how exactly it works; it's enough to know that you can store and fetch data with it, and connect to it over the network.

Coincidentally, it uses Thrift to describe its interface, so people of different sex, religion and programming language can generate their own interface libraries. First, I tried to put together a client in C++ based on this article. Cassandra, Thrift and gcc have evolved somewhat since 2010, and / or I might be using an exotic combination of software (Ubuntu Oneiric), or the Gods might be angry at me for some strange reason, or I may simply be too dumb to follow a slightly outdated tutorial and solve a few problems along the way; anyway, I could not get the code to compile.

C and GLib

I have much more experience with C than with C++, so I decided to throw away the C++ code my co-worker had been writing, and start from scratch in C. I was prepared to read and interpret the Thrift interface descriptor file with my already melting brain, and write the C code myself.

As I started to work I discovered that Thrift CAN generate C interface libraries. The C generator is a bit incomplete in 0.8.0, since it does not generate the server skeleton file, but that didn't really matter to me.

I cd'd into Cassandra's interface directory and issued the thrift -gen c_glib cassandra.thrift command, and found the generated sources under the gen-c_glib directory.

The sources were clean and readable despite the fact that they were auto-generated.

I even found a small example, and it compiled OK.

I had to replace a few lines to make it work with Cassandra instead of the calculator example. The following is the rewrite for Cassandra: it connects to a server on localhost on the default port, and executes a query that fetches the value of the "bar" column for key "foo" from column family "examplecf" in keyspace "example". Error handling might be incomplete.

Warning: I don't know a thing about glib, and I suspect that the code below is NOT the way to use it. It works here though. Maybe I'll improve it in the distant future.

#include <stdio.h>   /* printf */
#include <string.h>  /* strndup */

#include "gen-c_glib/cassandra.h"
#include "protocol/thrift_protocol.h"
#include "protocol/thrift_binary_protocol.h"
#include "transport/thrift_framed_transport.h"
#include "transport/thrift_transport.h"
#include "transport/thrift_socket.h"


int main(int argc, char** argv) {
  ThriftSocket *tsocket;
  ThriftTransport *transport;
  ThriftProtocol *protocol;
  CassandraClient *client;
  CassandraIf *service;
  InvalidRequestException *ire = NULL;
  NotFoundException *nfe = NULL;
  UnavailableException *ue = NULL;
  TimedOutException *te = NULL;
  ColumnOrSuperColumn *result;
  GError *error = NULL;

  GByteArray column = {
    .data = (unsigned char *)"bar",
    .len  = 3
  };

  ColumnPath *cp = NULL;
  
  GByteArray key = {
    .data = (unsigned char *)"foo",
    .len  = 3
  };
 
  g_type_init();

  /* The property list of g_object_new() must end with a NULL pointer;
   * a bare 0 is an int and is not a safe sentinel on 64-bit platforms. */
  tsocket = THRIFT_SOCKET(
    g_object_new(
      THRIFT_TYPE_SOCKET, "hostname",
      "localhost", "port", 9160, NULL
    )
  );
  transport = THRIFT_TRANSPORT(
    g_object_new(
      THRIFT_TYPE_FRAMED_TRANSPORT, "transport", tsocket, NULL
    )
  );
  protocol = THRIFT_PROTOCOL(
    g_object_new(
      THRIFT_TYPE_BINARY_PROTOCOL, "transport", transport, NULL
    )
  );
  client = CASSANDRA_CLIENT(
    g_object_new(
      TYPE_CASSANDRA_CLIENT, "input_protocol",
      protocol, "output_protocol", protocol, NULL
    )
  );
  service = CASSANDRA_IF(client);

  if (
    !thrift_transport_open(transport, NULL) ||
    !thrift_transport_is_open(transport)
  ) {
    printf("Could not connect to server\n");
    return 1;
  }
  printf("Connected to cassandra at localhost:9160\n");

  cassandra_client_set_keyspace(
    service, "example", &ire, &error
  );
  if (ire) {
    printf("Invalid request exception: %s\n", ire->why);
    return 1;
  }
  if (error) {
    printf("An error has occurred: %s\n", error->message);
    return 1;
  }
  printf("Selected keyspace example\n");

  cp = g_object_new(TYPE_COLUMN_PATH, 0);
  cp->column_family = "examplecf";
  cp->column = &column;
  cp->__isset_column = TRUE;

  cassandra_client_get(
    service, &result, &key, cp, CONSISTENCY_LEVEL_QUORUM,
    &ire, &nfe, &ue, &te, &error
  );

  if (ire) {
    printf("Invalid request exception: %s\n", ire->why);
    return 1;
  }
  if (nfe) {
    printf("Row not found\n");
    return 1;
  }
  if (ue) {
    printf("Unavailable exception\n");
    return 1;
  }
  if (te) {
    printf("Timed out exception\n");
    return 1;
  }
  if (error) {
    printf("An error has occurred: %s\n", error->message);
    return 1;
  }
  
  printf(
    "The result is %s\n",
    strndup(
      (char *)result->column->value->data,
      result->column->value->len
    )
  );

  /* Don't forget to free resources if
   * your program runs longer than this */

  return 0;
}



I compiled the code with the following commands:

gcc -c `pkg-config --cflags thrift_c_glib` test.c -o test.o

gcc -c `pkg-config --cflags thrift_c_glib` \
 gen-c_glib/cassandra.c -o cassandra.o

gcc -c `pkg-config --cflags thrift_c_glib`\
 gen-c_glib/cassandra_types.c -o cassandra_types.o

libtool --tag=CC --mode=link gcc `pkg-config --libs thrift_c_glib` -o test test.o cassandra.o cassandra_types.o

These commands are rather cryptic, the last one even more so than the others, so here's the explanation:

The pkg-config command is used to query the flags needed to build programs that use an installed library: --cflags prints the compiler flags, and --libs prints the linker flags. It's the responsibility of the library's make install script to install this information. If a package is installed from the repository of your distribution, this information is installed by the package manager. The rest of the command line should be clear.

UPDATE:

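Note that the key, the column name and the value do NOT contain the trailing zero byte. They are byte arrays with an explicit length, which is why the code above sets .len to 3 for "foo" and "bar", and why it copies the result with strndup before printing it.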
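
To make the point about the missing zero byte concrete, here is a minimal sketch in plain C, with no glib or Thrift involved. ByteSpan, SPAN_LITERAL, span_to_cstring and print_span are hypothetical names made up for this illustration; ByteSpan only mimics the data / len pair of GByteArray and is not part of any library.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for GByteArray: a pointer plus an explicit length.
 * The bytes are NOT required to be zero-terminated. */
typedef struct {
  const unsigned char *data;
  size_t len;
} ByteSpan;

/* Build a span from a string literal; sizeof counts the terminating
 * zero byte, so subtract one instead of hard-coding the length. */
#define SPAN_LITERAL(s) { (const unsigned char *)(s), sizeof(s) - 1 }

/* Copy span.len bytes into a fresh buffer and append the zero byte.
 * Returns NULL if the allocation fails; the caller frees the result. */
char *span_to_cstring(ByteSpan span) {
  char *out = malloc(span.len + 1);
  if (out == NULL) {
    return NULL;
  }
  memcpy(out, span.data, span.len);
  out[span.len] = '\0';
  return out;
}

/* Print without copying: "%.*s" writes at most span.len bytes, so it
 * never reads past the end of a buffer that has no terminator. */
void print_span(const char *label, ByteSpan span) {
  printf("%s: %.*s\n", label, (int)span.len, (const char *)span.data);
}
```

With these helpers, ByteSpan key = SPAN_LITERAL("foo"); gives a span of length 3, and print_span("key", key) prints it without needing a terminator. The same "%.*s" trick should work directly on the data and len fields of a GByteArray, and it avoids the copy (and the leaked buffer) that strndup makes.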