Thursday, January 19, 2012

Interfacing Python with shared libraries (so/dll)

Python is a great language, but it's far from being a complete tool. It does have a lot of libraries, many of them already built into the standard distribution. However, as you do more and more work with it, chances are that you'll have to do something that does not have a canned solution à la Python. This usually means that you have to interface with C libraries, or even write the C code yourself.

It's possible, and not even too hard, to write C libraries that can be loaded by Python and expose Python objects and functions. If you want to rewrite existing Python code to be faster, that's probably the way to go. (I strongly advise you to think twice before you act; such work won't be easy.) So if you have to write your own Python things in C, read this. If you haven't done much C before, this is going to hurt. Fortunately, there is one particular case where you don't have to write C code at all, even though it smells like low-level coding: the case when you have to call functions from an existing shared library.

Actually, this is how I met ctypes. Ctypes is exactly what you need if you want to call foreign functions from Python. If you read the first few examples, you'll feel like you've just done import antigravity, but be aware: there are a few catches.
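To give a taste of what such a foreign function call looks like, here is a minimal sketch that calls strlen from the C runtime. The way the library is located is an assumption about the platform (msvcrt on Windows, the C library found by ctypes.util.find_library elsewhere), and strlen was picked only because it exists practically everywhere.

```python
import ctypes
import ctypes.util
import sys

# Locate the C runtime. On POSIX, find_library() may return None; CDLL(None)
# then falls back to the symbols already loaded into the current process.
if sys.platform.startswith("win"):
    libc = ctypes.CDLL("msvcrt")
else:
    libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the signature so ctypes converts the argument and result correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"hello")
print(length)
```

Declaring argtypes and restype is optional for simple cases, but it lets ctypes catch wrong argument types before they reach the C side.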

The cool things

You can call any function from any shared library without having to write C code. You can even work with unions and structs, even recursive ones, and you can allocate buffers and work with pointers and arrays. My personal favorite is array handling. The following is an example of creating a C array in Python.

import ctypes

intarray_type = ctypes.c_int * 3 # Elegant :)

intarray = intarray_type()

intarray[0] = 1
intarray[1] = 2
intarray[2] = 3

print(intarray[2])

...or to do the assignment with a one-liner:

import ctypes

intarray = (ctypes.c_int * 3)(1, 2, 3)

print(intarray[2])
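Structs and pointers work in a similar spirit. The following is only a sketch: Point is a hypothetical struct made up for illustration (roughly `struct Point { int x; int y; };` in C), not something taken from a real library.

```python
import ctypes

class Point(ctypes.Structure):
    # Hypothetical struct for illustration: struct Point { int x; int y; };
    _fields_ = [("x", ctypes.c_int),
                ("y", ctypes.c_int)]

p = Point(1, 2)

# ctypes.pointer() creates a pointer object; .contents dereferences it.
pp = ctypes.pointer(p)
pp.contents.x = 10  # writes through the pointer into p's memory

print(p.x, p.y)
```

When a C function only needs the address of p for the duration of a call, ctypes.byref(p) is the lighter alternative to building a full pointer object.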

But, as I mentioned, besides the coolness, there are some catches. One of the nastiest is that in Python, strings are immutable, while in C they are just pointers to a memory address. When you need a mutable buffer (to be used with libc read/write, for example), you must create it with ctypes.create_string_buffer().
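A minimal sketch of such a mutable buffer follows. To keep it runnable without any particular library, ctypes.memmove stands in for the C function that would fill the buffer (a libc read, for example); the buffer size and the data are arbitrary choices. In Python 3 the data is a bytes object.

```python
import ctypes

# 16 zero-filled bytes that C code is allowed to overwrite.
buf = ctypes.create_string_buffer(16)

# Stand-in for a C function writing into the buffer.
data = b"hello"
ctypes.memmove(buf, data, len(data))

print(buf.value)     # contents up to the first NUL byte
print(len(buf.raw))  # .raw is the whole buffer, NUL bytes included
```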

There is another catch with string buffers. If you want to point to a buffer with binary data, use the POINTER(c_char) type instead of c_char_p. This is because c_char_p is a "smarter" class and tries to treat the buffer as a null-terminated string. To be safe, use POINTER(c_char) with every pre-allocated buffer, even if it's just a string. Otherwise Python's string handling code might want to do magic with it, which can result in a segfault.
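The difference is easy to see on a buffer that contains a NUL byte in the middle. This is a sketch with made-up data; ctypes.cast is used here only to view the same memory through both types.

```python
import ctypes

raw = b"ab\x00cd"  # binary data with an embedded NUL byte
buf = ctypes.create_string_buffer(raw)

as_string = ctypes.cast(buf, ctypes.c_char_p)
as_bytes = ctypes.cast(buf, ctypes.POINTER(ctypes.c_char))

# c_char_p treats the memory as a C string and stops at the first NUL.
print(as_string.value)

# POINTER(c_char) does no interpretation; slicing returns exactly the bytes asked for.
print(as_bytes[:len(raw)])
```

Note that with POINTER(c_char) you have to track the length yourself, since nothing in the pointer says where the buffer ends.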