Monday, July 11, 2016

CUDA accelerated linear algebra with Python and Theano

Theano is a Python module that lets you construct mathematical expressions with matrices and/or tensors (basically "matrices" with more than two dimensions).


These expressions can then be evaluated using Python, but Theano can also translate an expression into a C program and compile it to a binary. This way it can achieve respectable performance.
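
To make the "build an expression, then compile it" idea concrete, here is a minimal sketch using Theano's tensor API. The helper name build_dot_function is made up for this example, and the imports sit inside the function only so that the file can be loaded on a machine where Theano is not installed yet.

```python
def build_dot_function():
    """Compile a function that multiplies two float64 matrices."""
    # Imported lazily so that loading this file does not require Theano.
    import theano
    import theano.tensor as T

    x = T.dmatrix('x')   # symbolic placeholder, holds no data yet
    y = T.dmatrix('y')
    z = T.dot(x, y)      # still symbolic: this only builds an expression graph

    # theano.function turns the graph into a callable; this is the step
    # where Theano may generate and compile C code behind the scenes.
    return theano.function([x, y], z)
```

With Theano installed, f = build_dot_function() followed by f([[1, 2], [3, 4]], [[5], [6]]) should return a 2x1 array holding 17 and 39. The first call to build_dot_function() can take a while, because that is when the compilation happens.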


But wait, there's more! Theano can build the program so that certain parts of it, or all of it, run on a GPU. Yes, on your video card. Modern cards do calculations in a way that makes them especially well suited to linear algebra and similar operations. By "similar" I mean executing a simple operation on lots of data in parallel. For this kind of work, a GPU can be tens or even hundreds of times faster than your CPU.


I'm going to show you how to exploit an NVIDIA GPU, using Python.
Dependencies

On Linux - Ubuntu Xenial (16.04 LTS) - every good story starts with an "apt-get install". Our case is no exception.

First, we'll need to add NVIDIA's CUDA package repository, like so:

$ echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1504/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list

$ sudo apt-get update

$ sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev nvidia-cuda-toolkit gfortran

It's a good idea to install the cuDNN libraries. These are routines for neural networks, but might be useful in other applications too. Theano can check cuDNN availability and make good use of it.

To get cuDNN, you need to register at NVIDIA, and download the package manually. Both can be done here: https://developer.nvidia.com/cudnn.

If all is well, you can download two .deb files, one for libcudnn5 and one for libcudnn5-dev (libcudnn5_5.0.5-1+cuda7.5_amd64.deb and libcudnn5-dev_5.0.5-1+cuda7.5_amd64.deb in my case, at the time of writing). Let's install them:

$ sudo dpkg -i libcudnn5_5.0.5-1+cuda7.5_amd64.deb

$ sudo dpkg -i libcudnn5-dev_5.0.5-1+cuda7.5_amd64.deb

Reboot your system before proceeding, so that the CUDA-enabled drivers and libraries get loaded.
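
After the reboot, a quick sanity check can save some head-scratching later. The sketch below only looks for two command-line tools on the PATH: nvcc (part of the CUDA toolkit) and nvidia-smi (shipped with the NVIDIA driver). That both end up on the PATH on your system is an assumption, and the helper names are made up for this example.

```python
import os


def find_on_path(program):
    """Return the full path of an executable found on PATH, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(tools=("nvcc", "nvidia-smi")):
    """Return the subset of tools that could not be found on PATH."""
    return [tool for tool in tools if find_on_path(tool) is None]


# Report what is missing; an empty list means both tools were found.
print("Missing tools: %s" % (missing_tools() or "none"))
```

Finding both tools does not prove that the GPU works, but a missing one is a strong hint that the driver or the toolkit did not install properly.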

Let's install Theano itself:

$ sudo pip install -I Theano

I've included the -I (--ignore-installed) option so that if you already have Theano installed for some reason, it will be reinstalled. Internet wisdom says this might solve compilation problems.

Theano depends on numpy and scipy, so these will be installed too. To avoid cluttering your system packages, it's recommended to use a virtualenv. I might, or might not, rewrite this tutorial to do that.

On Ubuntu 15.04, this concludes the installation. On 16.04, however, Theano must be patched.

Edit /usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/nvcc_compiler.py and add the following line between os.chdir(location) and p = subprocess.Popen(cmd, ...), near line 360:

cmd.append('-D_FORCE_INLINES')

This magic is needed for Theano to work with gcc 5.4.0, the default gcc on Ubuntu 16.04. Without this, you'll get an error message:

...
/usr/include/string.h:652:42: error: ‘memcpy’ was not declared in this scope
...


Shortly after that, Theano will tell you that "CUDA is installed, but device gpu is not available". Bummer.

Well, this monkey patch fixes that. Just make sure you re-patch Theano if you upgrade it and get the error message again.
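
Since the patch has to be redone after every upgrade, it may be worth scripting. This is a rough sketch, not a tested tool: the function name add_force_inlines is made up, and it assumes the file still contains os.chdir(location) followed by a subprocess.Popen(cmd, ...) call, as described above. It works on the file's text, so you can try it on a copy first.

```python
PATCH_LINE = "cmd.append('-D_FORCE_INLINES')"


def add_force_inlines(source):
    """Return source with PATCH_LINE inserted before the nvcc Popen call.

    The text is returned unchanged if the patch is already present.
    """
    if PATCH_LINE in source:
        return source
    lines = source.splitlines(True)  # True keeps the line endings
    seen_chdir = False
    for i, line in enumerate(lines):
        if "os.chdir(location)" in line:
            seen_chdir = True
        elif seen_chdir and "subprocess.Popen(cmd" in line:
            # Reuse the indentation of the Popen line for the new line.
            indent = line[:len(line) - len(line.lstrip())]
            lines.insert(i, indent + PATCH_LINE + "\n")
            return "".join(lines)
    raise ValueError("could not find the Popen(cmd, ...) call to patch")


# Possible usage (needs write access to the file, so run it with sudo):
#   path = "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/nvcc_compiler.py"
#   with open(path) as f:
#       text = f.read()
#   with open(path, "w") as f:
#       f.write(add_force_inlines(text))
```

Running it twice is harmless, because an already patched file is returned unchanged. If the layout of nvcc_compiler.py changes in a later Theano release, the function raises an error instead of guessing where the line should go.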

Now you can run the test program at http://deeplearning.net/software/theano/tutorial/using_gpu.html to see if Theano can really use the GPU:

$ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python theanotest.py
Using gpu device 0: GeForce GTX 970 (CNMeM is disabled, cuDNN 5005)
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.531866 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu


Great!
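
If you'd rather not type the THEANO_FLAGS prefix every time, the same settings can be placed in the environment from inside your script. A small sketch follows; the helper name theano_flags is made up, and the important assumption is that this code runs before the first import of theano, since Theano reads the variable when it is imported.

```python
import os


def theano_flags(**options):
    """Join keyword options into THEANO_FLAGS syntax: key=value,key=value."""
    return ",".join("%s=%s" % (key, options[key]) for key in sorted(options))


# The same settings as on the command line used for the test program.
os.environ["THEANO_FLAGS"] = theano_flags(
    mode="FAST_RUN", device="gpu", floatX="float32")

# Any `import theano` has to come after this point to pick the flags up.
print(os.environ["THEANO_FLAGS"])  # prints device=gpu,floatX=float32,mode=FAST_RUN
```

The options come out in alphabetical order only because the helper sorts them, which keeps the output predictable.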

Happy computing!