2012. július 17., kedd

How NOT to write SLOW programs with python and numpy

How to write fast programs with numpy and python

At first, I wanted to write a post about "how to write fast programs with python and numpy". After writing a few test cases it quickly turned out that the fast, numpy versions of tests are not only fast because of numpy, but because they use numpy "the right way". Let's see what does it mean.



The problem

I was writing a program that read a sound file, subtracted the right channel from the left (or the other way around, it doesn't matter), and played the result back as a single mono audio stream.

The point was to generate karaoke track out of a "normal" one by stripping the vocals. Since the vocals are often present equally and in-phase on both channels, after the subtraction no vocals was heard. Most of the time. This method has a drawback of ruining much of the other sounds in a track, but at that point I could live with that.

The slow solution

# -*- coding: utf-8 -*-

import wave
import pyaudio
import numpy

f = wave.open('e.wav')

samples = numpy.fromstring(f.readframes(
    f.getnframes()), dtype=numpy.int16)

print "Removing voice..."
S = numpy.empty(len(samples)/2, dtype=numpy.int16)
for i in xrange(len(samples)/2):
    S[i] = samples[2*i]/2 - samples[2*i+1]/2

f.close()

print "Playing..."

p = pyaudio.PyAudio()

stream = p.open(
    output_device_index = 0, format = pyaudio.paInt16,
    channels = 1, rate = 44100, input = False,
    output = True, frames_per_buffer = 4410
)

for i in xrange(len(samples)/44100):
    stream.write(S[i*4410:(i+1)*4410].tostring())

stream.close()

The fast solution

The infinitely faster solution only differed in a few lines after the line that prints "Removing voice...":


print "Removing voice..."
S = samples[0::2]/2 - samples[1::2]/2

print "Playing..."

Digging deeper

Using the built-in compile() function, and the wonderful dis module, let's examine what's happening in both cases.

Slow case:

>>> c = compile("for i in xrange(len(samples)): S[i] = samples[2*i]/2 - samples[2*i+1]/2", "dummy", "exec")
>>> dis.dis(c)
  1           0 SETUP_LOOP              68 (to 71)
              3 LOAD_NAME                0 (xrange)
              6 LOAD_NAME                1 (len)
              9 LOAD_NAME                2 (samples)
             12 CALL_FUNCTION            1
             15 CALL_FUNCTION            1
             18 GET_ITER            
        >>   19 FOR_ITER                48 (to 70)
             22 STORE_NAME               3 (i)
             25 LOAD_NAME                2 (samples)
             28 LOAD_CONST               0 (2)
             31 LOAD_NAME                3 (i)
             34 BINARY_MULTIPLY     
             35 BINARY_SUBSCR       
             36 LOAD_CONST               0 (2)
             39 BINARY_DIVIDE       
             40 LOAD_NAME                2 (samples)
             43 LOAD_CONST               0 (2)
             46 LOAD_NAME                3 (i)
             49 BINARY_MULTIPLY     
             50 LOAD_CONST               1 (1)
             53 BINARY_ADD          
             54 BINARY_SUBSCR       
             55 LOAD_CONST               0 (2)
             58 BINARY_DIVIDE       
             59 BINARY_SUBTRACT     
             60 LOAD_NAME                4 (S)
             63 LOAD_NAME                3 (i)
             66 STORE_SUBSCR        
             67 JUMP_ABSOLUTE           19
        >>   70 POP_BLOCK           
        >>   71 LOAD_CONST               2 (None)
             74 RETURN_VALUE        

Fast case:

>>> import dis
>>> c = compile("S = samples[2::0]/2 - samples[2::1]/2", "dummy", "exec")
>>> dis.dis(c)
  1           0 LOAD_NAME                0 (samples)
              3 LOAD_CONST               0 (2)
              6 LOAD_CONST               1 (None)
              9 LOAD_CONST               2 (0)
             12 BUILD_SLICE              3
             15 BINARY_SUBSCR       
             16 LOAD_CONST               0 (2)
             19 BINARY_DIVIDE       
             20 LOAD_NAME                0 (samples)
             23 LOAD_CONST               0 (2)
             26 LOAD_CONST               1 (None)
             29 LOAD_CONST               3 (1)
             32 BUILD_SLICE              3
             35 BINARY_SUBSCR       
             36 LOAD_CONST               0 (2)
             39 BINARY_DIVIDE       
             40 BINARY_SUBTRACT     
             41 STORE_NAME               1 (S)
             44 LOAD_CONST               1 (None)
             47 RETURN_VALUE  

There are differences that are quite obvious at first sight. The slow version has a loop that executes a bunch of operations like BINARY_ADD, BINARY_DIVIDE, etc. in each iteration, so there's O(n) python-related stuff to execute.

In the fast case, there are a fixed amount code to run, so we only have O(1) python overhead. The lion's share is written in C. Although it's easy to write slow code in C, numpy is certainly not slow.

The conclusion is that if you're programming with numpy, you're better avoid iterating over numpy arrays and doing stuff on them. Numpy has a great many built-in functions that cover everything came into my mind since I've started working with it. Reinventing the wheel is highly counterproductive.