How NOT to write SLOW programs with python and numpy
At first, I wanted to write a post about "how to write fast programs with python and numpy". After writing a few test cases, it quickly turned out that the fast numpy versions are not fast merely because of numpy, but because they use numpy "the right way". Let's see what that means.
The problem
I was writing a program that read a sound file, subtracted the right channel from the left (or the other way around, it doesn't matter), and played the result back as a single mono audio stream.
The point was to generate a karaoke track out of a "normal" one by stripping the vocals. Since the vocals are often present equally and in phase on both channels, after the subtraction little or no vocal is heard. Most of the time. This method has the drawback of ruining much of the other sounds in a track, but at that point I could live with that.
The slow solution
```python
# -*- coding: utf-8 -*-
import wave

import numpy
import pyaudio

# Read the whole stereo file as interleaved 16-bit samples: L, R, L, R, ...
f = wave.open('e.wav')
samples = numpy.fromstring(f.readframes(f.getnframes()), dtype=numpy.int16)

print "Removing voice..."
S = numpy.empty(len(samples) / 2, dtype=numpy.int16)
for i in xrange(len(samples) / 2):
    # Halve both channels before subtracting to avoid int16 overflow.
    S[i] = samples[2*i] / 2 - samples[2*i+1] / 2
f.close()

print "Playing..."
p = pyaudio.PyAudio()
stream = p.open(
    output_device_index=0,
    format=pyaudio.paInt16,
    channels=1,
    rate=44100,
    input=False,
    output=True,
    frames_per_buffer=4410
)
# Play back the mono result in 4410-frame (0.1 s) chunks.
for i in xrange(len(S) / 4410):
    stream.write(S[i*4410:(i+1)*4410].tostring())
stream.close()
```
The fast solution
The vastly faster solution differed only in the few lines after the line that prints "Removing voice...":
print "Removing voice..." S = samples[0::2]/2 - samples[1::2]/2 print "Playing..."
Digging deeper
Using the built-in compile() function, and the wonderful dis module, let's examine what's happening in both cases.
Slow case:
>>> c = compile("for i in xrange(len(samples)): S[i] = samples[2*i]/2 - samples[2*i+1]/2", "dummy", "exec") >>> dis.dis(c) 1 0 SETUP_LOOP 68 (to 71) 3 LOAD_NAME 0 (xrange) 6 LOAD_NAME 1 (len) 9 LOAD_NAME 2 (samples) 12 CALL_FUNCTION 1 15 CALL_FUNCTION 1 18 GET_ITER >> 19 FOR_ITER 48 (to 70) 22 STORE_NAME 3 (i) 25 LOAD_NAME 2 (samples) 28 LOAD_CONST 0 (2) 31 LOAD_NAME 3 (i) 34 BINARY_MULTIPLY 35 BINARY_SUBSCR 36 LOAD_CONST 0 (2) 39 BINARY_DIVIDE 40 LOAD_NAME 2 (samples) 43 LOAD_CONST 0 (2) 46 LOAD_NAME 3 (i) 49 BINARY_MULTIPLY 50 LOAD_CONST 1 (1) 53 BINARY_ADD 54 BINARY_SUBSCR 55 LOAD_CONST 0 (2) 58 BINARY_DIVIDE 59 BINARY_SUBTRACT 60 LOAD_NAME 4 (S) 63 LOAD_NAME 3 (i) 66 STORE_SUBSCR 67 JUMP_ABSOLUTE 19 >> 70 POP_BLOCK >> 71 LOAD_CONST 2 (None) 74 RETURN_VALUE
Fast case:
```
>>> import dis
>>> c = compile("S = samples[0::2]/2 - samples[1::2]/2", "dummy", "exec")
>>> dis.dis(c)
  1           0 LOAD_NAME                0 (samples)
              3 LOAD_CONST               0 (0)
              6 LOAD_CONST               1 (None)
              9 LOAD_CONST               2 (2)
             12 BUILD_SLICE              3
             15 BINARY_SUBSCR
             16 LOAD_CONST               2 (2)
             19 BINARY_DIVIDE
             20 LOAD_NAME                0 (samples)
             23 LOAD_CONST               3 (1)
             26 LOAD_CONST               1 (None)
             29 LOAD_CONST               2 (2)
             32 BUILD_SLICE              3
             35 BINARY_SUBSCR
             36 LOAD_CONST               2 (2)
             39 BINARY_DIVIDE
             40 BINARY_SUBTRACT
             41 STORE_NAME               1 (S)
             44 LOAD_CONST               1 (None)
             47 RETURN_VALUE
```
The differences are quite obvious at first sight. The slow version has a loop that executes a bunch of operations (BINARY_MULTIPLY, BINARY_ADD, BINARY_DIVIDE, etc.) in each iteration, so there is O(n) Python bytecode to execute.
In the fast case, there is a fixed amount of bytecode to run, so we only pay O(1) Python overhead; the lion's share of the work is done inside numpy, in C. Although it's easy to write slow code in C as well, numpy is certainly not slow.
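To put a rough number on that per-iteration overhead, here is a quick `timeit` sketch (Python 3 syntax; the array size and repeat count are arbitrary choices for illustration, and the actual speedup will vary by machine):

```python
import timeit

import numpy as np

# A dummy interleaved-stereo buffer; the values don't matter for timing.
samples = np.zeros(200_000, dtype=np.int16)

def loop_version():
    # O(n) Python bytecode: one full interpreter round-trip per sample.
    S = np.empty(len(samples) // 2, dtype=np.int16)
    for i in range(len(samples) // 2):
        S[i] = samples[2*i] // 2 - samples[2*i+1] // 2
    return S

def vectorized_version():
    # O(1) Python bytecode: the per-element work happens in C.
    return samples[0::2] // 2 - samples[1::2] // 2

t_loop = timeit.timeit(loop_version, number=3)
t_vec = timeit.timeit(vectorized_version, number=3)
print("loop: %.4fs  vectorized: %.4fs  speedup: %.0fx" % (t_loop, t_vec, t_loop / t_vec))
```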
The conclusion is that if you're programming with numpy, you had better avoid iterating over numpy arrays element by element in Python. Numpy has a great many built-in functions that have covered everything I've needed since I started working with it. Reinventing the wheel is highly counterproductive.
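As one more illustration of leaning on the built-ins, the same channel difference can be written with `reshape`, which makes the frame structure explicit instead of encoding it in slice strides. This is an alternative sketch, not the article's code:

```python
import numpy as np

# Small dummy interleaved buffer: L, R, L, R, ...
samples = np.arange(20, dtype=np.int16)

# Each row of `stereo` is one (left, right) frame; no copy is made,
# reshape just reinterprets the interleaved layout.
stereo = samples.reshape(-1, 2)
S = stereo[:, 0] // 2 - stereo[:, 1] // 2

# Identical to the strided-slice formulation.
assert np.array_equal(S, samples[0::2] // 2 - samples[1::2] // 2)
```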