How NOT to write SLOW programs with python and numpy
At first, I wanted to write a post about "how to write fast programs with python and numpy". After writing a few test cases, it quickly turned out that the fast numpy versions of the tests are fast not only because of numpy, but because they use numpy "the right way". Let's see what that means.
The problem
I was writing a program that read a sound file, subtracted the right channel from the left (or the other way around, it doesn't matter), and played the result back as a single mono audio stream.
The point was to generate a karaoke track out of a "normal" one by stripping the vocals. Since the vocals are often present equally and in phase on both channels, after the subtraction no vocals are heard. Most of the time. This method has the drawback of ruining much of the other sound in a track, but at that point I could live with that.
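Why does the subtraction cancel the vocals? A toy sketch (the sample values are made up for illustration, not taken from a real recording): a component mixed identically into both channels vanishes under left minus right, while a side-panned component survives.

```python
import numpy as np

# Made-up sample values: a "vocal" present identically on both channels,
# and a "guitar" panned hard left.
vocal = np.array([3, 1, -2, 4, 0, -1, 2, -3], dtype=np.float64)
guitar = np.array([1, 2, 1, 0, -1, -2, -1, 0], dtype=np.float64)

left = vocal + guitar   # left channel carries both parts
right = vocal.copy()    # right channel carries the vocal only

karaoke = left - right  # the in-phase vocal cancels exactly
assert np.array_equal(karaoke, guitar)
```

In real recordings the cancellation is only approximate, because effects like stereo reverb put slightly different vocal signals on the two channels.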
The slow solution
# -*- coding: utf-8 -*-
import wave

import numpy
import pyaudio

# Read the whole stereo file into a flat int16 array of
# interleaved left/right samples.
f = wave.open('e.wav')
samples = numpy.fromstring(f.readframes(f.getnframes()), dtype=numpy.int16)
f.close()

print "Removing voice..."
S = numpy.empty(len(samples)/2, dtype=numpy.int16)
for i in xrange(len(samples)/2):
    # Halve each channel first so the difference stays within int16 range.
    S[i] = samples[2*i]/2 - samples[2*i+1]/2

print "Playing..."
p = pyaudio.PyAudio()
stream = p.open(
    output_device_index = 0, format = pyaudio.paInt16,
    channels = 1, rate = 44100, input = False,
    output = True, frames_per_buffer = 4410
)
# Play back in 0.1 second (4410 frame) chunks.
for i in xrange(len(S)/4410):
    stream.write(S[i*4410:(i+1)*4410].tostring())
stream.close()
The fast solution
The vastly faster solution differs only in the part between the "Removing voice..." and "Playing..." prints: the whole loop is replaced by a single vectorized expression.
print "Removing voice..."
S = samples[0::2]/2 - samples[1::2]/2
print "Playing..."
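As a rough sanity check, the two versions can be timed side by side. This is a benchmark sketch written for Python 3 with synthetic data (the array size and random values are my own choices, not measurements from the original program):

```python
import timeit

import numpy as np

# Synthetic "stereo" data standing in for the wav samples.
rng = np.random.RandomState(0)
samples = rng.randint(-32768, 32768, size=20000).astype(np.int16)

def slow():
    # Element-by-element loop: Python bytecode runs for every sample.
    S = np.empty(len(samples) // 2, dtype=np.int16)
    for i in range(len(samples) // 2):
        S[i] = samples[2 * i] // 2 - samples[2 * i + 1] // 2
    return S

def fast():
    # One vectorized expression: the element loop runs in C inside numpy.
    return samples[0::2] // 2 - samples[1::2] // 2

# Same result, wildly different cost.
assert np.array_equal(slow(), fast())
t_slow = timeit.timeit(slow, number=3)
t_fast = timeit.timeit(fast, number=3)
```

On typical hardware the vectorized version wins by two to three orders of magnitude, and the gap grows with the array size.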
Digging deeper
Using the built-in compile() function and the wonderful dis module, let's examine what's happening in both cases.
Slow case:
>>> import dis
>>> c = compile("for i in xrange(len(samples)): S[i] = samples[2*i]/2 - samples[2*i+1]/2", "dummy", "exec")
>>> dis.dis(c)
1 0 SETUP_LOOP 68 (to 71)
3 LOAD_NAME 0 (xrange)
6 LOAD_NAME 1 (len)
9 LOAD_NAME 2 (samples)
12 CALL_FUNCTION 1
15 CALL_FUNCTION 1
18 GET_ITER
>> 19 FOR_ITER 48 (to 70)
22 STORE_NAME 3 (i)
25 LOAD_NAME 2 (samples)
28 LOAD_CONST 0 (2)
31 LOAD_NAME 3 (i)
34 BINARY_MULTIPLY
35 BINARY_SUBSCR
36 LOAD_CONST 0 (2)
39 BINARY_DIVIDE
40 LOAD_NAME 2 (samples)
43 LOAD_CONST 0 (2)
46 LOAD_NAME 3 (i)
49 BINARY_MULTIPLY
50 LOAD_CONST 1 (1)
53 BINARY_ADD
54 BINARY_SUBSCR
55 LOAD_CONST 0 (2)
58 BINARY_DIVIDE
59 BINARY_SUBTRACT
60 LOAD_NAME 4 (S)
63 LOAD_NAME 3 (i)
66 STORE_SUBSCR
67 JUMP_ABSOLUTE 19
>> 70 POP_BLOCK
>> 71 LOAD_CONST 2 (None)
74 RETURN_VALUE
Fast case:
>>> import dis
>>> c = compile("S = samples[0::2]/2 - samples[1::2]/2", "dummy", "exec")
>>> dis.dis(c)
1 0 LOAD_NAME 0 (samples)
3 LOAD_CONST 0 (0)
6 LOAD_CONST 1 (None)
9 LOAD_CONST 2 (2)
12 BUILD_SLICE 3
15 BINARY_SUBSCR
16 LOAD_CONST 2 (2)
19 BINARY_DIVIDE
20 LOAD_NAME 0 (samples)
23 LOAD_CONST 3 (1)
26 LOAD_CONST 1 (None)
29 LOAD_CONST 2 (2)
32 BUILD_SLICE 3
35 BINARY_SUBSCR
36 LOAD_CONST 2 (2)
39 BINARY_DIVIDE
40 BINARY_SUBTRACT
41 STORE_NAME 1 (S)
44 LOAD_CONST 1 (None)
47 RETURN_VALUE
The differences are quite obvious at first sight. The slow version has a loop that executes a bunch of operations like BINARY_ADD, BINARY_DIVIDE, etc. in each iteration, so there's O(n) Python-related work to execute.
In the fast case, there is a fixed amount of code to run, so we only pay O(1) Python overhead; the lion's share of the work is done in C. Although it's easy to write slow code in C too, numpy is certainly not slow.
The conclusion is that if you're programming with numpy, you'd better avoid iterating over numpy arrays element by element. Numpy has a great many built-in functions, which have covered everything I've needed since I started working with it. Reinventing the wheel is highly counterproductive.
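To make the advice concrete, here is a small sketch (Python 3; the data is made up) of replacing a hand-written reduction with numpy built-ins:

```python
import numpy as np

x = np.arange(10, dtype=np.float64)

# Element-by-element loop: O(n) Python bytecode per run.
total = 0.0
for v in x:
    total += v * v

# Built-in replacements: the iteration happens in C.
assert total == np.sum(x * x)
assert total == np.dot(x, x)
```

Whenever you find yourself writing a `for` loop over an array, it's worth checking the numpy reference first; there is almost always a built-in that does the whole thing in one call.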