Let's start by having a sample profile run with 1000 iterations with the program as it was last time, but keeping track of more function calls in the profile.
5441165 function calls in 20.484 CPU seconds ncalls tottime percall cumtime percall filename:lineno(function) 494604 8.461 0.000 14.218 0.000 Tkinter.py:2135(_create) 11 3.839 0.349 20.315 1.847 mandel2g_cy.pyx:27(create_fractal) 494632 2.053 0.000 5.309 0.000 Tkinter.py:1046(_options) 494657 1.809 0.000 2.811 0.000 Tkinter.py:77(_cnfmerge) 494593 1.522 0.000 16.475 0.000 mandel2g_cy.pyx:21(draw_pixel) 989240 0.752 0.000 0.752 0.000 {method 'update' of 'dict' objects} 494593 0.736 0.000 14.953 0.000 Tkinter.py:2155(create_line) 989257 0.698 0.000 0.698 0.000 {_tkinter._flatten} 494632 0.315 0.000 0.315 0.000 {method 'items' of 'dict' objects} 1 0.158 0.158 0.158 0.158 {_tkinter.create} 494641 0.131 0.000 0.131 0.000 {callable} 37 0.009 0.000 0.009 0.000 {built-in method call} 11 0.001 0.000 20.317 1.847 viewer2b.py:20(draw_fractal) 12 0.000 0.000 0.000 0.000 viewer.py:60(info)I notice that there are many function calls: over 5 millions of them. While most of them appear to take very little time, they do add up in the end. It is time to adopt a smarter drawing strategy.
Currently, a "line" is potentially drawn for each pixel. If I look at a given fractal drawing, I can see that it could be drawn using "longer lines", when consecutive pixels are to be drawn with the same colour. I can easily implement this as follows.
def create_fractal(int canvas_width, int canvas_height, double min_x, double min_y, double pixel_size, int nb_iterations, canvas): cdef int x, y, start_y, end_y cdef double real, imag cdef bint start_line for x in range(0, canvas_width): real = min_x + x*pixel_size start_line = False for y in range(0, canvas_height): imag = min_y + y*pixel_size if mandel(real, imag, nb_iterations): if not start_line: start_line = True start_y = canvas_height - y else: if start_line: start_line = False end_y = canvas_height - y canvas.create_line(x, start_y, x, end_y, fill="black") if start_line: end_y = canvas_height - y canvas.create_line(x, start_y, x, end_y, fill="black")Note that I no longer need the function
draw_pixel()
.
The result is a reduction from 20 seconds down to 4 seconds:
79092 function calls in 3.959 CPU seconds ncalls tottime percall cumtime percall filename:lineno(function) 11 3.552 0.323 3.815 0.347 mandel2h_cy.pyx:21(create_fractal) 7856 0.150 0.000 0.250 0.000 Tkinter.py:2135(_create) 1 0.131 0.131 0.131 0.131 {_tkinter.create} 7884 0.037 0.000 0.091 0.000 Tkinter.py:1046(_options) 7909 0.030 0.000 0.047 0.000 Tkinter.py:77(_cnfmerge) 7845 0.014 0.000 0.263 0.000 Tkinter.py:2155(create_line) 15761 0.014 0.000 0.014 0.000 {_tkinter._flatten} 15744 0.013 0.000 0.013 0.000 {method 'update' of 'dict' objects} 37 0.009 0.000 0.009 0.000 {built-in method call} 7884 0.005 0.000 0.005 0.000 {method 'items' of 'dict' objects} 7893 0.002 0.000 0.002 0.000 {callable} 11 0.001 0.000 3.818 0.347 viewer2b.py:20(draw_fractal) 12 0.000 0.000 0.000 0.000 viewer.py:60(info) 22 0.000 0.000 0.001 0.000 Tkinter.py:1172(_configure)And it is now again my own code in
create_fractal()
that appears
to be the limiting factor. Thinking back of when I increased the number
of iterations from 100 to 1000, thus only affecting the execution time
of mandel()
, it seemed like this might be a good place
to look at for possible time improvements. Let's recall what the code
looks like.
cdef inline bint mandel(double real, double imag, int max_iterations=20): '''determines if a point is in the Mandelbrot set based on deciding if, after a maximum allowed number of iterations, the absolute value of the resulting number is greater or equal to 2.''' cdef double z_real = 0., z_imag = 0. cdef int i for i in range(0, max_iterations): z_real, z_imag = ( z_real*z_real - z_imag*z_imag + real, 2*z_real*z_imag + imag ) if (z_real*z_real + z_imag*z_imag) >= 4: return False return (z_real*z_real + z_imag*z_imag) < 4I used a Pythonic tuple assignement to avoid the use of temporary variables. However, in a typical iteration, there will be 4 multiplications for the tuple re-assigment and two more for the "if" statement, for a total of 6. It is certainly possible to reduce the number of multiplications by using temporary variables, as follows:
cdef inline bint mandel(double real, double imag, int max_iterations=20): '''determines if a point is in the Mandelbrot set based on deciding if, after a maximum allowed number of iterations, the absolute value of the resulting number is greater or equal to 2.''' cdef double z_real = 0., z_imag = 0. cdef int i cdef double zr_sq, zi_sq, z_cross for i in range(0, max_iterations): zr_sq = z_real*z_real zi_sq = z_imag*z_imag z_cross = 2*z_real*z_imag z_real = zr_sq - zi_sq + real z_imag = z_cross + imag if (zr_sq + zi_sq) >= 4: return False return (zr_sq + zi_sq) < 4So, there are now fewer multiplications to compute. Surely, this will speed up the code:
78982 function calls in 4.888 CPU seconds ncalls tottime percall cumtime percall filename:lineno(function) 11 4.478 0.407 4.748 0.432 mandel2i_cy.pyx:26(create_fractal) 7845 0.153 0.000 0.256 0.000 Tkinter.py:2135(_create) 1 0.128 0.128 0.128 0.128 {_tkinter.create} 7873 0.040 0.000 0.095 0.000 Tkinter.py:1046(_options) 7898 0.031 0.000 0.048 0.000 Tkinter.py:77(_cnfmerge) 7834 0.014 0.000 0.270 0.000 Tkinter.py:2155(create_line) 15739 0.013 0.000 0.013 0.000 {_tkinter._flatten} 15722 0.013 0.000 0.013 0.000 {method 'update' of 'dict' objects} 37 0.009 0.000 0.009 0.000 {built-in method call} 7873 0.005 0.000 0.005 0.000 {method 'items' of 'dict' objects} 7882 0.002 0.000 0.002 0.000 {callable} 11 0.001 0.000 4.750 0.432 viewer2b.py:20(draw_fractal) 12 0.000 0.000 0.000 0.000 viewer.py:60(info) 4 0.000 0.000 0.000 0.000 {posix.stat}Alas, that is not the case, as the previous profiling run was slightly below 4 seconds. [Note that I did run each profiling test at least three times to prevent any anomalous result.] Apparently my intuition is not a very good guide when it comes to predicting how cython will be able to optimize a given function.
So far, the pictures the program has been able to produce have only been in black and white. It is time to spruce things up and add colour. To do this, we will need to make three general changes:
-
We will modify
mandel()
so that it returns the number of iterations required to evaluate that a given point does not belong to the set; if it does belong, we will return -1. -
We will create a colour palette as a Python list. For a given number
of iterations required by
mandel()
, we will pick a given colour, cycling through the colours from the palette. - We will need to change our line drawing method so that we keep track of the colour (number of iteration) rather than simply whether or not the point is in the set ("black") or not.
# mandel3cy.pyx # cython: profile=True import cython def make_palette(): '''sample coloring scheme for the fractal - feel free to experiment''' colours = [] for i in range(0, 25): colours.append('#%02x%02x%02x' % (i*10, i*8, 50 + i*8)) for i in range(25, 5, -1): colours.append('#%02x%02x%02x' % (50 + i*8, 150+i*2, i*10)) for i in range(10, 2, -1): colours.append('#00%02x30' % (i*15)) return colours colours = make_palette() cdef int nb_colours = len(colours) @cython.profile(False) cdef inline int mandel(double real, double imag, int max_iterations=20): '''determines if a point is in the Mandelbrot set based on deciding if, after a maximum allowed number of iterations, the absolute value of the resulting number is greater or equal to 2.''' cdef double z_real = 0., z_imag = 0. cdef int i for i in range(0, max_iterations): z_real, z_imag = ( z_real*z_real - z_imag*z_imag + real, 2*z_real*z_imag + imag ) if (z_real*z_real + z_imag*z_imag) >= 4: return i return -1 def create_fractal(int canvas_width, int canvas_height, double min_x, double min_y, double pixel_size, int nb_iterations, canvas): global colours, nb_colours cdef int x, y, start_y, end_y, current_colour, new_colour cdef double real, imag for x in range(0, canvas_width): real = min_x + x*pixel_size start_y = canvas_height current_colour = mandel(real, min_y, nb_iterations) for y in range(1, canvas_height): imag = min_y + y*pixel_size new_colour = mandel(real, imag, nb_iterations) if new_colour != current_colour: if current_colour == -1: canvas.create_line(x, start_y, x, canvas_height-y, fill="black") else: canvas.create_line(x, start_y, x, canvas_height-y, fill=colours[current_colour%nb_colours]) current_colour = new_colour start_y = canvas_height - y if current_colour == -1: canvas.create_line(x, start_y, x, 0, fill="black") else: canvas.create_line(x, start_y, x, 0, fill=colours[current_colour%nb_colours])If we profile this code, we find out that it takes about three times as long to generate a colour picture than it did to generate a black and white one - at least, for the starting configuration...
2370682 function calls in 12.638 CPU seconds ncalls tottime percall cumtime percall filename:lineno(function) 11 4.682 0.426 12.184 1.108 mandel3_cy.pyx:36(create_fractal) 237015 4.310 0.000 7.084 0.000 Tkinter.py:2135(_create) 237043 1.019 0.000 2.575 0.000 Tkinter.py:1046(_options) 237068 0.877 0.000 1.353 0.000 Tkinter.py:77(_cnfmerge) 1 0.443 0.443 0.443 0.443 {_tkinter.create} 237004 0.418 0.000 7.502 0.000 Tkinter.py:2155(create_line) 474062 0.361 0.000 0.361 0.000 {method 'update' of 'dict' objects} 474079 0.313 0.000 0.313 0.000 {_tkinter._flatten} 237043 0.143 0.000 0.143 0.000 {method 'items' of 'dict' objects} 237052 0.061 0.000 0.061 0.000 {callable} 37 0.009 0.000 0.009 0.000 {built-in method call} 11 0.000 0.000 12.186 1.108 viewer3.py:20(draw_fractal) 12 0.000 0.000 0.000 0.000 viewer.py:60(info) 4 0.000 0.000 0.000 0.000 {posix.stat}We also generate some nice pictures! First, using only 100 iterations.
Notice how the Mandelbrot set boundary seems rather smooth ... this is clearly a sign that we might be missing something. Increasing the number of iterations to 1000 reveals a different picture.
We see much more details and many points that were wrongly identified as being part of the Mandelbrot sets are correctly excluded (and colored!). What happens if we look at a region that contains more points inside the Mandelbrot set (thus requiring more iterations) and increase the number of iterations to 1500 (as I found that 1000 was not enough in this region).
Unfortunately, the profiling and the timing information displayed does not tell the entire story. In practice, I found that it would take many more seconds (sometimes more than 10) for the canvas to be updated than the timing information given for each of the three pictures displayed above. Something is happening behind the scene when the picture is updated on the screen and which is not being recorded by my simple timing method.
For comparison, I had a look at Aptus, a program that uses a hand-coded C extension for speed. Actually, I had tried Aptus a few years ago, when Ned Batchelder first mentioned it, and tried it again for fun as I was working on my own version. Aptus can produce really nice pictures, really fast. Here's an example that was produced, according to the timing given by Aptus itself in 0.25 seconds.
Note that the timing given by Aptus seems to be truly representative, unlike the timing for my program. I should mention that, in addition to using a hand-crafted C extension, Aptus uses wxPython instead of Tkinter, and it also uses PIL and numpy, both of which are known for their speed. It might be possible that using PIL and numpy with my program would improve the speed significantly. However, all three libraries are not part of the standard library and so do not meet the constraints I had decided upon at the beginning.
This concludes this profiling experiment ... at least for now. I should emphasize that the goal of these posts was to record a profiling experiment using cython. I do not pretend that this code was especially hand-crafted for speed ... even though I did try some simple changes to improve its speed. I may revisit this application at a later time, especially if someone more experienced can point out ways to increase the speed significantly, preferably while staying within the constraints I set up at the beginning: other than cython, use only modules from the standard library. However, I would be interested if anyone adapts this code to use PIL and/or numpy in a straightforward way and, in doing so, increases the speed significantly.
Nice post series, thanks for the quick dive into cython. Seems nice all in all. Acknowledging that hand-written algos might be more efficient, yet the time you needed to get it going that fast was incredibly short. A major +1 for cython imho.
ReplyDeleteIt might not be suitable for any apps but surely at least for some where simple stuff needs a heavy speedup
Just from examining the generated C code, the following mandel function looks tighter because it avoid dipping into lots of CPython stuff.
ReplyDelete@cython.boundscheck(False)
@cython.wraparound(False)
@cython.profile(False)
cdef inline int mandel(double real, double imag, int max_iterations=20):
'''determines if a point is in the Mandelbrot set based on deciding if,
after a maximum allowed number of iterations, the absolute value of
the resulting number is greater or equal to 2.'''
cdef double z_real = 0., z_imag = 0.
cdef int i
cdef double threshold = 4.
for i from 0 <= i < max_iterations:
z_real, z_imag = ( z_real*z_real - z_imag*z_imag + real,
2*z_real*z_imag + imag )
if (z_real*z_real + z_imag*z_imag) >= threshold:
return i
return -1
You could likely do something similar for the rest. There are good hints for speeding up code with cython in the SciPy 2009 proceedings:
http://conference.scipy.org/proceedings/SciPy2009/
Ryan:
ReplyDeleteThanks for the suggestion and the link. I tried to make this change with the latest version but did not notice any speedup. I will read the papers you linked to and see if I can make some improvements.
You might be able to speed up the drawing quite a bit by getting rid of the drawing calls. Instead, you can use an array to store the image, and alter it pixel by pixel with Cython.
ReplyDeleteI'd use a numpy array to do it, but if you don't want to use any external libs, you can use cython to create a regular old C array, and use that to store your image.
I'm not sure how to get an array into an image with Tk to display it, but I'll bet it can be done.
-Chris Barker
OK -- I was thinking about it, so I did it. I wrote a version that uses a numpy array to create and store the image, and then, since I'm familiar with wx, I displayed it in a wx app.
ReplyDeleteI wrote a Wiki page about it in teh Cython Wiki:
http://wiki.cython.org/examples/mandelbrot
I never got your TK version working on my machine, so I don't know how it compares, but I think I got the image drawing bit down to totally negligible.
-Chris Barker
Chris: Thanks for doing this. I saw your post on the cython users list. Unfortunately, I can't see the cython wiki - it is denying any request to see it. Hopefully it won't do the same to others (if so, look up the cython-users list).
ReplyDelete