`mandel()`

by making changes in the Python code.
While I had some success in doing so, it was clearly not enough
for my purpose. As a result, I will now try to use cython. Before
I do this, I note again the result from the last profiling run,
limiting the information to the 5 longest-running functions or methods.3673150 function calls in 84.807 CPU seconds ncalls tottime percall cumtime percall filename:lineno(function) 3168000 73.210 0.000 73.210 0.000 mandel1c.py:7(mandel) 11 10.855 0.987 84.204 7.655 viewer1.py:23(draw_fractal) 1 0.593 0.593 0.593 0.593 {_tkinter.create} 504530 0.137 0.000 0.137 0.000 viewer1.py:17(draw_pixel) 37 0.009 0.000 0.009 0.000 {built-in method call}

The goal of cython could be described as providing an easy way to convert a Python module into a C extension. This is what I will do. [There are other ways to work with cython extensions than what I use here; for more information, please consult the cython web site.]

**Note that I am NOT a cython expert; this is only the first project for which I use cython.**While I am not interested in creating an application for distribution, and hence do not use the setup method for cython, it is quite possible that there are better ways to use cython than what I explore here.

I first start by taking my existing module and copying it into a new file, with a ".pyx" extension instead of the traditional ".py".

# mandel2cy.pyx # cython: profile=True def mandel(c, max_iterations=20): '''determines if a point is in the Mandelbrot set based on deciding if, after a maximum allowed number of iterations, the absolute value of the resulting number is greater or equal to 2.''' z = 0 for i in range(0, max_iterations): z = z**2 + c if abs(z) >= 4: return False return abs(z) < 2

Note that I have removed the equivalence between

`range`

and `xrange`

. The reason I have done this
is because with `xrange`

present like this in the file
results in a compilation
error when running cython with Python 3.1. Furthermore, as will
be seen later, it is not really needed even for Python 2.x when using
cython properly.
I have also included a commented line stating that 'profile' was equal to True; this is a cython directive that will enable the Python profiler to also include cython functions in its tally.

In order to import this module, I also need to modify the viewer to import the cython module. Here is the new version.

# viewer2.py import pyximport pyximport.install() from mandel2_cy import mandel from viewer import Viewer import time import sys if sys.version_info < (3,): import Tkinter as tk range = xrange else: import tkinter as tk class FancyViewer(Viewer): '''Application to display fractals''' def draw_pixel(self, x, y): '''Simulates drawing a given pixel in black by drawing a black line of length equal to one pixel.''' return #self.canvas.create_line(x, y, x+1, y, fill="black") def draw_fractal(self): '''draws a fractal on the canvas''' self.calculating = True begin = time.time() # clear the canvas self.canvas.create_rectangle(0, 0, self.canvas_width, self.canvas_height, fill="white") for x in range(0, self.canvas_width): real = self.min_x + x*self.pixel_size for y in range(0, self.canvas_height): imag = self.min_y + y*self.pixel_size c = complex(real, imag) if mandel(c, self.nb_iterations): self.draw_pixel(x, self.canvas_height - y) self.status.config(text="Time required = %.2f s [%s iterations] %s" %( (time.time() - begin), self.nb_iterations, self.zoom_info)) self.status2.config(text=self.info()) self.calculating = False if __name__ == "__main__": root = tk.Tk() app = FancyViewer(root) root.mainloop()

Other than the top few lines, nothing has changed. Time to run the profiler with this new version.

6841793 function calls in 50.145 CPU seconds ncalls tottime percall cumtime percall filename:lineno(function) 3168000 35.913 0.000 35.913 0.000 mandel2_cy.pyx:4(mandel) 11 10.670 0.970 48.754 4.432 viewer2.py:26(draw_fractal) 3168000 2.001 0.000 37.914 0.000 {mandel2_cy.mandel} 1 1.356 1.356 1.356 1.356 {_tkinter.create} 505173 0.167 0.000 0.167 0.000 viewer2.py:20(draw_pixel)

A reduction from 85 to 50 seconds; cython must be doing something right! Note that the calls to

`abs()`

have been eliminated
by using cython. All I did is import the module via Python without
making any other change to the code.
Note also that

`mandel`

appears twice: once (the longest running) as
the function defined on line 8 of mandel2_cy.pyx, and once as
a object belonging to the module mandel2_cy. I will come back to this
later but, for now, I will do some changes to help cython do even better.
As mentioned before, cython is a tool to help create C extensions. One of the differences between C and Python is that variables have a declared type in C. If one tells cython about what type a given variable is, cython can often use that information to make the code run faster. As an example, I know that two of the variables are of type integers which is a native C type; I can add this information as follows.

# mandel2a_cy.pyx # cython: profile=True def mandel(c, int max_iterations=20): '''determines if a point is in the Mandelbrot set based on deciding if, after a maximum allowed number of iterations, the absolute value of the resulting number is greater or equal to 2.''' cdef int i z = 0 for i in range(0, max_iterations): z = z**2 + c if abs(z) >= 2: return False return abs(z) < 2

Running the profiler with this change yields the following:

6841793 function calls in 39.860 CPU seconds ncalls tottime percall cumtime percall filename:lineno(function) 3168000 27.431 0.000 27.431 0.000 mandel2a_cy.pyx:4(mandel) 11 9.869 0.897 39.339 3.576 viewer2.py:26(draw_fractal) 3168000 1.906 0.000 29.337 0.000 {mandel2a_cy.mandel} 1 0.511 0.511 0.511 0.511 {_tkinter.create} 505173 0.131 0.000 0.131 0.000 viewer2.py:20(draw_pixel)

Another significant time reduction, this time of the order of 20%. And we didn't tell cython that "z" and "c" are complex yet.

Actually, C does not have a complex data type. So, I can choose one of two strategies:

- I can change the code so that I deal only with real numbers, by working myself how to multiply and add complex numbers.
- I can use some special cython technique to extract all the relevant information about the Python built-in complex data type without changing the code inside the function (other than adding some type declaration).

# mandel2b_cy.pyx # cython: profile=True cdef extern from "complexobject.h": struct Py_complex: double real double imag ctypedef class __builtin__.complex [object PyComplexObject]: cdef Py_complex cval def mandel(complex c, int max_iterations=20): '''determines if a point is in the Mandelbrot set based on deciding if, after a maximum allowed number of iterations, the absolute value of the resulting number is greater or equal to 2.''' cdef int i cdef complex z z = 0. + 0.j for i in range(0, max_iterations): z = z**2 + c if abs(z) >= 2: return False return abs(z) < 2

The timing results are the following:

6841793 function calls in 38.424 CPU seconds ncalls tottime percall cumtime percall filename:lineno(function) 3168000 26.771 0.000 26.771 0.000 mandel2b_cy.pyx:14(mandel) 11 9.435 0.858 38.209 3.474 viewer2.py:26(draw_fractal) 3168000 1.865 0.000 28.636 0.000 {mandel2b_cy.mandel} 1 0.205 0.205 0.205 0.205 {_tkinter.create} 505173 0.136 0.000 0.136 0.000 viewer2.py:20(draw_pixel)

The time difference between this run and the previous one is within the variation I observe from one profiling run to the next (using exactly the same program). Therefore, I conclude that this latest attempt didn't speed up the code. It is possible that I have overlooked something to ensure that cython could make use of the information about the complex datatype more efficiently ... It seems like I need a different strategy. I will resort to doing the complex algebra myself, and work only with real numbers. Here's the modified code for the mandel module.

# mandel2c_cy.pyx # cython: profile=True def mandel(double real, double imag, int max_iterations=20): '''determines if a point is in the Mandelbrot set based on deciding if, after a maximum allowed number of iterations, the absolute value of the resulting number is greater or equal to 2.''' cdef double z_real = 0., z_imag = 0. cdef int i for i in range(0, max_iterations): z_real, z_imag = ( z_real*z_real - z_imag*z_imag + real, 2*z_real*z_imag + imag ) if (z_real*z_real + z_imag*z_imag) >= 4: return False return (z_real*z_real + z_imag*z_imag) < 4

I also change the call within

`draw_fractal()`

so that
I don't use complex variables. The result is extremely encouraging:
6841793 function calls in 7.205 CPU seconds ncalls tottime percall cumtime percall filename:lineno(function) 11 4.379 0.398 7.066 0.642 viewer2a.py:26(draw_fractal) 3168000 1.557 0.000 2.570 0.000 {mandel2c_cy.mandel} 3168000 1.013 0.000 1.013 0.000 mandel2c_cy.pyx:4(mandel) 1 0.130 0.130 0.130 0.130 {_tkinter.create} 505173 0.114 0.000 0.114 0.000 viewer2a.py:20(draw_pixel)

This total execution time has been reduced from 38 to 7 seconds.

`mandel()`

is no longer the largest contributor to the
overall execution time; `draw_fractal()`

is. However,
the program is still a bit too slow: without actually doing any drawing,
it takes approximately 0.6 seconds to generate one fractal image.
However, I can do better. Looking at the code, I notice that
`draw_fractal()`

contains two embedded for loops, resulting to
all those calls to
`mandel()`

. Remember how telling cython about integer types
used in loops sped up the code? This suggest that perhaps I should
do something similar and move some
of the code of `draw_fractal()`

to the cython module.
Here's a modified viewer module.
# viewer2b.py import pyximport pyximport.install() from mandel2d_cy import create_fractal from viewer import Viewer import time import sys if sys.version_info < (3,): import Tkinter as tk range = xrange else: import tkinter as tk class FancyViewer(Viewer): '''Application to display fractals''' def draw_fractal(self): '''draws a fractal on the canvas''' self.calculating = True begin = time.time() # clear the canvas self.canvas.create_rectangle(0, 0, self.canvas_width, self.canvas_height, fill="white") create_fractal(self.canvas_width, self.canvas_height, self.min_x, self.min_y, self.pixel_size, self.nb_iterations, self.canvas) self.status.config(text="Time required = %.2f s [%s iterations] %s" %( (time.time() - begin), self.nb_iterations, self.zoom_info)) self.status2.config(text=self.info()) self.calculating = False if __name__ == "__main__": root = tk.Tk() app = FancyViewer(root) root.mainloop()

And here is the new cython module, without any additional type declaration.

# mandel2d_cy.pyx # cython: profile=True def mandel(double real, double imag, int max_iterations=20): '''determines if a point is in the Mandelbrot set based on deciding if, after a maximum allowed number of iterations, the absolute value of the resulting number is greater or equal to 2.''' cdef double z_real = 0., z_imag = 0. cdef int i for i in range(0, max_iterations): z_real, z_imag = ( z_real*z_real - z_imag*z_imag + real, 2*z_real*z_imag + imag ) if (z_real*z_real + z_imag*z_imag) >= 4: return False return (z_real*z_real + z_imag*z_imag) < 4 def draw_pixel(x, y, canvas): '''Simulates drawing a given pixel in black by drawing a black line of length equal to one pixel.''' return #canvas.create_line(x, y, x+1, y, fill="black") def create_fractal(canvas_width, canvas_height, min_x, min_y, pixel_size, nb_iterations, canvas): for x in range(0, canvas_width): real = min_x + x*pixel_size for y in range(0, canvas_height): imag = min_y + y*pixel_size if mandel(real, imag, nb_iterations): draw_pixel(x, canvas_height - y, canvas)

The profiling result is as follows:

3673815 function calls in 3.873 CPU seconds ncalls tottime percall cumtime percall filename:lineno(function) 11 2.632 0.239 3.706 0.337 mandel2d_cy.pyx:24(create_fractal) 3168000 1.002 0.000 1.002 0.000 mandel2d_cy.pyx:4(mandel) 1 0.155 0.155 0.155 0.155 {_tkinter.create} 505173 0.072 0.000 0.072 0.000 mandel2d_cy.pyx:18(draw_pixel) 37 0.009 0.000 0.009 0.000 {built-in method call}

Simply by moving over some of the code to the cython module, I have reduced the profiling time to almost half of it previous value. Looking more closely at the profiling results, I also notice that calls to

`mandel()`

now only appear once; some overhead in calling
cython functions from python modules has disappeared. Let's see what happens
if I now add some type information.
def create_fractal(int canvas_width, int canvas_height, double min_x, double min_y, double pixel_size, int nb_iterations, canvas): cdef int x, y cdef double real, imag for x in range(0, canvas_width): real = min_x + x*pixel_size for y in range(0, canvas_height): imag = min_y + y*pixel_size if mandel(real, imag, nb_iterations): draw_pixel(x, canvas_height - y, canvas)

The result is only slightly better:

3673815 function calls in 3.475 CPU seconds ncalls tottime percall cumtime percall filename:lineno(function) 11 2.189 0.199 3.308 0.301 mandel2e_cy.pyx:24(create_fractal) 3168000 1.046 0.000 1.046 0.000 mandel2e_cy.pyx:4(mandel) 1 0.135 0.135 0.135 0.135 {_tkinter.create} 505173 0.074 0.000 0.074 0.000 mandel2e_cy.pyx:18(draw_pixel) 37 0.028 0.001 0.028 0.001 {built-in method call}

However, one thing I remember from the little I know about C it that, not only do variables have to be declared to be of a certain type, but the same has to be done to functions as well. Here,

`mandel()`

has not been declared
to be of a specific type, so cython assumes it to be a generic
Python object. After reading the cython documentation, and noticing
that `mandel()`

is only called from within the cython
module, I conclude that not only should I specify the type for
`mandel()`

but that it probably makes sense to
specify that it can be "inlined"; I also
do the same for `draw_pixel()`

.
# mandel2f_cy.pyx # cython: profile=True cdef inline bint mandel(double real, double imag, int max_iterations=20): '''determines if a point is in the Mandelbrot set based on deciding if, after a maximum allowed number of iterations, the absolute value of the resulting number is greater or equal to 2.''' cdef double z_real = 0., z_imag = 0. cdef int i for i in range(0, max_iterations): z_real, z_imag = ( z_real*z_real - z_imag*z_imag + real, 2*z_real*z_imag + imag ) if (z_real*z_real + z_imag*z_imag) >= 4: return False return (z_real*z_real + z_imag*z_imag) < 4 cdef inline void draw_pixel(x, y, canvas): '''Simulates drawing a given pixel in black by drawing a black line of length equal to one pixel.''' return #canvas.create_line(x, y, x+1, y, fill="black")

This yields a nice improvement.

3673815 function calls in 2.333 CPU seconds ncalls tottime percall cumtime percall filename:lineno(function) 11 1.190 0.108 2.194 0.199 mandel2f_cy.pyx:24(create_fractal) 3168000 0.930 0.000 0.930 0.000 mandel2f_cy.pyx:4(mandel) 1 0.127 0.127 0.127 0.127 {_tkinter.create} 505173 0.074 0.000 0.074 0.000 mandel2f_cy.pyx:18(draw_pixel) 37 0.009 0.000 0.009 0.000 {built-in method call}

However... I asked cython to "inline"

`mandel`

,
thus treating them as a pure
C function. Yet, they both appear in the Python profiling information, which
was not the case for `abs()`

once I used cython for the
first time. The reason it appears is that cython has been instructed
to profile all functions in the module, via the directive at the top.
I can selectively turn off the profiling for an individual function
by importing the "cython module" and using a special purpose
decorator as follows.
# mandel2g_cy.pyx # cython: profile=True import cython @cython.profile(False) cdef inline bint mandel(double real, double imag, int max_iterations=20): '''determines if a point is in the Mandelbrot set based on deciding if, after a maximum allowed number of iterations, the absolute value of the resulting number is greater or equal to 2.''' cdef double z_real = 0., z_imag = 0. cdef int i for i in range(0, max_iterations): z_real, z_imag = ( z_real*z_real - z_imag*z_imag + real, 2*z_real*z_imag + imag ) if (z_real*z_real + z_imag*z_imag) >= 4: return False return (z_real*z_real + z_imag*z_imag) < 4 cdef inline void draw_pixel(x, y, canvas): '''Simulates drawing a given pixel in black by drawing a black line of length equal to one pixel.''' return #canvas.create_line(x, y, x+1, y, fill="black")

The result is even better than I would have expected!

505519 function calls in 0.817 CPU seconds ncalls tottime percall cumtime percall filename:lineno(function) 11 0.605 0.055 0.676 0.061 mandel2g_cy.pyx:27(create_fractal) 1 0.128 0.128 0.128 0.128 {_tkinter.create} 504877 0.070 0.000 0.070 0.000 mandel2g_cy.pyx:21(draw_pixel) 37 0.010 0.000 0.010 0.000 {built-in method call} 11 0.001 0.000 0.678 0.062 viewer2b.py:20(draw_fractal)

From 85 seconds (at the beginning of this post) down to 0.8 seconds: a reduction by a factor of 100 ...thank you cython! :-)

However, increasing the number of iterations to 1000 (from the current value of 100 used for testing) does increase the time significantly.

495235 function calls in 3.872 CPU seconds ncalls tottime percall cumtime percall filename:lineno(function) 11 3.653 0.332 3.723 0.338 mandel2g_cy.pyx:27(create_fractal) 1 0.136 0.136 0.136 0.136 {_tkinter.create} 494593 0.071 0.000 0.071 0.000 mandel2g_cy.pyx:21(draw_pixel) 37 0.009 0.000 0.009 0.000 {built-in method call} 11 0.001 0.000 3.726 0.339 viewer2b.py:20(draw_fractal)

It is probably a good time to put back the drawing to see what the overall time profile looks like in a more realistic situation.

5441165 function calls in 20.747 CPU seconds ncalls tottime percall cumtime percall filename:lineno(function) 494604 8.682 0.000 14.427 0.000 Tkinter.py:2135(_create) 11 3.863 0.351 20.572 1.870 mandel2g_cy.pyx:27(create_fractal) 494632 2.043 0.000 5.326 0.000 Tkinter.py:1046(_options) 494657 1.845 0.000 2.861 0.000 Tkinter.py:77(_cnfmerge) 494593 1.548 0.000 16.709 0.000 mandel2g_cy.pyx:21(draw_pixel)

Clearly, the limiting time factor is now the Tkinter based drawing, and not the other code. It is time to think of a better drawing strategy. However, this will have to wait until next post.

## 4 comments:

You should likely use

cdef double complex z

instead of

cdef complex z

The former makes z a low-level "unboxed" number (fast), whereas the latter AFAIK makes it a Python complex object (much slower since all arithmetic goes through the PyNumber API).

Anonymous: thanks for the suggestion. I did that (and the same for the variable "c"). In addition, I had to replace z**2 by z*z for this to work, as I got an error message about "complex powers not yet supported".

Simply replacing z**2 by z*z brought down the time from 38 to 29 seconds (approximately); adding the "double" type declaration brought down the time to 22 seconds: nice, but not as much as can be obtained by working with real numbers only.

Now that's a surprise. Examining the assembler output, it seems that gcc does not inline complex multiplication, which apparently gives a factor of 10 difference.

With `-ffast-math` flag there is no performance difference (of plain C) double vs. complex routines.

The program wasn't actually drawing anything until I noticed that all instances on this page of

#self.canvas.create_line(x, y, x+1, y, fill="black")

are commented out.

I might also point out that, as best I can tell, the posts don't include actual instructions to build (or profile) with cython. I figured out how to do so of course looking elsewhere on the web. E.g.

Building Cython code

Other than that the exercise has been very helpful.

Je vous remercie - Patrick

Post a Comment