Lately, there seem to be a lot of posts encouraging people to port their apps/libraries/modules to Python 3. I'd like to add my voice and perhaps use this old post as a reminder: more than two years ago, Crunchy was made compatible with Python 2.4, 2.5, 2.6 ... and 3.1. At the time, we decided not to follow what seemed to be the official recommendation, namely to have one code base written for 2.x and use the 2to3 tool to do the automatic translation. Instead, we decided to have a single code base, which seems to be the way people are doing it these days. There were of course some challenges, especially as we kept compatibility all the way back to Python 2.4: since Crunchy puts a Python interpreter inside your browser and manipulates what's there, regardless of the encoding, it has to do a lot of string (and, for 3.x, bytes) manipulation, as well as deal with incompatible syntax for handling exceptions, etc., when running the code that is fed to it. This was done when very little was known about best practices for porting from 2.x to 3. Partly as a result, Crunchy's code is not the best example to look at when it comes to doing such a port ... but it certainly demonstrated that it was possible to port a non-trivial program with a tiny team. Now the situation is quite different: many more projects have been ported, and you can benefit from their experience.
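To give a flavour of the kind of compromise this required, here is a minimal sketch (not Crunchy's actual code; the run_user_code name is mine) of an exception-handling idiom that works unchanged from Python 2.4 through 3.x:
# compat_sketch.py
import sys

def run_user_code(source):
    '''Runs code fed to the interpreter, reporting any error in a way
    that works from Python 2.4 through 3.x: "except Exception, e" is a
    syntax error in 3.x, and "except Exception as e" is one in Python 2.5
    and earlier, so the exception object is retrieved via sys.exc_info().'''
    try:
        exec(source)
    except Exception:
        error = sys.exc_info()[1]
        print("Error: %s" % error)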
So, what are you waiting for?
Friday, December 09, 2011
Thursday, September 22, 2011
Errors should never pass silently
- The Zen of Python
Lately, I have been programming a lot in Javascript, and have come to appreciate even more what Tim Peters expressed so well in writing what is now known as The Zen of Python. In particular, I appreciate the philosophy that "Errors should never pass silently."
Those who don't care about Javascript can stop reading now. Others may benefit, and perhaps even enlighten me further, about a "feature" I came across.
I'm working on a new website which will include only interactive tutorials. [No, not Crunchy-based ones, but honest-to-goodness interactive tutorials that require nothing but a browser.] To make my life easier, I use jQuery as well as CodeMirror. A user can edit the code in the CodeMirror editor and view the result "live" in a separate window (html iframe). Every time the content of the editor is changed, the iframe gets updated, as illustrated here.
I started by using the standard $ jQuery notation inside the html iframe without including a link to jQuery in the iframe, since I already included it in the parent page, and got an error in the console to the effect that $ was not defined. Fair enough. I changed the source to also include a link to jQuery inside the content of the editor (to be copied in the iframe) and the error vanished. However, no javascript code was run from the iframe, not even a simple test alert. Meanwhile, the javascript console remained silent.
Errors should never pass silently.
Eventually, by searching the web, I was able to find a solution. Before I found it, however, I encountered a truly puzzling "feature".
After including a link to jQuery in the editor (which then was copied into the iframe) and reloading the page, if I then proceeded to remove that link from the editor, suddenly all javascript code embedded in the iframe, including all jQuery calls, would work.
I can just imagine giving these instructions to beginners:
"We are going to use jQuery, as you can see from the code in the editor. Now, to begin, you have to select the line where jQuery is loaded and delete it so that the browser (iframe) can use it."
I still have not figured out how that phantom jQuery worked... and strongly believe that some error message or warning should have appeared in the javascript console (for Chrome, or Firebug for Firefox).
In any event, to solve my problem, I had to do 3 things:
- Only load jQuery once, in the main window.
- Write $ = parent.$; inside the iframe before using the $ notation.
- As I wanted to draw things on a canvas, replace the usual $("#canvas") jQuery idiom by $("#canvas", document) everywhere.
Now everything appears to work just as expected.
Sunday, January 16, 2011
Book review: Pro Python System Administration
Summary: Pro Python System Administration is a comprehensive book showing how Python can be used effectively to perform a variety of system administration tasks. I would recommend it highly to anyone having to do system administration work. For more information, please consult the author's web site.
===
There is a saying that "no good deed goes unpunished". I feel that a counterpart should be "no bad talk goes unrewarded". At PyCon 2009, I gave a talk on plugins that has to be amongst the worst presentations I have ever given. Yet, as an unexpected result of that talk, I received a free copy of Pro Python System Administration written by Rytis Sileika. This blog entry is a review of that book.
This book is written for system administrators, a field in which I have no experience; therefore, this review will definitely not have the depth that an expert might have given it.
Four general areas of system administration are covered: network management, web server and web application management, database system management, and system monitoring. Examples are given on a Linux system, with no indication as to whether or not a given example is applicable to other operating systems. Given that Python works with all major operating systems, and that the book focuses on using Python packages, I suspect that the content would be easily adaptable to other environments.
While the book is classified, on the cover, as being addressed to advanced Python programmers, the author suggests in the book's introduction that it would be appropriate for people who have some minimal experience with Python. I suspect that the classification on the cover was done (wrongly) by the editor, as I found the examples very readable and I would not claim to be an advanced Python programmer.
The book is divided into 13 chapters, each focused on one or a few well-defined tasks. While the tasks in a given chapter are relatively independent of those of other chapters, there is a natural progression in terms of topics introduced and it is probably better to read the book in the natural sequence rather than reading chapters randomly - in other words, this book is not simply a random collection of recipes for a series of tasks.
1. Reading and Collecting Performance Data Using SNMP
In this chapter, Sileika introduces the Simple Network Management Protocol, or SNMP, after which he shows how one can query SNMP devices using Python and the PySNMP library. In the second part of the chapter, he introduces RRDTool, an application for graphing monitoring data, and shows how to interact with it using the rrdtool module. In the last section of this first chapter, he shows how to create web pages (with the eventual goal of displaying monitoring data on the web) using the Jinja2 templating system.
2. Managing Devices Using the SOAP API
In this chapter, Sileika introduces the Simple Object Access Protocol or SOAP and gives examples based on using the Zolera SOAP Infrastructure (ZSI) package. The bulk of the chapter focuses on explaining how to manage and monitor Citrix Netscaler load balancers. The Python logging module is introduced and used.
3. Creating a Web Application for IP Address Accountancy
In this chapter, the Django framework is introduced and used to build a web application that maintains IP address allocations on an internal network. Rather than using the web server included with Django, Sileika shows how to use Django with the Apache web server.
4. Integrating the IP Address Application with DHCP
This chapter is a continuation of the previous one, where the application previously developed is enhanced with the addition of Dynamic Host Configuration Protocol (DHCP) services as well as a few others. More advanced Django methods are included as well as some AJAX calls.
5. Maintaining a List of Virtual Hosts in an Apache Configuration File
Another Django-based application is introduced, this time with a focus on the administration side.
6. Gathering and Presenting Statistical Data from Apache Log Files
This chapter focuses on building a plugin-based modular framework to analyze log files. The content of this chapter is the reason why I received a free copy of the book in the first place: Sileika mentioned to me in an email that the architecture described was mostly inspired by the presentation I gave at PyCon with a few modifications that allow for information exchange between the plug-in modules. When I got the original email, I was really surprised given that I had tried to forget about the talk I had given on plugins.
In my opinion, Sileika does an excellent job in explaining how plugins can be easily created with Python and used effectively to design modular applications.
Dynamic discovery and loading of plugins is illustrated, and the GeoIP Python library is used to find the physical location corresponding to a given IP address.
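For readers curious about the general idea, here is a minimal sketch of dynamic plugin discovery in the same spirit (this is illustrative only, not the book's code; the process() entry point is a name I made up):
# plugin_loader_sketch.py
import os
import imp  # the Python 2 way; newer code would use importlib

def load_plugins(plugin_dir="plugins"):
    '''Load every module found in plugin_dir, keeping those that
    define a process() function (our hypothetical plugin interface).'''
    plugins = []
    for filename in sorted(os.listdir(plugin_dir)):
        if filename.endswith(".py") and not filename.startswith("_"):
            path = os.path.join(plugin_dir, filename)
            module = imp.load_source(filename[:-3], path)
            if hasattr(module, "process"):
                plugins.append(module)
    return plugins

if __name__ == "__main__":
    for plugin in load_plugins():
        plugin.process()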
7. Performing Complex Searches and Reporting on Application Log Files
This chapter shows how to use Exctractor, an open source log file parser tool, to parse more complex log files than those generated by an Apache server as illustrated in the previous chapter. Of note is the introduction and usage of Python generators as well as an introduction to parsing XML files with Python.
8. A Web Site Availability Check Script for Nagios
This chapter shows how to use Python with Nagios, a network monitoring system. Python packages/modules illustrated include BeautifulSoup, urllib and urllib2. The monitoring scripts developed do more than simply check for site availability: they actually test the web application logic.
9. Management and Monitoring Subsystem
10. Remote Monitoring Agents
11. Statistics Gathering and Reporting
Chapters 9, 10 and 11 explain how to build a "simple" distributed monitoring system. A number of Python modules and packages are used, including xmlrpclib, SimpleXMLRPCServer, CherryPy, sqlite3, multiprocessing, ConfigParser, subprocess, Numpy, matplotlib and others. The application developed is a good example of using Python as a "glue language" to tie together third-party modules and packages. One weakness is that the included introduction to statistics is rather elementary and, I believe, could have been shortened considerably given the intended audience.
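To give a flavour of the "glue language" point, here is a minimal and purely illustrative sketch using two of the modules mentioned above (the uptime function is a made-up example, not one from the book):
# agent_sketch.py -- a tiny XML-RPC monitoring agent
from SimpleXMLRPCServer import SimpleXMLRPCServer

def uptime():
    '''Return the machine uptime in seconds (Linux only).'''
    return float(open("/proc/uptime").read().split()[0])

server = SimpleXMLRPCServer(("localhost", 8000))
server.register_function(uptime)
server.serve_forever()

# client_sketch.py -- query the agent from another process
import xmlrpclib
proxy = xmlrpclib.ServerProxy("http://localhost:8000/")
print(proxy.uptime())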
12. Automatic MySQL Database Performance Tuning
This chapter revisits the use of plugins, with a slightly more advanced application, where information can be exchanged between the different plugins.
13. Using Amazon EC2/S3 as a Data Warehouse Solution
This last chapter gives a crash course on Amazon's "cloud computing" offerings. It is a good final note to the book, and a good starting point for future explorations.
Overall, I found the book quite readable even though it is outside my area of expertise. Occasionally, I had the feeling that there were a few "fillers" (e.g., overlong log listings) that could have been shortened without losing anything of real value. This is very much an applied book, with real-life examples that could either be used "as is" or serve as starting points for more extended applications.
I would recommend this book highly to anyone who has to perform any one of the system administration tasks mentioned above. It is also a good source of non-trivial examples demonstrating the use of a number of Python modules and packages. The code that is given would likely save many hours of development. As is becoming the norm, the source code included in the book is available from the publisher. Also, many of the prototypes covered in the book are available as open source projects at the author's web site http://www.sysadminpy.com, where a discussion forum is also available.
Sunday, August 01, 2010
My favourite Python one-liner
I am slowly working on two new projects involving web-based interactive Python tutorials and had to set up a local web server for handling ajax requests. After browsing the web, I stumbled upon the following one-liner in the Python documentation, which I had completely forgotten about:
python -m SimpleHTTPServer 8000
Then, simply point the browser at "http://localhost:8000" and all is set to go. Granted, it is not what one might consider a traditional "one-liner", but it works for me. Python's "batteries included" is certainly a nice feature!
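Incidentally, the module was renamed http.server in Python 3, so with a Python 3 interpreter the equivalent one-liner is:
python -m http.server 8000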
Sunday, January 17, 2010
Profiling adventures and cython - Faster drawing and conclusion
In the last blog post, I made use of cython to speed up some calculations involving the iterative equation defining the Mandelbrot set. From the initial profiling run with 1000 iterations in the second post to the last run in the previous post, the profiling time went from 575 seconds down to 21 seconds, which is a reduction by a factor of 27. This is nothing to sneer at. Yet, I can do better.
Let's start with a sample profiling run, using 1000 iterations, with the program as it was last time but keeping track of more function calls in the profile.
5441165 function calls in 20.484 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
494604 8.461 0.000 14.218 0.000 Tkinter.py:2135(_create)
11 3.839 0.349 20.315 1.847 mandel2g_cy.pyx:27(create_fractal)
494632 2.053 0.000 5.309 0.000 Tkinter.py:1046(_options)
494657 1.809 0.000 2.811 0.000 Tkinter.py:77(_cnfmerge)
494593 1.522 0.000 16.475 0.000 mandel2g_cy.pyx:21(draw_pixel)
989240 0.752 0.000 0.752 0.000 {method 'update' of 'dict' objects}
494593 0.736 0.000 14.953 0.000 Tkinter.py:2155(create_line)
989257 0.698 0.000 0.698 0.000 {_tkinter._flatten}
494632 0.315 0.000 0.315 0.000 {method 'items' of 'dict' objects}
1 0.158 0.158 0.158 0.158 {_tkinter.create}
494641 0.131 0.000 0.131 0.000 {callable}
37 0.009 0.000 0.009 0.000 {built-in method call}
11 0.001 0.000 20.317 1.847 viewer2b.py:20(draw_fractal)
12 0.000 0.000 0.000 0.000 viewer.py:60(info)
I notice that there are many function calls: over 5 million of them. While most of them appear to take very little time, they do add up in the end. It is time to adopt a smarter drawing strategy.
Currently, a "line" is potentially drawn for each pixel. If I look at a given fractal drawing, I can see that it could be drawn using "longer lines", when consecutive pixels are to be drawn with the same colour. I can easily implement this as follows.
def create_fractal(int canvas_width, int canvas_height,
double min_x, double min_y, double pixel_size,
int nb_iterations, canvas):
cdef int x, y, start_y, end_y
cdef double real, imag
cdef bint start_line
for x in range(0, canvas_width):
real = min_x + x*pixel_size
start_line = False
for y in range(0, canvas_height):
imag = min_y + y*pixel_size
if mandel(real, imag, nb_iterations):
if not start_line:
start_line = True
start_y = canvas_height - y
else:
if start_line:
start_line = False
end_y = canvas_height - y
canvas.create_line(x, start_y, x, end_y, fill="black")
if start_line:
end_y = canvas_height - y
canvas.create_line(x, start_y, x, end_y, fill="black")
Note that I no longer need the function draw_pixel().
The result is a reduction from 20 seconds down to 4 seconds:
79092 function calls in 3.959 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
11 3.552 0.323 3.815 0.347 mandel2h_cy.pyx:21(create_fractal)
7856 0.150 0.000 0.250 0.000 Tkinter.py:2135(_create)
1 0.131 0.131 0.131 0.131 {_tkinter.create}
7884 0.037 0.000 0.091 0.000 Tkinter.py:1046(_options)
7909 0.030 0.000 0.047 0.000 Tkinter.py:77(_cnfmerge)
7845 0.014 0.000 0.263 0.000 Tkinter.py:2155(create_line)
15761 0.014 0.000 0.014 0.000 {_tkinter._flatten}
15744 0.013 0.000 0.013 0.000 {method 'update' of 'dict' objects}
37 0.009 0.000 0.009 0.000 {built-in method call}
7884 0.005 0.000 0.005 0.000 {method 'items' of 'dict' objects}
7893 0.002 0.000 0.002 0.000 {callable}
11 0.001 0.000 3.818 0.347 viewer2b.py:20(draw_fractal)
12 0.000 0.000 0.000 0.000 viewer.py:60(info)
22 0.000 0.000 0.001 0.000 Tkinter.py:1172(_configure)
And it is now again my own code in create_fractal() that appears to be the limiting factor. Thinking back to when I increased the number of iterations from 100 to 1000, thus only affecting the execution time of mandel(), it seemed like this might be a good place to look for possible time improvements. Let's recall what the code looks like.
cdef inline bint mandel(double real, double imag, int max_iterations=20):
'''determines if a point is in the Mandelbrot set based on deciding if,
after a maximum allowed number of iterations, the absolute value of
the resulting number is greater or equal to 2.'''
cdef double z_real = 0., z_imag = 0.
cdef int i
for i in range(0, max_iterations):
z_real, z_imag = ( z_real*z_real - z_imag*z_imag + real,
2*z_real*z_imag + imag )
if (z_real*z_real + z_imag*z_imag) >= 4:
return False
return (z_real*z_real + z_imag*z_imag) < 4
I used a Pythonic tuple assignment to avoid the use of temporary variables. However, in a typical iteration, there will be 4 multiplications for the tuple re-assignment and two more for the "if" statement, for a total of 6. It is certainly possible to reduce the number of multiplications by using temporary variables, as follows:
cdef inline bint mandel(double real, double imag, int max_iterations=20):
'''determines if a point is in the Mandelbrot set based on deciding if,
after a maximum allowed number of iterations, the absolute value of
the resulting number is greater or equal to 2.'''
cdef double z_real = 0., z_imag = 0.
cdef int i
cdef double zr_sq, zi_sq, z_cross
for i in range(0, max_iterations):
zr_sq = z_real*z_real
zi_sq = z_imag*z_imag
z_cross = 2*z_real*z_imag
z_real = zr_sq - zi_sq + real
z_imag = z_cross + imag
if (zr_sq + zi_sq) >= 4:
return False
return (zr_sq + zi_sq) < 4
So, there are now fewer multiplications to compute. Surely, this will speed up the code:
78982 function calls in 4.888 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
11 4.478 0.407 4.748 0.432 mandel2i_cy.pyx:26(create_fractal)
7845 0.153 0.000 0.256 0.000 Tkinter.py:2135(_create)
1 0.128 0.128 0.128 0.128 {_tkinter.create}
7873 0.040 0.000 0.095 0.000 Tkinter.py:1046(_options)
7898 0.031 0.000 0.048 0.000 Tkinter.py:77(_cnfmerge)
7834 0.014 0.000 0.270 0.000 Tkinter.py:2155(create_line)
15739 0.013 0.000 0.013 0.000 {_tkinter._flatten}
15722 0.013 0.000 0.013 0.000 {method 'update' of 'dict' objects}
37 0.009 0.000 0.009 0.000 {built-in method call}
7873 0.005 0.000 0.005 0.000 {method 'items' of 'dict' objects}
7882 0.002 0.000 0.002 0.000 {callable}
11 0.001 0.000 4.750 0.432 viewer2b.py:20(draw_fractal)
12 0.000 0.000 0.000 0.000 viewer.py:60(info)
4 0.000 0.000 0.000 0.000 {posix.stat}
Alas, that is not the case, as the previous profiling run was slightly below 4 seconds. [Note that I did run each profiling test at least three times to guard against any anomalous result.] Apparently, my intuition is not a very good guide when it comes to predicting how cython will be able to optimize a given function.
So far, the pictures the program has been able to produce have only been in black and white. It is time to spruce things up and add colour. To do this, we will need to make three general changes:
- We will modify mandel() so that it returns the number of iterations required to determine that a given point does not belong to the set; if the point does belong, we will return -1.
- We will create a colour palette as a Python list. For a given number of iterations required by mandel(), we will pick a given colour, cycling through the colours from the palette.
- We will change our line drawing method so that we keep track of the colour (number of iterations) rather than simply whether or not the point is in the set ("black").
# mandel3_cy.pyx
# cython: profile=True
import cython
def make_palette():
'''sample coloring scheme for the fractal - feel free to experiment'''
colours = []
for i in range(0, 25):
colours.append('#%02x%02x%02x' % (i*10, i*8, 50 + i*8))
for i in range(25, 5, -1):
colours.append('#%02x%02x%02x' % (50 + i*8, 150+i*2, i*10))
for i in range(10, 2, -1):
colours.append('#00%02x30' % (i*15))
return colours
colours = make_palette()
cdef int nb_colours = len(colours)
@cython.profile(False)
cdef inline int mandel(double real, double imag, int max_iterations=20):
'''determines if a point is in the Mandelbrot set based on deciding if,
after a maximum allowed number of iterations, the absolute value of
the resulting number is greater or equal to 2.'''
cdef double z_real = 0., z_imag = 0.
cdef int i
for i in range(0, max_iterations):
z_real, z_imag = ( z_real*z_real - z_imag*z_imag + real,
2*z_real*z_imag + imag )
if (z_real*z_real + z_imag*z_imag) >= 4:
return i
return -1
def create_fractal(int canvas_width, int canvas_height,
double min_x, double min_y, double pixel_size,
int nb_iterations, canvas):
global colours, nb_colours
cdef int x, y, start_y, end_y, current_colour, new_colour
cdef double real, imag
for x in range(0, canvas_width):
real = min_x + x*pixel_size
start_y = canvas_height
current_colour = mandel(real, min_y, nb_iterations)
for y in range(1, canvas_height):
imag = min_y + y*pixel_size
new_colour = mandel(real, imag, nb_iterations)
if new_colour != current_colour:
if current_colour == -1:
canvas.create_line(x, start_y, x, canvas_height-y,
fill="black")
else:
canvas.create_line(x, start_y, x, canvas_height-y,
fill=colours[current_colour%nb_colours])
current_colour = new_colour
start_y = canvas_height - y
if current_colour == -1:
canvas.create_line(x, start_y, x, 0, fill="black")
else:
canvas.create_line(x, start_y, x, 0,
fill=colours[current_colour%nb_colours])
If we profile this code, we find that it takes about three times as long to generate a colour picture as it did to generate a black and white one - at least, for the starting configuration...
2370682 function calls in 12.638 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
11 4.682 0.426 12.184 1.108 mandel3_cy.pyx:36(create_fractal)
237015 4.310 0.000 7.084 0.000 Tkinter.py:2135(_create)
237043 1.019 0.000 2.575 0.000 Tkinter.py:1046(_options)
237068 0.877 0.000 1.353 0.000 Tkinter.py:77(_cnfmerge)
1 0.443 0.443 0.443 0.443 {_tkinter.create}
237004 0.418 0.000 7.502 0.000 Tkinter.py:2155(create_line)
474062 0.361 0.000 0.361 0.000 {method 'update' of 'dict' objects}
474079 0.313 0.000 0.313 0.000 {_tkinter._flatten}
237043 0.143 0.000 0.143 0.000 {method 'items' of 'dict' objects}
237052 0.061 0.000 0.061 0.000 {callable}
37 0.009 0.000 0.009 0.000 {built-in method call}
11 0.000 0.000 12.186 1.108 viewer3.py:20(draw_fractal)
12 0.000 0.000 0.000 0.000 viewer.py:60(info)
4 0.000 0.000 0.000 0.000 {posix.stat}
We also generate some nice pictures! First, using only 100 iterations.
Notice how the Mandelbrot set boundary seems rather smooth ... this is clearly a sign that we might be missing something. Increasing the number of iterations to 1000 reveals a different picture.
We see much more detail, and many points that were wrongly identified as being part of the Mandelbrot set are correctly excluded (and coloured!). What happens if we look at a region that contains more points inside the Mandelbrot set (thus requiring more iterations) and increase the number of iterations to 1500 (as I found that 1000 was not enough in this region)?
Unfortunately, the profiling and timing information displayed do not tell the entire story. In practice, I found that it would take many more seconds (sometimes more than 10) for the canvas to be updated than the timing information given for each of the three pictures displayed above. Something is happening behind the scenes when the picture is updated on the screen that is not being recorded by my simple timing method.
For comparison, I had a look at Aptus, a program that uses a hand-coded C extension for speed. Actually, I had tried Aptus a few years ago, when Ned Batchelder first mentioned it, and tried it again for fun as I was working on my own version. Aptus can produce really nice pictures, really fast. Here's an example that was produced, according to the timing given by Aptus itself, in 0.25 seconds.
Note that the timing given by Aptus seems to be truly representative, unlike the timing for my program. I should mention that, in addition to using a hand-crafted C extension, Aptus uses wxPython instead of Tkinter, and it also uses PIL and numpy, both of which are known for their speed. It might be possible that using PIL and numpy with my program would improve the speed significantly. However, none of these three libraries is part of the standard library, and so they do not meet the constraints I had decided upon at the beginning.
This concludes this profiling experiment ... at least for now. I should emphasize that the goal of these posts was to record a profiling experiment using cython. I do not pretend that this code was especially hand-crafted for speed ... even though I did try some simple changes to improve its speed. I may revisit this application at a later time, especially if someone more experienced can point out ways to increase the speed significantly, preferably while staying within the constraints I set up at the beginning: other than cython, use only modules from the standard library. However, I would be interested if anyone adapts this code to use PIL and/or numpy in a straightforward way and, in doing so, increases the speed significantly.
Profiling adventures and cython - Introducing cython
In the previous blog post, I made some attempts at speeding up the function mandel() by making changes in the Python code. While I had some success in doing so, it was clearly not enough for my purpose. As a result, I will now try to use cython. Before I do this, I note again the result from the last profiling run, limiting the information to the 5 longest-running functions or methods:
3673150 function calls in 84.807 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
3168000 73.210 0.000 73.210 0.000 mandel1c.py:7(mandel)
11 10.855 0.987 84.204 7.655 viewer1.py:23(draw_fractal)
1 0.593 0.593 0.593 0.593 {_tkinter.create}
504530 0.137 0.000 0.137 0.000 viewer1.py:17(draw_pixel)
37 0.009 0.000 0.009 0.000 {built-in method call}
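These posts do not show the exact profiling harness used, but profiles in this format can be produced with the standard library alone; a minimal sketch (with a stand-in function instead of the real draw_fractal()) might look like this:
# profile_sketch.py
import cProfile
import pstats

def draw_fractal_stub():
    '''Stand-in for the real drawing code; any slow call will do.'''
    return sum(i * i for i in range(10 ** 6))

cProfile.run("draw_fractal_stub()", "profile.out")
stats = pstats.Stats("profile.out")
stats.sort_stats("time")   # sort by time spent in each function (tottime)
stats.print_stats(5)       # limit the output to the 5 costliest entries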
The goal of cython could be described as providing an easy way to convert a Python module into a C extension. This is what I will do. [There are other ways to work with cython extensions than what I use here; for more information, please consult the cython web site.] Note that I am NOT a cython expert; this is only the first project for which I use cython. While I am not interested in creating an application for distribution, and hence do not use the setup method for cython, it is quite possible that there are better ways to use cython than what I explore here.
I first start by taking my existing module and copying it into a new file, with a ".pyx" extension instead of the traditional ".py".
# mandel2_cy.pyx
# cython: profile=True
def mandel(c, max_iterations=20):
'''determines if a point is in the Mandelbrot set based on deciding if,
after a maximum allowed number of iterations, the absolute value of
the resulting number is greater or equal to 2.'''
z = 0
for i in range(0, max_iterations):
z = z**2 + c
        if abs(z) >= 2:
return False
return abs(z) < 2
Note that I have removed the equivalence between range and xrange. The reason is that having xrange present like this in the file results in a compilation error when running cython with Python 3.1. Furthermore, as will be seen later, it is not really needed even for Python 2.x when using cython properly.
I have also included a commented line stating that 'profile' was equal to True; this is a cython directive that will enable the Python profiler to also include cython functions in its tally.
In order to import this module, I also need to modify the viewer to import the cython module. Here is the new version.
# viewer2.py
import pyximport
pyximport.install()
from mandel2_cy import mandel
from viewer import Viewer
import time
import sys
if sys.version_info < (3,):
import Tkinter as tk
range = xrange
else:
import tkinter as tk
class FancyViewer(Viewer):
'''Application to display fractals'''
def draw_pixel(self, x, y):
'''Simulates drawing a given pixel in black by drawing a black line
of length equal to one pixel.'''
return
#self.canvas.create_line(x, y, x+1, y, fill="black")
def draw_fractal(self):
'''draws a fractal on the canvas'''
self.calculating = True
begin = time.time()
# clear the canvas
self.canvas.create_rectangle(0, 0, self.canvas_width,
self.canvas_height, fill="white")
for x in range(0, self.canvas_width):
real = self.min_x + x*self.pixel_size
for y in range(0, self.canvas_height):
imag = self.min_y + y*self.pixel_size
c = complex(real, imag)
if mandel(c, self.nb_iterations):
self.draw_pixel(x, self.canvas_height - y)
self.status.config(text="Time required = %.2f s [%s iterations] %s" %(
(time.time() - begin), self.nb_iterations,
self.zoom_info))
self.status2.config(text=self.info())
self.calculating = False
if __name__ == "__main__":
root = tk.Tk()
app = FancyViewer(root)
root.mainloop()
Other than the top few lines, nothing has changed. Time to run the profiler with this new version.
6841793 function calls in 50.145 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
3168000 35.913 0.000 35.913 0.000 mandel2_cy.pyx:4(mandel)
11 10.670 0.970 48.754 4.432 viewer2.py:26(draw_fractal)
3168000 2.001 0.000 37.914 0.000 {mandel2_cy.mandel}
1 1.356 1.356 1.356 1.356 {_tkinter.create}
505173 0.167 0.000 0.167 0.000 viewer2.py:20(draw_pixel)
A reduction from 85 to 50 seconds; cython must be doing something right! Note that the calls to abs() have been eliminated by using cython. All I did was import the module via Python, without making any other change to the code.
Note also that mandel appears twice: once (the longest running) as the function defined on line 4 of mandel2_cy.pyx, and once as an object belonging to the module mandel2_cy. I will come back to this later but, for now, I will make some changes to help cython do even better.
As mentioned before, cython is a tool to help create C extensions. One of the differences between C and Python is that variables have a declared type in C. If one tells cython what type a given variable is, cython can often use that information to make the code run faster. As an example, I know that two of the variables are integers, which is a native C type; I can add this information as follows.
# mandel2a_cy.pyx
# cython: profile=True
def mandel(c, int max_iterations=20):
'''determines if a point is in the Mandelbrot set based on deciding if,
after a maximum allowed number of iterations, the absolute value of
the resulting number is greater or equal to 2.'''
cdef int i
z = 0
for i in range(0, max_iterations):
z = z**2 + c
if abs(z) >= 2:
return False
return abs(z) < 2
Running the profiler with this change yields the following:
6841793 function calls in 39.860 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
3168000 27.431 0.000 27.431 0.000 mandel2a_cy.pyx:4(mandel)
11 9.869 0.897 39.339 3.576 viewer2.py:26(draw_fractal)
3168000 1.906 0.000 29.337 0.000 {mandel2a_cy.mandel}
1 0.511 0.511 0.511 0.511 {_tkinter.create}
505173 0.131 0.000 0.131 0.000 viewer2.py:20(draw_pixel)
Another significant time reduction, this time of the order of 20%. And we didn't tell cython that "z" and "c" are complex yet.
Actually, C does not have a complex data type. So, I can choose one of two strategies:
- I can change the code so that I deal only with real numbers, by working myself how to multiply and add complex numbers.
- I can use some special cython technique to extract all the relevant information about the Python built-in complex data type without changing the code inside the function (other than adding some type declaration).
# mandel2b_cy.pyx
# cython: profile=True
cdef extern from "complexobject.h":
struct Py_complex:
double real
double imag
ctypedef class __builtin__.complex [object PyComplexObject]:
cdef Py_complex cval
def mandel(complex c, int max_iterations=20):
'''determines if a point is in the Mandelbrot set based on deciding if,
after a maximum allowed number of iterations, the absolute value of
the resulting number is greater or equal to 2.'''
cdef int i
cdef complex z
z = 0. + 0.j
for i in range(0, max_iterations):
z = z**2 + c
if abs(z) >= 2:
return False
return abs(z) < 2
The timing results are the following:
6841793 function calls in 38.424 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
3168000 26.771 0.000 26.771 0.000 mandel2b_cy.pyx:14(mandel)
11 9.435 0.858 38.209 3.474 viewer2.py:26(draw_fractal)
3168000 1.865 0.000 28.636 0.000 {mandel2b_cy.mandel}
1 0.205 0.205 0.205 0.205 {_tkinter.create}
505173 0.136 0.000 0.136 0.000 viewer2.py:20(draw_pixel)
The time difference between this run and the previous one is within the variation I observe from one profiling run to the next (using exactly the same program). Therefore, I conclude that this latest attempt didn't speed up the code. It is possible that I have overlooked something to ensure that cython could make use of the information about the complex datatype more efficiently ... It seems like I need a different strategy. I will resort to doing the complex algebra myself, and work only with real numbers. Here's the modified code for the mandel module.
# mandel2c_cy.pyx
# cython: profile=True
def mandel(double real, double imag, int max_iterations=20):
'''determines if a point is in the Mandelbrot set based on deciding if,
after a maximum allowed number of iterations, the absolute value of
the resulting number is greater or equal to 2.'''
cdef double z_real = 0., z_imag = 0.
cdef int i
for i in range(0, max_iterations):
z_real, z_imag = ( z_real*z_real - z_imag*z_imag + real,
2*z_real*z_imag + imag )
if (z_real*z_real + z_imag*z_imag) >= 4:
return False
return (z_real*z_real + z_imag*z_imag) < 4
I also change the call within draw_fractal() so that I don't use complex variables. The result is extremely encouraging:
6841793 function calls in 7.205 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
11 4.379 0.398 7.066 0.642 viewer2a.py:26(draw_fractal)
3168000 1.557 0.000 2.570 0.000 {mandel2c_cy.mandel}
3168000 1.013 0.000 1.013 0.000 mandel2c_cy.pyx:4(mandel)
1 0.130 0.130 0.130 0.130 {_tkinter.create}
505173 0.114 0.000 0.114 0.000 viewer2a.py:20(draw_pixel)
The total execution time has been reduced from 38 to 7 seconds. mandel() is no longer the largest contributor to the overall execution time; draw_fractal() is. However, the program is still a bit too slow: without actually doing any drawing, it takes approximately 0.6 seconds to generate one fractal image.
I can do better, though. Looking at the code, I notice that draw_fractal() contains two embedded for loops, resulting in all those calls to mandel(). Remember how telling cython about the integer types used in loops sped up the code? This suggests that perhaps I should do something similar and move some of the code of draw_fractal() to the cython module. Here's a modified viewer module.
# viewer2b.py
import pyximport
pyximport.install()
from mandel2d_cy import create_fractal
from viewer import Viewer
import time
import sys
if sys.version_info < (3,):
import Tkinter as tk
range = xrange
else:
import tkinter as tk
class FancyViewer(Viewer):
'''Application to display fractals'''
def draw_fractal(self):
'''draws a fractal on the canvas'''
self.calculating = True
begin = time.time()
# clear the canvas
self.canvas.create_rectangle(0, 0, self.canvas_width,
self.canvas_height, fill="white")
create_fractal(self.canvas_width, self.canvas_height,
self.min_x, self.min_y, self.pixel_size,
self.nb_iterations, self.canvas)
self.status.config(text="Time required = %.2f s [%s iterations] %s" %(
(time.time() - begin), self.nb_iterations,
self.zoom_info))
self.status2.config(text=self.info())
self.calculating = False
if __name__ == "__main__":
root = tk.Tk()
app = FancyViewer(root)
root.mainloop()
And here is the new cython module, without any additional type declaration.
# mandel2d_cy.pyx
# cython: profile=True
def mandel(double real, double imag, int max_iterations=20):
'''determines if a point is in the Mandelbrot set based on deciding if,
after a maximum allowed number of iterations, the absolute value of
the resulting number is greater or equal to 2.'''
cdef double z_real = 0., z_imag = 0.
cdef int i
for i in range(0, max_iterations):
z_real, z_imag = ( z_real*z_real - z_imag*z_imag + real,
2*z_real*z_imag + imag )
if (z_real*z_real + z_imag*z_imag) >= 4:
return False
return (z_real*z_real + z_imag*z_imag) < 4
def draw_pixel(x, y, canvas):
'''Simulates drawing a given pixel in black by drawing a black line
of length equal to one pixel.'''
return
#canvas.create_line(x, y, x+1, y, fill="black")
def create_fractal(canvas_width, canvas_height,
min_x, min_y, pixel_size,
nb_iterations, canvas):
for x in range(0, canvas_width):
real = min_x + x*pixel_size
for y in range(0, canvas_height):
imag = min_y + y*pixel_size
if mandel(real, imag, nb_iterations):
draw_pixel(x, canvas_height - y, canvas)
The profiling result is as follows:
3673815 function calls in 3.873 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
11 2.632 0.239 3.706 0.337 mandel2d_cy.pyx:24(create_fractal)
3168000 1.002 0.000 1.002 0.000 mandel2d_cy.pyx:4(mandel)
1 0.155 0.155 0.155 0.155 {_tkinter.create}
505173 0.072 0.000 0.072 0.000 mandel2d_cy.pyx:18(draw_pixel)
37 0.009 0.000 0.009 0.000 {built-in method call}
Simply by moving some of the code over to the cython module, I have reduced the profiling time to almost half of its previous value. Looking more closely at the profiling results, I also notice that calls to mandel() now only appear once; some overhead in calling cython functions from python modules has disappeared. Let's see what happens if I now add some type information.
def create_fractal(int canvas_width, int canvas_height,
double min_x, double min_y, double pixel_size,
int nb_iterations, canvas):
cdef int x, y
cdef double real, imag
for x in range(0, canvas_width):
real = min_x + x*pixel_size
for y in range(0, canvas_height):
imag = min_y + y*pixel_size
if mandel(real, imag, nb_iterations):
draw_pixel(x, canvas_height - y, canvas)
The result is only slightly better:
3673815 function calls in 3.475 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
11 2.189 0.199 3.308 0.301 mandel2e_cy.pyx:24(create_fractal)
3168000 1.046 0.000 1.046 0.000 mandel2e_cy.pyx:4(mandel)
1 0.135 0.135 0.135 0.135 {_tkinter.create}
505173 0.074 0.000 0.074 0.000 mandel2e_cy.pyx:18(draw_pixel)
37 0.028 0.001 0.028 0.001 {built-in method call}
However, one thing I remember from the little I know about C is that not only do variables have to be declared to be of a certain type, but the same has to be done for functions as well. Here, mandel() has not been declared to be of a specific type, so cython assumes it to be a generic Python object. After reading the cython documentation, and noticing that mandel() is only called from within the cython module, I conclude that not only should I specify the type for mandel(), but that it probably makes sense to specify that it can be "inlined"; I also do the same for draw_pixel().
# mandel2f_cy.pyx
# cython: profile=True

cdef inline bint mandel(double real, double imag, int max_iterations=20):
    '''determines if a point is in the Mandelbrot set based on deciding if,
    after a maximum allowed number of iterations, the absolute value of
    the resulting number is greater or equal to 2.'''
    cdef double z_real = 0., z_imag = 0.
    cdef int i
    for i in range(0, max_iterations):
        z_real, z_imag = (z_real*z_real - z_imag*z_imag + real,
                          2*z_real*z_imag + imag)
        if (z_real*z_real + z_imag*z_imag) >= 4:
            return False
    return (z_real*z_real + z_imag*z_imag) < 4

cdef inline void draw_pixel(x, y, canvas):
    '''Simulates drawing a given pixel in black by drawing a black line
    of length equal to one pixel.'''
    return
    #canvas.create_line(x, y, x+1, y, fill="black")
This yields a nice improvement.
3673815 function calls in 2.333 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
11 1.190 0.108 2.194 0.199 mandel2f_cy.pyx:24(create_fractal)
3168000 0.930 0.000 0.930 0.000 mandel2f_cy.pyx:4(mandel)
1 0.127 0.127 0.127 0.127 {_tkinter.create}
505173 0.074 0.000 0.074 0.000 mandel2f_cy.pyx:18(draw_pixel)
37 0.009 0.000 0.009 0.000 {built-in method call}
However... I asked cython to "inline" mandel() and draw_pixel(), thus treating them as pure C functions. Yet, they both appear in the Python profiling information, which was not the case for abs() once I first used cython. The reason they appear is that cython has been instructed to profile all functions in the module, via the directive at the top. I can selectively turn off the profiling for an individual function by importing the "cython" module and using a special purpose decorator as follows.
# mandel2g_cy.pyx
# cython: profile=True
import cython

@cython.profile(False)
cdef inline bint mandel(double real, double imag, int max_iterations=20):
    '''determines if a point is in the Mandelbrot set based on deciding if,
    after a maximum allowed number of iterations, the absolute value of
    the resulting number is greater or equal to 2.'''
    cdef double z_real = 0., z_imag = 0.
    cdef int i
    for i in range(0, max_iterations):
        z_real, z_imag = (z_real*z_real - z_imag*z_imag + real,
                          2*z_real*z_imag + imag)
        if (z_real*z_real + z_imag*z_imag) >= 4:
            return False
    return (z_real*z_real + z_imag*z_imag) < 4

cdef inline void draw_pixel(x, y, canvas):
    '''Simulates drawing a given pixel in black by drawing a black line
    of length equal to one pixel.'''
    return
    #canvas.create_line(x, y, x+1, y, fill="black")
The result is even better than I would have expected!
505519 function calls in 0.817 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
11 0.605 0.055 0.676 0.061 mandel2g_cy.pyx:27(create_fractal)
1 0.128 0.128 0.128 0.128 {_tkinter.create}
504877 0.070 0.000 0.070 0.000 mandel2g_cy.pyx:21(draw_pixel)
37 0.010 0.000 0.010 0.000 {built-in method call}
11 0.001 0.000 0.678 0.062 viewer2b.py:20(draw_fractal)
From 85 seconds (at the beginning of this post) down to 0.8 seconds: a reduction by a factor of more than 100 ... thank you cython! :-)
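For anyone following along at home: a .pyx module needs to be compiled before it can be imported. A minimal setup.py, a sketch based on the standard distutils recipe (the module name matches the file used above), would be something like:
# setup.py -- minimal sketch for building the cython module;
# assumes Cython is installed.
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
    cmdclass={'build_ext': build_ext},
    ext_modules=[Extension("mandel2g_cy", ["mandel2g_cy.pyx"])],
)
Running python setup.py build_ext --inplace then produces an importable mandel2g_cy module next to the source file.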
However, increasing the number of iterations to 1000 (from the current value of 100 used for testing) does increase the time significantly.
495235 function calls in 3.872 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
11 3.653 0.332 3.723 0.338 mandel2g_cy.pyx:27(create_fractal)
1 0.136 0.136 0.136 0.136 {_tkinter.create}
494593 0.071 0.000 0.071 0.000 mandel2g_cy.pyx:21(draw_pixel)
37 0.009 0.000 0.009 0.000 {built-in method call}
11 0.001 0.000 3.726 0.339 viewer2b.py:20(draw_fractal)
It is probably a good time to put back the drawing to see what the overall time profile looks like in a more realistic situation.
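Putting back the drawing simply means restoring the line that was commented out in draw_pixel():
cdef inline void draw_pixel(x, y, canvas):
    '''Draws a given pixel in black by drawing a black line
    of length equal to one pixel.'''
    canvas.create_line(x, y, x+1, y, fill="black")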
5441165 function calls in 20.747 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
494604 8.682 0.000 14.427 0.000 Tkinter.py:2135(_create)
11 3.863 0.351 20.572 1.870 mandel2g_cy.pyx:27(create_fractal)
494632 2.043 0.000 5.326 0.000 Tkinter.py:1046(_options)
494657 1.845 0.000 2.861 0.000 Tkinter.py:77(_cnfmerge)
494593 1.548 0.000 16.709 0.000 mandel2g_cy.pyx:21(draw_pixel)
Clearly, the limiting time factor is now the Tkinter-based drawing, and not the rest of the code. It is time to think of a better drawing strategy (one obvious candidate is sketched below). However, this will have to wait until the next post.
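For what it's worth, here is one candidate strategy, a sketch of my own (the helper name draw_row is hypothetical, and this is not necessarily what the next post will use): merge each horizontal run of consecutive black pixels into a single create_line call, instead of issuing one call per pixel.
# A sketch: draw each horizontal run of consecutive black
# pixels on a row with a single create_line call.
def draw_row(y, xs, canvas):
    '''xs: sorted x-coordinates of the black pixels on row y.'''
    if not xs:
        return
    start = prev = xs[0]
    for x in xs[1:]:
        if x == prev + 1:          # the current run continues
            prev = x
        else:                      # run ended: draw it, start a new one
            canvas.create_line(start, y, prev + 1, y, fill="black")
            start = prev = x
    canvas.create_line(start, y, prev + 1, y, fill="black")  # last run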
Saturday, January 16, 2010
Profiling adventures [and cython]: basic profiling
In the previous blog post, I introduced a simple Tkinter-based viewer for the Mandelbrot set. As I mentioned at the time, the viewer was really too slow to be usable. In this post, I will do some basic profiling and start looking for strategies designed to make it faster.
The first rule for making an application faster is to do a proper profile rather than guessing. I make use of the profiler module, focusing on the main method (draw_fractal()) which I wish to make faster, and taking a closer look only at the most time-consuming functions/methods.
# profile1.py
import pstats
import cProfile
from viewer1 import tk, FancyViewer

def main():
    root = tk.Tk()
    app = FancyViewer(root)
    app.nb_iterations = 100
    for i in range(10):
        app.draw_fractal()

if __name__ == "__main__":
    cProfile.run("main()", "Profile.prof")
    s = pstats.Stats("Profile.prof")
    s.strip_dirs().sort_stats("time").print_stats(10)
The profile run will call draw_fractal() once, when app is created, with the number of iterations for mandel() set at 20 (the default), and then call it again 10 times with a larger number of iterations. Based on the previous run with no profiling, I would have expected a profiler run to take approximately 75 seconds: a little over 4 seconds for the initial setup and slightly more than 7 seconds for each of the subsequent runs. Instead, what I observe is:
69802205 function calls in 115.015 CPU seconds
Ordered by: internal time
List reduced from 60 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
3168000 59.908 0.000 81.192 0.000 mandel1a.py:3(mandel)
57908682 14.005 0.000 14.005 0.000 {abs}
11 13.387 1.217 114.484 10.408 viewer1.py:23(draw_fractal)
505184 10.950 0.000 17.672 0.000 Tkinter.py:2135(_create)
3168001 7.279 0.000 7.279 0.000 {range}
505212 2.359 0.000 6.220 0.000 Tkinter.py:1046(_options)
505237 2.169 0.000 3.389 0.000 Tkinter.py:77(_cnfmerge)
505173 1.457 0.000 19.902 0.000 viewer1.py:17(draw_pixel)
1010400 0.906 0.000 0.906 0.000 {method 'update' of 'dict' objects}
1010417 0.817 0.000 0.817 0.000 {_tkinter._flatten}
Clearly, running the profiler adds some overhead. I should also add that there are variations from one profiler run to the next, caused by background activities. As a consequence, I normally run the profiler 3 times and focus on the fastest of the three runs; however, I will not bother to do this here: I simply want to establish a rough baseline to identify the main contributors to the relative lack of speed of this program.
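For completeness, here is a small harness along those lines; a sketch of my own (the file name and best-of-three logic are mine), reusing main() from profile1.py above:
# best_of_3.py -- hypothetical helper illustrating the "fastest of
# three runs" habit described above.
import cProfile
import pstats

from profile1 import main

best = None
for run in range(3):
    fname = "Profile_%d.prof" % run
    cProfile.run("main()", fname)
    total = pstats.Stats(fname).total_tt   # total time recorded in this run
    if best is None or total < best[0]:
        best = (total, fname)

# Report only the fastest of the three runs.
pstats.Stats(best[1]).strip_dirs().sort_stats("time").print_stats(10)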
It appears clear that the largest contributor to the overall execution time is mandel(). Going down the list of functions that contribute significantly to the overall time, I notice quite a few calls to Tkinter functions/methods. So as to reduce the time taken by a given profile, and to focus on mandel(), I will temporarily eliminate some Tkinter calls by changing draw_pixel() as follows.
def draw_pixel(self, x, y):
    '''Simulates drawing a given pixel in black by drawing a black line
    of length equal to one pixel.'''
    return
    #self.canvas.create_line(x, y, x+1, y, fill="black")
Also, since I want to establish a rough baseline, I should probably see what happens when I increase the number of iterations from 100 to 1000 for mandel(), which is what I expect to have to use in many cases to get accurate pictures. I do this first using Python 2.5.
465659765 function calls in 574.947 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
3168000 419.296 0.000 561.467 0.000 mandel1a.py:3(mandel)
458828552 100.464 0.000 100.464 0.000 {abs}
3168001 41.707 0.000 41.707 0.000 {range}
11 12.731 1.157 574.341 52.213 viewer1.py:23(draw_fractal)
1 0.596 0.596 0.596 0.596 {_tkinter.create}
494593 0.140 0.000 0.140 0.000 viewer1.py:17(draw_pixel)
37 0.010 0.000 0.010 0.000 {built-in method call}
11 0.000 0.000 0.001 0.000 Tkinter.py:2135(_create)
54 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}
39 0.000 0.000 0.000 0.000 Tkinter.py:1046(_options)
Ouch! Close to 10 minutes of running time. However, it is clear that I have accomplished my goal of reducing the importance of Tkinter calls so that I can focus on my own code. Let's repeat this profiling test using Python 3.1.
462491974 function calls (462491973 primitive calls) in 506.148 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
3168000 386.169 0.000 493.823 0.000 mandel1a.py:3(mandel)
458828552 107.654 0.000 107.654 0.000 {built-in method abs}
11 11.474 1.043 505.439 45.949 viewer1.py:23(draw_fractal)
1 0.694 0.694 0.694 0.694 {built-in method create}
494593 0.140 0.000 0.140 0.000 viewer1.py:17(draw_pixel)
48 0.014 0.000 0.014 0.000 {method 'call' of 'tkapp' objects}
39 0.000 0.000 0.001 0.000 __init__.py:1032(_options)
64 0.000 0.000 0.001 0.000 __init__.py:66(_cnfmerge)
206 0.000 0.000 0.000 0.000 {built-in method isinstance}
2/1 0.000 0.000 506.148 506.148 {built-in method exec}
We note that the total time taken is significantly less. Doing a comparison function by function, two significant differences appear. The built-in function abs is 7% slower with Python 3.1, which is a bit disappointing. On the other hand, range no longer appears as a function in Python 3.1; this appears to be the main contributor to the significant decrease in time when using Python 3.1 as compared with Python 2.5. This is easily understood: range in Python 3.1 does not create a list like it did in Python 2.x; it is rather like the old xrange. This suggests that I modify mandel1a.py to be as follows:
# mandel1b.py
import sys
if sys.version_info < (3,):
    range = xrange

def mandel(c):
    '''determines if a point is in the Mandelbrot set based on deciding if,
    after 20 iterations, the absolute value of the resulting number is
    greater or equal to 2.'''
    z = 0
    for iter in range(0, 20):
        z = z**2 + c
        if abs(z) >= 2:
            return False
    return abs(z) < 2
I also do a similar change to viewer1.py. From now on, except where otherwise noted, I will focus on using only Python 2.5. So, after doing this change, we can run the profiler one more time with the same number of iterations. The result is as follows:
462491765 function calls in 503.926 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
3168000 391.297 0.000 491.642 0.000 mandel1b.py:7(mandel)
458828552 100.345 0.000 100.345 0.000 {abs}
11 11.409 1.037 503.189 45.744 viewer1.py:23(draw_fractal)
1 0.726 0.726 0.726 0.726 {_tkinter.create}
494593 0.136 0.000 0.136 0.000 viewer1.py:17(draw_pixel)
37 0.009 0.000 0.009 0.000 {built-in method call}
11 0.000 0.000 0.001 0.000 Tkinter.py:2135(_create)
54 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}
39 0.000 0.000 0.000 0.000 Tkinter.py:1046(_options)
64 0.000 0.000 0.001 0.000 Tkinter.py:77(_cnfmerge)
This is now approximately the same as what we had for Python 3.1, as expected. Moving down the list of time-consuming functions, we note that abs appears to be another function we should look at. Let's first reduce the number of iterations inside mandel to 100, so that a profiling run does not take as long, while proper attention is still focused on mandel as well as abs. Here's the result from a typical run to use as a new baseline:
61582475 function calls in 81.998 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
3168000 55.347 0.000 69.054 0.000 mandel1b.py:7(mandel)
57908682 13.706 0.000 13.706 0.000 {abs}
11 11.591 1.054 80.773 7.343 viewer1.py:23(draw_fractal)
1 1.213 1.213 1.213 1.213 {_tkinter.create}
505173 0.125 0.000 0.125 0.000 viewer1.py:17(draw_pixel)
37 0.011 0.000 0.011 0.000 {built-in method call}
11 0.000 0.000 0.001 0.000 Tkinter.py:2135(_create)
4 0.000 0.000 0.000 0.000 {posix.stat}
3 0.000 0.000 0.000 0.000 Tkinter.py:1892(_setup)
39 0.000 0.000 0.000 0.000 Tkinter.py:1046(_options)
Since a fair bit of time is spent inside abs(), perhaps I could speed things up by using another method. The way that we approximate the Mandelbrot set is by iterating a number of times and checking if the absolute value of the complex number is greater than or equal to 2; if it is, then it can be proven that subsequent iterations will yield larger and larger values, which means that the number we are considering is not in the Mandelbrot set. Since taking the absolute value of a complex number involves taking a square root, perhaps we can speed things up by not taking the square root, comparing the squared magnitude with 4 instead. Let's implement this and try it out.
# mandel1c.py
import sys
if sys.version_info < (3,):
    range = xrange

def mandel(c, max_iterations=20):
    '''determines if a point is in the Mandelbrot set based on deciding if,
    after a maximum allowed number of iterations, the absolute value of
    the resulting number is greater or equal to 2.'''
    z = 0
    for iter in range(0, max_iterations):
        z = z**2 + c
        z_sq = z.real**2 + z.imag**2
        if z_sq >= 4:
            return False
    return z_sq < 4
3673150 function calls in 84.807 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
3168000 73.210 0.000 73.210 0.000 mandel1c.py:7(mandel)
11 10.855 0.987 84.204 7.655 viewer1.py:23(draw_fractal)
1 0.593 0.593 0.593 0.593 {_tkinter.create}
504530 0.137 0.000 0.137 0.000 viewer1.py:17(draw_pixel)
37 0.009 0.000 0.009 0.000 {built-in method call}
11 0.000 0.000 0.001 0.000 Tkinter.py:2135(_create)
54 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}
39 0.000 0.000 0.000 0.000 Tkinter.py:1046(_options)
64 0.000 0.000 0.001 0.000 Tkinter.py:77(_cnfmerge)
22 0.000 0.000 0.001 0.000 Tkinter.py:1172(_configure)
The result is worse than before, even though the total number of function calls has been drastically reduced (from some 61.5 million down to about 3.7 million)! Actually, this should not come as a total surprise: abs is a Python built-in function, which has already been optimized in C. Extracting the real and imaginary parts explicitly like we have done is bound to be a time-consuming operation when performed in pure Python as opposed to C. At this point, we might be tempted to convert complex numbers everywhere into pairs of real numbers so as to reduce the overhead of dealing with complex numbers ... but this would not have any significant effect on the overall time. [Those curious may want to try ... I've done it and it's not worth reporting in detail.] Clearly, I need a different strategy if I am to significantly reduce the execution time. It is time to introduce cython. However, this will have to wait until the next blog post!
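As an aside, if you want to see this effect in isolation, a quick timeit comparison is enough. This is a sketch of my own (the sample value of z is arbitrary, and exact numbers will of course vary from machine to machine); it pits the built-in abs() test against the hand-expanded one:
# timeit sketch: compare the built-in abs() test with the
# hand-expanded squared-magnitude test, in isolation.
import timeit

print(timeit.Timer("abs(z) >= 2", "z = 0.3 + 0.4j").timeit())
print(timeit.Timer("z.real**2 + z.imag**2 >= 4", "z = 0.3 + 0.4j").timeit())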