Friday, January 15, 2010

Profiling adventures and cython - setting the stage

Summary This post is the first in a series dedicated to
examining the use of profiling and, eventually, using cython, as a
means to improve greatly the speed of an application. The intended
audience is for programmers who have never done any profiling and/or
never used cython before. Note that we will not make use of cython
until the third post in this series.


Preamble

Python is a great multi-purpose language which is really fun to use. However, it is sometimes too slow for some applications. Since I only program for fun, I had never really faced a situation where I found Python's speed to be truly a limiting factor - at least, not until a few weeks ago when I did some exploration of a  four-colouring grid problem I talked about. I started exploring ways to speed things up using only Python and trying to come up with different algorithms, but every one I tried was just too slow. So, I decided it was time to take the plunge and do something different. After considering various alternatives, like using shedskin or
attempting to write a C extension (which I found too daunting since I don't know C), I decided to try to use  cython.

cython, for those that don't know it, is a Python look-alike language that claims
to make writing C extensions for the Python language
as easy as Python itself.

After looking at a few examples on the web, I concluded that such a rather bold statement might very well be true and that it was worthwhile trying it out on a more complex example. Furthermore, I thought it might be of interest to record what I've done in a series of blog posts, as a more detailed example than what I had found so far on the web. As I was wondering if an esoteric problem like the four-colouring grid challenge mentioned previously was a good candidate to use as an example, by sheer serendipity, I came accross a link on reddit by a new programmer about his simple Mandelbrot viewer.

Who does not like fractals? ... Furthermore, I have never written a fractal viewer. This seemed like a good time to write one. So, my goal at the end of this series of posts, is to have a "nice" (for some definition of "nice") fractal viewer that is fast enough for explorations of the Mandelbrot set.  In addition, in order to make it easy for anyone having Python on their system to follow along and try their own variation, I decided to stick by the following constraints:
  • With the exception of cython, I will only use modules found in the
    standard library. This means using Tkinter for the GUI.
  • The program should work using Python 2.5+ (including Python 3).

So, without further ado, and based on the example found on the reddit link I mentioned, here's a very basic fractal viewer that can be used as a starting point.


''' mandel1.py

Mandelbrot set drawn in black and white.'''

import time

import sys
if sys.version_info < (3,):
    import Tkinter as tk
else:
    import tkinter as tk

def mandel(c):
    '''determines if a point is in the Mandelbrot set based on deciding if,
       after 20 iterations, the absolute value of the resulting number is
       greater or equal to 2.'''
    z = 0
    for iter in range(0, 20):
        z = z**2 + c
        if abs(z) >= 2:
            return False
    return abs(z) < 2

class Viewer(object):
    def __init__(self, parent, width=500, height=500):
        self.canvas = tk.Canvas(parent, width=width, height=height)
        self.width = width
        self.height = height

        # the following "shift" variables are used to center the drawing
        self.shift_x = 0.5*self.width
        self.shift_y = 0.5*self.height
        self.scale = 0.01

        self.canvas.pack()
        self.draw_fractal()

    def draw_pixel(self, x, y):
        '''Simulates drawing a given pixel in black by drawing a black line
           of length equal to one pixel.'''
        self.canvas.create_line(x, y, x+1, y, fill="black")

    def draw_fractal(self):
        '''draws a fractal picture'''
        begin = time.time()
        print("Inside draw_fractal.")
        for x in range(0, self.width):
            real = (x - self.shift_x)*self.scale
            for y in range(0, self.height):
                imag = (y - self.shift_y)*self.scale
                c = complex(real, imag)
                if mandel(c):
                    self.draw_pixel(x, self.height - y)
        print("Time taken for calculating and drawing = %s" %
                                                (time.time() - begin))

if __name__ == "__main__":
    print("Starting...")
    root = tk.Tk()
    app = Viewer(root)
    root.mainloop()

At this point, perhaps a few comments about the program might be useful
  1. I have tried to write the code in the most straightforward and pythonic way, with no thought given to making calculations fast. It should be remembered that this is just a starting point: first we make it work, then, if needed, we make it fast.
  2. The function mandel() is the simplest translation of the Mandelbrot fractal iteration into Python code that I could come up with. The fact that Python has a built-in complex type makes it very easy to implement the standard Mandelbrot set algorithm.
  3. I have taken the maximum number of iterations inside mandel() to be 20, the same value used in the post I mentioned before. According to the very simple method used to time the application, it takes about 2 seconds on my computer to draw a simple picture. This is annoying slow. Furthermore, by looking at the resulting picture, and trying out with different number of iterations in mandel(), it is clear that 20 iterations is not sufficient to adaquately represent the Mandelbrot set; this is especially noticeable when exploring smaller regions of the complex plane. A more realistic value might be to take 100 if not 1000 iterations which takes too long to be practical.
  4. Tkinter's canvas does not have a method to set the colour of individual pixels. We can simulate such a method by drawing a line (for which there is a primitive method) of length 1.
  5. The screen vertical coordinates ("y") increase in values from the top towards the bottom, in opposite direction to the usual vertical coordinates in the complex plane. While the picture produced is vertically symmetric about the x-axis, I nonetheless wrote the code so that the inversion of direction was properly handled.
This basic application is not really useful as a tool for exploring the Mandelbrot set, as the region of the complex plane it displays is fixed. However, it is useful to start with something simple like this as a first prototype. Once we know it is working we can move on to a better second version. So, let's write a fancier fractal viewer following the outline below:

class Viewer(object):
    '''Base class viewer to display fractals'''

        # The viewer should be able to enlarge ("zoom in") various regions
        # of the complex plane.  I will implement this
        # using keyboard shortcuts.
        #
        self.parent.bind("+", self.zoom_in)
        self.parent.bind("-", self.zoom_out)
    def zoom_in(self, event):
    def zoom_out(self, event):
    def change_scale(self, scale):

        #  Since one might want to "zoom in" quickly in some regions,
        # and then be able to do finer scale adjustments,
        # I will use keyboard shortcuts to enable switching back
        # and forth between two possible zooming mode.
        # A better application might give the user more control
        # over the zooming scale.
        self.parent.bind("n", self.normal_zoom)
        self.parent.bind("b", self.bigger_zoom)
    def normal_zoom(self, event, scale=1):
    def bigger_zoom(self, event):

        # Set the maximum number of iterations via a keyboard-triggered event
        self.parent.bind("i", self.set_max_iter)
    def set_max_iter(self, event):

        # Like what is done with google maps and other
        # similar applications, we should be able to move the image
        # to look at various regions of interest in the complex plane.
        # I will implement this using mouse controls.
        self.parent.bind("<button-1>", self.mouse_down)
        self.parent.bind("<button1-motion>", self.mouse_motion)
        self.parent.bind("<button1-buttonrelease>", self.mouse_up)
    def mouse_down(self, event):
    def mouse_motion(self, event):
    def mouse_up(self, event):

        # Presuming that "nice pictures" will be eventually produced,
        # and that it might be desired to reproduce them,
        # I will include some information about the region of the
        # complex plane currently displayed.
    def info(self):
        '''information about fractal location'''
  • Furthermore, while I plan to use proper profiling tools, I will nonetheless display some basic timing information as part of the GUI as a quick evaluation
    of the speed of the application.
  • Finally, since I expect that both the function mandel() and the drawing method draw_fractal to be the speed-limiting factors, I will leave them out of the fractal viewer and work on them separately. If it turns out that the profiling information obtained indicates otherwise, I will revisit this hypothesis.
Here is a second prototype for my fractal viewer, having the features described above.

''' viewer.py

Base class viewer for fractals.'''

import sys
if sys.version_info < (3,):
    import Tkinter as tk
    import tkSimpleDialog as tk_dialog
else:
    import tkinter as tk
    from tkinter import simpledialog as tk_dialog

class Viewer(object):
    '''Base class viewer to display fractals'''

    def __init__(self, parent, width=600, height=480,
                 min_x=-2.5, min_y=-1.5, max_x=1.):

        self.parent = parent
        self.canvas_width = width
        self.canvas_height = height

        # The following are drawing boundaries in the complex plane
        self.min_x = min_x
        self.min_y = min_y
        self.max_x = max_x
        self.calculate_pixel_size()
        self.max_y = self.min_y + self.canvas_height*self.pixel_size

        self.calculating = False
        self.nb_iterations = 20
        self.normal_zoom(None)

        self.canvas = tk.Canvas(parent, width=width, height=height)
        self.canvas.pack()
        self.status = tk.Label(self.parent, text="", bd=1, relief=tk.SUNKEN,
                               anchor=tk.W)
        self.status.pack(side=tk.BOTTOM, fill=tk.X)
        self.status2 = tk.Label(self.parent, text=self.info(), bd=1,
                                relief=tk.SUNKEN, anchor=tk.W)
        self.status2.pack(side=tk.BOTTOM, fill=tk.X)

        # We change the size of the image using the keyboard.
        self.parent.bind("+", self.zoom_in)
        self.parent.bind("-", self.zoom_out)
        self.parent.bind("n", self.normal_zoom)
        self.parent.bind("b", self.bigger_zoom)

        # Set the maximum number of iterations via a keyboard-triggered event
        self.parent.bind("i", self.set_max_iter)

        # We move the canvas using the mouse.
        self.translation_line = None
        self.parent.bind("<button-1>", self.mouse_down)
        self.parent.bind("<button1-motion>", self.mouse_motion)
        self.parent.bind("<button1-buttonrelease>", self.mouse_up)

        self.draw_fractal()  # Needs to be implemented by subclass

    def info(self):
        '''information about fractal location'''
        return "Location: (%f, %f) to (%f, %f)" %(self.min_x, self.min_y,
                                                  self.max_x, self.max_y)

    def calculate_pixel_size(self):
        '''Calculates the size of a (square) pixel in complex plane
        coordinates based on the canvas_width.'''
        self.pixel_size = 1.*(self.max_x - self.min_x)/self.canvas_width
        return

    def mouse_down(self, event):
        '''records the x and y positions of the mouse when the left button
           is clicked.'''
        self.start_x = self.canvas.canvasx(event.x)
        self.start_y = self.canvas.canvasy(event.y)

    def mouse_motion(self, event):
        '''keep track of the mouse motion by drawing a line from its
           starting point to the current point.'''
        x = self.canvas.canvasx(event.x)
        y = self.canvas.canvasy(event.y)

        if (self.start_x != event.x)  and (self.start_y != event.y) :
            self.canvas.delete(self.translation_line)
            self.translation_line = self.canvas.create_line(
                                self.start_x, self.start_y, x, y, fill="orange")
            self.canvas.update_idletasks()

    def mouse_up(self, event):
        '''Moves the canvas based on the mouse motion'''
        dx = (self.start_x - event.x)*self.pixel_size
        dy = (self.start_y - event.y)*self.pixel_size
        self.min_x += dx
        self.max_x += dx
        # y-coordinate in complex plane run in opposite direction from
        # screen coordinates
        self.min_y -= dy
        self.max_y -= dy
        self.canvas.delete(self.translation_line)
        self.status.config(text="Moving the fractal.  Please wait.")
        self.status.update_idletasks()
        self.status2.config(text=self.info())
        self.status2.update_idletasks()
        self.draw_fractal()

    def normal_zoom(self, event, scale=1):
        '''Sets the zooming in/out scale to its normal value'''
        if scale==1:
            self.zoom_info = "[normal zoom]"
        else:
            self.zoom_info = "[faster zoom]"
        if event is not None:
            self.status.config(text=self.zoom_info)
            self.status.update_idletasks()
        self.zoom_in_scale = 0.1
        self.zoom_out_scale = -0.125

    def bigger_zoom(self, event):
        '''Increases the zooming in/out scale from its normal value'''
        self.normal_zoom(event, scale=3)
        self.zoom_in_scale = 0.3
        self.zoom_out_scale = -0.4

    def zoom_in(self, event):
        '''decreases the size of the region of the complex plane displayed'''
        if self.calculating:
            return
        self.status.config(text="Zooming in.  Please wait.")
        self.status.update_idletasks()
        self.change_scale(self.zoom_in_scale)

    def zoom_out(self, event):
        '''increases the size of the region of the complex plane displayed'''
        if self.calculating:
            return
        self.status.config(text="Zooming out.  Please wait.")
        self.status.update_idletasks()
        self.change_scale(self.zoom_out_scale)

    def change_scale(self, scale):
        '''changes the size of the region of the complex plane displayed and
           redraws'''
        if self.calculating:
            return
        dx = (self.max_x - self.min_x)*scale
        dy = (self.max_y - self.min_y)*scale
        self.min_x += dx
        self.max_x -= dx
        self.min_y += dy
        self.max_y -= dy
        self.calculate_pixel_size()
        self.draw_fractal()

    def set_max_iter(self, event):
        '''set maximum number of iterations'''
        i = tk_dialog.askinteger('title', 'prompt')
        if i is not None:
            self.nb_iterations = i
            self.status.config(text="Redrawing.  Please wait.")
            self.status.update_idletasks()
            self.draw_fractal()

    def draw_fractal(self):
        '''draws a fractal on the canvas'''
        raise NotImplementedError

I move the Mandelbrot set calculation in a separate file.

# mandel1a.py

def mandel(c, max_iterations=20):
    '''determines if a point is in the Mandelbrot set based on deciding if,
       after a maximum allowed number of iterations, the absolute value of
       the resulting number is greater or equal to 2.'''
    z = 0
    for iter in range(0, max_iterations):
        z = z**2 + c
        if abs(z) >= 2:
            return False
    return abs(z) < 2

And, finally, I implement the missing functions for the viewer in a new main application.


# viewer1.py

from mandel1a import mandel
from viewer import Viewer
import time

import sys
if sys.version_info &lt; (3,):
    import Tkinter as tk
    range = xrange
else:
    import tkinter as tk

class FancyViewer(Viewer):
    '''Application to display fractals'''

    def draw_pixel(self, x, y):
        '''Simulates drawing a given pixel in black by drawing a black line
           of length equal to one pixel.'''
        self.canvas.create_line(x, y, x+1, y, fill="black")

    def draw_fractal(self):
        '''draws a fractal on the canvas'''
        self.calculating = True
        begin = time.time()
        # clear the canvas
        self.canvas.create_rectangle(0, 0, self.canvas_width,
                                    self.canvas_height, fill="white")
        for x in range(0, self.canvas_width):
            real = self.min_x + x*self.pixel_size
            for y in range(0, self.canvas_height):
                imag = self.min_y + y*self.pixel_size
                c = complex(real, imag)
                if mandel(c, self.nb_iterations):
                    self.draw_pixel(x, self.canvas_height - y)
        self.status.config(text="Time required = %.2f s  [%s iterations]  %s" %(
                                (time.time() - begin), self.nb_iterations,
                                                                self.zoom_info))
        self.status2.config(text=self.info())
        self.status2.update_idletasks()
        self.calculating = False

if __name__ == "__main__":
    root = tk.Tk()
    app = FancyViewer(root)
    root.mainloop()


Let me conclude with few black and white pictures obtained using this program, which, if you look at the time, highlight the need for something faster. First for 20 iterations, drawn in 4 seconds.




Then, for 100 interations - better image, but 7 seconds to draw...





Next post, I'll start profiling the application and make it faster.

Sunday, January 10, 2010

Python + cython: faster than C?

[Note added on January 15, 2013: I am amazed that this post, clearly written tongue-in-cheek 3 years ago, can still spur people into doing their own test, and feeling the urge to comment.]

Now that I have your attention... ;-)

I've been playing around with cython and will write more about it soon.  But I could not resist doing a quick test when I read this post, comparing C and Java on some toy micro benchmark.

First, the result:

andre$ gcc -o fib -O3 fib.c
andre$ time ./fib
433494437

real    0m3.765s
user    0m3.463s
sys     0m0.028s

andre$ time python fib_test.py
433494437

real    0m2.953s
user    0m2.452s
sys     0m0.380s

Yes, Python+cython is faster than C!  :-)

Now, the code.  First, the C program taken straight from this post, except for a different, more meaningful value for "num" ;-)

#include 

double fib(int num)
{
   if (num <= 1)
       return 1;
   else
       return fib(num-1) + fib(num-2);
 }
 int main(void)
{
     printf("%.0f\n", fib(42));
    return 0;
 }

Next, the cython code ...

# fib.pyx

cdef inline int fib(int num):
    if (num <= 1):
        return 1
    else:
        return fib(num-1) + fib(num-2)

def fib_import (int num):
    return fib(num)

... and the Python file that calls it

# fib_test.py

import pyximport
pyximport.install()

from fib import fib_import

print fib_import(42)


  1. I know, I declared fib() to be of type "double" in the C code (like what was done in the original post) and "int" in the cython code; however, if I declare it to be of type "int" in the C code, I get -0 as the answer instead of 433494437.  I could declare it to be of type "unsigned int" ... but even when I did that, the Python code was marginally faster.
  2. If I declared fib to be of type "double" in the cython code, it was slightly slower than the corresponding C code.  However, what Python user would ever think of an integer variable as a floating point number! ;-)
  3. What is most impressive is that the cython code is NOT pre-compiled, unlike the C code.  However, to be fair ... I did run it a few times and took the best result, and it was always slower the first time.
  4. Yes, it is silly to compare such micro-benchmarks ... but it can be fun, no? ;-)
More seriously: using cython is a relatively painless way to significantly increase the speed of Python applications ... without having to learn a totally different language.

Saturday, January 02, 2010

Rur-ple is alive again!


Thanks to some wonderful work by some developers who joined the project, rur-ple is moving forward again. :-)  Its new home also features a brand new logo for it displayed above.

Thanks in particular go to Frederic Muller who is moving things along rather nicely.

Friday, December 25, 2009

The 17x17 challenge: setting up the grid

In the previous post, I talked about the 17x17 challenge, but did not provide any code.  Today, I will describe the code I use to represent a given grid.  Consider the following grid where A and b are used to represent two different colours:
AbAb
bAbA
bbAA
bAAA
This grid has two rectangles, one of which is emphasized below
....
.A.A
....
.A.A
Finding such a rectangle in a small grid is fairly easy to do.  A naive, but very inefficient way to do this is to have a function like the following:
def find_rect_candidates(row_1, row_2):
   points_in_common = [0, 0, ...] # one per colour
    for i, point in enumerate(row_1):
       if row_1[i] == row_2[i]:
          points_in_common[row_1[i]] += 1
   return points_in_common
where row_x[i] is set to a given colour, represented by an integer (0, 1, 2, ...). For an NxN grid, the first loop is of order N.  Note that this does not tell us (yet) if we have found a rectangle; we still have to loop over points_in_common and see if any entry is greater than 1.

A better way to do the comparison, which does not grow with N, is mentioned in this post and is based on the following observation:  for a given row, at a given point, a given colour is either present (True or 1) or not (False or 0). Thus, a given colour distribution on a single row can be represented as a string of 0s and 1s ... which can be thought of as the binary representation of an integer.  For example, the row containing the colour "A" in the pattern ".....A..A" can have this pattern represented by the number 9="1001".  Consider another row represented by the number 6="110". These two rows have no points in common (for this colour) and hence do not form a rectangle.  If we do a bitwise "and" for these two rows, i.e. 9&6 we will get zero.  This is achieved by a single operation instead of a series of comparisons.

What happens if two rows have a single point in common (for a given colour) so that no rectangle is present?  For example, consider 9="1001" and 15="1110".  If we do a bitwise "and" we have

answer = 9 & 15 = 8 = "1000"

Taking the bitwise "and" of answer and answer-1, we get

answer & (answer-1) = 8 & 7 = "1000" & "111" = 0.

If we have two points (1 bit) in common, it is easy to see that the bitwise comparison of answer & (answer-1) will not give zero.

Going back to the function above, we could rewrite it instead as follows:

def find_rect_candidates(row_1, row_2, colours):
   points_in_common = [0, 0, ...] # one per colour
      for c in colours:
         points_in_common[c] = row_1[c] & row_2[c]
   return points_in_common


where we still have to do the same (N-independent) processing of the return value as with the function above to determine if we do have rectangles.

Now, without further ado, here is the basic code that I use to represent a grid, together with two utility functions.

def rectangles_for_two_rows(row_1, row_2):
    '''returns 0 if two rows (given colour, encoded as a bit string)
        do not form a rectangle -
        otherwise returns the points in common encoded as a bit string'''
    intersect = row_1 & row_2
    if not intersect:   # no point in common
        return 0
    # perhaps there is only one point in common of the form 00010000...
    if not(intersect & (intersect-1)):
        return 0
    else:
        return intersect

def count_bits(x):
    '''counts the number of bits
    see http://stackoverflow.com/questions/407587/python-set-bits-count-popcount
    for reference and other alternative'''
    return bin(x).count('1')

class AbstractGrid(object):
    def __init__(self, nb_colours, max_rows, max_columns):
        self.nb_colours = nb_colours
        self.colours = list(range(nb_colours))
        self.max_rows = max_rows
        self.max_columns = max_columns
        self.powers_of_two = [2**i for i in range(max_columns)]
        self.grid = {}

    def initialise(self):
        '''initialise a grid according to some strategy'''
        raise NotImplemented

    def print_grid(self):
        '''prints a representation of the grid.
        Used for diagnostic only - no need to optimize further.'''
        for row in self.grid:
            row_rep = []
            for column in range(self.max_columns-1, -1, -1):
                for colour_ in self.colours:
                    if self.powers_of_two[column] & self.grid[row][colour_]:
                        row_rep.append(str(colour_))
            print("".join(row_rep))

    def identify_intersect_points(self):
        '''identifies the dots that contribute to forming rectangles'''
        # possibly could cut down the calculation time by only computing
        # for colours that have changed...
        nb_intersect = [0 for colour in self.colours]
        intersect_info = []
        for colour in self.colours:
            for row in self.grid:
                for other_row in range(row+1, self.max_rows):
                    intersect = rectangles_for_two_rows(
                                            self.grid[row][colour],
                                            self.grid[other_row][colour])
                    if intersect != 0:
                        nb_intersect[colour] += count_bits(intersect)
                        intersect_info.append([colour, row, other_row, intersect])
        return nb_intersect, intersect_info

The actual code I use has a few additional methods introduced for convenience.  If anyone can find a better method to identify intersection points between rows (from which rectangles can be formed), I would be very interested to hear about it.

The 17x17 challenge

In a recent blog post, William Gasarch issued a challenge: find a 17x17 four-colour grid for which there are no rectangles with 4 corners of the same colour.  If you can do this, Gasarch will give you $289.  For a more detailed explanation, you can either read Gasarch's post, or this post which contains a slightly friendlier explanation of the problem.

This problem, which is fairly easy to state, is extremely hard to solve as the number of possible grid configurations is 4289 which is a number much too large to solve by random searches or by naive methods.  As can be seen from the comments of the posts mentioned above, many people have devoted a lot of cpu cycles in an attempt to find a solution, but with no success.  Like others, I have tried to write programs to solve it ... but with no luck.  In a future post, I may write about the various strategies I have implemented in Python in an attempt to solve this problem.

Gasarch mentions that it is not known if a solution can be found for a 17x18 four-colour grid or even an 18x18 grid or a 17x19 one.  (Note that because of symmetry, an MxN solution can be rotated to give an NxM solution.)  While the number of configurations is large, the larger the grid is, I believe I have found a proof demonstrating that no solution can be found for a 17x18 grid.  This proof builds on the work of Elizabeth Kupin who has found the largest "colour class" i.e. the number of single-colour points on a 17x17 grid which is rectangle-free.  A generic solution found by Kupin is reproduced below:



Consider a 17x18 grid (17 rows and 18 columns).  Such a grid has 306 points.  Using the pigeonhole principle, one can easily show that, for any colouring, at least one colour must cover 77 points.  (Indeed, the most symmetric colouring would be 77, 77, 76, 76.)  Also, if a 17x18 rectangle-free solution exist, it must contain a 17x17 rectangle-free subset, which we take to be the above solution (other solutions for the 17x17 grid can be derived from this one using interchanges of rows and/or columns).

Let us attempt to add points in an additional column. First, we can try to add points in a row with 4 elements.  Without loss of generality, we can take this row to be the first one (row A).  Once we do this, we can not add any more points to row 3-17 without creating a rectangle.  The only row to which we can add a point is the second (row B) bringing the total number of points to 76 - one short of what we need.

Perhaps we can first remove a point from the 17x17 solution and then add a new column.  There are three cases we must consider: 1) that of a point belonging to a row with 5 points and a column with 5 points; 2) that of a point belonging to a row with 5 points and a column with 4 points; 3) that of a point belonging to a row with 4 points and a column with 5 points.

Case 1) Without loss of generality, let us move the point on row F in the first column to a new additional column, keeping the number of points at 74.  It is easy to show that the only rows to which we can then add an additional point without creating a rectangle are rows A, C, D, E. Once we add such a point (say on row A), we can no longer add a point on any of the remaining rows (C, D, E) without creating a rectangle.

Case 2) Without loss of generality, let us move the left-most point on row Q to a new column.  The only row to which we can then add a point in this new column is row A, bringing the total to 75.  We can't add another point without creating a rectangle.

Case 3) Again, without loss of generality, let us remove the top-left element (A) and move it to a new column added to the above solution.  We can add one more point, bringing the total to 75, to that new column (in row B) without creating a rectangle; any other addition of a point on that new column will result in a rectangle.


This concludes the sketch of the proof. 

UPDATE: Instead of adding a column, we can add a row (R) with 3 points located in the 9th, 12th and 17th column, bringing the total to 77 points and keeping the grid rectangle-free.  So the 17x18 case is still open... but I can't see how one could add a row with an extra 4 points to build a "single-colour" rectangle-free 17x19 grid.  (However, I won't venture again to say it is a proof until I have looked at all cases exhaustively.)

Note that construction of solutions for a 17x17 grid such as those found by Kupin, even for a single colours, can possibly be used to restrict the search space for the more general problem and help find a solution.  Unfortunately, no one has been able to do this (yet).

Sunday, November 15, 2009

Google Wave Fun

After a long time waiting, I finally got an Google Wave invitation this week - just as I was starting to look forward to a work slowdown so I could start programming again in the evenings.  The call of playing with a new toy was enough motivation to overcome inertia and spend a few hours, here and there in the evening, in order to create a new "gadget".  Google Wave Gadgets are actually fairly easy to create.  The only downside (from my point of view) is that they have to be written using javascript instead of Python.  Since this blog is called "Only Python", you may wonder how a post about a Google Wave Gadget could possibly belong here.  Believe me, it does.

A few years ago, in order to learn programming in Python, I created a "Karel the Robot" application in Python, to teach/learn Python, called rur-ple.  rur-ple was actually inspired by Guido van Robot (or GvR) which uses a Python like language.

I thought it would be a good idea to have something like rur-ple as a wave gadget.  Rather than re-inventing the wheel, I contacted Paul Carduner, who wrote gvr-online, and suggested that we could collaborate on a gadget - and he agreed.  The end goal is to have something like gvr-online as a gadget, but one which would recognize both the traditional GvR syntax OR the Python syntax used with rur-ple, as a choice for the end user.

The idea of having something like gvr-online as a wave gadget is the following: a teacher can embed such a gadget in a wave, with a predefined world, and assign it as a problem to either one, or more students - who could collaborate in that wave. All wave participants, including the teacher, can thus contribute and help each other.

Having had the idea, I wrote what was basically an empty shell of a gadget which Paul quickly connected to the existing code for gvr-online.  Some minor work remained for me to do to enable shared states and, apart from a minor UI bug, the prototype is now working.  What's left to do includes incorporation the Python syntax as an alternative to the traditional GvR syntax.

For those that might be interested in such a gadget, I suggest you first check gvr-online itself to get a better idea of what it does.  Then, if you have a Google Wave account, you can embed the gadget in a wave using this link.  Note that this is our "work in progress" link, that should not be expected to be a permanent one.

Sunday, September 13, 2009

Live preview of reStructuredText to HTML conversion in your browser

I've implemented a new plugin for Crunchy which may be of interest to some: a reStructuredText editor with live preview inside the browser window.  Each time a key is pressed in the editor (html textarea), an ajax request is sent to the Python backend with the entire content of the editor.  This is processed by docutils with the result sent back to the browser as a full html page displayed inside the browser as illustrated below.

I expected this to be slow (make that very slow) and for the first prototype I wrote, I only sent the ajax request when "enter" was pressed.  After playing with it for a while, I realized that there was no need for this.

If you type fast enough, the previewer does not keep up for each key press, but this does not hinder in any way the speed at which you can enter text in the editor.  By the way, the editor (html textarea) can be converted into a true embedded editor which can save and load files, etc. - otherwise, it would not be a very useful tool for working with reStructuredText documents.  When doing so, the live preview is turned off.  I've also included the possibility to save the html file produced so that one does not have to invoke docutils separately.  And this also incorporates the two custom directives (crunchy and getpythonsource) used by Crunchy and crst2s5.

This is included in the newest release (1.1.2) of Crunchy.

Wednesday, September 02, 2009

Upcoming Pycon and crst2s5

In a few months, many pythonistas will be converging to Atlanta to take part in the next Pycon.  I'm hoping to be able to attend it as I have been really inspired by the many people I met in two previous conferences.  While I most likely won't be presenting anything this time, I thought I would try to contribute to some presentations in an indirect way.

A while ago, Ned Batchelder wrote about presentation tools - more specifically about how he found all of them lacking.  In the comments section, many people made various suggestions - including one suggestion by Steve Holden which read as follows:

Crunchy? http://code.google.com/p/crunchy/

HTML plus live Python - what could be better?
While I, of course, approve of any positive support of Crunchy, I had to admit to myself (and mention in a subsequent comment) that Crunchy was really not up to the task - at least not yet.

In his post, Ned addresses some of the weaknesses of the tools he looked at.  In particular, he mentioned S5 about which he wrote:

S5 (A Simple Standards-Based Slide Show System) is all HTML, CSS, and Javascript, runs in the browser, and was created by Eric Meyer, a very nice pedigree. Remarkably, the thing I like least about it is that when I display the slides in a full-screen Firefox, they look horrible. The text is either too small or too large, and the line-spacing too tight, while the slide title overlaps the text in the wrong way, exposing some of the structure of the divs.
I would have hoped that a CSS-based slideshow by the king of CSS would be a shining example of how information could be cleanly authored and then sparklingly displayed. S5 seems to miss this mark, especially since there don't seem to be many themes available for it, another surprise given how CSS should have made it accessible to lots of designers. Also, although (or perhaps because) the format is native to the web, it's not possible to get the slides as illustrations.
Still, S5 is used for quite a few presentations by pythonistas thank to rst2s5, the docutils script that transform reStructuredText files into S5 presentations.  If I can summarize, rst2s5 strengths are as follow:

  • uses reStructuredText as source - easier to write than html
  • produce html files - thus easy to post and share -see link above.
  • use well known S5 slides (version 1.1 designed by Eric Meyer) to switch between "full paper"/slides.  It is possible to include extra notes in the "full paper".
Some of the weaknesses are the following:
  • From rst2s5's own documentation, it is impossible to have:
     - speaker's notes on the laptop screen
     - slides on the projector
     - two views stay in sync
     - presentation controlled from either display
  • like most (all?) slide-based presentation, content on one slide is limited to a given number of lines ... which makes it ackward to show long-ish code samples.
  • as Ned Batchelder alludes to, the automatic scaling done by S5 often yields unsatisfactory results in terms of line spacing, etc.
  • Finally, no interactive Python code in presentation ;-)

rst2s5 is based on S5 version 1.1.  A new version (1.2 alpha) is available which addresses some of the shortcomings of version 1.1 with the addition of speaker's notes which are in a separate window (e.g. on the laptop screen) while the slides themselves are in a different window (which can be on the projector) with the two synchronized.

I have decided to hack at the source (I don't like javascript...) and adapt S5 1.2 in ways which address some of Ned Batchelder's criticisms about the line spacing, etc, by disabling the automatic font scaling.  I incorporated it in a new program called crst2s5 which can be thought of as a Crunchy-extension to rst2s5.  crst2s5 on its own can produce standalone presentations (like rst2s5).  I consider it to be an improvement to rst2s5 ... time will tell if others share my opinion.

However, where crst2s5 can yield really interesting results is when you view the html output with Crunchy.  For now (version 1.1), when you do so, the automatic synchronization of speaker notes and slides is disabled - this will be addressed hopefully in the near future.   You can either try it by downloading the code or have a look at this sneak preview which I put together in one quick take as an experiment:



Get the Flash Player to see this movie.


Original location: 'Crunchy'
at ShowMeDo from the Python category.

Now, if only I had better graphical skills to design good looking slides...

Sunday, August 23, 2009

New plugin for Crunchy ... and bug fixes

Crunchy has a new plugin: getsource. What it does is enable a tutorial writer to embed a "link" to a python module inside an html file, or a class within that module, or a function or method, and have the source code being extracted by the inspect module and inserted within the html page.

Crunchy being Crunchy, it can also embed an interpreter or an editor right below the code source so that a user can interact with it.

And since not everyone likes to write documentation using straight html, a custom docutils directive is supported so that it works from reStructuredText files too.

The docutils directive looks as follows:

..getsource:: relative/path/to/module[.function] [linenumber] [editor or interpreter]

with a similar syntax for html files.

This plugin (which still has some minor bugs) is included in release 1.0.1 of Crunchy, which contains other minor bug fixes (as compared with release 1.0).

Crunchy 1.0 released!!

Crunchy 1.0 has finally been released. :-)

Crunchy 1.0 is compatible with Python 2.4, 2.5, 2.6 ... and 3.1. It is also compatible with Jython 2.5 modulo some bugs when trying to work with examples containing unicode strings.

Crunchy, for those that are not familiar with it, is a one-of-a-kind application. It is an application designed to transforms otherwise static html Python tutorials into interactive session viewed within a browser. Currently Crunchy has only been fully tested with Firefox. Crunchy should work with all operating systems - it has been tested fairly extensively on Linux, Windows and Mac OS.

In more details, here's what Crunchy does:

1. It loads an existing html file (or reStructuredText file) containing some Python code; this file can reside locally, or anywhere on the web.
2. It removes any existing javascript code & embedded interactive elements (applets, flash apps, etc.).
3. It further transforms the file by inserting some extra elements, including some custom javascript code.
4. It sends the transformed file to your browser of choice (read: Firefox).
5. It establishes a communication between the browser and the Crunchy back end and wait for user interaction.

Crunchy can embed Python interpreters and code editors (and more!) within an html page, enabling a user to enter (or edit) some Python code inside a browser window, send it to the Crunchy back end for execution, and observe the result back in the browser window.

In a sense, Crunchy is a Python programming environment embedded within a browser. However, it does not require a custom format for interactivity: for html files, as long as the Python code is included in a pre-formatted element (<pre>), Crunchy will recognize it, style it, and include the appropriate interactive element.

Crunchy comes with a fairly complete tutorial and supporting documentation. It is highly configurable.

Release 1.0 is NOT the end of the road for Crunchy. Future plans include support for interactivity with languages other than Python.

Thursday, April 16, 2009

Python textbooks wanted

Python textbooks wanted(1). Actually, what I'm mostly interested in are links to textbooks and other resources that would be useful to educators either teaching Python as a programming language or other courses where Python feature prominently. The reason behind this request is that I volunteered to be responsible to take care of the edu-sig page on the Python site. Through no one's fault in particular, the old page (2) had gone stale and only one textbook was listed. We're now up to six and I am sure I have not included them all. By ensuring that up to date and complete information is available on that page, we can facilitate the adoption of Python as a language taught in High Schools, Colleges and Universities.

So, if you know of any useful information that should be added to the edu-sig page, please let me know.

(1) Actually, this is not exactly correct ... but if you want to send me some for reviews, I won't say no. ;-)

(2) The page linked here will eventually disappear...

Friday, April 10, 2009

Learning paths at ShowMeDo

ShowMeDo is a great site to learn about Python (and a few other subjects) by watching screncasts. One of my favourite videos is Learn Django: Create a Wiki in 20 minutes, which taught me the basic concepts so that I was later able to adapt and play around with Google App Engine. Many authors on ShowMeDo have multiple screencasts (I made a few on RUR-PLE and Crunchy a while ago), some of which (not mine) are very professional looking. Such series of videos give the viewpoint of a given author, introducing some concepts in a logical sequence.

The newest addition to the ShowMeDo site is called Learning Paths. The idea of Learning Paths is to improve upon the existing series by including videos from multiple authors into a coherent sequence. While this concept is very much in its infancy, it promises to become a great addition to what is already a fantastic site.

In case anyone were wondering: I have no financial interest whatsover in ShowMeDo; I am just a satisfied user who considers this site a very good resource, well worth exploring. If only I had more time...

Thursday, January 08, 2009

Testing code highlighting

Update:

In order to have the styling work, I had to include by hand the styling information (keyword and string) from this style sheet - and adapt as desired.

- - - - - - -
This is a test of Python code highlighting on Blogger following these instructions.
print "Hello World!"

# And this is a comment
End of test.

Saturday, December 20, 2008

Plugins - part 6: setuptools based approach

setuptools is collection of enhancements to the standard Python distutils module. While it is not included in the Python standard library, chances are that it is already installed on your system if you have installed some additional Python libraries since it is so widely used to install Python packages ("eggs") via easy_install.

setuptools is also used in many applications as a handler for plugins. Many such applications include tutorials for creating new plugins using setuptools. For a somewhat general introduction to using setuptools to create plugin based applications, I suggest you have a look at the two tutorials from which this one is inspired.

Our starting point for this tutorial is essentially the same as in the third one in this series. We start with the following files:

root_directory/
calculator.py
setup.py
plugins/
__init__.py
base.py
op_1.py
op_2.py

where we have one new file, setup.py. This file is a special file for setuptools. Its content is as follows:

''' run with python setup.py develop '''

from setuptools import setup, find_packages

setup(
name="Calculator_s_tools",
version="1.0",
packages=['plugins'],
entry_points="""
[plugin_tutorial.s_tools]
add = plugins.op_1:operator_add_token
sub = plugins.op_1:operator_sub_token
mul = plugins.op_1:operator_mul_token
div = plugins.op_1:operator_div_token
pow = plugins.op_2:operator_pow_token"""
)

The key concept for setuptools is that of "entry_points". We define some entry points, with a name "plugin_tutorial.s_tools" chosen as unique to our application. Within this entrypoint we indicate which classes should be imported. This method effectively replace our custom method for finding and loading plugins. However, if you remember from previous tutorials, the way the application was designed originally (all within a single file) resulted in a "wart", where we had to create a link to a function ("expression") in each plugin module. Since setuptools will import the classes for us, we have no way to tell it how to fix that "wart" - we have to find another way. The method we chose was to create a different Plugin base class, one that implements the Borg idiom, so that each instance shares common attributes. Here's the new "base.py":

import os
import sys
import pkg_resources # setuptools specific

OPERATORS = {}
ENTRYPOINT = 'plugin_tutorial.s_tools' # same name as in setup.py

class Plugin(object):
'''A Borg class.'''
_shared_states = {}
def __init__(self):
self.__dict__ = self._shared_states

def init_plugins(expression):
'''simple plugin initializer'''
Plugin().expression = expression # fixing the wart
load_plugins()

def load_plugins():
'''setuptools based plugin loader'''
for entrypoint in pkg_resources.iter_entry_points(ENTRYPOINT):
plugin_class = entrypoint.load()
OPERATORS[plugin_class.symbol] = plugin_class

The actual plugin files are slightly modified to derive from the new Plugin class; for example, op_2.py contains the following:

from plugins.base import Plugin

class operator_pow_token(Plugin):
symbol = '**'
lbp = 30
def led(self, left):
return left ** self.expression(30-1)

where "expression" is now a class variable obtained from Plugin.

The code involved to make the setuptools approach is approximately the same level of complexity as the class-based plugin system covered previously. The advantages of using the setuptools approach are as follows:
  1. Since it is a widely used tool, many people know how to use it properly.
  2. It is possible to package plugins as eggs to be uploaded to a repository.
  3. It is possible to keep track of dependencies in a fairly detailed way (e.g. module X version Y required).
By comparison, it suffers from the following disadvantages:
  1. Some information about plugin location (entry_points name) is duplicated, appearing in both setup.py and base.py (in our example).
  2. Automatic plugin discovery without editing of a file (setup.py) is not possible, unlike the cases we covered before. Because of this, dynamic loading of "external" plugins while the application is already running may be problematic to achieve. (I am not familiar enough with setuptools to determine if it is feasible or not.) See the first comment par Phillip J. Eby on how to achieve this.
  3. A preliminary step ("python setup.py develop") is required to generate entrypoints information.
  4. A number of additional files are created by the previous step, "cluttering" slightly the file structure by adding an extra directory with a few files.
That being said, the differences between the two approaches are relatively minor when everything is taken into account. Choosing one approach over the other is a matter of individual taste - at least for simple applications such as the one we considered.

Plugins - part 5: Activation and Deactivation

(Note: the code indentation may appear to be wrong due to some blogspot's quirks...)

While plugins are a great way to extend the functionality of an application, sometimes it makes sense to limit the number of available features, based on a user's preference. For example, gedit, the official text editor of the Gnome environment, offers the possibility to activate or deactivate a plugin.
[link to image of activated plugins for gedit]
In this post, using the class-based plugin approach, I will explain how to add the possibility to activate or deactivate a given plugin. Furthermore, I will show how to use this capability to dynamically load new plugins.

Starting from the beginning...

Our starting point will be the following modified core application (calculator.py):
import re

from plugins.base import OPERATORS, init_plugins, activate, deactivate

class literal_token(object):
def __init__(self, value):
self.value = value
def nud(self):
return self.value

class end_token(object):
lbp = 0

def tokenize(program):
for number, operator in re.findall("\s*(?:(\d+)|(\*\*|.))", program):
if number:
yield literal_token(int(number))
elif operator in OPERATORS:
yield OPERATORS[operator]()
else:
raise SyntaxError("unknown operator: %r" % operator)
yield end_token()

def expression(rbp=0):
global token
t = token
token = next()
left = t.nud()
while rbp < token.lbp:
t = token
token = next()
left = t.led(left)
return left

def calculate(program):
global token, next
next = tokenize(program).next
token = next()
return expression()

if __name__ == "__main__":
init_plugins(expression)
assert calculate("+1") == 1
assert calculate("-1") == -1
assert calculate("10") == 10
assert calculate("1+2") == 3
assert calculate("1+2+3") == 6
assert calculate("1+2-3") == 0
assert calculate("1+2*3") == 7
assert calculate("1*2+3") == 5
assert calculate("6*2/3") == 4

# "**" has not been activated at the start in base.py
try:
assert calculate("2**3") == 8
except SyntaxError:
print "Correcting error..."
activate("**")
assert calculate("2*2**3") == 16

deactivate('+')
try:
assert calculate("1+2") == 3
except SyntaxError:
activate('+')
assert calculate("1+2") == 3

print "Done!"


The new features are indicated by different colours. In blue, we have two new functions imported to either activate or deactivate a given plugin. When the application is started, exponentiation is disabled - this can only be seen by looking at the modified version of base.py. When a disabled plugin is called, a SyntaxError already present in the old version) is raised and we activate the plugin.

To make this possible, we need to modify base.py. Before showing the new version, here's the result of running the above code:
Activating +
Activating -
Activating *
Activating /
Correcting error...
Activating **
Deactivating +
Activating +
Done!


And here's the new version of base.py:

import os
import sys

OPERATORS = {}

# We simulate a configuration file that would be based on a user's preference
# as to which plugin should be activated by default
# We will leave one symbol "**" out of the list as a test.
preferences = ['+', '-', '*', '/']

# We also keep track of all available plugins, activated or not
all_plugins = {}

class Plugin(object):
'''base class for all plugins'''

def activate(self):
'''activate a given plugin'''
if self.symbol not in OPERATORS:
print "Activating %s" % self.symbol
OPERATORS[self.symbol] = self.__class__
if self.symbol not in all_plugins:
all_plugins[self.symbol] = self.__class__

def deactivate(self):
'''deactivate a given plugin'''
print "Deactivating %s" % self.symbol
if self.symbol in OPERATORS:
del OPERATORS[self.symbol]

def activate(symbol):
'''activate a given plugin based on its symbol'''
if symbol in OPERATORS:
return
all_plugins[symbol]().activate()

def deactivate(symbol):
'''deactivate a given plugin, based on its symbol'''
if symbol not in OPERATORS:
return
all_plugins[symbol]().deactivate()

def init_plugins(expression):
'''simple plugin initializer
'''
find_plugins(expression)
register_plugins()

def find_plugins(expression):
'''find all files in the plugin directory and imports them'''
plugin_dir = os.path.dirname(os.path.realpath(__file__))
plugin_files = [x[:-3] for x in os.listdir(plugin_dir) if x.endswith(".py")]
sys.path.insert(0, plugin_dir)
for plugin in plugin_files:
mod = __import__(plugin)
mod.expression = expression

def register_plugins():
'''Register all class based plugins.

Uses the fact that a class knows about all of its subclasses
to automatically initialize the relevant plugins
'''
for plugin in Plugin.__subclasses__():
# only register plugins according to user's preferences
if plugin.symbol in preferences:
plugin().activate()
else: # record its existence
all_plugins[plugin.symbol] = plugin

Changes from the old version are indicated in blue (with corresponding comments in green). Note that we did not change a single line of code for the actual plugins! We did use the same names (activate and deactivate) both for a function and a class method. This should probably be avoided in a larger application. In this example, the code is short enough that it should not create too much confusion. In a real application we would also give the possibility of changing the user's preferences, storing the information in some configuration file.

Dynamic activation

Now that we now how to activate and deactivate a plugin, it might be useful to consider dynamic activation of an external plugin, not located in the normal plugins directory. For example, consider the following plugin (located in op_3.py):


from plugins.base import Plugin

class operator_mod_token(Plugin):
symbol = '%'
lbp = 10
def nud(self):
return expression(100)
def led(self, left):
return left % expression(10)

This file is located in subdirectory "external" which is at the same level as "plugins" in our sample code. To invoke this plugin from our base application, we need to add the following code to calculator.py:

if __name__ == "__main__":
#...

# Simulating dynamic external plugin initialization
external_dir = os.path.join(os.path.dirname(os.path.realpath(__file__)),
'external')
sys.path.insert(0, external_dir)
mod = __import__('op_3')
mod.expression = expression
# register this plugin using our default method
register_plugins()
# Since it is not activated by default, we need to do it explictly
activate('%')
assert calculate("7%2") == 1

print "Done!"

Note that we also need to import register_plugins() from base.py to make this work.

That's it! If you get the code from the py-fun repository, you can try it out yourself.

Friday, December 19, 2008

A small svg module

Update: by combining suggestions made in comments, one can probably do away with much of what I describe in this blog post. To wit:

>>> from xml.etree import ElementTree as etree
>>> from functools import partial
>>> Circle = partial(etree.Element, 'svg:circle')
>>> c = Circle(cx='100', cy='200', fill='red')
>>> etree.tostring(c)
'<svg:circle cx="100" cy="200" fill="red" />'


The only minor drawback is that attributes have to be strings, whereas the module described in this post could handle integer attributes. (Python 2.5+ required for ElementTree)


Original post below
(Time to take a break from the plugins blog series...)

Scalable Vector Graphics (SVG) are becoming more and more common on the web due to the increased support by decent browsers. SVG specifications include basic shapes such as circle, rectangles, etc., as well as supporting clipping, masking and composition, filter effects and much more. Attempting to write a python-ic module supporting all possible SVG primitives and options via code like

test_circle = Circle(x=10, y=10, r=5, color='red')

can be a daunting task. Furthermore, documenting such a module would result in a lot of duplication with the official specification document. Fortunately, there is a simpler way than simply attempting to write a complete SVG module using Class-based definitions such as the one written above. The idea is to use instead an API similar to that of ElementTree (see also) - albeit much simplified.

Suppose that we would want to be able to create SVG circles, such as
<circle cx="600" cy="200" r="100" fill="red" stroke="blue" width="10"/>
and rectangles, such as
<rect x="1" y="1" height="398" fill="none" stroke="blue" width="1198"/>
using Python code. A simple way to achieve this would be to define the following class:

class XmlElement(object):
'''First prototype from which all the xml elements are derived.

By design, this enables all elements to automatically give a
text representation of themselves - it is not quite complete.'''

def __init__(self, tag, **attributes):
'''A basic definition that will be replaced by the specific
one required by any element.'''
self.tag = tag
if attributes is not None:
self.attributes = attributes
else:
self.attributes = {}

def __repr__(self):
'''This normal python method used to give a string representation
for an object is used to automatically create the appropriate
syntax representing an xml object.'''
attrib = [" <%s" % self.tag] # open tag
for att in self.attributes:
attrib.append(' %s="%s"' % (att, self.attributes[att]))
attrib.append("/>\n")
return ''.join(attrib)
Using this class, we can create a circle instance corresponding to the definition written previously as

circle = XmlElement("circle", cx=600, cy=200, r=100, fill="red",
stroke="blue", width=10)


This is not quite as simple as the very first Circle() class-based example we wrote but it has the advantage of supporting all possible SVG attributes.

While the above XmlElement class definition is adequate for most basic SVG elements, it does not support such features as 1. text, 2. namespace (e.g. svg: prefix) and 3. grouping and sub-elements. All three additional features can be taken care of by the following modified class definition:

class XmlElement(object):
'''Prototype from which all the xml elements are derived.

By design, this enables all elements to automatically give a
text representation of themselves.'''

def __init__(self, tag, **attributes):
'''A basic definition that will be replaced by the specific
one required by any element.'''
self.tag = tag
self.prefix = ""
self.sub_elements = []
if attributes is not None:
self.attributes = attributes
else:
self.attributes = {}

def __repr__(self):
'''This normal python method used to give a string representation
for an object is used to automatically create the appropriate
syntax representing an xml object.'''
attrib = [" <%s%s"%(self.prefix, self.tag)] # open tag
for att in self.attributes:
if att != 'text':
attrib.append(' %s="%s"' % (att, self.attributes[att]))
if 'text' in self.attributes:
attrib.append(">%s\n" % (self.attributes['text'],
self.prefix, self.tag))
elif self.sub_elements:
attrib.append(">\n")
for elem in self.sub_elements:
attrib.append(" %s" % elem)
attrib.append("\n" % (self.prefix, self.tag))
else:
attrib.append("/>\n")
return ''.join(attrib)

def append(self, other):
'''append other to self to create list of lists of elements'''''
self.sub_elements.append(other)


That's almost it! With the exception of comments and Document Type Definition (dtd), we can use the above to create simple xhtml document containing ANY svg graphics without having to worry about xhtml syntax, opening and closing brackets, etc. However, we can possibly do even a little better. Consider the following xhtml document with embedded svg graphics:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink">
<head>
<title>This is the title.</title>
</head>
<body>
<p>This is the body.</p>
<svg:svg width="0" height="0">
<svg:defs>
<svg:circle cy="0" cx="0" r="20" id="red_circle" fill="red"/>
</svg:defs>
</svg:svg>
<svg:svg width="200" height="200">
<svg:use xlink:href="#red_circle" transform="translate(100, 100)"/>
</svg:svg>
<!-- This is a comment. -->
</body>
</html>


With just a few additional definitions, we can create this document using only Python code as follows:
doc = XmlDocument()
doc.head.append(XmlElement("title", text="This is the title."))

# A good practice is to define svg objects, and insert them
# using the definition; this is overkill for this example, but it
# provides a test of the class.
test_def = SvgDefs()
test_def.append(SvgElement("circle", cx=0, cy=0, r=20, fill="red",
id="red_circle"))

doc.body.append(XmlElement("p", text="This is the body."))
doc.body.append(test_def)

# we now create an svg object, that will make use of the definition above.
svg_window = SvgElement("svg", width="200", height="200")
use_circle = SvgElement("use", transform="translate(100, 100)")

# xlink:href can't be used as an attribute name passed to __init__
# this is why we use this two-step process.
use_circle.attributes["xlink:href"] = "#red_circle"

svg_window.append(use_circle)
doc.body.append(svg_window)
doc.body.append(Comment("This is a comment.")) # just for fun.

print doc

The additional definitions are as follow:

class XmlDocument(XmlElement):
def __init__(self):
self._begin = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink">\n"""
self._end = "</html>"
self.head = XmlElement("head")
self.body = XmlElement("body")

def append(self):
'''Directly appending is not allowed'''
assert False, "Append to either head or body."

def __repr__(self):
'''Gives an appropriate representation of an xml document.'''
return self._begin + str(self.head) + str(self.body) + self._end


class SvgElement(XmlElement):
'''Prototype from which all the svg elements are derived.

By design, this enables all elements to automatically give an
appropriate text representation of themselves.'''
def __init__(self, tag, **attributes):
XmlElement.__init__(self, tag, **attributes)
self.prefix = "svg:"

class SvgDefs(SvgElement):
'''Short-cut to create svg defs. A user creates an instance of this
object and simply appends other svg Elements'''
def __init__(self):
self.defs = SvgElement("defs")
self.root = SvgElement("svg", width=0, height=0)
self.root.append(self.defs)

def append(self, other):
'''appends other to defs sub-element, instead of root element'''
self.defs.append(other)

def __repr__(self):
'''gives a string representation of an object, appropriate for
insertion in an html document'''
return str(self.root)

class Comment(object):
'''Comment that can be inserted in code xml documents'''
def __init__(self, text):
self.text = text
def __repr__(self):
return "<!-- " + self.text + " -->\n"


That's it for real this time! Fewer than 100 lines of code that you can use if you need to programmatically create (x)html documents containing svg images. There are a few limitations (elements containing text may not be chained...) but it works for me. If you want to try it yourself, you can find the module here.

Plugins - part 4: Crunchy-style plugin

In this 4th post in the plugins series, I will explain the approach we used in Crunchy. While explaining the main features, I will also compare with the simple class-base plugin framework introduced in the third post in this series.

Crunchy's approach does not require a plugin to be class-based. In fact, most plugins used in Crunchy only make use of simple functions inside modules. While the class-based framework introduced in the third post used the fact that Python allowed automatic discovery of subclasses, the approach used in Crunchy requires an explicit registration of plugins. Using the same example as before, this means that op_2.py would contain the following code:

def register(OPERATORS):
OPERATORS['**'] = operator_pow_token

class operator_pow_token(object):
lbp = 30
def led(self, left):
return left ** expression(30-1)


Note that the class (operator_pow_token) is unchanged from the original application.

The method used to find plugins is similar to that introduced previously. The entire code required is as follows:

def init_plugins(expression):
plugin_dir = (os.path.dirname(os.path.realpath(__file__)))
plugin_files = [x[:-3] for x in os.listdir(plugin_dir) if x.endswith(".py")]
sys.path.insert(0, plugin_dir)
for plugin in plugin_files:
mod = __import__(plugin)
if hasattr(mod, "register"):
mod.expression = expression
mod.register(OPERATORS)

By comparison, the code used in the class-based plugin could have been written as:

def init_plugins(expression):
plugin_dir = os.path.dirname(os.path.realpath(__file__))
plugin_files = [x[:-3] for x in os.listdir(plugin_dir) if x.endswith(".py")]
sys.path.insert(0, plugin_dir)
for plugin in plugin_files:
mod = __import__(plugin)
mod.expression = expression
for plugin in Plugin.__subclasses__():
OPERATORS[plugin.symbol] = plugin

So, in one case (Crunchy-style), we have an explicit registration process with no need to create a sample base class (and, in fact, no need to work with classes at all), while in the other we have an automatic registration based on identifying subclasses. This does not mean that the Crunchy-style is better - just different. Both are equally good for this type of simple application. While we have not found the approach used in Crunchy to be limiting us in any way when extending Crunchy, something must be said for the fact that all the other Python examples of plugin-based application I have found have been based on using classes.

I can now give another motivation for having chosen the small expression calculator as a candidate for a plugin-based application: since all mathematical operations were already implemented as classes, it was readily suitable for the class-based approach (and the Zope Component Architecture one, etc.) whereas all my existing code samples that used plugins (from Crunchy, docpicture, etc.) had mostly functions rather than classes in plugins.

Plugins - part 3: Simple class-based plugin

In the first post of this series, I introduced a simple application to be used as a demonstration of a plugin-based application. The chosen application was an expression calculator contained in a single file. In the second post, I modularized the original file so that the new file structure would become a good representative of a plugin based application. In this post, I will explain how to make use of a simple class-base plugin framework. The model I have chosen follows fairly closely the tutorial written by Armin Ronacher. Another tutorial demonstrating a simple class-based plugin framework has been written by Marty Alchin.

The first step is to define a base Plugin class. All we need is to include the following in base.py:

class Plugin(object):
pass


Next, we ensure that classes used in plugins derive from this base class. We only give one explicit example, that of the class included in op_2.py since the 4 classes included in op_1.py would be treated in exactly the same way.

from plugins.base import Plugin

class operator_pow_token(Plugin):
symbol = '**'
lbp = 30
def led(self, left):
return left ** expression(30-1)

Note that we added one more line of code to the class definition. We are now ready to deal with the plugin discovery and registration.

Rather than hard-coding the information about which plugin files to import as we did when we simply modularize the application, we give a way for our program to automatically find plugins. With the file structure that we have created, this can be accomplished as follows:

def find_plugins(expression):
'''find all files in the plugin directory and imports them'''
plugin_dir = os.path.dirname(os.path.realpath(__file__))
plugin_files = [x[:-3] for x in os.listdir(plugin_dir) if x.endswith(".py")]
sys.path.insert(0, plugin_dir)
for plugin in plugin_files:
mod = __import__(plugin)
mod.expression = expression

Note that the last line of code is included because of the "wart" mentioned in the previous post and would not usually be included. To be safe, we should probably have ensured that expression was not already defined in the modules to be imported since, in theory, Python files other than plugins (such as __init__.py) might be present in the plugin directory. In this tutorial series we will often ignore the need to insert try/except clauses to simplify the code.

While we have imported the modules containing the plugins, they are not yet known in a useful form by the main application. To do so is very simple in this class-based approach, thanks to Python's treatment of (sub-)classes. Here's the code to do this:

def register_plugins():
'''Register all class based plugins.

Uses the fact that a class knows about all of its subclasses
to automatically initialize the relevant plugins
'''
for plugin in Plugin.__subclasses__():
OPERATORS[plugin.symbol] = plugin


That's it! It is hard to imagine anything simpler. With this last definition, the entire base.py module can be written as:

import os
import sys

OPERATORS = {}

class Plugin(object):
pass

def init_plugins(expression):
'''simple plugin initializer
'''
find_plugins(expression)
register_plugins()

def find_plugins(expression):
'''find all files in the plugin directory and imports them'''
plugin_dir = os.path.dirname(os.path.realpath(__file__))
plugin_files = [x[:-3] for x in os.listdir(plugin_dir) if x.endswith(".py")]
sys.path.insert(0, plugin_dir)
for plugin in plugin_files:
mod = __import__(plugin)
mod.expression = expression

def register_plugins():
'''Register all class based plugins.

Uses the fact that a class knows about all of its subclasses
to automatically initialize the relevant plugins
'''
for plugin in Plugin.__subclasses__():
OPERATORS[plugin.symbol] = plugin

In the next post, I will show another simple alternative approach similar to the one used in Crunchy.

Plugins - part 2: modularization

In the first post on the Plugins series, I introduced the small application used to demonstrate how one could modularize applications using a plugin architecture. The digital ink was barely dry on that post that already two people rose to the challenge and presented their solution, one using the standard method with the Zope Component Architecture, the other a modified method using grok. I will comment on these two solutions later in this series.

With apologies to the more advanced users, I have decided to proceed fairly slowly and cover many simple concepts with this series of plugins. Thus, this second post will not yet discuss plugins, but simply lay the groundwork for future posts. By the way, for those interested, and as pointed out by Lennart Regebro in his post, all the code samples that I will use can be browsed at, or retrieved from, my py-fun google code repository.

As a first step before comparing different approaches to dealing with plugins, I will take the sample application introduced in the first post and modularize it.

The core application (calculator.py) is as follows:

import re

from plugins.base import OPERATORS, init_plugins

class literal_token(object):
def __init__(self, value):
self.value = value
def nud(self):
return self.value

class end_token(object):
lbp = 0

def tokenize(program):
for number, operator in re.findall("\s*(?:(\d+)|(\*\*|.))", program):
if number:
yield literal_token(int(number))
elif operator in OPERATORS:
yield OPERATORS[operator]()
else:
raise SyntaxError("unknown operator: %r" % operator)
yield end_token()

def expression(rbp=0):
global token
t = token
token = next()
left = t.nud()
while rbp < token.lbp:
t = token
token = next()
left = t.led(left)
return left

def calculate(program):
global token, next
next = tokenize(program).next
token = next()
return expression()

if __name__ == "__main__":
init_plugins(expression)
assert calculate("+1") == 1
assert calculate("-1") == -1
assert calculate("10") == 10
assert calculate("1+2") == 3
assert calculate("1+2+3") == 6
assert calculate("1+2-3") == 0
assert calculate("1+2*3") == 7
assert calculate("1*2+3") == 5
assert calculate("6*2/3") == 4
assert calculate("2**3") == 8
assert calculate("2*2**3") == 16
print "Done!"


For the next few posts, when I demonstrate some very simple plugin approaches, this core application will remain untouched. This is one important characteristic of plugin-based application: in a well-designed application, plugin writers should not have to modify a single line of the core modules to ensure that their plugins can be used.

Communication between plugins and the core application is ensured via an Application Programming Interface (API) unique to that application. In our example, the API is a simple Python dict (OPERATORS) written in capital letters only to make it stand out.

In a sub-directory (plugins), in addition to an empty __init__.py file, we include the following three files:

1. base.py

OPERATORS = {}

def init_plugins(expression):
'''simulated plugin initializer'''
from plugins import op_1, op_2

op_1.expression = expression
op_2.expression = expression

OPERATORS['+'] = op_1.operator_add_token
OPERATORS['-'] = op_1.operator_sub_token
OPERATORS['*'] = op_1.operator_mul_token
OPERATORS['/'] = op_1.operator_div_token
OPERATORS['**'] = op_2.operator_pow_token

2. op_1.py

class operator_add_token(object):
lbp = 10
def nud(self):
return expression(100)
def led(self, left):
return left + expression(10)

class operator_sub_token(object):
lbp = 10
def nud(self):
return -expression(100)
def led(self, left):
return left - expression(10)

class operator_mul_token(object):
lbp = 20
def led(self, left):
return left * expression(20)

class operator_div_token(object):
lbp = 20
def led(self, left):
return left / expression(20)


and 3. op_2.py

class operator_pow_token(object):
lbp = 30
def led(self, left):
return left ** expression(30-1)


The last two files have been simply extracted with no modification from the original application. Instead of having 2 such files containing classes of the form operator_xxx_token, I could have included them all in one file, or split into 5 different files. The number of files is irrelevant here: they are only introduced to play the role of plugins in this application.

The file base.py plays the role here of a plugin initialization module: it ensures that plugins are properly registered and made available to the core program.

Since I wanted to change the original code as little as possible, a "wart" is present in the code as written since it was never intended to be a plugin-based application: the function expression() was accessible to all objects in the initial single-file application. It is now needed in a number of modules. The file base.py takes care of ensuring that "plugin" modules have access to that function in a transparent way. This will need to be changed when using some standard plugin frameworks, as was done in the zca example or the grok one.

In the next post, I will show how to take this now modularized application and transform it into a proper plugin-based one.