Only Python: 2008

Saturday, December 20, 2008

Plugins - part 6: setuptools based approach

setuptools is a collection of enhancements to the standard Python distutils module. It is not included in the Python standard library. However, it is so widely used to install Python packages ("eggs") via easy_install that it is probably already on your system if you have installed a few additional Python libraries.

setuptools is also used by many applications to handle plugins, and many of these applications include tutorials on creating new plugins with setuptools. For a somewhat more general introduction to building plugin-based applications with setuptools, I suggest you have a look at the two tutorials that inspired this one.

Our starting point for this tutorial is essentially the same as in the third post of this series. We start with the following files:


root_directory/
    calculator.py
    setup.py
    plugins/
        __init__.py
        base.py
        op_1.py
        op_2.py

where setup.py is the only new file. It is a special file used by setuptools, with the following content:


''' run with python setup.py develop '''

from setuptools import setup, find_packages

setup(
    name="Calculator_s_tools",
    version="1.0",
    packages=['plugins'],
    entry_points="""
        [plugin_tutorial.s_tools]
        add = plugins.op_1:operator_add_token
        sub = plugins.op_1:operator_sub_token
        mul = plugins.op_1:operator_mul_token
        div = plugins.op_1:operator_div_token
        pow = plugins.op_2:operator_pow_token"""
       )
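Each line inside the [plugin_tutorial.s_tools] section of setup.py has the form name = module:attribute, pointing at an object to import. The following is a simplified sketch of what resolving such a line involves. It is not setuptools' own code, and it ignores extras such as [feature]. resolve_entry_point and parse_group are hypothetical helpers. The sketch points at standard-library objects so that it runs without the plugins package.

```python
import importlib


def resolve_entry_point(line):
    """Turn 'name = module.path:attr' into (name, object).

    Simplified sketch: extras such as '[feature]' are not supported.
    """
    name, _, target = line.partition("=")
    module_name, _, attr_path = target.strip().partition(":")
    obj = importlib.import_module(module_name.strip())
    for attr in attr_path.split("."):   # also allows 'module:Class.method'
        obj = getattr(obj, attr)
    return name.strip(), obj


def parse_group(text):
    """Resolve every non-blank line of one entry-point section."""
    return dict(resolve_entry_point(line)
                for line in text.splitlines() if line.strip())


# Standard-library targets stand in for plugins.op_1:operator_add_token etc.
plugins = parse_group("""
    sqrt = math:sqrt
    dumps = json:dumps
""")
print(plugins["sqrt"](16))   # 4.0
```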

The key concept for setuptools is that of "entry points". In setup.py, we define a group of entry points called "plugin_tutorial.s_tools". This name was chosen to be unique to our application. Each entry in the group names a class to be imported. This effectively replaces our custom method for finding and loading plugins.

However, you may remember from previous tutorials that the application's original design (all within a single file) resulted in a "wart": we had to add a reference to a function ("expression") in each plugin module. Since setuptools imports the classes for us, we cannot tell it how to fix that "wart", so we have to find another way. We chose to create a different Plugin base class, one that implements the Borg idiom, so that all instances share common attributes. Here's the new "base.py":


import os
import sys
import pkg_resources  # setuptools specific

OPERATORS = {}
ENTRYPOINT = 'plugin_tutorial.s_tools'  # same name as in setup.py

class Plugin(object):
    '''A Borg class.'''
    _shared_states = {}
    def __init__(self):
        self.__dict__ = self._shared_states

def init_plugins(expression):
    '''simple plugin initializer'''
    Plugin().expression = expression  # fixing the wart
    load_plugins()

def load_plugins():
    '''setuptools based plugin loader'''
    for entrypoint in pkg_resources.iter_entry_points(ENTRYPOINT):
        plugin_class = entrypoint.load()
        OPERATORS[plugin_class.symbol] = plugin_class
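pkg_resources belongs to setuptools. On Python 3.8 and later, the standard library's importlib.metadata can read the same entry-point information. Here is a hedged sketch of an equivalent loader. Passing the OPERATORS dict in as an argument is my change, made to keep the function self-contained. iter_group is a hypothetical helper that handles the API difference between Python 3.8/3.9 and 3.10+.

```python
from importlib import metadata

ENTRYPOINT = 'plugin_tutorial.s_tools'  # same group name as in setup.py


def iter_group(group):
    '''Return the entry points registered under *group*.'''
    eps = metadata.entry_points()
    if hasattr(eps, "select"):      # Python 3.10 and later
        return eps.select(group=group)
    return eps.get(group, [])       # Python 3.8 and 3.9: a dict of groups


def load_plugins(operators):
    '''Load each entry point and register it under its symbol.'''
    for entry_point in iter_group(ENTRYPOINT):
        plugin_class = entry_point.load()
        operators[plugin_class.symbol] = plugin_class
    return operators
```

Until "python setup.py develop" (or an equivalent install) has registered the group, load_plugins() simply returns an empty dict.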

The actual plugin files are slightly modified to derive from the new Plugin class; for example, op_2.py contains the following:


from plugins.base import Plugin

class operator_pow_token(Plugin):
    symbol = '**'
    lbp = 30
    def led(self, left):
        return left ** self.expression(30-1)

where expression is no longer a module-level global. It is an attribute shared by all Plugin instances through the Borg idiom, and init_plugins() sets it once.
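The Borg idiom below shows why a single assignment in init_plugins() is enough. fake_expression is a hypothetical stand-in for calculator.expression.

```python
class Plugin(object):
    '''A Borg class: all instances share one attribute dictionary.'''
    _shared_states = {}

    def __init__(self):
        self.__dict__ = self._shared_states


class operator_pow_token(Plugin):
    symbol = '**'   # class attribute, not part of the shared state
    lbp = 30


def fake_expression(rbp=0):
    '''Hypothetical stand-in for calculator.expression.'''
    return rbp


Plugin().expression = fake_expression   # what init_plugins() does
token = operator_pow_token()            # created later, by the tokenizer
print(token.expression is fake_expression)  # True
print(token.expression(29))                 # 29
```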

The code required for the setuptools approach is about as complex as the class-based plugin system covered previously. The setuptools approach has the following advantages:

Since it is a widely used tool, many people know how to use it properly.
It is possible to package plugins as eggs to be uploaded to a repository.
It is possible to keep track of dependencies in a fairly detailed way (e.g. module X version Y required).

By comparison, it suffers from the following disadvantages:

Some information about the plugins (the entry-point group name) is duplicated. In our example, it appears in both setup.py and base.py.
Unlike the approaches covered before, plugins cannot be discovered automatically without editing a file (setup.py). Because of this, dynamically loading "external" plugins while the application is already running may be difficult. (I am not familiar enough with setuptools to determine whether it is feasible.) See the first comment by Phillip J. Eby on how to achieve this.
A preliminary step ("python setup.py develop") is required to generate the entry-point information.
That step also creates a number of additional files, which slightly "clutter" the file structure with an extra directory.

That being said, the differences between the two approaches are relatively minor when everything is taken into account. Choosing one approach over the other is a matter of individual taste - at least for simple applications such as the one we considered.

Plugins - part 5: Activation and Deactivation

While plugins are a great way to extend the functionality of an application, it sometimes makes sense to limit the number of available features based on a user's preferences. For example, gedit, the official text editor of the GNOME environment, lets the user activate or deactivate individual plugins.
[link to image of activated plugins for gedit]

In this post, using the class-based plugin approach, I will explain how to make it possible to activate or deactivate a given plugin. I will also show how to use this capability to load new plugins dynamically.

Starting from the beginning...

Our starting point will be the following modified core application (calculator.py):

import re

from plugins.base import OPERATORS, init_plugins, activate, deactivate

class literal_token(object):
    def __init__(self, value):
        self.value = value
    def nud(self):
        return self.value

class end_token(object):
    lbp = 0

def tokenize(program):
    for number, operator in re.findall(r"\s*(?:(\d+)|(\*\*|.))", program):
        if number:
            yield literal_token(int(number))
        elif operator in OPERATORS:
            yield OPERATORS[operator]()
        else:
            raise SyntaxError("unknown operator: %r" % operator)
    yield end_token()

def expression(rbp=0):
    global token
    t = token
    token = next()
    left = t.nud()
    while rbp < token.lbp:
        t = token
        token = next()
        left = t.led(left)
    return left

def calculate(program):
    global token, next
    next = tokenize(program).next
    token = next()
    return expression()

if __name__ == "__main__":
    init_plugins(expression)
    assert calculate("+1") == 1
    assert calculate("-1") == -1
    assert calculate("10") == 10
    assert calculate("1+2") == 3
    assert calculate("1+2+3") == 6
    assert calculate("1+2-3") == 0
    assert calculate("1+2*3") == 7
    assert calculate("1*2+3") == 5
    assert calculate("6*2/3") == 4

    # "**" has not been activated at the start in base.py
    try:
        assert calculate("2**3") == 8
    except SyntaxError:
        print "Correcting error..."
        activate("**")
    assert calculate("2*2**3") == 16

    deactivate('+')
    try:
        assert calculate("1+2") == 3
    except SyntaxError:
        activate('+')
    assert calculate("1+2") == 3

    print "Done!"

The new features are the two imported functions activate() and deactivate(), which activate or deactivate a given plugin, and the new tests at the end that use them. When the application starts, exponentiation is disabled. This can only be seen by looking at the modified version of base.py. When a disabled plugin is called, a SyntaxError (already present in the old version) is raised, and we then activate the plugin.

To make this possible, we need to modify base.py. Before showing the new version, here's the result of running the above code:

Activating +
Activating -
Activating *
Activating /
Correcting error...
Activating **
Deactivating +
Activating +
Done!

And here's the new version of base.py:


import os
import sys

OPERATORS = {}

# We simulate a configuration file that would be based on a user's preference
# as to which plugin should be activated by default
# We will leave one symbol "**" out of the list as a test.
preferences = ['+', '-', '*', '/']

# We also keep track of all available plugins, activated or not
all_plugins = {}

class Plugin(object):
    '''base class for all plugins'''

    def activate(self):
        '''activate a given plugin'''
        if self.symbol not in OPERATORS:
            print "Activating %s" % self.symbol
            OPERATORS[self.symbol] = self.__class__
        if self.symbol not in all_plugins:
            all_plugins[self.symbol] = self.__class__

    def deactivate(self):
        '''deactivate a given plugin'''
        print "Deactivating %s" % self.symbol
        if self.symbol in OPERATORS:
            del OPERATORS[self.symbol]

def activate(symbol):
    '''activate a given plugin based on its symbol'''
    if symbol in OPERATORS:
        return
    all_plugins[symbol]().activate()

def deactivate(symbol):
    '''deactivate a given plugin, based on its symbol'''
    if symbol not in OPERATORS:
        return
    all_plugins[symbol]().deactivate()

def init_plugins(expression):
    '''simple plugin initializer
    '''
    find_plugins(expression)
    register_plugins()

def find_plugins(expression):
    '''find all files in the plugin directory and imports them'''
    plugin_dir = os.path.dirname(os.path.realpath(__file__))
    plugin_files = [x[:-3] for x in os.listdir(plugin_dir) if x.endswith(".py")]
    sys.path.insert(0, plugin_dir)
    for plugin in plugin_files:
        mod = __import__(plugin)
        mod.expression = expression

def register_plugins():
    '''Register all class based plugins.

    Uses the fact that a class knows about all of its subclasses
    to automatically initialize the relevant plugins
    '''
    for plugin in Plugin.__subclasses__():
        # only register plugins according to user's preferences
        if plugin.symbol in preferences:
            plugin().activate()
        else:   # record its existence
            all_plugins[plugin.symbol] = plugin

Compared with the old version, base.py has several new parts: a preferences list, an all_plugins dictionary, activate() and deactivate() methods on Plugin with matching module-level functions, and a preference check in register_plugins(). The comments point out these additions. Note that we did not change a single line of code in the actual plugins! We did use the same names (activate and deactivate) for both a function and a method. This should probably be avoided in a larger application, but in this example the code is short enough that it should not create too much confusion. In a real application, we would also let the user change their preferences and store them in a configuration file.
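The same bookkeeping can be written with plain functions only, which avoids the function/method name clash mentioned above. Below is a condensed sketch, not the code from base.py. It defines the plugin classes inline, so there is no file discovery, and uses a hypothetical preferences list. Unlike base.py, it records every plugin in all_plugins up front.

```python
OPERATORS = {}        # active plugins: symbol -> class
all_plugins = {}      # every known plugin, active or not
preferences = ['+']   # hypothetical stand-in for a user's configuration


class Plugin(object):
    '''base class for all plugins'''


class operator_add_token(Plugin):
    symbol = '+'


class operator_pow_token(Plugin):
    symbol = '**'


def register_plugins():
    '''Record every plugin; activate only the preferred ones.'''
    for plugin in Plugin.__subclasses__():
        all_plugins[plugin.symbol] = plugin
        if plugin.symbol in preferences:
            activate(plugin.symbol)


def activate(symbol):
    '''Make a known plugin available to the calculator.'''
    OPERATORS[symbol] = all_plugins[symbol]


def deactivate(symbol):
    '''Remove a plugin; does nothing if it is not active.'''
    OPERATORS.pop(symbol, None)


register_plugins()
print(sorted(OPERATORS))   # ['+']
activate('**')
deactivate('+')
print(sorted(OPERATORS))   # ['**']
```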

Dynamic activation

Now that we know how to activate and deactivate a plugin, it might be useful to consider dynamic activation of an external plugin, one not located in the normal plugins directory. For example, consider the following plugin (located in op_3.py):


from plugins.base import Plugin

class operator_mod_token(Plugin):
    symbol = '%'
    lbp = 10
    def nud(self):
        return expression(100)
    def led(self, left):
        return left % expression(10)

This file is located in a subdirectory, "external", at the same level as "plugins" in our sample code. To invoke this plugin from our base application, we need to add the following code to calculator.py:


if __name__ == "__main__":
    #...

    # Simulating dynamic external plugin initialization
    external_dir = os.path.join(os.path.dirname(os.path.realpath(__file__)),
                                'external')
    sys.path.insert(0, external_dir)
    mod = __import__('op_3')
    mod.expression = expression
    # register this plugin using our default method
    register_plugins()
    # Since it is not activated by default, we need to do it explicitly
    activate('%')
    assert calculate("7%2") == 1

    print "Done!"

Note that calculator.py also needs to import os and sys, and to import register_plugins() from base.py, for this to work.
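The sys.path.insert() and __import__() pair can also be replaced by loading the file directly with importlib.util (Python 3.5+), which leaves sys.path untouched. Here is a sketch. load_external_plugin is a hypothetical helper, and it keeps the same "wart" fix of injecting expression into the module.

```python
import importlib.util
import os


def load_external_plugin(path, expression):
    '''Import the plugin module stored at *path* without touching sys.path.'''
    name = os.path.splitext(os.path.basename(path))[0]  # 'op_3.py' -> 'op_3'
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)   # runs the file, defining its classes
    module.expression = expression    # same "wart" fix as in calculator.py
    return module
```

In calculator.py, this would be called as load_external_plugin(os.path.join(external_dir, 'op_3.py'), expression), followed by register_plugins() and activate('%') as before. op_3.py itself still imports plugins.base, so the plugins package must remain importable.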

That's it! If you get the code from the py-fun repository, you can try it out yourself.

Friday, December 19, 2008

A small svg module

Update: by combining suggestions made in comments, one can probably do away with much of what I describe in this blog post. To wit:


>>> from xml.etree import ElementTree as etree
>>> from functools import partial
>>> Circle = partial(etree.Element, 'svg:circle')
>>> c = Circle(cx='100', cy='200', fill='red')
>>> etree.tostring(c)
'<svg:circle cx="100" cy="200" fill="red" />'

The only minor drawback is that attribute values have to be strings, whereas the module described in this post could handle integer attributes. (ElementTree is part of the standard library, as xml.etree, starting with Python 2.5.)
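The string-only limitation can be lifted with a thin wrapper that converts values with str(). Here is a sketch for Python 3. make_element is a hypothetical helper, and encoding='unicode' makes tostring() return str rather than bytes.

```python
from functools import partial
from xml.etree import ElementTree as etree


def make_element(tag, **attributes):
    '''Like etree.Element, but converts every attribute value to a string.'''
    return etree.Element(tag, dict((key, str(value))
                                   for key, value in attributes.items()))


Circle = partial(make_element, 'svg:circle')
c = Circle(cx=100, cy=200, fill='red')
print(etree.tostring(c, encoding='unicode'))
# <svg:circle cx="100" cy="200" fill="red" />
```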

Original post below
(Time to take a break from the plugins blog series...)

Scalable Vector Graphics (SVG) are becoming more and more common on the web thanks to increased support in decent browsers. The SVG specification includes basic shapes such as circles and rectangles, and also supports clipping, masking and compositing, filter effects and much more. Attempting to write a Pythonic module supporting all possible SVG primitives and options via code like

test_circle = Circle(x=10, y=10, r=5, color='red')

can be a daunting task. Furthermore, documenting such a module would largely duplicate the official specification. Fortunately, there is a simpler alternative to writing a complete SVG module with class-based definitions like the one above: use an API similar to that of ElementTree, albeit much simplified.

Suppose that we want to be able to create SVG circles, such as

<circle cx="600" cy="200" r="100" fill="red" stroke="blue" width="10"/>

and rectangles, such as

<rect x="1" y="1" height="398" fill="none" stroke="blue" width="1198"/>

using Python code. A simple way to achieve this would be to define the following class:


class XmlElement(object):
    '''First prototype from which all the xml elements are derived.

       By design, this enables all elements to automatically give a
       text representation of themselves - it is not quite complete.'''

    def __init__(self, tag, **attributes):
        '''A basic definition that will be replaced by the specific
           one required by any element.'''
        self.tag = tag
        # **attributes always yields a dict (possibly empty), never None
        self.attributes = attributes

    def __repr__(self):
        '''This normal python method used to give a string representation
        for an object is used to automatically create the appropriate
        syntax representing an xml object.'''
        attrib = ["  <%s" % self.tag] # open tag
        for att in self.attributes:
            attrib.append(' %s="%s"' % (att, self.attributes[att]))
        attrib.append("/>\n")
        return ''.join(attrib)

Using this class, we can create a circle instance corresponding to the definition written previously as


circle = XmlElement("circle", cx=600, cy=200, r=100, fill="red",
            stroke="blue", width=10)

This is not quite as simple as the very first Circle() class-based example we wrote, but it has the advantage of supporting all possible SVG attributes.
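One thing the class above does not do is escape attribute values. A value containing < or & would produce malformed XML. Here is a sketch of a variant that escapes them with the standard library's xml.sax.saxutils.quoteattr. This variant is my suggestion, not part of the module described in this post.

```python
from xml.sax.saxutils import quoteattr


class XmlElement(object):
    '''Variant of the first XmlElement prototype that escapes attribute values.'''

    def __init__(self, tag, **attributes):
        self.tag = tag
        self.attributes = attributes   # always a dict, possibly empty

    def __repr__(self):
        attrib = ["  <%s" % self.tag]  # open tag
        for att, value in self.attributes.items():
            # quoteattr escapes &, < and > and adds the surrounding quotes
            attrib.append(' %s=%s' % (att, quoteattr(str(value))))
        attrib.append("/>\n")
        return ''.join(attrib)


label = XmlElement("text", x=10, y=20, onclick="go(a < b && c)")
print(repr(label))
```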

While the above XmlElement class definition is adequate for most basic SVG elements, it does not support 1. text, 2. namespaces (e.g. the svg: prefix) or 3. grouping and sub-elements. The following modified class definition handles all three features:


class XmlElement(object):
    '''Prototype from which all the xml elements are derived.

       By design, this enables all elements to automatically give a
       text representation of themselves.'''

    def __init__(self, tag, **attributes):
        '''A basic definition that will be replaced by the specific
           one required by any element.'''
        self.tag = tag
        self.prefix = ""
        self.sub_elements = []
        # **attributes always yields a dict (possibly empty), never None
        self.attributes = attributes

    def __repr__(self):
        '''This normal python method used to give a string representation
        for an object is used to automatically create the appropriate
        syntax representing an xml object.'''
        attrib = ["  <%s%s" % (self.prefix, self.tag)] # open tag
        for att in self.attributes:
            if att != 'text':
                attrib.append(' %s="%s"' % (att, self.attributes[att]))
        if 'text' in self.attributes:
            # text content followed by the closing tag
            attrib.append(">%s</%s%s>\n" % (self.attributes['text'],
                                            self.prefix, self.tag))
        elif self.sub_elements:
            attrib.append(">\n")
            for elem in self.sub_elements:
                attrib.append("  %s" % elem)
            attrib.append("  </%s%s>\n" % (self.prefix, self.tag))
        else:
            attrib.append("/>\n")
        return ''.join(attrib)

    def append(self, other):
        '''append other to self to create list of lists of elements'''
        self.sub_elements.append(other)

That's almost it! Except for comments and the Document Type Definition (DTD), we can use the above to create a simple xhtml document containing ANY svg graphics without having to worry about xhtml syntax, opening and closing tags, etc. However, we can do even a little better. Consider the following xhtml document with embedded svg graphics:


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink">
<head>
<title>This is the title.</title>
</head>
<body>
<p>This is the body.</p>
<svg:svg width="0" height="0">
<svg:defs>
<svg:circle cy="0" cx="0" r="20" id="red_circle" fill="red"/>
</svg:defs>
</svg:svg>
<svg:svg width="200" height="200">
<svg:use xlink:href="#red_circle" transform="translate(100, 100)"/>
</svg:svg>
<!-- This is a comment. -->
</body>
</html>

With just a few additional definitions, we can create this document using only Python code as follows:

doc = XmlDocument()
doc.head.append(XmlElement("title", text="This is the title."))

# A good practice is to define svg objects, and insert them
# using the definition; this is overkill for this example, but it
# provides a test of the class.
test_def = SvgDefs()
test_def.append(SvgElement("circle", cx=0, cy=0, r=20, fill="red",
                  id="red_circle"))

doc.body.append(XmlElement("p", text="This is the body."))
doc.body.append(test_def)

# we now create an svg object, that will make use of the definition above.
svg_window = SvgElement("svg", width="200", height="200")
use_circle = SvgElement("use", transform="translate(100, 100)")

# xlink:href can't be used as an attribute name passed to __init__
# this is why we use this two-step process.
use_circle.attributes["xlink:href"] = "#red_circle"

svg_window.append(use_circle)
doc.body.append(svg_window)
doc.body.append(Comment("This is a comment.")) # just for fun.

print doc

The additional definitions are as follows:


class XmlDocument(XmlElement):
    def __init__(self):
        self._begin = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
    xmlns:svg="http://www.w3.org/2000/svg"
    xmlns:xlink="http://www.w3.org/1999/xlink">\n"""
        self._end = "</html>"
        self.head = XmlElement("head")
        self.body = XmlElement("body")

    def append(self, other):
        '''Directly appending is not allowed'''
        assert False, "Append to either head or body."

    def __repr__(self):
        '''Gives an appropriate representation of an xml document.'''
        return self._begin + str(self.head) + str(self.body) + self._end


class SvgElement(XmlElement):
    '''Prototype from which all the svg elements are derived.

       By design, this enables all elements to automatically give an
       appropriate text representation of themselves.'''
    def __init__(self, tag, **attributes):
        XmlElement.__init__(self, tag, **attributes)
        self.prefix = "svg:"

class SvgDefs(SvgElement):
    '''Short-cut to create svg defs.  A user creates an instance of this
    object and simply appends other svg Elements'''
    def __init__(self):
        self.defs = SvgElement("defs")
        self.root = SvgElement("svg", width=0, height=0)
        self.root.append(self.defs)

    def append(self, other):
        '''appends other to defs sub-element, instead of root element'''
        self.defs.append(other)

    def __repr__(self):
        '''gives a string representation of an object, appropriate for
           insertion in an html document'''
        return str(self.root)

class Comment(object):
    '''Comment that can be inserted in xml documents'''
    def __init__(self, text):
        self.text = text
    def __repr__(self):
        return "<!-- " + self.text + " -->\n"

That's it for real this time! Fewer than 100 lines of code that you can use if you need to programmatically create (x)html documents containing svg images. There are a few limitations (elements containing text may not be chained...) but it works for me. If you want to try it yourself, you can find the module here.
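Because the module builds markup by string formatting, nothing checks that the result is well-formed. A cheap safeguard, my suggestion rather than part of the module, is to parse the output back with ElementTree. is_well_formed is a hypothetical helper.

```python
from xml.etree import ElementTree as etree


def is_well_formed(markup):
    '''Return True if *markup* parses as XML, False otherwise.'''
    try:
        etree.fromstring(markup)
    except etree.ParseError:
        return False
    return True
```

Fragments that use the svg: prefix only parse inside an element that declares xmlns:svg, which is why the XmlDocument header declares it.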

Plugins - part 4: Crunchy-style plugin

In this 4th post in the plugins series, I will explain the approach we used in Crunchy. While explaining its main features, I will also compare it with the simple class-based plugin framework introduced in the third post in this series.

Crunchy's approach does not require a plugin to be class-based. In fact, most plugins used in Crunchy only use simple functions inside modules. The class-based framework introduced in the third post relied on Python's automatic tracking of subclasses. By contrast, the approach used in Crunchy requires plugins to be registered explicitly. Using the same example as before, this means that op_2.py would contain the following code:


def register(OPERATORS):
    OPERATORS['**'] = operator_pow_token

class operator_pow_token(object):
    lbp = 30
    def led(self, left):
        return left ** expression(30-1)

Note that the class (operator_pow_token) is unchanged from the original application.

The method used to find plugins is similar to that introduced previously. The entire code required is as follows:


def init_plugins(expression):
    plugin_dir = os.path.dirname(os.path.realpath(__file__))
    plugin_files = [x[:-3] for x in os.listdir(plugin_dir) if x.endswith(".py")]
    sys.path.insert(0, plugin_dir)
    for plugin in plugin_files:
        mod = __import__(plugin)
        if hasattr(mod, "register"):
            mod.expression = expression
            mod.register(OPERATORS)

By comparison, the code used in the class-based plugin could have been written as:


def init_plugins(expression):
    plugin_dir = os.path.dirname(os.path.realpath(__file__))
    plugin_files = [x[:-3] for x in os.listdir(plugin_dir) if x.endswith(".py")]
    sys.path.insert(0, plugin_dir)
    for plugin in plugin_files:
        mod = __import__(plugin)
        mod.expression = expression
    for plugin in Plugin.__subclasses__():
        OPERATORS[plugin.symbol] = plugin

So, in one case (Crunchy-style), we have an explicit registration process with no need to create a base class (and, in fact, no need to work with classes at all). In the other, registration is automatic and based on identifying subclasses. This does not mean that the Crunchy style is better, just different. Both are equally good for this type of simple application. We have not found Crunchy's approach to limit us in any way when extending Crunchy. Still, all the other examples of plugin-based Python applications I have found were based on classes.
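The two registration styles can be compared side by side in a single file. In this sketch, types.ModuleType stands in for a module that init_plugins() would normally import from the plugin directory. The class names are hypothetical.

```python
import types


class Plugin(object):
    '''Base class, needed only by the class-based style.'''


# Crunchy style: a plain class plus a module-level register() function.
class plain_pow_token(object):
    lbp = 30


def register(OPERATORS):
    OPERATORS['**'] = plain_pow_token


crunchy_module = types.ModuleType("op_2")  # stands in for an imported file
crunchy_module.register = register


# Class-based style: the class declares its symbol and derives from Plugin.
class pow_token(Plugin):
    symbol = '**'
    lbp = 30


explicit = {}
for mod in [crunchy_module]:          # modules imported from the plugin dir
    if hasattr(mod, "register"):      # modules without register() are ignored
        mod.register(explicit)

automatic = {}
for plugin in Plugin.__subclasses__():
    automatic[plugin.symbol] = plugin

print(explicit['**'].__name__, automatic['**'].__name__)
# plain_pow_token pow_token
```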

I can now give another reason for choosing the small expression calculator as a candidate for a plugin-based application. Since all mathematical operations were already implemented as classes, it was well suited to the class-based approach (and to the Zope Component Architecture one, etc.). By contrast, the plugins in all my existing code samples (from Crunchy, docpicture, etc.) mostly contained functions rather than classes.

Plugins - part 3: Simple class-based plugin

In the first post of this series, I introduced a simple application to be used as a demonstration of a plugin-based application. The chosen application was an expression calculator contained in a single file. In the second post, I modularized the original file so that the new file structure would be representative of a plugin-based application. In this post, I will explain how to make use of a simple class-based plugin framework. The model I have chosen closely follows the tutorial written by Armin Ronacher. Another tutorial demonstrating a simple class-based plugin framework has been written by Marty Alchin.

The first step is to define a base Plugin class. All we need is to include the following in base.py:


class Plugin(object):
   pass

Next, we ensure that classes used in plugins derive from this base class. We only give one explicit example, that of the class included in op_2.py, since the 4 classes included in op_1.py are treated in exactly the same way.


from plugins.base import Plugin

class operator_pow_token(Plugin):
    symbol = '**'
    lbp = 30
    def led(self, left):
        return left ** expression(30-1)

Note that we added one more line of code (symbol = '**') to the class definition. We are now ready to deal with plugin discovery and registration.

Rather than hard-coding which plugin files to import, as we did when we simply modularized the application, we let our program find plugins automatically. With the file structure that we have created, this can be accomplished as follows:


def find_plugins(expression):
   '''find all files in the plugin directory and imports them'''
   plugin_dir = os.path.dirname(os.path.realpath(__file__))
   plugin_files = [x[:-3] for x in os.listdir(plugin_dir) if x.endswith(".py")]
   sys.path.insert(0, plugin_dir)
   for plugin in plugin_files:
       mod = __import__(plugin)
       mod.expression = expression

Note that the last line of code is included because of the "wart" mentioned in the previous post and would not usually be included. To be safe, we should probably have ensured that expression was not already defined in the modules to be imported since, in theory, Python files other than plugins (such as __init__.py) might be present in the plugin directory. In this tutorial series we will often ignore the need to insert try/except clauses to simplify the code.
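The safety checks mentioned above could look like the following Python 3 sketch. It uses the standard library's pkgutil.iter_modules instead of filtering os.listdir(), skips module names starting with an underscore, and does not overwrite an expression a module already defines. Taking plugin_dir as a parameter is my change, made so the function can be tried on any directory.

```python
import importlib
import pkgutil
import sys


def find_plugins(plugin_dir, expression):
    '''Import every module in plugin_dir and give it access to expression.

    Sketch: skips private modules such as __init__ and does not overwrite
    a name "expression" that a module already defines.
    '''
    if plugin_dir not in sys.path:
        sys.path.insert(0, plugin_dir)
    importlib.invalidate_caches()   # the directory may have just changed
    modules = []
    for _, name, _ in pkgutil.iter_modules([plugin_dir]):
        if name.startswith("_"):
            continue
        mod = importlib.import_module(name)
        if not hasattr(mod, "expression"):
            mod.expression = expression
        modules.append(mod)
    return modules
```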

While we have imported the modules containing the plugins, the main application does not yet know about the plugins in a useful form. Registering them is very simple in this class-based approach, thanks to Python's treatment of (sub-)classes. Here's the code to do this:


def register_plugins():
    '''Register all class based plugins.

    Uses the fact that a class knows about all of its subclasses
    to automatically initialize the relevant plugins
    '''
    for plugin in Plugin.__subclasses__():
        OPERATORS[plugin.symbol] = plugin

That's it! It is hard to imagine anything simpler. With this last definition, the entire base.py module can be written as:


import os
import sys

OPERATORS = {}

class Plugin(object):
  pass

def init_plugins(expression):
  '''simple plugin initializer
  '''
  find_plugins(expression)
  register_plugins()

def find_plugins(expression):
  '''find all files in the plugin directory and imports them'''
  plugin_dir = os.path.dirname(os.path.realpath(__file__))
  plugin_files = [x[:-3] for x in os.listdir(plugin_dir) if x.endswith(".py")]
  sys.path.insert(0, plugin_dir)
  for plugin in plugin_files:
      mod = __import__(plugin)
      mod.expression = expression

def register_plugins():
  '''Register all class based plugins.

     Uses the fact that a class knows about all of its subclasses
     to automatically initialize the relevant plugins
  '''
  for plugin in Plugin.__subclasses__():
      OPERATORS[plugin.symbol] = plugin
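One caveat about __subclasses__(): it returns only direct subclasses. If a plugin ever derived from another plugin class, register_plugins() as written would miss it. A small recursive helper handles that case. all_subclasses and operator_checked_add_token are hypothetical names.

```python
class Plugin(object):
    pass


class operator_add_token(Plugin):
    symbol = '+'


class operator_checked_add_token(operator_add_token):
    '''A plugin that extends another plugin (hypothetical).'''
    symbol = '+?'


def all_subclasses(cls):
    '''Return every subclass of cls, however deep.'''
    found = []
    for sub in cls.__subclasses__():
        found.append(sub)
        found.extend(all_subclasses(sub))
    return found


print([c.__name__ for c in Plugin.__subclasses__()])
# ['operator_add_token']
print([c.__name__ for c in all_subclasses(Plugin)])
# ['operator_add_token', 'operator_checked_add_token']
```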

In the next post, I will show another simple alternative approach similar to the one used in Crunchy.

Plugins - part 2: modularization

In the first post of the Plugins series, I introduced the small application used to demonstrate how one could modularize applications using a plugin architecture. The digital ink was barely dry on that post when two people rose to the challenge and presented their solutions, one using the standard method with the Zope Component Architecture, the other a modified method using grok. I will comment on these two solutions later in this series.

With apologies to the more advanced users, I have decided to proceed fairly slowly and cover many simple concepts in this series on plugins. Thus, this second post will not yet discuss plugins, but simply lay the groundwork for future posts. By the way, for those interested, and as pointed out by Lennart Regebro in his post, all the code samples that I will use can be browsed at, or retrieved from, my py-fun google code repository.

As a first step before comparing different approaches to dealing with plugins, I will take the sample application introduced in the first post and modularize it.

The core application (calculator.py) is as follows:


import re

from plugins.base import OPERATORS, init_plugins

class literal_token(object):
  def __init__(self, value):
      self.value = value
  def nud(self):
      return self.value

class end_token(object):
  lbp = 0

def tokenize(program):
  for number, operator in re.findall("\s*(?:(\d+)|(\*\*|.))", program):
      if number:
          yield literal_token(int(number))
      elif operator in OPERATORS:
          yield OPERATORS[operator]()
      else:
          raise SyntaxError("unknown operator: %r" % operator)
  yield end_token()

def expression(rbp=0):
  global token
  t = token
  token = next()
  left = t.nud()
  while rbp < token.lbp:
      t = token
      token = next()
      left = t.led(left)
  return left

def calculate(program):
  global token, next
  next = tokenize(program).next
  token = next()
  return expression()

if __name__ == "__main__":
  init_plugins(expression)
  assert calculate("+1") == 1
  assert calculate("-1") == -1
  assert calculate("10") == 10
  assert calculate("1+2") == 3
  assert calculate("1+2+3") == 6
  assert calculate("1+2-3") == 0
  assert calculate("1+2*3") == 7
  assert calculate("1*2+3") == 5
  assert calculate("6*2/3") == 4
  assert calculate("2**3") == 8
  assert calculate("2*2**3") == 16
  print "Done!"

For the next few posts, when I demonstrate some very simple plugin approaches, this core application will remain untouched. This is one important characteristic of plugin-based applications: in a well-designed application, plugin writers should not have to modify a single line of the core modules to ensure that their plugins can be used.

Communication between plugins and the core application is ensured via an Application Programming Interface (API) unique to that application. In our example, the API is a simple Python dict (OPERATORS) written in capital letters only to make it stand out.

In a sub-directory (plugins), in addition to an empty __init__.py file, we include the following three files:

1. base.py


OPERATORS = {}

def init_plugins(expression):
    '''simulated plugin initializer'''
    from plugins import op_1, op_2

    op_1.expression = expression
    op_2.expression = expression

    OPERATORS['+'] = op_1.operator_add_token
    OPERATORS['-'] = op_1.operator_sub_token
    OPERATORS['*'] = op_1.operator_mul_token
    OPERATORS['/'] = op_1.operator_div_token
    OPERATORS['**'] = op_2.operator_pow_token

2. op_1.py


class operator_add_token(object):
   lbp = 10
   def nud(self):
       return expression(100)
   def led(self, left):
       return left + expression(10)

class operator_sub_token(object):
   lbp = 10
   def nud(self):
       return -expression(100)
   def led(self, left):
       return left - expression(10)

class operator_mul_token(object):
   lbp = 20
   def led(self, left):
       return left * expression(20)

class operator_div_token(object):
   lbp = 20
   def led(self, left):
       return left / expression(20)

and 3. op_2.py


class operator_pow_token(object):
   lbp = 30
   def led(self, left):
       return left ** expression(30-1)

The last two files have simply been extracted, with no modification, from the original application. Instead of having 2 such files containing classes of the form operator_xxx_token, I could have included them all in one file, or split them into 5 different files. The number of files is irrelevant here: they are only introduced to play the role of plugins in this application.

The file base.py plays the role here of a plugin initialization module: it ensures that plugins are properly registered and made available to the core program.

Since I wanted to change the original code as little as possible, the code still contains a "wart", a consequence of the application never having been designed for plugins: in the initial single-file application, the function expression() was accessible to all objects, but it is now needed in a number of modules. The file base.py makes that function available to the "plugin" modules in a transparent way. This will need to be changed when using some standard plugin frameworks, as was done in the zca example or the grok one.
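The assignment trick in base.py works because a function looks up global names, at call time, in the namespace of the module where it was defined. The sketch below (assuming Python 3; the module name fake_op_1, the operator_neg_token class and fake_expression() are hypothetical stand-ins) builds a throwaway module from source with types.ModuleType and exec(), shows that its method fails before the name is injected, and works afterwards.

```python
import types

PLUGIN_SOURCE = '''
class operator_neg_token(object):
    def nud(self):
        # `expression` is looked up in this module's globals at call time
        return -expression(100)
'''

def load_fake_plugin():
    """Create a module object from source, standing in for op_1.py."""
    module = types.ModuleType("fake_op_1")
    exec(PLUGIN_SOURCE, module.__dict__)
    return module

def fake_expression(rbp=0):
    """Hypothetical stand-in for the core's expression(); always returns 42."""
    return 42

plugin = load_fake_plugin()
token = plugin.operator_neg_token()
try:
    token.nud()
except NameError:
    print("before injection: expression is not defined in the plugin module")

# Same move as `op_1.expression = expression` in base.py.
plugin.expression = fake_expression
print(token.nud())  # -42
```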

In the next post, I will show how to take this now modularized application and transform it into a proper plugin-based one.

Thursday, December 18, 2008

Plugins - part 1: the application

My interest in plugins started two years ago while listening to Ivan Krstić talk about the OLPC. Following his talk, I wrote the following on edu-sig:

One open issue (as I understand it) is that of finding the "best practice" for plugins. The idea is that the core programs should be as small as possible but easy to extend via plugins. I thought that there already was a "well known and best way" to design plugins - and it was on my list of things to learn about (to eventually incorporate rur-ple within crunchy).

After discussing this off-list with Johannes Woolard, I concluded that we should try to redesign Crunchy to make use of plugins. While I was thinking about how we might proceed to do this, Johannes went ahead and implemented a simple plugin framework which we eventually adopted for Crunchy.

While there are a few agreed-upon "standards" when it comes to dealing with plugins in Python (such as setuptools and the Zope Component Architecture), I tend to agree with Ivan Krstić's observation that there is no "best practice" for plugins - at least, none that I have seen documented. As a possible first step in determining the "best practice" for writing plugin-based applications with Python, I will take a sample application, small enough that it can be completely included and described in a blog post, and not written with plugins in mind. I thought an arbitrary sample application would be a more representative example than one written specifically for the purpose of this series of posts.

The application I have chosen is a small modification of an expression calculator written and described by Fredrik Lundh, aka effbot, a truly outstanding pythonista. The entire code is as follows:


""" A simple expression calculator entirely contained in a single file.

See http://effbot.org/zone/simple-top-down-parsing.htm for detailed explanations
as to how it works.

This is the basic application used to demonstrate various plugin frameworks.
"""

import re

class literal_token(object):
   def __init__(self, value):
       self.value = value
   def nud(self):
       return self.value

class operator_add_token(object):
    lbp = 10
    def nud(self):
        return expression(100)
    def led(self, left):
        return left + expression(10)

class operator_sub_token(object):
    lbp = 10
    def nud(self):
        return -expression(100)
    def led(self, left):
        return left - expression(10)

class operator_mul_token(object):
    lbp = 20
    def led(self, left):
        return left * expression(20)

class operator_div_token(object):
    lbp = 20
    def led(self, left):
        return left / expression(20)

class operator_pow_token(object):
    lbp = 30
    def led(self, left):
        return left ** expression(30-1)

class end_token(object):
   lbp = 0

def tokenize(program):
   for number, operator in re.findall(r"\s*(?:(\d+)|(\*\*|.))", program):
       if number:
           yield literal_token(int(number))
       elif operator == "+":
           yield operator_add_token()
       elif operator == "-":
           yield operator_sub_token()
       elif operator == "*":
           yield operator_mul_token()
       elif operator == "/":
           yield operator_div_token()
       elif operator == "**":
           yield operator_pow_token()
       else:
           raise SyntaxError("unknown operator: %r" % operator)
   yield end_token()

def expression(rbp=0):  # note that expression is a global object in this module
   global token
   t = token
   token = next()
   left = t.nud()
   while rbp < token.lbp:
       t = token
       token = next()
       left = t.led(left)
   return left

def calculate(program):
   global token, next
   next = tokenize(program).next
   token = next()
   return expression()

if __name__ == "__main__":
   assert calculate("+1") == 1
   assert calculate("-1") == -1
   assert calculate("10") == 10
   assert calculate("1+2") == 3
   assert calculate("1+2+3") == 6
   assert calculate("1+2-3") == 0
   assert calculate("1+2*3") == 7
   assert calculate("1*2+3") == 5
   assert calculate("6*2/3") == 4
   assert calculate("2**3") == 8
   assert calculate("2*2**3") == 16
   print "Done!"

The latest version used can be found online.

In the original post, the classes that will be transformed into plugins (the operator_xxx_token classes) were highlighted in red, and the hard-coded if/elif choices in tokenize(), which will become indirect references to the plugin components, were highlighted in green.

In the next post in this series, I will break up this single file into a set of different modules as a required preliminary step before transforming the whole application into a plugin-based one, with a small core. In subsequent posts, I will keep the core constant and compare various approaches that one can use to link the plugins with the core.

Wednesday, December 17, 2008

Seeing double at Pycon 2009

Jesse Noller is going to give two talks at Pycon 2009. So is Tarek Ziadé. And Mike Fletcher is as well. And Brett Cannon has a talk and a panel. So far I have not seen any post on Planet Python about someone giving just one talk.

I would hate to be the one breaking the streak. So, I might as well announce that I will be giving two talks as well. :-)

Not surprisingly, the first one is about Crunchy. The title of the talk is Learning and Teaching Python Programming: The Crunchy Way, and the abstract reads as follows:

Crunchy (http://code.google.com/p/crunchy) is a program that transforms a static Python tutorial into an interactive session within a browser. In this talk, I will present Crunchy, focusing on the features that are specifically designed to be helpful in a formal teaching setting.

Not exactly Earth-shattering, but hopefully of interest to anyone who has to teach programming in a formal setting or who would just like to show off Python. This Crunchy talk is, of course, not going to be your traditional slide-based talk but rather more like an interactive demo using Crunchy. I am hoping to have a few surprises ready by the time the conference takes place.

My other talk is going to be very different. I doubt very much that I will be using Crunchy for it. The title is Plugins and Monkeypatching: increasing flexibility, dealing with inflexibility, and the abstract reads as follows:

By using plugins, one can create software that is easily extensible by others, thereby promoting collaborative development. The flip side of extensible software occurs when dealing with some standard framework whose interface is closed but which does not do exactly what is desired. In this case, monkeypatching may be worth considering.

In this talk, I'll give concrete examples of both plugin design and using monkeypatching, using small code samples from existing projects, and discuss the advantages and the shortcomings of the methods used. I will also include the design of a tiny, but flexible module for generating svg code - and compare it with other existing approaches.

I cannot claim to be anywhere close to an expert on designing plugin-based applications. Still, I felt that I had some potentially useful experiences to share about these topics, which motivated my talk proposal. Now that it has been accepted, I have started fleshing out the original outline.

In preparation for the actual talk, which will not go into much code detail due to time constraints, I plan to start a short series of posts about plugins. In the first post I will give an overview of a simple application (a calculator) that is written as a single file. In the second post, I will reorganize the code so as to use multiple files, with a number of modules located in a "plugins" directory, laying the groundwork for working with actual plugins. Subsequent posts will be used to demonstrate different approaches used to transform the application into a truly plugin-based one.

Of course, the plugin model used in Crunchy will be one approach showcased. A second one (which I have already implemented) is a simple class-based one inspired by a tutorial written by Armin Ronacher. I also plan to demonstrate how to use the Zope component architecture approach as well as the setuptools based method (and possibly others depending on suggestions I might receive).

Since I have never actually written any code using the Zope component architecture or the setuptools based approach, I thought it would be interesting to do this in a truly open-source spirit. Therefore, once I have written the first two or three posts in this series, I would like to invite anyone interested to contribute their own code demonstrating their favourite framework. This way, experts could make sure that their favourite framework is properly showcased, and not misrepresented by me. Interested parties can contribute either by sending me the code directly or by blogging about it. (If your blog appears on either planet.python.org or planetpython.org, I will most likely read it.)

Anyone who contributes in this way to my talk will be mentioned at Pycon AND receive half of the stipend I get as a presenter. ;-)

Friday, November 28, 2008

Thwarted by lack of speed

I was hoping to make an announcement of a new cool app based on Google's App Engine but unfortunately I have been thwarted by Python's relative lack of speed.

I have started working on a new version of Crunchy that would run as a web app on Google's servers. While the current version of Crunchy fetches existing html pages, processes them and displays them in the browser, this new version would retrieve page content (in reStructuredText format) from Google's datastore, transform it into html, process it to add interactive elements, and then display it.

This new app was going to be usable as a wiki to create new material. This was my starting point, greatly helped by an already existing wiki example that I adapted to use reStructuredText. When requesting a page, the following was supposed to happen:

1. reStructuredText content (for the body of the html page) is fetched from the datastore.
2. said content is transformed (by docutils) into html
3. html content is further processed by modified "crunchy engine" to add interactive elements.
4. modified html content is inserted in page template and made available.

The user would then be able to enter some Python code which could be sent back to the App Engine using Ajax for processing and updating the page display.

A normal user would only be able to interact with already existing pages. Only special users ("editors") would have been able to add pages. I was hoping that people teaching Python would be interested in writing doctest-based exercises and that a useful collection could be built up over time.

Unfortunately, this approach can not work, at least not using Google's App Engine on Google's own servers. :-(

Just playing with small pages, steps 1 and 2 take long enough that I get warnings logged mentioning that requests are taking too long. I know from experience that step 3 (which I have not started to implement/port from the standard Crunchy) can take even longer for reasonably sized pages. So, this does not appear to be feasible ... which is unfortunate.

I think I will continue to develop this app to be used as a local one and perhaps write a second wiki-based app that would take html code with no further processing. I could use the first one to create a page, have it processed and use the "view source" feature of Firefox to cut and paste the content into the online app. This would remove the need for any processing of pages on Google's servers - only Python code execution would need to be taken care of. (Of course, a user could enter some code sample that would take too long to execute and hit Google's time limit ...)

If anyone has a better idea, feel free to leave it as a comment.

Saturday, November 01, 2008

docpicture progress

For those interested, docpicture can now display images from the web. There's also a somewhat silly example where I embedded the code for a matplotlib example inside a docstring and had it displayed as a plot when viewing the docstring via docpicture inside a web browser. To do so, I had to exec the code, which is not exactly good practice ... but it serves to highlight the need to either allow only "parsers" from the standard distribution or require the user to give a parser permission to register itself with docpicture while it is running. I chose this second approach, although if you run the demo, you will not be asked whether to approve the parser - approval is given automatically. This may need to be revisited...

I just announced a new release on the Python list. You can get docpicture 0.2 from here.

Wednesday, October 29, 2008

svg mathematical equation

Ok, it's done: mathematical equations generated dynamically and displayed as svg graphics. Only using the standard Python library ... and one "tiny" additional download: matplotlib. Here's the first result (saved as a "hard-copy"; you may have to download the page and reopen it locally using Firefox.)

Note: do not bother looking for the files in the "py-fun" repository where I had the first release of docpicture. I will clean up things a bit and do a new release from a different place.

As usual, comments & suggestions are welcome.

Tuesday, October 28, 2008

docpicture and uml sequence diagrams

In a previous post about docpicture, I gave an example of a graphic generated from this site as something that would be desirable to do. (You can find more examples here.) Well, it turned out to be easy to do ... at the cost of a server connection. I used the example given to embed a graphic inside a page and ... voilà, it is done. As long as one has a live internet connection (and assuming the websequencediagram server is not down), a graphic is generated as requested.

Eventually, I still would like to implement my own parser to create svg code for uml sequence diagrams rather than relying on an external service.

Monday, October 27, 2008

docpicture: initial release

The subject line says it all. It's a small download: less than 22 kB, available from here. Feedback and suggestions are definitely welcome.

Sunday, October 26, 2008

docpicture: working ... and a query.

docpicture (see previous posts) is now working as a full prototype. By this, I mean that instead of doing


>>> help(some_object)

at the Python prompt, one can do


>>> from docpicture import view
>>> view(some_object)

and some_object's docstring will be displayed in your web browser, with any docpicture directive being translated so as to embed a nice picture. Well, by "any", I mean any turtle directive conforming to the limited syntax I have included.

When I compare the output of help() with that of docpicture.view(), I am struck by how much more information than just the object's docstring help() includes. I have tried (briefly) to play with the pydoc module to see if I could redirect the output of help() to a string that I could process with docpicture.view() ... but to no avail.

If anyone knows how I could do this simply, I would be very grateful.

docpicture is going to be released (version 0.1) as soon as I complete a decent "readme" file.

--UPDATE-- Ok, after playing some more with pydoc, I found out how to do this.

In my module, I do the following:


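import pydoc
from StringIO import StringIO

captured = StringIO()

def my_pager(text):
   # help() hands its text to pydoc.pager; store it (minus the
   # backspace-based bold markup) instead of displaying it
   captured.write(pydoc.plain(text))

pydoc.pager = my_pager

pydoc.help(obj)   # obj: the object whose help text is wanted
retrieved = captured.getvalue()
captured.close()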

and use the retrieved text as I wish.
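For what it's worth, a shorter route may be available: the help text that pydoc passes to the pager is produced by pydoc.render_doc(), so one can ask for that string directly instead of replacing pydoc.pager. A minimal sketch, assuming Python 3 (the helper help_text() and the sample() function are hypothetical):

```python
import pydoc

def help_text(obj):
    """Return, as a plain string, the text help(obj) would page.

    pydoc.render_doc() builds that text; passing "Help on %s:" as the title
    mimics help()'s heading, and pydoc.plain() strips the backspace-based
    bold markup the default text renderer may contain.
    """
    return pydoc.plain(pydoc.render_doc(obj, "Help on %s:"))

def sample(x):
    """Return x unchanged (a placeholder docstring)."""
    return x

text = help_text(sample)
print(text)
```

No global state is patched, so this also avoids surprising any other code that calls help() in the same process.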

Friday, October 24, 2008

docpicture: getting closer

This is just a progress report for the curious among you: the previous two images were generated automatically from the docpicture code written above them. If everything goes well, by the end of the weekend I'll be ready to give a sneak preview of the code to anyone interested. Feel free to contact me.

Sunday, October 19, 2008

docpicture + svg generation: first prototype working

As outlined in a previous post, I have decided to use svg to embed pictures in html pages generated from docstrings. Of course, this could be generalized to other cases than docstrings; for example, this could be implemented as a reStructuredText directive. In the course of playing with generating such pages with inline svg code, I observed the following:

If a file is saved locally and loaded within Firefox, it should be saved with a ".xml" (or possibly ".xhtml") extension.
If a file is served dynamically from a server, all that is needed is that its content be identified as "application/xhtml+xml" [as I had mentioned previously].
If the file is put on a "generic webserver" that can't be configured by the user, Firefox will ignore the svg code if the extension of the file is ".xml" or ".html". However, I did find a workaround: use a ".xhtml" extension and, when prompted by Firefox as to what application to use to open such file, select Firefox itself. The file will be downloaded locally and displayed correctly. At least, this is what happens on a Mac with Firefox 3.

There might be another way to do this; if so, I would be interested in knowing how. In the meantime, for those interested, here is the output of a first working test case.

Update: The test case has been improved with styling.

Update 2: A new picture perhaps gives a better idea of a more realistic use case.

Seeking advice on parsing

Dear Lazyweb,

As a follow-up to my previous post, I have a question...

Suppose you wanted to design an application that used parsing as a core element and you wanted this application to be easily extended by users. Furthermore, you were hoping that the users would contribute back some parsers that could be included in future versions. Would you:

1a) Use pyparsing and require all potential users of your application to download it separately.
1b) Use pyparsing but include it bundled with your application.
2) Use regular expressions (re module in standard library) and expect everyone to do the same.
3) Use some other module in the standard library.
4) Use some other 3rd party parsing package.

Saturday, October 18, 2008

More on docpictures and (almost) minimal example of web server with inline svg

After playing some more with the idea of embedding pictures in docstrings, I've settled on using dynamically created svg images instead of png images. The basic idea (which I'll write about in more detail later) is to have something like


.. docpictures:: some_type
    highly readable description

and have the docpicture module parse the "highly readable description" based on the syntax defined in some_type. Note that I chose this notation to be compatible with reStructuredText directives. The "highly readable description" will depend on the context. For example, the mathematically inclined will be able to read this:


.. docpictures:: equation
    e^{i\pi} = -1

while the following might be easy to understand by programmers:


.. docpicture:: uml_sequence
    User --> Browser: clicks on a link
    Browser -> Crunchy: requests file
    Crunchy --> Browser: Authentication Request
    Browser --> Crunchy: Authentication Response
    note over Crunchy: Retrieves and processes file
    Crunchy -> Browser: Sends processed file
    Browser --> User: Displays file

This last example uses the syntax of this site, which generates the following picture:

(Btw, the picture above is generated automatically each time this page is loaded - so it depends on the availability of the server. I have used a static version of this picture in the documentation for Crunchy.)
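As an illustration of the kind of parsing involved, here is a minimal sketch that uses only the standard re module to pull such directives out of a docstring. It assumes (these are my assumptions, not docpicture's actual rules) that the directive is spelled docpicture or docpictures, starts at the beginning of a line, and is followed by description lines indented in reStructuredText fashion; find_docpictures() is a hypothetical name.

```python
import re

# A directive line such as ".. docpicture:: uml_sequence" followed by one or
# more indented description lines; a blank or unindented line ends the block.
DIRECTIVE = re.compile(
    r"^\.\. docpictures?::[ \t]*(?P<kind>\w+)[ \t]*\n"
    r"(?P<body>(?:[ \t]+\S.*(?:\n|$))+)",
    re.MULTILINE,
)

def find_docpictures(docstring):
    """Return a list of (kind, description) pairs found in a docstring."""
    found = []
    for match in DIRECTIVE.finditer(docstring):
        lines = match.group("body").splitlines()
        description = "\n".join(line.strip() for line in lines)
        found.append((match.group("kind"), description))
    return found

EXAMPLE = """Some function.

.. docpicture:: uml_sequence
    User --> Browser: clicks on a link
    Browser -> Crunchy: requests file

More prose here.
"""

print(find_docpictures(EXAMPLE))
```

Each (kind, description) pair could then be handed to whatever parser is registered for that kind.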

Using svg to do graphics is fairly easy. Using svg as embedded objects in an html document requires a bit of searching on the internet. Creating such documents and displaying them dynamically requires even more searching (or perhaps more careful reading...). The important thing to remember is to serve the document as "application/xhtml+xml" instead of the usual "text/html". I thought it would be useful to share an almost minimal working example (tested only on Firefox) and perhaps save some time for others who would like to do the same. Feel free to adapt it as you like.

svg_test = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:svg="http://www.w3.org/2000/svg">
<head></head>
<body>
<h1>SVG embedded inline in XHTML</h1>
<svg:svg width="300px" height="200px">
<svg:circle cx="150px" cy="100px" r="50px" fill="%s"
stroke="#000000" stroke-width="5px"/>
</svg:svg>
</body>
</html>
"""


import BaseHTTPServer
import webbrowser

colors = ["#330000", "#660000", "#990000",  "#cc0000", "#ff0000"]
index = 0

class WebRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        global index
        self.send_response(200)
        # serving the page as XHTML is what lets Firefox render the inline svg
        self.send_header('Content-type', 'application/xhtml+xml')
        self.end_headers()
        # each reload cycles to the next fill color
        self.wfile.write(svg_test % colors[index % len(colors)])
        index += 1

port = 8000
server = BaseHTTPServer.HTTPServer(('', port), WebRequestHandler)
webbrowser.open("http://127.0.0.1:%s" % port)
server.serve_forever()

And, if you know how to suppress the output of the webserver, feel free to leave a comment.
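One way to do this: BaseHTTPRequestHandler sends its per-request log lines through its log_message() method, so a handler subclass can override that method with one that does nothing (the Python 2 BaseHTTPServer module used above has the same method). A minimal sketch, assuming Python 3, where the module is called http.server; QuietHandler, PAGE and serve_once() are hypothetical names, and the server handles a single request on a port chosen by the OS so that it can be exercised without a browser:

```python
import http.client
import http.server
import threading

PAGE = b"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
<head><title>Quiet server</title></head>
<body>
<svg:svg width="100px" height="100px">
<svg:circle cx="50px" cy="50px" r="40px" fill="#990000"/>
</svg:svg>
</body>
</html>
"""

class QuietHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-type", "application/xhtml+xml")
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        # The base class writes each request line (and errors) to stderr
        # through this method; doing nothing here silences the server.
        pass

def serve_once():
    """Serve a single request on a free port and return (content_type, body)."""
    server = http.server.HTTPServer(("127.0.0.1", 0), QuietHandler)
    port = server.server_address[1]
    thread = threading.Thread(target=server.handle_request)
    thread.start()
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", "/")
    response = conn.getresponse()
    content_type = response.getheader("Content-type")
    body = response.read()
    conn.close()
    thread.join()
    server.server_close()
    return content_type, body

if __name__ == "__main__":
    print(serve_once()[0])
```

For the interactive version above, only the log_message() override is needed; serve_forever() can be kept as is.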

Tuesday, October 14, 2008

Viewing embedded pictures within docstrings

In a recent post on the Python mailing list, it was suggested that it would be useful if "small" pictures could be embedded within docstrings as additional information. As is often the case, many words were written ... but little code was produced. As I have been guilty of this myself in the past, I decided that it was time to do things differently. After a quick prototype sent to the Python list, I wrote this recipe which gives a simple way to embed and display images inside a docstring in a totally transparent way.

Is this something that people would find useful? Any suggestions for improvements? Should I implement this within Crunchy? (Btw, Crunchy 1.0 alpha 1 has been released just last week).

Sunday, June 22, 2008

Monkeypatching doctest

The Python doctest module rocks. Lately, I have been using it to write unit tests for Crunchy: for each module, I write a reStructuredText file which contains sample tests written as simulated interpreter sessions, using doctest.testfile(). This has worked really well in general ... however, I have encountered one small annoyance, which I managed to get rid of in an "elegant" way using Monkeypatching.

doctests allow the use of directives. One "powerful" directive is the ELLIPSIS directive. Quoting from the documentation:

When specified, an ellipsis marker (...) in the expected output can match any substring in the actual output. This includes substrings that span line boundaries, and empty substrings, so it's best to keep usage of this simple. Complicated uses can lead to the same kinds of "oops, it matched too much!" surprises that .* is prone to in regular expressions.

Unfortunately, I encountered a case where the ellipsis marker did not allow enough matching! Consider the following situation: I have a program (Crunchy!) that saves the user's preferences (including the language) in a configuration file each time its value is changed. It also gives some feedback to the user whenever this happens.


>>> original_value = crunchy.language
>>> set_language('en')  # setting this value for some standardized tests
Language has been set to English

At the end of the test, I want to restore the original value.

>>> set_language(original_value) #doctest: +ELLIPSIS
...

Here I want the ellipsis (...) to match the string that is going to be printed out in the original language, as I have no idea what this string will look like. The problem is that, in this case, the ellipsis is interpreted as a Python continuation prompt rather than as a string to be matched. One workaround that I had been using was to modify set_language to add a parameter ("verbose") that was set to True by default but that I could turn off when running tests. While this is simple enough that it surely would never (!) introduce spurious bugs, it does not feel right; one should not modify functions only for the purpose of making them satisfy unit tests.

According to the documentation,

register_optionflag(name)

Create a new option flag with a given name, and return the new flag's integer value. register_optionflag() can be used when subclassing OutputChecker or DocTestRunner to create new options that are supported by your subclasses. register_optionflag should always be called using the following idiom:

  MY_FLAG = register_optionflag('MY_FLAG')

This is great ... except that I want to use doctest.testfile(), which does not allow me to specify a subclass of OutputChecker to use instead of the default. Also, I wanted to use as much as possible of the existing doctest module, with as little new code as possible.

This is where monkeypatching comes in.

After a bit of work, I came up with the following solution:


import doctest

# keep a reference to the original method before replacing the class
original_check_output = doctest.OutputChecker.check_output

IGNORE_ERROR = doctest.register_optionflag("IGNORE_ERROR")

class MyOutputChecker(doctest.OutputChecker):
    def check_output(self, want, got, optionflags):
        # any example marked with +IGNORE_ERROR passes unconditionally
        if optionflags & IGNORE_ERROR:
            return True
        return original_check_output(self, want, got, optionflags)

# monkeypatch: the DocTestRunner used by testfile() creates its checker
# by looking up OutputChecker in the doctest module
doctest.OutputChecker = MyOutputChecker

failure, nb_tests = doctest.testfile("test_doctest.rst")
print "%d failures in %d tests" % (failure, nb_tests)

And here's the content of test_doctest.rst


Test of the new flag:

>>> print 42
42
>>> print 2 # doctest: +IGNORE_ERROR
SPAM!

This yields a test with no failures. There might be a more elegant way of doing this; if so, I would be very interested in hearing about it.
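One candidate, sketched for Python 3 and not tried with Crunchy's actual test files: doctest.DocTestRunner accepts a checker argument, so if you parse the text yourself with DocTestParser instead of calling doctest.testfile(), the custom checker can be passed in directly and nothing needs to be monkeypatched. The helper run_doctest_text() is a hypothetical name:

```python
import doctest

# Registering the flag makes "+IGNORE_ERROR" a legal doctest directive.
IGNORE_ERROR = doctest.register_optionflag("IGNORE_ERROR")

class IgnoreErrorChecker(doctest.OutputChecker):
    def check_output(self, want, got, optionflags):
        # Examples marked with +IGNORE_ERROR pass whatever they print.
        if optionflags & IGNORE_ERROR:
            return True
        return super().check_output(want, got, optionflags)

def run_doctest_text(text, name="example"):
    """Run the doctest examples in `text` with IgnoreErrorChecker."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, {}, name, None, 0)
    runner = doctest.DocTestRunner(checker=IgnoreErrorChecker(), verbose=False)
    return runner.run(test)  # TestResults with .failed and .attempted

SAMPLE = """
Test of the new flag:

>>> print(42)
42
>>> print(2)  # doctest: +IGNORE_ERROR
SPAM!
"""

if __name__ == "__main__":
    result = run_doctest_text(SAMPLE)
    print("%d failures in %d tests" % (result.failed, result.attempted))
```

To run a file, read its contents and pass them to run_doctest_text(); the trade-off is that conveniences of testfile(), such as resolving paths relative to the calling module, are lost.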

Friday, April 18, 2008

Thoughts on Google Summer of Code 2008 - part 1

In just a few days, Google will make some announcements that will please many hundreds of students and disappoint even more. I think we should all focus on the positive side when the announcements are made. It is, after all, a fantastic thing that a company is spending millions of dollars so that some students get the chance to program on Open Source software as a summer job.

Just think of it. Who could have predicted, just five years ago, that a company would spend that kind of money on students who would work on someone else's project?

This is amazing - and many are now taking it for granted.

I find it great that the Python Software Foundation is an organization that can mentor SoC students. With the excellent supporting work of James Tauber as coordinator, many promising students are going to be paired with a mentor, hopefully leading to great projects to be completed this summer.

I have seen some grumblings on some SoC related lists that have made me think about some of the "problems" I have seen. Note that these are very minor compared with the strong positive points. I will be discussing those in part 2, after the official announcements are made.

Because the PSF is an umbrella organization, most students work on different projects, unrelated to each other. As a result, they tend to have limited interactions with the greater Python community. I think there should be a "meeting place" where all the students would meet - perhaps a mailing list to which they all have to contribute once a week, sharing their progress, etc.
Not enough positive publicity is given to "successful students", i.e. those that continue to contribute to the Python community after the summer is over. For Crunchy, it has been one success (Johannes Woolard) out of a total of 3 students over the past 2 years. I don't know of many other success stories from other projects ... Alex Holkner comes to mind ... but I feel I should know more names of successful students. (I know there's another Alex or Alessandro who has contributed to the Python core and was involved with GHOP....)
With the exception of a few people like James Tauber and Titus Brown with whom I have had a few email exchanges, I do not feel that as a past/potential mentor I am as much part of a community as I feel should be the case. There is a mentor discussion list, but it does not seem to be the kind of place to generate community building discussions.

In terms of projects submitted, I would say that they fall into the following categories:

1. Contributions to the "standard" core (cpython code, or standard library)

2 a. Contributions to "non-standard core" (e.g. Jython, PyPy, TinyPy?)
2 b. Contributions to 3rd party libraries (e.g. Numpy, Pygame)
2 c. Contributions to major projects whose end users have to use Python (e.g. SAGE)
2 d. Contributions to projects that can be used to teach Python [Crunchy, of course ;-), but there are others ... that will be for part 2]

3 a. Contributions that propose some new "standards" for Python programmers, never discussed before in the Python community.
3 b. Projects that happen to be written in Python, but whose end users are not exposed (or minimally exposed) to Python.
3 c. Projects that are not written in Python, that may or may not be usable in all OS, and that aren't more useful to Python programmers than they would be to people using other languages.
3 d. et cetera

Assuming that all projects are well thought out (which is not always the case), I feel that:

Projects in category 1 deserve to be fully supported. The Python community needs more capable people contributing to the core to prevent burnout among the current contributors. Perhaps, in a few years, after working some more on Crunchy, I'll feel capable of joining that group and contributing effectively (and have the time to do so).
Projects in categories 2 a-d are worthy of support. There are of course more such projects submitted than can possibly be supported, so some difficult choices had to be made. (Kudos to James for guiding this process.) Many people are going to be disappointed, but this was unavoidable.
Projects in categories 3 a-d are a puzzle to me. I don't understand their appeal for the PSF (and I know I am not the only one), but it seems that very few people are willing to take a public stance on this and debate the issue. Note that this comment is made as an observation on the discussions that took place so far and does not necessarily reflect on any decision that has been made.

This is it for the negative comments. I can't wait for the announcements from Google to focus entirely on the more positive side.

Sunday, April 13, 2008

Firefox 3b5: the pain of using the bleeding edge

After seeing so many positive reviews of the upcoming Firefox 3, I decided to try the latest beta (5) version. It does indeed seem fast when dealing with complex javascript. While there are a few features I am not too keen about [1], I liked the extra speed (and the reduced RAM usage) so much that I have been using it almost exclusively. That is, until now: I can't rely on it to test Crunchy. Update: this is no longer true, thanks to a reader's comment. The fix was to move the onblur event handler from the form to the file input, at the spot marked HERE in the code below.

To load a local html file [2] into Crunchy, a two-step process has to be used because of standard javascript security restrictions:


<form name="browser_local"
onblur="document.submit_local.url.value =
    document.browser_local.filename.value">
<input name="filename"  type="file" HERE >
</form>

<form action="/local" method="get" name="submit_local">
 <input name="url" type="hidden">
 <input class="crunchy" type="submit"
        value="Load local html tutorial" />
</form>

The first form lets the user browse the local drive for a particular file. The second one sends the chosen file's path to Crunchy as an argument to the "/local" action, something like /local?url=file_path. Unfortunately, when using Firefox 3 beta 5, no argument is passed and we get /local?url= instead. And of course no file can be loaded.
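On the receiving end, the only difference between a working browser and Firefox 3b5 is an empty query value. As a minimal sketch (Python 3 standard library; `extract_local_path` is a hypothetical helper, not Crunchy's actual code), a handler could tell `/local?url=file_path` apart from `/local?url=` and report a clear error instead of silently failing to load anything:

```python
from urllib.parse import urlsplit, parse_qs


def extract_local_path(request_path):
    """Return the file path sent as /local?url=..., or None if it is missing.

    Hypothetical helper for illustration; not taken from Crunchy.
    """
    query = urlsplit(request_path).query
    # keep_blank_values=True so that "url=" is recorded as present but empty
    params = parse_qs(query, keep_blank_values=True)
    path = params.get("url", [""])[0]
    # An empty value is what Firefox 3b5 produced: treat it as "no file"
    return path or None
```

`parse_qs` also undoes the URL encoding the browser applies, so a path containing spaces or slashes comes back in its original form.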

This file browser feature is not something I test regularly when working on Crunchy, nor is it something that can be tested via standard Python unit tests. [3] When I noticed the new bug, it never crossed my mind that this could be a "new Firefox feature"; I assumed it was something I had broken in Crunchy's code. [4] It was only after I tried a few old releases of Crunchy (to figure out when "I" broke the code) that I realized the problem was not due to anything I wrote.

I have not been able to find any note about this new behavior of Firefox. Since this is still a beta, I guess I'll have to wait until the final Firefox 3 release to figure out if I need to change the way I load files. [5]

====
[1] One change I don't like is the rather gaudy auto-suggest list when typing a url.

[2] The same method is used to load reStructuredText files and others.

[3] I really need to investigate twill for this.

[4] One more reason to have complete unit test coverage. Since I don't, I automatically assumed it was something I had done.

[5] If anyone has any lead as to how to do so reliably in Firefox 3b5 as well as with other browsers, I'd be keen to hear about it.

Friday, April 11, 2008

Shell meme

I'm responding to peer pressure. I pretty much only use the shell for one thing and rarely restart it...


andre$ history|awk '{a[$2]++ } END{for(i in a){print a[i] " " i}}'|sort -rn|head
322 python
76 cd
26 ls
11 grep
8 pwd
8 find
5 sudo
3 rm
3 def
2 svn
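The awk one-liner above counts how often each command name (field 2 of each `history` line, right after the history number) appears, and `sort -rn | head` keeps the ten most frequent. A rough Python equivalent, as a minimal sketch: `top_commands` and `print_report` are hypothetical names, and lines with fewer than two fields are skipped rather than counted under an empty name as awk would do.

```python
from collections import Counter


def top_commands(history_lines, n=10):
    """Return the n most frequent command names as (command, count) pairs."""
    counts = Counter()
    for line in history_lines:
        fields = line.split()          # like awk: split on runs of whitespace
        if len(fields) >= 2:           # field 1 is the history number
            counts[fields[1]] += 1     # same as awk's a[$2]++
    # Equivalent of "sort -rn | head": highest counts first, at most n entries
    return counts.most_common(n)


def print_report(history_lines, n=10):
    """Print "count command" lines, in the same format as the awk script."""
    for command, count in top_commands(history_lines, n):
        print(count, command)
```

Since `history` is a shell builtin, its output has to be piped in, for example by saving the block as a script that ends with `import sys` and `print_report(sys.stdin)`, then running `history | python script.py`. Commands with equal counts may be listed in a different order than with `sort -rn`.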

Saturday, March 22, 2008

Using Clone Digger

There's a new tool available for Python programmers: Clone Digger. While it has not been officially released, it is available from the svn repository. Clone Digger finds code duplications in a given project, and creates a fairly comprehensive report (html file). Seeing the duplications on a screen is a powerful motivation for refactoring.

Check it out!

Update:

Just to make it clear: I had nothing to do with this project; I just found out about it via the gsoc-python-mentors list.

Inspiration and persistence

While ~~mindlessly wasting time instead of programming~~ selectively reading the internet, I came across this gem by Seth Godin which I reproduce in its entirety:

Persistence

Persistence isn't using the same tactics over and over. That's just annoying.

Persistence is having the same goal over and over.

That's it.

A wiser person would most likely leave it at that. However, this led me to think about my programming goals, which I thought I should write down, if only so that I can reflect on them again later. I can sum them up as follows:

Do something that is fun but that gives me some sense of accomplishment rather than just wasting time.
Find ways to make it easier for others to learn programming (in Python).

In doing so, I have found myself oscillating between two extremes:

Trying to follow the "release early, release often" philosophy.
Trying to get everything "just perfect" before releasing anything.

Trying to get things "just perfect" is something that can lead to procrastination and delays. As an example, rur-ple's version 1.0 release candidate 3 has not been updated since July 2007. The next version should be the final 1.0 ... but somehow, I am not happy with many details and I'd like to get everything right for 1.0. Too often I read about (usually commercial) software which is officially released and is considered by its users to be a Beta version. All open source programmers I have met have a sense of pride in their work that I share. So I postpone the final release and end up working on something else...

I went the other way with a little utility called lightning compiler (now at version 2.1), whose version 1.0 was released as a recipe in the online Python cookbook. Much of the rapid evolution of lightning compiler came from user feedback, as expected from the "release early, release often" philosophy. Yet following the same philosophy has generated relatively little feedback for rur-ple or for Crunchy to date. I did get some feedback for rur-ple, which has been used at an elementary school in Austria and at a high school and a university in the U.S., among others, but it has often been very indirect.

Still, I am persistent. Following Seth Godin's definition of persistence, my second goal written above can be described as finding a solution to Why Johnny can't code. Or, as I have written elsewhere:

My goal is to provide an introduction to programming which is as "smooth" as possible. We sometimes hear the phrase "steep learning curve" to characterize some difficult to grasp concept. I think it is important to have as few "steep learning curves" as possible in the learning process. GvR [Guido van Robot] uses a slightly easier syntax than Python ... but at the expense of having a "step-like learning curve" when one wants to go from GvR's world to Python programming. Since Rur-ple uses Python, there is no transition to speak of.

Both rur-ple and Crunchy, and to a lesser extent lightning compiler (which has been incorporated within rur-ple) have been inspired by that goal.

However, sometimes I stray from that goal. For example, inspired by an earlier post on Georg Brandl's remarkable Sphinx, Crunchy now includes a prototype for an automated documentation testing framework along the lines of sphinx.ext.doctest, which was released yesterday. My intention is to update Crunchy's implementation so that it is fully compatible with Sphinx's. And while I believe that this is a neat (and fun!) thing to include in Crunchy, it contributes only very indirectly to my overall goal and ends up delaying the 1.0 release of Crunchy.
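Crunchy's prototype is not shown in this post, but the idea behind "automated documentation testing" can be illustrated with the standard library's `doctest` module, which runs the `>>>` examples embedded in docstrings and compares their output with what is written there. A minimal sketch, assuming Python 3: `average` and `run_doc_tests` are hypothetical names, and this is plain `doctest`, not Crunchy's or `sphinx.ext.doctest`'s implementation.

```python
import doctest


def average(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    The interactive examples below are both documentation and tests:

    >>> average([1, 2, 3])
    2.0
    >>> average([5])
    5.0
    """
    return sum(values) / len(values)


def run_doc_tests(func):
    """Run the >>> examples in func's docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Make the function under test visible to its own examples by name
    for test in finder.find(func, globs={func.__name__: func}):
        runner.run(test)  # any mismatch is reported on standard output
    return runner.summarize(verbose=False)
```

If the documented output ever drifts from what the code actually does, `run_doc_tests(average).failed` becomes non-zero, which is what keeps such examples honest.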

Blogging too can be a distraction. However, it is my hope that it may generate a few comments that will help inspire me to make Crunchy even more useful.

Success is the result of inspiration and persistence.

Friday, March 07, 2008

Crunchy: Pycon 2008 release

Crunchy is getting really close to a 1.0 version. To mark the Pycon 2008 event (which I won't be able to attend), I just did a new release (0.9.9). It has a few new goodies that I won't list here, leaving it to Johannes to do the demonstration. As for me, I am heading down South for a vacation with my kids.

Note: the opening Crunchy page indicates that this is version 0.9.8.6 - which is incorrect.

What is left to be done for version 1.0 is cleaning up the existing documentation (proofreading, proofreading, proofreading) and adding a few more pages to it. New features will have to wait until after 1.0.... unless we get feedback from Pycon attendees for "must have" features that we could implement quickly.

As far as I know, there are no bugs (famous last words). If you find any, please let us know.

Friday, February 29, 2008

Pycon and Crunchy

This year's Pycon program looks very interesting. I wish I could be there but, alas, the timing was just wrong for me this year. This is doubly disappointing as I would have been able to meet Johannes Woolard in the flesh. Yes, forget Guido van Rossum, Alex Martelli and other famous names: the one person I wanted to meet is Johannes. For more than a year and a half, I have had the pleasure of collaborating with Johannes on Crunchy without ever meeting him. This year, Johannes will be the one showing Crunchy off. I'm sure he'll do a great job.

And, if anyone is looking to hire a bright, young, hard-working programmer, Johannes will graduate from Oxford this year.