Sunday, December 30, 2007

More about Crunchy running under Python3.0a1

Just in time for the new year, and after a major reorganization of Crunchy's code to make it easier to write more unit tests for it, I managed to get all of Crunchy's core functionality working under Python 3.0a1. Crunchy can now import html files that do not conform to strict XML conventions (for example, files that omit closing tags for some elements) and that would otherwise cause problems for ElementTree. It does so by launching an external process under Python 2.5 (or whatever default version is invoked by typing "python" at a terminal; it has to be a 2.x version for this to work). This process imports the file, "cleans" it (using a combination of BeautifulSoup, ElementTree and Crunchy's security module), and saves it locally. After a delay, Crunchy loads the cleaned-up file and displays it. Unfortunately, most of the styling is removed in the process. :-(
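Crunchy's real cleaning step relies on BeautifulSoup running in a separate Python 2.x process. Purely as an illustration of what "cleaning" has to achieve before ElementTree will accept a page, here is a stdlib-only sketch for current Python 3 that swaps in html.parser instead: it self-closes void elements such as <meta> and <br>, closes tags left open, and re-escapes text. The names XmlCleaner and clean are hypothetical, and the sketch ignores HTML's finer implicit-closing rules (two consecutive unclosed <p> tags, for instance, would end up nested).

```python
from html import escape
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML "void" elements never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "param", "source", "track", "wbr"}


class XmlCleaner(HTMLParser):
    """Hypothetical helper: re-serialize loose HTML as well-formed XML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # output fragments
        self.open_tags = []  # stack of currently open non-void elements

    def handle_starttag(self, tag, attrs):
        attributes = "".join(' %s="%s"' % (name, escape(value or "", quote=True))
                             for name, value in attrs)
        if tag in VOID_TAGS:
            self.parts.append("<%s%s/>" % (tag, attributes))  # self-close void tags
        else:
            self.parts.append("<%s%s>" % (tag, attributes))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # drop stray closing tags such as </br>
        # Also close anything left open inside this element (e.g. an unclosed <p>).
        while self.open_tags:
            name = self.open_tags.pop()
            self.parts.append("</%s>" % name)
            if name == tag:
                break

    def handle_data(self, data):
        self.parts.append(escape(data, quote=False))  # re-escape &, < and >

    def result(self):
        self.close()  # flush any buffered input
        while self.open_tags:  # close whatever is still open at end of input
            self.parts.append("</%s>" % self.open_tags.pop())
        return "".join(self.parts)


def clean(html_text):
    cleaner = XmlCleaner()
    cleaner.feed(html_text)
    return cleaner.result()


page = ('<html><head><meta charset="utf-8"><title>Demo</title></head>'
        '<body><p>Hello &amp; welcome<br></body></html>')
root = ET.fromstring(clean(page))
print([p.text for p in root.iter("p")])  # ['Hello & welcome']
```

Calling ET.fromstring(page) directly on the original markup raises a ParseError instead, because the <meta> element is never closed.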

However, the good news is that I was able to load the official 3.0 Python tutorial (a work in progress) and try it out using Crunchy. I did find one limitation of using Crunchy to do so: Crunchy encodes the Python output using utf-8 before forwarding it to the browser. So, instead of having something like b'Andr\xc3\xa9' appear on the screen, one sees André. Thus, Crunchy is not a very good platform for teaching the encoding/decoding of strings. For other aspects though, it is an ideal tool (if I may say so) for going through the Python tutorial: there is no need to switch back and forth between the browser and a separate Python environment to try things out. I still hope to have the time and energy to go through the entire 3.0 tutorial (something I have never done before, for 2.x) and see if I can find any bugs or come up with useful suggestions.
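The distinction Crunchy hides is the one between a text string and its UTF-8 encoded bytes. A minimal illustration in Python 3 syntax (the variable names are only for this example):

```python
name = "André"                   # a text (str) string: 5 characters
encoded = name.encode("utf-8")   # UTF-8 bytes: "é" takes two bytes

print(encoded)                   # b'Andr\xc3\xa9'
print(len(name), len(encoded))   # 5 6

decoded = encoded.decode("utf-8")
print(decoded == name)           # True
```

A tutorial section on encodings expects the reader to see the first printed line; Crunchy, by decoding before display, would show the readable form instead.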

All I have left to do for the next release is to write up some documentation/tutorial on the new Turtle module and on launching Crunchy under Python3.0a1. With a bit of luck, this will all be finished before the end of the year.

In addition to the code reorganization mentioned above, I fixed a few bugs and made an improvement to Crunchy's Borg interpreters. For those who aren't familiar with them, Crunchy allows one to embed a number of interpreters (html input boxes communicating with the Python backend) within a single page. These interpreters can either be isolated from one another (meaning that a variable defined in one interpreter is only known by that interpreter) or can share a common environment (aka Borg interpreters). Normally, in single-user mode with a single open tab in Firefox, every time a new page is displayed the Borg interpreters are effectively reset (the old ones are garbage collected and the new ones are created from a clean slate).
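The difference between isolated and Borg interpreters comes down to whether they execute code in separate namespaces or in one shared namespace. A minimal sketch using the standard library's code.InteractiveInterpreter (Crunchy's own console classes are not shown here; the variable names are hypothetical):

```python
import code

# Two "Borg" interpreters share one namespace dictionary...
shared_namespace = {}
borg_1 = code.InteractiveInterpreter(locals=shared_namespace)
borg_2 = code.InteractiveInterpreter(locals=shared_namespace)
# ...while an isolated interpreter gets a namespace of its own.
isolated = code.InteractiveInterpreter(locals={})

borg_1.runsource("x = 42")
borg_2.runsource("y = x + 1")   # works: x was defined through borg_1
isolated.runsource("z = 0")

print(shared_namespace["y"])    # 43
print("x" in isolated.locals)   # False
```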

Previously, if one were to have multiple users (or multiple tabs open from the same user) on the same Crunchy server, all Borg interpreters ended up sharing the exact same state. This is not very convenient if two users are trying to use the same variable names! I had planned to address this "feature" at some point after the 1.0 release but was forced into it earlier due to the 3.0a1 work. The reason is the following.

To create "Borg interpreters", I was using the Borg idiom invented by Alex Martelli. It goes as follows:

class Borg(object):
    '''Borg Idiom, from the Python Cookbook, 2nd Edition, p:273

    Derive a class from this; all instances of that class will share the
    same state, provided that they don't override __new__; otherwise,
    remember to use Borg.__new__ within the overridden class.
    '''
    _shared_state = {}
    def __new__(cls, *a, **k):
        obj = object.__new__(cls, *a, **k)
        obj.__dict__ = cls._shared_state
        return obj
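On current Python 3 releases, this version of the idiom fails as soon as a subclass is instantiated with arguments, because object.__new__ then receives those arguments too and raises a TypeError. The reproduction below uses hypothetical subclasses (Console, FixedConsole); FixedBorg shows a one-line alternative fix (calling object.__new__(cls) alone), which is not the approach taken in Crunchy.

```python
class Borg(object):
    _shared_state = {}

    def __new__(cls, *a, **k):
        obj = object.__new__(cls, *a, **k)  # forwards the arguments: the problem
        obj.__dict__ = cls._shared_state
        return obj


class FixedBorg(object):
    _shared_state = {}

    def __new__(cls, *a, **k):
        obj = object.__new__(cls)  # object.__new__ only needs the class
        obj.__dict__ = cls._shared_state
        return obj


class Console(Borg):
    def __init__(self, name):
        self.name = name


class FixedConsole(FixedBorg):
    def __init__(self, name):
        self.name = name


try:
    Console("first")
except TypeError as err:
    print("TypeError:", err)

a = FixedConsole("first")
b = FixedConsole("second")
print(a.name)  # second: both instances share one __dict__
```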

When using this idiom in a standard program under 3.0a1/2, a deprecation warning is raised about object.__new__() not taking any arguments. When I tried to use this idiom in Crunchy running under 3.0a1/2, the deprecation warning was replaced by an exception. Rather than trying to silence this exception, I decided to take a different approach and used the following instead:

class BorgGroups(object):
    '''Inspired by the Borg Idiom, from the Python Cookbook, 2nd Edition, p:273
    to deal with multiple Borg groups (one per crunchy page)
    while being compatible with Python 3.0a1/2.
    Derived class must use a super() call to work with this properly.
    '''
    _shared_states = {}
    def __init__(self, group="Borg"):
        if group not in self._shared_states:
            self._shared_states[group] = {}
        self.__dict__ = self._shared_states[group]

# The following BorgConsole class is defined such that all instances
# of an interpreter on the same html page share the same environment.

class BorgConsole(BorgGroups, SingleConsole):
    '''Every BorgConsole in the same group shares a common state'''
    def __init__(self, locals={}, filename="Crunchy console", group="Borg"):
        super(BorgConsole, self).__init__(group=group)
        SingleConsole.__init__(self, locals, filename=filename)

The "group" variable is taken to be a unique id generated by Crunchy each time it processes a given html page. Thus, each page loaded by a different user (or the same user, at a different time) from the same Crunchy server will result in a unique set of Borg interpreters.
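To see the grouping behaviour in isolation, here is a self-contained run of the BorgGroups class above without the SingleConsole base. The page ids are generated with uuid.uuid4() purely as an assumption; the post does not say how Crunchy builds its unique ids.

```python
import uuid


class BorgGroups(object):
    """Same class as above: one shared state dictionary per group name."""
    _shared_states = {}

    def __init__(self, group="Borg"):
        if group not in self._shared_states:
            self._shared_states[group] = {}
        self.__dict__ = self._shared_states[group]


# Hypothetical page ids; Crunchy's actual id scheme may differ.
page_1 = uuid.uuid4().hex
page_2 = uuid.uuid4().hex

first = BorgGroups(group=page_1)
second = BorgGroups(group=page_1)   # same page: same state
other = BorgGroups(group=page_2)    # different page: separate state

first.answer = 42
print(second.answer)                # 42
print(hasattr(other, "answer"))     # False
```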

To be fair, I must admit that I did not come up with this solution totally on my own. A while ago, I asked for something like this on comp.lang.python. (Those interested in the details should search for "Borg rebellion".) I just derived the above solution from some suggestions made at that time.

Finally, in addition to all this, I found a bug in code.py for Python 3.0a1/2. I tried to send an email about it to the python-3000 mailing list, but it was held up for a few days awaiting moderator approval. So, I canceled it and filed a bug report on the bug tracker instead (which I should have done in the first place). I still haven't seen any follow-up, perhaps due to the title I gave it. The bug is actually very easy to fix: three lines of code need to be replaced by a single one. The solution is related to my only other "official" contribution to Python to date. Hopefully, by this time next year, I'll have learned enough to contribute more to Python.

Thursday, December 27, 2007

Crunchy and Python 3.0a2

Continuing with my experiment of adapting Crunchy to Python 3.0, I managed to get Crunchy to start with Python 3.0a2 and to run some code from the editor, but not from the interpreter, nor from the doctest. Most of the problems I have involve converting bytes to strings and strings to bytes. As Guido van Rossum mentioned last June:
  • We're switching to a model known from Java: (immutable) text strings are Unicode, and binary data is represented by a separate mutable "bytes" data type. In addition, the parser will be more Unicode-friendly: the default source encoding will be UTF-8, and non-ASCII letters can be used in identifiers
Later on, in a comment from that post, we find:
  • > In your presentation last night you had one slide which
    > talked about the "str" vs "bytes" types in Python 3000. On
    > the bottom of that slide was something like:
    >
    > str(b"asdf") == "b'asdf'"
    >
    > However, in discussing this slide (very briefly) you said
    > that a type constructors like "str" could be used to do
    > conversion. It seems like "str" is behaving more like
    > "repr" in this case, which seems unusual and less useful
    > to me. Was this a typo, or is this actually the way it's
    > supposed to work? What's the rationale?

    To be honest, this is an open issue. The slide was wrong compared to the current implementation; but the implementation currently defaults to utf8 (so str(b'a') == 'a'), which is not right either. The problem is that there are conflicting requirements: str() of any object should ideally always return something, but we don't want str() to assume a specific default encoding.

    To be continued...
This change seems innocuous enough...

As a web server, Crunchy sends information to and receives information from the browser as "binary data" or "bytes". As a generalized Python interpreter, Crunchy manipulates that information as "strings". It appears that the "bytes" implementation is much more complete in Python 3.0a2 than it was in Python 3.0a1, and this is the source of many problems.

For example, the browser sends Crunchy some information about the path to which a Python file should be saved, together with its content, as follows:

'/Users/andre/.crunchy/temp.py_::EOF::_from Tkinter import *\nroot = Tk()\nw = Label(root, text="Crunchy!")\nw.pack()\nroot.mainloop()'

This is sent as a binary stream which needs to be converted to the string written above. This conversion is done via str(...). Using Python 3.0a1 (and 2.4 and 2.5), the result was as above; splitting the string gave the following:

['/Users/andre/.crunchy/temp.py', 'from Tkinter import *\nroot = Tk()\nw = Label(root, text="Crunchy!")\nw.pack()\nroot.mainloop()']

Now, with Python 3.0a2, it gets slightly more complicated. The first string acquires a "b" prefix upon conversion (as noted in the comment from Guido's blog quoted above). After splitting, the result is

["b'/Users/andre/.crunchy/temp.py", 'from Tkinter import *\\nroot = Tk()\\nw = Label(root, text="Crunchy!")\\nw.pack()\\nroot.mainloop()\'']

So, we now have a first string with a "b'" prefix embedded in it, and a second one without. It seems that each case will have to be handled carefully on its own. And I suspect more problems will show up as we get closer to the final 3.0 release.
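In released Python 3 versions, str() applied to a bytes object still returns its repr, which is exactly what produces the stray "b'" and the escaped newlines above. A minimal sketch of the difference, using a hypothetical payload in the same "path_::EOF::_content" format; decoding with an explicit encoding is the usual remedy:

```python
# Hypothetical payload in the same "path_::EOF::_content" format as above.
raw = b'/Users/andre/.crunchy/temp.py_::EOF::_w = Label(root, text="Crunchy!")\nw.pack()'

# str() of a bytes object is its repr: a b'...' wrapper with escaped newlines.
as_repr = str(raw)
print(as_repr.split("_::EOF::_"))

# Decoding with an explicit encoding recovers the real text.
path, content = raw.decode("utf-8").split("_::EOF::_", 1)
print(path)                  # /Users/andre/.crunchy/temp.py
print(content.count("\n"))   # 1
```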

I know, I know, I'm really not following the "recommended" practice, as quoted on Guido's blog. I should probably first wait for Python 2.6 to come out. Then, I should have complete unit test coverage and use the conversion tool to create a Python 3.0 version .... However, I am not convinced that the conversion tool will be smart enough to know when a function (that I write) expects a "str" object and when it expects a "bytes" one. Furthermore, the few unit tests I had worked fine under both Python 2.5 and 3.0 ... but some functions that I had written expecting string arguments did not work in "production code", as they were getting bytes arguments. And this failed completely silently...
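Here is one way such a failure stays silent in Python 3: comparing bytes with str simply returns False instead of raising an error (unless Python is started with the -b flag). The helpers is_quit_command and as_text are hypothetical; normalizing input with a small decoding function at the boundary is one common defensive pattern, not necessarily what Crunchy does.

```python
def is_quit_command(command):
    """Hypothetical helper written on the assumption that command is a str."""
    return command.strip() == "quit"


print(is_quit_command("quit\n"))    # True
print(is_quit_command(b"quit\n"))   # False: bytes never equal str, and no error is raised


def as_text(value, encoding="utf-8"):
    """Normalize input at the boundary: decode bytes, pass str through unchanged."""
    return value.decode(encoding) if isinstance(value, bytes) else value


print(is_quit_command(as_text(b"quit\n")))  # True
```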

If I had to give some advice to someone about creating Python programs that can work with both Python 2.x and Python 3.x, I would say, like Guido: don't. :-) Unless, of course, you are like me and are doing this for fun and to learn about the differences between Python 2.x and 3.x along the way. But then, "be prepared for the unexpected", like the following: turning on a few print statements (via a "debug flag") can break some code; turn them off and the code works again... Yes, it did happen to me; I still have to figure out how...

Crunchy and Python 3.0a1

[Screenshot: Crunchy running under Python 3.0a1]

It is often said that a picture is worth a thousand words...

I have managed to make Crunchy run under Python 3.0a1. Some of the features are not working, but the interpreter and the editor work. The turtle module I have been working on also works "nicely" (read: as slow as before) with this new Python version. Unfortunately, when it is run under Python 3.0a1, Crunchy cannot load most pages, including those of the official Python 3.0 tutorial. The reason is that it uses ElementTree to parse pages, and ElementTree is unforgiving when it comes to unclosed tags (as in <link> and <meta...> for example); it also seems unable to handle the <script>s that are included on the page. I have not yet found a way to reliably "clean" the pages before parsing them with ElementTree. While I believe that I should be able to do so with a bit more work, there is a bigger problem...

Unfortunately, Crunchy does not run under Python 3.0a2, and the error messages I get have not been very helpful in tracking down the cause. However, perhaps this is due to a faulty installation. What makes me think so is that when I start a 3.0a2 session at a terminal, I get an error message when I use exit(). This is most unexpected.

In any event, the next release should include the new Crunchy turtle module and be usable with 3.0a1. Perhaps Johannes, or some curious user, will be able to figure out how to make it run under 3.0a2 as well.
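As for the ElementTree problem with unclosed tags mentioned above, here is a minimal illustration: the same <head> content parses once <meta> and <link> are written as self-closing XML tags, and fails otherwise. The helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET

# Typical HTML: <meta> and <link> are void elements with no closing tag.
html_head = '<head><meta charset="utf-8"><link rel="stylesheet" href="style.css"></head>'
# The same content written as strict XML, with self-closing tags.
xhtml_head = '<head><meta charset="utf-8"/><link rel="stylesheet" href="style.css"/></head>'


def parses(markup):
    """Return True if ElementTree accepts the markup as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError as err:
        print("ParseError:", err)
        return False


print(parses(html_head))   # False (after printing the ParseError)
print(parses(xhtml_head))  # True
```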

Tuesday, December 25, 2007

Slow turtle ... in time for Xmas

One of the tasks assigned in Google's HOP contest was to design a simple turtle graphics module for Crunchy. This was done successfully by a student as a prototype. The prototype had some unfortunate limitations in terms of the number of turtles and simultaneous graphics canvases that could exist on the same page, but it did give me the impetus to use the student's code as a proof of concept and implement a more complete turtle module for Crunchy.

Playing with turtles, and trying to draw fairly complex shapes, made me realize that the combination of an html canvas and Crunchy's comet communication makes for an extremely slow turtle. It would be really nice to find a better (faster) way.

The next Crunchy release should include that turtle module ... and an additional bonus: Crunchy can now be launched successfully using either Python 2.5 (or 2.4) or Python 3.0a1. And the turtle module works with both.

At the moment, not all of Crunchy's features are supported when using Python 3.0. However, this should no longer be the case by the time version 1.0 comes out.

And, for those who might be tempted to point out Guido's blog entry about not making programs compatible with both Python 2.x and 3.x, please don't bother. I realize that it is not wise in general to try to do so. However, given Crunchy's design philosophy of making it as easy as possible for students/teachers/tutorial writers to use, it does make sense: download, unzip, double-click; nothing else should be needed to start having fun with Python, no matter what new Python version gets installed.


Tuesday, December 18, 2007

(NOT) Bitten by PEP 3113

UPDATE: The comments left on this post (1 and 3 in particular) corrected my misreading of PEP 3113. There is no such wart as I describe in Python 3.0. I should have known better than to question GvR and friends. :-) I'm leaving this post as a reference.

In trying to make Crunchy useful & interesting for beginning programmers to learn Python, I designed a small graphics library following some "natural" notation. As an aside, Johannes Woolard is the one who made sure that this library could be easily used interactively within Crunchy. I mention his name since too many people seem to assume that I am the only one involved in Crunchy's design. Anyway, back to the library...

In that library, the function used to draw a line between two points uses the syntax

line((x1, y1), (x2, y2))

for example: line((100, 100), (200, 200))


which should be familiar to everyone. Unfortunately, following the implementation of PEP 3113 in Python 3.0, this syntax is no longer allowed. This is ... annoying! There are two alternatives I can use:

line(x1, y1, x2, y2)

for example: line(100, 100, 200, 200)


or

line(point_1, point_2)

where point_a = (x_a, y_a). Update: with this second definition, it will be possible to invoke the function as
line((100, 100), (200, 200))

Of course, either of these two options is easy to implement (and is going to be backward compatible with Python 2k). However, I don't find either one of them particularly clear for beginners (who might be familiar with the normal mathematical notation), and I consider this a (small) wart of Python 3k.
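As the update notes, PEP 3113 only removed tuple unpacking in a function's parameter list (a definition such as def line((x1, y1), (x2, y2)):); calls that pass tuples are unaffected. A minimal sketch of the second option, with a hypothetical line() that returns the endpoints instead of drawing anything:

```python
# Python 2 only (removed by PEP 3113):
#     def line((x1, y1), (x2, y2)): ...

def line(point_1, point_2):
    """Hypothetical stand-in for the library's line(); returns the endpoints."""
    x1, y1 = point_1  # unpack inside the body instead of in the signature
    x2, y2 = point_2
    return (x1, y1, x2, y2)


print(line((100, 100), (200, 200)))  # (100, 100, 200, 200)
```

The caller still writes the familiar mathematical notation; only the function definition changes.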

reStructuredText files and Crunchy

Crunchy can now handle reStructuredText (.rst) files in the same way it can process plain html ones! This requires the user to have docutils installed - which is normally the case for anyone that writes .rst files.

The test coverage for Crunchy is slowly improving. Currently, 10 of Crunchy's approximately 40 modules are mostly covered by doctest-based unit tests. Since I keep the unit tests in .rst files, these can now be browsed "pleasantly" using Crunchy itself.
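Python's standard doctest module can run such a file directly with doctest.testfile, treating ordinary prose as commentary and executing the >>> examples. A minimal sketch, assuming a hypothetical test file written to a temporary location (Crunchy's own test files and runner are not shown):

```python
import doctest
import os
import tempfile

# A hypothetical .rst test file: prose plus interactive examples.
RST_TESTS = """\
Testing a tiny helper
=====================

Some explanatory prose, which docutils (or Crunchy) can render nicely.

    >>> words = "Crunchy likes Python".split()
    >>> len(words)
    3
    >>> words[-1].upper()
    'PYTHON'
"""

with tempfile.NamedTemporaryFile("w", suffix=".rst", delete=False) as f:
    f.write(RST_TESTS)
    path = f.name

try:
    # module_relative=False: treat path as an ordinary filesystem path.
    results = doctest.testfile(path, module_relative=False)
    print(results.failed, results.attempted)  # 0 3
finally:
    os.remove(path)
```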

Furthermore ... all the unit tests written so far work under Python 2.4, Python 2.5, and ... Python 3.0a1! This required some tedious rewriting of some parts of the code but the end result is well worth it - if only to really learn about differences between Python 2.5 and Python 3.0.

One thing that I found, which will be no surprise to TDD aficionados, is that code written without testing in mind can be quite tricky to write comprehensive tests for. Add to this the extra complication of making that code run under two incompatible Python versions, and you are on your way to major headaches. It's a good thing I am doing this only for fun!

Saturday, December 08, 2007

Launching Python 3.0 program from Crunchy running under Python 2.5

As part of Google's Highly Open Participation contest, Michele Mazzoni completed the task of creating a new option for Crunchy: starting with the next release (0.9.8.5), one can launch a program using a different version of Python than the one running Crunchy itself. While I had suggested that the alternate Python version could be set via Crunchy's configuration options (usually accessible from a Python interpreter), Michele had the brilliant idea of adding a simple input box, right on the page the program is launched from, where one can specify the path (or 'alias') of the Python version to use. This makes it extremely easy to change the interpreter version used to launch a user-written program.
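The underlying idea, running a user's program with whichever interpreter path or alias is typed in, can be sketched with the standard subprocess module. This is not Michele's implementation; run_with_python is a hypothetical helper, and subprocess.run with capture_output requires Python 3.7 or later.

```python
import os
import subprocess
import sys
import tempfile


def run_with_python(interpreter, source):
    """Hypothetical helper: run source under a chosen interpreter.

    interpreter is a path or an alias found on PATH (e.g. "python3").
    Returns (exit code, captured stdout).
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        script = f.name
    try:
        proc = subprocess.run([interpreter, script],
                              capture_output=True, text=True)
        return proc.returncode, proc.stdout
    finally:
        os.remove(script)


# Demo with the current interpreter; another path or alias works the same way.
status, output = run_with_python(sys.executable, "import sys; print(sys.version_info[0])")
print(status, output.strip())
```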

Michele has prepared a screencast demonstrating this, which should hopefully appear on ShowMeDo soon.

Thank you Michele - and thank you Google!

Tuesday, December 04, 2007

More results from GHOP

Google's Highly Open Participation (GHOP) contest is attracting a lot of attention from the right people: pre-university students. The PSF is one of ten organizations mentoring students working on Python-related projects. Since I submitted task suggestions early on and volunteered to help following a call for volunteers from Titus Brown, Crunchy has benefited from many students' contributions. Crunchy's messages have been translated into Estonian, Macedonian, Polish and Italian with, hopefully, more translations to come. Some new unit tests have been added, with more to come. There may be a couple of nice surprises coming out soon too :-)

While other projects have also benefited from GHOP students' contributions, there could be more. If you have some good ideas for mini-projects (doable in 3-5 days, at a couple of hours per day with perhaps one full day), your suggestions would most likely be welcome. Just check out the GHOP Python discussion group. And, if you would like to join the (too small) ranks of Python mentors, please do; we need all the help we can get.