Thursday, December 27, 2007

Crunchy and Python 3.0a1

[Screenshot: Crunchy running under Python 3.0a1]

It is often said that a picture is worth a thousand words...

I have managed to make Crunchy run under Python 3.0a1. Some of the features are not working, but the interpreter and the editor work. The turtle module I have been working on also works "nicely" (read: as slow as before) with this new Python version. Unfortunately, when it is run under Python 3.0a1, Crunchy cannot load most pages - including those of the official Python 3.0 tutorial. The reason is that it uses ElementTree to parse pages, and ElementTree is unforgiving when it comes to unclosed tags (such as <link> and <meta...>); it also seems unable to handle the <script> elements that are included on the page. I have not yet found a way to reliably "clean" the pages before parsing them with ElementTree. While I believe that I should be able to do so with a bit more work, there is a bigger problem...

Crunchy does not run at all under Python 3.0a2, and the error messages I get have not been much help in tracking down the cause. However, perhaps this is due to a faulty installation. What makes me think so is that when I start a 3.0a2 session at a terminal, I get an error message when I use exit(). This is most unexpected.

In any event, the next release should include the new Crunchy turtle module and be usable with 3.0a1. Perhaps Johannes, or some curious user, will be able to figure out how to make it run under 3.0a2 as well.
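For the curious, the ElementTree problem described above is easy to reproduce. What follows is only a minimal sketch of the kind of "cleaning" I have in mind, not what Crunchy actually does: the naive_clean() helper, the list of void tags, and the sample page are all made up for illustration, and the code is written against the standard library of a current Python 3 release rather than 3.0a1. A regular expression like this will certainly miss cases that real pages contain.

```python
import re
import xml.etree.ElementTree as ET

# Tags that HTML allows to stay unclosed (a small subset, for illustration).
VOID_TAGS = ("link", "meta", "br", "img", "hr", "input")


def naive_clean(html):
    """Hypothetical helper: drop <script> blocks and self-close void tags."""
    # Remove script elements entirely, including their contents.
    html = re.sub(r"(?is)<script\b.*?</script\s*>", "", html)
    # Turn <link ...> into <link ... />; already-closed tags are normalized.
    pattern = r"(?i)<(%s)\b([^<>]*?)\s*/?>" % "|".join(VOID_TAGS)
    return re.sub(pattern, r"<\1\2 />", html)


def parses(text):
    """Return True if ElementTree accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A made-up page with the same kinds of problems: unclosed <meta> and
# <link> tags, and a script containing a bare "<".
PAGE = """<html><head>
<meta name="generator" content="example">
<link rel="stylesheet" href="style.css">
<script type="text/javascript">if (1 < 2) { var x = 1; }</script>
<title>Tutorial</title></head>
<body><p>Hello</p></body></html>"""

print(parses(PAGE))               # the raw page is rejected
print(parses(naive_clean(PAGE)))  # the cleaned page is accepted
```

Dropping the scripts altogether is the crude part of this approach; it is acceptable only if the scripts on the original page are not needed once the page has been processed.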

2 comments:

Anonymous said...

You can use BeautifulSoup to parse nasty HTML and get back a nice data structure that resembles that of ElementTree.

Your second choice is to use Tidy (a tool that turns nasty HTML into well-formed XML). I think there are Python bindings for that as well.

André Roberge said...

I already use BeautifulSoup (via ElementSoup) when Crunchy is running under Python 2.x - the problem is that it does not work under 3.x.

Tidy might be a possible choice. I have been trying to keep the dependencies to a minimum, but may make a temporary exception (until BeautifulSoup is available for Python 3.x).

Thanks for the suggestions.