Saturday, March 22, 2008

Using Clone Digger

There's a new tool available for Python programmers: Clone Digger. While it has not been officially released, it is available from the svn repository. Clone Digger finds code duplications in a given project, and creates a fairly comprehensive report (html file). Seeing the duplications on a screen is a powerful motivation for refactoring.

Check it out!

Update:

Just to make it clear: I had nothing to do with this project; I just found out about it via the gsoc-python-mentors list.

Inspiration and persistence

While mindlessly wasting time instead of programming (or rather, selectively reading the internet), I came across this gem by Seth Godin, which I reproduce in its entirety:

Persistence

Persistence isn't using the same tactics over and over. That's just annoying.

Persistence is having the same goal over and over.

That's it.

A wiser person would most likely leave it at that. However, this led me to think about my goals when it comes to programming, which I thought I should write down, if only to help me reflect upon them again at a later time. I can sum them up as follows:

  1. Do something that is fun but that gives me some sense of accomplishment rather than just wasting time.
  2. Find ways to make it easier for others to learn programming (in Python).
In doing so, I have found myself oscillating between two extremes:
  1. Trying to follow the "release early, release often" philosophy.
  2. Trying to get everything "just perfect" before releasing anything.
Trying to get things "just perfect" is something that can lead to procrastination and delays. As an example, rur-ple's version 1.0 release candidate 3 has not been updated since July 2007. The next version should be the final 1.0 ... but somehow, I am not happy with many details and I'd like to get everything right for 1.0. Too often I read about (usually commercial) software which is officially released and is considered by its users to be a Beta version. All open source programmers I have met have a sense of pride in their work that I share. So I postpone the final release and end up working on something else...

I went the other way with a little utility called lightning compiler (now at version 2.1), whose version 1.0 was released as a recipe in the online Python cookbook. Much of the rapid evolution of lightning compiler came from user feedback, as expected from the "release early, release often" philosophy. Yet, following the same philosophy has generated relatively little feedback for rur-ple or for Crunchy to date. I did get some feedback for rur-ple, which has been used at an elementary school in Austria, at a high school and a university in the U.S., among others, but it has often been very indirect.

Still, I am persistent. Following Seth Godin's definition of persistence, my second goal written above can be described as finding a solution to Why Johnny can't code. Or, as I have written elsewhere:
My goal is to provide an introduction to programming which is as "smooth" as possible. We sometimes hear the phrase "steep learning curve" to characterize some difficult to grasp concept. I think it is important to have as few "steep learning curves" as possible in the learning process. GvR [Guido van Robot] uses a slightly easier syntax than Python ... but at the expense of having a "step-like learning curve" when one wants to go from GvR's world to Python programming. Since Rur-ple uses Python, there is no transition to speak of.
Both rur-ple and Crunchy, and to a lesser extent lightning compiler (which has been incorporated within rur-ple) have been inspired by that goal.

However, sometimes I stray from that goal. For example, inspired by an earlier post on Georg Brandl's remarkable Sphinx, Crunchy now includes a prototype for an automated documentation testing framework along the lines of sphinx.ext.doctest, which was released yesterday. My intention is to update Crunchy's implementation so that it can be totally compatible with Sphinx's. And while I believe that this is a neat (and fun!) thing to include in Crunchy, it only very indirectly contributes to my overall goal and ends up delaying the 1.0 release for Crunchy.

Blogging too can be a distraction. However, it is my hope that it may generate a few comments that will help inspire me to make Crunchy even more useful.

Success is the result of inspiration and persistence.

Friday, March 07, 2008

Crunchy: Pycon 2008 release

Crunchy is getting really close to a 1.0 version. To mark the Pycon 2008 event (that I won't be able to attend), I just did a new release (0.9.9). It has a few new goodies that I won't list here, leaving Johannes to do the demonstration. As for me, I am heading down South for a vacation with my kids.

Note: the opening Crunchy page indicates that this is version 0.9.8.6 - which is incorrect.

What is left to be done for version 1.0 is cleaning up the existing documentation (proofreading, proofreading, proofreading) and adding a few more pages to it. New features will have to wait until after 1.0 ... unless we get feedback from Pycon attendees for "must have" features that we could implement quickly.

As far as I know, there are no bugs (famous last words). If you find any, please let us know.

Friday, February 29, 2008

Pycon and Crunchy

This year's Pycon program looks very interesting. I wish I could be there but, alas, the timing was just wrong for me this year. This is doubly disappointing as I would have been able to meet with Johannes Woolard in the flesh. Yes, forget Guido van Rossum, Alex Martelli and other famous names: the one person I wanted to meet is Johannes. For more than a year and a half, I have had the pleasure of collaborating with Johannes on Crunchy, without ever meeting him. This year, Johannes will be the one showing Crunchy off. I'm sure he'll do a great job.

And, if anyone is looking to hire a bright, young, hard-working programmer, Johannes will graduate from Oxford this year.

Friday, February 22, 2008

99 problems: looking for volunteers

Some time ago, Dr. Werner Hett created a list of 99 Prolog problems that could be used to practice skills in logic programming. More recently, a Ruby learner posted a Ruby version of the first 10 problems, and his solutions. This seemed to be a good idea, especially if one makes use of doctests ... and Crunchy :-). So, I've started my own version of these, which you can get as a zip file (containing 6 problems and their solutions) from the Crunchy main page. If you have never done so before, to load a local html file within Crunchy, you simply click on the "Browsing" menu on the left hand side, scroll down until you reach the "Closer to home" section, and follow the instructions.

Note that with the next version of Crunchy (the current one is 0.9.8.6) you will be able to start Crunchy with an arbitrary file using something like

python crunchy.py --url=full_local_path_or_url

It would be nice if there could be a complete Python version of the 99 Prolog problems. If anyone is interested in helping, please do not hesitate to contact me.

Friday, February 01, 2008

Automated documentation code testing - part 2

Thanks to Crunchy's simple plugin architecture, after only a few hours of coding the automated documentation testing (for html files) described here has been implemented (the "first approach" described, that is.) It will be part of the next Crunchy release. In theory, code samples for a complete book could be all tested at the click of a button, provided that the book is available as an html document. The next step will be to define a few new directives so that reStructuredText documents can be used as well.

Now, while I have a few sample test files, it would be nice to find someone who has a real life document with embedded Python code samples as a test user...

Tuesday, January 29, 2008

Automated documentation code testing

During the first year or so of development work on Crunchy, I probably got a nickname of "Dr. NO!" by early Crunchy adopters as I often resisted suggestions for adding new capabilities. At the time, Crunchy required that html pages have additional markup added (vlam = very little additional markup) so that Crunchy could process them properly. I wanted Crunchy-enabled tutorials to be very easily created, without much additional work from tutorial writers, so that Crunchy would be adopted by many people. Most of the suggestions that were made, including some by Johannes, both while he was sponsored by Google as a Summer of Code student and afterwards when he became a co-developer, were rejected by me for that reason. Since then, the situation has changed, mainly for two reasons:
  1. Johannes created a new basic infrastructure for Crunchy where we can introduce new capabilities via plugins, without needing to change a single line of the core in most instances.
  2. Based on the new architecture, I came up with a new way to process pages so that no additional markup was needed for Crunchy to do its magic. This is what makes it possible, for example, to interact with the official Python tutorial on the python.org site.
Now that it is so easy to implement new capabilities, I am revisiting some ideas I had rejected or ignored before. The struggle I have is to decide when enough is enough before finally having a version 1.0 officially released.

In any event, after reading some comments on this post by Georg Brandl, I started thinking about adding a new option to test code embedded in documentation. To quote from the comments on that post:

One thing that is seriously needed is the ability to run and test code snippets in some fashion. It's just too easy for documentation to get out of date relative to the code, and if you can effectively "unit test" your docs, you're in much better shape.

And I don't mean like doctests, because not everything lends it self well to that style of testing. If it's possible to mark up some code as being for test fixture and some code as being what belongs in the doc, that would be good.
Alternatively, from another reader:

For me a key is being able to test code in the docs, and think the key is being able to "annotate" a code snipit with information about the context in which it should run, and the output it should give.

I think that Crunchy is a very good platform to implement this. There are currently three complementary options I am considering, one of which I have started to implement.


The first option is to have something like the following [note that while I use html notation, Crunchy is now capable of handling reStructuredText, including having the possibility of dealing with additional directives]:

Some normally hidden code, used for setup:
<pre title="setup_code name=first">
a=42
</pre>

Followed by the code sample to be tested:
<pre title="check_code name=first">
print a
</pre>

And the expected output:
<pre title="code_output name=first">
42
</pre>

Upon importing a document containing such examples, Crunchy would insert a button for each code sample allowing the user to test the code by clicking on the button, invoking the appropriate setup, and comparing with the expected output. Alternatively, all such code samples in a document could be run by a single click on a button inserted at the top of a page. A javascript alert could be used to inform the user that all tests passed - otherwise error messages could be inserted in the page indicating which tests failed or passed.
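
To make the first approach concrete, here is a minimal sketch of what the "run the setup, run the sample, compare with the expected output" step could look like. It is not Crunchy's actual implementation: the function name run_documentation_test is hypothetical, the three strings stand in for the contents of the setup_code, check_code and code_output blocks shown above, and the sample uses print(a) so that the sketch runs on current Python versions. Note that exec runs the code as is, without any sandboxing.

```python
import contextlib
import io


def run_documentation_test(setup_code, check_code, expected_output):
    """Run the hidden setup, then the sample, and compare the printed output.

    Hypothetical helper: the three arguments stand in for the text found in
    the setup_code, check_code and code_output blocks sharing the same name.
    """
    namespace = {}
    # The setup code runs first and shares its namespace with the sample.
    exec(setup_code, namespace)
    captured = io.StringIO()
    # Only what the sample itself prints is compared with the expected output.
    with contextlib.redirect_stdout(captured):
        exec(check_code, namespace)
    actual = captured.getvalue().strip()
    return actual == expected_output.strip(), actual


passed, actual = run_documentation_test("a=42", "print(a)", "42")
print("passed" if passed else "failed: got %r" % actual)
```

A "test everything" button would simply loop over all the named groups found in a page and collect the failures.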

This type of approach could, in theory, be used for other languages than Python; code could be executed by passing information to a separate process launched in a terminal window, with the result fed back into Crunchy as described above.

A second approach is to use the same method used by doctest to combine code sample and expected output; the setup code could still be used as described above.
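
As a rough sketch of this second approach (again an illustration, not Crunchy's code), the standard doctest module can run a sample in which code and expected output are interleaved, with the hidden setup code executed beforehand to prepare the namespace. The helper name run_doctest_sample and its arguments are assumptions made for this example.

```python
import doctest


def run_doctest_sample(sample_text, setup_code="", name="first"):
    """Run a doctest-style sample after executing some hidden setup code.

    Hypothetical helper: returns a (failed, attempted) pair of counts.
    """
    globs = {}
    # The setup code populates the names that the sample is allowed to use.
    exec(setup_code, globs)
    test = doctest.DocTestParser().get_doctest(sample_text, globs, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Failure reports are discarded here; a real tool would show them in the page.
    results = runner.run(test, out=lambda text: None)
    return results.failed, results.attempted


sample = """
>>> print(a)
42
"""
failed, attempted = run_doctest_sample(sample, setup_code="a=42")
```

With this style, the separate code_output block is no longer needed since the expected output sits right below the code.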

A third approach, this one completely different, could be used in more general situations than simply for documentation code testing.

Currently, the Python code needs to be embedded inside an html (or rst) document. However, one could create links to code that lives inside separate Python files. For example, one could have the following:

<pre title="python_file">
<span title="python_file_name"> file_path </span>
<span title="python_file_linenumbers"> some_range </span>
</pre>

When viewing the above using a normal browser, one would see something like (using a fictitious example)

../crunchy_dir/crunchy.py
[1-3, 5, 7, 10-15]
However, when viewing the same page with Crunchy, the appropriate lines would be extracted from the file and displayed in the browser. Alternatively, instead of specifying the line numbers, one could have a directive to extract a specific function/method/class as in

<span title="python_file_function"> function_name </span>

which would instruct Crunchy to extract all the code for the function definition and insert it in the document. By using such links, the code in the documentation would always (by definition) be kept in sync with the real code. I realize that this is not exactly a novel idea but one whose potential could be extended by using Crunchy in ways never seen before. However, this last approach will have to wait until after Crunchy version 1.0 has been released.
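
Since this third approach is not implemented yet, the following is only a sketch of how the two kinds of extraction might work. All function names are hypothetical, and the helpers operate on a string holding the source code rather than on a file path, to keep the example self-contained. The line-range syntax is the one from the fictitious example above, and the function lookup relies on the standard ast module (ast.get_source_segment requires Python 3.8 or later).

```python
import ast


def parse_line_ranges(spec):
    """Turn a string such as "[1-3, 5, 7, 10-15]" into a list of line numbers."""
    numbers = []
    for part in spec.strip().strip("[]").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            numbers.extend(range(int(start), int(end) + 1))
        else:
            numbers.append(int(part))
    return numbers


def extract_lines(source, spec):
    """Return the requested (1-based) lines of source, skipping missing ones."""
    lines = source.splitlines()
    wanted = parse_line_ranges(spec)
    return "\n".join(lines[n - 1] for n in wanted if 1 <= n <= len(lines))


def extract_function(source, function_name):
    """Return the source text of the named function or class, or None."""
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, kinds) and node.name == function_name:
            return ast.get_source_segment(source, node)
    return None


example_source = "import os\n\ndef greet(name):\n    return 'Hello ' + name\n\nx = 1\n"
selected_lines = extract_lines(example_source, "[1, 3-4]")
greet_source = extract_function(example_source, "greet")
```

Extracting by name rather than by line number has the advantage of surviving edits that shift code up or down within the file.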

What do you think of these ideas?