Friday, October 24, 2008

docpicture: getting closer



This is just a progress report for the curious among you: the previous two images were generated automatically from the docpicture code written above them. If everything goes well, by the end of the weekend I'll be ready to give a sneak preview of the code to anyone interested. Feel free to contact me.

Sunday, October 19, 2008

docpicture + svg generation: first prototype working

As outlined in a previous post, I have decided to use svg to embed pictures in html pages generated from docstrings. Of course, this could be generalized to other cases than docstrings; for example, this could be implemented as a reStructuredText directive. In the course of playing with generating such pages with inline svg code, I observed the following:
  1. If a file is saved locally and loaded within Firefox, it should be saved with a ".xml" (or possibly ".xhtml") extension.
  2. If a file is served dynamically from a server, all that is needed is that its content be identified as "application/xhtml+xml" [as I had mentioned previously].
  3. If the file is put on a "generic webserver" that can't be configured by the user, Firefox will ignore the svg code if the extension of the file is ".xml" or ".html". However, I did find a workaround: use a ".xhtml" extension and, when prompted by Firefox as to what application to use to open such file, select Firefox itself. The file will be downloaded locally and displayed correctly. At least, this is what happens on a Mac with Firefox 3.
There might be another way to do this; if so, I would be interested in knowing how. In the meantime, for those interested, here is the output of a first working test case.

Update: The test case has been improved with styling.

Update 2: A new picture perhaps gives a better idea of a more realistic use case.

Seeking advice on parsing

Dear Lazyweb,

As a follow-up to my previous post, I have a question...

Suppose you wanted to design an application that used parsing as a core element and you wanted this application to be easily extended by users. Furthermore, you were hoping that the users would contribute back some parsers that could be included in future versions. Would you:

1a) Use pyparsing and require all potential users of your application to download it separately.
1b) Use pyparsing but include it bundled with your application.
2) Use regular expressions (re module in standard library) and expect everyone to do the same.
3) Use some other module in the standard library.
4) Use some other 3rd party parsing package.

Saturday, October 18, 2008

More on docpictures and (almost) minimal example of web server with inline svg

After playing some more with the idea of embedding pictures in docstrings, I've settled on using dynamically created svg images instead of png images. The basic idea (which I'll write about in more details later) is to have something like

.. docpictures:: some_type
highly readable description

and have the docpicture module parse the "highly readable description" based on the syntax defined in some_type. Note that I chose this notation to be compatible with reStructuredText directives. The "highly readable description" will depend on the context. For example, the mathematically inclined will be able to read this:

.. docpictures:: equation
e^{i\pi} = -1

while the following might be easy to understand by programmers:

..docpicture:: uml_sequence
User --> Browser: clicks on a link
Browser -> Crunchy: requests file
Crunchy --> Browser: Authentication Request
Browser --> Crunchy: Authentication Response
note over Crunchy: Retrieves and processes file
Crunchy -> Browser: Sends processed file
Browser --> User: Displays file
In this last example using the syntax of this site which generates the following picture:



(Btw, the picture above is generated automatically each time this page is loaded - so it depends on the availability of the server. I have used a static version of this picture in the documentation for Crunchy.)

Using svg to do graphics is fairly easy. Using svg as embedded objects in an html document requires a bit of searching on the internet. Creating such documents and displaying dynamically requires even more searching (or perhaps more careful reading...). The thing important to remember is to serve the document as "application/xhtml+xml" instead of the usual "text/html". I thought it would be useful to share an almost minimal working example (tested only on Firefox) and perhaps save some time for others that would like to do the same. Feel free to adapt it as you like.

svg_test = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:svg="http://www.w3.org/2000/svg">
<head></head>
<body>
<h1>SVG embedded inline in XHTML</h1>
<svg:svg width="300px" height="200px">
<svg:circle cx="150px" cy="100px" r="50px" fill="%s"
stroke="#000000" stroke-width="5px"/>
</svg:svg>
</body>
</html>
"""


import BaseHTTPServer
import webbrowser

colors = ["#330000", "#660000", "#990000", "#cc0000", "#ff0000"]
index = 0
class WebRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
def do_GET(self):
global index
self.send_response(200)
self.send_header('Content-type', 'application/xhtml+xml')
self.end_headers()
self.wfile.write(svg_test% colors[index % 5])
index += 1

port = 8000
server = BaseHTTPServer.HTTPServer(('',port), WebRequestHandler)
webbrowser.open("http://127.0.0.1:%s"%port)
server.serve_forever()

And, if you know how to suppress the output of the webserver, feel free to leave a comment.

Tuesday, October 14, 2008

Viewing embedded pictures within docstrings

In a recent post on the Python mailing list, it was suggested that it would be useful if "small" pictures could be embedded within docstrings as additional information. As is often the case, many words were written ... but little code was produced. As I have been guilty of this myself in the past, I decided that it was time to do things differently. After a quick prototype sent to the Python list, I wrote this recipe which gives a simple way to embed and display images inside a docstring in a totally transparent way.

Is this something that people would find useful? Any suggestions for improvements? Should I implement this within Crunchy? (Btw, Crunchy 1.0 alpha 1 has been released just last week).

Sunday, June 22, 2008

Monkeypatching doctest

The Python doctest module rocks. Lately, I have been using it to write unit tests for Crunchy: for each module, I write a reStructuredText file which contains sample tests written as simulated interpreter sessions, using doctest.testfile(). This has worked really well in general ... however, I have encountered one small annoyance, which I managed to get rid of in an "elegant" way using Monkeypatching.

doctests allow the use of directives. One "powerful" directive is the ELLIPSIS directive. Quoting from the documentation:
When specified, an ellipsis marker (...) in the expected output can match any substring in the actual output. This includes substrings that span line boundaries, and empty substrings, so it's best to keep usage of this simple. Complicated uses can lead to the same kinds of "oops, it matched too much!" surprises that .* is prone to in regular expressions.
Unfortunately, I encountered a case where the ellipsis marker did not allow enough matching! Consider the following situation: I have a program (Crunchy!) that saves the user's preferences (including the language) in a configuration file each time its value is changed. It also gives some feedback to the user whenever this happens.

>>> original_value = crunchy.language
>>> set_language('en') # setting this value for some standardized tests
Language has been set to English

At the end of the test, I want to restore the original value.
>>> set_language(original_value) #doctest: +ELLIPSIS
...
Here I want the ellipsis (...) to match the string that is going to be printed out in the original language as I have no idea what this string will look like. The problem is that the ellipsis in this case is thought to be a Python (continuation) prompt and not a string that is "matched". One workaround that I had been using was to modify set_language to add a parameter ("verbose") that was set to True by default but that I could turn off when running tests. While this is simple enough that it surely would never (!) introduce spurious bugs, it does not feel right; one should not modify functions only for the purpose of making them satisfy unit tests.

According to the documentation,
register_optionflag(name)

Create a new option flag with a given name, and return the new flag's integer value. register_optionflag() can be used when subclassing OutputChecker or DocTestRunner to create new options that are supported by your subclasses. register_optionflag should always be called using the following idiom:
  MY_FLAG = register_optionflag('MY_FLAG')

This is great ... except that I want to used doctest.testfile() which does not allow me to specify a subclass of OutputChecker to use instead of the default. Also, I wanted to use as much of possible of the existing doctest module, with as little new code as possible.

This is where monkeypatching comes in.

After a bit of work, I came up with the following solution:

from doctest import OutputChecker
original_check_output = OutputChecker.check_output
import doctest

IGNORE_ERROR = doctest.register_optionflag("IGNORE_ERROR")

class MyOutputChecker(doctest.OutputChecker):
def check_output(self, want, got, optionflags):
if optionflags & IGNORE_ERROR:
return True
return original_check_output(self, want, got, optionflags)

doctest.OutputChecker = MyOutputChecker

failure, nb_tests = doctest.testfile("test_doctest.rst")
print "%d failures in %d tests" % (failure, nb_tests)

And here's the content of test_doctest.rst

Test of the new flag:

>>> print 42
42
>>> print 2 # doctest: +IGNORE_ERROR
SPAM!


This yields a test with no failures. There might be a more elegant way of doing this; if so, I would be very interested in hearing about it.

Friday, April 18, 2008

Thoughts on Google Summer of Code 2008 - part 1

In just a few days, Google will make some announcements that will please many hundreds of students and disappoint even more. I think we should all focus on the positive side when the announcements are made. It is, after all, a fantastic thing that a company is spending millions of dollars so that some students get the chance to program on Open Source software as a summer job.

Just think of it. Who could have predicted, just five years ago, that a company would spend that kind of money on students who would work on someone else's project?

This is amazing - and many are now taking it for granted.

I find it great that the Python Software Foundation is an organization that can mentor SoC students. With the excellent supporting work of James Tauber as coordinator, many promising students are going to be paired with a mentor, hopefully leading to great projects to be completed this summer.

I have seen some grumblings on some SoC related lists that have made me thought about some of the "problems" I have seen. Note that these are very minor compared with the strong positive points. I will be discussing those in part 2, after the official announcements are made.

  • Because the PSF is an umbrella organization, most students work on different projects, unrelated with each other. As a result, they tend to have limited interactions with the greater Python community. I think there should be a "meeting place" where all the students would meet - perhaps a mailing list to which they have all to contribute once a week, sharing their progress, etc.
  • Not enough positive publicity is given to "successful students", i.e. those that continue to contribute to the Python community after the summer is over. For Crunchy, it has been one success (Johannes Woolard) out of a total of 3 students over the past 2 years. I don't know of many other success stories from other projects ... Alex Holkner comes to mind ... but I feel I should know more names of successful students. (I know there's another Alex or Alessandro who has contributed to the Python core and was involved with GHOP....)
  • With the exception of a few people like James Tauber and Titus Brown with whom I have had a few email exchanges, I do not feel that as a past/potential mentor I am as much part of a community as I feel should be the case. There is a mentor discussion list, but it does not seem to be the kind of place to generate community building discussions.
In terms of projects submitted, I would describe them to belong in the following categories:

1. Contributions to the "standard" core (cpython code, or standard library)

2 a. Contributions to "non-standard core" (e.g. Jython, PyPy, TinyPy?)
2 b. Contributions to 3rd party libraries (e.g. Numpy, Pygame)
2 c. Contributions to major projects whose end users have to use Python (e.g. SAGE)
2 d. Contributions to projects that can be used to teach Python [Crunchy, of course ;-), but there are others ... that will be for part 2]

3 a. Contributions that propose some new "standards" for Python programmers, never discussed before in the Python community.
3 b. Projects that happen to be written in Python, but whose end users are not exposed (or minimally exposed) to Python.
3 c. Projects that are not written in Python, that may or may not be usable in all OS, and that aren't more useful to Python programmers than they would be to people using other languages.
3 d. et cetera

Assuming that all projects are well-thought of (which is not always the case), I feel that:

  • Projects in category 1 deserve to be fully supported. The Python community need more capable people contributing to the core to prevent burnout for the current contributors. Perhaps, in a few years, after working some more on Crunchy, I'll feel capable of joining that group and contributing effectively (and have the time to do so).
  • Projects in categories 2 a-d are worthy of support. There are of course more such projects submitted than can possibly be supported, so some difficult choices had to be made. (Kudos to James for guiding this process.) Many people are going to be disappointed, but this was unavoidable.
  • Projects in categories 3 a-d are a puzzle to me. I don't understand their appeal for the PSF (and I know I am not the only one), but it seems that very few people are willing to take a public stance on this and debate the issue. Note that this comment is made as an observation on the discussions that took place so far and does not necessarily reflect on any decision that has been made.
This is it for the negative comments. I can't wait for the announcements from Google to focus entirely on the more positive side.