Sunday, October 19, 2008

Seeking advice on parsing

Dear Lazyweb,

As a follow-up to my previous post, I have a question...

Suppose you wanted to design an application that used parsing as a core element and you wanted this application to be easily extended by users. Furthermore, you were hoping that the users would contribute back some parsers that could be included in future versions. Would you:

1a) Use pyparsing and require all potential users of your application to download it separately.
1b) Use pyparsing but include it bundled with your application.
2) Use regular expressions (re module in standard library) and expect everyone to do the same.
3) Use some other module in the standard library.
4) Use some other 3rd party parsing package.

6 comments:

Cory said...

Use simpleparse, and include it as part of your application.

Simpleparse uses a language syntax that is very similar to (E)BNF, and therefore will be familiar to many of your contributors. If you are writing something to do with language parsing (e.g. an editor), some of your contributors may have already contributed their parsers to other applications, and it would be convenient for them if you allowed them to use something that looked like familiar territory.

Pyparsing is powerful (I've used it) but I think simpleparse gives a slight edge when it comes to encouraging contributions, and is no less powerful.

Doug Hellmann said...

Another route you could take is to define your own API and let the extension author use whatever library they want. If your API passes text to the extension and expects an application-specific object as a return value, you don't have to care what parsing technology they use.

dowski.com said...

These are both good suggestions.

I like Cory's better though. simpleparse is a great library. It is reasonably fast and quite Pythonic.

I would think that supporting only one parsing solution will make it easier on you, the project maintainer.

Also, do you see contributors writing completely new grammars for your application to consume or just new commands?

André said...

@dowski.com: At the moment (the program is very embryonic), I simply call a parser with the code given as an argument, and expect to receive a processed string (SVG code for the picture) in return. Thus contributors would write their own grammar. In this sense, it follows Doug Hellmann's suggestion.

The prototype I am working on (a turtle module example) is simple enough that it can apparently be handled quite easily by regular expressions. Once I complete it, I will have a better idea.
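As an illustration of why a turtle-style language can be handled with the standard `re` module alone, here is a minimal sketch. It is not the prototype described above: the command syntax (`forward N`, `left N`, `right N`, one per line), the function name `turtle_to_svg` and the choice of SVG `<line>` elements are all assumptions made for the example. It follows the "code in, SVG string out" contract mentioned earlier.

```python
import math
import re

# Hypothetical command syntax: one command per line, e.g. "forward 50" or "left 90".
COMMAND_RE = re.compile(r"^\s*(forward|left|right)\s+(-?\d+(?:\.\d+)?)\s*$")

def turtle_to_svg(source: str, width: int = 200, height: int = 200) -> str:
    """Translate simple turtle commands into an SVG string using only re."""
    x, y = width / 2, height / 2  # the turtle starts in the centre
    heading = 0.0                 # degrees; 0 points right (east)
    segments = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        match = COMMAND_RE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: cannot parse {line!r}")
        command, value = match.group(1), float(match.group(2))
        if command == "forward":
            new_x = x + value * math.cos(math.radians(heading))
            new_y = y - value * math.sin(math.radians(heading))  # SVG y axis points down
            segments.append(
                f'<line x1="{x:.1f}" y1="{y:.1f}" '
                f'x2="{new_x:.1f}" y2="{new_y:.1f}" stroke="black"/>'
            )
            x, y = new_x, new_y
        elif command == "left":
            heading += value
        else:  # "right"
            heading -= value
    body = "\n  ".join(segments)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">\n'
        f"  {body}\n</svg>"
    )
```

For example, `turtle_to_svg("forward 50\nleft 90\nforward 50")` produces two `<line>` elements, from (100.0, 100.0) to (150.0, 100.0) and then up to (150.0, 50.0). Once the language needs nesting (loops, procedures), a single line-oriented regex like this stops being enough, which is roughly where a grammar-based library would start to pay off.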

Paddy3118 said...

If you provided a parsing library, would that mean that regexp based 'parsers' wouldn't be allowed?
I would like the option of implementing parsers with regexps, even though a standard library would be a strong incentive to learn a 'proper' parsing framework.

- Paddy.

André said...

@paddy: Yes, one could always use regular expressions. In fact, this is what I am starting with.

The idea of choosing/providing a "standard" parser library is to make it (potentially) easier to develop new parsers. However, all the use cases I've thought of so far could be handled reasonably well with regex.