Sunday, January 09, 2005

Python as pseudo-code and Optional Static Typing

In my last post, I gave a series of examples intended to show how Python's pseudocode syntax could be preserved while adding optional static typing. While I felt that the syntax was self-explanatory (as pseudocode should be), I thought it might be useful to examine in more details the why's and the how's of my suggestion.

Before proceeding further, it might be argued that this is a case of premature optimization: Python does not currently have (optional) static typing so the concept should be discussed first, before implementation details and syntax can be looked at objectively. Yet, one of the strength of Python is its syntax, and many objections (NIMPY as GvR described it) to the proposed addition(s) to the language came from an unease with the suggested syntax.

1. Functions and optional typing information

I will use GvR's version of a Greatest Common Divisor finding function as an example and build from it; the complexity of the examples will increase as we go in an attempt to fully illustrate the suggested syntax. This function is defined as follows:

01 def gcd(a, b):
02 '''Returns the Greatest Common Divisor,
03 implementing Euclid's algorithm.'''
04 while a:
05 a, b = b%a, a
06 return b
I have included an optional docstring which, following Python's normal syntax, is indented at the same level as the function's body. The docstring is intended to be easily extractable for documentation purpose and is intended for human readers only. Suppose that it is required to provide static typing information. Using the notation I used in my previous post, here's how one might write additional syntax information.
01    def gcd(a, b):

02 where:
03 a: int, b: int
04 return c where:
05 c: int
06 '''Returns the Greatest Common Divisor,
07 implementing Euclid's algorithm.
08 Input arguments must be integers;
09 return value is an integer.'''
10 while a:
11 a, b = b%a, a
12 return b
The suggested where keyword, followed by a mandatory colon, follows the usual Python syntax of introducing a bloc of code identified by its indentation level. So we now have three structural items at the same indentation level: the static typing information, the docstring, and
the function body.

The docstring is between the typing information and the actual code, allowing to better separate visually the two of them. The docstring has been updated as well to inform a potential user
reading the documention as to the particular of this function. I have given some thoughts about including the typing information within the docstring (à la doctest) in order to avoid writing about the type information twice, as in the following:
01    def gcd(a, b):

02 '''Returns the Greatest Common Divisor,
03 implementing Euclid's algorithm.
04 where:
05 a: int, b: int
06 return c where:
07 c: int'''
08 while a:
09 a, b = b%a, a
10 return b
However, I find 1) that it is not easier to read; 2) that it would complicate extraction of the docstring for inclusion in a standard documentation (i.e. spaces are significant for only part
of the docstring); 3) it doesn't "scale" well with added features like pre-conditions and post-conditions.

2. Pre- and post-conditions

Building on the previous examples, here is how pre- and post-conditions might might be added within the suggested new syntax with, in this case, somewhat artificial conditions (not required by this algorithm).
01    def gcd(a, b):

02 where:
03 a: int, b: int
04 assert a != 0 and b != 0
05 return c where:
06 c: int
07 assert c <= abs(a) and c <= abs(b)
08 '''Returns the Greatest Common Divisor,
09 implementing Euclid's algorithm.
10 Input arguments must be integers,
11 and can not be both zero;
12 return value is an integer.'''
13 while a:
14 a, b = b%a, a
15 return b

The post-condition assertion

07 assert c <= abs(a) and c <= abs(b)

is an internal verification that the code is correct. It is irrelevant to the documentation which is one more reason to separate it from the docstring, even though such information is sometimes including in docstring. This notation, using the standard assert statement is different from what I suggested in my previous post and respects Python's current syntax.

One can easily imagine that such typing information and conditions might be shared by many functions. Rather than replicating code, it might be possible to define abstract functions, which are functions that have no body, only type information and/or conditions. For example, we could have
01    def two_integer_input(i, j):  # abstract function: no body

02 where:
03 i: int, j: int
04 '''Accepts only two integers as input.'''
05
06
07 def _gcd_return_info(a, b): # abstract function: no body
08 where:
09 return c where:
10 c: int
11 assert c <= abs(a) and c <= abs(b)
12
13
14 def gcd(a, b):
15 where:
16 two_integer_input(a, b)
17 assert a != 0 and b != 0
18 _gcd_return_info(a, b)
19 '''Returns the Greatest Common Divisor,
20 implementing Euclid's algorithm.
21 Input arguments must be integers,
22 and can not be both zero;
23 return value is an integer.'''
24 while a:
25 a, b = b%a, a
26 return b

I realise that this might be starting to look like a return to past suggestions during the decorator debate. However, given that the optional static typing proposal is new, it should be looked at objectively, without rejecting off-hand proposals that were rejected in a different context.

3. Interfaces

Instead of dealing with functions, in oop we have classes and methods. Classes can implement interfaces, which share some similarities with what we called abstract functions above. Here is how, using the notation we have suggested, we might introduce interfaces in the language. Once again I am using examples suggested by Python's BDFL in his blog.
01    interface I:

02 def foo(x):
03 where:
04 x: int
05 x > 0
06 return r where:
07 r: str
08
09 class C(object; I): # derived from object; implements I
10 def foo(self, x):
11 return str(x)

I suggest using a semi-colon to separate based classes from interfaces for clarity. Thus we could have:
01    interface I1(I2, I3):   # derived from I2 and I3, both interfaces

02 def foo(a: t1, b: t2):
03 where:
04 a: t1, b: t2
05 return c where:
06 c: t3
07
08 class C(C1, C2; I1): # derived from C1, C2; implements I1
09 def foo(a, b):
10 return a+b
For a class that does not implement interfaces, the following statements would be equivalent
class C(object):  # preferred

# and
class C(object;): # allowed

Classes that do not derived from a base class (aka 'old-style') might still be accommodated via
class C(; I1, I2)

although presumably they will have disappeared from the language by the time static typing information will have been implemented.


5 comments:

Anonymous said...

For this to work reliably, the dynamic nature of Python should be reduced a bit. Like, what does `abs' refer to? The built-in function? Or perhaps a user-defined function with the same name?

What if `int' is subclassed, and the `__mod__' or `__lt__' method overridden?

Anonymous said...

>What if `int' is subclassed, and the `__mod__' or >`__lt__' method overridden?

When a programmer overides a method, it is to get a specific comportment, and you (library programmer) got to trust he knows what he does.
Overriding definitions doesn't mean your code is not coherent anymore, it just means it acquires another signification. And Euclide PGCD algorithm keeps beeing Euclide PGCD algorithm.

Anonymous said...

So, really, "var: expr" is a short-hand for "assert istypeof(var, expr)".

--Chris Ryland, cpr@emsoftware.com

André Roberge said...

Chris Ryland wrote:

So, really, "var: expr" is a short-hand for "assert istypeof(var, expr)".
===
Essentially; I'm not aware of the function istypeof(), but I would say that "var: expr" is a short hand for
"assert isinstanceof(var, expr)"
which I read to be the same thing.
Note that "var: expr" is also the notation proposed by Guido van Rossum in his blog.

Anonymous said...
This comment has been removed by a blog administrator.