Thursday, January 29, 2015

Type hinting in Python: focus on readability


tl;dr: the proposed type hinting for Python is done to help tools analyze code better (which can be very useful for programmers) but at the cost of reduced readability.  A different idea is discussed which focuses on readability.
----

So, Python is going to have some type hinting [PEP484].

I agree with the idea that lead to this addition to Python; however, I find that the syntax proposed is less than optimal. Thinking about how it came about, it is not entirely surprising.


  1. Functions annotations were introduced in 2006 [PEP3107].
  2. Various libraries worked within the constraints and limitations imposed by this new addition, including mypy [mypy].
  3. PEP 484 is "strongly inspired by mypy" essentially using it as a starting point. However, it indirectly acknowledges that the syntax chosen is less than optimal:

If type hinting proves useful in general, a syntax for typing variables may be provided in a future Python version. [PEP484]

What if [PEP3107] had never been introduced nor accepted and implemented, and we wanted to consider type hinting in Python?...

Why exactly is type hinting introduced?

As stated in [PEP484]:


"This PEP aims to provide a standard syntax for type annotations, opening up Python code to easier static analysis and refactoring, potential runtime type checking, and performance optimizations utilizing type information."

The way I read this is as follows:  type hinting is primarily introduced to help computerized tools analyze Python code.

I would argue that, a counterpart to this would be that type hinting should not be a hindrance to humans; in particular, it should have a minimal impact on readability.   I would also argue that type hinting, as it is proposed  and discussed, does reduce readability significantly. Now, let's first have a look at some specific examples given, so that you can make your own opinion as to how it would affect readability.

Simple example (from [PEP484]):

I will start with a very simple examples for those who might not have seen
the syntax discussed.

def greeting(name: str) -> str:
    return 'Hello ' + name

Within the function arguments, the type annotation is denoted by 
a colon (:); the type of the return value of the function is done
by a special combination of characters (->)  that precede the colon which
indicates the beginning of a code block.

Slightly more complicated example (from [mypy])

def twice(i: int, next: Function[[int], int]) -> int:
    return next(next(i))

We now have two arguments; it becomes a bit more difficult to see at a
glance what the arguments for the function are.  We can improve upon this
by formatting the code as follows:

def twice(i: int, 
          next: Function[[int], int]) -> int:
    return next(next(i))


What about keyword-based arguments?

From [PEP483]


"There is no way to indicate optional or keyword arguments, nor varargs (we don't need to spell those often enough to complicate the syntax)"

From [PEP484]


"Since using callbacks with keyword arguments is not perceived as a common use case, there is currently no support for specifying keyword arguments with Callable."

However, some discussions are taking place;  here is an example taken from https://github.com/ambv/typehinting/issues/18 (formatted in a more readable way than found on that site)

def retry(callback: Callable[AnyArgs, Any[int, None]], 
          timeout: Any[int, None]=None, 
         retries=Any[int, None]=None) -> None:

Can you easily read off the arguments of this function?  Did I forget one argument when I split the argument lists over three lines?  Can you quickly confirm that the formatting is not misleading?

Type Hints on Local and Global Variables

The following is verbatim from [PEP484]


No first-class syntax support for explicitly marking variables as being of a specific type is added by this PEP. To help with type inference in complex cases, a comment of the following format may be used: 

x = []   # type: List[Employee] 

In the case where type information for a local variable is needed before if was declared, an Undefined placeholder might be used: 

from typing import Undefined
x = Undefined   # type: List[Employee]
y = Undefined(int)
 
 If type hinting proves useful in general, a syntax for typing variables may be provided in a future Python version.(emphasis added)  

Edit: why not bite the bullet and do it now? Considering what this syntax, if it were introduced, should look like, might reassure people who see type information in comments as problematic and ensure that the limited syntax decided upon in this PEP will not have to be changed to be made coherent with the new addition(s).

What about class variables?  

I have yet to see them being discussed.  I assume that they would be treated the same as local and global variables, with an as yet undefined syntax.


A different proposal for type hinting.

Let's forget for a moment the syntax introduced by [PEP3107] for functions annotations, and imagine that we are considering everything from the beginning.

Type hinting is introduced for tools (linters, etc.).  As such, I would assume the following:

When reading/parsing code:

  1. type hinting information should be easily identifiable by tools
  2. type hinting information should be easily ignorable by humans (if they so desire)


By this, I mean that the type hinting information should not decrease significantly the readability of the code.


Let me start with an obvious observation about Python:  indentation based code-blocks indicate the structure.  Code blocks are identified by their indentation level, which is the same within a code block.

Tools, like the Python interpreter, are very good at identifying code blocks. Adding an new type of code block to take into account by these tools should be straightforward.


A secondary observation is that comments, which are ignored by Python, are allowed to deviate from the vertical alignment within a given code block as illustrated below.

def f():
    x = 1
    y = 2
       # this comment is not aligned 
       # with the rest of the code.
    if z:
        pass
  

Now, suppose that we could use a syntax where type annotation was  structured around code-blocks defined by their indentation. Since type annotation are meant to be ignored by the interpreter  (non executable, like comments), let us also give the freedom to have additional indentation for those ignorable code-blocks, like we do for comments.

The specific proposal
  
Add where as a new Python keyword; the purpose of this keyword is to introduce a code block in which type hinting information is given.

To see how this would work in practice, I will show screenshots of code from a syntax-aware editor containing type hinting information as it is proposed and contrasted with how it might look when using the "where code blocks".  I'm using screenshots as it provides a truer picture of what code would really look like in real life situations.

First, the example from [mypy] shown above:



Now, the same example using the "where" code-block.











I used "return" instead of "->"  as I find it more readable; however, "->could be used just as well.

Having a code-block, I can use the code-folding feature of my editor to reduced greatly the visibility of the type hinting information; such code-folding could presumably be done automatically for all type-hinting code blocks.








Moving on to a more complicated example, also given above.  First, the screenshot with the currently proposed syntax.









Next, using the proposed code-block; notice how keyword-based arguments are treated just the same as any other arguments. [Note: I was not sure if timeout above was a keyword based argument assigned a default value or not, since it used a different syntax from retries which is a keyword based argument.]







Again, using code-folding, the visual noise added by the type-hinting information essentially disappears.

Finally, the example with the "global variable", first with the proposed type hinting information added in a comment:











Next, using a code block; no need to import a special "type" to assign a special "Undefined" value: the standard None does the job.








A similar notation could easily be used for class variables, something which is not addressed by the current type-hinting PEP.

Type hinting information is going to be a useful feature for future Python programmers.  I believe that using indentation based code blocks to indicate type hinting information would be much more readable that what has been discussed so far.  Unfortunately, I also believe that it has no chance of being accepted, let alone considered seriously.  

[PEP483] https://www.python.org/dev/peps/pep-0483/
[PEP484] https://www.python.org/dev/peps/pep-0484/
[PEP3107] https://www.python.org/dev/peps/pep-3107/
[mypy] http://mypy-lang.org/

This blog post is licensed under CC0 and was written purely with the intention to entertain.






15 comments:

  1. Your proposal seems so much more pythonic than the PEP proposals. The stuff they are showing looks like someone is trying to remake python in the image of go or c.

    ReplyDelete
  2. In the examples your proposal looks better than the mypy style.

    Still, if you add a sphinx docstring, both seems to add redundancy.

    I guess I would prefer a clean signature with a sensible annotation in the docstring.

    Cheers, and thanks for sharing.

    ReplyDelete
  3. +1, it's kind of like decorators and bolts onto the language instead of trying to change it. Along the wins, pylint can be easily extended to parse this, and the errors won't all have the same line numbers.

    ReplyDelete
  4. I am not sure what the original proposal is but for somebody like me who codes in other languages as well, why can't we have a syntax like below:

    def foo(str: x) -> str:

    why the following is proposed:

    def foo(x: str) -> str:

    To me the first syntax makes more sense.

    ReplyDelete
  5. I have two very, very small problems with this proposal. Mind you, I am far from a Python expert.

    Python is coded in blocks to make it easy to differentiate one block from another, and it works just fine. There is no need nest the type definitions two levels further from the rest of the function. It breaks current syntax for no good reason.

    I also don't like that with this syntax, type definitions go from being expressions into statements. I feel as though as a Python statement should do something to affect the running program. I don't see a good workaround to that though, and your proposal is far superior to the proposed PEP.

    ReplyDelete
  6. I like it. Just one thing, where: doesn't need to be over-indented. It could be a simple block keyword/instruction.

    Another thing I would like to see is a more coherent way of indicating types, like Any(list(str), int) or function(float, float, float).

    ReplyDelete
  7. @Jose: I agree that the where block does not (in principle) need to be overindented; however, it was required by my editor so that I could use code-folding to hide it.

    As to the exact way the types are declared: I figured that I should use the same as in the PEP since the point was to illustrate that the code could be made more readable if the type information was put in a code block rather than mixed in with the function arguments.

    ReplyDelete
  8. I wholly agree that the proposed syntax is abominable. It's the opposite of readable. I do really like the idea of a 'where' keyword to denote typing, but I think a slight refinement of your idea would prove even better:

    def retry(callback, timeout, retries=None) where
    ........callback: Callable[AnyArgs, Any[int, None]],
    ........timeout: Any[int, None],
    ........retries: Any[int, None],
    ........return None:
    ....pass

    def greeting(name) where name: str, return str:
    ....return 'Hello ' + name

    x = [] where x: List[Employee]

    To me, this orders of magnitude more readable than the proposed nonsense.

    PS. Note the 8-space indent above would only a convention, not requirement.

    ReplyDelete
  9. Please pardon the spam, but I thought of more refinement when I was composing the following message to python-dev:

    The proposed syntax is abominable. It's the opposite of readable.

    The function annotation syntax is ugly, but potentially useful for things like documentation. While it may very well have been created with the idea of type-checking, actually using it for such quickly turns into an unreadable morass of information that is far more difficult for human brains to parse, which makes this usage the antithesis of pythonic.

    I much prefer the idea of a 'where' keyword to denote typing, as discussed [here], but I think a refinement of their idea would prove even better:

    def retry(callback, timeout, retries=None) where
    ........callback is Callable[AnyArgs, Any[int, None]],
    ........timeout is Any[int, None],
    ........retries is in [int, None], # 'is in' construct is more readable, dunno about complexity
    ........return is None:
    ....pass

    def greeting(name) where name is str, return is str:
    ....return 'Hello ' + name

    x, y = [], [] where x, y is List[Employee], List[Manager]

    To me, this orders of magnitude more readable than the proposed nonsense.

    PS. Obviously the 8-space indent above would only a convention, not requirement.

    ReplyDelete
  10. @Benjamin:

    I like your version; in some ways, it is cleaner than mine. The one relatively minor drawback is that, with your version, it becomes impossible for code-folding (which is driven by indentation in all editors I have used) to be used to hide the type declaration.

    ReplyDelete
  11. I've only ever used vim or PyCharm as editors and I'm fairly certain each of them could be programmed to handle it. As for purely indent-based folders, I believe this would still work and would be syntactically identical:

    def retry(callback, timeout, retries=None)
    ........where
    ............callback is Callable[AnyArgs, Any[int, None]],
    ............timeout is Any[int, None],
    ............retries is in [int, None],
    ............return is None:
    ....pass

    ReplyDelete
  12. I like the syntax, but I am unsure... is it compatible with the python parser fundamentals?

    ReplyDelete
  13. I like the syntax also but why use a new keyword? May be use something like :
    def foo (a, b):
    as:
    a:int
    b:float
    return:float

    ReplyDelete
  14. You are absolutely right. PEP484 was a bad idea, and it attracts all the wrong people (e.g. Java folks who seem to think it's a good practice to type-hint _every_ single argument on _every_ method), then complain about the runtime not asserting the types and raising exceptions. Take this one step further and we will not only have less readable programs but also another incompatible Python version. IMO PEP484 is just another good reason not to engage with Python 3...

    ReplyDelete

Spammers: none shall pass.

Note: Only a member of this blog may post a comment.