Friday, May 13, 2022

Python 🐍 fun with emojis

At EuroSciPy in 2018, Marc Garcia gave a lightning talk which started by pointing out that scientific Python programmers like to alias everything, such as

import numpy as np
import pandas as pd

and suggested that they perhaps would prefer to use emojis, such as

import pandas as 🐼

However, Python does not support emojis as code, so the above line cannot be used.
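This is easy to confirm from Python itself. Here is a minimal sketch using only the built-in `str.isidentifier` and `compile`; the helper name `is_valid_source` is made up for this illustration.

```python
def is_valid_source(source: str) -> bool:
    """Return True if `source` compiles as Python code, False on SyntaxError."""
    try:
        # compile() only parses the code; nothing is imported or executed.
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Emojis do not belong to the character classes allowed in identifiers.
print("🐼".isidentifier())                      # False
print(is_valid_source("import pandas as pd"))   # True
print(is_valid_source("import pandas as 🐼"))   # False
```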

A year prior, Thomas A Caswell had created a pull request for CPython that would have made this possible. This code would have allowed the use of emojis in all environments, including in a Python REPL and even in Jupyter notebooks. Unsurprisingly, this was rejected.

Undeterred, Geir Arne Hjelle created a project called pythonji (available on PyPI) which enabled the use of emojis in Python code, but in a much more restricted way. With pythonji, one can run modules ending with 🐍 instead of .py from a terminal. However, such modules cannot be imported, nor can emojis be used in a terminal.

When I learned about this attempt by Geir Arne Hjelle from a tweet by Mike Driscoll, I thought it would be a fun little project to implement with ideas.  Below, I use the same basic example included in the original pythonji project.


As you can see, it works in ideas' console, when importing a module. It can also work when running the 🐍 file as source, but leaving the extension out.



And, it works in Jupyter notebooks too!


All of this without any need to modify CPython's source code!

😉


Sunday, April 10, 2022

Natural syntax for units in Python

In the past week, there has been an interesting discussion on Python-ideas about Natural support for units in Python. As I have taught introductory courses in Physics for about 20 of the 30 years of my academic career, I am used to stressing the importance of using units correctly, but had never had the need to explore what kind of support for units was available in Python. I must admit to having been pleasantly surprised by many existing libraries.

In this blog post, I will give a very brief overview of parts of the discussion that took place, and is still taking place, on Python-ideas about this topic. I will then give a very brief introduction to two existing libraries that provide support for units, before showing some actual code inspired by the Python-ideas discussion.

But first, putting my Physics teacher hat on, let me show you some partial Python code that I find extremely satisfying, and which contains a line that is almost guaranteed to horrify programmers everywhere, as it seemingly reuses the variable "m" with a completely different meaning.

>>> g = 9.8[m/s^2]
>>> m = 80[kg]
>>> weight = m * g
>>> weight
<Quantity(784.0, 'kilogram * meter / second ** 2')>
>>> tolerance = 1.e-12[N]
>>> abs(weight - 784[N]) < tolerance
True

Discussion on Python-ideas

The discussion on Python-ideas essentially started with the suggestion that "it would be nice if Python's syntax supported units".  That is, if you could basically do something like:

length = 1m + 3cm
# or even
length = 1m 3cm

and it just worked as "expected". Currently, identifiers in Python cannot start with a number, and writing "3cm" is a SyntaxError. So, in theory, one could add support for this type of construct without causing any backward incompatibility.
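Both halves of that claim can be checked with the standard library alone. In the short sketch below, the helper `parses` is a name I made up; it simply reports whether `ast.parse` accepts a piece of source.

```python
import ast


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print("3cm".isidentifier())            # False: names cannot start with a digit
print(parses("length = 1m + 3cm"))     # False: currently a SyntaxError
print(parses("length = 1*m + 3*cm"))   # True: explicit multiplication is fine
```

Since the first form is rejected by every current Python version, giving it a meaning would not change the behaviour of any existing valid program.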

While I never thought of it before, as I use Python as a hobby, I consider the idea of supporting handling units correctly to be an absolute requirement for any scientific calculations. Much emphasis is being placed on adding type information to ensure correctness: to my mind, adding *unit* information to ensure correctness is even more important than adding type information.
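To illustrate the difference, here is a deliberately tiny sketch. The `Quantity` class below is hypothetical and far simpler than what real unit libraries provide. Both fields would satisfy a type checker in every case, yet only the unit check catches the physically meaningless addition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A number tagged with a unit name (hypothetical, for illustration only)."""

    value: float
    unit: str

    def __add__(self, other):
        if not isinstance(other, Quantity):
            return NotImplemented
        # Refuse to add quantities whose units differ instead of
        # silently producing a meaningless number.
        if other.unit != self.unit:
            raise TypeError(f"cannot add {self.unit} and {other.unit}")
        return Quantity(self.value + other.value, self.unit)


total = Quantity(1.0, "m") + Quantity(0.5, "m")
print(total)  # Quantity(value=1.5, unit='m')

try:
    Quantity(1.0, "m") + Quantity(3.0, "s")
except TypeError as exc:
    print(exc)  # cannot add m and s
```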

During the course of the discussion on Python-ideas, other possible suggestions were made, some of which are actually supported by at least a couple of existing Python libraries. These suggestions included constructs like the following:

length = 1*m + 3*cm
speed = 4*m / (1*s)  # or speed = 4 * m / s

length = m(1) + cm(3)
speed = m_s(4)

length = 1_m + 3_cm
speed = 4_m_s

length = 1[m] + 3[cm]
speed = 4[m/s]

length = 1"m" + 3"cm"
speed = 4"m/s"

density = 1.0[kg/m**3]
density = 1.0[kg/m3]
# No one suggested something like the following
density = 1.0[kg/m^3]

I will come back to looking at potential new syntax for units, as it is currently my main interest in this topic. But first, I want to highlight one other main point of the discussion on Python-ideas, namely: Should the units be defined globally for an entire application, or locally according to the standard Python scopes?

My first thought was "of course, it should follow Python's normal scopes". 

Thinking of the opposite argument, what happens if one uses units other than S.I. units in different modules, including those from external libraries?  Take for example "mile", and have a look at its Wikipedia entry. If one uses units with the same name but different values in different parts of an application, any pretense of using quantities with units to ensure accuracy goes out the window. Furthermore, many units libraries make it possible for users to define their own custom units. What happens if the same name is used for different custom units in different modules, and variables (or functions using variables) with units defined in one module are used in a second module?

Still, as long as libraries do not, or cannot change unit definitions globally, and if they provide clear and well-documented access to the units they use, then the normal Python scopes would likely be the best choice.
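The "mile" problem can be made concrete with a few lines of plain Python. In this sketch, the two dictionaries stand in for two hypothetical modules that each define their own "mile"; the conversion factors are those of the international mile and of the nautical mile.

```python
# Two hypothetical "modules", each with its own idea of what a mile is (in metres).
road_units = {"mile": 1609.344}   # international mile
marine_units = {"mile": 1852.0}   # nautical mile


def to_metres(value, unit, registry):
    """Convert `value`, expressed in `unit`, to metres using `registry`."""
    return value * registry[unit]


distance = 10
on_land = to_metres(distance, "mile", road_units)
at_sea = to_metres(distance, "mile", marine_units)

# Same number, same unit name, yet the results differ by more than 2 km.
print(at_sea - on_land)
```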

[For a detailed discussion of these two points of view, have a look at the thread on Python-ideas mentioned above. There doesn't seem to be a consensus as to what the correct approach should be.]

A brief look at two unit libraries

There are many unit libraries available on PyPI. After a brief look at many of them, I decided to focus on only two: astropy.units and pint. These seemed to be the most complete ones currently available, with source code and good supporting documentation available.

I will first look at an example that shows how equivalent descriptions of units are easily handled in both of them. First, I use the units module from astropy:

>>> from astropy import units as u
>>> p1 = 1 * u.N / u.m**2
>>> p1
<Quantity 1. N / m2>
>>> p2 = 1 * u.Pa
>>> p1 == p2
True

Next, doing the same with pint.

>>> import pint
>>> u = pint.UnitRegistry()
>>> p1 = 1 * u.N / u.m**2
>>> p1
<Quantity(1.0, 'newton / meter ** 2')>
>>> p2 = 1 * u.Pa
>>> p1 == p2
True

In astropy, all the units are defined in a single module.  Instead of prefacing the units with the name of the module, one can import units directly:

>>> from astropy.units import m, N, Pa
>>> p1 = 1 * N / m**2
>>> p2 = 1 * Pa
>>> p1 == p2
True

The same cannot be done with pint, where units are attributes of a UnitRegistry instance rather than module-level names.

A custom syntax for units

As I was reading posts from the discussion on Python-ideas, I was thinking that it might be fun to come up with a way to "play" with some code written in a more user-friendly syntax for units. After reading the following, written by Matt del Valle, I decided that I should definitely do it.

My personal preference for adding units to python would be to make instances of all numeric classes subscriptable, with the implementation being roughly equivalent to:

def __getitem__(self, unit_cls: type[T]) -> T:
    return unit_cls(self)

We could then discuss the possibility of adding some implementation of units to the stdlib. For example:

from units.si import km, m, N, Pa

3[km] + 4[m] == 3004[m]  # True
5[N]/1[m**2] == 5[Pa]  # True
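Python's built-in numbers are not subscriptable, so this suggestion cannot be tried directly today. As a rough sketch of the mechanism only, one can use a float subclass. All the names below (`Number`, `Metres`, `m`, `km`) are invented for this illustration and do not come from any library.

```python
class Number(float):
    """Hypothetical float subclass that accepts a unit in square brackets."""

    def __getitem__(self, unit_cls):
        # Same idea as in the quoted suggestion: hand the number to the unit.
        return unit_cls(self)


class Metres(float):
    """Hypothetical unit class: a length stored in metres."""

    def __add__(self, other):
        return Metres(float(self) + float(other))


def m(value):
    """Interpret `value` as metres."""
    return Metres(value)


def km(value):
    """Interpret `value` as kilometres, stored internally in metres."""
    return Metres(value * 1000)


print(Number(3)[km] + Number(4)[m] == Number(3004)[m])  # True
```

This toy version only handles lengths and performs no dimension checking; it is there to show how subscripting could dispatch to a unit.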

My first thought was to create a custom package building from and depending on astropy.units, as I had looked at it before looking at pint and found it to have everything one might need.  However, as I read its rather unusual license, I decided that I should take another approach: I chose to simply add a new example to my ideas library, making it versatile enough so that it could be used with any unit library that uses the standard Python notation for multiplication, division and power of units, which both pint and astropy do. Note that my ideas library has been created to facilitate quick experiments and is not meant to be used in production code.

First, here's an example that mimics the example given by Matt del Valle above, with what I think is an even nicer (more compact) notation.

python -m ideas -t easy_units

Ideas Console version 0.0.29. [Python version: 3.9.10]

>>> from astropy.units import km, m, N, Pa
>>> 3[km] + 4[m] == 3004[m]
True
>>> 5[N/m^2] == 5[Pa]
True

In addition to allowing '**' for powers of units (not shown above), I chose to also recognize as equivalent the symbol '^' which is more often associated with exponentiation outside of the (Python) programming world.

Let's do essentially the same example using pint instead, and follow it with a few additional lines to illustrate further.

Ideas Console version 0.0.29. [Python version: 3.9.10]

>>> import pint
>>> unit = pint.UnitRegistry()
>>> 3[km] + 4[m] == 3004[m]
True
>>> 5[N/m^2] == 5[Pa]
True
>>> pressure = 5[N/m^2]
>>> pressure
<Quantity(5.0, 'newton / meter ** 2')>
>>> pressure = 5[N/m*m]
>>> pressure
<Quantity(5.0, 'newton / meter ** 2')>

In the last example, I made sure that "N/m*m" did not follow the regular left-to-right order of operation which might have resulted in unit cancellation as we first divide and then multiply by meters.
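To see why this matters, here is what Python's normal left-to-right evaluation does, using plain numbers as stand-ins for the units (the values 12.0 and 2.0 are arbitrary):

```python
N, m = 12.0, 2.0   # arbitrary numbers standing in for units

left_to_right = N / m * m   # parsed by Python as (N / m) * m
intended = N / (m * m)      # what "N/m*m" means inside the square brackets

print(left_to_right)  # 12.0 : the two m's cancel out
print(intended)       # 3.0
```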

A look at some details

Using ideas with a "verbose" mode (-v or --verbose), one can see how the source is transformed prior to its execution.  Furthermore, in the case of easy_units, sometimes a "prefix" is "extracted" from the code, ensuring that the correct names are used.  Here's a very quick look.

python -m ideas -t easy_units -v

Ideas Console version 0.0.29. [Python version: 3.9.10]

>>> import pint
>>> un = pint.UnitRegistry()
===========Prefix============
un.
-----------------------------
>>> pressure = 5[N/m^2]
===========Transformed============
pressure = 5 * un.N/(un.m**2)
-----------------------------
>>> pressure
<Quantity(5.0, 'newton / meter ** 2')>
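For readers curious about how such a transformation could be done, here is a simplified sketch based on regular expressions. It is not the code used by easy_units: the function `transform` and the two patterns are my own assumptions for this illustration, and it only handles a unit expression with at most one "/".

```python
import re

# A numeric literal immediately followed by [unit expression].
# The lookbehind avoids matching ordinary indexing such as data[0] or x1[m].
QUANTITY = re.compile(r"(?<![\w.])(\d+(?:\.\d*)?(?:[eE][+-]?\d+)?)\[([^\]]+)\]")
NAME = re.compile(r"[A-Za-z_]\w*")


def transform(source: str, prefix: str = "") -> str:
    """Rewrite number[units] as a product, e.g. 5[N/m^2] -> 5 * N/(m**2)."""

    def replace(match):
        number, units = match.group(1), match.group(2)
        units = units.replace("^", "**")
        # Add the registry prefix (if any) in front of every unit name.
        units = NAME.sub(lambda name: prefix + name.group(0), units)
        # Everything after the "/" is treated as the denominator.
        numerator, slash, denominator = units.partition("/")
        if slash:
            return f"{number} * {numerator}/({denominator})"
        return f"{number} * {numerator}"

    return QUANTITY.sub(replace, source)


print(transform("pressure = 5[N/m^2]", prefix="un."))
# pressure = 5 * un.N/(un.m**2)
print(transform("3[km] + 4[m] == 3004[m]"))
# 3 * km + 4 * m == 3004 * m
```

Like the transformed line shown above, this sketch adds no outer parentheses, so an expression such as `x / 5[m]` would need more care in a real implementation.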

Conclusion

Prior to reading the discussion on Python-ideas, I was only vaguely aware of the existence of some units libraries available in Python, and had no idea about their potential usefulness. Many unit libraries are, in my opinion, much less user-friendly than astropy and pint. Still, I do find the requirement to add explicit multiplication symbols to be more tedious and much less readable than the alternative that I have shown.  While introducing a syntax like the one I have shown would not cause any backward incompatibilities, I doubt very much that anything like it would be added to Python, as it would likely be considered to be too specific to niche applications. I find this unfortunate ... However, I know that I can use ideas in my own projects if I ever want to use units together with a friendlier syntax.

I wrote the easy_units module in just a few hours. It is likely to contain some bugs [1], and is most definitely written as a quick hack that does not follow best practices. If you do try it, and find some bugs, feel free to file an issue; don't bother looking at the code. ;-)

[1] Indeed, I found and fixed a couple while writing this post.

Tuesday, February 08, 2022

Friendly-traceback and IPython: update

In my previous post, I mentioned that, unlike IPython, friendly/friendly-traceback included values of relevant objects in a traceback.  As I wrote in the update, Alex Hall pointed out that one could get this information by using a verbose mode in IPython.  Here is the previous example when using the verbose mode.



In (1) I enabled the verbose mode. In (3), we see its effect.   (2) is a reminder of the highlighting when it spans many lines.  Regarding the highlighting, here's what I had in the previous blog post:


Alex Hall (yes, him again), the author of stack_data used by both IPython and friendly-traceback, suggested that perhaps a better way would be to have a common indentation. This is what I implemented next:



In my code, this is done in a rather convoluted way. Following a suggestion by Alex, I implemented a change in stack_data itself which yields the correct result, at least when using carets (^) as markers for the location.  If Alex can confirm that it works for stack_data in all cases, this new way of highlighting consecutive lines would likely be automatically incorporated into IPython.

The reason I go into all these details is as follows: I'm really interested in getting feedback from users so as to make friendly/friendly-traceback even more useful.  So, don't be shy! :-)


Friday, February 04, 2022

Friendly-traceback: trying to stay ahead of IPython

UPDATE: Alex Hall pointed out that IPython can display the values of variables in the highlighted sections using %xmode verbose. He also suggested a different highlighting strategy when the problematic code spans multiple lines.  I will go into more detail about these two issues in a future blog post.

======

I'm writing this blog post in the hope that some people will be encouraged to test friendly/friendly-traceback with IPython/Jupyter and make suggestions as to how it could be even more useful.

However, before you read any further...

Important clarification: IPython is a professionally developed program which is thoroughly tested, and is an essential tool for thousands of Python programmers.  By contrast, friendly/friendly-traceback is mostly done by a hobbyist (myself) and is not nearly as widely used nor as reliable as IPython. Any comparison I make below should be taken in stride.  Still, I can't help but draw your attention to this recent tweet from Matthias Bussonnier, the IPython project leader:



I don't believe that friendly/friendly-traceback is mature and stable enough to become part of IPython's distribution. However, it is because of this endorsement that I decided to see what I could do to improve friendly/friendly-traceback's integration with IPython.

IPython news

The recent release of IPython included many traceback improvements. One of these changes, shown with the screen capture below:



is something that I am happy to have implemented many months ago as mentioned in this blog post. I have no reason to believe that my idea was the impetus for this change in IPython's formatting of tracebacks. Still, I think it validates my initial idea.

However, other changes were introduced in this latest IPython release. One of them, using colour instead of ^^ to highlight the location of the code causing a traceback, is something that I had done only for IDLE but not for other environments such as IPython/Jupyter.  So, I felt that I had to catch up with what IPython has implemented and, if possible, do even better.  Of course, I must recognize that this work is greatly facilitated since I use Alex Hall's excellent stack_data (as well as some other of his packages on which stack_data depends) in friendly-traceback: stack_data is now used by IPython to generate these tracebacks. So, in principle, there is no reason why I shouldn't be able to implement similar features in friendly/friendly-traceback.

Again, I must note that the way I use stack_data is a bit hackish, and definitely not as elegant as it is used within IPython. 

Enough of a preamble, time to provide some actual examples.

First example

Consider the following module which will generate a traceback when imported in IPython:



Here is the result:


We can see not only the lines of code that caused the traceback, but actually the specific parts of each line that caused a problem.  Notice how the display jumps from line 6 to line 8: this is because line 7 is an empty line. Such empty lines are removed to reduce the vertical space required for the display.

I could replicate this example using the friendly console but, instead, I will use the specific IPython integration to see what else we could do at this point. 


We see a traceback that is somewhat similar to a standard CPython traceback, but with an additional hint at the end which gives us an additional clue as to what the cause of the error might be.  Friendly/friendly-traceback can give more information about what() a particular exception means in general, why() it might have happened in this particular case and where() it occurred:


By design, the information provided by where() only focuses on the beginning and the end of the traceback, so as to not overwhelm beginners with often irrelevant steps in between. However, notice that in addition to the highlighted parts (new feature!), we also see the values of some objects from these highlighted regions.

Until recently, this was all the information that one could get. However, it is now possible to get more details, in a way similar to that provided by IPython, but with the addition of the values of various objects.  (Note that the syntax shown below to obtain this information is subject to change; it is just a proof of concept.)


Other than the different colour chosen for highlighting (both IPython and this example are done in a Windows Terminal), I also chose to ensure that one line per frame was highlighted, such as the line "import example1".  Do you think this is a good choice, or should I do as IPython does?

Finally, I included line 7 which is an empty line, so that beginners (my original target audience) might better recognize their own code instead of seeing a more vertically compressed version. If more than one consecutive blank line were present, they would be replaced by "(...)" indicating that some lines were skipped.

If the highlighting is not adequate, it can be changed by using either named colours (converted to lowercase with spaces removed) or hexadecimal values; the name of the function and its arguments are subject to change:


Any such change is written to a settings file so that it is remembered for future sessions. Those who prefer Python's traditional notation with ^^ can get it by using None as an argument:


Finally, one can go back to the defaults by specifying no argument:
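As an aside, the handling of colour names described above (lowercase, with spaces removed, or a hexadecimal value) could be sketched as follows. This is not friendly's code: the function `normalize_colour`, the six-digit hexadecimal pattern and the choice to lowercase hexadecimal values are my own assumptions for this illustration.

```python
import re

# Assumption: hexadecimal colours are written as "#" followed by six hex digits.
HEX_COLOUR = re.compile(r"#[0-9a-fA-F]{6}")


def normalize_colour(colour):
    """Return a canonical form for a colour name or hexadecimal value."""
    if colour is None:          # None is kept as is (it selects the ^^ notation)
        return None
    colour = colour.strip()
    if HEX_COLOUR.fullmatch(colour):
        return colour.lower()
    # Named colours: lowercase, with all spaces removed.
    return colour.lower().replace(" ", "")


print(normalize_colour("Light Blue"))  # lightblue
print(normalize_colour("#FFAA00"))     # #ffaa00
```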


Example 2


In the previous example, all highlighting regions were part of a single line. However, sometimes the code at fault will spill over two lines. Here's how IPython does its highlighting:


Instead of highlighting each line from the beginning, I chose to not highlight the indentation; is this a better choice?



Jupyter notebook


When IPython is used in a Jupyter notebook (or lab), I chose yet again a different way to present the result. First, let's have a look at a simple example using the Jupyter default.



In this example, only two frames are highlighted.  Let's see the result, using friendly.

We get a basic error message with a button to click if we want to have more details.



We already have seen the output of what() and why() before; this time let's just click on where():



Since we only had two frames in the traceback, where() gives us all the relevant information.

What happens if we have more than two frames in the traceback?  First, let's give an example with the Jupyter default.



What happens if we use friendly in this case?  Below I show the result after clicking "more"


An additional button has appeared.  Note that this is something new that I just did earlier today (before writing this blog post). It is quite possible that there might be bugs if you try it.


Conclusion

These new features are simple proofs of concept that have not been thoroughly tested.  If you read this far, and hopefully tried it on your own, I would really appreciate getting your feedback regarding the choices I made and any improvement you might be able to suggest.

Wednesday, January 05, 2022

Python 101: enabling a restricted subset of Python

I decided to submit to the Ideas category of Discuss-Python a proposal which I have summarized as follows:

Summary: I propose that a new compile time directive be available to restrict the Python syntax to a strict subset. This would facilitate the teaching of Python to beginners as well as the work of people that write tools intended to help beginners learning Python.
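The proposal is about a directive handled by Python itself. Tools can approximate the idea today by inspecting the syntax tree before running the code. The sketch below does this with the standard ast module; the list of forbidden constructs and the names `FORBIDDEN`, `RestrictedSyntaxError` and `check_subset` are purely hypothetical choices made for this illustration.

```python
import ast

# Hypothetical list of constructs that a beginners' course might postpone.
FORBIDDEN = (ast.Lambda, ast.AsyncFunctionDef, ast.Await, ast.Global, ast.Nonlocal)


class RestrictedSyntaxError(SyntaxError):
    """Raised when valid Python uses a construct outside the allowed subset."""


def check_subset(source: str, forbidden=FORBIDDEN) -> None:
    """Raise RestrictedSyntaxError if `source` uses a forbidden construct."""
    tree = ast.parse(source)   # ordinary SyntaxErrors are raised here as usual
    for node in ast.walk(tree):
        if isinstance(node, forbidden):
            raise RestrictedSyntaxError(
                f"{type(node).__name__} is not allowed (line {node.lineno})"
            )


check_subset("x = 1\nprint(x)")   # accepted silently

try:
    check_subset("double = lambda n: 2 * n")
except RestrictedSyntaxError as exc:
    print(exc)  # Lambda is not allowed (line 1)
```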

Here is a link to that post.

This is something I have been thinking about for more than a year but always hesitated to submit. It is now done ... feel free to comment over there.

Comments posted on this blog about this particular topic will be deleted so that the discussion can take place at a single location.

Tuesday, December 28, 2021

New milestone for friendly: version 0.5

Friendly (previously at 0.4.41) and friendly-traceback (previously at 0.4.111) are now at version 0.5. The joint documentation for both projects has not yet been updated.  In addition to the many new cases with which friendly/friendly-traceback can help, which now include close to 400 test cases, I am very excited to report three important new features:

  1. Getting help when a traceback is generated before friendly is imported
  2. Not having to set non-default configurations each time friendly is used
  3. The addition of two new languages.

1. Getting help after the fact


Let's start with the first.  Previously, if one wanted help from friendly/friendly-traceback, it had either to be used to run a program, via something like "python -m friendly user_program.py", or it had to be imported and installed (either implicitly or explicitly) before any other code was executed. This still works as before and is the best way to use friendly.

Now, it can be imported *after* a traceback has been generated, and can provide its usual help when using:
  • IPython in a terminal
  • Jupyter notebooks, and Jupyter lab
  • Mu
  • Programs run with CPython using "python -i user_program.py"
  • Code entered in a CPython terminal, with the caveat that it only works in a limited fashion for some SyntaxErrors but almost never for run time errors.
    • The same when using PyPy, with the exception that using languages other than English may yield some undesirable results.
  • Code saved in files and run from IDLE  (Python 3.10 and possibly later versions of Python 3.9) -- but excluding SyntaxErrors
  • Code entered in IDLE's shell - but excluding SyntaxErrors.

Before explaining the origin of the (different) limitations when using CPython's interactive interpreter or IDLE, let me show the results using IPython, both for SyntaxErrors and run time errors, starting with a very unlikely example of a SyntaxError:


Of course, we can ask for more details

Instead of a SyntaxError, let's see an example of a run time error.

Again, it just works. :-)

Moving on to SyntaxErrors with the CPython interpreter. Let's use the same example as above, with Python 3.10.1:


This works. However, let's have a more detailed look at the information available:


Python does not store the content of the code entered in the interpreter; for SyntaxErrors, it does include the very last line of code where the error was located. This will not work in other situations where a statement spans multiple lines; in some cases, if the error message is precise enough, friendly might still be able to guess the cause of the error.



By contrast, friendly does store the entire code entered in its interpreter.


Let's have a look at a run time error with CPython.

Notice how the traceback contains no information about the code in the file(s) named "<stdin>".
Let's see what information we can get from friendly.


If you use friendly, you would never see the log message (1) as it is something that is enabled by default on my computer. Note that, in spite of not having access to the exact code that produced the exception, in this case friendly is still able to provide some help. This information is similar to what is available with Python 3.10+; however, you can use friendly with Python 3.6 and still get the same information!
Of course, it is better still if you use friendly from the start:





Let's now have a look at IDLE. Recently, IDLE has added support for custom exception hooks. Instead of requiring the use of its own console when using IDLE, friendly can make use of this new capability of IDLE to provide help with run time errors - but not SyntaxErrors as those are handled in a peculiar way by IDLE.
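For those who have never used one, a custom exception hook is simply a function assigned to sys.excepthook; Python calls it for every uncaught exception. The sketch below is a generic illustration and not friendly's implementation: the function `explain` and its single hint are invented for this example.

```python
import sys
import traceback


def explain(exc_type, value, tb, file=None):
    """Print the usual traceback, followed by a one-line hint when we have one."""
    file = file or sys.stderr
    traceback.print_exception(exc_type, value, tb, file=file)
    if issubclass(exc_type, ZeroDivisionError):
        print("Hint: a number was divided by zero.", file=file)


# From now on, uncaught exceptions are reported by our function.
sys.excepthook = explain
```

The `file` argument is only there to make the function easy to test; Python itself calls the hook with the first three arguments.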



For this type of error, trying to use friendly after the fact yields very little useful information.

For SyntaxErrors, the situation is even worse: IDLE does not make any information available.

1.a) A possible improvement to fix the problem with CPython

If you look at the tracebacks from IDLE for runtime errors, you will see "files" with names like "<pyshell#4>": each code block entered by the user is saved in such a "file", each file having a different name.  IDLE works around a limitation of Python's linecache module to store the content of these files so that they can be retrieved and analyzed.  By contrast, CPython shows the code entered by a user as belonging to files with identical names "<stdin>" whose content can never be retrieved.

For code executed using exec("code"), the content is shown to belong to a file named "<string>" whose content is also not available.  If CPython were to store the code in files whose names included a different integer each time, like IDLE does, then it could be retrieved by programs like friendly, which could then provide additional help.  This was suggested on Python-ideas for code run using exec, but got no traction, even though related questions are often asked on StackOverflow.
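Here is a minimal sketch of that idea for code run with exec. Each block gets a unique pseudo-filename and its text is registered with linecache so that tracebacks can show it. The function `run_block` and the "<block#N>" naming scheme are hypothetical; the sketch also relies on `linecache.cache`, a module attribute that is not part of the documented API, although other tools use the same trick.

```python
import linecache
import traceback
from itertools import count

_block_number = count()


def run_block(source, namespace):
    """Execute `source` under a unique pseudo-filename, keeping its text retrievable."""
    filename = f"<block#{next(_block_number)}>"   # hypothetical naming scheme
    # An entry whose mtime is None is left alone by linecache.checkcache(),
    # so the lines stay available when a traceback is formatted.
    linecache.cache[filename] = (len(source), None, source.splitlines(True), filename)
    exec(compile(source, filename, "exec"), namespace)


try:
    run_block("a = 1\nb = a / 0\n", {})
except ZeroDivisionError:
    report = traceback.format_exc()

print("b = a / 0" in report)  # True: the offending line appears in the traceback
```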

Moving on ...

2. Friendly saves settings

Previously, each time that friendly was used, it started with default values for the preferred language (French and English only in previous versions), color scheme (light or dark, depending on the background of the terminal or other application), formatter type, etc.

Now, friendly saves the values specified, which are then used by default when a new session starts. For the language choice, this is a global setting that carries over to all environments.  For other settings, friendly (at least on Windows) can determine if it is running in a PowerShell terminal or an old-fashioned cmd, in a Visual Studio Code terminal, in a PyCharm terminal, if it is run with IPython, or in a Jupyter notebook, etc.   Here's an example of adjusting the background color so that the information provided by friendly blends in better.

Friendly only includes two different color schemes: one that is designed to work with a white (or similar) background and another with a black (or similar) background.  Anyone working with terminals (or notebooks) with background colors that do not work well with either of the two existing color schemes is welcome to provide different color schemes to be added to friendly.

So far, I have only tested this with Windows. Mac and Linux users are encouraged to try it out and see if their different environments can be detected correctly so that friendly can work well in all the environments in which they use it.
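The general idea of remembering some settings globally and others per environment can be sketched with a small JSON file. This is not how friendly stores its settings: the file layout, the "common" section and the three function names below are assumptions made for this illustration.

```python
import json
import os
import tempfile
from pathlib import Path


def load_settings(path):
    """Return the saved settings, or an empty dict if nothing was saved yet."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_setting(path, environment, key, value):
    """Store `value` under `key` for the given environment."""
    settings = load_settings(path)
    settings.setdefault(environment, {})[key] = value
    Path(path).write_text(json.dumps(settings, indent=2), encoding="utf-8")


def get_setting(path, environment, key, default=None):
    """Look in the environment's section first, then in the global "common" one."""
    settings = load_settings(path)
    for section in (environment, "common"):
        if key in settings.get(section, {}):
            return settings[section][key]
    return default


with tempfile.TemporaryDirectory() as folder:
    demo_file = os.path.join(folder, "settings.json")
    save_setting(demo_file, "common", "lang", "fr")              # global choice
    save_setting(demo_file, "ipython", "background", "#1e1e1e")  # per environment
    print(get_setting(demo_file, "ipython", "lang"))             # fr
    print(get_setting(demo_file, "jupyter", "background"))       # None
```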


3. New languages

In addition to English and French, friendly is available in Spanish (approximately 99% of the translation is done, as I keep adding new information) and about 10% has been translated into Italian.

Conclusion

There is much more I could write about new but smaller additions to friendly since version 0.4.  However, this blog post is already too long and this will have to wait until later - perhaps after I update the existing documentation.

Saturday, November 20, 2021

Friendly-traceback en español

Friendly and Friendly-traceback are now partially available in Spanish thanks to the work of Martín René (https://github.com/martinvilu).

You can have a look at the Spanish translations in context for SyntaxErrors and for other exceptions.

If you are interested in contributing to translations, please join this discussion and have a look at this online collaborative site.


Update: Someone just volunteered to help with the Italian translation. Note that there are more than 600 pieces of text to translate and that more volunteers can help!