Thursday, October 27, 2022

Better NameError messages for Python

Python 3.11 is barely out and already the 3.12 alpha has some improvements for NameError messages. I suspect that these will be backported to 3.11 in time for the next release. 

On Ideas Python Discussion, Pamela Fox suggested that it might be useful to consider a potentially missing import when a NameError is raised. Thus, instead of having

>>> stream = io.StringIO()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'io' is not defined. Did you mean 'id'?

one would see

>>> stream = io.StringIO()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'io' is not defined. Did you mean 'id'? Or did you forget to import 'io'?

Of course, something like this was already done by friendly/friendly-traceback (aka Friendly). However, in this particular case, Friendly provided too much information; this has since been fixed.

To no one's surprise, Pablo Galindo Salgado came up with a version of this for Python, where names found in sys.stdlib_module_names were considered and potentially added, with the result as initially suggested by Pamela Fox. Pamela then made a second suggestion to see if names of popular third-party libraries could also be considered. This, for now, appears to be out of scope for Python.
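To make the idea concrete, here is a minimal sketch of such a check. It is not CPython's actual implementation: the helper name missing_import_hint and the wording of the hint are assumptions chosen for illustration. Only sys.stdlib_module_names (available since Python 3.10) and the name attribute of NameError (also added in 3.10) are real.

```python
import sys


def missing_import_hint(name):
    """Return a hint if `name` is a standard library module, else None.

    Hypothetical helper, for illustration only.
    """
    # sys.stdlib_module_names was added in Python 3.10; fall back to an
    # empty set on older versions so that the function still runs.
    stdlib_names = getattr(sys, "stdlib_module_names", frozenset())
    if name in stdlib_names:
        return f"Did you forget to import '{name}'?"
    return None


hint = None
try:
    stream = io.StringIO()  # 'io' has deliberately not been imported
except NameError as exc:
    # exc.name only exists on Python 3.10+, hence the getattr.
    hint = missing_import_hint(getattr(exc, "name", None))

print(hint)
```

Since the check is a simple membership test, it adds essentially no cost when a NameError is raised.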

This set the stage for a friendly (pun intended) competition...

I decided to revise what I had done for Friendly in such cases and found some room for improvements. First, let's look at a couple of examples (with screenshots) of the new behaviour for Python.


As we can see with the first example, Python first makes a suggestion about a potential typo ('io' instead of 'id'), followed by the suggestion about a missing import. Note that 'id' is a builtin that is never used with a dotted attribute.

The second example suggests a missing import only. However, as I am using Windows, this module does not exist.

Can Friendly do better?  Note that Friendly can be used with Python 3.6+ (including Python 3.12), all of which would show the same output. I've chosen to use Python 3.10 for this example, as I will explain near the end of this post.


The message included in the Python traceback does not include the additional hint about a missing import in this case. However, Friendly adds it on its own.  Note that it does not suggest 'id' as a potential typo. But what if we had made such a typo?


Here, Friendly does make the suggestion about a potential typo.  What about the second example given above?



Friendly also uses sys.stdlib_module_names initially, but it also checks with importlib.util.find_spec() to see if the module can actually be located.

It can also find potentially relevant third-party modules that are installed, but not yet imported.


Using importlib.util.find_spec() allows us to implement Pamela Fox's suggestion about suggesting third-party modules that are installed.
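As a rough sketch of this kind of check (not Friendly's actual code; the function name can_be_imported is an assumption made for this example), one can ask importlib.util.find_spec() whether a top-level module can be located without importing it:

```python
import importlib.util


def can_be_imported(name):
    """Return True if a top-level module called `name` can be located.

    Hypothetical helper, for illustration only.
    """
    try:
        # For a top-level name, find_spec() searches for the module
        # without executing it.
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec() can raise for unusual entries in sys.modules.
        return False


print(can_be_imported("json"))                        # standard library
print(can_be_imported("surely_not_installed_xyz"))    # not installed
```

The same test covers both situations described above: a standard library name that does not exist on the current platform is filtered out, and an installed third-party module is found.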

However, we can do even better with some dedicated code. To demonstrate this, I need to use the latest addition to the "friendly-traceback family" - which I have only tested with Python 3.10 so far.


I'll likely have more to say about friendly_pandas in the near future.

Final thoughts

For those who are excited about the improved tracebacks of Python 3.11 (PEP 657: Fine-grained error locations in tracebacks) but cannot yet install Python 3.11, please note that Friendly can already do something similar, if not better, with any Python version 3.6.1+.



I say "better" because, unlike Python's traceback, the information is not limited to a single line of code:




Tuesday, October 18, 2022

pandas' SettingWithCopyWarning: did I get it right?

I am just beginning to learn pandas and am looking to provide some automated help. From what I read, it appears that SettingWithCopyWarning is something that confuses many people. Is the following correct?

In [2]:
df = pd.DataFrame([[10, 20, 30], [40, 50., 60]],
                  index=list("ab"),
                  columns=list("xyz"))
In [3]:
df.loc["b"]["x"] = 99
`SettingWithCopyWarning`: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
In [4]:
# What is SettingWithCopyWarning ?
what()
Pandas occasionally emits a SettingWithCopyWarning when you use       
'chained indexing', either directly or indirectly,and you then attempt
to assign a value to the result. By 'direct chained indexing', we mean
that your code contains something like:                               

...[index_1][index_2] = ...                                           

During the first extraction using [index_1], pandas found that the    
series to be created contained values of different types. It          
automatically created a new series converting all values to a common  
type. The second indexing, [index_2] was then done a this copy instead
of the original dataframe. Thus, the assigment was not done on the    
original dataframe, which caused Pandas to emit this warning.         

An 'indirect chained indexing' essentially amount to the same problem 
except that the second indexing is not done on the same line as that  
which was done to extract the first series.                           
In [5]:
# Can I get more specific information for what I just did?
why()
You used direct chained indexing of a dataframe which made a copy of  
the original content of the dataframe. If you try to assign a value to
that copy, the original dataframe will not be modified. Instead of    
doing a direct chained indexing                                       

df.loc["b"]["x"] ...                                                  

try:                                                                  

df.loc["b", "x"] ...                                                  
In [6]:
# What about if I tried to use indirect chaining. 
# There are two possibilities
series = df.loc["b"]
series["x"] = 99
`SettingWithCopyWarning`: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
In [7]:
where()
Warning issued on line 4 of code block [6].                                                         

       1| # What about if I tried to use indirect chaining.  
       2| # There are two possibilities
       3| series = df.loc["b"]
     > 4| series["x"] = 99
In [8]:
why()
I suspect that you used indirect chained indexing of a dataframe.     
First, you likely created a series using something like:              

series = df.loc[...]                                                  

This made a copy of the data contained in the dataframe. Next, you    
indexed that copy                                                     

series["x"]                                                           

This had no effect on the original dataframe. If your goal is to      
modify the value of the original dataframe, try something like the    
following instead:                                                    

df.loc[..., "x"]                                                      
In [9]:
# What if I do things in a different order
series_1 = df["x"]
series_1.loc["b"] = 99
`SettingWithCopyWarning`: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
In [10]:
where()
Warning issued on line 3 of code block [9].                                                         

       1| # What if I do things in a different order
       2| series_1 = df["x"]
     > 3| series_1.loc["b"] = 99
In [11]:
why()
I suspect that you used indirect chained indexing of a dataframe.     
First, you likely created a series using something like:              

series_1 = df[...]                                                    

This made a copy of the data contained in the dataframe. Next, you    
indexed that copy                                                     

series_1.loc["b"]                                                     

This had no effect on the original dataframe. If your goal is to      
modify the value of the original dataframe, try something like the    
following instead:                                                    

df.loc[..., "b"]                                                      
In [12]:
# What if I had multiples data frames?
df2 = df.copy()
series = df.loc["b"]
series["x"] = 99
`SettingWithCopyWarning`: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
In [13]:
where()
Warning issued on line 4 of code block [12].                                                        

       2| df2 = df.copy()
       3| series = df.loc["b"]
     > 4| series["x"] = 99
In [14]:
why()
In your code, you have the following dataframes: {'df2', 'df'}. I do  
not know which one is causing the problem here; I will use the name   
df2 as an example.                                                    

I suspect that you used indirect chained indexing of a dataframe.     
First, you likely created a series using something like:              

series = df2.loc[...]                                                 

This made a copy of the data contained in the dataframe. Next, you    
indexed that copy                                                     

series["x"]                                                           

This had no effect on the original dataframe. If your goal is to      
modify the value of the original dataframe, try something like the    
following instead:                                                    

df2.loc[..., "x"]                                                     

Monday, September 19, 2022

New milestone: friendly/friendly-traceback version 0.6 (and why not 1.0)

Just a few minutes ago, @isidentical tweeted that PyPy 3.9 had implemented the new enhanced tracebacks that are going to be part of CPython 3.11.  Of course, I had to reply to show that friendly/friendly-traceback has been able to do the same with all CPython versions starting with 3.6.1.

A few hours before this tweet, I had bumped the minor version number of both friendly and friendly-traceback from 0.5 to 0.6. I always keep them in sync; previously, friendly-traceback was at 0.5.63 and friendly was at 0.5.42.

friendly builds on friendly-traceback and is the package you want to install as an end-user. If you're just interested in retrieving the data produced by friendly-traceback and formatting it your own way, as do https://www.hackinscience.org/ and https://futurecoder.io/, then you only need to install friendly-traceback.  Both of these websites have been using friendly-traceback successfully for quite a while. From that point of view, friendly-traceback is mature enough to be considered a 1.0 version.  However, in addition to continuing to cover more cases, I have some interesting new additions planned, which makes me postpone giving it a 1.0 version number.

Quite a bit has been done since version 0.5. In particular, Tamil and Russian have been added as supported languages. The syntax highlighting of the traceback location has been improved in friendly. Support for a new project, friendly_idle, has been added. I've previously described


Note that, when some new highlighted code is shown with friendly_idle, the highlighting done on previous lines of code is removed: this is by design, to help focus the attention on the latest area with problems. 

I could say more about friendly ... but, why don't you try it out by yourself and see what you think. Feedback is always appreciated!


Friday, June 17, 2022

friendly_idle is done!

I've found a better solution for the remaining issue I had mentioned in the previous blog post.

I also found a fix for an "annoyance" mentioned by Raymond Hettinger on Twitter!

I could have changed the version to 1.0 ... but decided to wait until I get more feedback from users.


Tuesday, June 14, 2022

Friendly IDLE

friendly_idle is now available.  This is just a quick announcement. Eventually I plan to write a longer blog post explaining how I use import hooks to patch IDLE and to provide seamless support for friendly/friendly_traceback.  Before I incorporated "partial" support for IDLE within friendly, I had released a package named friendly_idle ... but this is really a much better version.


When you launch it from a terminal, the only clue you get that this is not your regular IDLE is from the window title.


Since Python 3.10 (and backported to Python 3.8.10 and 3.9.5), IDLE provides support for sys.excepthook() (see announcement).  However, the announcement does not point out that this is only partial support: exceptions raised because of syntax errors cannot be captured by user-defined exception hooks.  However, fear not: friendly_idle is perfectly capable of helping you when your code has some syntax errors.
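For readers who have never written an exception hook, here is a minimal sketch of what installing one looks like. The names summarize and summary_hook are made up for this example; this is not how friendly_idle is implemented, it only illustrates the sys.excepthook mechanism itself.

```python
import sys


def summarize(exc_type, exc_value):
    """Build a one-line description of an exception."""
    return f"{exc_type.__name__}: {exc_value}"


def summary_hook(exc_type, exc_value, exc_traceback):
    """Replacement for the default hook: print a short summary only."""
    print("Something went wrong ->", summarize(exc_type, exc_value),
          file=sys.stderr)


# From now on, uncaught exceptions are reported by summary_hook
# instead of the usual traceback printer.
sys.excepthook = summary_hook
```

A hook installed this way is called with the exception type, the exception instance and the traceback object for each uncaught exception; as noted above, in IDLE it is not called for syntax errors.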


And, of course, it can also do so for runtime errors.


The same is true for code run from a file as well:



If the code in a file contains some syntax error, friendly_idle is often much more helpful than IDLE. Here's an example from IDLE:
And the same example run using friendly_idle

Unfortunately, the tkinter errorbox does not use a monospace font (assumed by friendly/friendly_traceback for the formatting), and does not allow customization.  I might have to figure out how to create my own dialog, hopefully with support for monospace font and colour highlighting. If anyone has some experience doing this, feel free to contact me! ;-)







Saturday, June 11, 2022

Nicer arithmetic with Python

Beginning programmers are often surprised by floating point arithmetic inaccuracies. If they use Python, many will write posts saying that Python is "broken" when they see results such as the following:

>>> 0.1 + 0.2
0.30000000000000004

This particular result is not limited to Python. In fact, it is so common that there exists a site with a name inspired by this example (0.30000000000000004.com/), devoted to explaining the origin of this puzzling result, followed by examples from many programming languages.

Python provides some alternatives to standard floating point operations. For example, one can use the decimal module to perform fixed point arithmetic operations. Here's an example.

>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 7
>>> Decimal(0.1) + Decimal(0.2)
Decimal('0.3000000')
>>> print(_)
0.3000000

While one can set the precision (number of significant digits) with which operations are performed, printed values can carry extra zeros: 0.3000000 does not look as "nice" as 0.3.
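As a small complement, the sketch below shows two ways of getting rid of the extra zeros with the decimal module alone: constructing the Decimal values from strings, or calling normalize() on the result. It uses localcontext() so that the precision change stays local to the example.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 7  # same precision as in the example above
    from_floats = Decimal(0.1) + Decimal(0.2)
    from_strings = Decimal("0.1") + Decimal("0.2")

print(from_floats)              # 0.3000000
print(from_strings)             # 0.3
print(from_floats.normalize())  # 0.3
```

Neither option removes the need to write Decimal(...) everywhere, which is the notational burden discussed in the rest of this post.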

Another alternative included with Python's standard library is the fractions module: it provides support for rational number arithmetic.

>>> from fractions import Fraction
>>> Fraction("0.1") + Fraction("0.2")
Fraction(3, 10)
>>> print(_)
3/10

However, the fractions module can yield some surprising results if one does not use string arguments to represent floats, as was mentioned by Will McGugan (of Rich and Textual fame) in a recent tweet.

>>> from fractions import Fraction as F
>>> F("0.1")
Fraction(1, 10)
>>> F(0.1)
Fraction(3602879701896397, 36028797018963968)

In the second case, 0.1 is a float which means that it carries some intrinsic inaccuracy. For the first case, some parsing is done by Python to determine the number of decimal places to use before converting the result into a rational number. A similar result can be achieved using the limit_denominator method of the Fraction class:

>>> F(0.1).limit_denominator(10)
Fraction(1, 10)

In fact, we do not have to be as restrictive in the limitation imposed on the denominator to achieve the same result

>>> F(0.1).limit_denominator(1_000_000_000)
Fraction(1, 10)

While we can achieve some "more intuitive" results for floating point arithmetic using special modules from Python, the notation that one has to use is not as simple as "0.1 + 0.2". As Raymond Hettinger often says: "There has to be a better way."

Using ideas

As readers of this blog already know, I created a Python package named ideas to facilitate the creation of import hooks and to enable easy experimentation with modified Python syntax. ideas comes with its own console that supports modified Python syntax. It can also be used with IPython (and thus with Jupyter notebooks).

Using ideas, one can "instruct" python to perform rational arithmetic.  For example, suppose I have a Python file containing the following:

# simple_math.py

a = 0.2 + 0.1
b = 0.2 + 1/10
c = 2/10 + 1/10
print(a, b, c)

I can run this with Python, getting the expected "unintuitive" result:

> py simple_math.py
0.30000000000000004 0.30000000000000004 0.30000000000000004

Alternatively, using ideas, I can execute this file using rational arithmetic:

> ideas simple_math -a rational_math
3/10 3/10 3/10
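To give an idea of the kind of source transformation that can produce this result, here is a minimal sketch based on the standard tokenize module. It is not the code used by ideas: the function to_rational_source is an assumption made for this example, and it naively wraps every numeric literal, so it only works for simple decimal literals (no hexadecimal or complex numbers) and would break code that needs true integers, such as indexing.

```python
import io
import tokenize
from fractions import Fraction


def to_rational_source(source):
    """Return `source` with each numeric literal wrapped in Fraction('...')."""
    new_tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NUMBER:
            # 0.2 becomes Fraction('0.2'): the literal is passed as a
            # string so that no float rounding ever takes place.
            new_tokens.extend([
                (tokenize.NAME, "Fraction"),
                (tokenize.OP, "("),
                (tokenize.STRING, repr(tok.string)),
                (tokenize.OP, ")"),
            ])
        else:
            new_tokens.append((tok.type, tok.string))
    return tokenize.untokenize(new_tokens)


source = "a = 0.2 + 0.1\nb = 0.2 + 1/10\nc = 2/10 + 1/10\n"
namespace = {"Fraction": Fraction}
exec(to_rational_source(source), namespace)
print(namespace["a"], namespace["b"], namespace["c"])  # 3/10 3/10 3/10
```

An import hook then only has to apply such a transformation to the source of a module before it is compiled.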

Using a different import hook, I can have the result shown with floating point notation.

> ideas simple_math -a nicer_floats
0.3 0.3 0.3

Instead of executing a script, let's use the ideas console, starting with "nicer_float"

ideas> 0.1 + 0.2
0.3

ideas> 1/10 + 2/10
0.3

For "nicer_float", I've also adopted Pyret's notation: floating-point numbers immediately preceded by "~" are treated as "approximate" floating-point numbers, i.e. with the regular inaccuracy.

ideas> ~0.1 + 0.2
0.30000000000000004

And, as mentioned before, I can use ideas with IPython. Here's a very brief example

IPython 8.0.0b1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ideas.examples import rational_math

In [2]: hook = rational_math.add_hook()
   The following initializing code from ideas is included:

from fractions import Fraction

In [3]: 0.1 + 0.2
Out[3]: Fraction(3, 10)
  

Final thoughts

Given how confusing floating point arithmetic is to beginners, I think it would be nice if Python had an easy built-in way to switch modes and do calculations as done with ideas in the above examples. However, I doubt very much that this will ever happen. Fortunately, as demonstrated above, it is possible to use import hooks and a modified interactive console to achieve this result.

Friday, May 13, 2022

Python 🐍 fun with emojis

At EuroSciPy in 2018, Marc Garcia gave a lightning talk which started by pointing out that scientific Python programmers like to alias everything, such as

import numpy as np
import pandas as pd

and suggested that they perhaps would prefer to use emojis, such as

import pandas as 🐼

However, Python does not support emojis as code, so the above line cannot be used.

A year prior, Thomas A Caswell had created a pull request for CPython that would have made this possible. This code would have allowed the use of emojis in all environments, including in a Python REPL and even in Jupyter notebooks. Unsurprisingly, this was rejected.

Undeterred, Geir Arne Hjelle created a project called pythonji (available on PyPI) which enabled the use of emojis in Python code, but in a much more restricted way. With pythonji, one can run modules ending with 🐍 instead of .py from a terminal. However, such modules cannot be imported, nor can emojis be used in a terminal.
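To illustrate why some preprocessing is needed, and roughly what it can look like, here is a minimal sketch. It is neither the code of pythonji nor that of ideas: the function replace_emojis and the "emoji_" prefix are assumptions made for this example, and the replacement is done blindly, including inside strings and comments. Since pandas may not be installed, the example aliases the math module instead.

```python
import unicodedata

# An emoji is not a valid identifier, hence the SyntaxError in plain Python.
print("🐼".isidentifier())  # False


def replace_emojis(source):
    """Replace each 'Symbol, other' character by an ASCII identifier."""
    pieces = []
    for char in source:
        if unicodedata.category(char) == "So":
            # For example, 🐼 is named PANDA FACE in the Unicode database.
            name = unicodedata.name(char, f"U{ord(char):X}")
            pieces.append("emoji_" + name.replace(" ", "_").replace("-", "_"))
        else:
            pieces.append(char)
    return "".join(pieces)


source = "import math as 🐼\nresult = 🐼.sqrt(16)\n"
transformed = replace_emojis(source)
print(transformed)

namespace = {}
exec(transformed, namespace)
print(namespace["result"])  # 4.0
```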

When I learned about this attempt by Geir Arne Hjelle from a tweet by Mike Driscoll, I thought it would be a fun little project to implement with ideas.  Below, I use the same basic example included in the original pythonji project.


As you can see, it works in ideas' console when importing a module. It can also work when running the 🐍 file as source, but leaving the extension out.



And, it works in Jupyter notebooks too!


All of this without any need to modify CPython's source code!

😉