Monday, March 02, 2020

True constants in Python - part 2, and a challenge

Like the title says, this is the second post on this topic. If you have not done so, you should really read the first one, which is much shorter, before continuing to read here.

The first clue


The first clue is that, rather than executing test.py as the main module, I imported it. Can you think of a situation where this would make a difference?

I'll leave a bit of space below to give you the opportunity to possibly go back to part 1, without reading about the solution below.















PEP 302


Back in 2002, with the adoption of PEP 302, Python enabled programmers to modify what happens when a module is imported; this is known as an import hook.  For example, it is possible to modify the source code in a module prior to it being executed by Python. This is NOT what I have done here - but I have done this for other examples that I will refer to below.

If the only thing required would be to modify the source, one could use what is described in PEP 263 and define a custom encoding that would transform the source. In this case, by adding an encoding declaration, it would have been possible to run test.py directly rather than executing it.  I thought of it a while ago but, in order to cover all possible cases, one would pretty much have to write a complete parser for Python that could be used to identify and replace the various assignments statement done by print statements so as to show what you saw in part 1.  However, this still would not be enough to protect against the reassignement done externally, like I did with

test.UPPERCASE = 1

The actual solution I used required three separate steps. The challenge I will mention at the end is to reduce this to two steps - something that I think is quite possible but that I have not been able to do yet - and to remove one left-over "cheat" which would allow one to redefine a constant by monkeypatching.  I think that this is possible but I have not actually sat down to actually do it. I thought of waiting for a few days to give an added incentive for anyone who would like to try and get the bragging rights of having it done first! ;-)

Step 1

Step 1 and 2 involve an import hook. They are independent one of another and can be done in any order.

When importing a module, Python roughly does the following:


  1. Find the source code
  2. Create a module object
  3. Execute the source code in the module object's dict.
The module object created by Python comes with a dict that is, in some sense, "read-only": you cannot write code to modify its behaviour, nor replace it by a custom dict. (However, see the challenge.)  However, one can define a custom dict, which is designed so that its various methods (__setitem__, __delitem__, etc.) prevent the reassignment of variables we intend to be constants. In the example I have chosen, these are variables whose names are in UPPERCASE.  (Not shown in the example of part 1: I have also added a scan of the code to identify any variable that used the type hint Final and add them automatically to the list of variables intended to be constants.)

Instead of executing the code in the module object's dict, it is executed in this special dict. The content of that dict is then copied into the module object's dict.

Doing this ensures that code run directly in the module is guaranteed to prevent variable reassignement. At least, I have not found a way to cheat from within a module and change the value of variables intended to be a constant.

Step 2

Step 2 is to define a custom class that prevent changes of attributes. This custom class is used to replace the module's own class, something that can be done.

Step 3

Step 3 is to make Python use our import hook. To do so, we must have some code being executed earlier than what is shown. There are a couple of ways to do this as describe in the Site-specific configuration hook section of the Python documentation. The method I have chosen is one that is easily done on an ad-hoc basis.  I created a file named usercustomize.py whose content is the following:

from ideas.examples import constants
constants.add_hook()

This calls my code that sets up an import hook as described above. To have Python execute this code, I set the environment variable PYTHONPATH to be equal to the directory where usercustomize is located. On Windows (which is what I use), this is most easily achieve by navigating to that directory in the terminal and entering the following:

set PYTHONPATH=%CD%

Doing so will ensure that the code in usercustomize.py is executed before any user code.

The challenge


As mentioned in part 1, attempting to modify the value of a constant from outside, as in:

This leaves one possible cheat. From an external module, instead of writing

import test
test.UPPERCASE = "new value"

which is prevented, one can use the following cheat

import test
test.__dict__["UPPERCASE"] = "new value"

This is because the module's __dict__ is a "normal" Python dict. 

However, instead of using a module object created by Python, it should be possible to create a custom module object that uses something like the special dict mentioned before. Thus one would not need to change the way that Python execute code in the module's dict.

The challenge is to write code that creates such a module object.   I would not be surprised if there remained some other ways to cheat after doing so, but hopefully none as obvious as the one shown above.

Resources


The code I have written is part of my project named ideas.  The actual code for the constants example is given by this link.  See also the documentation for the project.  Note that, token_utils mentioned in the documentation has been put in a separate project; I need to update the documentation.

Both ideas and token-utils can be installed from Pypi.org as usual.




3 comments:

Veky said...

You really need to define your rules more strictly. For example, how about defining a module test2, copying over everything from test to test2 except UPPERCASE, putting test2.UPPERCASE = whatever, and then test = test2 (or even sys.modules['test'] = test2?

André Roberge said...

The challenge is to create an import hook that creates a custom module instead of relying on Python's semantics, and have that custom module use something like the FinalDict of the module constants. See https://github.com/aroberge/ideas/blob/master/ideas/import_hook.py#L131 for the step where that custom module would be used.

AlbertMietus said...

Nice idea . **Good challenge**

Have to think a bit about it (deeply) to *invent* a elegant solution...

Thanks