Monday, January 07, 2008

More encoding pains...

It seems like every early January brings some new encoding pains...

Eons ago (less than a year) I used to program on a Windows-based computer. My Windows user name was simply André, which is not surprising since it's my first name. However, the observant reader may have noticed that my name requires one non-ASCII character. Such a small detail... that can cause so much annoyance.

Two years ago, on January 6th, I wrote about using site customization so that my favourite Python editor at the time (SPE) could deal with my user name.

Last year, on January 3rd, I wrote about how Crunchy dealt with encoding issues in a way that was independent of any site customizations. At the time, an astute reader made the following comment (which I had forgotten until today, when I decided to write this blog entry):

Without having looked at the rest of your code, so I might be completely off here, this somehow looks wrong:

result = result.decode(sys.getdefaultencoding()).encode('utf-8')

The reason I say this is that you're decoding and encoding in the same place. Since Python unicode support is so good, it's generally a good idea to decode to unicode any user input you get as early as possible, and to encode only as late as possible when outputting strings. You're doing complicated web UI stuff here, so it may be that you're not doing anything with 'result' between input and returning it to the browser, but if you are, the string should have already been decoded by the time it gets to this line. Otherwise this will bite you anytime you try to do anything with the string like simple concatenation. [emphasis added]

Of course, since Crunchy was working properly, I quickly dismissed that comment. With the way everything was implemented, Crunchy was working just fine... In fact, to this day, if you download the latest public release (0.9.8.6), everything works just fine - even if your user name includes non-ASCII characters.
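
The commenter's advice (decode as early as possible, encode as late as possible) can be sketched as follows. This is only an illustration in Python 3 syntax, not Crunchy's actual code: the function names, the Latin-1 input and the UTF-8 output are all assumptions made up for the example.

```python
import locale


def read_user_input(raw, encoding=None):
    """Decode bytes coming from the outside world as early as possible.

    Hypothetical helper: falls back to the locale's preferred encoding
    when the caller does not know the encoding of the incoming bytes.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding)


def build_message(user_name):
    """Work only with text here; concatenation is safe on decoded strings."""
    return "Hello, " + user_name + "!"


def write_output(text):
    """Encode only at the very end, when the text leaves the program."""
    return text.encode("utf-8")


# Assumed scenario: the user name arrives as Latin-1 bytes.
raw_name = "André".encode("latin-1")
name = read_user_input(raw_name, "latin-1")
page = write_output(build_message(name))
```

The point of the design is that everything between `read_user_input` and `write_output` sees text only, so an operation like concatenation never mixes encoded bytes with decoded text.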

However ... in adapting Crunchy to work with Python 3.0, I do things with the various strings like simple concatenation ... and, of course, this caused some problems, as I found out when I "borrowed" the old Windows computer from my daughter to try the latest changes I had made.

It. Did. Not. Work.

Ok, so I have two possible solutions:
  1. Trade with my daughter for a while, letting her use my MacBook (which she loves!) while I use the "old" Windows desktop.
  2. Create an account with an accented name on my MacBook.
Well, as much as I love my daughter, I could not face the pain of going back to using the Windows desktop as my main computer. So, solution 2 was an easy choice.

Except that it wasn't....

My account name on the Mac is my full name (André Roberge). Hmmm... this already has a non-ASCII character. But my home directory is "andre" - no accent. Under Mac OS, a user has a full name that appears in the login window, and a short name used for the home directory (/Users/andre in my case). When I tried to create a new account with a non-ASCII character in the short name, it just beeped and refused to enter it.

In search of an answer, I posted on three Mac-related forums, and got either no answer or some unhelpful and wrong answer on two of them ... I was considering posting on the Python list, but, fortunately, I eventually got a useful answer.

So, if you are thinking of writing i18n applications that can run from any directory, or that save information in the user's default directory, or both, and you want to make sure that they will work on a Windows computer, here's how you can do it on a Mac (under OS X 10.5):
  1. Create a test account using the account manager under any name of your choosing; however, the "short name" will have to be ASCII only (at this stage).
  2. From the account manager, ctrl-click on the account after it has been created; this will bring up an advanced dialog.
  3. Edit the advanced dialog to change the short name, and the home directory, to the desired value. I chose the name "accentué" (self-referencing name, if you know French). Note that doing so does not change the actual name of the directory.
  4. Go to a terminal window and do "sudo mv old_name new_name" to change the name of the home directory that was created at step 1.
After I did all this, the development version of Crunchy did not work from the new account. This pleased me very much: it likely means that I do not have to trade computers with my daughter to continue working on Crunchy. ;-)
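
For a quick first check that does not require a new account, one can also point a program at a directory whose name contains a non-ASCII character. The sketch below is an assumption-laden stand-in, not a replacement for the real test account: the function name, the file name `settings.txt` and its content are invented for illustration, and a temporary directory does not reproduce everything a real home directory involves (environment variables, login name, and so on).

```python
import os
import tempfile


def check_non_ascii_home(dir_name="accentué"):
    """Write and read back a file inside a directory with a non-ASCII name.

    Hypothetical helper: the directory stands in for a user's home
    directory; it lives under a temporary parent that is deleted afterwards.
    """
    with tempfile.TemporaryDirectory() as parent:
        fake_home = os.path.join(parent, dir_name)
        os.mkdir(fake_home)

        # Simulate an application saving information in the "home" directory.
        settings = os.path.join(fake_home, "settings.txt")
        with open(settings, "w", encoding="utf-8") as f:
            f.write("user=" + dir_name)

        # Read it back, as the application would on its next start.
        with open(settings, encoding="utf-8") as f:
            content = f.read()

        return content, os.path.basename(fake_home)


content, home_name = check_non_ascii_home()
```

If this fails on a given machine, the real account test would most likely fail too; if it passes, the account created with the steps above is still the more faithful test.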
