Saturday, December 31, 2005

Journey to 117

This blog entry is dedicated to Michael Spencer whose 120 character-long entry in the PyContest was submitted less than 90 minutes after the official beginning of submissions. Michael's entry gave a target to aim for which few people managed to reach. I strongly suspect that, in large part because of this, various hints have been posted on the web, enabling more people to slowly crawl towards ever-shorter solutions. I must admit that without the various hints I have read, I would never have been able to challenge Michael's solution.


After being asked a few questions here and elsewhere, I've decided that I should try and document the reasoning that eventually led me to my final solution to the PyContest. Of course, it is going to be more linear here (and with fewer dead-end explorations) than it was in real life. I definitely welcome comments from others' journeys.


First solution


After writing on paper the desired output for an input that included all digits, I quickly wrote down the following program to produce the desired output:



def seven_seg(i):
~~n=[[" _ "," "," _ "," _ "," "," _ "," _ "," _ "," _ "," _ "],
~~~~~["| |"," |"," _|"," _|","|_|","|_ ","|_ "," |","|_|","|_|"],
~~~~~["|_|"," |","|_ "," _|"," |"," _|","|_|"," |","|_|"," _|" ]]
~~m=''
~~for j in 0,1,2:
~~~~for d in i:
~~~~~~~~m+=n[j][int(d)]
~~~~m+='\n'
~~return m
print seven_seg('0123456789')

In an attempt to preserve the visual appearance, I have replaced all the leading spaces by "~". In the future, whenever you see a "~", take it to be a space. After removing the print statement, the number of characters is 310, which is quite a ways from 117. One can reduce the number of characters by removing many unnecessary spaces but I will not do that until we are much closer to the "final" solution. Furthermore, I will keep the explicit for loops rather than using generators [with the ''.join() method] until the very end as I find the code easier to read that way. I will, however, move the [m=''] assignment within the function argument, thus removing a line (and at least two extra characters!) from the solution.


My next step was to look at the various "triplets" (three-character combinations) that appeared in "n". Quite a few are repeated. In fact, there are 7 unique triplets; they are:



"~~~", "~_~", "~_|", "|_~", "|_|", "~~|", "|~|"

With this ordering, and giving each triplet a number from 0 to 6, we have each input digit can be given by the following three number combination:
0=164, 1=055, 2=123, 3=133, 4=065, 5=132, 6=134, 7=155, 8=144, 9=142.


Note that the first digit of each triplet is either a 0 or a 1; this will be important later. Using these triplets leads to a second, shorter, solution:



def seven_seg(i, m=''):
~~n=["~~~","~_~","~_|","|_~","|_|","~~|","|~|"]
~~for j in '1011011111', '6622433544', '4632524542':
~~~~for d in i:
~~~~~~m+= n[int(j[int(d)])]
~~~~m+= '\n'
~~return m

Note that the string '1011011111' represents the first digit of each triplet, and that the order in which they appear is '0123456789'.


By moving the inner for loop statement (5th line) at the end of the 4th line, using a single space for the first indentation and a tab character (expanded by Python to 8 spaces) for the second level of indentation and removing other non-required spaces, it is possible to reduce the length of the code to 171 characters -- still a long way from our final goal.


The next step is to eliminate "n" altogether; we can do this in two steps.
The first step is simply use it as is in the line



def seven_seg(i, m=''):
~~for j in '1011011111', '6622433544', '4632524542':
~~~~for d in i:
~~~~~~m+= ["~~~","~_~","~_|","|_~","|_|","~~|","|~|"][int(j[int(d)])]
~~~~m+= '\n'
~~return m

The second step is to try to replace the list of strings by a single string and extract, when assigning, a 3-character substring. The shortest such string that we can construct is 14 character long. There are many such strings; here is one of many examples:



'~_~_|_~~~|~|_|'
~01234567890123

Using this string, we see that the last 3-character substring, '|_|', starts at index 11. To encode the position of each beginning index, we would need at least 2 digits for some. A better way appears to lengthen the string a bit so that each required 3-character substring starts on an even index -- the last character of one substring being the first character of the next. Here is one such solution -- which, other than for some unnecessary spaces, is the exact first solution that I submitted at the exact opening of the contest at 14:00 UTC; for one minute it was in first place. :-)



def seven_seg(i, m=''):
~for s in '5455455555', '2600133611', '1630601610':
~~for k in i:
~~~~j=int(s[int(k)])*2;
~~~~m+='~_|_|~|_~~~_~~|'[j:j+3]
~~m+='\n'
~return m

Not having done any programming for a while, I didn't realise at the time that 'some_string'[j:j+3] could be written as 'some_string'[j][:3], thereby saving one assignment (to j) and some precious characters all around.


The next step is to try to shorten the three long numbers: '5455455555', '2600133611' and '160601610'. This can be done by encoding them in a base other than 10 and extracting the result when needed. A better way however is to re-arrange the number as a series of triplets and then encode the result. For example, the three long numbers just mentioned can be re-ordered as follows:



0=521, 1=466, 2=503, 3=500, 4=416, 5=530, 6=531, 7=566, 8=511, 9=510

This gives rather large integers (ex: 521) to try to encode in a base different from 10. A better choice would be to use the ordering



0=164, 1=055, 2=123, 3=133, 4=065, 5=132, 6=134, 7=155, 8=144, 9=142

that we had mentioned earlier, as it gives smaller integers. Now, Python's built-in function int() allows conversion to base 10 from other bases up to 36. Still, that would require a 2 character encoding from each triplet. A better choice is to take each triplet to be the decimal representation of an ascii character. As each triplet is a number less than 255, this is certainly possible (barring some unfortunate coincidence when a given triplet would correspond to a newline character.)


Actually, since none of the individual digits in the triplets is greater than 6, we can take them to be numbers in a base less than 10; from what I gather, most people chose 8 (as they try to do stuff with bit shifts and the like) and I chose 7 (for one of my submissions). Thus, a number like 164 is taken to mean

164 (in base 7)= 1*49 + 6*7 + 4 = 95 (in base 10). This will help further reduce the number of characters. Let's assume that everything is correct with these choices (i.e. still no newline character) and that the encoding of our triplets can be represented as



0=z, 1=o, 2=t, 3=T, 4=f, 5=F, 6=s, 7=S, 8=e, 9=n

where I have use the first letter of each number as a representation of its encoding.


As an aside, some useful comments about this (and other aspects of this problem) can be found on the following two blog entries (including comments):

Guyon Morée's blog




Roberto Alsina's blog


The corresponding program, still written with the explicit for-loops, can be written schematically as follows:



def seven_seg(i, m=''):
~for a in 49, 7, 0:
~~for k in i:
~~~~m+='~_|_|~|_~~~_~~|'[ord("z0tTfFsSen"[int(k)])/a%7*2:][:3]
~~m+='\n'
~return m

Let's examine what the innermost loop does. We convert each character in the input string into a digit, extract the corresponding ascii-encoded triplet, decode it into an actual number, divide by the appropriate power of our base and take the modulo in that base, thus extracting the relevant digit in the triplet; the result is multiplied by 2 to find the relevant index in our string of "~|_". So you can see the advantage of using a base less than 10 as it takes fewer characters to write.


To make the solution as short as possible, we rewrite it as a lambda and use join() and generators instead of explicit for loops. This also eliminates the use of an extra variable ("m"). The result can be written as:



j=''.join
seven_seg=lambda i:j(
j('~_|_|~|_~~~_~~|'[ord("z0tTfFsSen"[int(k)])/a%7*2:][:3]for k in i)
+'\n' for a in(49,7,0))

When all the extra spaces are removed, this gives a 121-character long solution. I'll refer you to the already mentioned blogs for a 120 character long solution!


To get to 117 characters, we need a slightly different approach.


We need to go back to some assumptions we made earlier. Do we really need to have even-numbered indexes? If not, we could use a shorter "~|_" type string which included all relevant 3-character substrings, thereby possibly saving one character. Let us look at such a possible string and the various indices. I will take the string of my posted 117 character long solution '~_~~~|_|_~_|~|', and extract the location of the indices of the relevant substrings:
"~~~":2, "~_~":0, "~_|":9, "|_~":7, "|_|":5, "~~|":3, "|~|":11.
With these 3-character substrings, our triplets are:


0=0-11-5, 1=233, 2=097, 3=099, 4=253, 5=079, 6=075, 7=033, 8=055, 9=059


For each triplet, instead of encoding it in a given base, we will look at a number which, when taking the modulo with three different bases, will give us back the relevant digit. By this, I mean something like the following:



165%3 = 0
165%14 = 11
165%10 = 5

i.e., the triplet for "0" could be represented by the integer 165 when taking the modulo with 3, 14 and 10 respectively - as I have done in the 117 solution. Using brute force, which is what one does when one is too tired, one can loop over the integers less than 255 and find out if all the relevant triplets can be found. Or, you notice that 11*5*3 = 165 [11%14=11, 5%10=5, 3%3=0], and you proceed from there. With this information (and the comments on the previous blog entry), you should now have all the relevant information to understand how this solution was obtained.


On that final note, I wish you a very Happy New Year!

11 comments:

  1. Anonymous9:17 PM

    Just one remark about the final step where you said you used bruce force to calculate the 10 needed bytes and it was sheer luck that they turned out to be < 256.

    Actually, it's not really sheer luck but mathematics. There is a famous old theorem about the existence and uniqueness of the solutions, and a fast algorithem for finding these numbers. It's called "Chinese remainder theorem." This theorem says there is a unique solution modulo the least common divisor, which is in this case 210.

    That's why you can simply add 210 to get rid of the characters lower than 32. I.e. you can simply use chr(213) = Õ instead of the ugly 0x3 character.

    With a little math you can spare a lot of programming - but ok, with a little programming you can spare a lot of math as well ;-)

    ReplyDelete
  2. Thanks Christoph, I should have been able to figure that one out. It's been too long since I've had to think about math.

    ReplyDelete
  3. Christoph:

    Actually, it was (almost) sheer luck. As it turned out, the 10 "triplets", which I will denote by (a,b,c) were such that both "b" and "c" were odd numbers. If, for at least one of the triplets, they would have been such that one of (b|c) would have been even while the other onen would have been odd, then they could never have been generated simultaneously through %14 and %10.

    ReplyDelete
  4. Anonymous10:37 PM

    congratulations Andre, well deserved winner!

    ReplyDelete
  5. Anonymous5:38 AM

    Andre:

    You're right; there was some luck involved as well. Actually, the Chinese Remainder Theorem guarantees a solution only in the case when the three numbers are coprime (have a gcd of 1). For instance, you will always find a solution mod 3, 13, 10. The only problem is that the solution varies in the range of the lcm; in this case 3*10*13=390 which may not fit into one byte. So 3,14,10 was a better triple because its lcm is only 3*2*5*7=210. Because these numbers are not coprime (both 10 and 14 are even), the existence of the solution is not guaranteed, but a certain condition needs to be met (see http://en.wikipedia.org/wiki/Chinese_remainder_theorem): The right sides of the equations must be pairwise congruent modulo the respective gcd's. In the case of

    x%2 = i1
    x%10 = i2
    x%14 = i3

    the condition is that i2 and i3 must be congruent modulo the gcd of 10, 14 which is 2, i.e. i2 and i3 must be either both even or both odd which explains your observation.

    ReplyDelete
  6. Anonymous2:42 PM

    Hi André,

    nice work done. Congratulations for the win.

    >> I definitely welcome comments from others' journeys.

    Maybe for the moment it's a bit more fruitful to prolong your journey a bit.

    You like to see a 116 character-long solution in your blog that says journey to 117?

    Don't be afraid, it's still your solution with less optimizations ;-)

    First I want to repeat your 117 character version from [http://aroberge.blogspot.com/2005/12/pycontest-challenge-117-character-long.html]

    I also removed the single quotes inside the double quotes of "'\xa5\x8f\xb1\xdb\xad\xbdi\x03K\x9f'" to make it work correctly:

    j=''.join;seven_seg=lambda z:j(j(' _   |_|_ _| |'[ord("\xa5\x8f\xb1\xdb\xad\xbdi\x03K\x9f"[int(a)])%u:][:3]for a in z)+"\n"for u in(3,14,10))

    And now let's get rid of another character. Just remove the ''.join optimization in conjunction with the newline string concatenation. And here it is, the 116 character-long solution:

    seven_seg=lambda z:'\n'.join(''.join(' _   |_|_ _| |'[ord("\xa5\x8f\xb1\xdb\xad\xbdi\x03K\x9f"[int(a)])%u:][:3]for a in z)for u in(3,14,10))

    With best greetings from Germany
    *Markus Schramm

    ReplyDelete
  7. So, the characters
    j=''.join;jj+'\n'
    are replaced by
    ''.join'\n'.join

    Ingenious!

    ReplyDelete
  8. Anonymous5:22 PM

    That 116 character solution is not valid: it doesn't end with a newline character, so it won't pass the tests.

    ReplyDelete
  9. Anonymous3:44 PM

    Yes. Unfortunately the test required a final '\n'. So 117 still seems to be the philosopher's stone.

    BTW, in my last comment there was a typo. The set of congruence equations should be

    x%3 = i1
    x%14 = i2
    x%10 = i3

    The rest was correct. The gcd of the modulus of the last two equations is 2, so i2 and i3 must be both odd or both even; if this is the case you will always get a unique solution x<210.

    ReplyDelete
  10. Wow, kinda cool stuff. May require brain surgery for me to understand it a short period of time. I try to think, but nothing happens. Hey, Moe! Hey, Larry!

    ReplyDelete
  11. The function
    seven_seg=lambda z:'\n'.join(''.join(' _ |_|_ _| |'[ord("\xa5\x8f\xb1\xdb\xad\xbdi\x03K\x9f"[int(a)])%u:][:3]for a in z)for u in(3,14,10))

    can be reduced by 2 characters using the trick that '\x03' can be written as octal like '\3':

    seven_seg=lambda z:'\n'.join(''.join(' _ |_|_ _| |'[ord('\xa5\x8f\xb1\xdb\xad\xbdi\3K\x9f'[int(a)])%u:][:3]for a in z)for u in(3,14,10))

    :-)

    ReplyDelete

Spammers: none shall pass.

Note: Only a member of this blog may post a comment.