Simulated Annealing

A way forward was found in the mid-1980’s in the form of the Simulated Annealing algorithm. This accepts a less-fit key every so often, equivalent to taking a downward step that leads away from being marooned on a local maximum. How to determine when to accept the worse key?

The SA algorithm follows three rules, which intuitively seem reasonable:

(1) only accept a worse key occasionally;
(2) greatly improve the chances of accepting if the child is close to
the parent;
(3) be less likely to accept as the parent improves, because the
parent is then more likely to be correct.

The way to implement these rules is somewhat arcane, and so takes a little effort to grasp.

When presented with a worse key, the SA algorithm does two things:

first, it calculates a value 'p' from the equation

                               p = e(dS/T)


meaning p is made equal to e raised to the power (dS/T)

    where T stands for Temperature, which at the beginning of SA is set high, let us say at 100;
    dS is the negative quantity (child score - parent score)
    e  is the exponential constant 2.718.

If you do a few sums, you will find that p is always between zero and 1.

secondly, it gets a number at random between zero and 1. if p is greater than this number, the child passes and becomes the new parent.

Now you can see from the equation that the size of p gets greater as dS gets smaller – in other words as the child score gets closer to the parent. So the chance of accepting the worse child improves as the child gets closer to the parent –- which is what we want.

Also the exponential nature of the equation ensures that this improvement in acceptance is not linear, but mounts rapidly as the child gets closer to the parent (see the chart below) which again is what we want.



As implementation proceeds, the Temperature is dropped in steps. The impact of a smaller value for ‘T’ is to make p smaller, and thus to make less likely the acceptance of a worse child key. This is what we want to meet the 3rd consideration mentioned above.

So now I can put together the SA algorithm as follows:

Set Temperature = 100
Do this loop
    make a key at random; decipher and score;
    call this the parent;
    set the counter to zero;
    Do
        make a change to the parent to create a child;
        decipher with this child key and score;
        calculate diff = (child score – parent score);
        if diff is positive, make the child the parent;
        if diff is negative
            calculate p from equation;
            get a random number from 0 to 1;
            if p is greater than the random number make the child the parent;
            else increment the counter;
    while the counter is less than cycle length of 50,000
    Reduce Temperature by 5.
    If Temperature=0 then stop.

During the process the highest scoring keys are displayed, together with their plaintexts, and hopefully the solution will appear. Otherwise that annealing run has failed and the whole process must be repeated from scratch by running the program again.

This algorithm gets its name because it resembles the process of metal annealing in which the metal is heated high enough to rupture its crystal structure and then is cooled in a number of steps -- with the temperature of the whole mass being allowed to reach equilibrium at each step. The intention is to form uniform sized crystals, and the probability of so doing is governed by an equation very similar to the SA equation above.

Now you can see that SA is a whole lot more complicated than hillclimbing. The simple addition of taking an occasional downward step has introduced these new aspects:

1. a more difficult program to write. But this is not a major problem, even for the amateur;
2. the need to define values for three variables before we can begin: Temperature, Temperature step, cycle length. This is an important consideration. These values cannot be calculated or intuitively guessed. They are going to vary from one application to another, from one cipher to another and indeed from one cryptanalyst to another, dependant on their scoring systems. The best values for these variables can only be found by trial and error.

So it’s the same old story: there’s no such thing as a free lunch. If you want a better cipher solver than hillclimbing, you have to put in some work to get SA on its feet. Is the improvement worth the effort?
For simple, mono-substitution ciphers there is little to be gained. But for more difficult ciphers, such as Transpositions and Polygraphic Substitutions, it is very much worthwhile. Solving times are cut from many hours to a matter of minutes.

There are a number of interesting and useful aspects of SA to discuss like the impact of different scoring systems and the nature of the ciphertext. But since this article is on the Churn algorithm, I am going to leave these for the moment.

There are some programs written in C and a set of log tetragram frequencies in a zipped package that you can download by clicking here.