Churn algorithm

Stochastic algorithms

2012-07-16T12:25:06+00:00

The next few pages give an outline of traditional stochastic algorithms, such as hillclimbing, preparatory to a full description of the Churn algorithm and an illustrative computer program. Those familiar with stochastic algorithms may want to jump directly to the Churn algorithm.

I will illustrate the use of these algorithms for solving Classical ciphers – by which I mean those used up to the 2nd World War. One or two of them were also used in the 1939-45 period, for example the Double Playfair cipher. You will find descriptions of all Classical ciphers at the web site of the American Cryptogram Association.

Stochastic algorithms are based on random processes. The simplest is a random search. When applied to cipher solving, it keeps cycling as below:
do forever:

-select a key at random,
-decipher,
-rate the ‘fitness’ of the plaintext in some way and display it if it is the best to date.

Hopefully the correct plaintext will eventually appear – much as a monkey with a typewriter might eventually produce a passage from Shakespeare. Neither is an efficient or practical approach to these tasks.

I need to explain how ‘fitness’ can be assessed.

A wrong key will produce poor plaintext like “qstxindujsolzltdsfwh”. The quality of this plaintext can be judged by looking at each four-letter group (tetragraph) and giving it a score dependant on its frequency in English text. Such frequencies can be determined by looking through a number of books. On this basis, ‘qstx’ gets a score of zero because it is never found. ‘stxi’ scores 3, ‘txin’ 2, 'xind' 4, 'indu' 11 and all the rest zero. The whole of this plaintext scores 20. In comparison, the first 20 letters of this current sentence score 218. You can see that tetragraph scoring clearly differentiates between the quality, or fitness, of these two plaintexts.

In actual fact, the frequencies used are log frequencies. Using the logs reduces the impact of very frequent tetragraphs, like the-, that, -ing, -ion and so on which otherwise have too great an effect on the score.

Hillclimbing

2012-07-16T12:25:51+00:00

A change to the random search procedure, first developed 50 years ago, produces a useful algorithm called hillclimbing, which can solve simple ciphers. In this method a random key is generated and its fitness is determined. Some change is made to the key and if that causes an improvement in fitness, the changed key replaces the original – otherwise the original is kept. The change cycle is repeated, until no more improvements in fitness are possible. Then a new random key is created. This algorithm can be summarised as:

1. make a key at random & determine fitness. Call it the ‘parent’;
2. make some change to the parent to produce a ‘child’ key,
and measure its fitness;
3. if the child is fitter than the parent, make the child
the new parent & dispense with the old;
4. go back to (2), unless no improvements in the last 1000
changes in which case go back to (1)

As an example, hillclimbing rapidly solves Cryptarithm puzzles such as the one below:

ABHORE + CHAINS = OIBANAC. ABHORE - CHAINS = OAOINO

where each of the letters stands for a different integer from zero to 9. The hillclimber makes a key at random and assesses the two differences:

difference 1 = (ABHORE + CHAINS) - OIBANAC

difference 2 = (ABHORE - CHAINS) - OAOINO

If the key is correct both differences will be zero. The sum of the absolute variations from zero is a measure of the 'goodness' of the key -- the smaller the variation the better is the key.

Changes are made to the key by swapping and a change is kept if it is for the better. Very quickly the hillclimber converges on the correct key. You can try this out in the example program here. The program is designed for equations of the type shown above, to any base.

The limiting feature of hillclimbing is that it can get stuck short of the solution at a position where no further improvements are possible. In the jargon, it reaches a local maximum. The situation is easy to see if you liken the hillclimbing algorithm to climbing Mount Everest. You start in the plain and must follow the hillclimbing rule to move upwards, choosing direction at random. Now you may be lucky and follow a route direct to the summit. But you also may find yourself climbing to the top of one of the foothills, where you can go no higher and will be marooned. That is the local maximum.

As ciphers become more difficult to solve, due to greater complexity, the hillclimber takes longer and longer to solve until it takes so long that its use becomes impractical. A better mousetrap is needed.

Simulated Annealing

2012-07-17T05:32:40+00:00

A way forward was found in the mid-1980’s in the form of the Simulated Annealing algorithm. This accepts a less-fit key every so often, equivalent to taking a downward step that leads away from being marooned on a local maximum. How to determine when to accept the worse key?

The SA algorithm follows three rules, which intuitively seem reasonable:

(1) only accept a worse key occasionally;
(2) greatly improve the chances of accepting if the child is close to
the parent;
(3) be less likely to accept as the parent improves, because the
parent is then more likely to be correct.

The way to implement these rules is somewhat arcane, and so takes a little effort to grasp.

When presented with a worse key, the SA algorithm does two things:

first, it calculates a value 'p' from the equation

p = e^(dS/T)

meaning p is made equal to e raised to the power (dS/T)

where T stands for Temperature, which at the beginning of SA is set high, let us say at 100;
dS is the negative quantity (child score - parent score)
e is the exponential constant 2.718.

If you do a few sums, you will find that p is always between zero and 1.

secondly, it gets a number at random between zero and 1. if p is greater than this number, the child passes and becomes the new parent.

Now you can see from the equation that the size of p gets greater as dS gets smaller – in other words as the child score gets closer to the parent. So the chance of accepting the worse child improves as the child gets closer to the parent –- which is what we want.

Also the exponential nature of the equation ensures that this improvement in acceptance is not linear, but mounts rapidly as the child gets closer to the parent (see the chart below) which again is what we want.

As implementation proceeds, the Temperature is dropped in steps. The impact of a smaller value for ‘T’ is to make p smaller, and thus to make less likely the acceptance of a worse child key. This is what we want to meet the 3rd consideration mentioned above.

So now I can put together the SA algorithm as follows:

Set Temperature = 100
Do this loop
make a key at random; decipher and score;
call this the parent;
set the counter to zero;
Do
make a change to the parent to create a child;
decipher with this child key and score;
calculate diff = (child score – parent score);
if diff is positive, make the child the parent;
if diff is negative
calculate p from equation;
get a random number from 0 to 1;
if p is greater than the random number make the child the parent;
else increment the counter;
while the counter is less than cycle length of 50,000
Reduce Temperature by 5.
If Temperature=0 then stop.

During the process the highest scoring keys are displayed, together with their plaintexts, and hopefully the solution will appear. Otherwise that annealing run has failed and the whole process must be repeated from scratch by running the program again.

This algorithm gets its name because it resembles the process of metal annealing in which the metal is heated high enough to rupture its crystal structure and then is cooled in a number of steps -- with the temperature of the whole mass being allowed to reach equilibrium at each step. The intention is to form uniform sized crystals, and the probability of so doing is governed by an equation very similar to the SA equation above.

Now you can see that SA is a whole lot more complicated than hillclimbing. The simple addition of taking an occasional downward step has introduced these new aspects:

1. a more difficult program to write. But this is not a major problem, even for the amateur;
2. the need to define values for three variables before we can begin: Temperature, Temperature step, cycle length. This is an important consideration. These values cannot be calculated or intuitively guessed. They are going to vary from one application to another, from one cipher to another and indeed from one cryptanalyst to another, dependant on their scoring systems. The best values for these variables can only be found by trial and error.

So it’s the same old story: there’s no such thing as a free lunch. If you want a better cipher solver than hillclimbing, you have to put in some work to get SA on its feet. Is the improvement worth the effort?
For simple, mono-substitution ciphers there is little to be gained. But for more difficult ciphers, such as Transpositions and Polygraphic Substitutions, it is very much worthwhile. Solving times are cut from many hours to a matter of minutes.

There are a number of interesting and useful aspects of SA to discuss like the impact of different scoring systems and the nature of the ciphertext. But since this article is on the Churn algorithm, I am going to leave these for the moment.

There are some programs written in C and a set of log tetragram frequencies in a zipped package that you can download by clicking here.

Genetic algorithms

2012-07-17T11:22:11+00:00

Another type of stochastic algorithm is the evolutionary or genetic algorithm. I have found it particularly useful for solving ciphers that have repeated characters in their key. The idea here is to make a large number of keys at random, perhaps 30,000 of them. This is called the ‘population’.

Then begins a continuous cycle of ‘reproduction’ with ‘survival of the fittest’. Two ‘parent’ keys are chosen at random from the population and from them is made a new key we call the ‘child’. This is done by taking the first ‘n’ characters from the 1st parent and the remainder from the 2nd parent – where ‘n’ is a random number, shorter than the key length. Scores from all three keys are compared – if the child is better than one of the parents, then it replaces that parent in the population.

As this cycle proceeds, the population gets fitter and fitter, the plaintexts get better and better until
either the solution appears;
or, the solution is not found, and the average score of the population approaches the best score seen to date. In this case the run has failed and must be repeated.

There are numerous modifications that can be made to the algorithm as described above, including a mutation step, but what I have described gives the essence of the genetic approach – ‘red in tooth and claw’ as it is.

I have found the genetic algorithm of use when there are numerous repeated letters, or integers, in the key. Example are Baconian, Myszkowski, Slidefair.

Churn algorithm

2012-07-17T11:31:34+00:00

I have described how Simulated Annealing is a considerably more effective algorithm than Hillclimbing, but is more complicated to program and – in particular – to setup. I have found a way to avoid these difficulties with a related algorithm that I call the Churn algorithm.

To develop Churn, I studied what happened in a typical SA run. In particular I wanted to find the probability that a worse key would be accepted (which is the downward step). This probability is related to how much worse the key is, as is shown in the chart below drawn from the data I collected.

To mimic this situation in an algorithm, I created a series of 100 numbers from the chart to represent probabilities of acceptance.
This series is just the worse scores from zero to 60, but with frequencies in the series to enable representation of probabilities of their acceptance.

1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5,5,5,5,6,6,6,6,6,7,7,7,7,7,8,8,9,9,9,9,10,10,10,10,10,11,11,11,11,12,
12,12,12,13,13,14,14,15,15,16,16,16,16,17,17,18,18,19,19,20,21,21,22,23,23,24,25,26,27,28,29,30,31,33,34,37,39,41,43,48,57,60

For example, say a particular key has a score worse by 2. To decide whether to accept this key, the algorithm selects at random one of the 100 numbers from the series. If this chosen number is greater than 2, then the key is accepted. You can see from the series that 86 of the numbers are greater than 2, so the probability that this worse key will be accepted is 86/100, or 86% as the chart requires. Similarly if a key has a worse score by say 42, the probability it will be accepted is just 4%.

To implement the Churn algorithm is now very simple. First in your hillclimber add an array called value[] that contains the 100 integers above.

In your hillclimber, after a change is made to the key, you will currently have a line like:

if(score of changed key > original key)
accept the changed key;

As a second change alter the coding to this:

x=random(100);
if(score of changed key > original key – value[x] )
accept the changed key;

and you have changed your hillclimbing algorithm to Churn.

You will get an immediate boost in solving performance. This will not be noticeable for quick solving ciphers, like simple substitution, but will be very evident in more difficult Transposition and Polygraphic Substitution ciphers such as the three below:

(a) Incomplete Columnar (25 columns)
SECER ISEAS DTLAE HERHL JTSSH FWRTF LHASR EIPMM OMLAS ETNSO HEDSI EDOHO RENHD LAENT STTPO REAME KHRTD ATCVN ATHLP TTPAI WOAPI NEOEE SHITI ITBAT NFEST ANTES TCRFI OINAO ADNTU LEOUS SIFRO NDIOW BSIES TEUEO AHSSF GEESS GAHTH HOTAO BIBHW TEAIE SHTHA TEDIU NGIRE SOESU ITUTE BDHTS ORLFN ABTLE SAUYM SOORA

-Churn: solves in average of 0.5 million keys
-Hillclimb: solves in average of 11 million keys

(b) Cadenus
MHDNE ARNTU NARBO OYBHN NAIAO TYTTI NEDIT HBLIE NREEO WTDFE RIEAT NSAIB HENOC NNALE TEOEF IEATA NRGNC ETATU GUMBH ELDCW SIIND EONRS EPHAT IUOHE UDEPE MTLIA TAWEN DDBNE BIALG SISDR SAORL IPEOL TETAA EEDLH YDIOA MYEEB ADHEH INNTH AFHNL HITTM NWINR GSAOV NREAN HAKCO GEEHD ONENM RATBD RTLIE ROGTS SEICE

-Churn: solves every time, average 50,000 keys (less than half
a second on my computer).
-Genetic: solves every time, 5 million keys.
-Hillclimbing: in 10 runs of 10 million keys each, fails 6 times.

(c) difficult Playfair:

AK ZB BQ GR AK IO IW UY CD ZN EN NL UG NP OV DC RZ UC KW OI PL GU IU NE TO AU BL QW UG DC EW ZB OE KY WZ IK ZF GM BN GM DE DV FK YW KE ZV

-Churn: solves every time, in about 50 million keys
(2 minutes on my computer);
-Hillclimbing: no solution at 4,000 million keys (2.5 hours).

Churn programs for solving these ciphers are given in the 'Churn package' in the download section of this website. These programs illustrate different ways of implementing the Churn algorithm, as is explained in the ReadMe file included in the package. Also in the package is this article as a MS Word document. Log Tetragram frequency tables are also in the download section.

Limitations of the Churn algorithm.

The algorithm depends on the scoring system correctly distinguishing whether the plaintext from one key is better than plaintext from another. This is done, as explained earlier, by assessing the log tetragraph frequency of plaintexts, using a table of frequencies derived from normal English texts.

This system will fail if the plaintext of the cipher is not like normal English. For example the following plaintext was the solution to a recent cipher:

"Junk jewelry jingles, jolts jerky jackanapes. Jealous jades jot jabberwocky, jerboa jumps jaguarundi."

This is full of unusual tetragraphs, and therefore has a relatively low score of just 432. In contrast, a piece of garbled plaintext that makes no sense will score higher. For example, during the Churn process the garbled text below is formed, and this scores 776 -- much more than the solution:

"honyhepermchundrethirsthemychabyanafetheariothagethishallempibychemliahowfthadoamongu."

Consequently Churn (or any other algorithm) meeting a cipher based on the unusual plaintext will favour other plaintexts which are garbled and incorrect but still have higher scoring tetragraphs than the piece of unusual text.

As a result Churn will be led away from the true ciphertext and produce nothing but garbled plaintexts. Use of other n-graphs, such as digraphs or trigraphs, will suffer from the same problem and will also fail to solve.

A way round this problem is to score with words. Although slower, it holds out a high chance of finding the solution when plaintexts are unusual. For example, the correct plaintext above scores 629 while the garbled one scores just 165.

Scoring with words is best carried out by initially putting all the words from a dictionary into a Trie structure. Then the scoring algorithm looks in the Trie to see whether successive plaintext letters form a word. If they do, the score is incremented by the square of the word length –- and the search process continues from the letter following the discovered word.

Again there is a limit to the success of this method. It will not work unless most words in the plaintext are in the Trie. Plaintexts with unusual words will thus demand a very big dictionary, which can demand more memory than the computer possesses or alternatively can excessively slow down the scoring process. However the scoring method usually succeeds.

A second limitation of Churn, or indeed any other stochastic method, is cipher length. As ciphers get shorter so the amount of information available for scoring gets less, and the effectiveness of scoring diminishes. The result is that ciphers take longer to solve and eventually will not solve at all. This limiting aspect is reached at about 70 to 80 letters, depending on the cipher.

Adjusting for different scoring systems.

For ciphers up to 200 letters long, the best scoring system is based on tetragraphs. For very long ciphers, like the 526-letter Playfair challenge cipher of Simon Singh's 'The Code Book', digraphs provide a faster result. You will find a table of log digraph frequencies in the Downloads section.

The overwhelming number of ciphers I come across are 80 to 120 letters long, and here the best scoring system is provided by log tetragraphs. You will find a table of log tetragraph frequencies, made from 12 books by my friend Hank Gibson in the downloads section of this website, in binary format and also as a text file.

If you have made your own tetragraph table, it is likely that your frequencies differ from mine because of different lengths of texts. Consequently your score for a given plaintext is likely to be different to mine. To use my Churn array of value[] you will need to adjust your scores to be in line with mine. This is simple to do. Score a piece of plaintext with your log tetragraphs, and then work out the average tetragraph score. My average is 10. If yours is 8, then your factor is 1.25; multiply all your scores by this factor whenever you use the Churn algorithm.

Adjusting for ciphers of different lengths.

The 100 integers in the array called value[], that I introduced you to earlier, work well with ciphers with 100 to 120 letters. If you have a much longer cipher to solve, the scores of that cipher will be higher because of the extra length. It is then necessary to bring them back to what they would have been for a cipher of length 110 letters. That is simple to do. If the cipher is 220 letters long, then multiply all scores by a factor of 110/220 = 0.5

If your cipher is less than 100 letter, then scores will need increasing. So for a cipher of length 85 letters, increase the score by a factor of 110/85 = 1.3

Factors affecting solving speed.

The longer the cipher, the faster it will solve because there is more information available to score each key. With this additional information the score is more meaningful and thus decisions on whether to accept or reject a changed key are more likely to be correct.

The dependence of solving time with length is exponential, so as ciphers get shorter they get a lot more difficult to solve.

The other factor that influences solving time is how closely the plaintext resembles normal English text. The nearer the text, the faster will a cipher solve. The relationship is again exponential, with solving time extending markedly as plaintexts move further away from everyday English words.

To give you an idea of the size of this effect, a cipher in normal English of length 130 will solve in 1/50th of the time of a cipher of length 86 when using the Churn algorithm. This effect is normal whatever method is used to solve and, of course, is why the Military put limits on the length of cipher messages.