Churn algorithm http://www.cryptoden.com/index.php/algorithms/churn-algorithm 2015-07-03T08:40:47+00:00 Joomla! - Open Source Content Management Stochastic algorithms 2012-07-16T12:25:06+00:00 2012-07-16T12:25:06+00:00 http://www.cryptoden.com/index.php/algorithms/churn-algorithm/16-stochastic-algorithms Super User mikejcowan@me.com <div class="feed-description"><p><span style="font-family: verdana, geneva; font-size: small;">The next few pages give an outline of traditional stochastic algorithms, such as hillclimbing, preparatory to a full description of the Churn algorithm and an illustrative computer program. Those familiar with stochastic algorithms may want to jump directly to the Churn algorithm.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">I will illustrate the use of these algorithms for solving Classical ciphers – by which I mean those used up to the 2nd World War. One or two of them were also used in the 1939-45 period, for example the Double Playfair cipher. You will find descriptions of all Classical ciphers at the web site of the <a href="http://www.cryptogram.org/cipher_types.html">American Cryptogram Association. </a></span></p> <p><span style="font-family: verdana, geneva; font-size: small;">Stochastic algorithms are based on random processes. The simplest is a random search. When applied to cipher solving, it keeps cycling as below:</span><br /><span style="font-family: verdana, geneva; font-size: small;">do forever:</span></p> <p><span style="font-family: verdana, geneva; font-size: small;">-select a key at random,</span><br /><span style="font-family: verdana, geneva; font-size: small;">-decipher,</span><br /><span style="font-family: verdana, geneva; font-size: small;">-rate the ‘fitness’ of the plaintext in some way and display it if it is the best to date.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">Hopefully the correct plaintext will eventually appear – much as a monkey with a typewriter might eventually produce a passage from Shakespeare. Neither is an efficient or practical approach to these tasks.</span></p> <p><span style="font-family: verdana, geneva; font-size: small;">I need to explain how ‘fitness’ can be assessed.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">A wrong key will produce poor plaintext like “qstxindujsolzltdsfwh”. The quality of this plaintext can be judged by looking at each four-letter group (tetragraph) and giving it a score dependant on its frequency in English text. Such frequencies can be determined by looking through a number of books. On this basis, ‘qstx’ gets a score of zero because it is never found. ‘stxi’ scores 3, ‘txin’ 2, 'xind' 4, 'indu' 11 and all the rest zero. The whole of this plaintext scores 20. In comparison, the first 20 letters of this current sentence score 218. You can see that tetragraph scoring clearly differentiates between the quality, or fitness, of these two plaintexts.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">In actual fact, the frequencies used are log frequencies. Using the logs reduces the impact of very frequent tetragraphs, like the-, that, -ing, -ion and so on which otherwise have too great an effect on the score.</span></p></div> <div class="feed-description"><p><span style="font-family: verdana, geneva; font-size: small;">The next few pages give an outline of traditional stochastic algorithms, such as hillclimbing, preparatory to a full description of the Churn algorithm and an illustrative computer program. Those familiar with stochastic algorithms may want to jump directly to the Churn algorithm.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">I will illustrate the use of these algorithms for solving Classical ciphers – by which I mean those used up to the 2nd World War. One or two of them were also used in the 1939-45 period, for example the Double Playfair cipher. You will find descriptions of all Classical ciphers at the web site of the <a href="http://www.cryptogram.org/cipher_types.html">American Cryptogram Association. </a></span></p> <p><span style="font-family: verdana, geneva; font-size: small;">Stochastic algorithms are based on random processes. The simplest is a random search. When applied to cipher solving, it keeps cycling as below:</span><br /><span style="font-family: verdana, geneva; font-size: small;">do forever:</span></p> <p><span style="font-family: verdana, geneva; font-size: small;">-select a key at random,</span><br /><span style="font-family: verdana, geneva; font-size: small;">-decipher,</span><br /><span style="font-family: verdana, geneva; font-size: small;">-rate the ‘fitness’ of the plaintext in some way and display it if it is the best to date.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">Hopefully the correct plaintext will eventually appear – much as a monkey with a typewriter might eventually produce a passage from Shakespeare. Neither is an efficient or practical approach to these tasks.</span></p> <p><span style="font-family: verdana, geneva; font-size: small;">I need to explain how ‘fitness’ can be assessed.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">A wrong key will produce poor plaintext like “qstxindujsolzltdsfwh”. The quality of this plaintext can be judged by looking at each four-letter group (tetragraph) and giving it a score dependant on its frequency in English text. Such frequencies can be determined by looking through a number of books. On this basis, ‘qstx’ gets a score of zero because it is never found. ‘stxi’ scores 3, ‘txin’ 2, 'xind' 4, 'indu' 11 and all the rest zero. The whole of this plaintext scores 20. In comparison, the first 20 letters of this current sentence score 218. You can see that tetragraph scoring clearly differentiates between the quality, or fitness, of these two plaintexts.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">In actual fact, the frequencies used are log frequencies. Using the logs reduces the impact of very frequent tetragraphs, like the-, that, -ing, -ion and so on which otherwise have too great an effect on the score.</span></p></div> Hillclimbing 2012-07-16T12:25:51+00:00 2012-07-16T12:25:51+00:00 http://www.cryptoden.com/index.php/algorithms/churn-algorithm/17-hillclimbing Super User mikejcowan@me.com <div class="feed-description"><p><span style="font-family: verdana, geneva; font-size: small;">A change to the random search procedure, first developed 50 years ago, produces a useful algorithm called hillclimbing, which can solve simple ciphers. In this method a random key is generated and its fitness is determined. Some change is made to the key and if that causes an improvement in fitness, the changed key replaces the original – otherwise the original is kept. The change cycle is repeated, until no more improvements in fitness are possible. Then a new random key is created. This algorithm can be summarised as:</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">1. make a key at random &amp; determine fitness. Call it the ‘parent’;</span><br /><span style="font-family: verdana, geneva; font-size: small;"> 2. make some change to the parent to produce a ‘child’ key,</span><br /><span style="font-family: verdana, geneva; font-size: small;"> and measure its fitness;</span><br /><span style="font-family: verdana, geneva; font-size: small;"> 3. if the child is fitter than the parent, make the child</span><br /><span style="font-family: verdana, geneva; font-size: small;"> the new parent &amp; dispense with the old;</span><br /><span style="font-family: verdana, geneva; font-size: small;"> 4. go back to (2), unless no improvements in the last 1000</span><br /><span style="font-family: verdana, geneva; font-size: small;"> changes in which case go back to (1)</span></p> <p><span style="font-family: verdana, geneva; font-size: small;">As an example, hillclimbing rapidly solves Cryptarithm puzzles such as the one below:</span></p> <p><span style="font-family: verdana, geneva; font-size: small;">ABHORE + CHAINS = OIBANAC.       ABHORE - CHAINS = OAOINO </span></p> <p><span style="font-family: verdana, geneva; font-size: small;">where each of the letters stands for a different integer from zero to 9. The hillclimber makes a key at random and assesses the two differences:</span></p> <p><span style="font-family: verdana, geneva; font-size: small;">difference 1 = (ABHORE + CHAINS) - </span><span style="font-family: verdana, geneva; font-size: small;">OIBANAC  </span></p> <p><span style="font-family: verdana, geneva; font-size: small;">difference 2 = (ABHORE - CHAINS) - OAOINO </span></p> <p><span style="font-family: verdana, geneva; font-size: small;">If the key is correct both differences will be zero. The sum of the absolute variations from zero is a measure of the 'goodness' of the key -- the smaller the variation the better is the key.</span></p> <p><span style="font-family: verdana, geneva; font-size: small;">Changes are made to the key by swapping and a change is kept if it is for the better. Very quickly the hillclimber converges on the correct key. You can try this out in the <a href="equations/WW.html">example program here</a>. The program is designed for equations of the type shown above, to any base. </span></p> <p><span style="font-family: verdana, geneva; font-size: small;">The limiting feature of hillclimbing is that it can get stuck short of the solution at a position where no further improvements are possible. In the jargon, it reaches a local maximum. The situation is easy to see if you liken the hillclimbing algorithm to climbing Mount Everest. You start in the plain and must follow the hillclimbing rule to move upwards, choosing direction at random. Now you may be lucky and follow a route direct to the summit. But you also may find yourself climbing to the top of one of the foothills, where you can go no higher and will be marooned. That is the local maximum.</span></p> <p><br /><span style="font-family: verdana, geneva; font-size: small;">As ciphers become more difficult to solve, due to greater complexity, the hillclimber takes longer and longer to solve until it takes so long that its use becomes impractical. A better mousetrap is needed.</span></p></div> <div class="feed-description"><p><span style="font-family: verdana, geneva; font-size: small;">A change to the random search procedure, first developed 50 years ago, produces a useful algorithm called hillclimbing, which can solve simple ciphers. In this method a random key is generated and its fitness is determined. Some change is made to the key and if that causes an improvement in fitness, the changed key replaces the original – otherwise the original is kept. The change cycle is repeated, until no more improvements in fitness are possible. Then a new random key is created. This algorithm can be summarised as:</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">1. make a key at random &amp; determine fitness. Call it the ‘parent’;</span><br /><span style="font-family: verdana, geneva; font-size: small;"> 2. make some change to the parent to produce a ‘child’ key,</span><br /><span style="font-family: verdana, geneva; font-size: small;"> and measure its fitness;</span><br /><span style="font-family: verdana, geneva; font-size: small;"> 3. if the child is fitter than the parent, make the child</span><br /><span style="font-family: verdana, geneva; font-size: small;"> the new parent &amp; dispense with the old;</span><br /><span style="font-family: verdana, geneva; font-size: small;"> 4. go back to (2), unless no improvements in the last 1000</span><br /><span style="font-family: verdana, geneva; font-size: small;"> changes in which case go back to (1)</span></p> <p><span style="font-family: verdana, geneva; font-size: small;">As an example, hillclimbing rapidly solves Cryptarithm puzzles such as the one below:</span></p> <p><span style="font-family: verdana, geneva; font-size: small;">ABHORE + CHAINS = OIBANAC.       ABHORE - CHAINS = OAOINO </span></p> <p><span style="font-family: verdana, geneva; font-size: small;">where each of the letters stands for a different integer from zero to 9. The hillclimber makes a key at random and assesses the two differences:</span></p> <p><span style="font-family: verdana, geneva; font-size: small;">difference 1 = (ABHORE + CHAINS) - </span><span style="font-family: verdana, geneva; font-size: small;">OIBANAC  </span></p> <p><span style="font-family: verdana, geneva; font-size: small;">difference 2 = (ABHORE - CHAINS) - OAOINO </span></p> <p><span style="font-family: verdana, geneva; font-size: small;">If the key is correct both differences will be zero. The sum of the absolute variations from zero is a measure of the 'goodness' of the key -- the smaller the variation the better is the key.</span></p> <p><span style="font-family: verdana, geneva; font-size: small;">Changes are made to the key by swapping and a change is kept if it is for the better. Very quickly the hillclimber converges on the correct key. You can try this out in the <a href="equations/WW.html">example program here</a>. The program is designed for equations of the type shown above, to any base. </span></p> <p><span style="font-family: verdana, geneva; font-size: small;">The limiting feature of hillclimbing is that it can get stuck short of the solution at a position where no further improvements are possible. In the jargon, it reaches a local maximum. The situation is easy to see if you liken the hillclimbing algorithm to climbing Mount Everest. You start in the plain and must follow the hillclimbing rule to move upwards, choosing direction at random. Now you may be lucky and follow a route direct to the summit. But you also may find yourself climbing to the top of one of the foothills, where you can go no higher and will be marooned. That is the local maximum.</span></p> <p><br /><span style="font-family: verdana, geneva; font-size: small;">As ciphers become more difficult to solve, due to greater complexity, the hillclimber takes longer and longer to solve until it takes so long that its use becomes impractical. A better mousetrap is needed.</span></p></div> Simulated Annealing 2012-07-17T05:32:40+00:00 2012-07-17T05:32:40+00:00 http://www.cryptoden.com/index.php/algorithms/churn-algorithm/18-simulated-annealing Super User mikejcowan@me.com <div class="feed-description"><p style="text-align: left;"><span style="font-family: verdana, geneva; font-size: small;">A way forward was found in the mid-1980’s in the form of the Simulated Annealing algorithm. This accepts a less-fit key every so often, equivalent to taking a downward step that leads away from being marooned on a local maximum. How to determine when to accept the worse key?</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">The SA algorithm follows three rules, which intuitively seem reasonable:</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">(1) only accept a worse key occasionally;</span><br /><span style="font-family: verdana, geneva; font-size: small;">(2) greatly improve the chances of accepting if the child is close to</span><br /><span style="font-family: verdana, geneva; font-size: small;"> the parent;</span><br /><span style="font-family: verdana, geneva; font-size: small;">(3) be less likely to accept as the parent improves, because the</span><br /><span style="font-family: verdana, geneva; font-size: small;"> parent is then more likely to be correct.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">The way to implement these rules is somewhat arcane, and so takes a little effort to grasp.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">When presented with a worse key, the SA algorithm does two things:</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">first, it calculates a value 'p' from the equation</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">                               p = e<sup>(dS/T)</sup></span></p> <p style="text-align: left;"><br /><span style="font-family: verdana, geneva; font-size: small;">meaning p is made equal to e raised to the power (dS/T)</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">    where T stands for Temperature, which at the beginning of SA is set high, let us say at 100;</span><br /><span style="font-family: verdana, geneva; font-size: small;">    dS is the negative quantity (child score - parent score) </span><br /><span style="font-family: verdana, geneva; font-size: small;">    e  is the exponential constant 2.718.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">If you do a few sums, you will find that p is always between zero and 1.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">secondly, it gets a number at random between zero and 1. if p is greater than this number, the child passes and becomes the new parent.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">Now you can see from the equation that the size of p gets greater as dS gets smaller – in other words as the child score gets closer to the parent. So the chance of accepting the worse child improves as the child gets closer to the parent –- which is what we want.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">Also the exponential nature of the equation ensures that this improvement in acceptance is not linear, but mounts rapidly as the child gets closer to the parent (see the chart below) which again is what we want.</span></p> <p style="text-align: center;"><span style="font-family: verdana, geneva; font-size: small;"><img src="images/churn1.gif" border="0" alt="" /><br /></span></p> <p style="text-align: left;"><br /><span style="font-family: verdana, geneva; font-size: small;">As implementation proceeds, the Temperature is dropped in steps. The impact of a smaller value for ‘T’ is to make p smaller, and thus to make less likely the acceptance of a worse child key. This is what we want to meet the 3rd consideration mentioned above.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">So now I can put together the SA algorithm as follows:</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">Set Temperature = 100</span><br /><span style="font-family: verdana, geneva; font-size: small;">Do this loop</span><br /><span style="font-family: verdana, geneva; font-size: small;">    make a key at random; decipher and score;</span><br /><span style="font-family: verdana, geneva; font-size: small;">    call this the parent;</span><br /><span style="font-family: verdana, geneva; font-size: small;">    set the counter to zero;</span><br /><span style="font-family: verdana, geneva; font-size: small;">    Do</span><br /><span style="font-family: verdana, geneva; font-size: small;">        make a change to the parent to create a child;</span><br /><span style="font-family: verdana, geneva; font-size: small;">        decipher with this child key and score;</span><br /><span style="font-family: verdana, geneva; font-size: small;">        calculate diff = (child score – parent score);</span><br /><span style="font-family: verdana, geneva; font-size: small;">        if diff is positive, make the child the parent;</span><br /><span style="font-family: verdana, geneva; font-size: small;">        if diff is negative</span><br /><span style="font-family: verdana, geneva; font-size: small;">            calculate p from equation;</span><br /><span style="font-family: verdana, geneva; font-size: small;">            get a random number from 0 to 1;</span><br /><span style="font-family: verdana, geneva; font-size: small;">            if p is greater than the random number make</span><span> </span><span style="font-family: verdana, geneva; font-size: small;">the child the parent;</span><br /><span style="font-family: verdana, geneva; font-size: small;">            else increment the counter;</span><br /><span style="font-family: verdana, geneva; font-size: small;">    while the counter is less than cycle length of 50,000</span><br /><span style="font-family: verdana, geneva; font-size: small;">    Reduce Temperature by 5.</span><br /><span style="font-family: verdana, geneva; font-size: small;">    If Temperature=0 then stop.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">During the process the highest scoring keys are displayed, together with their plaintexts, and hopefully the solution will appear. Otherwise that annealing run has failed and the whole process must be repeated from scratch by running the program again.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">This algorithm gets its name because it resembles the process of metal annealing in which the metal is heated high enough to rupture its crystal structure and then is cooled in a number of steps -- with the temperature of the whole mass being allowed to reach equilibrium at each step. The intention is to form uniform sized crystals, and the probability of so doing is governed by an equation very similar to the SA equation above.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">Now you can see that SA is a whole lot more complicated than hillclimbing. The simple addition of taking an occasional downward step has introduced these new aspects:</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">1. a more difficult program to write. But this is not a major problem, even for the amateur;</span><br /><span style="font-family: verdana, geneva; font-size: small;">2. the need to define values for three variables before we can begin: Temperature, Temperature step, cycle length. This is an important consideration. These values cannot be calculated or intuitively guessed. They are going to vary from one application to another, from one cipher to another and indeed from one cryptanalyst to another, dependant on their scoring systems. The best values for these variables can only be found by trial and error.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">So it’s the same old story: there’s no such thing as a free lunch. If you want a better cipher solver than hillclimbing, you have to put in some work to get SA on its feet. Is the improvement worth the effort?</span><br /><span style="font-family: verdana, geneva; font-size: small;">For simple, mono-substitution ciphers there is little to be gained. But for more difficult ciphers, such as Transpositions and Polygraphic Substitutions, it is very much worthwhile. Solving times are cut from many hours to a matter of minutes.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">There are a number of interesting and useful aspects of SA to discuss like the impact of different scoring systems and the nature of the ciphertext. But since this article is on the Churn algorithm, I am going to leave these for the moment.</span></p> <p style="text-align: left;"><span style="font-family: verdana, geneva; font-size: small;">There are some programs written in C and a set of log tetragram frequencies in a <a href="articles/SA%20Package.zip">zipped package that you can download by clicking here.</a></span></p></div> <div class="feed-description"><p style="text-align: left;"><span style="font-family: verdana, geneva; font-size: small;">A way forward was found in the mid-1980’s in the form of the Simulated Annealing algorithm. This accepts a less-fit key every so often, equivalent to taking a downward step that leads away from being marooned on a local maximum. How to determine when to accept the worse key?</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">The SA algorithm follows three rules, which intuitively seem reasonable:</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">(1) only accept a worse key occasionally;</span><br /><span style="font-family: verdana, geneva; font-size: small;">(2) greatly improve the chances of accepting if the child is close to</span><br /><span style="font-family: verdana, geneva; font-size: small;"> the parent;</span><br /><span style="font-family: verdana, geneva; font-size: small;">(3) be less likely to accept as the parent improves, because the</span><br /><span style="font-family: verdana, geneva; font-size: small;"> parent is then more likely to be correct.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">The way to implement these rules is somewhat arcane, and so takes a little effort to grasp.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">When presented with a worse key, the SA algorithm does two things:</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">first, it calculates a value 'p' from the equation</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">                               p = e<sup>(dS/T)</sup></span></p> <p style="text-align: left;"><br /><span style="font-family: verdana, geneva; font-size: small;">meaning p is made equal to e raised to the power (dS/T)</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">    where T stands for Temperature, which at the beginning of SA is set high, let us say at 100;</span><br /><span style="font-family: verdana, geneva; font-size: small;">    dS is the negative quantity (child score - parent score) </span><br /><span style="font-family: verdana, geneva; font-size: small;">    e  is the exponential constant 2.718.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">If you do a few sums, you will find that p is always between zero and 1.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">secondly, it gets a number at random between zero and 1. if p is greater than this number, the child passes and becomes the new parent.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">Now you can see from the equation that the size of p gets greater as dS gets smaller – in other words as the child score gets closer to the parent. So the chance of accepting the worse child improves as the child gets closer to the parent –- which is what we want.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">Also the exponential nature of the equation ensures that this improvement in acceptance is not linear, but mounts rapidly as the child gets closer to the parent (see the chart below) which again is what we want.</span></p> <p style="text-align: center;"><span style="font-family: verdana, geneva; font-size: small;"><img src="images/churn1.gif" border="0" alt="" /><br /></span></p> <p style="text-align: left;"><br /><span style="font-family: verdana, geneva; font-size: small;">As implementation proceeds, the Temperature is dropped in steps. The impact of a smaller value for ‘T’ is to make p smaller, and thus to make less likely the acceptance of a worse child key. This is what we want to meet the 3rd consideration mentioned above.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">So now I can put together the SA algorithm as follows:</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">Set Temperature = 100</span><br /><span style="font-family: verdana, geneva; font-size: small;">Do this loop</span><br /><span style="font-family: verdana, geneva; font-size: small;">    make a key at random; decipher and score;</span><br /><span style="font-family: verdana, geneva; font-size: small;">    call this the parent;</span><br /><span style="font-family: verdana, geneva; font-size: small;">    set the counter to zero;</span><br /><span style="font-family: verdana, geneva; font-size: small;">    Do</span><br /><span style="font-family: verdana, geneva; font-size: small;">        make a change to the parent to create a child;</span><br /><span style="font-family: verdana, geneva; font-size: small;">        decipher with this child key and score;</span><br /><span style="font-family: verdana, geneva; font-size: small;">        calculate diff = (child score – parent score);</span><br /><span style="font-family: verdana, geneva; font-size: small;">        if diff is positive, make the child the parent;</span><br /><span style="font-family: verdana, geneva; font-size: small;">        if diff is negative</span><br /><span style="font-family: verdana, geneva; font-size: small;">            calculate p from equation;</span><br /><span style="font-family: verdana, geneva; font-size: small;">            get a random number from 0 to 1;</span><br /><span style="font-family: verdana, geneva; font-size: small;">            if p is greater than the random number make</span><span> </span><span style="font-family: verdana, geneva; font-size: small;">the child the parent;</span><br /><span style="font-family: verdana, geneva; font-size: small;">            else increment the counter;</span><br /><span style="font-family: verdana, geneva; font-size: small;">    while the counter is less than cycle length of 50,000</span><br /><span style="font-family: verdana, geneva; font-size: small;">    Reduce Temperature by 5.</span><br /><span style="font-family: verdana, geneva; font-size: small;">    If Temperature=0 then stop.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">During the process the highest scoring keys are displayed, together with their plaintexts, and hopefully the solution will appear. Otherwise that annealing run has failed and the whole process must be repeated from scratch by running the program again.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">This algorithm gets its name because it resembles the process of metal annealing in which the metal is heated high enough to rupture its crystal structure and then is cooled in a number of steps -- with the temperature of the whole mass being allowed to reach equilibrium at each step. The intention is to form uniform sized crystals, and the probability of so doing is governed by an equation very similar to the SA equation above.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">Now you can see that SA is a whole lot more complicated than hillclimbing. The simple addition of taking an occasional downward step has introduced these new aspects:</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">1. a more difficult program to write. But this is not a major problem, even for the amateur;</span><br /><span style="font-family: verdana, geneva; font-size: small;">2. the need to define values for three variables before we can begin: Temperature, Temperature step, cycle length. This is an important consideration. These values cannot be calculated or intuitively guessed. They are going to vary from one application to another, from one cipher to another and indeed from one cryptanalyst to another, dependant on their scoring systems. The best values for these variables can only be found by trial and error.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">So it’s the same old story: there’s no such thing as a free lunch. If you want a better cipher solver than hillclimbing, you have to put in some work to get SA on its feet. Is the improvement worth the effort?</span><br /><span style="font-family: verdana, geneva; font-size: small;">For simple, mono-substitution ciphers there is little to be gained. But for more difficult ciphers, such as Transpositions and Polygraphic Substitutions, it is very much worthwhile. Solving times are cut from many hours to a matter of minutes.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">There are a number of interesting and useful aspects of SA to discuss like the impact of different scoring systems and the nature of the ciphertext. But since this article is on the Churn algorithm, I am going to leave these for the moment.</span></p> <p style="text-align: left;"><span style="font-family: verdana, geneva; font-size: small;">There are some programs written in C and a set of log tetragram frequencies in a <a href="articles/SA%20Package.zip">zipped package that you can download by clicking here.</a></span></p></div> Genetic algorithms 2012-07-17T11:22:11+00:00 2012-07-17T11:22:11+00:00 http://www.cryptoden.com/index.php/algorithms/churn-algorithm/19-genetic-algorithms Super User mikejcowan@me.com <div class="feed-description"><p><span style="font-family: verdana, geneva; font-size: small;">Another type of stochastic algorithm is the evolutionary or genetic algorithm. I have found it particularly useful for solving ciphers that have repeated characters in their key. The idea here is to make a large number of keys at random, perhaps 30,000 of them. This is called the ‘population’.</span></p> <p><span style="font-family: verdana, geneva; font-size: small;">Then begins a continuous cycle of ‘reproduction’ with ‘survival of the fittest’. Two ‘parent’ keys are chosen at random from the population and from them is made a new key we call the ‘child’. This is done by taking the first ‘n’ characters from the 1st parent and the remainder from the 2nd parent – where ‘n’ is a random number, shorter than the key length. Scores from all three keys are compared – if the child is better than one of the parents, then it replaces that parent in the population.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">As this cycle proceeds, the population gets fitter and fitter, the plaintexts get better and better until</span><br /><span style="font-family: verdana, geneva; font-size: small;"> either the solution appears;</span><br /><span style="font-family: verdana, geneva; font-size: small;"> or, the solution is not found, and the average score of the population approaches the best score seen to date. In this case the run has failed and must be repeated.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">There are numerous modifications that can be made to the algorithm as described above, including a mutation step, but what I have described gives the essence of the genetic approach – ‘red in tooth and claw’ as it is.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">I have found the genetic algorithm of use when there are numerous repeated letters, or integers, in the key. Example are Baconian, Myszkowski, Slidefair.</span></p></div> <div class="feed-description"><p><span style="font-family: verdana, geneva; font-size: small;">Another type of stochastic algorithm is the evolutionary or genetic algorithm. I have found it particularly useful for solving ciphers that have repeated characters in their key. The idea here is to make a large number of keys at random, perhaps 30,000 of them. This is called the ‘population’.</span></p> <p><span style="font-family: verdana, geneva; font-size: small;">Then begins a continuous cycle of ‘reproduction’ with ‘survival of the fittest’. Two ‘parent’ keys are chosen at random from the population and from them is made a new key we call the ‘child’. This is done by taking the first ‘n’ characters from the 1st parent and the remainder from the 2nd parent – where ‘n’ is a random number, shorter than the key length. Scores from all three keys are compared – if the child is better than one of the parents, then it replaces that parent in the population.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">As this cycle proceeds, the population gets fitter and fitter, the plaintexts get better and better until</span><br /><span style="font-family: verdana, geneva; font-size: small;"> either the solution appears;</span><br /><span style="font-family: verdana, geneva; font-size: small;"> or, the solution is not found, and the average score of the population approaches the best score seen to date. In this case the run has failed and must be repeated.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">There are numerous modifications that can be made to the algorithm as described above, including a mutation step, but what I have described gives the essence of the genetic approach – ‘red in tooth and claw’ as it is.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">I have found the genetic algorithm of use when there are numerous repeated letters, or integers, in the key. Example are Baconian, Myszkowski, Slidefair.</span></p></div> Churn algorithm 2012-07-17T11:31:34+00:00 2012-07-17T11:31:34+00:00 http://www.cryptoden.com/index.php/algorithms/churn-algorithm/20-churn-algorithm Super User mikejcowan@me.com <div class="feed-description"><p><span style="font-family: verdana, geneva; font-size: small;">I have described how Simulated Annealing is a considerably more effective algorithm than Hillclimbing, but is more complicated to program and – in particular – to setup. I have found a way to avoid these difficulties with a related algorithm that I call the Churn algorithm.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">To develop Churn, I studied what happened in a typical SA run. In particular I wanted to find the probability that a worse key would be accepted (which is the downward step). This probability is related to how much worse the key is, as is shown in the chart below drawn from the data I collected.</span></p> <p style="text-align: left;"><img src="images/churn2.gif" border="0" alt="" style="display: block; margin-left: auto; margin-right: auto;" /><br /> <br /> <span style="font-family: verdana, geneva; font-size: small;">To mimic this situation in an algorithm, I created a series of 100 numbers from the chart to represent probabilities of acceptance.</span><br /><span style="font-family: verdana, geneva; font-size: small;">This series is just the worse scores from zero to 60, but with frequencies in the series to enable representation of probabilities of their acceptance.</span></p> <p><span style="font-family: 'courier new', courier; font-size: x-small;">1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5,5,5,5,6,6,6,6,6,7,7,7,7,7,8,8,9,9,9,9,10,10,10,10,10,11,11,11,11,12,</span><br /><span style="font-family: 'courier new', courier; font-size: x-small;">12,12,12,13,13,14,14,15,15,16,16,16,16,17,17,18,18,19,19,20,21,21,22,23,23,24,25,26,27,28,29,30,31,33,34,37,39,41,43,48,57,60</span></p> <p><span style="font-family: verdana, geneva; font-size: small;">For example, say a particular key has a score worse by 2. To decide whether to accept this key, the algorithm selects at random one of the 100 numbers from the series. If this chosen number is greater than 2, then the key is accepted. You can see from the series that 86 of the numbers are greater than 2, so the probability that this worse key will be accepted is 86/100, or 86% as the chart requires. Similarly if a key has a worse score by say 42, the probability it will be accepted is just 4%.</span></p> <p><br /><span style="font-family: verdana, geneva; font-size: small;">To implement the Churn algorithm is now very simple. First in your hillclimber add an array called value[] that contains the 100 integers above. </span></p> <p><span style="font-family: verdana, geneva; font-size: small;">In your hillclimber, after a change is made to the key, you will currently have a line like:</span><br /> <br /><span style="font-family: 'courier new', courier; font-size: small;">if(score of changed key &gt; original key)</span><br /><span style="font-family: 'courier new', courier; font-size: small;">    accept the changed key;</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">As a second change alter the coding to this:</span><br /> <br /><span style="font-family: 'courier new', courier; font-size: small;">x=random(100);</span><br /><span style="font-family: 'courier new', courier; font-size: small;">if(score of changed key &gt; original key – value[x] )</span><br /><span style="font-family: 'courier new', courier; font-size: small;">    accept the changed key;</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">and you have changed your hillclimbing algorithm to Churn.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">You will get an immediate boost in solving performance. This will not be noticeable for quick solving ciphers, like simple substitution, but will be very evident in more difficult Transposition and Polygraphic Substitution ciphers such as the three below:</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">(a) Incomplete Columnar (25 columns)</span><br /><span style="font-family: 'courier new', courier; font-size: small;">SECER ISEAS DTLAE HERHL JTSSH FWRTF LHASR EIPMM OMLAS ETNSO HEDSI EDOHO RENHD LAENT STTPO REAME KHRTD ATCVN ATHLP TTPAI WOAPI NEOEE SHITI ITBAT NFEST ANTES TCRFI OINAO ADNTU LEOUS SIFRO NDIOW BSIES TEUEO AHSSF GEESS GAHTH HOTAO BIBHW TEAIE SHTHA TEDIU NGIRE SOESU ITUTE BDHTS ORLFN ABTLE SAUYM SOORA</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">-Churn: solves in average of 0.5 million keys</span><br /><span style="font-family: verdana, geneva; font-size: small;">-Hillclimb: solves in average of 11 million keys</span><br /> <br /> <br /><span style="font-family: verdana, geneva; font-size: small;">(b) Cadenus</span><br /><span style="font-family: 'courier new', courier; font-size: small;">MHDNE ARNTU NARBO OYBHN NAIAO TYTTI NEDIT HBLIE NREEO WTDFE RIEAT NSAIB HENOC NNALE TEOEF IEATA NRGNC ETATU GUMBH ELDCW SIIND EONRS EPHAT IUOHE UDEPE MTLIA TAWEN DDBNE BIALG SISDR SAORL IPEOL TETAA EEDLH YDIOA MYEEB ADHEH INNTH AFHNL HITTM NWINR GSAOV NREAN HAKCO GEEHD ONENM RATBD RTLIE ROGTS SEICE</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">-Churn: solves every time, average 50,000 keys (less than half</span><br /><span style="font-family: verdana, geneva; font-size: small;"> a second on my computer).</span><br /><span style="font-family: verdana, geneva; font-size: small;">-Genetic: solves every time, 5 million keys.</span><br /><span style="font-family: verdana, geneva; font-size: small;">-Hillclimbing: in 10 runs of 10 million keys each, fails 6 times.</span><br /> <br /> <br /><span style="font-family: verdana, geneva; font-size: small;">(c) difficult Playfair:</span><br /> <br /><span style="font-family: 'courier new', courier; font-size: small;">AK ZB BQ GR AK IO IW UY CD ZN EN NL UG NP OV DC RZ UC KW OI PL GU IU NE TO AU BL QW UG DC EW ZB OE KY WZ IK ZF GM BN GM DE DV FK YW KE ZV</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">-Churn: solves every time, in about 50 million keys</span><br /><span style="font-family: verdana, geneva; font-size: small;"> (2 minutes on my computer);</span><br /><span style="font-family: verdana, geneva; font-size: small;">-Hillclimbing: no solution at 4,000 million keys (2.5 hours).</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">Churn programs for solving these ciphers are given in the 'Churn package' in the <a href="index.php/downloads">download section of this website.</a> These programs illustrate different ways of implementing the Churn algorithm, as is explained in the ReadMe file included in the package. Also in the package is this article as a MS Word document. Log Tetragram frequency tables are also in the download section.</span></p> <h1><br /><span style="font-family: verdana, geneva; font-size: small;">Limitations of the Churn algorithm.</span></h1> <p><span style="font-family: verdana, geneva; font-size: small;">The algorithm depends on the scoring system correctly distinguishing whether the plaintext from one key is better than plaintext from another. This is done, as explained earlier, by assessing the log tetragraph frequency of plaintexts, using a table of frequencies derived from normal English texts.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">This system will fail if the plaintext of the cipher is not like normal English. For example the following plaintext was the solution to a recent cipher:</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">"Junk jewelry jingles, jolts jerky jackanapes. Jealous jades jot jabberwocky, jerboa jumps jaguarundi."</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">This is full of unusual tetragraphs, and therefore has a relatively low score of just 432. In contrast, a piece of garbled plaintext that makes no sense will score higher. For example, during the Churn process the garbled text below is formed, and this scores 776 -- much more than the solution:</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">"honyhepermchundrethirsthemychabyanafetheariothagethishallempibychemliahowfthadoamongu."</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">Consequently Churn (or any other algorithm) meeting a cipher based on the unusual plaintext will favour other plaintexts which are garbled and incorrect but still have higher scoring tetragraphs than the piece of unusual text.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">As a result Churn will be led away from the true ciphertext and produce nothing but garbled plaintexts. Use of other n-graphs, such as digraphs or trigraphs, will suffer from the same problem and will also fail to solve.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">A way round this problem is to score with words. Although slower, it holds out a high chance of finding the solution when plaintexts are unusual. For example, the correct plaintext above scores 629 while the garbled one scores just 165.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">Scoring with words is best carried out by initially putting all the words from a dictionary into a Trie structure. Then the scoring algorithm looks in the Trie to see whether successive plaintext letters form a word. If they do, the score is incremented by the square of the word length –- and the search process continues from the letter following the discovered word.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">Again there is a limit to the success of this method. It will not work unless most words in the plaintext are in the Trie. Plaintexts with unusual words will thus demand a very big dictionary, which can demand more memory than the computer possesses or alternatively can excessively slow down the scoring process. However the scoring method usually succeeds.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">A second limitation of Churn, or indeed any other stochastic method, is cipher length. As ciphers get shorter so the amount of information available for scoring gets less, and the effectiveness of scoring diminishes. The result is that ciphers take longer to solve and eventually will not solve at all. This limiting aspect is reached at about 70 to 80 letters, depending on the cipher.</span><br /><br /></p> <h1><span style="font-family: verdana, geneva; font-size: small;">Adjusting for different scoring systems.</span></h1> <p><span style="font-family: verdana, geneva; font-size: small;">For ciphers up to 200 letters long, the best scoring system is based on tetragraphs. For very long ciphers, like the 526-letter Playfair challenge cipher of Simon Singh's 'The Code Book', digraphs provide a faster result. You will find a <a href="articles/DiLog.txt">table of log digraph frequencies </a>in the Downloads section.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">The overwhelming number of ciphers I come across are 80 to 120 letters long, and here the best scoring system is provided by log tetragraphs. You will find a table of log tetragraph frequencies, made from 12 books by my friend Hank Gibson in the downloads section of this website, in binary format and also as a text file.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">If you have made your own tetragraph table, it is likely that your frequencies differ from mine because of different lengths of texts. Consequently your score for a given plaintext is likely to be different to mine. To use my Churn array of value[] you will need to adjust your scores to be in line with mine. This is simple to do. Score a piece of plaintext with your log tetragraphs, and then work out the average tetragraph score. My average is 10. If yours is 8, then your factor is 1.25; multiply all your scores by this factor whenever you use the Churn algorithm.</span></p> <h1><br /><span style="font-family: verdana, geneva; font-size: small;">Adjusting for ciphers of different lengths.</span></h1> <p><span style="font-family: verdana, geneva; font-size: small;">The 100 integers in the array called value[], that I introduced you to earlier, work well with ciphers with 100 to 120 letters. If you have a much longer cipher to solve, the scores of that cipher will be higher because of the extra length. It is then necessary to bring them back to what they would have been for a cipher of length 110 letters. That is simple to do. If the cipher is 220 letters long, then multiply all scores by a factor of 110/220 = 0.5</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">If your cipher is less than 100 letter, then scores will need increasing. So for a cipher of length 85 letters, increase the score by a factor of 110/85 = 1.3</span><br /><br /></p> <h1><span style="font-family: verdana, geneva; font-size: small;">Factors affecting solving speed.</span></h1> <p><span style="font-family: verdana, geneva; font-size: small;">The longer the cipher, the faster it will solve because there is more information available to score each key. With this additional information the score is more meaningful and thus decisions on whether to accept or reject a changed key are more likely to be correct.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">The dependence of solving time with length is exponential, so as ciphers get shorter they get a lot more difficult to solve.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">The other factor that influences solving time is how closely the plaintext resembles normal English text. The nearer the text, the faster will a cipher solve. The relationship is again exponential, with solving time extending markedly as plaintexts move further away from everyday English words. </span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">To give you an idea of the size of this effect, a cipher in normal English of length 130 will solve in 1/50th of the time of a cipher of length 86 when using the Churn algorithm. This effect is normal whatever method is used to solve and, of course, is why the Military put limits on the length of cipher messages.</span></p></div> <div class="feed-description"><p><span style="font-family: verdana, geneva; font-size: small;">I have described how Simulated Annealing is a considerably more effective algorithm than Hillclimbing, but is more complicated to program and – in particular – to setup. I have found a way to avoid these difficulties with a related algorithm that I call the Churn algorithm.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">To develop Churn, I studied what happened in a typical SA run. In particular I wanted to find the probability that a worse key would be accepted (which is the downward step). This probability is related to how much worse the key is, as is shown in the chart below drawn from the data I collected.</span></p> <p style="text-align: left;"><img src="images/churn2.gif" border="0" alt="" style="display: block; margin-left: auto; margin-right: auto;" /><br /> <br /> <span style="font-family: verdana, geneva; font-size: small;">To mimic this situation in an algorithm, I created a series of 100 numbers from the chart to represent probabilities of acceptance.</span><br /><span style="font-family: verdana, geneva; font-size: small;">This series is just the worse scores from zero to 60, but with frequencies in the series to enable representation of probabilities of their acceptance.</span></p> <p><span style="font-family: 'courier new', courier; font-size: x-small;">1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5,5,5,5,6,6,6,6,6,7,7,7,7,7,8,8,9,9,9,9,10,10,10,10,10,11,11,11,11,12,</span><br /><span style="font-family: 'courier new', courier; font-size: x-small;">12,12,12,13,13,14,14,15,15,16,16,16,16,17,17,18,18,19,19,20,21,21,22,23,23,24,25,26,27,28,29,30,31,33,34,37,39,41,43,48,57,60</span></p> <p><span style="font-family: verdana, geneva; font-size: small;">For example, say a particular key has a score worse by 2. To decide whether to accept this key, the algorithm selects at random one of the 100 numbers from the series. If this chosen number is greater than 2, then the key is accepted. You can see from the series that 86 of the numbers are greater than 2, so the probability that this worse key will be accepted is 86/100, or 86% as the chart requires. Similarly if a key has a worse score by say 42, the probability it will be accepted is just 4%.</span></p> <p><br /><span style="font-family: verdana, geneva; font-size: small;">To implement the Churn algorithm is now very simple. First in your hillclimber add an array called value[] that contains the 100 integers above. </span></p> <p><span style="font-family: verdana, geneva; font-size: small;">In your hillclimber, after a change is made to the key, you will currently have a line like:</span><br /> <br /><span style="font-family: 'courier new', courier; font-size: small;">if(score of changed key &gt; original key)</span><br /><span style="font-family: 'courier new', courier; font-size: small;">    accept the changed key;</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">As a second change alter the coding to this:</span><br /> <br /><span style="font-family: 'courier new', courier; font-size: small;">x=random(100);</span><br /><span style="font-family: 'courier new', courier; font-size: small;">if(score of changed key &gt; original key – value[x] )</span><br /><span style="font-family: 'courier new', courier; font-size: small;">    accept the changed key;</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">and you have changed your hillclimbing algorithm to Churn.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">You will get an immediate boost in solving performance. This will not be noticeable for quick solving ciphers, like simple substitution, but will be very evident in more difficult Transposition and Polygraphic Substitution ciphers such as the three below:</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">(a) Incomplete Columnar (25 columns)</span><br /><span style="font-family: 'courier new', courier; font-size: small;">SECER ISEAS DTLAE HERHL JTSSH FWRTF LHASR EIPMM OMLAS ETNSO HEDSI EDOHO RENHD LAENT STTPO REAME KHRTD ATCVN ATHLP TTPAI WOAPI NEOEE SHITI ITBAT NFEST ANTES TCRFI OINAO ADNTU LEOUS SIFRO NDIOW BSIES TEUEO AHSSF GEESS GAHTH HOTAO BIBHW TEAIE SHTHA TEDIU NGIRE SOESU ITUTE BDHTS ORLFN ABTLE SAUYM SOORA</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">-Churn: solves in average of 0.5 million keys</span><br /><span style="font-family: verdana, geneva; font-size: small;">-Hillclimb: solves in average of 11 million keys</span><br /> <br /> <br /><span style="font-family: verdana, geneva; font-size: small;">(b) Cadenus</span><br /><span style="font-family: 'courier new', courier; font-size: small;">MHDNE ARNTU NARBO OYBHN NAIAO TYTTI NEDIT HBLIE NREEO WTDFE RIEAT NSAIB HENOC NNALE TEOEF IEATA NRGNC ETATU GUMBH ELDCW SIIND EONRS EPHAT IUOHE UDEPE MTLIA TAWEN DDBNE BIALG SISDR SAORL IPEOL TETAA EEDLH YDIOA MYEEB ADHEH INNTH AFHNL HITTM NWINR GSAOV NREAN HAKCO GEEHD ONENM RATBD RTLIE ROGTS SEICE</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">-Churn: solves every time, average 50,000 keys (less than half</span><br /><span style="font-family: verdana, geneva; font-size: small;"> a second on my computer).</span><br /><span style="font-family: verdana, geneva; font-size: small;">-Genetic: solves every time, 5 million keys.</span><br /><span style="font-family: verdana, geneva; font-size: small;">-Hillclimbing: in 10 runs of 10 million keys each, fails 6 times.</span><br /> <br /> <br /><span style="font-family: verdana, geneva; font-size: small;">(c) difficult Playfair:</span><br /> <br /><span style="font-family: 'courier new', courier; font-size: small;">AK ZB BQ GR AK IO IW UY CD ZN EN NL UG NP OV DC RZ UC KW OI PL GU IU NE TO AU BL QW UG DC EW ZB OE KY WZ IK ZF GM BN GM DE DV FK YW KE ZV</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">-Churn: solves every time, in about 50 million keys</span><br /><span style="font-family: verdana, geneva; font-size: small;"> (2 minutes on my computer);</span><br /><span style="font-family: verdana, geneva; font-size: small;">-Hillclimbing: no solution at 4,000 million keys (2.5 hours).</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">Churn programs for solving these ciphers are given in the 'Churn package' in the <a href="index.php/downloads">download section of this website.</a> These programs illustrate different ways of implementing the Churn algorithm, as is explained in the ReadMe file included in the package. Also in the package is this article as a MS Word document. Log Tetragram frequency tables are also in the download section.</span></p> <h1><br /><span style="font-family: verdana, geneva; font-size: small;">Limitations of the Churn algorithm.</span></h1> <p><span style="font-family: verdana, geneva; font-size: small;">The algorithm depends on the scoring system correctly distinguishing whether the plaintext from one key is better than plaintext from another. This is done, as explained earlier, by assessing the log tetragraph frequency of plaintexts, using a table of frequencies derived from normal English texts.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">This system will fail if the plaintext of the cipher is not like normal English. For example the following plaintext was the solution to a recent cipher:</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">"Junk jewelry jingles, jolts jerky jackanapes. Jealous jades jot jabberwocky, jerboa jumps jaguarundi."</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">This is full of unusual tetragraphs, and therefore has a relatively low score of just 432. In contrast, a piece of garbled plaintext that makes no sense will score higher. For example, during the Churn process the garbled text below is formed, and this scores 776 -- much more than the solution:</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">"honyhepermchundrethirsthemychabyanafetheariothagethishallempibychemliahowfthadoamongu."</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">Consequently Churn (or any other algorithm) meeting a cipher based on the unusual plaintext will favour other plaintexts which are garbled and incorrect but still have higher scoring tetragraphs than the piece of unusual text.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">As a result Churn will be led away from the true ciphertext and produce nothing but garbled plaintexts. Use of other n-graphs, such as digraphs or trigraphs, will suffer from the same problem and will also fail to solve.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">A way round this problem is to score with words. Although slower, it holds out a high chance of finding the solution when plaintexts are unusual. For example, the correct plaintext above scores 629 while the garbled one scores just 165.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">Scoring with words is best carried out by initially putting all the words from a dictionary into a Trie structure. Then the scoring algorithm looks in the Trie to see whether successive plaintext letters form a word. If they do, the score is incremented by the square of the word length –- and the search process continues from the letter following the discovered word.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">Again there is a limit to the success of this method. It will not work unless most words in the plaintext are in the Trie. Plaintexts with unusual words will thus demand a very big dictionary, which can demand more memory than the computer possesses or alternatively can excessively slow down the scoring process. However the scoring method usually succeeds.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">A second limitation of Churn, or indeed any other stochastic method, is cipher length. As ciphers get shorter so the amount of information available for scoring gets less, and the effectiveness of scoring diminishes. The result is that ciphers take longer to solve and eventually will not solve at all. This limiting aspect is reached at about 70 to 80 letters, depending on the cipher.</span><br /><br /></p> <h1><span style="font-family: verdana, geneva; font-size: small;">Adjusting for different scoring systems.</span></h1> <p><span style="font-family: verdana, geneva; font-size: small;">For ciphers up to 200 letters long, the best scoring system is based on tetragraphs. For very long ciphers, like the 526-letter Playfair challenge cipher of Simon Singh's 'The Code Book', digraphs provide a faster result. You will find a <a href="articles/DiLog.txt">table of log digraph frequencies </a>in the Downloads section.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">The overwhelming number of ciphers I come across are 80 to 120 letters long, and here the best scoring system is provided by log tetragraphs. You will find a table of log tetragraph frequencies, made from 12 books by my friend Hank Gibson in the downloads section of this website, in binary format and also as a text file.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">If you have made your own tetragraph table, it is likely that your frequencies differ from mine because of different lengths of texts. Consequently your score for a given plaintext is likely to be different to mine. To use my Churn array of value[] you will need to adjust your scores to be in line with mine. This is simple to do. Score a piece of plaintext with your log tetragraphs, and then work out the average tetragraph score. My average is 10. If yours is 8, then your factor is 1.25; multiply all your scores by this factor whenever you use the Churn algorithm.</span></p> <h1><br /><span style="font-family: verdana, geneva; font-size: small;">Adjusting for ciphers of different lengths.</span></h1> <p><span style="font-family: verdana, geneva; font-size: small;">The 100 integers in the array called value[], that I introduced you to earlier, work well with ciphers with 100 to 120 letters. If you have a much longer cipher to solve, the scores of that cipher will be higher because of the extra length. It is then necessary to bring them back to what they would have been for a cipher of length 110 letters. That is simple to do. If the cipher is 220 letters long, then multiply all scores by a factor of 110/220 = 0.5</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">If your cipher is less than 100 letter, then scores will need increasing. So for a cipher of length 85 letters, increase the score by a factor of 110/85 = 1.3</span><br /><br /></p> <h1><span style="font-family: verdana, geneva; font-size: small;">Factors affecting solving speed.</span></h1> <p><span style="font-family: verdana, geneva; font-size: small;">The longer the cipher, the faster it will solve because there is more information available to score each key. With this additional information the score is more meaningful and thus decisions on whether to accept or reject a changed key are more likely to be correct.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">The dependence of solving time with length is exponential, so as ciphers get shorter they get a lot more difficult to solve.</span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">The other factor that influences solving time is how closely the plaintext resembles normal English text. The nearer the text, the faster will a cipher solve. The relationship is again exponential, with solving time extending markedly as plaintexts move further away from everyday English words. </span><br /> <br /><span style="font-family: verdana, geneva; font-size: small;">To give you an idea of the size of this effect, a cipher in normal English of length 130 will solve in 1/50th of the time of a cipher of length 86 when using the Churn algorithm. This effect is normal whatever method is used to solve and, of course, is why the Military put limits on the length of cipher messages.</span></p></div>