Stuart’s MATLAB Videos

Watch and Learn


Choosing algorithm based on data size. 2

Posted by Doug Hull,

Today we will look further at the cypher algorithm, specifically at the randomness of its output. There are two distributions to look at: the distribution of single numbers (1,2,3,4) and the distribution of digrams ([1,1],[1,2],[1,3],[2,1],[2,2],…). The first approach is to generate the entire sequence of random values and then do these distribution counts. This is easy and intuitive to implement. However, once the number of random values exceeds a billion, holding that much in memory at once has performance implications. An alternative method is proposed: generate and count the values piece by piece. Although this method is slightly slower, it scales better and can handle counts that would fail with the first algorithm. The general lesson here is that when you are dealing with large data sets, if you do not need to have it all in memory at once, you might want to take it piece by piece.
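The two approaches can be sketched roughly as follows. This is an illustrative Python sketch, not the MATLAB code from the video: the function names, the chunk size, and the use of a seeded generator as a stand-in for the cypher algorithm are all assumptions, as is the choice to count digrams as overlapping adjacent pairs.

```python
import random
from collections import Counter


def counts_all_in_memory(n, seed=0):
    """Generate all n values first, then count singles and digrams."""
    rng = random.Random(seed)  # stand-in for the cypher algorithm (assumption)
    values = [rng.randint(1, 4) for _ in range(n)]
    singles = Counter(values)
    # Assumption: digrams are overlapping adjacent pairs (v[i], v[i+1]).
    digrams = Counter(zip(values, values[1:]))
    return singles, digrams


def counts_chunked(n, chunk_size=1000, seed=0):
    """Generate and count one chunk at a time; only one chunk is in memory."""
    rng = random.Random(seed)
    singles, digrams = Counter(), Counter()
    prev = None  # last value of the previous chunk
    remaining = n
    while remaining > 0:
        size = min(chunk_size, remaining)
        chunk = [rng.randint(1, 4) for _ in range(size)]
        singles.update(chunk)
        if prev is not None:
            # Count the digram that straddles the chunk boundary.
            digrams[(prev, chunk[0])] += 1
        digrams.update(zip(chunk, chunk[1:]))
        prev = chunk[-1]
        remaining -= size
    return singles, digrams
```

With the same seed both functions see the same sequence, so their counts should agree. The only extra bookkeeping in the chunked version is carrying the last value of each chunk forward so that the digram spanning the boundary is not lost.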

2 Comments

Brad Stiritz replied on : 1 of 2
Hi Doug, very interesting video & results, thanks. Could you please post your code for the animated graphics (shown at beginning of video) somewhere & then add a link to this blog posting? I'd really like to check it out :) Thank you, fingers crossed..
Doug replied on : 2 of 2
@Brad, The animations in the video were done manually with Power Point. Nothing fancy, just lots of copy and paste! Doug