Stuart’s MATLAB Videos

Watch and Learn

Choosing an algorithm based on data size

Today we will look further at the cypher algorithm, specifically at how random its output is. There are two distributions to look at: the distribution of single numbers (1,2,3,4) and the distribution of digrams ([1,1],[1,2],[1,3],[2,1],[2,2],…). The first attempt is to generate the entire sequence of random values and then do these distribution counts. This is easy and intuitive to implement. However, once the number of random values gets to be over a billion, holding that much in memory at once has performance implications. An alternative method is proposed. While this method is slightly slower, it scales better and can work on counts that would fail for the first algorithm. The general lesson is that when you are dealing with large data sets and do not need all of the data in memory at once, you might want to process it piece by piece.
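The piece-by-piece idea can be sketched roughly as follows. This is not the code from the video: randi stands in for the cypher's output (an assumption), digrams are assumed to be overlapping consecutive pairs, and names such as chunkSize, singleCounts, and digramCounts are made up for illustration.

```matlab
% Stand-in for the cypher's output: randi draws values in 1..4.
% (Assumption: the real generator from the video is not reproduced here.)
rng(0);                     % fixed seed so the run is repeatable
nTotal    = 1e6;            % total number of values to examine
chunkSize = 1e5;            % values held in memory at any one time
nSymbols  = 4;

singleCounts = zeros(1, nSymbols);         % counts of 1,2,3,4
digramCounts = zeros(nSymbols, nSymbols);  % digramCounts(a,b) counts [a,b]
prev  = [];                                % last value of the previous chunk
nDone = 0;

while nDone < nTotal
    n = min(chunkSize, nTotal - nDone);
    x = randi(nSymbols, n, 1);             % generate only this chunk

    % Single-value counts for this chunk.
    singleCounts = singleCounts + accumarray(x, 1, [nSymbols 1]).';

    % Prepend the carried-over value so the digram that spans the
    % chunk boundary is not lost.
    y = [prev; x];
    if numel(y) > 1
        digramCounts = digramCounts + ...
            accumarray([y(1:end-1), y(2:end)], 1, [nSymbols nSymbols]);
    end

    prev  = x(end);
    nDone = nDone + n;
end
```

Only one chunk of values exists at a time, so memory use is set by chunkSize rather than nTotal. The one detail that is easy to miss is carrying the last value of each chunk forward; without it, one digram per chunk boundary goes uncounted.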
