Once we know what characteristics are desirable in a random sequence, we can evaluate PRNGs in terms of the sequences they generate. Just as important, we can decide which characteristics do not affect our use of the PRNG. If a particular characteristic has no effect on your problem, there is no need to consider that characteristic when evaluating PRNGs.
3.1 Distribution
Probably the most fundamental characteristic of a sequence of numbers
is its distribution. The PRNGs in the ThinAir library (with a few
exceptions) produce uniformly distributed number sequences. Other
distributions include normal, Poisson, geometric,
binomial, and student-t. Other authors have published algorithms
that convert uniformly distributed number sequences into other distributions.
(See [4, pages 131-133])
3.2 Range
Another fundamental characteristic of a random number sequence is its
range. What is the smallest number in the sequence? What is the largest?
These two numbers define the range of the whole sequence. The range of a
sequence is more important and more subtle than most people realize. If the
range of a random number sequence is 0 to 64, it is useless for generating
random percentages. One good example of a bad range problem appears in some
simple card games. In these examples, a 16-bit random number generator is
used to choose a layout of a deck of cards. The problem lies in the fact that
the generator can produce at most 65,536 different numbers. A deck of cards
can produce 80.66 * 1066 different layouts. If you are very careful in
your use of the generator, this problem is solvable. If not, most of the
possible layouts never occur.
3.3 Type
Another useful characteristic of a random sequence is its type. What
kind of numbers make up this sequence. Are we talking about bits, integers,
floating point numbers, strings, or something else entirely. Most of these
types can convert into each other relatively easily. In general, PRNGs
fall into three categories:
3.4 Period
All algorithm-based PRNGs repeat if they are allowed to run for long enough.
This is caused by the fact that the generator has a finite amount of internal
state, and by the fact that an algorithm cannot produce truly random
output. Since we know the sequence must repeat, we would obviously like to have
it repeat after as long a time as possible. The number of individual numbers
generated by the PRNG before it begins to repeat is known as the period
of the sequence. This sequence of numbers is often called a cycle. The
amount of internal state retained by a PRNG defines the upper limit on the
period. A PRNG with a period equal to the maximum predicted by its internal
state is sometimes called a maximal-period generator.
Depending on the algorithm used, a PRNG may actually generate more than one cycle. The more individual cycles a generator has, the shorter the period each cycle has. The best generator would, of course, have a single cycle that was longer than we could ever use (effectively infinite). Some multiplicative congruential random number generators have two cycles, one being very long (approximately 232-1 numbers in the sequence), the other being a cycle of period 1 at 0.
3.5 Coverage
The term coverage describes how well a given sequence covers its output
range. For example, a PRNG with an output range of 0 to 232-1 should
generate every number between 0 and 232-1 if allowed to run for long
enough. Not all PRNGs cover their entire output range.
If a PRNG has multiple cycles, the coverage issue is even more interesting. The generator may actually be capable of covering the whole range of output values. However, only certain values are covered in each cycle. The multiplicative congruential PRNGs mentioned above have a maximum coverage one less than expected in the main cycle. This is because an output of zero is never part of the main cycle.
Many tests have been devised to study coverage. Two of these are the equidistribution, or frequency, test and the gap test. The equidistribution test tries to verify that all of the numbers in the sequence are equally likely. The gap test studies the probabilities of gaps of different sizes between numbers in a particular range.
3.6 Runs
One interesting characteristic of a sequence of numbers concerns runs.
When each successive number in a subsequence is larger than the previous number,
we have a run up. When each successive number in a subsequence is smaller
than the previous number, we have a run down. In a good random sequence
we would expect the number of runs up to be pretty close to the number of runs
down. We would also expect that longer runs would be less likely than shorter
runs.
3.7 Serial Correlation
A good sequence of random numbers should not have any correlation between
successive numbers. This means that you should not be able to predict the next
number in the sequence by looking at previous numbers. The classic example of
serial correlation in a PRNG is the linear congruential generator. When
a linear congruential PRNG is used to generate pairs of numbers for use as
x, y coordinates on a graph, a pattern often shows up in the graph. The pairs
of numbers tend to fall on a set of parallel lines on the graph. This
artifact is a simple result of the algorithm used.
One way to test for high serial correlation is the Poker test. The Poker test looks at the combinations of five successive numbers taken from the sequence. These numbers are analyzed for pairs, three-of-a-kind, full house, etc. The probabilities of these combinations should approach that of a random number stream.
3.8 Permutations
In a truly random number stream, any permutation of a set of numbers is as
likely as any other permutation of the same numbers. If one permutation of a
set of numbers is more likely than another, a few numbers in the set may supply
enough information to help predict following numbers.
3.9 Spectral Characteristics
Even if a sequence of numbers does not exhibit any obvious patterns, it may
have higher order artifacts. A truly random sequence would have no particular
frequency component more pronounced than any other. Since any PRNGs must repeat
after some length of time, there is at least one dominant frequency component.
Studying the spectral characteristics tells us if there are any others.