Most mathy people have a pretty good mental model of what a random process is (for example, generating a sequence of 20 independent bits).
I think most mathy people also have the intuition that there’s a sense in which an individual string like 10101001110000100101 is more “random” than 00000000000000000000, even though both strings are equally likely under the above random process, but they don’t know how to formalize it, and may even doubt that there is any way to make sense of this intuition.
Mathematical logic (or maybe theoretical computer science) has a method for quantifying the randomness of individual strings: given a string $x$, the Kolmogorov complexity $K(x)$ of $x$ is the length of the shortest Turing machine that outputs it.
In this blog post, I would like to explain why I think this is a very satisfying definition.
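Kolmogorov complexity itself is uncomputable, but if you want something hands-on, any off-the-shelf compressor gives a rough, computable stand-in for “length of a short description.” Here’s a quick Python sketch comparing the two strings from above (only an analogy, not the definition; the exact byte counts depend on the compressor):

```python
import zlib

# A compressor as a crude proxy for "shortest description": the very regular
# string should compress to fewer bytes than the irregular-looking one.
patterned = b"0" * 20
irregular = b"10101001110000100101"

print(len(zlib.compress(patterned)), len(zlib.compress(irregular)))
# Expect the first number to be noticeably smaller than the second.
```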
Keeping Grounded
I think a good way to help avoid philosophical quagmires when thinking about randomness is to recognize that random numbers are useful in the real world, and to make sure that your thinking about randomness preserves that.
For example, there are algorithms $A$ that take a fixed length $n$ string $x$, and produce the correct answer to whatever problem they’re trying to solve on some large proportion $p$ of all the length $n$ strings. Then a good approach would just be to feed $A$ a random $x$, and you’ll get the right answer with probability $p$.
Just to give a concrete example: a very familiar way that random numbers are useful is to estimate the average of a large list of numbers by taking a random sample and averaging them. You might have a list of 1000 numbers (say, bounded between 0 and 10), and have $x$ encode a set of 100 indices; then $A(x)$ will return the average of the numbers at those indices. If you say that $A$ succeeds for this problem if it returns an average that’s within some fixed tolerance of the true average, then you can work out $p$ for the given tolerance (although I think getting exact numbers for this problem is actually pretty tricky).
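To make this concrete, here’s a rough Python sketch of what such an $A$ might look like, with $p$ estimated empirically rather than worked out exactly (the list of numbers, the tolerance, and the encoding of $x$ as a set of indices are all made up for illustration):

```python
import random
import statistics

# A hypothetical instance of the setup in the text: a fixed list of 1000
# numbers between 0 and 10, with x encoding a set of 100 indices into it.
data = [random.uniform(0, 10) for _ in range(1000)]
true_mean = statistics.mean(data)
tolerance = 0.5

def A(indices):
    """Return the average of the numbers at the given indices."""
    return statistics.mean(data[i] for i in indices)

# Estimate the success proportion p: how often does a random x land within
# the tolerance of the true average?
trials = 10_000
hits = sum(
    abs(A(random.sample(range(1000), 100)) - true_mean) <= tolerance
    for _ in range(trials)
)
print(f"estimated p ≈ {hits / trials:.3f}")
```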
The reason that I think that Kolmogorov complexity is a good account of randomness is that the above story “factors” through Kolmogorov complexity in the following way: For any computable $A$ where $p$ is high enough (in a sense to be made precise below), there is an integer $c$ such that:
- For all $x$ with $K(x \mid n) \ge n - c$, $A(x)$ returns a correct answer.
- Almost all $x$ (of the given length $n$) have $K(x \mid n) \ge n - c$.
That is, Kolmogorov complexity lets you view the problem as follows: Any string of high complexity will yield the right answer when fed into $A$, so the only role of randomness is as an easy way to generate a string of high Kolmogorov complexity.
As a note: the notation $K(x \mid n)$ means the length of the shortest Turing program that outputs $x$ when given its length $n$ as an input. The reason for using this concept instead of $K(x)$ is that we want to, e.g., consider any string of all 0s to be low complexity, even if the length of the string happens to be a high complexity number.
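To make this concrete (a standard observation, using the notation just introduced): $K(0^n \mid n) = O(1)$, since a fixed-size program can print $n$ zeros once it is handed $n$, whereas $K(0^n) = K(n) + O(1)$, which is large whenever $n$ itself is a high complexity number.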
Some Rough Intuitions
The intuition for why almost all strings should have high Kolmogorov complexity is that there are only so many Turing machines: For example, there are $2^n$ strings of length $n$ and fewer than $2^{n-c}$ Turing machines of length less than $n - c$, so the proportion of strings of Kolmogorov complexity at least $n - c$ must be at least $1 - 2^{-c}$.
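Spelling out the counting (just filling in the arithmetic):
$$\#\{\text{programs of length} < n - c\} \;\le\; \sum_{i=0}^{n-c-1} 2^i \;=\; 2^{n-c} - 1 \;<\; 2^{n-c},$$
so at most a $2^{-c}$ fraction of the $2^n$ strings of length $n$ can have a description shorter than $n - c$.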
The intuition for why $A$ should be correct for all strings of sufficiently high complexity is as follows: We’re presuming that $A$ is correct for most strings, and that $A$ is computable. If $A(x)$ isn’t correct, that means you can describe $x$ fairly succinctly: i.e., as the $i$th string of length $n$ for which $A$ isn’t correct. This will be a short description since, by presumption, $i$ will be small.
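Here’s a small Python sketch of that description scheme; `A` and `is_correct` are hypothetical stand-ins for the algorithm and its success criterion, and the point is only that, since everything is computable, the pair $(i, n)$ is enough to reconstruct the string:

```python
from itertools import product

def failures(A, is_correct, n):
    """Enumerate the length-n strings on which A fails, in lexicographic order."""
    for bits in product("01", repeat=n):
        x = "".join(bits)
        if not is_correct(A(x), x):
            yield x

def reconstruct(i, A, is_correct, n):
    """Recover the i-th string on which A fails; i is small if A fails rarely."""
    for j, x in enumerate(failures(A, is_correct, n)):
        if j == i:
            return x
```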
Formalization
I said above that this fact about Kolmogorov complexity only holds if $p$ is high enough. How can we formalize this? One approach would be to consider a sequence of algorithms $A_1, A_2, A_3, \ldots$ instead of a single $A$ as above. Each algorithm $A_k$ should return a correct answer on at least $1 - 2^{-k}$ of its input strings. Furthermore, the different algorithms should be consistent: specifically, if $A_k$ returns the correct answer on a string, then so should $A_j$ for $j \ge k$.
Now, if we kept the size of the input string $n$ fixed, then this would be trivial, since for $k$ greater than $n$, $A_k$ would have to return the correct answer on any string. So we should also consider algorithms $A_k$ that take input strings of every length $n$ and give a correct answer on at least $1 - 2^{-k}$ of the strings of each length. (And if $k > n$, we will have to define “correct answer” for strings of length $n$ so that every input string returns a correct answer. Thus $A_k$ won’t be very useful on those strings, but we can look at its behavior for higher $n$s.)
In fact, it turns out we can describe all of this just in terms of the sets of input strings on which each algorithm returns a correct answer.
Definition: A P-test $\delta$ is an assignment of a natural number $\delta(x)$ to each finite string $x$ such that, for each $n$ and $m$, the number of $x$ of length $n$ such that $\delta(x) \ge m$ is at most $2^{n-m}$.
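To get a feel for the definition, here is one concrete P-test (counting leading zeros, which I believe is a standard example) together with a brute-force check of the bound in Python:

```python
from itertools import product

def delta(x):
    """The number of leading zeros of the string x."""
    return len(x) - len(x.lstrip("0"))

# Check the defining bound: for each n and m, at most 2^(n-m) strings of
# length n have delta(x) >= m.  (Here it holds with equality.)
for n in range(1, 10):
    for m in range(n + 1):
        count = sum(delta("".join(bits)) >= m for bits in product("01", repeat=n))
        assert count <= 2 ** (n - m), (n, m, count)
print("the leading-zeros test satisfies the P-test bound for n < 10")
```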
If $x$ has length $n$, then $\delta(x)$ corresponds to the smallest $k$ such that $A_k(x)$ returns a correct answer in our discussion above.
Theorem (Martin-Löf?): For any computable P-test $\delta$, there is a constant $c$ such that: For all $x$ of length $n$ and natural numbers $m$, if $K(x \mid n) \ge n - m$, then $\delta(x) \le m + c$. Furthermore, the proportion of $x$ of length $n$ such that $K(x \mid n) \ge n - m$ is at least $1 - 2^{-m}$.
I think this was one of Martin-Löf’s original theorems but I’m actually not sure. It’s a rephrasing of the results in Section 2.4 of Li and Vitányi’s book.
So, there is a complexity bound such that any string of high enough complexity will yield a correct answer when plugged into the algorithm. However, $m$ may have to be made high (which corresponds to making $k$ high) to ensure that there are a large number of such high complexity strings (or any at all).
What about Noisy Data?
The algorithms discussed above are all deterministic: that is, they correspond to things like Monte Carlo integration rather than averaging noisy data collected from the real world.
So what about noisy data? Random numbers are also useful in analyzing real world data, but the theorem above only applies to computable algorithms. The answer is so simple that it seems like cheating: if you model the noise in your data as coming from some infinite binary sequence $Y$, you can simply redo the whole thing but with Turing machines that have access to $Y$! In other words, you won’t get theorems about $K(x)$, but you will get theorems about $K^Y(x)$, which is the length of the shortest Turing machine that has access to $Y$ and outputs $x$.
What about Infinite Random Sequences?
Above we considered algorithms that knew ahead of time how many random bits they need. What about algorithms that might request a random bit at any time? This is also handled by Kolmogorov complexity: here we say that an infinite binary sequence is Martin-Löf random if there is some constant $c$ such that each prefix of the sequence of length $n$ has complexity at least $n - c$. (There actually has to be a technical change to the definition of complexity of finite strings in this case.)
As in the finite case, there’s a theorem saying that any sufficiently robust algorithm will yield a correct answer on any Martin-Löf random sequence.
One thing I like about this framework is that it provides an idea for what it means for a single infinite sequence to be random. For example, people often say that the primes are random (in fact, it’s one of their main points of interest). Since the primes are computable, they aren’t random in this sense, but this gives an idea of what it might mean: perhaps there’s some programming language that encapsulates “non-number-theoretic” ideas in some way, and some sequence derived from the primes can be shown to be “Martin-Löf” random with Turing machines replaced by this weaker description language. But this is pure speculation.