
A Good Definition of Randomness


Most mathy people have a pretty good mental model of what a random process is (for example, generating a sequence of 20 independent bits).

I think most mathy people also have the intuition that there’s a sense in which an individual string like 10101001110000100101 is more “random” than 00000000000000000000, even though both strings are equally likely under the above random process, but they don’t know how to formalize this intuition, and may even doubt that there is any way to make sense of it.

Mathematical logic (or maybe theoretical computer science) has a method for quantifying the randomness of individual strings: given a string \sigma, the Kolmogorov complexity C(\sigma) of \sigma is the length of the shortest Turing machine that outputs it.
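Kolmogorov complexity isn’t computable, but an ordinary compressor gives a crude, computable upper bound on a string’s description length, which is enough to illustrate the intuition about the two strings above.  A minimal sketch (my own illustration, using zlib; nothing specific to the definition):

    import random
    import zlib

    def compressed_length(bits: str) -> int:
        """Length in bytes of the zlib-compressed string: a rough, computable
        stand-in for an upper bound on Kolmogorov complexity."""
        return len(zlib.compress(bits.encode("ascii"), 9))

    n = 10_000
    all_zeros = "0" * n
    random.seed(0)
    scrambled = "".join(random.choice("01") for _ in range(n))

    print(compressed_length(all_zeros))   # a few dozen bytes: very short description
    print(compressed_length(scrambled))   # over a thousand bytes: no short description found

(The compressor’s fixed overhead swamps the difference for 20-bit strings, which is why the sketch uses much longer ones.)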

In this blog post, I would like to explain why I think this is a very satisfying definition.

Keeping Grounded

I think a good way to help avoid philosophical quagmires when thinking about randomness is to recognize that random numbers are useful in the real world, and to make sure that your thinking about randomness preserves that.

For example, there are algorithms P(\sigma) that take a string \sigma of some fixed length n and produce the correct answer to whatever problem they’re trying to solve on some large proportion \alpha of all the length-n strings.  A good strategy, then, is just to feed P a random \sigma: you’ll get the right answer with probability \alpha.

Just to give a concrete example: a very familiar use of random numbers is estimating the average of a large list of numbers by taking a random sample and averaging it.  You might have a list of 1000 numbers (say, bounded between 0 and 10) and let \sigma encode a set of 100 indices; then P(\sigma) returns the average of the numbers at those indices.  If you say that P succeeds on this problem when it returns an average that’s within some fixed tolerance of the true average, then you can work out \alpha for the given tolerance (although I think getting exact numbers for this problem is actually pretty tricky).
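Here is a minimal sketch of what such a P might look like (the 10-bit-chunk encoding of \sigma and the sampling-with-replacement are my own arbitrary choices to make the example concrete, not the exact scheme above):

    import random

    def P(sigma: str, data: list, sample_size: int = 100) -> float:
        """Read sigma as 10-bit chunks, each picking an index into data
        (mod len(data)), and return the average of the sampled entries."""
        chunk = 10
        indices = [int(sigma[i:i + chunk], 2) % len(data)
                   for i in range(0, chunk * sample_size, chunk)]
        return sum(data[i] for i in indices) / sample_size

    # Feeding P a random sigma is just "take a random sample and average it".
    random.seed(1)
    data = [random.uniform(0, 10) for _ in range(1000)]
    sigma = "".join(random.choice("01") for _ in range(10 * 100))
    print(P(sigma, data), sum(data) / len(data))  # sample average vs. true average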

The reason I think Kolmogorov complexity is a good account of randomness is that the above story “factors” through Kolmogorov complexity in the following way: for any computable P where \alpha is high enough (in a sense to be made precise below), there is an integer c such that:

  1. For all \sigma with C(\sigma|n) > c, P(\sigma) returns a correct answer.
  2. Almost all \sigma (of the given length n) have C(\sigma|n) > c.

That is, Kolmogorov complexity lets you view the problem as follows: Any string of high complexity will yield the right answer when fed into P, so the only role of randomness is as an easy way to generate a string of a high Kolmogorov complexity.

As a note: the notation C(\sigma|n) means the length of the shortest Turing program that outputs \sigma when given n as an input.  The reason for using this concept instead of C(\sigma) is that we want to, e.g., consider any string of all 0s to be low complexity, even if the length of the string happens to be a high-complexity number.

Some Rough Intuitions

The intuition for why almost all strings \sigma should have high Kolmogorov complexity is that there are only so many Turing machines: for example, there are 2^n strings of length n but at most 2^{n-c}-1 Turing machines of length \leq n - c - 1, so the proportion of strings of Kolmogorov complexity \geq n - c must be at least 1 - 2^{-c}.
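Written out as a display (this is just the counting bound above, nothing more): each program of length \leq n - c - 1 outputs at most one string, so

\#\{\sigma : |\sigma| = n,\ C(\sigma) < n - c\} \leq 2^0 + 2^1 + \cdots + 2^{n-c-1} = 2^{n-c} - 1 < 2^{n-c},

and dividing by 2^n shows that the proportion of length-n strings with C(\sigma) \geq n - c is at least 1 - 2^{-c}.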

The intuition for why P(\sigma) should be correct for all strings of sufficiently high complexity is as follows: we’re presuming that P(\sigma) is correct for most strings, and that P is computable.  If P(\sigma) isn’t correct, that means you can describe \sigma fairly succinctly: namely, as the ith string \tau (in some fixed enumeration of the length-n strings) for which P(\tau) isn’t correct.  This is a short description since, by presumption, i will be small.
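As a sketch of that description scheme, here is the decoder it implicitly relies on (is_correct is a placeholder for “P’s answer on \sigma is correct”, which I’m assuming can itself be checked by a program; the point is just that the pair (n, i) plus this fixed decoder pins down the string):

    from itertools import product

    def ith_failure(is_correct, n: int, i: int) -> str:
        """Return the i-th length-n string (in lexicographic order) on which
        the algorithm's answer is wrong.  If the algorithm is correct on most
        strings, then i is small, so this is a short description."""
        count = 0
        for bits in product("01", repeat=n):
            sigma = "".join(bits)
            if not is_correct(sigma):
                if count == i:
                    return sigma
                count += 1
        raise ValueError("fewer than i + 1 failures among length-n strings")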

Formalization

I said above that this fact about Kolmogorov complexity only holds if \alpha is high enough.  How can we formalize this?  One approach is to consider a sequence of algorithms P_i instead of a single P as above.  Each algorithm P_i should return a correct answer on at least a 1-2^{-i} fraction of its input strings.  Furthermore, the different algorithms should be consistent: specifically, if P_i(\sigma) returns the correct answer, then so should P_j(\sigma) for j > i.

Now, if we kept the length n of the input string \sigma fixed, this would trivialize: for i greater than n, P_i would be forced to return the correct answer on every string.  So we should instead consider algorithms P_{i,n} that take input strings of length n and give a correct answer on at least a 1-2^{-i} fraction of those strings.  (And if i > n, we will have to define “correct answer” for P_{i,n} so that P_{i,n} returns a correct answer on every input string.  Such a P_{i,n} won’t be very useful, but we can look at P_{i,n'} for larger n'.)

In fact, it turns out we can describe the whole setup just in terms of the sets of input strings on which the algorithms return a correct answer.

Definition: A P-test \delta is an assignment of a natural number to each finite string \sigma such that, for each n and each m, the number of \sigma of length n such that \delta(\sigma) \geq m is \leq 2^{n - m}.

In terms of our discussion above: if \sigma has length n, then \delta(\sigma) = m corresponds to m being the smallest m_0 such that P_{m_0,n}(\sigma) returns a correct answer.
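A sketch of that correspondence in code (P_family below is a stand-in for the family P_{i,n} at a fixed length n, and none of the names are from the post; the second function just brute-forces the condition in the definition):

    from itertools import product

    def delta_from_family(P_family, sigma: str) -> int:
        """P_family[i](sigma) is True when the algorithm P_{i,n} answers
        correctly on sigma; return the smallest such i, which plays the role
        of the test value delta(sigma)."""
        for i, correct_on in enumerate(P_family):
            if correct_on(sigma):
                return i
        return len(P_family)  # never correct within the family: maximal value

    def is_P_test(delta, n: int) -> bool:
        """Brute-force check of the definition on length-n strings: for every m,
        at most 2^(n - m) strings sigma have delta(sigma) >= m."""
        values = [delta("".join(bits)) for bits in product("01", repeat=n)]
        return all(sum(v >= m for v in values) <= 2 ** (n - m)
                   for m in range(max(values) + 2))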

Theorem (Martin-Löf?):  For any computable P-test \delta, there is a constant c such that:  For all \sigma of length n and natural numbers m, if C(\sigma|n) \geq n + c - m, then \delta(\sigma) \leq m.

Furthermore, the proportion of \sigma such that C(\sigma|n) \geq n + c - m is at least 1 - 2^{c-m}.

I think this was one of Martin-Löf’s original theorems but I’m actually not sure.  It’s a rephrasing of the results in Section 2.4 of Li and Vitányi’s book.

So, there is a complexity bound such that any string of high enough complexity will yield a correct answer when plugged into the algorithm.  However, m may have to be made high (which corresponds to making \alpha high) to ensure that there are a large number of such high-complexity strings (or any at all).

What about Noisy Data?

The algorithms discussed above are all deterministic: that is, they correspond to things like Monte Carlo integration rather than averaging noisy data collected from the real world.

So what about noisy data?  Random numbers are also useful in analyzing real world data, but the theorem above only applies to computable algorithms.  The answer is so simple that it seems like cheating: if you model the noise in your data as coming from some infinite binary sequence X, you can simply redo the whole thing but with Turing machines that have access to X!  In other words, you won’t get theorems about C(\sigma), but you will get theorems about C^X(\sigma), which is the length of the shortest Turing machine that has access to X and outputs \sigma.

What about Infinite Random Sequences?

Above we considered algorithms that knew ahead of time how many random bits they needed.  What about algorithms that might request a random bit at any time?  This is also handled by Kolmogorov complexity: here we say that an infinite binary sequence is Martin-Löf random if there is some c such that each prefix of the sequence of length n has complexity at least n - c.  (There actually has to be a technical change to the definition of complexity of finite strings in this case.)
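If I’m stating it correctly, the usual precise form of this (the “technical change” is to use prefix-free complexity, commonly written K, in place of C) is the Levin–Schnorr theorem:

an infinite binary sequence X is Martin-Löf random \iff there is a c such that, for all n, K(X \upharpoonright n) \geq n - c,

where X \upharpoonright n denotes the first n bits of X.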

As in the finite case, there’s a theorem saying that any sufficiently robust algorithm will yield a correct answer on any Martin-Löf random sequence.

One thing I like about this framework is that it provides an idea for what it means for a single infinite sequence to be random.  For example, people often say that the primes are random (in fact, it’s one of their main points of interest).  Since the primes are computable, they aren’t random in this sense, but this gives an idea of what it might mean: perhaps there’s some programming language that encapsulates “non-number-theoretic” ideas in some way, and some sequence derived from the primes can be shown to be “Martin-Löf” random with Turing machines replaced by this weaker description language.  But this is pure speculation.

 

