Top/KolmogorovComplexity


Kolmogorov complexity is motivated by the principle of Occam's razor: given many explanations for something observed, the simplest one is preferred.

Fix alphabet {0,1}.

Fix some encoding of Turing machines.

For any string x in {0,1}*, define K(x), the Kolmogorov complexity of x, as follows. Consider all Turing machines that, when run on blank tape, halt with output x. K(x) is the length of the shortest encoding of any of these Turing machines. That is, K(x) is the size of the smallest Turing machine that outputs x.
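As a rough illustration of the definition, one can let short Python programs stand in for Turing-machine encodings. This is an assumption made only for this sketch, and the helper names run_description and description_for_ones are hypothetical. Any program that prints x gives an upper bound on the description length of x; it never certifies the minimum.

```python
import contextlib
import io


def run_description(program: str) -> str:
    """Run a Python program (our stand-in for a machine started on blank
    tape) and return everything it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, {})
    return buffer.getvalue()


def description_for_ones(n: int) -> str:
    """A short program whose output is the string of n ones."""
    return f"print('1'*{n},end='')"


if __name__ == "__main__":
    x = "1" * 1000
    d = description_for_ones(1000)
    assert run_description(d) == x
    # The description is far shorter than the string it describes.
    print(len(x), len(d))
```

The description grows only with the number of digits of n, while the output grows with n itself.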

some facts:

for all x, K(x) ≤ |x| + O(1) (because x can be directly encoded into a machine)
for all n, for x = 1^n, K(x) ≤ log(|x|) + O(1) (because the machine only needs n written in binary)
for all n, for x = 1^(2^n), K(x) ≤ log(log(|x|)) + O(1) (because |x| = 2^n, so writing n in binary takes about log(log(|x|)) bits)
for almost all x of length n, K(x) ≥ |x|/2; that is, almost all strings have high Kolmogorov complexity. We proved this using the observation that, while there are 2^n strings of length n, there are fewer than 2^(n/2+1) encodings of length n/2 or less.
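The counting observation in the last fact can be checked with a few lines of arithmetic. The sketch below assumes n is even and counts every binary string of length 0 through n/2 as a possible encoding, which can only overestimate. The function names are ours.

```python
from fractions import Fraction


def short_encodings(n: int) -> int:
    """Number of binary strings of length 0, 1, ..., n/2 (n even).
    This is 2^0 + 2^1 + ... + 2^(n/2) = 2^(n/2+1) - 1."""
    return 2 ** (n // 2 + 1) - 1


def fraction_bound(n: int) -> Fraction:
    """Upper bound on the fraction of n-bit strings that have some
    encoding of length n/2 or less. Each encoding outputs at most
    one string, so at most short_encodings(n) strings qualify."""
    return Fraction(short_encodings(n), 2 ** n)


if __name__ == "__main__":
    for n in (10, 20, 40, 80):
        print(n, float(fraction_bound(n)))
```

The bound shrinks like 2^(1-n/2), so for even moderately large n only a vanishing fraction of strings can have short encodings.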

See chapter 6 of Sipser for basic properties of Kolmogorov complexity.

Intuitively, although almost all strings have high Kolmogorov complexity, for any given string, it is impossible to verify that it has high Kolmogorov complexity. Next we try to formalize this intuition a bit.

Define HARD={ x ∈ {0,1}* : K(x) ≥ |x|/2 }.

A language L is immune if (1) L is infinite, (2) every Turing-recognizable subset of L is finite. That is, not only is L not Turing-recognizable, but the only subsets of L that are Turing recognizable are the finite subsets.

claim: HARD is immune.


proof: HARD is infinite, because almost all strings have high Kolmogorov complexity. Suppose for contradiction that there is an infinite Turing-recognizable subset L of HARD. Let M be a Turing machine recognizing L. Define (for each i=1,2,3,...) Turing machine M_i as follows:
  1. Dovetailing, run M on all inputs x of length 2^i or more.
  2. Output the first input x that M accepts.

M_i always halts because, as L(M) is infinite, there will be some input of length 2^i or larger that M accepts.
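The construction of M_i can be sketched in Python. The recognizer M is modeled by a step-bounded predicate accepts_within(x, steps), meaning "M accepts x within that many steps". This interface and the toy recognizer below are assumptions for illustration only, not part of the proof.

```python
from itertools import count, product


def strings_of_length_at_least(m: int):
    """Yield every binary string of length >= m, shortest first."""
    for length in count(m):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def run_M_i(i: int, accepts_within) -> str:
    """Dovetail M over all inputs of length >= 2**i and return the
    first input found to be accepted. In round t, one more input is
    started and every started input is simulated for t steps."""
    candidates = strings_of_length_at_least(2 ** i)
    started = []
    for budget in count(1):
        started.append(next(candidates))
        for x in started:
            if accepts_within(x, budget):
                return x


def toy_accepts_within(x: str, steps: int) -> bool:
    """Hypothetical recognizer: accepts all-ones strings after len(x)
    steps and never halts on anything else."""
    return set(x) == {"1"} and steps >= len(x)


if __name__ == "__main__":
    print(run_M_i(2, toy_accepts_within))
```

Dovetailing matters because M may run forever on some inputs. Giving every started input a growing step budget guarantees that an accepted input is eventually found, provided one exists.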

Define x_i to be the string output by M_i. What can we say about K(x_i)?

K(x_i) ≤ |M_i| = O(log i) + |M| + O(1) = O(log i) because Turing machine M_i outputs x_i, and its encoding consists of i written in binary, a copy of M, and a constant amount of control code.
K(x_i) ≥ |x_i|/2 ≥ 2^(i-1) because x_i ∈ HARD and |x_i| ≥ 2^i.

Thus, for all i, 2^(i-1) ≤ O(log i), which is a contradiction.
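To see the contradiction numerically, suppose the O(log i) upper bound is a*log2(i) + b for some constants a and b. The constants are hypothetical placeholders; the proof does not name them. The sketch finds the first i at which the lower bound 2^(i-1) exceeds that upper bound.

```python
import math


def first_violation(a: float, b: float) -> int:
    """Smallest i >= 1 with 2^(i-1) > a*log2(i) + b, i.e. the first i
    for which the two bounds on K(x_i) cannot both hold."""
    i = 1
    while 2 ** (i - 1) <= a * math.log2(i) + b:
        i += 1
    return i


if __name__ == "__main__":
    # Even generous constants are overtaken after a handful of steps.
    print(first_violation(100, 1000))
```

However large the constants are, the loop terminates, because an exponential in i eventually dominates any logarithm in i.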


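corollary: K(x) is not computable. (If K were computable, HARD would be decidable, and so HARD would be an infinite Turing-recognizable subset of itself, contradicting the claim.)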

Relevance to Goedel's theorem (incompleteness of axiomatic systems)

Let A be any axiomatic system for reasoning about (at least) Turing machines.

Previously we argued that any such axiomatic system is either incomplete (for some statement S, neither S nor its negation was provable) or inconsistent (for some statement S, both S and its negation were provable). One interpretation is that either there is a true statement that is not provable, or there is a false statement that is provable.

The proof was by diagonalization. We carefully constructed a particular S by diagonalizing against the statements provable by A.

Next we want to argue for something much stronger: for any fixed A and any (large enough) integer n, a constant fraction of all statements of length n are true but not provable by A.

Consider, for any string x, the statement S(x) = "String x has K(x) at least |x|/2".

For some constant c, the statement S(x) can be written down (in the language of A) in |x|+c bits. (|x| bits to write down x, c bits to write down the definition of Kolmogorov complexity.)

Thus, among the at most 2^n n-bit statements, at least 2^(n-c) are of the form S(x) for some x (where |x| = n-c, in fact).

Furthermore,

  1. Almost all of these latter statements are true (because almost all strings have high Kolmogorov complexity).
  2. For large n, none of these statements are provable. (To see this, assume A proves only true statements, and note that if the set of provable statements of the form S(x) for some x were infinite, then there would be a Turing machine that accepted an infinite subset of HARD. The T.M. would work as follows: "On input x, enumerate all possible proofs using the axioms from A (dovetailing) and accept if a proof that K(x) ≥ |x|/2 is found.")

Thus, among all statements of length n, roughly a fraction 1/2^c of them are true but not provable.


One philosophical interpretation is this. At any time, for any set of axioms or procedures that you've come up with so far, you are likely to find statements whose truth you cannot determine using the axioms or procedures that you've figured out to date. The exception is if you stay safely in the set of things you already know, which, of course, is not much fun, and may not be possible anyway.


We've seen something similar before: we know that the collection of all languages is uncountable (like the reals), while the collection of Turing-recognizable languages is countable (like the integers).

Our conclusion regarding Kolmogorov complexity is that, even though the set of true statements is countable, and the set of provable statements is countable, for any axiomatic system A, the set of true but not provable statements is a constant fraction of all the statements.


Last edited November 30, 2004 11:55 am by Neal Young