Consider flipping a fair coin n times. How likely is it that the number of heads is much larger than its expectation, n/2?
Define p(k) = Pr[#heads = k] = (n choose k)/2^n. Define S(k) = ∑_{i=k}^{n} p(i) -- this is the probability of k or more heads.
Note: (n choose k) = n!/(k!(n-k)!).
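For small n, both quantities can be computed exactly, which is handy for checking the approximations that follow. The sketch below is only an illustration; the function names p and S simply mirror the definitions above, and n = 100 is an arbitrary choice.

```python
from math import comb

def p(n, k):
    """Probability of exactly k heads in n fair coin flips."""
    return comb(n, k) / 2**n

def S(n, k):
    """Probability of k or more heads in n fair coin flips."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

n = 100  # arbitrary example size
print(p(n, n // 2))   # probability of exactly n/2 heads
print(S(n, n // 2))   # probability of at least n/2 heads
print(S(n, 60))       # probability of at least 60 heads
```

The sum in S has up to n+1 terms, so this is practical only for moderate n; the point of the rest of the discussion is to avoid computing it at all.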
Unfortunately, there is no simple closed-form expression for S(k), but we can understand its approximate behavior well. To do so, we first need to understand p(k) better.
p(k) is maximized at k=n/2 (taking n even). By StirlingsApproximation, p(n/2) is proportional to 1/sqrt(n); more precisely, it is about sqrt(2/(πn)).
The ratio p(n/2+d)/p(n/2) is fairly easy to estimate; it's about e^{-2d^2/n}. (Work this out...) This means that, up to constant factors, p(n/2+d) is about the same as p(n/2) as long as d is O(sqrt(n)). Intuitively, as d gets larger than that, p(n/2+d) gets exponentially small in comparison to 1/sqrt(n).
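One way to work out the ratio is sketched below. It assumes n is even, writes m = n/2 as shorthand (a symbol introduced here, not in the text above), and uses the first-order approximation ln(1+x) ≈ x, which is reasonable only when d is small compared with n.

```latex
\begin{align*}
\frac{p(n/2+d)}{p(n/2)}
  &= \frac{(m!)^2}{(m+d)!\,(m-d)!}
   = \prod_{j=1}^{d} \frac{m-j+1}{m+j}, \qquad m = n/2, \\
\ln \frac{p(n/2+d)}{p(n/2)}
  &= \sum_{j=1}^{d} \left[ \ln\!\left(1 - \frac{j-1}{m}\right)
       - \ln\!\left(1 + \frac{j}{m}\right) \right]
   \approx -\sum_{j=1}^{d} \frac{2j-1}{m}
   = -\frac{d^2}{m}
   = -\frac{2d^2}{n}.
\end{align*}
```

The last step uses the fact that the first d odd numbers sum to d^2.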
Intuitively, this means that S(n/2+d) will be at least a constant as long as d = O(sqrt(n)), because the first sqrt(n) terms in the sum will each contribute an amount proportional to 1/sqrt(n).
Then, for d >> sqrt(n), the terms in the sum S(n/2+d) are very small: the first term is proportional to e^{-2d^2/n}/sqrt(n), and the sum is dominated by its first few terms (the first sqrt(n) or so terms for d near sqrt(n), or perhaps the first term alone for d much larger). Thus S(n/2+d) lies roughly between e^{-2d^2/n}/sqrt(n) and e^{-2d^2/n}; that is, up to a factor of sqrt(n), it is proportional to e^{-2d^2/n}.
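A quick numerical look at this sandwich may help. The sketch below compares the exact tail with its first term and with e^{-2d^2/n}; the helper name compare and the choice n = 400 are illustrative assumptions, not part of the argument above.

```python
from math import comb, exp, sqrt

def compare(n, d):
    """Return (first term, exact tail, e^{-2d^2/n}) for the tail S(n/2 + d).

    Assumes n is even and 0 <= d <= n/2.
    """
    k = n // 2 + d
    first_term = comb(n, k) / 2**n
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    estimate = exp(-2 * d * d / n)
    return first_term, tail, estimate

n = 400  # arbitrary example size, so sqrt(n) = 20
for c in (1, 2, 3, 4):
    d = int(c * sqrt(n))
    first_term, tail, estimate = compare(n, d)
    print(d, first_term, tail, estimate)
```

The tail is never smaller than its first term, and Hoeffding's inequality guarantees that it never exceeds e^{-2d^2/n}, so the three printed columns should always appear in increasing order.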
We can sum this up by saying that up to constant factors, we can think of the distribution of heads as essentially uniform in a range {n/2-c*sqrt(n),...,n/2+c*sqrt(n)}, as long as we don't try to pin down the constant c too precisely.
Thus, with high probability, the number of heads will be very close to its expectation n/2 (within O(sqrt(n))).
This is true for any sum of independent random variables, as long as each variable is relatively small compared to the expectation of the sum. This is the intuition behind the Chernoff bound:
thm: Let X = ∑_i X_i be the sum of independent random variables where each X_i is between 0 and 1. Let μ = E[X] and choose any ε > 0. Then Pr[X > (1+ε)μ] ≤ e^{-μ*min(ε,ε^2)/3}. Also Pr[X < (1-ε)μ] ≤ e^{-μ*ε^2/2}.
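To get a feel for how loose or tight these bounds are, one can evaluate them next to the exact tail for fair coin flips. The following is a minimal sketch under that assumption; the function names and the example values n = 100, ε = 0.2 are made up for illustration.

```python
from math import comb, exp

def chernoff_upper(mu, eps):
    """Bound on Pr[X > (1+eps)*mu] from the theorem above."""
    return exp(-mu * min(eps, eps**2) / 3)

def chernoff_lower(mu, eps):
    """Bound on Pr[X < (1-eps)*mu] from the theorem above."""
    return exp(-mu * eps**2 / 2)

def exact_upper_tail(n, k):
    """Exact Pr[#heads > k] for n fair coin flips (k an integer)."""
    return sum(comb(n, i) for i in range(k + 1, n + 1)) / 2**n

def exact_lower_tail(n, k):
    """Exact Pr[#heads < k] for n fair coin flips (k an integer)."""
    return sum(comb(n, i) for i in range(0, k)) / 2**n

# Example: n = 100 flips, so mu = 50; eps = 0.2 gives thresholds 60 and 40.
n, mu, eps = 100, 50, 0.2
print(exact_upper_tail(n, 60), chernoff_upper(mu, eps))
print(exact_lower_tail(n, 40), chernoff_lower(mu, eps))
```

The bounds hold for any independent variables in [0,1], not just coin flips, so it is unsurprising if they are far from tight in this special case.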
Exercise: Bound the probability that N coin flips yield more than N/2+D heads. What bound do you get for D = 100 sqrt(N)? For D = 100 sqrt(N log N)?
Exercise: If M balls are thrown into N bins, the expected number of balls in a bin is M/N. How much deviation from M/N do you expect in a "typical" bin (i.e. in a typical bin, with constant probability) and in the "extreme" bin (i.e. in the bin with the most balls, w/ const. prob.)? Answer this for M=N^3 and then for M=N.
References: