## Chebyshev's Bias

—————————

My National Review Online *Diary* column for September 2002 had a note on a
mathematical topic: Chebyshev's bias. I include
the note below, as it appeared.

The note ends with me saying: "Michael Rubinstein and Peter Sarnak proved in 1994 that the violations have
nonzero density, a fascinating and
counter-intuitive result …" Several readers felt this was a cliff-hanger. What does "nonzero
density" mean? they wanted
to know. So after the original piece, I've appended a note on logarithmic density.

—————————

I have just lost a minor battle with the editors of my book about
math. I wanted to add a 6-page appendix on Chebyshev's Bias. They: "No! The darn book is already too long!
No! *No! NO!!!*"

OK, fine. Chebyshev's Bias deserves to be much better known than it is, though,
so to get the word out, I'm going to blog it, right here. This is *absolutely the only* conservative web
site where you get serious math.

Write down the first few prime numbers:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 …

Divide each one by 4 and note the remainder:

2 3 1 3 3 1 1 3 3 1 3 1 1 3 3 1 3 …

Once you get past *p* = 2, the remainder must be either 1 or 3. Which one is "ahead"
at
any point? Denoting the answer by 1, 3, or T (for "tie"), the answer is:

T 3 T 3 3 3 T 3 3 3 3 3 T 3 3 3 3 …

That's a Chebyshev bias. Do the 1's ever take the lead? Yes, they do; but not until
*p* = 26,861.
And that's nothing: if you divide by 3 instead of 4, the remainder (once you get past *p* = 3) must be
either 1 or 2. The bias is to 2; and *that* bias doesn't get violated until
*p* = 608,981,813,029!
(This result wasn't found until 1978, by Carter Bays and Richard Hudson.)

If you divide by 10 instead of by 4 or 3, you will just get the last digit of your prime number. (659
divided by 10 leaves remainder 9.) Once you get past *p* = 2 and *p* = 5, every prime
number must
end in 1, 3, 7, or 9. Is there a Chebyshev bias?

I ran through all the primes up to *p* = 100,711,433,
which is as many as I keep handy on disk. That's the first 5.8 million primes. Threes and sevens were in the
lead roughly 2.8 million times each, ones had 113,922 leads, nines had 357, and there were 26,776 ties.

Notice, by the way, that these "who's ahead" biases arise from very small margins. The actual counts for ones, threes, sevens and nines as last digit in those first 5.8 million primes were: 1,449,824 ones, 1,450,185 threes, 1,450,153 sevens, and 1,449,836 nines — a variation of only 361, a niggardly 0.025 percent.

The situation resembles those "first past the post" election systems, where a nationwide majority of 51 percent can give your party a landslide in terms of parliamentary seats; or a foot race with very well-matched runners, in which one runner manages to stay slightly ahead for most of the race, and gets all the glory.

The English mathematician J.E. Littlewood proved in 1914 that any Chebyshev bias gets violated
infinitely often, if you go far enough. Michael Rubinstein and Peter Sarnak proved in 1994 that the violations
have nonzero density, a fascinating and counter-intuitive result … But that's about as much math as
I can get away with on NRO. You'll have to read the amazing Rubinstein-Sarnak result for yourself:
"Chebyshev's Bias," in *Experimental Mathematics*, Vol.3, 1994 (pp. 173-197).

—————————

OK, here's the addendum on logarithmic density

## Logarithmic density

Consider the ordinary counting numbers 1, 2, 3, 4, … Mathematicians call them "the natural numbers." Now, it often happens in math that we find ourselves dealing with some sub-set of the natural numbers. Examples:

- The even numbers: 2, 4, 6, 8, 10, 12, 14, …
- The powers of two: 1, 2, 4, 8, 16, 32, 64, …
- The perfect squares: 1, 4, 9, 16, 25, 36, 49, …
- The primes: 2, 3, 5, 7, 11, 13, 17, …
- The Fibonacci numbers: 1, 2, 3, 5, 8, 13, 21, … (from 3 on, each is the sum of the previous two).
- Numbers with no square factor: 1, 2, 3, 5, 6, 7, 10, …
- Numbers that leave remainder two when you divide by seven: 9, 16, 23, 30, 37, 44, …
- The "slices of pi" sequence: 3, 14, 15, 92, 653, 5897, 9323, 84626, 433832, …

When we are dealing with one of these sub-sets, an obvious question to ask is: How *big* is this
sub-set, relative to all the natural
numbers? What proportion of the natural numbers belong to it? A popular way to ask the same question is: If I pick a
number at random from all
the natural numbers, what's the probability that the number I picked belongs to this sub-set? The mathematical term of
art is: What's the
sub-set's **density**?

On the one hand this is a simple question, to which the answer is often perfectly plain. For the first and
seventh of the sub-sets above, the
answers are of course: one-half, one-seventh. On the other hand, there is an infinity of natural numbers, and
infinity is a notoriously difficult
thing to handle arithmetically. How would you go about *proving* that the density of the even numbers is
one-half?

"Logarithmic density" is one approach to this issue. (There are others.) It rests on the fact that
if you add up the reciprocals of
all the natural numbers up to *N*, the answer is approximately log(*N*), and the approximation gets
better, without limit, as *N*
gets bigger. This is the natural logarithm here, by the way — it appears on many pocket calculators as
"ln," but I prefer
"log."

For example:

1 + 1/2 + 1/3 + 1/4 + 1/5 + … + 1/1000 = 7.48547

while log(1000) is 6.90776. There is only a 7.72 percent error in the approximation. If instead of a thousand
you sum to a million, then a
billion, then a trillion, the percent errors are 4.01, 2.71, and 2.05. As *N* gets bigger and bigger, the
percent error dwindles to zero.
(Though it does so awfully slowly. The error doesn't drop below one percent until *N* =
6,568,651,717,465,645,315,850,822.)

Mathematicians have a special symbol for this phenomenon, the "twiddle" symbol:

1 + 1/2 + 1/3 + 1/4 + 1/5 + … + 1/N∼ log(N)

Now, pretty obviously, you could just as well write this as:

(1 + 1/2 + 1/3 + 1/4 + 1/5 + … + 1/N) / log(N) ∼ 1

This gives a way to measure how "dense" a sub-set is, among the natural numbers as a whole. What you
do is, take all the reciprocals
of the numbers in your sub-set up to some large number *N*, add 'em up, divide the result by log(*N*),
and see what you get —
more precisely, what your answer twiddles to, as *N* becomes indefinitely large. Your answer can't possibly be
greater than 1, because 1 is
the answer when the "sub-set" is *all* the natural numbers!

If your sub-set was the even numbers, you get an answer that twiddles to one-half. (I mean, for bigger and
bigger *N*, it gets closer
and closer to one-half.) That's what you would expect. For the seventh of those sub-sets I listed above, the answer
is one-seventh, which again
makes sense.

All but one of the other sub-sets I listed above have logarithmic density zero. Yes, even the primes — a fact expressed very elegantly in Hardy and Wright's Theory of Numbers by the sentence: "Almost all numbers are composite." ("Composite" is the opposite of "prime.")

The exception is the last but two in my list, numbers with no square factor. The density of these
"square-free"
(mathematicians actually use the German word *quadratfrei*) numbers is about
0.608 — that is, 6 / π². What on earth is π — the ratio of a
circle's circumference to its
diameter — doing in an arithmetic problem? Don't ask.

The wonderful and amazing result got by Rubinstein & Sarnak is this: Take a Chebyshev bias; for
illustration, I'll take the bias to
remainder 3 when you divide a prime by 4. For each natural number *N* in turn, check to see whether the primes
up to *N* satisfy the
bias. That is, check to see whether or not remainder-3 primes up to *N* outnumber remainder-1 primes up to
*N* (inclusive, in both
cases). If they don't, put *N* into a special sub-set: the "Chebyshev bias violation" sub-set. Now,
this sub-set would seem to be
pretty darn sparse. The first violation, as I mentioned in the original piece, doesn't occur until 26,861. The first
really big block of
violations is from 616,877 to 617,026. Yet the stunning fact — Rubinstein & Sarnak proved it
mathematically — is that this
subset has nonzero logarithmic density. Its density is, in fact, 0.0041. In other words, taking the natural numbers
as a whole, around one number
in 244 is a Chebyshev violator (for this particular bias).

This is the kind of result mathematicians love. It is counter-intuitive — deeply non-obvious. The math you need to get it is fairly heavy, but not that hard to follow if you concentrate on it. Best of all, it is, so far as I know, perfectly useless.

If you want to try your hand at this, here is a sub-set of the natural numbers: All those that have a "3" in their ordinary (that is, decimal) form. This sub-set begins: 3, 13, 23, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 43, 53, …. What is the logarithmic density of this sub-set?