I wrote a short script that searched for each number between
11 and 1000 on the search engine Alta Vista. (It was not
possible to search for numbers 1-10 for some reason.)
For each number I recorded the number of hits Alta Vista
reported. The result, shown in the following diagram,
included a few surprises.
It was perhaps not so surprising that lower numbers
were more popular than higher numbers, and that multiples
of five, ten, twenty-five and one hundred were vastly overrepresented.
What I had not expected, however, was that at each
even hundred the rate of occurrence jumped up and then
fell within the hundred, only to jump up again at the
next one. As Golan Levin pointed out to me, this effect
can be explained by Benford's law.
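For reference, Benford's law states that in many real-world collections of numbers the leading digit d appears with probability log10(1 + 1/d), so a leading 1 (about 30%) is more than six times as common as a leading 9 (about 4.6%). The generalized form applies the same expression to a multi-digit prefix, e.g. log10(1 + 1/12) for numbers starting with 12. A minimal Python sketch of the formula; the function name `benford_leading` is our own, not from any library:

```python
import math

def benford_leading(prefix: int) -> float:
    """Benford probability that a number's leading digits equal `prefix`.

    For a single digit d this is log10(1 + 1/d); the generalized law uses the
    same expression for multi-digit prefixes such as 12 or 345.
    """
    if prefix < 1:
        raise ValueError("prefix must be a positive integer")
    return math.log10(1 + 1 / prefix)

# Leading-digit probabilities for d = 1..9; together they sum to 1.
for d in range(1, 10):
    print(d, f"{benford_leading(d):.3f}")
```

Because the sum telescopes, the three-digit prefixes 100-199 together get exactly log10(2), the same probability as a single leading 1.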
What also surprised me at first was the
distinct difference between the groups of numbers 11-31,
32-60 and 61-94, which can be seen more clearly in
a zoomed-in view of numbers 11-100.
I now believe this is explained by date
and time strings, where days are in the range 1-31 and
seconds and minutes in the range 0-59. (Credit to Mats Wicksell
for this suggestion.)
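The date-and-time explanation can be checked with a little counting: for each number, how many of the suggested fields could it appear in? A minimal Python sketch, assuming only the three fields named above (day 1-31, minute 0-59, second 0-59); the `FIELDS` table and both helper functions are hypothetical, written for illustration:

```python
# Hypothetical model of the numeric fields in date and time strings,
# following the suggestion above: days run 1-31, minutes and seconds 0-59.
FIELDS = {
    "day": range(1, 32),
    "minute": range(0, 60),
    "second": range(0, 60),
}

def fields_containing(n: int) -> list[str]:
    """Names of the date/time fields in which the value n can occur."""
    return [name for name, values in FIELDS.items() if n in values]

def group_by_field_count(lo: int, hi: int) -> dict[int, list[int]]:
    """Group the numbers lo..hi (inclusive) by how many fields can hold them."""
    groups: dict[int, list[int]] = {}
    for n in range(lo, hi + 1):
        groups.setdefault(len(fields_containing(n)), []).append(n)
    return groups

# Print each group as a range, from the most to the least "date-like" numbers.
for count, numbers in sorted(group_by_field_count(11, 100).items(), reverse=True):
    print(f"{count} field(s): {numbers[0]}-{numbers[-1]}")
```

This toy model splits 11-100 into 11-31, 32-59 and 60-100, close to the group boundaries seen in the diagram. It leaves out hours, months and years and weights every field equally, so it says nothing about the relative heights of the groups.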
Then there is the issue of individual numbers that are
overrepresented. As seen above, numbers divisible by 5
are overrepresented in general, so I filtered all of them out
to make it easier to spot other overrepresented numbers.
The result is the following diagram, where I have annotated
some of the numbers that stand out.
The spikes in this diagram thus correspond to the
overrepresented numbers.
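The filtering step is easy to reproduce, and the spikes could also be picked out automatically instead of by eye. A minimal Python sketch, assuming the counts are held in a dict mapping each number to its hit count; the neighbour window of 3 and the factor of 2 are arbitrary choices, not part of the original analysis, and the toy counts below are invented for illustration, not the Alta Vista data:

```python
from statistics import median

def drop_multiples_of_five(hits: dict[int, int]) -> dict[int, int]:
    """Remove all numbers divisible by 5, which are overrepresented across the board."""
    return {n: h for n, h in hits.items() if n % 5 != 0}

def find_spikes(hits: dict[int, int], window: int = 3, factor: float = 2.0) -> list[int]:
    """Return the numbers whose hit count is at least `factor` times the median
    count of up to `window` remaining neighbours on each side."""
    numbers = sorted(hits)
    spikes = []
    for i, n in enumerate(numbers):
        neighbours = numbers[max(0, i - window):i] + numbers[i + 1:i + 1 + window]
        if not neighbours:
            continue
        baseline = median(hits[m] for m in neighbours)
        if hits[n] > baseline and hits[n] >= factor * baseline:
            spikes.append(n)
    return spikes

# Toy data (made up): a flat 1000 hits per number, with one spike at 64
# and one at 40, which is a multiple of five and gets filtered out.
toy_hits = {n: 1000 for n in range(11, 101)}
toy_hits[64] = 5000
toy_hits[40] = 9000

filtered = drop_multiples_of_five(toy_hits)
print(find_spikes(filtered))  # prints [64]
```

Comparing each number against a local median rather than a global average keeps the general downward trend (lower numbers being more popular) from hiding spikes among the higher numbers.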
Some of these I can understand. The
powers of two 64, 128, 256 and 512 turn
up in most computer-related contexts.
The number
404 is the code for the
ubiquitous "Page not found" error message,
and 877 and 888, like 800, are area codes for toll-free
numbers in the US (which also explains why 800
is the most common even hundred after 100).
Some have commercial roots, like the 386 and 486
CPUs and Levi's 501 jeans. Windows 95 and 98
probably contribute to the large number of
hits for 95-99 (even more than for 100),
but I think those numbers mostly turn up in dates;
the 95-99 peak should therefore reflect the age and growth
of the Internet. (This is easier to see in the second figure above.)
Some numbers, like 333 and 999, look funny and are perhaps common
because of that. (But then why not 444 and the like?)
Others are totally puzzling to me. Why, for instance, are
152, 163, 301, 541, 624, 672, 703 and 972 overrepresented?
If anyone has an idea, I would be happy to hear it.
I had a hunch that some pop culture numbers,
like 187 (the California police code for homicide),
242 (the band Front 242, and the UN resolution) and 666 (the number
of the beast), would be common, but this turned out
to be wrong.
Zoomed-in views of this last filtered diagram are here:
Numbers 11-100 | Numbers 101-200 | Numbers 201-300 | Numbers 301-400 | Numbers 401-500
Numbers 501-600 | Numbers 601-700 | Numbers 701-800 | Numbers 801-900 | Numbers 901-1000
Finally, the raw data is available here.
The experiments above were carried out in September 2000. I was recently made aware of a very similar but more ambitious website, The Secret Lives of Numbers, by Golan Levin et al., which was launched in 2002 and is based on data collected as early as 1997.