Digital Minefield

Why The Machines Are Winning

Internet Lie #4

When you use a search engine you’re searching the Internet.

I’m constantly amazed at how many people are unaware of a simple truth: when you use a search engine, you’re only searching its indexed cache. The search engine does search the Internet (actually the World Wide Web), but not when you ask it to. Rather, it searches continually by crawling the Web with automated spider software (robots, or bots). It then integrates that data into its indexed cache. These two activities are separate from sifting through that cache to satisfy your search. You can find more details at this site:

Why am I making a big deal out of this distinction? First, because many, if not most, people think they’re searching the Internet (the Web) in real time. Although data does travel the lines connecting all the Web servers in the world at close to the speed of light, it still takes time. While it’s difficult to count these servers (many millions), it’s even harder to estimate the total length of the lines connecting them. Traversing the Web takes so long that Google only crawls the entire Web once a month, and that takes more than a week! So, no, you’re not searching the whole Web in real time.

But there’s another, more important reason to understand why your Internet searches are limited. To see it, you must understand the cache, because that is what you’re actually searching. To begin, it takes time to get pages into the cache. If Google, as big as they are (and they’re really big), can only crawl the entire Web once a month, then we should not be surprised to discover it can take up to six weeks for a new page to appear in the cache. So clearly, at any given moment, the cache is not the Web.
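
For readers who like to see an idea run, here is a toy sketch of the crawl, index, and search split in Python. Everything in it is made up for illustration: the three-page "Web", the example URLs, and the function names crawl, build_index, and search are my own assumptions, not how Google or any real search engine is built. It only shows that a search is answered from the last snapshot, so a page published after the crawl stays invisible until the next one.

```python
from collections import defaultdict

# A toy "Web": URL -> page text. URLs and contents are hypothetical.
web = {
    "a.example/home": "cats and dogs",
    "b.example/news": "dogs in the news",
}

def crawl(web):
    """Visit every page and return a snapshot (a copy) of what was there."""
    return dict(web)

def build_index(snapshot):
    """Map each word to the set of URLs whose snapshot text contains it."""
    index = defaultdict(set)
    for url, text in snapshot.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, word):
    """Answer a query from the index only; the live Web is never touched."""
    return sorted(index.get(word.lower(), set()))

# The crawl and the indexing happen ahead of time, not when you search.
index = build_index(crawl(web))

# A new page is published after the crawl finished.
web["c.example/new"] = "brand new page about cats"

before = search(index, "cats")   # answered from the stale index
index = build_index(crawl(web))  # the next crawl refreshes the index
after = search(index, "cats")    # now the new page can be found
```

Comparing before and after makes the point: the new page exists on the toy Web the whole time, but the search only finds it once a later crawl has put it in the index.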

Moreover, the cache, despite Google’s size (and not only are they really big, they’re not telling anyone how big), is finite. And the World Wide Web is bigger still. Much. Think of it this way: the index of a book is not the book, and it doesn’t index everything in the book. One more analogy: less than a hundred years ago, astronomers thought our galaxy was all there was to the Universe. Now they know it’s only one of some 100 to 200 billion galaxies. Likewise, Google’s indexed cache is not the Web.


