Digital Minefield

Why The Machines Are Winning

Internet Lie #3

The Internet Is Getting Bigger

Last week (Internet Lie #2), I said the Internet was getting bigger (and that bigger is not necessarily better). Now I’m going to challenge that: Is the Internet Really Getting Bigger?

Let’s look at the numbers. Counting pages (Web, Blog, Facebook, Twitter, etc.), page visits, and visitors obviously yields bigger numbers, minute by minute, hour by hour, day by day. But what do those numbers mean? Do they mean that content on the Internet is growing? Do they mean people are finding what they want? Or need?

Let’s take the content question first. For example, is adding a thousand new pages a minute (pages that say essentially the same thing) really growth of content? When I say “essentially the same thing,” I’m being kind: for the most part these pages have merely cut and pasted from each other. So before we can answer this question, we need to rephrase it: Is unique content on the Internet growing?

Before you jump to a quick “Of course” (having in mind all those unique people on Facebook), you need to consider all the unique pages that either a) have disappeared from the Internet or b) can no longer be found. But that’s not all. There’s also the Deep Web:

“The Deep Web (also called Deepnet, the invisible Web, DarkNet, Undernet, or the hidden Web) refers to World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines.” —Wikipedia

And there’s more. When we use a search engine we get the pages currently indexed. Yet, even after subtracting for the Deep Web, that’s not nearly all the pages on the Internet. What’s currently indexed is not everything for a number of reasons, e.g., pages disappear. But the least discussed category of missing pages (existing pages not retrieved by the search engine) consists of pages once indexed but of insufficient relevance, in the search engine’s judgment, to remain in its cache of indexed pages.

Sufficient relevance? Google’s PageRank is but one way to determine this. But any way you slice it, some older pages have far less relevance (e.g., actual traffic) than many new pages. As a result, they are cleared from the cache to make room for these newer, more relevant (at the moment) pages. Just one little problem with this approach: all these new pages, pages about current news and events, are very likely to be less unique than these old pages.

Given the parameters of the problem, can anyone truly say the unique content of the Internet is getting bigger? Me, I wouldn’t even try to guess. As for the question of whether people are finding what they want or need, that’s a topic for another lie.

