Why aren’t all web pages indexed in search engines?

Saturday, 6 April, 2019

When people talk about the internet, they’re usually referring to the World Wide Web.

Proposed by Sir Tim Berners-Lee in 1989 and opened to the public in 1991, the web gives many website addresses their www prefix.

The pages most people visit make up the surface web: content that’s intended to appear in search engine results.

When you enter a search term into Google or Bing, their algorithms return relevant results from an index built by crawling and analysing webpages.

However, this isn’t the only content hosted on the internet.

Indeed, the phrase ‘surface web’ is rather fitting, since there’s far more material in the internet’s depths than on its surface.

Deep, dark and rather scary

Content not included in search engine indexes falls into two categories:

The deep web. This is online data which isn’t intended to be publicly accessible.

There are numerous reasons why that might be the case:

  1. Product databases. When you’re looking to buy an item, you want to see a clean and tidy webpage featuring a product description, some photos and availability data.

    You don’t want to see the complex databases responsible for confirming stock levels of different sizes and colours.

  2. Developmental websites. If you’ve ever created a website using a platform like WordPress or Wix, you’ll know the satisfaction of hitting the Publish button.

    Until that moment, the website will be cloaked from search engines because it’s incomplete. It’ll stay that way until a user makes it live.

  3. Intranets. Many companies have dedicated web portals, enabling employees to log in and view information, share documents or communicate with each other.

    Intranets host sensitive corporate data like internal reports or private messages between colleagues, which shouldn’t be publicly visible in a Google search.

  4. Financial platforms. Imagine a scenario where confidential online banking information was displayed in third party search results, and anyone could access it.

    When an online banking page opens in a stripped-back window (minus the usual bookmarks and browser bar options), it’s part of the effort to keep personal data out of public view.

  5. Archived data. Companies generate huge amounts of information, and it may be confusing or inappropriate if older data is visible.

    Archived material is still hosted online where relevant individuals may view it, but it shouldn’t be published alongside contemporary webpage data.
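To make the idea of keeping pages out of an index more concrete, here is a minimal sketch of one widely used convention: a robots.txt file, which asks crawlers to stay away from certain paths. This isn’t a description of how any particular platform hides its pages. It uses Python’s standard urllib.robotparser module, and the two rule sets, the shop.example address and the ExampleBot crawler name are all invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for an unfinished site: every crawler is asked
# to stay away from every page until the owner publishes it.
STAGING_RULES = """\
User-agent: *
Disallow: /
"""

# Hypothetical rules for a live shop: product pages may be crawled,
# but the stock database and staff area are off limits.
LIVE_RULES = """\
User-agent: *
Disallow: /stock-db/
Disallow: /intranet/
"""


def may_crawl(rules: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for label, rules in (("staging", STAGING_RULES), ("live", LIVE_RULES)):
        for path in ("/products/shoes", "/stock-db/levels"):
            url = "https://shop.example" + path
            print(label, path, may_crawl(rules, url))
```

Bear in mind that robots.txt is only a polite request to well-behaved crawlers. Intranets and banking platforms rely on logins, not crawler rules, to keep their data private.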

The dark web. While the deep web provides the underpinnings for searchable internet pages, the dark web is a rather different entity.

Visible only through the privacy-focused Tor browser, which prevents third parties tracking individual user activity, dark web material tends to be illicit or dangerous.

Beyond the reach of regulatory spotlights, the internet’s worst secrets are stored. This is the natural home of extreme pornography websites and drug dealing marketplaces.

Websites are located at web addresses comprising lengthy strings of random alphanumeric characters, ending with a .onion suffix (Tor stands for The Onion Router).
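Those strings look random because they’re derived from each site’s cryptographic key. As a rough illustration, the sketch below checks whether a hostname has the shape of an onion address: current (version 3) addresses use 56 characters drawn from the letters a to z and the digits 2 to 7, while the older, now retired version 2 format used 16. The function name onion_version is made up for this example, and the check only looks at the shape of the address. It doesn’t confirm that a site exists or that it’s safe.

```python
import re

# Version 3 onion addresses: 56 base32 characters (a-z, 2-7), then ".onion".
_V3 = re.compile(r"[a-z2-7]{56}\.onion")
# Older version 2 addresses used 16 characters and have since been retired.
_V2 = re.compile(r"[a-z2-7]{16}\.onion")


def onion_version(hostname: str):
    """Return 3 or 2 if `hostname` is shaped like an onion address, else None.

    This is a shape check only: it does not verify the address checksum
    or contact the Tor network.
    """
    host = hostname.strip().lower()
    if _V3.fullmatch(host):
        return 3
    if _V2.fullmatch(host):
        return 2
    return None


if __name__ == "__main__":
    # Placeholder strings with the right shape, not real sites.
    print(onion_version("a" * 56 + ".onion"))
    print(onion_version("b" * 16 + ".onion"))
    print(onion_version("www.example.com"))
```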

The dark web is a perilous place for the unwary to tread. Payment is generally made using hard-to-trace cryptocurrencies, and much of the advertised content is fraudulent.

Paying two bitcoin to a self-proclaimed assassin is unlikely to result in your nemesis being gunned down in an alley. And you won’t be able to claim a refund, either.

It’s easy to appreciate why Google and Bing feel such material doesn’t deserve mainstream publicity.

Rather than appearing in search engine indexes, dark web addresses tend to be published on bulletin boards, which are often out of date and therefore inaccurate in their page descriptions.

Unless you know what you’re doing, the dark web is best avoided entirely.

By: Neil Cumins

Neil is our resident tech expert. He's written guides on loads of broadband head-scratchers and is determined to solve all your technology problems!