Earlier this year, Facebook was down for 14 hours because of an issue with the servers used to store and distribute people’s account data and timelines.
If that had happened to an ecommerce platform, the outage would probably have gone largely unnoticed, bar some angry tweets and a subsequent mea culpa from the firm’s PR agency.
Instead, Facebook’s unexpected disappearance made global headlines.
It was embarrassing for a company of Facebook’s size and importance to go down, especially as Facebook-owned Instagram and WhatsApp were also affected.
But the outrage over the outage highlighted how susceptible internet platforms are to unexpected failures and software conflicts.
It’s only a few months since 30 million O2 users were unable to make and receive calls or use 4G services.
As with the Facebook incident, a software fault was blamed.
What causes outages?
There are numerous reasons why services can go down, including rare scenarios like natural disasters.
However, most website and content outages stem from one of the following causes:
Server issues. Servers store and process the data behind most online tools and portals, from games to business software.
Many of the apps installed on your devices and consoles are little more than windows for displaying data fetched in real time from the cloud.
If servers crash or become damaged (possibly by malware), the content they host will temporarily disappear.
Servers are also frequent targets of Distributed Denial of Service (DDoS) attacks, in which floods of requests, often sent from networks of hijacked computers, are intended to overwhelm the servers and force them offline.
Like any IT equipment, servers occasionally fail. Their traffic should then be absorbed by neighbouring servers, but this handover isn’t always instantaneous or seamless.
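To show why that handover can lag, here is a minimal Python sketch of a round-robin load balancer. It assumes, purely for illustration, that the balancer learns about failed servers only through periodic health checks. The `Server` and `LoadBalancer` classes are hypothetical and are not taken from any real product. Until the next check runs, some requests are still sent to the dead machine. Once the check runs, that server's share of traffic moves to the survivors.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    alive: bool = True           # the machine's real state
    marked_healthy: bool = True  # what the balancer saw at its last health check

class LoadBalancer:
    """Hypothetical round-robin balancer that only notices failures during health checks."""

    def __init__(self, servers):
        self.servers = servers
        self._next = 0  # index of the next server in the rotation

    def health_check(self):
        # Refresh the balancer's view of which servers are up.
        for server in self.servers:
            server.marked_healthy = server.alive

    def route(self):
        # Try each server at most once, starting from the current rotation position.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server.marked_healthy:
                # A server that died since the last check still receives traffic.
                return server.name if server.alive else f"error: {server.name} down"
        return "error: no healthy servers"

servers = [Server("a"), Server("b"), Server("c")]
balancer = LoadBalancer(servers)

servers[1].alive = False  # server "b" crashes
before = [balancer.route() for _ in range(3)]  # some requests still hit "b"

balancer.health_check()  # the failure is finally detected
after = [balancer.route() for _ in range(3)]   # traffic now goes to "a" and "c"
print(before, after)
```

Production load balancers typically check health far more often and may retry failed requests. Even so, the gap between a failure and its detection is one reason failover isn't always seamless.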
Software conflicts. The O2 outage was triggered by an expired software certificate in network equipment supplied by Swedish firm Ericsson.
While Ericsson hasn’t fully explained how this happened, it probably resulted from someone forgetting to renew the certificate before it lapsed.
However, human error isn’t always to blame for software issues.
Newly installed programs or applications may clash with existing software in ways nobody could have foreseen.
Australia’s child support IT system stopped working last year when new network optimisation software conflicted with the 16-year-old IT system it was supposed to improve.
Network traffic. The internet is surprisingly fragile, although offline websites and inaccessible servers are uncommon under normal traffic loads.
These loads may spike because of a DDoS attack, but they could also rise in response to major events or news stories.
Twitter has occasionally crashed during periods of intense activity, such as after terrorist attacks or during major political events.
The final episode of Blue Planet II reportedly slowed internet speeds across China, because so many people wanted to watch Sir David Attenborough critique mankind’s use of plastics.
Networks rarely have enough spare bandwidth to comfortably absorb sudden, large increases in online traffic.
Web browsers stop waiting for a response after a set number of seconds and display a message suggesting the website is offline, even if the site’s servers are just working more slowly than usual.
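The Python sketch below simulates that behaviour under stated assumptions. A hypothetical `slow_server` function stands in for a sluggish web server. The client gives up after a made-up timeout, whereas real browsers use their own, considerably longer limits. The error text is also invented for the example. A page that arrives just a little too late is handled exactly like a site that is down.

```python
import time
import concurrent.futures as cf

OFFLINE_MESSAGE = "This site appears to be offline"  # hypothetical error text

def slow_server(delay_seconds):
    """Hypothetical server that takes delay_seconds to produce a page."""
    time.sleep(delay_seconds)
    return "<html>page</html>"

def fetch(delay_seconds, timeout_seconds):
    """Wait up to timeout_seconds for the page, then give up like a browser would."""
    pool = cf.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(slow_server, delay_seconds)
        return future.result(timeout=timeout_seconds)
    except cf.TimeoutError:
        # The server is still working, but the client has stopped waiting.
        return OFFLINE_MESSAGE
    finally:
        pool.shutdown(wait=False)  # don't block on the slow request

print(fetch(delay_seconds=0.05, timeout_seconds=1.0))  # fast enough: the page is shown
print(fetch(delay_seconds=0.5, timeout_seconds=0.1))   # too slow: treated as offline
```

The server in the second call does eventually finish its work. The visitor has simply stopped waiting by then.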
Legal events. A company in administration or in the throes of a takeover might temporarily suspend its online presence.
Unpaid hosting fees could result in sites or services going down, and accidental domain name expiry isn’t unheard of.
The police or security agencies might even force a brand or business offline, or seize a domain name while they investigate potential criminal activities.