Hundreds of millions of people were unable to access Facebook, Instagram, and WhatsApp for more than six hours on Monday, underscoring the world’s reliance on platforms owned by the Silicon Valley giant.
But what actually caused the outage?
What does Facebook say happened?
According to infrastructure vice president Santosh Janardhan, a command issued during maintenance inadvertently caused a shutdown of the backbone that connects all of Facebook’s data centers, everywhere in the world.
“Configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication,” Janardhan said.
That by itself is bad enough, but as we’ve already explained, the reason you couldn’t use Facebook is that the DNS and BGP routing information pointing to its servers suddenly disappeared. According to Janardhan, that problem was a secondary issue, as Facebook’s DNS servers noted the loss of connection to the backbone and stopped advertising the BGP routing information that helps every computer on the internet find its servers. The DNS servers were still working, but they were unreachable.
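The behaviour Janardhan describes can be sketched in a few lines of Python. This is a toy model, not Facebook's software: the `DnsNode` class, its method names and the address blocks (ranges reserved for documentation) are all assumptions made for illustration. It shows how a server can keep running while becoming unreachable, simply because it has stopped announcing its own route.

```python
from dataclasses import dataclass


@dataclass
class DnsNode:
    """Toy model of a DNS server that announces its own route (hypothetical)."""

    prefix: str                   # address block the node announces via BGP
    process_running: bool = True  # the DNS software itself is up
    advertising: bool = True      # the route is currently announced

    def health_check(self, backbone_reachable: bool) -> None:
        # If the node cannot reach the data centers, it treats itself as
        # unhealthy and withdraws its route so traffic can go elsewhere.
        self.advertising = backbone_reachable

    def reachable_from_internet(self) -> bool:
        # Outsiders can reach the node only while both conditions hold.
        return self.process_running and self.advertising


def simulate_backbone_loss(nodes: list[DnsNode]) -> list[bool]:
    """Run the health check on every node with the backbone down."""
    for node in nodes:
        node.health_check(backbone_reachable=False)
    return [node.reachable_from_internet() for node in nodes]


if __name__ == "__main__":
    fleet = [DnsNode("192.0.2.0/24"), DnsNode("198.51.100.0/24")]
    print(simulate_backbone_loss(fleet))           # reachability per node
    print([node.process_running for node in fleet])  # software still up?
```

When every node fails the same check at once, there is no "elsewhere" left for traffic to go, which is the situation the paragraph above describes.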
Can you explain that in plain English?
Cyber experts think the problem boils down to something called BGP, or Border Gateway Protocol, the system the internet uses to pick the quickest route to move packets of information around.
Sami Slim of data center company Telehouse compared BGP to “the internet equivalent of air traffic control”.
In the same way that air traffic controllers sometimes make changes to flight schedules, “Facebook did an update of these routes,” Slim said.
But this update contained a crucial error.
It’s not yet clear how or why, but Facebook’s routers essentially sent a message to the internet announcing that the company’s servers no longer existed.
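To see why a withdrawn route makes servers seem to vanish, here is a minimal Python sketch of the table of routes another network might hold. The `RouteTable` class, the origin name and the documentation address block are hypothetical, and real routers are far more elaborate than this.

```python
import ipaddress


class RouteTable:
    """Toy view of the routes one network has learned (hypothetical)."""

    def __init__(self) -> None:
        self.routes = {}  # maps an ip_network to the name of its origin

    def announce(self, prefix: str, origin: str) -> None:
        # A BGP announcement says: "this address block is reachable via me".
        self.routes[ipaddress.ip_network(prefix)] = origin

    def withdraw(self, prefix: str) -> None:
        # A withdrawal removes that claim; nothing replaces it.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address: str):
        """Return the origin for an address, or None if no route covers it."""
        ip = ipaddress.ip_address(address)
        for network, origin in self.routes.items():
            if ip in network:
                return origin
        return None


if __name__ == "__main__":
    table = RouteTable()
    table.announce("192.0.2.0/24", "example-origin")
    print(table.lookup("192.0.2.53"))  # a route exists
    table.withdraw("192.0.2.0/24")
    print(table.lookup("192.0.2.53"))  # no route: packets have nowhere to go
```

After the withdrawal the server at that address may be perfectly healthy, but from the outside there is simply no path to it.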
Why did it take so long to fix the problem?
Experts say Facebook’s technical infrastructure is unusually reliant on its own systems — and that proved disastrous on Monday.
After Facebook sent the fateful routing update, its engineers got locked out of the system that would allow them to communicate that the update had, in fact, been an error. So they couldn’t fix the problem.
“Normally it’s good not to put all your eggs in one basket,” said Pierre Bonis of AFNIC, the association that manages domain names in France.
“For security reasons, Facebook has had to very strongly concentrate its infrastructure,” he said.
“That streamlines things on a daily basis, but because everything is in the same place, when that place has a problem, nothing works.”
The knock-on effects of the shutdown included some Facebook employees being unable to even enter their buildings because their security badges no longer worked, further slowing the response.
Is this unprecedented?
Social media outages are not uncommon: Instagram alone has experienced more than 80 in the past year in the United States, according to website builder ToolTester.
This week’s Facebook outage was rare in its length and scale, however.
There is also a precedent for BGP meddling being at the root of a social media shutdown.
In 2008, when a Pakistani internet service provider was attempting to block YouTube for domestic users, it inadvertently shut down the global website for several hours.
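The article does not spell out the mechanism, but one well-known way a stray announcement can capture traffic is that routers prefer the most specific route covering an address. The Python sketch below illustrates that rule only; the function name, the origin labels and the private address blocks are assumptions, not details of the 2008 incident.

```python
import ipaddress


def best_route(routes: dict, address: str):
    """Pick the covering route with the longest prefix, or None."""
    ip = ipaddress.ip_address(address)
    candidates = [
        (ipaddress.ip_network(prefix), origin)
        for prefix, origin in routes.items()
        if ip in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None
    # The longest prefix is the most specific claim, so it wins.
    _, origin = max(candidates, key=lambda item: item[0].prefixlen)
    return origin


if __name__ == "__main__":
    routes = {"10.0.0.0/22": "video-site"}       # legitimate, broader block
    print(best_route(routes, "10.0.1.10"))
    routes["10.0.1.0/24"] = "other-provider"     # narrower block leaks out
    print(best_route(routes, "10.0.1.10"))       # now follows the leak
    print(best_route(routes, "10.0.2.10"))       # outside the leaked block
```

In this toy setup the legitimate route never goes away; the narrower one just outranks it for the addresses it covers.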
And the outage’s impact?
Between Facebook, Instagram, WhatsApp, and Facebook Messenger, “billions of users have been impacted by the services being entirely offline”, the Downdetector tracking service said.
Facebook, whose shares fell nearly five percent over the outage, has stressed there is “no evidence that user data was compromised as a result of this downtime”.
But even though it lasted just a few hours, the impact of the shutdown ran deep.
Facebook’s services are crucial for many businesses around the world, and users complained of being cut off from their livelihoods.
Facebook accounts are also commonly used to log in to other websites, which faced additional problems due to the company’s technical meltdown.
Rival instant messaging services meanwhile reported that they had benefited from the fact that WhatsApp and Facebook Messenger were down.
Telegram went from the 56th most downloaded free app in the US to the fifth, according to monitoring firm SensorTower, while Signal tweeted that “millions” of new users had joined.
And among the more curious side-effects, several domain name registration companies listed Facebook.com as available for purchase.
“There was never any reason to believe Facebook.com would actually be sold as a result, but it’s fun to consider how many billions of dollars it could fetch on the open market,” said cyber security expert Brian Krebs.