Global air travel has been one of the most impacted sectors so far. Huge lines formed at airports around the world, with one airport in India using handwritten boarding passes. In the US, Delta, United, and American Airlines grounded all flights at least temporarily, and a dramatic graphic showed air traffic plummeting above the US.
The catastrophic situation reflects the fragility and deep interconnectedness of the internet. Numerous security practitioners told WIRED that they anticipated or even worked with clients to attempt to protect against a scenario where defense software itself caused cascading failures as a result of malicious exploitation or human error, as is the case with CrowdStrike. “This is an incredibly powerful illustration of our global digital vulnerabilities and the fragility of core internet infrastructure,” says Ciaran Martin, a professor at the University of Oxford and the former head of the UK’s National Cyber Security Center.
The ability of one update to trigger such massive disruption still puzzles Raiu. According to Gartner, a market research firm, CrowdStrike accounts for 14 percent of the security software market by revenue, meaning its software is on a wide array of systems. Raiu suggests that the Falcon update must have triggered crashes at cloud providers such as Azure and Amazon Web Services, which vastly multiplied the disaster. “CrowdStrike is big, but it can’t be this big,” Raiu says. “Airports, critical infrastructure, hospitals. It cannot be just CrowdStrike everywhere. I suspect we’re seeing a combination of factors, a cascading effect, a chain reaction.”
Hyppönen, from WithSecure, says his “guess” is that the issues may have happened due to “human error” in the update process. “An engineer at CrowdStrike is having a really bad day,” he says. Hyppönen suggests that CrowdStrike could have shipped software different from what it had been testing or mixed up files, or there could’ve been a combination of different factors. “Software like this has to go through extensive testing,” Hyppönen says. “That’s what we do. That’s what CrowdStrike, of course, does. You have to be really careful about what you ship, which is tough to do because security software is updated very frequently.”
While many of the impacts of the outage are ongoing and still unraveling, the nature of the problem means that individually impacted machines may need to be rebooted manually rather than through an automated process. “It could be some time for some systems that just automatically won’t recover,” CrowdStrike CEO Kurtz told NBC.
The company’s initial “workaround” guidance for dealing with the incident says Windows machines should be booted in safe mode, a specific file should be deleted, and the machines should then be rebooted. “The fixes we’ve seen so far mean that you have to physically go to every machine, which will take days, because it’s millions of machines around the world which are having the problem right now,” says Hyppönen from WithSecure.
As system administrators race to contain the fallout, the larger existential question of how to prevent another, similar crisis looms.
“People may now demand changes in this operating model,” says Jake Williams, vice president of research and development at the cybersecurity consultancy Hunter Strategy. “For better or worse, CrowdStrike has just shown why pushing updates without IT intervention is unsustainable.”
Update 7/19/2024, 11 am ET: Added comment from Microsoft saying that the Azure outage and the CrowdStrike kernel driver issue are unrelated.
Update 7/19/2024, 12:30 pm ET: Added further comment from Microsoft about its lack of oversight of CrowdStrike’s updates.