
Reflections on the CrowdStrike Outage

Providing a Comprehensive Overview of the Outage

by Thomas Whang, September 7th, 2024

In a recent incident, the world was shaken by a major disruption caused by CrowdStrike, a prominent cybersecurity company specializing in endpoint security. This incident had a significant impact, resulting in widespread outages for multiple companies globally. The ripple effects of this event were felt across various industries, emphasizing the interconnected nature of today’s business ecosystems.

What Caused the Outage

The CrowdStrike outage was caused by a faulty configuration file update to the Falcon sensor platform. It affected Windows hosts running Falcon that were online and received the update between 04:09 UTC and 05:27 UTC on July 19, 2024, a window of roughly 78 minutes. According to CrowdStrike’s published technical details, the channel file update triggered a “logic error” that sent Windows systems into the dreaded Blue Screen of Death (BSOD) reboot loop.

The situation was made worse by several logistical and technical obstacles, which caused some organizations’ recovery to take much longer than expected. Organizations that were not CrowdStrike customers were also impacted, because partners and service providers that depended on CrowdStrike for endpoint security were affected. In today’s interconnected business landscape, the failure of a single security solution can have far-reaching consequences. It can create a domino effect that reaches beyond the directly affected organizations to the providers and suppliers in your supply chain, disrupting operations well beyond what was initially expected.

The Aftermath: Overcoming Challenges and Finding Solutions

Organizations that experienced the outage encountered multiple difficulties as they worked towards recovery. The most significant technical obstacles involved encryption, with BitLocker encryption and key management being the most prominent. Several companies had trouble with BitLocker key management and key synchronization, which prevented them from restoring their systems promptly. With so many employees now working from home, IT teams struggled because many of these issues could not be resolved through traditional remote management tools. Many had to send out bootable USB storage devices or ship new laptops, and some had to walk users through the fix over the phone, reading out the 48-digit BitLocker recovery key for the user to type in.
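Reading a 48-digit recovery key over the phone invites transcription errors. The following is a minimal Python sketch of a pre-check a help desk could run before asking a user to type the key. It assumes the commonly documented format of a BitLocker recovery password: eight groups of six digits, where each group is divisible by 11 and smaller than 720896. The function names are hypothetical and not part of any Microsoft or CrowdStrike tooling, and passing the check only means the key is well-formed, not that it unlocks a particular drive.

```python
import re

GROUPS = 8             # a recovery password has 8 groups...
GROUP_DIGITS = 6       # ...of 6 digits each (48 digits in total)
GROUP_LIMIT = 720896   # assumed rule: each group is below 65536 * 11


def normalize_recovery_password(raw: str) -> str:
    """Strip spaces and dashes, then return the key as 8 dash-separated groups."""
    digits = re.sub(r"[\s-]", "", raw)
    if not re.fullmatch(r"[0-9]{48}", digits):
        raise ValueError("expected exactly 48 digits")
    return "-".join(
        digits[i:i + GROUP_DIGITS] for i in range(0, len(digits), GROUP_DIGITS)
    )


def bad_groups(raw: str) -> list[int]:
    """Return the 1-based positions of groups that fail the assumed checks."""
    groups = normalize_recovery_password(raw).split("-")
    return [
        position
        for position, group in enumerate(groups, start=1)
        if int(group) % 11 != 0 or int(group) >= GROUP_LIMIT
    ]


if __name__ == "__main__":
    # Made-up key for illustration only; it does not belong to any real device.
    spoken = "123453 000011 000022 000033 000044 000055 000066 000077"
    print(normalize_recovery_password(spoken))
    print("groups to re-read:", bad_groups(spoken))
```

Because the check works group by group, the technician can ask the user to repeat only the group that failed instead of all 48 digits.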

In addition to the technical hurdles, the outage revealed deficiencies in many organizations’ business continuity plans, disaster recovery plans, and incident response plans. Companies that did not have well-defined or thoroughly tested plans discovered that they were not ready to handle the crisis, resulting in extended periods of downtime and financial losses.

Reflections and Lessons to be Learned

The recent CrowdStrike outage is a powerful reminder of just how crucial it is to have strong Business Continuity Plans (BCPs), Disaster Recovery Plans (DRPs), and Incident Response Plans (IRPs). It should prompt organizations to reevaluate and enhance their preparedness strategies. Here are some important points to remember:

Regular Tabletop Exercises

Regularly conducting tabletop exercises is crucial to ensure that all stakeholders have a clear understanding of their roles and responsibilities in times of crisis. These exercises are great for pinpointing any areas that may need improvement in the plans and enhancing collaboration among teams.

Establishing Clear Roles and Responsibilities

Having well-defined roles and responsibilities is essential during a crisis. It is crucial to have clear instructions and contact information readily available to minimize response times and enhance overall efficiency.

Routine Drills

Practicing disaster recovery and incident response activities on a regular schedule helps teams build a reflexive response, allowing them to react quickly and efficiently in the face of a real crisis. Consistent practice helps response procedures become instinctive for everyone involved.

Review and Update Plans

It is essential to regularly review and update business continuity, disaster recovery, and incident response plans. These plans should be treated as dynamic documents that require ongoing attention and adjustment so that they remain effective in the face of evolving business environments, technological advancements, and emerging threats.

The CrowdStrike incident has underscored the interconnectedness of modern business operations and the critical role of cybersecurity in maintaining operational continuity. By learning from this event and strengthening preparedness strategies, organizations can enhance their resilience and better navigate future challenges.
