CrowdStrike Overhauls Testing and Rollout Procedures to Avoid System Crashes

September 24, 2024 at 04:12PM

CrowdStrike has overhauled its testing and update processes to avoid a recurrence of the disruptive July outage on Windows systems. Vice President Adam Meyers outlined new protocols, such as controlled software rollouts, improved code validation, and expanded testing to cover various scenarios. The changes aim to prevent similar system failures in the future.

From the meeting notes, the key takeaways are:

1. CrowdStrike has revamped its testing, validation, and update rollout processes to prevent a repeat of the embarrassing July outage that caused widespread disruption on Windows systems around the world.

2. The new set of protocols includes carefully controlled rollouts of software updates, better validation of code inputs, and new testing procedures to cover a broader array of problematic scenarios.

3. CrowdStrike now releases threat detection configuration information gradually, across progressively larger deployment rings. This lets the company monitor each ring for problems in a controlled environment and roll back a change before it reaches a wider population.

4. CrowdStrike has introduced new validation checks to ensure that the number of inputs expected by the sensor and its predefined rules matches the number of inputs supplied by the threat detection configurations, so that a similar mismatch cannot occur again.

5. The company has enhanced existing testing procedures to cover a broader array of scenarios, including testing all input fields under various conditions to detect potential flaws before threat detection configuration information is sent to the sensor.

6. CrowdStrike has made changes that give customers additional control over how configuration updates are deployed to their systems.

7. Additional runtime checks have been added to the system to ensure that the data provided matches the system’s expectations before any processing occurs, reducing the likelihood of future code mismatches causing catastrophic system failures.
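
The input-count validation, runtime checks, and ring-based rollout described in items 3, 4, and 7 can be illustrated with a minimal Python sketch. This is not CrowdStrike's implementation: every name here (DetectionConfig, validate_input_count, read_input, staged_rollout, and the ring names in the usage note) is hypothetical, and the health check is left as a caller-supplied function because the article does not say how issues are detected.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class DetectionConfig:
    """Hypothetical stand-in for one threat detection configuration."""
    name: str
    inputs: Sequence[str]


def validate_input_count(config: DetectionConfig, expected_inputs: int) -> None:
    """Pre-release check: reject a configuration whose number of inputs
    differs from the number the sensor expects."""
    provided = len(config.inputs)
    if provided != expected_inputs:
        raise ValueError(
            f"{config.name}: sensor expects {expected_inputs} inputs, "
            f"configuration provides {provided}"
        )


def read_input(config: DetectionConfig, index: int) -> Optional[str]:
    """Runtime check: confirm the index is in range before reading,
    returning None instead of touching data that is not there."""
    if 0 <= index < len(config.inputs):
        return config.inputs[index]
    return None


def staged_rollout(
    config: DetectionConfig,
    rings: Sequence[str],
    deploy: Callable[[str, DetectionConfig], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str, DetectionConfig], None],
) -> Optional[str]:
    """Deploy ring by ring, smallest first. On the first unhealthy ring,
    roll back every ring reached so far and stop.

    Returns the name of the ring that failed, or None if all rings succeeded.
    """
    reached: List[str] = []
    for ring in rings:
        deploy(ring, config)
        reached.append(ring)
        if not healthy(ring):
            # Undo in reverse order so the most recent ring is reverted first.
            for done in reversed(reached):
                rollback(done, config)
            return ring
    return None
```

In use, a caller would run validate_input_count before calling staged_rollout with rings ordered from smallest to largest (for example, an internal ring, then an early-adopter ring, then general availability), so that a faulty configuration is caught either before release or while its reach is still limited.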
