The faulty software update provided by CrowdStrike Falcon crashed 8.5 million Windows systems worldwide and sent many critical services into disarray. This update slipped through the cracks due to errors in the cybersecurity vendor’s content validation software. In the preliminary Post Incident Review (PIR), the company admits it over-relied on its past successes and promises improvements.
The disastrous update was a piece of what CrowdStrike calls “rapid response content,” which is supposed to bring the latest configurations to respond to the changing threat landscape at operational speed. These updates are issued regularly and don’t require an update of the whole program.
On July 19th, 2024, the update, which was supposed “to gather telemetry on possible novel threat techniques,” passed validation despite containing problematic content data, the company said in the PIR.
The faulty update was neither code nor a kernel driver; it was a small 40 KB configuration data file.
CrowdStrike produced the update based on testing performed on its template after the template’s introduction in February. The company placed “trust in the checks performed in the Content Validator” and in previous successful deployments of similar instances.
Basically, the company assumed that if the template tests passed and the previous updates didn’t cause any trouble, the July 19th update would be the same.
However, this time, when CrowdStrike’s Falcon sensor (a software component that runs at the kernel level and is designed to monitor and analyze data) received the update and loaded it into the Content Interpreter on Windows machines, the machines crashed, affecting 8.5 million systems worldwide.
“Problematic content in Channel File 291 resulted in an out-of-bounds memory read triggering an exception. This unexpected exception could not be gracefully handled, resulting in a Windows operating system crash (BSOD),” the explanation reads.
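To illustrate what an out-of-bounds read means in general terms, here is a minimal sketch in Python. It is not CrowdStrike's code: the function `read_field` and the sample `entry` are hypothetical names invented for this example. In kernel-level code an unchecked read past the end of a buffer can crash the operating system; the sketch shows only the defensive idea of checking an index against the actual length of the data before reading.

```python
from typing import Optional, Sequence


def read_field(fields: Sequence[str], index: int) -> Optional[str]:
    """Return fields[index], or None when the index is out of range.

    Hypothetical helper: the names and data layout are assumptions made
    for illustration, not details taken from the PIR.
    """
    # Explicit bounds check: refuse the read instead of trusting the content.
    if 0 <= index < len(fields):
        return fields[index]
    return None


# Hypothetical content entry that supplies fewer values than a reader expects.
entry = ["pattern-a", "pattern-b", "pattern-c"]

# A reader that asks for a fourth value gets None rather than an invalid read.
missing = read_field(entry, 3)
```

A caller that receives `None` can skip or reject the entry, which is one way an unexpected value could be handled gracefully instead of raising an unhandled exception.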
How does CrowdStrike plan to prevent similar accidents?
The company promises several additional measures to prevent similar incidents from happening again. CrowdStrike plans to enhance rapid response content testing with methods such as local developer testing, stress testing, fault injection, and stability testing.
The company is also adding validation checks and improving error handling in its software to guard against problematic content.
Customers will have greater control over when updates are delivered and applied, and will get detailed information about the updates. Also, rapid response updates will be delivered gradually in the future, starting with a small controlled group and then expanding to larger portions of the client base.
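A gradual rollout of this kind is often implemented by assigning each machine to a stable bucket and raising the share of buckets that receive an update over time. The sketch below, in Python, shows that general technique under stated assumptions; it is not CrowdStrike's deployment system, and `rollout_bucket`, `should_receive`, and the host identifiers are hypothetical.

```python
import hashlib


def rollout_bucket(host_id: str) -> int:
    """Map a host identifier to a stable bucket from 0 to 99."""
    # Hashing keeps the assignment deterministic: a host stays in the same
    # bucket for every stage of the rollout.
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def should_receive(host_id: str, rollout_percent: int) -> bool:
    """Return True if the host falls inside the current rollout stage."""
    return rollout_bucket(host_id) < rollout_percent


# Hypothetical fleet and an early stage that targets roughly 5% of buckets.
hosts = [f"host-{n}" for n in range(1000)]
first_wave = [h for h in hosts if should_receive(h, 5)]
```

Because the bucket for a host never changes, every machine included at 5% is still included when the percentage is raised, so each stage only adds new machines. A rollout can be paused at a small percentage if the first group reports crashes.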
CrowdStrike is also committed to publicly releasing the full root cause analysis once the investigation is complete.
US lawmakers have called on CrowdStrike CEO George Kurtz to testify on Capitol Hill and explain in detail the events leading up to last Friday’s global tech outage.