Lessons from the CrowdStrike outage
Like everyone, we were shocked at the chaos caused by CrowdStrike’s faulty software update last Friday morning. 8.5 million Windows devices crippled! That’s unprecedented. Despite CrowdStrike’s efforts to mitigate the damage, devices remained offline for hours. WM Promus wasn’t affected, although we had clients who, staring at the blue screen of death, immediately called us. ‘What do we do?’ was the common plea.
A stark reminder
The CrowdStrike outage was a stark reminder that adhering to a few best practices is what delivers a stable, secure, and efficient deployment process as well as a reliably functioning IT environment. So let’s look at what lessons can be learned and which best practices should be embraced.
- Thorough testing in multiple environments is essential for robust software deployment. Whether it is staging environment testing, sandbox testing, plenty of local developer testing, or rollback testing, you’re looking for as close to 100% confidence as you can get that the code behaves as expected and that, if anything goes wrong, you can get back on track quickly.
- Adopting a slow-roll deployment approach would have been a game-changer last week. If an incident does occur, at least it’s contained. It’s all about risk mitigation. Start with a canary deployment, then move to a small subset of the user base, and only then roll out to everyone who remains.
- What about the push towards CI/CD models in the DevOps world? Should we rethink this? Last Friday does not mean CI/CD needs a rethink; it’s further confirmation that quality needs to be embedded within the CI/CD model. If you’ve embedded automated testing (and in a DevOps world, we would advocate automation where possible), make sure it is robust.
- Monitor, monitor, monitor! Make sure you can see what is happening throughout your various deployment stages. Risk mitigation 101!
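To make the slow-roll, monitoring, and rollback points a little more concrete, here is a minimal sketch in Python. It is purely illustrative and not how CrowdStrike or any particular deployment tool works: the function `staged_rollout`, the `deploy`, `is_healthy`, and `rollback` hooks, the stage fractions, and the `host-N` device names are all assumptions made up for this example.

```python
from typing import Callable, Sequence


def staged_rollout(
    devices: Sequence[str],
    stage_fractions: Sequence[float],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    max_failure_rate: float = 0.0,
) -> dict:
    """Deploy in widening stages; halt and roll back if a stage looks unhealthy.

    All hooks are hypothetical placeholders for whatever your tooling provides.
    """
    updated: list[str] = []
    for fraction in stage_fractions:
        # Cumulative number of devices that should be updated after this stage.
        target = max(1, round(len(devices) * fraction))
        batch = devices[len(updated):target]
        if not batch:
            continue

        for device in batch:
            deploy(device)
        updated.extend(batch)

        # Monitor: check the devices just updated before widening the rollout.
        failures = [d for d in batch if not is_healthy(d)]
        if len(failures) / len(batch) > max_failure_rate:
            # Contain the incident: undo everything touched so far and stop.
            for device in updated:
                rollback(device)
            return {
                "status": "rolled_back",
                "touched": len(updated),
                "failures": failures,
            }

    return {"status": "complete", "touched": len(updated), "failures": []}


# Simulated fleet and a simulated bad update that breaks every device it reaches.
fleet = [f"host-{i}" for i in range(1000)]
installed: set = set()

result = staged_rollout(
    fleet,
    stage_fractions=[0.01, 0.10, 1.0],  # canary, small subset, everyone else
    deploy=installed.add,
    is_healthy=lambda device: False,
    rollback=installed.discard,
)
print(result["status"], result["touched"])  # prints: rolled_back 10
```

In this simulation the bad update reaches only the 10 canary devices out of 1,000 before the health check stops the rollout and reverses it. In a real environment the health check would need to be something you trust, such as devices reporting back after a reboot, and the rollback path should be tested as thoroughly as the update itself.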
Let’s hope we don’t see anything like this again. You can safeguard yourself from a lot of chaos by implementing the best practices outlined here. If you want to discuss this further or need help putting these lessons into practice, contact us. We’re always happy to help.
Eileen O’Mahony
General Manager, WM Promus