How to ensure business resilience
Four lessons from the CrowdStrike outage
No organisation is immune to an IT outage. This is especially true given the recent global outage caused by a CrowdStrike software update that left organisations grappling with significant operational disruptions, and many entering crisis mode. As a technology community, there is a lot to be learned from this incident.
1. Reliable end-user support services are critical
Understandably, the outage caught many organisations off guard. This was largely due to inadequate resources available to support the sharp increase in IT support requests. As businesses increasingly rely on technology to drive day-to-day operations, providing effective end-user support is paramount. For example, the ability to troubleshoot technical issues to bring devices back online in the wake of an outage is integral to minimising downtime.
But this isn’t just about laptops – the real danger lies in critical systems that are hosted on servers going down.
This is an important reminder to assess your end-user support and ensure it covers a range of technical issues, from software troubleshooting to hardware malfunctions and network connectivity problems. Your team may also consider running training to help users troubleshoot common issues independently. A robust server management system will also be integral to maintaining business continuity in the event of a crisis.
Furthermore, the outage is a prompt to assess your vendor relationships. Are fail-safes built into the supply chain, and are there rollback procedures in place?
Organisations must ensure their backup strategies account for vendor failures. This includes having redundancies in place for critical services and understanding how interconnected systems may affect recovery. Hybrid cloud-based backup solutions may be a prudent move, as they give you flexibility in where and how data is recovered.
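One way to start understanding how interconnected systems may affect recovery is to write the dependencies down and trace what a single failure would touch. The sketch below is illustrative only: the system names, the vendor names, and the `impacted_by` helper are all hypothetical, and a real dependency map would come from your own asset inventory.

```python
from collections import deque

# Hypothetical dependency map: each system lists what it directly depends on.
# All names are made up for illustration.
DEPENDS_ON = {
    "payments": ["app-server", "endpoint-security-vendor"],
    "app-server": ["database"],
    "database": ["backup-provider"],
    "helpdesk": ["endpoint-security-vendor"],
    "intranet": ["app-server"],
}


def impacted_by(failed, depends_on):
    """Return every system that directly or indirectly depends on `failed`."""
    # Invert the map so we can look up "who relies on this?"
    dependants = {}
    for system, deps in depends_on.items():
        for dep in deps:
            dependants.setdefault(dep, []).append(system)

    # Breadth-first walk outwards from the failed component.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for system in dependants.get(current, []):
            if system not in impacted:
                impacted.add(system)
                queue.append(system)
    return sorted(impacted)


print(impacted_by("endpoint-security-vendor", DEPENDS_ON))
print(impacted_by("backup-provider", DEPENDS_ON))
```

Even a simple map like this shows which vendors sit underneath the most systems, which is a reasonable place to begin when deciding where redundancy matters most.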
2. No more single point of failure – look for a single point of communication
It’s not always practical to hold all vendor relationships in-house given the diversified technology needs of organisations. Managed Service Providers (MSPs) offer a way of aggregating and managing relationships on your behalf, providing guarantees of support during a crisis.
MSPs, leveraging their strong vendor partnerships and focused on getting your business back online, can be the difference between a slow recovery that costs millions and a rapid response that returns you to BAU before your customers notice anything amiss.
A good MSP will be vendor-agnostic, helping you diversify your risk without carrying the burden of relationship-holding. It will help you evaluate vendor risk, negotiate stronger contracts, and reduce the impact of an outage like this one on your organisation.
It goes without saying that effective communication is the cornerstone of successful incident management. MSPs leverage sophisticated solutions to monitor, identify, and communicate with impacted parties, helping ensure swift and coordinated responses.
3. Boards need to be more aware of the importance of incident response and recovery
Good governance starts with good preparation. In an article from Forbes, Sherri Davidoff, CEO of LMG Security, said that “most organisations were surprised that their incident response procedures failed today because their plans did not fully include all vendors or other security tools prevented them from automating the restoration process.”
For boards, this means developing a deeper understanding of not only in-house IT, but also vendor relationships and the dependencies within them. While boards are not necessarily part of the vendor selection process for IT, it’s important for them to ensure all third-party suppliers are assessed for risk.
Board members may also benefit from education on cyber security threats and governance frameworks so they can make informed decisions. They should ensure that there are clear policies and procedures for handling cyber security incidents, and that these policies are regularly reviewed and updated. This means regular briefings on cyber security posture, incident response readiness, and vendor landscape changes.
Most importantly, businesses need to be prepared to treat incidents like this the same way they treat fires and floods – regular simulated drills need to be routine for disaster recovery, not just a once-off occurrence.
4. Shore up defences and prepare for recovery
Organisations need to strengthen their incident response and recovery strategies. This involves more than just technical solutions; it requires a holistic approach that includes people, processes, and technology.
The outage has revealed that many organisations were unprepared for such a widespread disruption. Analysing the incident response processes of companies affected by the outage, as well as those that were not, can provide valuable insights.
Cyber teams should actively seek to identify weaknesses in incident response plans and update them to include all vendors and security tools. Sometimes, this means having ‘pen and paper’ backup solutions ready to go in the interim before recovery is completed.
It isn’t something that can be set and forgotten – cyber security is a constant cadence of activity. Interactive’s CISO, Fred Thiele, stressed the importance of digital hygiene when it comes to cyber resilience: “Cyber hygiene isn’t a project to be completed or an item to tick off your list, it needs to be woven into the fabric of your organisation, so it becomes second nature for everyone”.
Looking ahead
As organisations navigate the aftermath of the recent outage, it’s crucial to implement robust strategies that enhance resilience and reduce the risk of future incidents. While no business is immune to an outage, there are ways to bolster preparation and response.
For many businesses, disaster recovery has been a distant possibility rather than an operational necessity – but now is the best time to change that. Interactive Anywhere is about meeting businesses where they are on their IT journey. Talk to our team today to find out how Interactive Anywhere helps businesses enhance resilience, protect valuable assets, and minimise downtime, so that they are always prepared to face and overcome any challenge.