© 2022 CyberNews - Latest tech news,
product reviews, and analyses.

Firms should have response plans for third-party provider failures – interview


Some companies are probably reviewing the costs and benefits of using Cloudflare after its recent outage, but for most the tradeoffs are still positive, an expert told Cybernews.

Cloudflare is a global network designed to secure everything connected to the internet: websites, internet applications, and application programming interfaces (APIs), as well as corporate networks, employees, and devices.

According to its latest presentation to investors, Cloudflare has over 154,000 customers and 10,500 networks directly connected to it, including internet service and cloud providers, and large enterprises.

No wonder even a short outage can’t go unnoticed.

Cloudflare’s outage on June 21 was caused by the company’s own error, not by a cyberattack. It affected traffic in 19 of its data centers, which account for a significant proportion of its global traffic. For some businesses, that meant more than an hour of downtime. And downtime almost always costs money.

“This outage was caused by a change that was part of a long-running project to increase resilience in our busiest locations,” the company said in a post-mortem outage analysis. “A change to the network configuration in those locations caused an outage which started at 06:27 UTC. At 06:58 UTC, the first data center was brought back online, and by 07:42 UTC, all data centers were online and working correctly.”
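The timestamps in the post-mortem give the length of the outage directly. A minimal Python sketch of that arithmetic, using only the three times quoted above (the variable and function names are illustrative):

```python
from datetime import datetime

# Times quoted in Cloudflare's post-mortem (all UTC, same day).
FMT = "%H:%M"
outage_start = datetime.strptime("06:27", FMT)
first_restored = datetime.strptime("06:58", FMT)
all_restored = datetime.strptime("07:42", FMT)


def minutes_between(start: datetime, end: datetime) -> int:
    """Return the whole minutes elapsed between two datetimes."""
    return int((end - start).total_seconds() // 60)


minutes_to_first_recovery = minutes_between(outage_start, first_restored)
minutes_to_full_recovery = minutes_between(outage_start, all_restored)

print(minutes_to_first_recovery)  # 31
print(minutes_to_full_recovery)   # 75
```

On these figures, the first affected location was back after about half an hour, and the last after an hour and a quarter.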

This is not the first time that a Cloudflare outage has inconvenienced its customers. In July 2020, a configuration error in its backbone network caused an outage for internet properties and Cloudflare services that lasted 27 minutes.

In April 2020, the Cloudflare Dashboard and API were rendered unavailable by the disconnection of redundant fiber connections from one of its two core data centers.

Given that downtime can result in revenue loss for some businesses, such as crypto exchanges, I talked to Ben Schmidt, the CSO of blockchain-based cybersecurity company PolySwarm, about what lessons the recent Cloudflare outage could teach us.

This is not the first time that Cloudflare has experienced such an outage. In 2020, it happened twice. Why is it that so many services rely only on Cloudflare, and no mechanism gets triggered to stay online when Cloudflare is down?

Companies rely on Cloudflare because constructing a reliable global CDN (content delivery network) is a challenging task that most smaller companies are better off outsourcing. Cloudflare makes serving content faster and generally more reliable for those companies.

Because running a CDN is complex, cascading or unexpected failures in platform updates sometimes cause downtime. When downtime occurs, companies often have no alternative, as they are dependent on Cloudflare for critical network services like DNS (domain name system).

What measures could Cloudflare, as well as companies relying on it, implement to prevent similar problems in the future?

Cloudflare’s post-mortem presents some reasonable next steps, but it mostly comes down to improving processes and systems for updates. In such large systems, it’s sometimes difficult to predict the outcome of an update, so more testing and staggered rollouts of changes are critical to preventing these types of issues.
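A staggered rollout of the kind described here can be sketched in a few lines of Python. This is only an illustration of the general idea, not Cloudflare's actual process: the site names, the wave sizes, and the `apply_change`, `is_healthy`, and `rollback` callbacks are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def staggered_rollout(
    waves: Iterable[List[str]],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Apply a change wave by wave; halt and roll back if a wave is unhealthy.

    Returns True if every wave completed, False if the rollout was halted.
    """
    changed: List[str] = []
    for wave in waves:
        for site in wave:
            apply_change(site)
            changed.append(site)
        # Verify the whole wave before touching the next, larger one.
        if not all(is_healthy(site) for site in wave):
            for site in reversed(changed):
                rollback(site)
            return False
    return True


# Illustrative run with made-up site names and an in-memory "network".
state: Dict[str, str] = {}
waves = [["test-1"], ["small-1", "small-2"], ["busy-1", "busy-2", "busy-3"]]


def apply_change(site: str) -> None:
    state[site] = "new"


def rollback(site: str) -> None:
    state[site] = "old"


def is_healthy(site: str) -> bool:
    return site != "small-2"  # simulate a failure in the second wave


completed = staggered_rollout(waves, apply_change, is_healthy, rollback)
print(completed)  # False: the rollout stopped before the busiest sites
```

Because the simulated failure shows up in the second wave, the change never reaches the three busiest sites, which is the point of staggering.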

Because of how Cloudflare works, it’s difficult for customers to plan for this type of outage. Most use Cloudflare to serve DNS for their domain, but because this service also had issues, the only functional alternative during downtime is switching DNS providers. These changes take time to propagate, however, meaning some downtime is inevitable without special preparation.
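The propagation delay can be reasoned about with a simplified model: resolvers may keep serving a cached answer until its time-to-live (TTL) expires, so the worst case is roughly the time to detect the problem, plus the time to make the change, plus the TTL. The Python sketch below assumes that model; the function name and all the numbers are illustrative, not measurements.

```python
def worst_case_downtime_seconds(detect_s: int, switch_s: int, ttl_s: int) -> int:
    """Upper bound on downtime under a simplified caching model.

    detect_s: time to notice the outage (assumed)
    switch_s: time to point the domain at another DNS provider (assumed)
    ttl_s:    how long resolvers may keep the old answer cached (assumed)
    """
    return detect_s + switch_s + ttl_s


# Same detection and switch times, two hypothetical TTLs.
short_ttl_case = worst_case_downtime_seconds(detect_s=300, switch_s=600, ttl_s=300)
long_ttl_case = worst_case_downtime_seconds(detect_s=300, switch_s=600, ttl_s=86400)

print(short_ttl_case)  # 1200 seconds, i.e. 20 minutes
print(long_ttl_case)   # 87300 seconds, i.e. a little over a day
```

Under these assumed figures, the cached lifetime of the old records dominates the result, which is why a switch made only after the outage begins cannot avoid downtime altogether.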

Some services were down for almost an hour and a half. Is that a significant amount of time for various businesses?

Most businesses won’t suffer greatly from a 30-minute outage, but downtime almost always costs money. I’m sure some companies are reviewing the costs and benefits of using Cloudflare after the downtime, but for most the tradeoffs are still positive.

What industries are most vulnerable to similar outages?

Crypto exchanges, and financial organizations in general, probably have the most to lose. The same goes for any industry or organization where downtime directly translates into lost money. Constructing a platform that can tolerate a major provider outage is quite difficult, but some of these organizations may now be considering how to handle these failures.

It’s important for companies to understand the risks as well as the rewards of using third-party providers for critical services. While they provide a lot of obvious benefits to their users, CDNs often become a single point of failure for organizations and may not be recognized as such.

If the service is being used in a critical path, companies should have response plans for third-party provider failures and know what steps need to be taken in the event of a prolonged outage.
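One small, concrete piece of such a response plan is the rule that decides when to start it. A minimal Python sketch, assuming the company already runs some periodic health probe against the provider; the function name, the threshold of three, and the sample probe results are all hypothetical:

```python
from typing import Iterable


def should_fail_over(probe_results: Iterable[bool], threshold: int = 3) -> bool:
    """Return True once `threshold` consecutive probes have failed.

    Requiring consecutive failures keeps a single dropped probe
    from triggering the response plan.
    """
    consecutive_failures = 0
    for ok in probe_results:
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= threshold:
            return True
    return False


# True means the probe succeeded, False means it failed.
brief_blip = [True, False, False, True, True]
sustained_outage = [True, False, True, False, False, False]

print(should_fail_over(brief_blip))        # False
print(should_fail_over(sustained_outage))  # True
```

What happens after the trigger fires (who is paged, which provider takes over, how customers are told) is the part the plan itself has to spell out.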

