CyberNews logo

Chaos engineering in an age of uncertainty

by Neil C. Hughes
1 September 2020
in Editorial
Many businesses are embracing chaos engineering as a proactive approach to identifying problems.

Nine months ago, our timelines were filled with unusually positive articles about preparing for a life of opportunities as we stepped into a new decade. Futurists dared to gaze into their virtual crystal balls to share their technology predictions for the roaring twenties. But nobody (other than Bill Gates) predicted a global pandemic and the chaos that would ensue.

A proactive rather than reactive approach makes it possible to fight fires before they happen and to navigate turbulent conditions successfully. Chaos engineering (CE) requires a change of mindset: identifying failures before they become outages.

The days of shutting down racks or pulling a network cable and unwittingly causing an outage somewhere can finally be retired.

But unexpected problems, vulnerabilities, and weaknesses are not always the result of human error. The reality is that every system will be affected by its environment and the random, turbulent conditions that come with it.

What is chaos engineering?

Every business is now challenged with ensuring that its online services operate in every time zone, without downtime, so that users can continue consuming vast quantities of data. The term chaos engineering was made famous by Netflix when it migrated its services from a traditional data center to the cloud.

The streaming giant was forced to deal with the complexities and reliability of its new servers. 

Chaos engineering is the act of injecting failure in a controlled manner rather than waiting to react to issues.
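
To make the idea of controlled failure injection concrete, here is a minimal Python sketch. It is an illustration only, not how Netflix or any particular tool does it: the names `inject_fault`, `InjectedFault`, `fetch_recommendations`, and `homepage` are all hypothetical. A wrapper makes a chosen fraction of calls fail on purpose, and the caller is expected to fall back gracefully.

```python
import random
from functools import wraps


class InjectedFault(Exception):
    """Raised in place of a real failure during an experiment."""


def inject_fault(probability, rng):
    """Wrap a function so that a fraction of its calls fail on purpose."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # rng.random() returns a float in [0.0, 1.0).
            if rng.random() < probability:
                raise InjectedFault(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


def fetch_recommendations(user_id):
    # Hypothetical dependency; stands in for a call to a real service.
    return [f"title-{user_id}-{n}" for n in range(3)]


def homepage(user_id, fetch):
    """Serve a static fallback list when the dependency fails."""
    try:
        return fetch(user_id)
    except InjectedFault:
        return ["popular-1", "popular-2", "popular-3"]
```

Wrapping the dependency with `inject_fault(0.1, random.Random())` would fail roughly one call in ten. Passing a seeded `random.Random` keeps an experiment repeatable, and the failure rate is the dial that keeps the injection controlled.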

Resilience by design quickly became the latest best practice as the focus shifted to building better apps and websites that were prepared for the inevitable unplanned interruptions on the horizon.

Ultimately, if a customer raises a support ticket, you have already failed. Planned experiments that reveal weaknesses in systems, teams, and processes should help you prevent outages or fix them before users are aware.

What are the business benefits of chaos engineering?

If my Netflix stopped working regularly or became unreliable, I would probably switch to one of the many other streaming services at my disposal. If Ticketmaster experienced an outage during a Billie Eilish ticket release, the promoter would pass those tickets to another ticket agency. When a website or app goes offline at a critical time, your customers will immediately go elsewhere.

The reliability and resiliency of tech are not just about profit margins. They are also increasingly a matter of life and death.

For example, would you sit inside a self-driving car and put your life in the hands of poorly written code?

Mitigating risk, expecting the unexpected, and giving customers confidence that they are in safe hands and that systems will continue to operate safely when things go wrong are now table stakes.

Every business will have its own Achilles heel and its own reasons why downtime in a digital world costs money. An outage could bring down the virtual shutters on your business’s front door when you least expect it. But it doesn’t have to be like that. By embracing the chaos, you can learn what might fail and tweak the design of your system or infrastructure accordingly, so that you do not suffer an unplanned outage.

How does it work?

Netflix created Chaos Monkey, which randomly terminates instances in production. The brave move forces engineers to up their game and implement services that are resilient to instance failures. The old way of doing things involved engineers being thrown in at the deep end during an outage, when it could have been months or years since they had encountered a problem like it.
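
The random-termination idea can be sketched in a few lines of Python. This is a toy simulation over an in-memory dictionary, not Chaos Monkey's actual code or API. The function names and the rule that a service is "available" while at least one instance is up are assumptions made for illustration.

```python
import random


def pick_victim(instances, rng):
    """Choose one running instance at random, or None if none are running."""
    running = [name for name, up in instances.items() if up]
    return rng.choice(running) if running else None


def terminate_random_instance(instances, rng):
    """Mark one randomly chosen running instance as down; return its name."""
    victim = pick_victim(instances, rng)
    if victim is not None:
        instances[victim] = False
    return victim


def service_available(instances):
    """Assumed rule: the service is up while at least one instance is up."""
    return any(instances.values())
```

With a fleet such as `{"web-1": True, "web-2": True, "web-3": True}`, a single call to `terminate_random_instance(fleet, random.Random())` takes one instance down while `service_available(fleet)` stays true. A service running on a single instance would not survive the same experiment, which is exactly the weakness this kind of drill is meant to expose.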

By introducing frequent failures, CE incentivizes engineers to build resilient services and increases their familiarity with the infrastructure. A series of regular digital fire drills could expose vulnerabilities and result in reliable and responsive systems throughout your company.

When faced with unprecedented demand on their streaming services, Netflix, YouTube, Amazon, and Disney announced that they would be downgrading their video quality, which would lower their overall bandwidth utilization. This more proactive approach prevented downtime and the bad publicity that would inevitably have followed.

Embracing the chaos

The concept of running experiments in a live production environment is enough to bring any IT director or CTO out in a cold sweat. Diving in headfirst would be reckless, which is why the first baby steps in chaos engineering should be taken in a separate environment that is as close to production as possible.

Technical teams are in control of the so-called “blast radius,” the set of systems and users that an experiment is allowed to affect. Simulations should be carefully planned to unlock learning opportunities. As you make progress in establishing resilience as a best practice, you will feel much more confident about making changes.
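
One simple way to keep the blast radius under control is to cap how many hosts an experiment may touch. The Python sketch below is a hypothetical helper, not part of any specific chaos engineering tool: `select_targets` and its parameters are illustrative names. It limits the target set by both a percentage of the fleet and a hard maximum.

```python
import math
import random


def select_targets(hosts, percent, max_hosts, rng):
    """Pick experiment targets, capped by a percentage and a hard limit."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if max_hosts < 0:
        raise ValueError("max_hosts must not be negative")
    pool = sorted(hosts)  # stable order, and a list as rng.sample requires
    by_percent = math.floor(len(pool) * percent / 100)
    count = min(by_percent, max_hosts)
    return rng.sample(pool, count)
```

Starting with a small percentage in a staging environment and widening it only after each run succeeds mirrors the baby steps described above.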

Software, apps, and systems will be continuously tweaked and updated to add functionality or fix problems. But what do you break in the process? For this reason alone, it would be foolish to assume that a system will respond to a fault injection test (FIT) in the same manner several weeks from now.

Gremlin, a chaos engineering platform, offers peace of mind with its new ‘Status Checks’ capability, which automatically verifies that systems are healthy and ready for chaos engineering experiments. With Black Friday sales on the horizon, many retailers will be looking to make up for lost ground. But how many have embraced the chaos and learned the lessons of last year’s eCommerce site outages?
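
The general idea of verifying system health before an experiment starts can be sketched as a simple gate. The Python below is a generic illustration and does not represent Gremlin's product or API. `run_if_healthy`, the shape of the result dictionary, and the check names in the usage note are all assumptions.

```python
def run_if_healthy(checks, experiment):
    """Run the experiment only when every named health check passes.

    `checks` maps a check name to a zero-argument callable returning a bool.
    `experiment` is a zero-argument callable that performs the experiment.
    """
    failed = [name for name, check in checks.items() if not check()]
    if failed:
        # Skip the experiment and report which checks blocked it.
        return {"ran": False, "failed_checks": failed, "result": None}
    return {"ran": True, "failed_checks": [], "result": experiment()}
```

A team might pass in checks such as `{"error_rate_ok": ..., "latency_ok": ...}` backed by its own monitoring. If any of them returns false, the experiment is skipped rather than piled on top of a system that is already struggling.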

Chaos engineering should never be seen as the cause of your problems.

It should be seen as a way of revealing them before they result in a costly outage. Behind the unnerving name, CE is a way of increasing resilience, reducing risk, and delivering valuable lessons about your organization. But the biggest winners should be your customers, who will enjoy an improved user experience and remain loyal to your brand.
