How DevOps and CI/CD contribute to Antifragility
I started exploring DevOps back in 2013. After my upstream journey from programming via design and requirements engineering to strategy development, I started my downstream journey: examining how strategy can be implemented more quickly and which obstacles need to be overcome to do so. I came across literally all the waste that is covered in the DevOps literature. And I saw the mess left by Frederick Taylor’s ‘Principles of Scientific Management’: people working in silos, a lack of creativity, and inflexible structures. There’s much work to be done.
To quote W. Edwards Deming, ‘survival is not mandatory’. If organizations want to cope with a VUCA world (volatility, uncertainty, complexity, and ambiguity), however, they need antifragility.
What Antifragility is and why it matters
Antifragility is the exact opposite of fragility. Until 2012, when Nassim Nicholas Taleb wrote his book ‘Antifragile’, we did not have a word for it; we would call it robust or resilient instead. Let me explain the four concepts: fragile, robust, resilient, and antifragile.
- Fragile systems and organizations break down quickly. Think of older Windows versions, for instance. Many organizations are fragile, or become fragile after their first years of innovation and growth. Blockbuster became fragile after failing to respond to Netflix’s mail-order service. And we all know what happened to Kodak.
- Banks and their systems are a good example of robustness. They continue to function for a long time until the stress becomes too high, as happened in 2008. Robust systems and organizations suppress randomness and volatility and, as a result, become more fragile over time.
- Aircraft are resilient. An Airbus A380 is flying software. It will not get better during a flight, but if one hydraulic system fails, there is another. If a vacuum system fails, there is an electrical alternative. It keeps functioning. But all the resilience is designed upfront. That’s why these systems don’t handle unexpected events well.
- The best metaphor for antifragility is the Hydra, the multi-headed monster from Greek and Roman mythology. It is also in the logo of antifragility.works. When Hercules chopped off one head, he discovered that it was immediately replaced by two new heads. With every attack, the Hydra became stronger. In IT we have another vicious beast: the Chaos Monkey. Built by Netflix, it was designed to randomly attack a system between 9 AM and 3 PM, so engineers could fix the damage during office hours. Over time, the weaknesses it exposed were dealt with and, as a result, the system became stronger. This was such a success that Netflix went on to build the Chaos Gorilla, which simulates the outage of an entire AWS availability zone, because its systems had become immune to the Monkey.
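To make the Chaos Monkey idea more concrete, here is a minimal Python sketch of its core decision: during the 9 AM to 3 PM window, pick one random instance to terminate; outside that window, do nothing. This is not Netflix’s implementation. The instance IDs and function names are hypothetical, and the actual termination call to a cloud provider is deliberately left out.

```python
import random
from datetime import datetime
from typing import Optional, Sequence

# Attack window from the text: 9 AM to 3 PM, while engineers are at work.
ATTACK_START_HOUR = 9
ATTACK_END_HOUR = 15  # exclusive: no attacks from 15:00 onwards


def in_attack_window(now: datetime) -> bool:
    """Return True if `now` falls inside the office-hours attack window."""
    return ATTACK_START_HOUR <= now.hour < ATTACK_END_HOUR


def pick_victim(instances: Sequence[str], now: datetime,
                rng: random.Random) -> Optional[str]:
    """Choose one instance to terminate, or None outside the window.

    `instances` is a hypothetical list of instance IDs; a real tool would
    query the cloud provider and call its termination API instead.
    """
    if not instances or not in_attack_window(now):
        return None
    return rng.choice(list(instances))


if __name__ == "__main__":
    fleet = ["web-1", "web-2", "web-3"]  # hypothetical instance IDs
    rng = random.Random(42)  # seeded so runs are reproducible
    print(pick_victim(fleet, datetime(2024, 5, 6, 10, 30), rng))  # one of fleet
    print(pick_victim(fleet, datetime(2024, 5, 6, 18, 0), rng))   # None
```

Restricting the attacks to office hours is the design choice that makes the failures useful: someone is there to learn from each one and fix the underlying weakness.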
As you can see, robust and resilient systems and organizations don’t like unexpected events. Antifragile systems and organizations love them.
To assess the antifragility of an organization (or a system), you should measure three aspects: nonlinearity, optionality, and the transfer of fragility. Each of them translates into a set of best practices:
- Nonlinearity translates to CI/CD (continuous integration and continuous delivery), the termination of software projects, and chaos engineering, which goes far beyond the Chaos Monkey mentioned earlier.
- Optionality translates to Minimum Viable Products, A/B testing, and canary releases.
- Avoiding the transfer of fragility translates to decentralizing decision-making, reducing organizational debt, and… DevOps!
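Optionality, in Taleb’s sense, means a limited downside combined with an open upside, and a canary release illustrates it well. The following Python sketch routes a small share of traffic to a new version and then promotes it or rolls it back based on observed error rates. All names are hypothetical, and the thresholds are illustrative assumptions rather than recommended values.

```python
import random
from dataclasses import dataclass


@dataclass
class VersionStats:
    """Request and error counts observed for one version."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def route(canary_weight: float, rng: random.Random) -> str:
    """Send roughly `canary_weight` of all requests to the new version."""
    return "canary" if rng.random() < canary_weight else "stable"


def decide(stable: VersionStats, canary: VersionStats,
           tolerance: float = 0.01, min_requests: int = 100) -> str:
    """Promote, roll back, or keep waiting (thresholds are assumptions)."""
    if canary.requests < min_requests:
        return "wait"      # not enough evidence yet; keep exposure small
    if canary.error_rate > stable.error_rate + tolerance:
        return "rollback"  # the downside stays limited to the canary's share
    return "promote"       # the upside: roll the new version out to everyone


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [route(0.05, rng) for _ in range(10_000)]
    print(sample.count("canary"))  # roughly 500
    baseline = VersionStats(requests=10_000, errors=50)
    print(decide(baseline, VersionStats(requests=500, errors=3)))   # promote
    print(decide(baseline, VersionStats(requests=500, errors=20)))  # rollback
    print(decide(baseline, VersionStats(requests=50, errors=0)))    # wait
```

The asymmetry is the point: a bad release costs only the small canary slice of traffic, while a good release can be rolled out to everyone.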
That last connection needs some explanation. If one party has the downside and another party has the upside, the party with the upside is transferring its fragility to the party with the downside. When developers build crappy software, the Ops department faces the consequences. When the Ops department blocks changes, the developers face the consequences. The antidote is skin in the game: having something to lose in a given situation.
How Antifragility Relates to DASA’s DevOps Principles & Competence Model
Without DevOps, Dev has no skin in the Ops game and Ops has no skin in the Dev game. If you put them together in one team, they all have skin in each other’s game, which is to say, a mutual sense of responsibility, trust, and understanding.
Each of the DevOps principles has a connection with antifragility:
- DevOps Principle #1: Customer-Centric Action. Calculated risk-taking is encouraged. This inoculates your systems and organization with a little risk at a time and lets you respond rapidly to changing or emerging needs. It builds antifragility by responding to the outside world.
- DevOps Principle #2: Create with the End in Mind. DevOps teams need to adopt product and service thinking. They have to become mini-companies that make their own decentralized decisions based on early feedback. That builds antifragility. By analogy, it can be argued politically that city-states are potentially more successful than nation-states.
- DevOps Principle #3: End-to-end Responsibility. Vertically oriented teams and decentralized decision-making put an end to the transfer of fragility. They encourage skin in the game, and that builds antifragility.
- DevOps Principle #4: Cross-Functional Autonomous Teams. Those teams share the same goal and have overlapping skills and knowledge. Like the previous principle, this also encourages skin in the game and builds antifragility.
- DevOps Principle #5: Continuous Improvement. Here, the relationship is more complex. It is dangerous to experiment if you are fragile, and your systems and organization are fragile when the possible downside of an experiment is larger than the possible upside. You first have to navigate carefully away from technical and organizational debt through structured problem-solving, minimizing waste, and optimization. Many people are not even aware of organizational debt, which consists of silos, hand-offs, politics, fear, ego, centralized decision-making, etc.
- DevOps Principle #6: Automate everything you can. This is about optionality: take every automation option available. It leads to Continuous Delivery as Code, with a completely virtual pipeline that is visible only in the form of parameters. This reduces configuration drift, technical debt, and legacy. It is also about nonlinearity: once a working CI/CD pipeline is in place, it no longer matters how frequently you push software to production.
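The idea of a virtual pipeline that exists only as code and parameters can be sketched in Python as follows. This is a toy under stated assumptions, not a real CI/CD tool: the stages are hypothetical placeholders, and the parameter keys (`service`, `environment`, `tests_pass`) are invented for illustration. The pipeline itself is plain data, and every release is the same call with different parameters.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, str]
Stage = Callable[[Params], bool]


def build_stage(params: Params) -> bool:
    # Placeholder: a real stage would invoke the build tool for the service.
    return bool(params.get("service"))


def verify_stage(params: Params) -> bool:
    # Placeholder: treat the test suite as passing unless told otherwise.
    return params.get("tests_pass", "yes") == "yes"


def deploy_stage(params: Params) -> bool:
    # Placeholder: a real stage would call a deployment API.
    return params.get("environment") in {"staging", "production"}


# The pipeline itself is plain data: an ordered list of named stages.
PIPELINE: List[Tuple[str, Stage]] = [
    ("build", build_stage),
    ("verify", verify_stage),
    ("deploy", deploy_stage),
]


def run_pipeline(params: Params) -> Tuple[bool, List[str]]:
    """Run every stage in order and stop at the first failing one."""
    completed: List[str] = []
    for name, stage in PIPELINE:
        if not stage(params):
            return False, completed
        completed.append(name)
    return True, completed


if __name__ == "__main__":
    # Every release is the same call with different parameters.
    print(run_pipeline({"service": "checkout", "environment": "staging"}))
    # (True, ['build', 'verify', 'deploy'])
    print(run_pipeline({"service": "checkout", "environment": "production",
                        "tests_pass": "no"}))
    # (False, ['build'])
```

Because the hundredth run of such a pipeline costs no more effort than the first, release frequency stops being a constraint, which is the nonlinearity the principle refers to.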
Regarding the competence model, its skill areas clearly relate to antifragility. Two competence areas in particular deserve attention in this setting:
- Architecture & Design, and especially microservices, as they are the foundation of antifragile systems. Autonomous, separately deployed services that share nothing have different risk characteristics than a single large unit: a failure in one service need not take down the others. A microservice can be re-implemented quickly, which supports experimentation. And breaking big decisions down into smaller, more frequent decisions is a key factor in adaptability, and therefore in potential antifragility.
- Security, Risk & Compliance. Nassim Taleb, the author of ‘Antifragile’, once said, ‘You should study risk-taking, not risk management.’ In his opinion, risk management is ‘risk theatre’. Most systems and organizations are designed to deal with known risks only, and IT departments invest heavily in avoiding failure. What if you designed systems that expect every part of the system to fail? Forced random failure would validate their resilience. Again, this is where the Chaos Monkey comes in.
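Designing systems that expect every part to fail can be sketched in Python with fault injection. In this minimal sketch, a wrapper makes a hypothetical downstream microservice fail at random, and the caller is written to degrade gracefully. Running many requests under forced random failure then checks that the page always renders. All names and the 30% failure rate are assumptions for illustration. Tools like the Chaos Monkey work at the infrastructure level, terminating whole instances, rather than inside a single process.

```python
import random
from typing import Callable


class DependencyError(Exception):
    """Raised when a downstream service cannot answer."""


def with_fault_injection(call: Callable[[], str], failure_rate: float,
                         rng: random.Random) -> Callable[[], str]:
    """Wrap a service call so that it fails at random, on purpose."""
    def wrapped() -> str:
        if rng.random() < failure_rate:
            raise DependencyError("chaos: injected failure")
        return call()
    return wrapped


def recommendations() -> str:
    # Hypothetical downstream microservice.
    return "personalized recommendations"


def render_homepage(get_recs: Callable[[], str]) -> str:
    """Expect the dependency to fail and degrade gracefully instead of crashing."""
    try:
        recs = get_recs()
    except DependencyError:
        recs = "popular titles (fallback)"
    return f"homepage with {recs}"


if __name__ == "__main__":
    rng = random.Random(7)
    flaky = with_fault_injection(recommendations, failure_rate=0.3, rng=rng)
    # The experiment: under forced random failure, every page must still render.
    pages = [render_homepage(flaky) for _ in range(1_000)]
    assert all(page.startswith("homepage with") for page in pages)
    print(sum("fallback" in page for page in pages), "of 1000 pages used the fallback")
```

In a real client, timeouts and connection errors from the downstream service would be mapped to something like `DependencyError` so that the same fallback path covers them.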
Both DevOps and Continuous Deployment are antifragile. Many practices that DASA mentions are also antifragile, such as A/B testing, microservices, and a focus on MTTR (mean time to repair). In a future article, I’d like to expand on why DASA should consider all the fundamental antifragility concepts, none of which should be neglected.
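A focus on MTTR presupposes measuring it. As a minimal sketch, assuming MTTR is measured from the moment an incident is detected until service is restored (organizations draw these boundaries differently), it is simply the average of those durations. The incident log below is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# (detected, restored) timestamps for one incident.
Incident = Tuple[datetime, datetime]


def mean_time_to_repair(incidents: List[Incident]) -> timedelta:
    """Average time from detection to restoration of service."""
    if not incidents:
        raise ValueError("MTTR is undefined without incidents")
    total = sum((restored - detected for detected, restored in incidents),
                timedelta())
    return total / len(incidents)


if __name__ == "__main__":
    # Hypothetical incident log.
    log = [
        (datetime(2024, 3, 1, 10, 0), datetime(2024, 3, 1, 10, 30)),  # 30 min
        (datetime(2024, 3, 8, 14, 0), datetime(2024, 3, 8, 14, 10)),  # 10 min
        (datetime(2024, 3, 15, 9, 0), datetime(2024, 3, 15, 9, 20)),  # 20 min
    ]
    print(mean_time_to_repair(log))  # 0:20:00
```

Tracking this number rewards recovering quickly from the failures that will happen anyway, rather than investing only in trying to avoid them.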