Decision making strategies for Agile Architecture
Ideas to make technical decisions when designing architecture in an iterative way
This article was originally published on the blog for the conference “Agile for Architecture”, which is being organized this year on 5th and 6th October. Details are here.
A significant indicator of a high performing technology team is how it makes technical decisions on a regular basis, and especially the pace of its decision making. Such teams have built habits that are accessible to every team member and that help them go slow when they need to and move fast when required. This is the essence of what I am going to share in this essay, and it resonates deeply with me after building software for large and small organisations for around two decades. I recognise a high performing team when I see that it has established habits that everyone in the team intuitively understands and can apply, and that it is building high quality products that customers love and talk to others about.
Thinking about the architecture of your software system as a set of technical decisions that are expensive or hard to change is one of the foundations of software development theory. Teams often struggle to identify the next technical decision that should become part of their architecture vocabulary. Building an “Evolutionary Architecture” is a well understood practice: a system of characteristics, structures and associated fitness functions helps evolve the overall architecture as teams develop the next set of changes to their system.
In this essay I cover a few approaches that will help you make wise choices when making the technical decisions that define your architecture vocabulary. Ultimately, the architecture of your system is a set of technical decisions that are widely understood in your team(s), and deciding the pace of making those decisions is useful for any team venturing into the unknown of doing “architecture”.
Define the “Blast-Radius” of your decisions
A technical decision that affects more than one team or more than one part of your product is likely to have a larger blast-radius if it turns out to be sub-optimal. The term “blast-radius” refers to the negative impact of an ineffective decision: an increase in technical debt, a rigid implementation that needs to be undone later, or a solution that fails to solve the original problem and requires reimplementation. Such decisions often take the form of choosing a programming language or framework that a team intends to use to solve a problem. The choice of coupling and the quality of abstractions are also likely to be high blast-radius decisions, and a sub-optimal choice there has a comparatively high impact on software teams. The practice to adopt is to be aware when a decision is likely to have a high blast-radius, and then to put measures in place to go slow, testing alternatives as part of the decision before committing to an approach. You can start applying this by sharing “questions” with your software teams that they can ask to gauge the blast-radius. The following are a few examples of such questions:
If the decision is sub-optimal, will it affect end customers negatively?
If the decision is wrong, will it negatively affect the leading KPIs that are used to measure the effectiveness of the product or service that the technical system supports?
If the decision is wrong, will it affect the productivity of the teams building or operating the system in a way that leads to high toil and slower development time?
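One way to make these questions routine is to record the answers and derive a rough classification from them. The following Python sketch is only an illustration: the names BlastRadiusAnswers, blast_radius and suggested_pace are hypothetical, and the thresholds are assumptions that each team would need to tune to its own context.

```python
from dataclasses import dataclass


@dataclass
class BlastRadiusAnswers:
    """Answers to the three example questions, plus the reach of the decision."""
    affects_end_customers: bool
    affects_leading_kpis: bool
    increases_toil_or_slows_teams: bool
    teams_or_product_parts_affected: int = 1


def blast_radius(answers: BlastRadiusAnswers) -> str:
    """Classify a decision as 'low', 'medium' or 'high' blast-radius.

    The thresholds below are illustrative assumptions, not fixed rules.
    """
    yes_count = sum([
        answers.affects_end_customers,
        answers.affects_leading_kpis,
        answers.increases_toil_or_slows_teams,
    ])
    wide_reach = answers.teams_or_product_parts_affected > 1
    if yes_count >= 2 or (yes_count == 1 and wide_reach):
        return "high"
    if yes_count == 1:
        return "medium"
    return "low"


def suggested_pace(radius: str) -> str:
    """Map the classification to a pace: slow for high, fast otherwise."""
    if radius == "high":
        return "go slow: test alternatives before committing"
    return "go fast: decide and revisit if needed"


# Example: choosing a framework that three teams will build on.
framework_choice = BlastRadiusAnswers(
    affects_end_customers=True,
    affects_leading_kpis=False,
    increases_toil_or_slows_teams=True,
    teams_or_product_parts_affected=3,
)
print(blast_radius(framework_choice))  # high
print(suggested_pace(blast_radius(framework_choice)))
```

The value of such a sketch lies less in the exact scoring and more in making the questions explicit and repeatable for every decision.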
Measure and monitor the cost of making your decisions reversible
If you want to optimise the speed of decision making in a software system, it helps to identify the cost of “reversing” a particular decision and to make that cost visible to everyone, including non-technical stakeholders. Note that the cost of reversing a decision increases over time, even for a decision that does not have a high “blast-radius”. For example, choosing between cloud providers that offer Infrastructure as a Service results in a selected provider. The cost of removing the dependency on that provider in order to move to another one is a good input to the decision making process. When deciding amongst options, it helps to estimate what it would cost to reverse the selection of each option across different time scales. Knowing the cost of reversal also indicates how fast the decision needs to be made: a faster decision when the cost of reversing is low, and a slower decision when the cost is non-deterministic or high. Whether a cost is low or high is relative to the context of a software team and its risk appetite, as dictated by its business and problem domain.
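The idea of estimating reversal cost across time scales can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the ProviderOption type, the use of person-days as the unit, the made-up estimates, and the threshold that separates a low cost from a high one. A cost of None stands for an estimate that cannot be made, which is treated like a high cost.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ProviderOption:
    name: str
    # Months after adoption -> estimated person-days to reverse the choice.
    # None means the cost cannot be estimated (non-deterministic).
    reversal_cost_by_month: Dict[int, Optional[float]]


def decision_pace(option: ProviderOption, low_cost_threshold: float) -> str:
    """Return 'fast' when reversing stays cheap at every time scale, else 'slow'."""
    costs = list(option.reversal_cost_by_month.values())
    if not costs or any(cost is None for cost in costs):
        return "slow"  # unknown cost: treat as risky
    if max(costs) <= low_cost_threshold:
        return "fast"
    return "slow"


# Hypothetical estimates at 3, 12 and 36 months after adoption.
provider_a = ProviderOption("provider-a", {3: 10, 12: 40, 36: 150})
provider_b = ProviderOption("provider-b", {3: 5, 12: 15, 36: 25})

for option in (provider_a, provider_b):
    print(option.name, decision_pace(option, low_cost_threshold=30))
```

Writing the estimates down in this form also gives non-technical stakeholders something concrete to look at when the cost of reversal is discussed.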
Consider the question: what will help speed up the process of making a technical decision?
Teams often get into analysis paralysis when choosing between options for changing their software system. Irrespective of the blast-radius or the cost of reversing a technical decision, it is worth identifying what was missing that could have made the decision making process faster. The usual culprits are a lack of understanding of the problem domain, a team lacking experience in the context needed to solve the problem, and missing stakeholders or participants in the decision making process. A prominent culprit that software teams forget to account for is the “missing constraint”. Constraints are effective in helping to decide amongst options and provide a way to move faster in a decision making process, in a software context as elsewhere. Constraints can come in the form of a “budget” that must be respected when making decisions, e.g. time to market (fixed deadlines), performance benchmarks (loading time, rendering time, response time etc.), or a maximum cost for developing and operating the system (cost limits, quotas etc.). The question to ask software teams is whether they have incorporated the relevant constraints into their technical decision making framework to help them move fast. Missing constraints and the other culprits above need to be accounted for when working out whether the next decision the team makes can be made faster.
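To show how explicit constraints narrow the field, here is a minimal Python sketch that filters options against three budgets of the kind mentioned above. The types CandidateOption and Constraints, the field names, and all the numbers are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateOption:
    name: str
    weeks_to_market: float
    response_time_ms: float
    monthly_cost: float


@dataclass
class Constraints:
    max_weeks_to_market: float
    max_response_time_ms: float
    max_monthly_cost: float


def violated_constraints(option: CandidateOption, limits: Constraints) -> List[str]:
    """List the budgets that an option exceeds (empty when it fits all of them)."""
    violations = []
    if option.weeks_to_market > limits.max_weeks_to_market:
        violations.append("time to market")
    if option.response_time_ms > limits.max_response_time_ms:
        violations.append("response time")
    if option.monthly_cost > limits.max_monthly_cost:
        violations.append("cost")
    return violations


def shortlist(options: List[CandidateOption], limits: Constraints) -> List[CandidateOption]:
    """Keep only the options that respect every constraint."""
    return [o for o in options if not violated_constraints(o, limits)]


limits = Constraints(max_weeks_to_market=8, max_response_time_ms=200, max_monthly_cost=5000)
options = [
    CandidateOption("build in-house", 16, 120, 3000),
    CandidateOption("managed service", 4, 180, 4500),
    CandidateOption("premium vendor", 3, 90, 9000),
]
print([o.name for o in shortlist(options, limits)])  # ['managed service']
```

With the constraints written down, two of the three options drop out before any debate starts, which is where the gain in speed comes from.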
The measure of your architecture is how cheap your decisions become over time
The goal of the architecture of a system that a software team develops and operates is to reduce the cost of future technical decisions over time. Cost is incurred in a technical decision making process for many reasons: the inclusion of more people in the decision process, a lack of awareness of blast-radius and cost of reversibility, and missing constraints, as described in the previous sections. To build better architectures is to identify the steps a software team is taking to reduce the time needed to make decisions in future, when the team has grown, the complexity of the business has increased, and the technical landscape is vastly different from today. An audit of all the technical decisions that a software team is making or needs to make right now is a good first step: it checks whether the current architecture has improved the speed of making such decisions compared with the past, when a similar decision had to be made. I have seen numerous times that when teams are asked whether decision making speed has improved for similar decisions, they do not know. Often it feels that as we grow a software system and evolve its architecture, the speed of making a technical decision does not improve, but stays the same or gets worse because of the changing business and technical landscape. To mitigate this, teams must identify the more fundamental technical decisions that need to be incorporated into the vocabulary of their architecture and that may reduce the time needed for future decisions. One approach is to consider “standardisation” of the moving pieces in a system, to reduce “snowflake implementations” and ensure high reuse across teams. By building reliable and reusable underlying primitives in a software system, one reduces the chances of having to make a similar decision again in future, and thereby lowers the cost of future decisions built upon them.
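The audit described above can start from a simple decision log. The Python sketch below assumes a hypothetical DecisionRecord with a category, the date the question was raised and the date it was decided, and it compares the earliest and latest decision in each category. The categories and dates are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class DecisionRecord:
    category: str      # groups "similar" decisions, e.g. "message queue"
    raised_on: date
    decided_on: date


def days_to_decide(record: DecisionRecord) -> int:
    return (record.decided_on - record.raised_on).days


def decision_speed_trend(records: List[DecisionRecord]) -> Dict[str, int]:
    """For each category with at least two decisions, return the change in
    days-to-decide between the earliest and the latest decision.
    A negative number means similar decisions are getting faster."""
    durations_by_category = defaultdict(list)
    for record in sorted(records, key=lambda r: r.raised_on):
        durations_by_category[record.category].append(days_to_decide(record))
    return {
        category: durations[-1] - durations[0]
        for category, durations in durations_by_category.items()
        if len(durations) >= 2
    }


log = [
    DecisionRecord("message queue", date(2022, 1, 10), date(2022, 2, 9)),
    DecisionRecord("message queue", date(2023, 3, 1), date(2023, 3, 11)),
    DecisionRecord("data store", date(2023, 5, 1), date(2023, 5, 8)),
]
print(decision_speed_trend(log))  # {'message queue': -20}
```

A category that appears only once, such as "data store" here, has nothing to compare against yet, so it is left out of the trend until a similar decision comes up again.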
The “Architecture” of a large or small system is, again, a compilation of technical decisions, each working with the others, enabling all users, including the builders and operators, to do the right thing without needing to consciously recall those decisions at any time. Software teams can embrace the real nature of architecting without falling prey to analysis paralysis by optimising the way they make technical decisions, and thereby the speed of each decision. They do that by being aware of the blast-radius of a decision, knowing the cost of reversing its outcome, incorporating constraints that make the decision process faster, and setting the north star of their architecture as increasing the speed of future decisions.