merge-conflict:~/managing-risk$

Managing Risk

The further I’ve gone in my career in software development, the more I’ve recognised that a large part of our job is about managing risk. There are two distinct strategies for dealing with these risks. One camp prefers to be cautious and avoid or restrict change as much as possible, while the other camp expects change and uses practices and tools to allow those changes to occur as easily as possible.

Both strategies acknowledge the risks but deal with them differently. The first camp fears the risks and minimises them by minimising change. I was in a software safety certification meeting last week with an accreditation body, and the recommendations they provide come from a document written in 1990. Every minor change incurred a huge waterfall-like process with reports and handovers between each stage - coding did not start until several layers of approval had been given.

The modern ‘agile’ approach appreciates the risk that exists, but instead of avoiding it with bureaucracy it actively comes up with strategies to minimise it. The best way to come up with these strategies is to take this piece of unorthodox advice: when something is painful, do it more often - this will force you to make it better.

In the rest of this post I’ll go through some cases I’ve come across in my career, and I will firmly be taking the ‘agile’ approach to solving these problems.

Developer Mistakes

Developers make mistakes like any human does. I’ve been in some environments that don’t seem to recognise this, or that prefer to label these mistakes as incompetence or a lack of skill. It’s extremely easy to look at a mistake, conclude that the developer using/changing the code is entirely at fault, and move on. A better approach is to look a little deeper - is the code difficult to work with? Can we make it easier to work with?

As a system grows, adding new features or modifying existing ones can become more and more difficult because the number of things we have to understand grows. But there are strategies to deal with this.

An example system I once worked on suffered from this an awful lot. The developers were used to copy/pasting code all over the system rather than building reusable pieces of code. The system had to notify users via email when certain events occurred, and there were lots of possible events. Generating an email involved creating the email record and inserting it into a database table (where the raw SQL was unfortunately duplicated many times over), but there also had to be some logic that checked you weren’t notifying the user too many times. This extra logic to check for duplicates ended up being the problem piece of code, as each author had done it slightly differently.

Once the system was deployed to production it became apparent that thousands of emails were being spammed out for these notifications because this duplication logic had been implemented incorrectly. It turned out the problem occurred in many different parts of the system.

One of the criticisms was that this hadn’t been tested well enough, which is correct. But the answer certainly wasn’t to ‘test more’ - don’t accept that. The problem was how the code was designed: it required each and every developer on that team to solve the problem of sending an email and checking for duplicates themselves. On top of their own feature, it also required them to test this logic, which inevitably got missed or was assumed to work because it was a copy/paste from elsewhere.

The beauty of software is that we don’t have to build everything from scratch. We can build our own abstractions to hide how something works. If done well, it allows us to ignore how it works almost entirely.

In this system it would have been hugely beneficial for there to have been an interface that made the job of sending an email so easy that you barely had to think about it. The more something is used within your system, the easier it should be to use and reuse.
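As a rough illustration of what such an interface might look like (sketched here in TypeScript, with entirely hypothetical names - the real system’s details aren’t mine to reproduce), the point is that the duplicate check lives in exactly one place instead of being reimplemented by every feature author:

```typescript
// A notification email: recipient, subject and body.
type Email = { to: string; subject: string; body: string };

// One shared entry point for sending notification emails. Because every
// feature goes through send(), the duplicate check is written (and tested)
// exactly once.
class EmailService {
  private sent: Email[] = [];

  // Sends the email unless an identical notification has already gone to
  // this recipient (the real dedup rule would be time-windowed; this is
  // deliberately simplified).
  send(email: Email): boolean {
    const isDuplicate = this.sent.some(
      (e) => e.to === email.to && e.subject === email.subject
    );
    if (isDuplicate) return false; // suppress the repeat notification
    this.sent.push(email); // in the real system: insert into the email table
    return true;
  }
}

const service = new EmailService();
const first = service.send({ to: "a@example.com", subject: "Order shipped", body: "…" });
const second = service.send({ to: "a@example.com", subject: "Order shipped", body: "…" });
console.log(first, second); // true false
```

Callers never see the SQL or the dedup logic; they just ask for an email to be sent.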

The argument I’ve heard in response to building abstractions is that ‘the wrong abstraction is worse than no abstraction’ - I’m not sure I entirely agree with that. If I’ve got the wrong abstraction, but it’s been written cleanly with a set of well-written tests, then I can make it become the right abstraction. Building abstractions is a skill and you will get it wrong - use strategies that let you get it wrong at minimal cost.

Coupling with Libraries/Frameworks

Sticking with the email theme, another project I reviewed was using a specific email API library to send emails. This library ended up being a dependency of a huge part of the application, as emails were needed all across it.

What could be the problem with this?

It can be incredibly tempting to use a library directly across your app, especially if it’s nice to use - but you shouldn’t. Libraries can be abandoned, or you might need to switch to a new library with a completely different API. How do you unit test with these libraries? Would you mock a piece of code you don’t own?

While in the short term using one of these libraries may never cause considerable cost, they can become maintenance nightmares. They might stop you upgrading to the latest SDK you wish to use, or a new version of the package may change the API or behaviour considerably. In my opinion this risk isn’t worth it compared to the slight cost of making sure a specific library isn’t coupled to large parts of your codebase.

We can get around this problem by thinking less about the immediate problem of ‘I need to send an email to notify the user’. Let’s think about the ‘what’ rather than the ‘how’ - the ‘what’ is a notification. The email in this case is the ‘how’, and the interfaces we design should focus entirely on the ‘what’ and not the ‘how’.

We might design an interface like this:

interface INotifier {
	Notify(NotifyModel model);
}

By coupling the rest of our system only to this interface, the library we’re using needs to be a dependency of just the single module that implements INotifier, allowing us to change or replace it at any time without touching any other module in the system.

A lot of people like to call these ‘wrappers’, but they’re not simply wrappers that copy the exact API of the library you’re using. These interfaces should be designed entirely around the domain, which may lead you towards an API completely different from the one the library provides. These libraries often have to support many different use cases, so their APIs can be complicated; by narrowing the API to your domain you often make the code far easier to use too.
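To make the idea concrete, here’s a minimal sketch in TypeScript. The vendor client and all of the names here are hypothetical stand-ins, not any real library’s API; the shape that matters is that only one class ever touches the vendor code:

```typescript
// The domain-facing interface: it speaks in notifications, not emails.
interface Notifier {
  notify(userId: string, message: string): void;
}

// An imagined third-party email client, standing in for whatever library
// you actually use. Nothing outside the adapter below depends on it.
class SomeVendorEmailClient {
  outbox: { to: string; subject: string; html: string }[] = [];
  sendMail(opts: { to: string; subject: string; html: string }): void {
    this.outbox.push(opts); // a real client would call the vendor's API here
  }
}

// The single module that adapts the domain call to the vendor's API.
// Swapping vendors means rewriting this one class, nothing else.
class EmailNotifier implements Notifier {
  constructor(
    private client: SomeVendorEmailClient,
    private lookupAddress: (userId: string) => string
  ) {}

  notify(userId: string, message: string): void {
    this.client.sendMail({
      to: this.lookupAddress(userId),
      subject: "Notification",
      html: `<p>${message}</p>`,
    });
  }
}

// The rest of the system only ever sees Notifier.
const notifier: Notifier = new EmailNotifier(
  new SomeVendorEmailClient(),
  () => "user@example.com"
);
notifier.notify("user-42", "Your report is ready");
```

Notice the interface doesn’t even mention email addresses: translating a user ID into a recipient is the adapter’s job, which is exactly the kind of domain narrowing a copy-the-library wrapper wouldn’t give you.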

Changing Requirements

While requirements that change too frequently can become very disruptive to our productivity as developers, we should acknowledge that we can’t eradicate them entirely. In fact, we should try and welcome them.

Requirements that change too frequently risk excessive rework; requirements changes made too difficult risk people becoming reluctant to give any kind of feedback, either for fear of the dreaded words ‘change request’ or simply because they know nothing will change anyway. This risk can accumulate until you’ve built a system the client doesn’t actually want.

I remember a client once asking if they could change the text in one of the email templates, and the response was that they might need a change request to do that, which I found rather funny. Bureaucracy has a magical way of turning a two-minute job into a two-hour job.

Another point is that on every software project I’ve worked on, features are rarely specified completely and correctly upfront. Typically there will be changes, whether we like it or not.

So, how do we write code that allows us to be open and take on new feedback and potential changes in requirements? This is where we need to think about topics like coupling and cohesion while using practices like TDD. We can never guarantee that we can change any part of the software system easily, but we can increase the probability that any future change is easier to make.

An important point worth mentioning is that you don’t design a flexible system by throwing in design patterns or trying to anticipate the changes upfront. You will likely end up making the system very easy to change in one respect but very difficult in another. Flexibility such as configuration options and design patterns comes at a cost; it isn’t free, so it’s important that we only add this kind of flexibility when it’s needed. By practising TDD we can increase our confidence that, when we do refactor a system to add flexibility, the code still works.

I can’t give a ‘magic rule’ for how to do this. I admittedly still get it wrong sometimes. When you do get it wrong, try to understand why the code is difficult to change - is it the architectural design? Is it coupling? Is it a set of tests that are tightly coupled to the implementation? Then see what small thing you can do to at least make the situation better.
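On that last point about tests coupled to the implementation: one way to keep tests pointed at behaviour is to test through a domain interface with a hand-rolled fake. A minimal sketch, again in TypeScript with hypothetical names (a `Notifier` interface and an imagined `closeAccount` piece of domain logic):

```typescript
// The domain interface the production code depends on.
interface Notifier {
  notify(userId: string, message: string): void;
}

// A hand-rolled fake: it just records what was asked of it, so the test
// asserts on behaviour, not on how the real notifier is implemented.
class FakeNotifier implements Notifier {
  received: { userId: string; message: string }[] = [];
  notify(userId: string, message: string): void {
    this.received.push({ userId, message });
  }
}

// Imagined domain logic under test.
function closeAccount(userId: string, notifier: Notifier): void {
  // ...real account-closing work would happen here...
  notifier.notify(userId, "Your account has been closed");
}

const fake = new FakeNotifier();
closeAccount("user-7", fake);
console.log(fake.received.length); // 1
```

Because the test only knows about the interface, you can refactor or replace the real notifier freely without the test suite fighting you.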

Releasing our Software

Releasing software used to be a ‘big event’ that happened infrequently. I remember the days of a ‘code freeze’, where developers weren’t allowed to commit to the mainline except in extreme circumstances with approval from the QA team, to minimise the amount of change and thus the risk of things breaking. This was when we had a team of manual QA testers who couldn’t possibly test quickly enough to verify every change in time before a release.

The problem with these infrequent software releases is that they tend to be very big, containing lots of changes. That means more testing, and a bigger chance that things go wrong in production. When databases get involved, it can make rollbacks very difficult.

Let’s go back to that piece of advice I mentioned earlier: when something is painful, do it more often.

It’s now becoming much more common to release frequently, even as often as every hour. By releasing frequently you minimise the number of changes in each release, and you get feedback from production much sooner than you otherwise would. By keeping the changes small you can focus your testing, and the chances of disrupting the whole application are much lower.

Some teams are so confident in this that every commit is immediately deployed to production - this is called continuous deployment.

Tackling Risk

I’ve gone over lots of different strategies for dealing with risk across the software development lifecycle without compromising our ability to deliver. Traditionally, the industry has gone the opposite way, adding more and more layers of process and bureaucracy that slow development to a crawl and make changes incredibly expensive.

It’s not always about stopping mistakes happening entirely, but about minimising the impact and cost of any mistakes that do happen. Bureaucracy looks wonderfully attractive, and in some systems it may be required to some extent, but it’s all too easy for departments to drown in processes and rules that end up crippling their ability to compete - a pretty big risk, I’d say.