Across all the years I have worked, first as a developer and now as an advisor on critical matters, there is one thing I have realized:
a lot of things can go wrong at any time.
Recently, for example, a top athlete posted our store link in his Instagram story and, boom, the site went down, reportedly for hours.
It was disappointing, not just for the business owner but also for the engineers. The athlete is paid a hefty fee precisely because they will bring in a new crowd and, ultimately, more customers for the business. Yet when they actually did so, the website did not hold up and we were unable to convert those visitors into customers.
That’s just one scenario.
Here is another one. You deployed new features, just another week, and all went smoothly. You wake up the next day to several reports from customers that they no longer have the option to skip their subscription for this week, so they either got charged for something they did not want or ended up cancelling the subscription altogether. Either way it is a loss: in user experience and reliability in the first case, and in subscribers in the second.
Alright, one more!
Out of nowhere, we were told that the marketing ads were not working as expected. Sales were down, we were spending too much of the ads budget, and the business was taking a hit.
I can go on and on with the list.
But I wonder: is there a pattern in all of these events?
Are there well-known engineering processes that can help avoid these situations or, if not avoid them, at least give us, the engineering team, early warning signs so that we can proactively mitigate the issues before the fire spreads uncontrollably?
I could also reverse engineer this: pick one scenario, trace it to the root cause, and build a process around it, then keep doing that for every scenario I have come across. But I am afraid that:
- the process will be tied too closely to these scenarios, and if a slightly different scenario pops up, the chance of failure is high
- these scenarios are just a few of the thousand cases that can happen, and I would have to wait for the others to occur first
I may also be overthinking this, so it’s a dilemma: should I go ahead with this approach because it is the only way, or should I research whether there is a better one?