Will AI Kill DevOps Troubleshooting? Really?
Wild 2 AM calls are the battlefields of creative problem-solving.
I have been a part of multiple 2 AM production outage calls.
Not that I love them, but they are interesting and sometimes funny.
Everyone on those calls suddenly becomes a world-class detective (take that on a lighter note). And some people, the crazier ones, even go wild. Wild as if they just spotted a dinosaur in the server room 😅
Server room - Now that reminds me of something.
I spent the first 4 years of my professional career in data centers. I’ve managed servers in environments where even the printers had attitude. That was 15 years ago though.
But coming back to the main point, only good troubleshooting can help you in such production outage scenarios.
Because DevOps troubleshooting is an art.
It’s the science of diagnosing issues.
It’s the SOUL of DevOps.
Now, is Troubleshooting Getting Killed by AI?
Let’s step back and look at the world as it is, and as it’s becoming.
Automation, AIOps, and Self-Healing Systems:
We all know that the landscape is shifting. AI tools will get better at finding issues, better at predicting failures, and even at auto-fixing things before we get involved.
Looks like the dinosaur is ready to show up one day 🦖
It’s easy to imagine a future where human troubleshooting becomes obsolete. A museum piece. A portable radio. But is that really where we are heading?
I think the answer is NO.
And here’s why:
Each time AI automates, say, one class of problems, new and more interesting problems emerge, because distributed systems, microservices, and cloud-native architectures are complex.
When something goes wrong, it’s not always a matter of 'turn it off and on again.' It’s more about tracing the butterfly effect across all the services, networks, and other dependencies.
You see, the butterfly must first be captured, gently but quickly, and carefully nourished before we can truly begin to understand its mysteries. It's all about finding the root cause that cascades across services and their dependencies.
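To make the butterfly hunt a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the service names, the `DEPENDS_ON` map, and the `root_cause_candidates` helper are hypothetical, not taken from any real tool. The idea is simple: among the services that are alerting, the likely root cause is one whose own dependencies are all healthy.

```python
# Hypothetical service map (an assumption for illustration):
# each service lists the services it depends on.
DEPENDS_ON = {
    "checkout": ["payments", "cart"],
    "payments": ["db"],
    "cart": ["cache"],
    "db": [],
    "cache": [],
}


def root_cause_candidates(alerting, depends_on):
    """Return the alerting services that have no alerting dependency.

    A service that is failing while everything it depends on looks
    healthy is a reasonable first suspect for the root cause.
    """
    alerting = set(alerting)
    return sorted(
        svc
        for svc in alerting
        if not any(dep in alerting for dep in depends_on.get(svc, []))
    )


# checkout, payments, and db are all alerting at 2 AM.
candidates = root_cause_candidates({"checkout", "payments", "db"}, DEPENDS_ON)
print(candidates)  # ['db']
```

Walking the graph is the easy part. The sketch only works if the dependency map is accurate and the alerts are trustworthy, and in a real outage that is rarely a given.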
And it’s not that easy for AI.
At least for now.
Why?
Because to do that, AI systems need something very important.
CONTEXT
And so …
AI Systems Need Human Friends: Us
AI can fix some issues, no questions there.
It can point to the “what” part very well, but not always the “why” and “how” parts.
And that’s why I believe our experience, pattern recognition, and domain expertise remain irreplaceable, especially when the unexpected happens.
One more thing. Troubleshooting is also about communication.
Explaining problems to stakeholders, collaborating with teammates (our friends, the human ones), and learning from incidents: these are human skills, OUR OWN CORE SKILLS. They are not going away.
DevOps is not a glamorous job. But surely, it needs a human in the loop. The tools will evolve, but I strongly believe that the need for curious, creative problem-solvers will not disappear.
At least in my eyes, and I hope, in your heart too. Tell me, what do you think?
That's it for today. If you liked this post, please share it. More long-form content is coming next. DevOps is my passion. If that resonates with you, please consider subscribing. Thank you.