In distributed IT systems, errors are inevitable not only in application functionality but also in the infrastructure. Having moved to cloud services, businesses no longer fully control their infrastructure and must therefore adapt their applications to handle “unforeseen” problems.
At DevOps Velvon, we increase IT resilience through chaos engineering. We verify how the system behaves in problematic situations that typically occur in the production environment (undelivered messages, network outages, long latencies). We had to redefine availability metrics and adjust our perception of “normal” system behavior.
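To make the idea concrete, here is a minimal sketch of fault injection in Python. It is not the tooling used at DevOps Velvon; every name in it (`with_chaos`, `call_with_retry`, `measure_success_rate`, `InjectedFault`) is hypothetical and chosen only for illustration. It wraps a call so that it is randomly delayed or dropped, then measures how often the call still succeeds with and without a simple retry.

```python
import random
import time


class InjectedFault(Exception):
    """Raised when the wrapper simulates an undelivered message."""


def with_chaos(func, drop_rate=0.2, max_delay_s=0.0, rng=None):
    """Wrap func so that calls are randomly delayed or dropped (hypothetical helper)."""
    if rng is None:
        rng = random.Random()

    def wrapper(*args, **kwargs):
        if max_delay_s > 0:
            # Simulated network latency before the call goes through.
            time.sleep(rng.uniform(0, max_delay_s))
        if rng.random() < drop_rate:
            # Simulated undelivered message.
            raise InjectedFault("simulated undelivered message")
        return func(*args, **kwargs)

    return wrapper


def call_with_retry(func, attempts=3):
    """The resilience measure under test: retry when a fault is injected."""
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except InjectedFault as err:
            last_error = err
    raise last_error


def measure_success_rate(func, trials=1000):
    """Fraction of trials in which func completes without an injected fault."""
    succeeded = 0
    for _ in range(trials):
        try:
            func()
            succeeded += 1
        except InjectedFault:
            pass
    return succeeded / trials


if __name__ == "__main__":
    flaky_send = with_chaos(lambda: "delivered", drop_rate=0.3,
                            rng=random.Random(42))
    print("without retry:", measure_success_rate(flaky_send))
    print("with retry:   ",
          measure_success_rate(lambda: call_with_retry(flaky_send)))
```

Comparing the two success rates is one simple way to express availability as the caller observes it rather than as raw uptime of the dependency.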
We will present how we approached chaos engineering and show how we apply it in practice.