PyConMY 2025

Python for Chaos Engineering: Observability During Failure Scenarios
2025-11-02, Hall 1

Chaos Engineering is a discipline that tests system resilience by injecting controlled failures into distributed systems. Python's ecosystem of libraries makes it well suited to automating chaos experiments and to observing systems while they fail. This presentation explores how Python can be used to simulate failures in cloud-native applications, monitor system behavior, and collect observability data such as logs, metrics, and traces. By integrating Python with tools like OpenTelemetry, Prometheus, and Grafana, developers can find system weaknesses and improve reliability. The session walks through practical examples of Python scripts for chaos experiments and observability in modern infrastructures.


Introduction (5 min)

What is Chaos Engineering?
Importance of observability during failures.
Role of Python in automating chaos experiments.

Chaos Engineering Basics (10 min)

Principles of Chaos Engineering.
Common failure scenarios in distributed systems.
Overview of Chaos Engineering tools and Python integration.

Observability Overview (10 min)

Three pillars: logs, metrics, and traces.
Tools for observability: OpenTelemetry, Prometheus, Grafana.
Why observability is critical during chaos experiments.

Python for Chaos Engineering (15 min)

Simulating failures with Python (chaoslib, Kubernetes integration).
Automating chaos experiments with Python scripts.
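
As a rough illustration of what an automated chaos experiment involves, the sketch below runs the usual loop (measure a baseline, inject a failure, measure again, roll back) against a toy in-process service. It uses only the standard library and is not the chaoslib API; `FlakyService`, `inject_failures`, and `run_experiment` are hypothetical names invented for this example, and a real experiment would target actual infrastructure such as Kubernetes pods.

```python
import random
from contextlib import contextmanager


class FlakyService:
    """Toy stand-in for a remote dependency (hypothetical, for illustration)."""

    def __init__(self):
        self.failure_rate = 0.0  # probability that a single call fails

    def call(self, rng):
        if rng.random() < self.failure_rate:
            raise ConnectionError("injected failure")
        return "ok"


def call_with_retry(service, rng, attempts=3):
    """Client-side resilience under test: retry a few times, then give up."""
    for _ in range(attempts):
        try:
            return service.call(rng)
        except ConnectionError:
            continue
    return None


@contextmanager
def inject_failures(service, rate):
    """Raise the failure rate for the duration of the block, then roll back."""
    previous = service.failure_rate
    service.failure_rate = rate
    try:
        yield
    finally:
        service.failure_rate = previous


def success_ratio(service, rng, requests=200):
    ok = sum(call_with_retry(service, rng) == "ok" for _ in range(requests))
    return ok / requests


def run_experiment(seed=0, rate=0.3, tolerance=0.9):
    """Steady-state hypothesis: the success ratio stays at or above `tolerance`."""
    rng = random.Random(seed)
    service = FlakyService()
    baseline = success_ratio(service, rng)
    with inject_failures(service, rate):
        during = success_ratio(service, rng)
    after = success_ratio(service, rng)  # confirms the rollback took effect
    return {
        "baseline": baseline,
        "during": during,
        "after": after,
        "hypothesis_held": during >= tolerance,
    }


if __name__ == "__main__":
    print(run_experiment())
```

The context manager guarantees the rollback runs even if the measurement raises, which is the property that matters most when the same pattern is pointed at a real system.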

Python for Observability (15 min)

Collecting logs, metrics, and traces with Python.
Visualizing failure impact using Prometheus and Grafana APIs.
Tracing microservices with OpenTelemetry.
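
To make the three signals concrete, here is a small standard-library sketch that emits a log line, a counter, and a span record for each operation, and renders the counter in the Prometheus text exposition format. It is a stand-in for illustration only: it does not use the OpenTelemetry SDK or a Prometheus client library, and the names `span`, `render_metrics`, and `demo_requests_total` are assumptions made up for this example.

```python
import logging
import time
from collections import Counter
from contextlib import contextmanager

logger = logging.getLogger("chaos.demo")
request_total = Counter()  # metric: (operation, outcome) -> count
spans = []                 # trace: list of finished span records


@contextmanager
def span(name, trace_id):
    """Record duration and outcome of a block as a span, a metric, and a log."""
    start = time.perf_counter()
    outcome = "ok"
    try:
        yield
    except Exception:
        outcome = "error"
        logger.exception("span %s failed (trace_id=%s)", name, trace_id)
        raise
    finally:
        spans.append({
            "trace_id": trace_id,
            "name": name,
            "outcome": outcome,
            "duration_s": time.perf_counter() - start,
        })
        request_total[(name, outcome)] += 1


def render_metrics():
    """Render the counter as Prometheus text exposition lines."""
    lines = ["# TYPE demo_requests_total counter"]
    for (operation, outcome), count in sorted(request_total.items()):
        lines.append(
            f'demo_requests_total{{operation="{operation}",outcome="{outcome}"}} {count}'
        )
    return "\n".join(lines)


# One healthy call and one call that hits an injected failure.
with span("checkout", trace_id="t1"):
    pass
try:
    with span("checkout", trace_id="t2"):
        raise TimeoutError("injected latency fault")
except TimeoutError:
    pass

print(render_metrics())
```

Sharing a `trace_id` between the log line and the span record is what lets a failure seen in a dashboard be traced back to the request that caused it.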

Case Study (10 min)

Real-world example: Injecting failures in Kubernetes.
Monitoring system behavior with Python.
Analyzing results to identify weaknesses.
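
One simple way to analyze results is to compare a latency percentile from the baseline run with the same percentile during the failure. The sketch below does this with the standard library; the function names, the choice of p95, and the `max_ratio` threshold are illustrative assumptions, not values taken from the talk.

```python
from statistics import quantiles


def p95(samples):
    """95th percentile: the 19th of the 19 cut points that split data into 20 parts."""
    return quantiles(samples, n=20)[18]


def compare_runs(baseline_ms, chaos_ms, max_ratio=2.0):
    """Flag a weakness when p95 latency under failure exceeds max_ratio x baseline."""
    base = p95(baseline_ms)
    chaos = p95(chaos_ms)
    ratio = chaos / base
    return {
        "baseline_p95_ms": base,
        "chaos_p95_ms": chaos,
        "ratio": ratio,
        "weakness": ratio > max_ratio,
    }


if __name__ == "__main__":
    baseline = [10] * 20                # steady latency before the experiment
    chaos = [10] * 10 + [100] * 10      # half the requests slow down under failure
    print(compare_runs(baseline, chaos))
```

In practice the samples would come from the metrics backend rather than hard-coded lists, and the threshold would be derived from the service's latency objective.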

Best Practices & Q&A (10 min)

Best practices for combining Chaos Engineering and observability.
Lessons learned and common pitfalls.

Saurabh is a Cloud Architect with a deep passion for DevOps and automation. Saurabh actively engages with the tech community, sharing insights on cloud-native technologies, security, and multi-cloud strategies. As a speaker at various conferences, meetups, and workshops, Saurabh helps teams enhance their cloud adoption and optimization efforts.