Saving a sinking ship
08-24, 13:45–14:25 (Asia/Kuala_Lumpur), JC 1

After joining a new company, Ivan was assigned to a legacy project: a Python task server providing raw data from various sources to multiple upstreams. Things were broken and unstable. How did it end up that way?

Technical debt, lack of visibility, out-of-memory issues, spaghetti code, inconsistent logging and error handling, and more: this is what Ivan was facing. Hear how he and a team of Data DevOps and Data Engineers dealt with these issues.


Ivan worked on a project built around Celery with multiple upstream channels. Many things were unstable, and he and the team spent much of their time firefighting issues caused by technical debt and unreliable upstreams. Some of the issues they faced:

  • multi-step, manual deployments to production
  • no visibility into task runs
  • out-of-memory issues
  • processing data larger than memory
  • spaghetti code from append-only copy-pasting
  • inconsistent logging and error handling
  • handling high-traffic UDP output
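
One common way to tame inconsistent logging and error handling in a task server like this is to centralise both in a single wrapper that every task goes through. The sketch below is illustrative only, not taken from the talk; `logged_task` and `parse_feed` are hypothetical names:

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("tasks")

def logged_task(func):
    """Wrap a task so every run logs its start, success, and failure uniformly."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("task %s started", func.__name__)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # exception() logs the full traceback, then we re-raise so the
            # task framework still sees the failure
            logger.exception("task %s failed", func.__name__)
            raise
        logger.info("task %s finished", func.__name__)
        return result
    return wrapper

@logged_task
def parse_feed(payload):
    # placeholder for real parsing work on an upstream payload
    return payload.upper()
```

With a wrapper like this, every task emits the same start/finish/failure lines, so log aggregation and alerting no longer depend on each task author remembering to log.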

How did a team of Data DevOps and Data Engineers go about fixing all of this? Saving the sinking ship!

Ivan likes volunteering in communities such as Rust Malaysia and PyCon MY. He used to be a Helix maintainer.