PyConMY 2025

Building Polaris: A Data Discovery and Metadata Management Agent
2025-11-01, Hall 1

In a world drowning in data, finding the right dataset is a persistent challenge. Our data team solved this by building Polaris, a custom agent designed to automate data discovery and metadata management. In this talk, we'll walk you through how we built it, from the automations involved to our practical use of LLMs. Join us to learn how a homegrown system can solve a real-world problem, and hear the lessons we learned along the way.


In a world drowning in data, finding the right dataset at the right time is a persistent challenge. Our data team faced this head-on, spending countless hours manually cataloging and searching for data. The solution was Polaris, a custom-built agent designed to automate data discovery and metadata management.
In this talk, we will walk you through the journey of building Polaris. We'll explore its architecture, which combines a unified metadata index with a vector database for semantic search and a graph database for join discovery.
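
To make the two lookup styles concrete, here is a minimal sketch, not Polaris's actual implementation. It assumes a toy in-memory catalog: the table names, descriptions, and join columns are hypothetical, a bag-of-words cosine similarity stands in for a real embedding model and vector database, and a breadth-first search over a small adjacency map stands in for a graph database.

```python
import math
from collections import Counter, deque

# Hypothetical metadata index: table name -> description (illustrative only).
TABLES = {
    "orders": "customer orders with order date and total amount",
    "customers": "customer profile with name email and signup date",
    "payments": "payment transactions linked to orders",
}

# Hypothetical join edges: (table_a, table_b, join_column).
JOINS = [
    ("orders", "customers", "customer_id"),
    ("payments", "orders", "order_id"),
]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, k=2):
    """Return the k table names whose descriptions best match the query."""
    q = embed(query)
    ranked = sorted(
        TABLES, key=lambda name: cosine(q, embed(TABLES[name])), reverse=True
    )
    return ranked[:k]


def join_path(start, goal):
    """Breadth-first search for the shortest chain of joins between two tables."""
    graph = {}
    for a, b, col in JOINS:
        graph.setdefault(a, []).append((b, col))
        graph.setdefault(b, []).append((a, col))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, col in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, nxt, col)]))
    return None  # no chain of joins connects the two tables
```

For example, `search("payment transactions", k=1)` ranks the hypothetical `payments` table first, and `join_path("payments", "customers")` returns the two-step chain through `orders`. A production system would presumably replace both stand-ins with real embedding and graph stores.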
We'll also cover how we use a Large Language Model (LLM) agent to automate key tasks such as generating table and column descriptions, and how this changes the user experience.
This is a story about solving a real-world business problem with a homegrown, agent-based system, and the lessons we learned about building robust, data-centric tools.