If you’re working in a hands-on data role using Snowflake, Databricks, or BigQuery, chances are you’ve encountered dbt as a companion technology. 🎉 On April 3rd, 2023, dbt Labs announced that Tropos.io had become one of its 5 premier partners worldwide.
Where the former technologies primarily focus on storing and processing data, dbt (lowercase intended) excels at defining and documenting the exact transformations applied to that data. It also excels at managing changes to the metrics and derivations built on top of it.
How did this happen?
Even though dbt is ubiquitous in data land nowadays, it’s a relatively new technology. Before it, traditional ETL tools (technology designed to transport and transform data) reigned supreme. When we became a Snowflake partner in 2017, our first Snowflake project built entirely on our own was orchestrated using one of these tools too. Since none of these technologies were natively built for the cloud, strange things started to happen, and we couldn’t wrap our heads around combining Snowflake’s performance with a stubborn model of graphical process design.
In search of something better, we found the GitHub repository of an obscure, open-source, template-driven data transformation tool, maintained by a company of 15 and reachable through a wildly enthusiastic Slack community.
We were even smaller back then and ambitious to master Snowflake, so we started tinkering. One day in 2018, that “aha” moment came: we recognized that dbt could become the formal structure of every project we wanted to take on. Luckily, we still have a picture of that meeting.
Ever since, we have spent a good part of our engineering time building tools and processes to scale data platforms across teams, regions, and sometimes continents.
We ❤️ dbt because it gives us control
dbt is a technology that allows an analyst or engineer to express their thoughts using code, mainly SQL code. It offers an accessible way to express declarative analytical thinking without relying too heavily on the overhead of formal software engineering processes, a practice that was wildly popular throughout the 2010s. Even though that practice came with strong merits, such as change management and structured testing, it was an expensive activity reserved for highly specialized engineering staff.
Yet the power of plain SQL is limited to expressing a single analysis for a single engineer. Scaling it across teams, let alone departments, comes with governance challenges, and executing it and managing it at scale often requires custom software engineering.
dbt fills this gap. It gives the analyst or engineer access to the advantages of software engineering whilst keeping the learning curve relatively flat.
It is both surprisingly simple and ingeniously engineered. At a time when skilled data staff are scarce, dbt allows more people to participate in the analytics process.
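Much of the engineering dbt takes off an analyst’s plate is execution order: models select from each other through Jinja’s `ref()` function, and dbt builds a dependency graph from those references so that each model runs after everything it depends on. The Python sketch below imitates that idea, assuming three hypothetical models and a simplified regular expression in place of dbt’s real Jinja parsing; it illustrates the concept and is not dbt’s implementation.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models; in a real dbt project each would live in its own .sql file under models/.
MODELS = {
    "stg_orders": "select * from {{ source('shop', 'orders') }}",
    "stg_customers": "select * from {{ source('shop', 'customers') }}",
    "customer_orders": """
        select c.customer_id, count(o.order_id) as order_count
        from {{ ref('stg_customers') }} as c
        left join {{ ref('stg_orders') }} as o using (customer_id)
        group by 1
    """,
}

# Matches single-argument {{ ref('name') }} calls only; dbt's real Jinja parsing handles far more.
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"]([^'"]+)['"]\s*\)\s*\}\}""")


def dependencies(models: dict[str, str]) -> dict[str, set[str]]:
    """Map each model name to the set of models it selects from via ref()."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def build_order(models: dict[str, str]) -> list[str]:
    """Return a run order in which every model comes after all of its upstream models."""
    graph = dependencies(models)
    unknown = set().union(*graph.values()) - set(graph)
    if unknown:
        raise ValueError(f"ref() to unknown model(s): {sorted(unknown)}")
    # static_order() raises graphlib.CycleError if models reference each other in a loop.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, model in enumerate(build_order(MODELS), start=1):
        print(step, model)
```

Because the order is derived from the SQL itself, adding a model does not require editing a separate orchestration job, which is a large part of why this approach scales beyond a single engineer’s queries.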
🔎 Scaling up data teams in large organizations
Even though dbt can be learned with relative ease, scaling it up across organizations can be a challenge. The freedom analysts and engineers enjoy can quickly produce an opaque spaghetti of SQL and turn the benefits into pure technical debt in no time.
These days, customers ask us either to prevent this from happening or to bring their practice back up to par. Apart from providing a solid analytics engineering team (analytics engineering being the term coined by dbt Labs for the activity of writing dbt code), much of our work revolves around designing processes and practices to control how data analytics on Snowflake propagates throughout organizations.
Think of change management, good practices, release management, team design, and specific data transformations, but also of ecosystem design and integration. Because a dbt project is developed as plain-text code, third-party technologies can build additional governance processes on top of the integration processes that dbt itself manages.
Some of those technologies are built in-house at Tropos, such as:
- dbt governor, our engine for starting new projects from scratch based on best practices and for enforcing control
- Rehearsal, our framework to test assumptions within data quality frameworks
- dbt mask, another framework to manage RBAC (role-based access control) in Snowflake
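The point about plain-text code can be made concrete. When dbt compiles a project, it writes a `target/manifest.json` file describing every model, test, and dependency, which is exactly the kind of artifact an external governance tool can read. The Python sketch below assumes the `nodes`, `resource_type`, `description`, and `depends_on` fields found in recent dbt manifests; the two rules it enforces (every model needs a description and at least one test), the function names, and the sample manifest are hypothetical, and this is not how dbt governor or any other Tropos tool works.

```python
import json
from pathlib import Path

# Trimmed, hypothetical stand-in for target/manifest.json; real manifests carry many more fields.
SAMPLE_MANIFEST = {
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "description": "One row per order.",
            "depends_on": {"nodes": ["source.shop.shop.orders"]},
        },
        "model.shop.customer_orders": {
            "resource_type": "model",
            "description": "",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
        "test.shop.not_null_stg_orders_order_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
    }
}


def governance_violations(manifest: dict) -> list[str]:
    """Apply two hypothetical house rules: every model is documented and tested."""
    nodes = manifest.get("nodes", {})
    # Any node that at least one test depends on counts as tested.
    tested = {
        upstream
        for node in nodes.values()
        if node.get("resource_type") == "test"
        for upstream in node.get("depends_on", {}).get("nodes", [])
    }
    violations = []
    for unique_id, node in sorted(nodes.items()):
        if node.get("resource_type") != "model":
            continue
        if not (node.get("description") or "").strip():
            violations.append(f"{unique_id}: missing description")
        if unique_id not in tested:
            violations.append(f"{unique_id}: no tests")
    return violations


def check_manifest(path: str = "target/manifest.json") -> list[str]:
    """Read the manifest dbt writes into target/ and apply the rules above."""
    return governance_violations(json.loads(Path(path).read_text()))


if __name__ == "__main__":
    for problem in governance_violations(SAMPLE_MANIFEST):
        print(problem)
```

Run as a CI step that fails whenever the list is non-empty, a check along these lines can stop undocumented or untested models before they reach production.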
Our future with dbt
Having built good practices, and learned from not-so-good ones, since 2018, we now find ourselves scaling up internationally in larger organizations: organizations facing exponential growth in data use cases that need to control technical debt, Snowflake cost, and team efficiency all at the same time. dbt is at the core of nearly every project we are working on at the moment, and that isn’t changing anytime soon!