TD · Labs

Notes from the workshop.

Experimental releases, engineering notes, and showcases of tools we’ve built or things we’ve learned shipping governed data systems for mid-market data-driven teams. Not marketing — the actual work.

Playbook · Data validation · 1 min read

Data validation in the lakehouse

Most data pipelines fail silently when a source schema drifts. dbt tests run AFTER the model is built: they catch the broken state, but they do not prevent it from being written. We wire Great Expectations in as the OSS validation engine on every engagement, with a clear-eyed view of where it shines, where it doesn't, what we are NOT doing after GX Cloud's May-2026 shutdown announcement, and which alternatives (Soda, Pandera, dbt-native tests, Elementary) we layer alongside it. The post includes the current GX 1.x Fluent-API code, the integration patterns that actually work in production, the real performance bottlenecks (with citations), the competitive landscape (GX vs Soda vs Pandera vs Anomalo vs Monte Carlo vs Bigeye), and the anti-patterns we audit in client engagements.

Read post →

Playbook · Reference engagement · 1 min read

From a fresh AWS account to dashboards and AI chat — a TwiceData engagement walked end-to-end.

The canonical TwiceData engagement: the customer starts with a freshly provisioned AWS account and the vendor's Postgres database. Twelve weeks later they have an Iceberg lakehouse on S3, dbt-modeled metrics, a governed semantic layer, Looker dashboards their team designs, AND an AI chat surface that answers natural-language questions on the same data. This post walks through every layer of the build: the choices, the tradeoffs, the seams between layers, and the day-91 handoff where you keep the keys.

Read post →

Deep dive · Dimensional modeling · 1 min read

Slowly Changing Dimensions — a diagnostic walkthrough of all eight types, the hybrids, and when to build your own.

Most data teams default to SCD Type 2 because it's the only pattern they remember from Kimball. There are eight types, three modern variants, and three hybrid systems — and the right one for your pipeline is determined by signals in your incoming data, not by tradition. This article walks the diagnostic loop end-to-end: identify the data pattern, identify the query need, pick the type (or compose a hybrid), implement in dbt + Iceberg. Every type gets its own worked example.

Read post →

Playbook · 1 min read

Programmatic prompts with DSPy — when vibes-based engineering runs out of road.

Most prompt engineering is informed guessing. DSPy treats prompts like programs: define a signature, build a dataset, pick an optimizer, ship the compiled artifact. Here is the loop we run, the optimizer trade-offs we actually face, and the signal-detection trick (emotion markers, escalation phrases) that quietly does most of the work.

Read post →

Manifesto · 1 min read

Introducing TD Labs.

Where we ship the experiments, half-baked tools, and engineering notes that don't fit on the product pages.

Read post →

Engineering · 1 min read

A governed ARR rollup in 47 lines of dbt.

The exact dbt model we drop into mid-market SaaS engagements — the four contract types it normalizes, the three tests that gate it, and the lineage hook that keeps it honest.

Read post →

Want us to build something like this for you?

If a post here describes work you wish existed in your own stack, that’s usually a good reason to start a conversation.