TD · Labs

Notes from the workshop.

Experimental releases, engineering notes, and showcases of tools we’ve built or things we’ve learned shipping governed data systems for mid-market data-driven teams. Not marketing — the actual work.

Series · 6 parts

The Pillars of AI

A plain-spoken field guide to using AI without being fooled by it — from what the model in front of you is actually doing, up to systems you can defend.

Start the series →

Pillars of AI · Part 4 · June 16, 2026 · 10 min read

Use AI as a tool, not a master.

We've covered what the model is (prediction), that it invents (hallucination), and where your data goes (deployment) — risks that live inside the machine. This pillar is about the risk you add: doing what the AI says without checking. Blind acceptance is the single most expensive habit in applied AI, and the fix isn't a better model. It's a human in the right place at the handoff.

Read post →

Pillars of AI · Part 6 · June 7, 2026 · 10 min read

AI you can't account for is a liability.

Five pillars cataloged how AI goes wrong — it predicts, invents, leaks, gets blindly trusted, and costs more than the sticker. This final pillar is about the aftermath: when a decision turns out wrong, can you reconstruct what the system did, explain why, and prove who was responsible? That's accountability, and it rests on three properties — reproducibility, explainability, traceability. It's what makes the other five survivable, and increasingly it's the law.

Read post →

Pillars of AI · Part 5 · June 7, 2026 · 15 min read

The API fee is the most visible part of an AI system.

The per-token fee is the number on the pricing page — the easiest cost to see, and almost never the largest line on the real invoice. This pillar opens the whole bill: what a token actually costs once you count the reasoning tokens you never see, why renting through an API or buying your own GPUs is the decision that moves the total most, and the hidden line items that turn a cheap demo into an expensive system. Budget for the whole system, not the call. (Prices are stamped with the month — a deliberate time capsule.)
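A rough sketch of the arithmetic behind that claim. The prices and token counts below are hypothetical placeholders, not quotes from any provider, and the sketch assumes hidden reasoning tokens are billed at the output rate, which you should confirm against your provider's pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in dollars. Values used below are hypothetical."""
    input_per_million: float
    output_per_million: float


def call_cost(pricing: Pricing, input_tokens: int,
              visible_output_tokens: int, reasoning_tokens: int = 0) -> float:
    """Cost of one call, assuming reasoning tokens bill at the output rate."""
    billed_output = visible_output_tokens + reasoning_tokens
    total = (input_tokens * pricing.input_per_million
             + billed_output * pricing.output_per_million)
    return total / 1_000_000


# Hypothetical prices: $2 per million input tokens, $8 per million output tokens.
HYPOTHETICAL = Pricing(input_per_million=2.00, output_per_million=8.00)

# What the pricing page suggests: 1,000 tokens in, 500 visible tokens out.
naive_cost = call_cost(HYPOTHETICAL, 1_000, 500)

# The same call once 4,000 unseen reasoning tokens are counted.
actual_cost = call_cost(HYPOTHETICAL, 1_000, 500, reasoning_tokens=4_000)
```

With these made-up numbers the call goes from $0.006 to $0.038, roughly six times the estimate built from visible tokens alone, and that is before any of the other line items on the bill.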

Read post →

Pillars of AI · Part 3 · June 6, 2026 · 13 min read

Is my data safe?

Pillar 2 told you to ground a model in your own verified data. But to do that, you have to give the model your data — and that opens a different risk. When people ask whether an AI is safe, the part actually in your control is simpler than the model: where does your data go, who can see it, and is it allowed to be there? That's not a question about the model. It's a deployment decision — cloud, local, or a deliberate hybrid — and it's the one place where a choice made up front closes the risk almost entirely.

Read post →

Pillars of AI · Part 2 · June 6, 2026 · 15 min read

Managing Hallucinations

In the first pillar we landed on a hard fact: a language model predicts the next word, and it optimizes for plausible, not true. A hallucination is what happens when those two come apart — and it isn't the machine breaking. It's the machine doing exactly what it does, in the same confident voice it uses for everything else. You don't patch that out. You contain it. This pillar is about how.

Read post →

Pillars of AI · Part 1 · June 5, 2026 · 16 min read

Most of AI isn't generative — and generative AI isn't thinking.

Since text-generating AI went mainstream, an entire industry has been using one word, AI, to mean one product. That collapse is expensive. It hides the half-century of techniques quietly running everything from fraud detection to global logistics, and it dresses up a next-word predictor as something that reasons. The first pillar of AI literacy is learning to see the field at its real size, and knowing what the model in front of you is actually doing when it answers.

Read post →

Playbook · Data validation · May 21, 2026 · 28 min read

Data validation in the lakehouse

Most data pipelines fail silently when a source schema drifts. dbt tests run AFTER the model: they catch the broken state, but they do not prevent it from being written. We wire Great Expectations as the OSS validation engine on every engagement, with a clear-eyed view of where it shines, where it doesn't, what we are NOT doing after GX Cloud's May-2026 shutdown announcement, and which alternatives (Soda, Pandera, dbt-native tests, Elementary) we layer alongside it. The post includes the current GX 1.x Fluent-API code, the integration patterns that actually work in production, the real performance bottlenecks (with citations), the competitive landscape (GX vs Soda vs Pandera vs Anomalo vs Monte Carlo vs Bigeye), and the anti-patterns we audit in client engagements.
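To make the "before the write, not after" distinction concrete, here is a minimal plain-Python sketch of a validation gate. It is a stand-in for illustration only: it does not use the Great Expectations API, and the schema, the column names, and the helpers `find_drift` and `write_if_valid` are all hypothetical.

```python
# Hypothetical contract for an incoming batch: column name -> expected type.
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "currency": str}


class SchemaDriftError(ValueError):
    """Raised when a batch does not match the expected schema."""


def find_drift(rows, expected):
    """Return human-readable problems; an empty list means the batch is clean."""
    problems = []
    for index, row in enumerate(rows):
        for column in sorted(expected.keys() - row.keys()):
            problems.append(f"row {index}: missing column {column!r}")
        for column in sorted(row.keys() - expected.keys()):
            problems.append(f"row {index}: unexpected column {column!r}")
        for column, expected_type in expected.items():
            if column in row and not isinstance(row[column], expected_type):
                actual = type(row[column]).__name__
                problems.append(
                    f"row {index}: {column!r} is {actual}, "
                    f"expected {expected_type.__name__}"
                )
    return problems


def write_if_valid(rows, expected, sink):
    """Validate first; only append to the sink when the batch is clean."""
    problems = find_drift(rows, expected)
    if problems:
        raise SchemaDriftError("; ".join(problems))
    sink.extend(rows)


good_batch = [{"order_id": 1, "amount": 9.5, "currency": "USD"}]
# The source renamed a column and started sending ids as strings.
drifted_batch = [{"order_id": "1", "amount": 9.5, "cur": "USD"}]

table = []  # stands in for the real table
write_if_valid(good_batch, EXPECTED_SCHEMA, table)
try:
    write_if_valid(drifted_batch, EXPECTED_SCHEMA, table)
except SchemaDriftError as error:
    drift_report = str(error)
```

The drifted batch never reaches `table`. A test that runs after the model has been built would instead find the bad rows already written.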

Read post →

Playbook · Reference engagement · May 21, 2026 · 29 min read

From a fresh AWS account to dashboards and AI chat — a TwiceData engagement walked end-to-end.

The canonical TwiceData engagement: the customer starts with a freshly provisioned AWS account and the vendor's Postgres database. Twelve weeks later they have an Iceberg lakehouse on S3, dbt-modeled metrics, a governed semantic layer, Looker dashboards their team designs, AND an AI chat surface that answers natural-language questions on the same data. This post walks through every layer of the build: the choices, the tradeoffs, the seams between layers, and the day-91 handoff where you keep the keys.

Read post →

Deep dive · Dimensional modeling · May 21, 2026 · 37 min read

Slowly Changing Dimensions — a diagnostic walkthrough of all eight types, the hybrids, and when to build your own.

Most data teams default to SCD Type 2 because it's the only pattern they remember from Kimball. There are eight types, three modern variants, and three hybrid systems — and the right one for your pipeline is determined by signals in your incoming data, not by tradition. This article walks the diagnostic loop end-to-end: identify the data pattern, identify the query need, pick the type (or compose a hybrid), implement in dbt + Iceberg. Every type gets its own worked example.
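For readers who need a reminder of what the default looks like, here is a minimal in-memory sketch of a Type 2 update: close the current row, then append a new one. It is plain Python rather than the dbt + Iceberg implementation the article describes, and `DimRow`, `apply_scd2`, and the `tier` attribute are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    customer_id: int
    tier: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(history: List[DimRow], customer_id: int,
               new_tier: str, as_of: date) -> List[DimRow]:
    """Return a new history with a Type 2 change applied for one customer."""
    current = next(
        (row for row in history
         if row.customer_id == customer_id and row.valid_to is None),
        None,
    )
    # Unchanged attribute: no new version is needed.
    if current is not None and current.tier == new_tier:
        return list(history)
    # Close the current version (if there is one), keep every other row as-is.
    updated = [replace(row, valid_to=as_of) if row is current else row
               for row in history]
    # Open the new current version.
    updated.append(DimRow(customer_id, new_tier, valid_from=as_of))
    return updated


history = [DimRow(1, "free", date(2026, 1, 1))]
history = apply_scd2(history, 1, "pro", date(2026, 3, 1))
```

After the call, the "free" row is closed on 2026-03-01 and a new open "pro" row starts on the same date, so a point-in-time query can still recover what the customer looked like in February.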

Read post →

Playbook · May 20, 2026 · 13 min read

Programmatic prompts with DSPy — when vibes-based engineering runs out of road.

Most prompt engineering is informed guessing. DSPy treats prompts like programs: define a signature, build a dataset, pick an optimizer, ship the compiled artifact. Here is the loop we run, the optimizer trade-offs we actually face, and the signal-detection trick (emotion markers, escalation phrases) that quietly does most of the work.

Read post →

Manifesto · May 19, 2026 · 2 min read

Introducing TD Labs.

Where we ship the experiments, half-baked tools, and engineering notes that don't fit on the product pages.

Read post →

Engineering · May 14, 2026 · 2 min read

A governed ARR rollup in 47 lines of dbt.

The exact dbt model we drop into mid-market SaaS engagements — the four contract types it normalizes, the three tests that gate it, and the lineage hook that keeps it honest.

Read post →

Want us to build something like this for you?

If a post here describes work you wish existed in your own stack, that’s usually a sign for a conversation.

Talk to solutions · See engagements