Text-to-SQL on Databricks Lakebase — Live Build for Data Engineering Leaders
When: Wed, May 27, 2026 · 12:00 p.m. EDT – 1:30 p.m. EDT
Where: Online via Microsoft Teams
Capacity: 100 seats
For Data Engineering leaders deciding whether to adopt Genie, build a custom text-to-SQL stack, or wire something in between.
90 minutes. Live build, not slides. Real workspace, real data, a real LLM call across an HTTP boundary you control.
What you'll see
I'll go from an empty Databricks workspace to a working text-to-SQL agent that:
- Joins live OLTP rows in Lakebase (managed Postgres) with pre-aggregated gold tables in Unity Catalog Delta — through Lakehouse Federation, in a single query.
- Generates SQL via a pluggable LLM endpoint — Databricks Model Serving, OpenAI-compatible APIs, or a self-hosted vLLM on a neo-cloud GPU — switched with one environment variable.
- Validates every SQL string before execution with a SELECT-only safety guardrail that catches the Databricks-specific destructive ops generic validators miss (OPTIMIZE, VACUUM, ZORDER, COPY).
- Is auditable end-to-end: one question = one LLM call, one SQL statement, one execution. No autonomous loops, no surprise bills.
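To make the guardrail idea concrete, here is a minimal sketch of a SELECT-only check. It is not the validator from the db-agent repo: the function name, the deny-list contents beyond the four operations named above, and the rejection messages are all assumptions for illustration.

```python
import re

# Assumed deny-list: the four Databricks operations named above plus
# common DML/DDL verbs. Extend it for your own workspace.
FORBIDDEN = frozenset({
    "OPTIMIZE", "VACUUM", "ZORDER", "COPY",
    "INSERT", "UPDATE", "DELETE", "MERGE", "TRUNCATE",
    "DROP", "ALTER", "CREATE", "GRANT", "REVOKE",
})


def validate_select_only(sql: str) -> tuple[bool, str]:
    """Return (is_safe, reason) for a single generated SQL string."""
    # Allow trailing semicolons, but nothing after them.
    statement = sql.strip().rstrip(";").strip()
    if not statement:
        return False, "empty statement"
    if ";" in statement:
        return False, "more than one statement"
    # Word tokens over the raw text, string literals and comments included,
    # so the check errs toward rejecting rather than toward letting SQL through.
    tokens = re.findall(r"[A-Z_]+", statement.upper())
    if not tokens or tokens[0] not in ("SELECT", "WITH"):
        return False, "must start with SELECT or WITH"
    hits = sorted(FORBIDDEN.intersection(tokens))
    if hits:
        return False, "forbidden keyword: " + ", ".join(hits)
    return True, "ok"


if __name__ == "__main__":
    print(validate_select_only("SELECT id FROM orders WHERE total > 10;"))
    print(validate_select_only("OPTIMIZE sales ZORDER BY (id)"))
```

Because this sketch scans literals and comments too, a harmless query such as one filtering on the string 'update' is rejected. That trade-off favors safety over recall; a production validator would more likely use a real SQL parser.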
What you'll leave with
- A decision framework for db-agent vs Genie vs Agent Bricks for your specific use case — including when not to build.
- The companion open-source db-agent repo (presented at AAAI-25, ships a Databricks Apps deployment variant) and a quick-lab repo with a step-by-step build.
- A reference architecture diagram and the actual code — pipeline orchestrator is ~60 lines of Python, safety validator is ~30.
- Specific gotchas that cost real time the first time: federation database options, Lakebase token rotation, Streamlit/Apps reverse-proxy traps, context-window blowouts on real catalogs.
Who it's for
- Heads of Data, Data Engineering Managers, Staff and Principal Data Engineers
- Teams already on Databricks (or evaluating) who are being asked: "Can we put an AI agent on top of this?"
- Anyone making a build-vs-buy call between Genie, Agent Bricks, and a custom text-to-SQL stack — and wants to make it with their eyes open
This is a technical session. We'll read code. Bring your senior engineers.
Agenda
- 0:00 — The architecture in one slide
- 0:05 — Lakebase + Unity Catalog + Lakehouse Federation: why both data planes, and what breaks
- 0:20 — The agent pipeline: schema → prompt → LLM → validate → execute
- 0:40 — The SQL safety guardrail: what generic SELECT-only validators miss on Databricks
- 0:50 — The pluggable LLM layer: live swap from a hosted API to a self-hosted vLLM on a neo-cloud GPU
- 1:05 — db-agent vs Genie vs Agent Bricks: when to use which, and why
- 1:15 — Q&A
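As a rough picture of the schema → prompt → LLM → validate → execute flow and the one-variable backend swap, here is a minimal sketch. It is not the orchestrator from the db-agent repo: the LLM_BACKEND variable name, the placeholder base URLs, the prompt wording, and the answer and resolve_backend helpers are all hypothetical. The LLM, validator, and executor are passed in as plain callables so the flow stays at one call each per question.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping

# Hypothetical registry: backend name -> OpenAI-compatible base URL.
# Hosts in angle brackets are placeholders to fill in yourself.
BACKENDS: Mapping[str, str] = {
    "databricks": "https://<workspace-host>/serving-endpoints",
    "openai": "https://api.openai.com/v1",
    "vllm": "http://<gpu-host>:8000/v1",
}


def resolve_backend(env: Mapping[str, str] = os.environ) -> str:
    """Pick the base URL from one environment variable (assumed name)."""
    name = env.get("LLM_BACKEND", "databricks")
    if name not in BACKENDS:
        raise ValueError(f"unknown LLM_BACKEND: {name!r}")
    return BACKENDS[name]


@dataclass
class Result:
    question: str
    sql: str
    rows: list


def answer(
    question: str,
    schema: str,
    generate_sql: Callable[[str], str],
    validate: Callable[[str], tuple[bool, str]],
    execute: Callable[[str], list],
) -> Result:
    """One question = one LLM call, one SQL statement, one execution."""
    prompt = (
        f"Schema:\n{schema}\n\n"
        f"Question: {question}\n"
        "Return exactly one SELECT statement."
    )
    sql = generate_sql(prompt).strip()  # the single LLM call
    ok, reason = validate(sql)          # gate before anything runs
    if not ok:
        raise ValueError(f"rejected SQL: {reason}")
    return Result(question, sql, execute(sql))  # the single execution


if __name__ == "__main__":
    # Stubs stand in for the real LLM client and SQL connection.
    result = answer(
        "How many orders?",
        "orders(id INT, total DOUBLE)",
        generate_sql=lambda prompt: "SELECT count(*) FROM orders",
        validate=lambda sql: (sql.upper().startswith("SELECT"), "not a SELECT"),
        execute=lambda sql: [(42,)],
    )
    print(result.sql, result.rows)
```

Keeping the three stages as injected callables means swapping the hosted API for a self-hosted endpoint only changes what generate_sql points at, and a rejected statement raises before execute is ever reached.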
Why this exists
Three things hit production data teams at the same time in 2026:
- Lakebase is GA. Databricks now ships managed Postgres as a first-class compute resource alongside Delta — OLTP and OLAP in one workspace, joinable through Lakehouse Federation.
- Text-to-SQL crossed the "good enough" line. Modern instruction-tuned models generate correct, schema-aware SQL on real catalogs if you feed them the schema cleanly and validate what comes back.
- GPU economics broke open. Self-hosted inference on neo-clouds is now 3–10× cheaper per token than first-party model serving for steady-state traffic. The bottleneck isn't capability — it's wiring.
This webinar shows the wiring.
Format
- Live on Microsoft Teams — interactive, questions encouraged
- 90 minutes including Q&A
- Recording sent to registrants within 48 hours
- Discord channel for follow-up questions
About your speaker
Chandan Kumar — founder of beCloudReady (Databricks Registered Partner) and organizer of TorontoAI, a 10K+ member community of AI builders. Maintainer of the open-source db-agent text-to-SQL agent, presented at the AAAI-25 workshop on AI agents. Twenty-plus years in software engineering, cloud architecture, and data engineering across enterprises in Canada, the US, and the UK.
Pre-read (optional)
This webinar walks through the full architecture write-up live. Read it beforehand if you can, and bring questions.
Want this against your data?
If you'd rather have this wired up against your actual Databricks workspace and your actual data, book a 30-minute architecture call — a small number of these run each month.
Speakers
Chandan Kumar
Founder, beCloudReady (Databricks Registered Partner) · Organizer, TorontoAI
20+ years in software, cloud architecture, and data engineering. Maintainer of the open-source db-agent project, presented at the AAAI-25 workshop on AI agents. Has trained and placed 500+ engineers across Canada and the US.