Cloud Data Platform
January 2026 – present
TL;DR
- 8-person engagement team; sole person responsible for infrastructure
- Replacing ~1,500 fragmented ETL jobs with structured, team-owned pipelines
- Legacy AWS environment: ProServe-built, 3 years without new workloads
- As much organizational navigation as technical work
The situation
The client runs close to 1,500 ETL jobs. The goal is not to delete them, but to replace a fragmented spread of one-off pipelines with a structured platform where new data sources slot in without spawning another job.
The engagement team is eight people spanning governance, project management, engineering, and architecture/infrastructure. On the architecture side we are two; on the infrastructure itself, I work alone. Getting things done also means working through the client's own team: I have no direct access to most of what I depend on, so every request has to arrive with its reasoning made clear.
What I’m building
The platform runs on OpenTofu and Terragrunt within the client’s multi-account AWS setup: a ProServe-built environment from three years ago, with no new workloads deployed since and institutional knowledge of it sitting largely outside the client’s own team.
Working within it means narrow access (project accounts only) and coordinating with ProServe for anything outside that boundary. Each environment I provision is consistent, auditable, and deployable from a single source of truth.
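As a rough sketch of what "single source of truth" means here: each environment is a thin Terragrunt wrapper over shared OpenTofu modules. The paths, module names, and inputs below are illustrative placeholders, not the client's actual layout.

```hcl
# dev/databricks-workspace/terragrunt.hcl (illustrative layout, not the real repo)
include "root" {
  path = find_in_parent_folders()  # pulls in shared remote-state and provider config
}

terraform {
  # hypothetical shared module: one definition, reused across every environment
  source = "../../modules/databricks-workspace"
}

inputs = {
  environment = "dev"
  aws_region  = "eu-west-1"  # assumed region, for illustration only
}
```

Promoting a change then means running `terragrunt apply` in each environment directory against the same module version, which is what keeps the environments consistent and auditable.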
The stack:
- Databricks: configured as the processing and transformation layer.
- Snowflake: the analytical warehouse — provisioning in progress.
- Confluent Cloud (Kafka): event streaming across domains — next up.
All traffic in this environment flows through a central networking account with an inspecting firewall. I’ve taken a minimum-path approach: only the routes the platform actually needs are open, using Databricks VPC endpoints and Transit Gateway attachments scoped to the project accounts. Staying within the existing security perimeter and off the public internet is both the client’s requirement and the right call for an environment I don’t control end to end.
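In OpenTofu terms, the minimum-path approach looks roughly like the fragment below. Resource names, variables, and the PrivateLink service name are placeholders standing in for the client's values.

```hcl
# Interface endpoint for Databricks PrivateLink; the service name is a
# per-region placeholder, not a real value
resource "aws_vpc_endpoint" "databricks_workspace" {
  vpc_id              = aws_vpc.project.id
  service_name        = var.databricks_workspace_service_name
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.databricks_endpoint.id]
  private_dns_enabled = true
}

# Transit Gateway attachment scoped to the project VPC; all cross-account
# traffic still routes through the central inspection account
resource "aws_ec2_transit_gateway_vpc_attachment" "project" {
  transit_gateway_id = var.central_transit_gateway_id
  vpc_id             = aws_vpc.project.id
  subnet_ids          = aws_subnet.private[*].id
}
```

Only routes like these, tied to the project accounts, exist at all; anything not declared here simply has no path through the firewall.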
What it enables
When complete, the platform replaces the 1,500-job sprawl with a small set of well-structured pipelines. Teams own their data and their pipelines. New sources slot in without a central team becoming the bottleneck.
Getting there has required as much organizational navigation as technical work: understanding an environment the client team itself couldn't fully describe, coordinating infrastructure access through ProServe, and making steady progress while holding the narrowest possible set of permissions.