Production data doesn't belong in your dev environment. But getting realistic test data shouldn't be this hard.
I got tired of watching teams jump through hoops to get data into lower environments.
Copying production data into dev or staging? Security nightmare. But without realistic data, demos look fake and testing is worthless. There had to be a better way.
The Problem
Here's what I kept seeing:
Management needs a demo ready by Friday. Sales promised a client they'd see the product with "real" data. So someone suggests pulling from production.
Now DevOps is pissed. Security wants to know why we need VPC peering between prod and staging. Someone has to scramble to mask PII. A DBA gets pulled into a meeting. The "quick demo" becomes a two-week project involving three teams and four Slack channels.
All because we needed some realistic-looking data.
And it wasn't just demos. Every new developer onboarding needed a seeded database. Every QA cycle needed fresh test data. Every staging environment refresh became a production.
The answer was always the same: "Can we just copy prod?"
No. No, you can't.
The Alternatives Sucked
Faker libraries? Great if you have time to wire up every column, write the boilerplate, and maintain the scripts. You need to map email to faker.email(), phone to faker.phone(), created_at to faker.date(). For every table. Every time the schema changes.
Nobody has that time when the demo is Thursday.
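To make that wiring cost concrete, here is a rough sketch of hand-mapped generation for a single users table. It uses only Python's standard library as a stand-in for a faker library, and every name in it (fake_user, the name pools, the example.com domain) is made up for illustration.

```python
import random
from datetime import datetime, timedelta

# Illustrative value pools; a faker library would supply these instead.
FIRST_NAMES = ["Sarah", "Marcus", "Priya", "Diego"]
LAST_NAMES = ["Chen", "Johnson", "Patel", "Alvarez"]


def fake_user(rng: random.Random, user_id: int) -> dict:
    """Build one users row; every column is mapped by hand."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    # Random minute offset within 2024.
    created = datetime(2024, 1, 1) + timedelta(minutes=rng.randrange(525_600))
    return {
        "id": user_id,
        "first_name": first,
        "last_name": last,
        "email": f"{first}.{last}@example.com".lower(),
        "phone": f"+1-555-{rng.randint(200, 999)}-{rng.randint(0, 9999):04d}",
        "created_at": created.isoformat(),
    }


# A new column on users means editing fake_user by hand, and every other
# table needs its own equivalent function.
rng = random.Random(42)
users = [fake_user(rng, i) for i in range(1, 4)]
```

That is one table. Multiply it by every table in the schema and every migration, and the maintenance cost adds up fast.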
Prompt an LLM? Sure, it works. Until it formats things differently each time. Until your tests start flaking because the output structure shifted. Until you're paying tokens for every CI run and your pipeline depends on OpenAI's uptime.
Copy production data? Good luck getting that past security. And even if you do, now you're liable for customer PII sitting in your dev environment. One accidental log, one screenshot in a bug report, and you've got a compliance incident.
None of these options made sense for the simple ask: "I need realistic data that isn't production data."
The Realization
Then it hit me.
The schema already defines what the data should look like.
email VARCHAR(100) should be an email. phone VARCHAR(20) should be a phone number. created_at TIMESTAMP should be a timestamp. first_name VARCHAR(50) should be a first name.
The structure is right there in the DDL. Why were we fighting security, begging DevOps, and risking compliance just to get data that the schema could already describe?
I didn't need production data. I needed production-shaped data.
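As a rough illustration of the idea, and not a description of how DDL to Data is actually implemented, a few name-based rules with a type fallback are enough to classify the columns of a simple CREATE TABLE statement. The rule list, the kind labels, and infer_kinds are all hypothetical, and the parsing is deliberately naive (it would mistake a table-level constraint for a column).

```python
import re

# Hypothetical name-based rules, checked in order; the first match wins.
NAME_RULES = [
    (re.compile(r"email"), "email"),
    (re.compile(r"phone|mobile"), "phone"),
    (re.compile(r"first_name"), "first_name"),
    (re.compile(r"last_name|surname"), "last_name"),
    (re.compile(r"_at$|_date$"), "timestamp"),
]

# Fallback on the SQL type when the column name gives no hint.
TYPE_FALLBACKS = {"SERIAL": "sequence", "INTEGER": "integer", "TIMESTAMP": "timestamp"}

# Split on commas that are not inside parentheses, so NUMERIC(10,2) survives.
COLUMN_SPLIT = re.compile(r",(?![^()]*\))")
COLUMN_DEF = re.compile(r"\s*(\w+)\s+([A-Za-z]+)")


def infer_kinds(ddl: str) -> dict:
    """Map each column name in a single CREATE TABLE to a data kind."""
    body = ddl[ddl.index("(") + 1 : ddl.rindex(")")]
    kinds = {}
    for part in COLUMN_SPLIT.split(body):
        match = COLUMN_DEF.match(part)
        if match is None:
            continue
        name, sql_type = match.group(1), match.group(2).upper()
        for pattern, kind in NAME_RULES:
            if pattern.search(name.lower()):
                kinds[name] = kind
                break
        else:
            # No name rule matched: fall back to the declared type.
            kinds[name] = TYPE_FALLBACKS.get(sql_type, "text")
    return kinds


DDL = """CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    phone VARCHAR(20),
    created_at TIMESTAMP
);"""

kinds = infer_kinds(DDL)
```

Here the id column has no telling name, so it falls back to its SERIAL type, while the other five columns are classified by name alone.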
What I Built
DDL to Data takes your CREATE TABLE statement and generates realistic data instantly.
Paste your schema, get JSON back. Emails look like emails. Phone numbers look like phone numbers. Names look like names. All inferred from column names automatically.
No production access needed. No VPC peering. No PII concerns. No security review.
And it's not an LLM — it's deterministic pattern matching. Same schema, same structure, every time. Fast enough for CI/CD (milliseconds, not seconds), realistic enough for demos, consistent enough for tests.
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    phone VARCHAR(20),
    created_at TIMESTAMP
);
Becomes:
[
  {
    "id": 1,
    "first_name": "Sarah",
    "last_name": "Chen",
    "email": "sarah.chen@techstartup.io",
    "phone": "+1-555-234-5678",
    "created_at": "2024-03-15T09:23:41"
  },
  {
    "id": 2,
    "first_name": "Marcus",
    "last_name": "Johnson",
    "email": "m.johnson@company.com",
    "phone": "+1-555-876-5432",
    "created_at": "2024-03-14T14:56:12"
  }
]
No setup. No configuration. No prompts to engineer.
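For a feel of why a deterministic generator suits CI, here is a minimal sketch that goes one step further than "same structure" and reproduces identical rows from identical DDL by seeding a random generator from a hash of the schema text. This is an assumption made for illustration, not a claim about DDL to Data's internals; GENERATORS, generate, and the kind labels are hypothetical.

```python
import hashlib
import random

FIRST_NAMES = ["Sarah", "Marcus", "Priya", "Diego"]

# One hypothetical generator per data kind; each takes the shared RNG and
# the 1-based row number.
GENERATORS = {
    "sequence": lambda rng, n: n,
    "first_name": lambda rng, n: rng.choice(FIRST_NAMES),
    "email": lambda rng, n: f"user{rng.randrange(10_000)}@example.com",
    "timestamp": lambda rng, n: (
        f"2024-03-{rng.randint(1, 28):02d}"
        f"T{rng.randint(0, 23):02d}:{rng.randint(0, 59):02d}:00"
    ),
}


def generate(ddl: str, kinds: dict, count: int) -> list:
    """Generate `count` rows; identical DDL always yields identical rows."""
    # Seed from the schema text rather than the clock or OS entropy.
    digest = hashlib.sha256(ddl.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [
        {column: GENERATORS[kind](rng, n) for column, kind in kinds.items()}
        for n in range(1, count + 1)
    ]


DDL = "CREATE TABLE users (id SERIAL PRIMARY KEY, email VARCHAR(100), created_at TIMESTAMP);"
KINDS = {"id": "sequence", "email": "email", "created_at": "timestamp"}

first_run = generate(DDL, KINDS, 3)
second_run = generate(DDL, KINDS, 3)
```

Because first_run and second_run are equal, a test can assert on exact values without flaking between pipeline runs.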
Who It's For
This is for:
- Teams who need demo data without the security headache
- Dev environments that shouldn't touch production
- QA engineers who need realistic data, not test@test.com and John Doe everywhere
- CI/CD pipelines that need consistent, fast data generation
- Anyone who's been on a call where management asks "why can't we just copy prod?"
If you've ever spent more time getting test data than actually testing, this is for you.
What's Next
I'm building two features that extend this further:
Story Mode — Describe the scenario you want, and the data adapts to match. "A growing B2B SaaS with enterprise clients and some churned accounts" or "an e-commerce store with seasonal holiday trends." The data tells a coherent story instead of just being random values.
Direct Database Seeding — Skip the copy-paste. Connect DDL to Data directly to your PostgreSQL database and populate tables in one click. Seed staging without ever touching production.
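For context, this is roughly the manual step that direct seeding is meant to remove: loading the generated JSON into a table yourself. The sketch below uses Python's built-in sqlite3 with an in-memory database purely so it runs anywhere. With PostgreSQL the same parameterized-insert pattern would go through a database driver, and the table definition here is adapted for SQLite (INTEGER and TEXT instead of SERIAL and VARCHAR).

```python
import json
import sqlite3

# Two rows in the shape of the generated output shown earlier.
ROWS_JSON = """
[
  {"id": 1, "first_name": "Sarah", "last_name": "Chen",
   "email": "sarah.chen@techstartup.io", "phone": "+1-555-234-5678",
   "created_at": "2024-03-15T09:23:41"},
  {"id": 2, "first_name": "Marcus", "last_name": "Johnson",
   "email": "m.johnson@company.com", "phone": "+1-555-876-5432",
   "created_at": "2024-03-14T14:56:12"}
]
"""

rows = json.loads(ROWS_JSON)

# In-memory SQLite stands in for a staging database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name TEXT,
        email TEXT,
        phone TEXT,
        created_at TEXT
    )
    """
)

# Named placeholders bind each JSON key to its column; values are never
# interpolated into the SQL string.
conn.executemany(
    "INSERT INTO users (id, first_name, last_name, email, phone, created_at) "
    "VALUES (:id, :first_name, :last_name, :email, :phone, :created_at)",
    rows,
)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```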
Try It
If you've ever been stuck between "we need realistic data" and "we can't touch production," this is why I built it.
Free tier. No credit card. Just paste your schema and see what comes out.
Built by Travis. Questions or feedback? Reach out at travis@ddltodata.com or @DDLTODATA on X.