Name: DDL to Data
Author: DDL to Data

Why I Built DDL to Data

Production data doesn't belong in your dev environment. But getting realistic test data shouldn't be this hard.

I got tired of watching teams jump through hoops to get data into lower environments.

Copying production data into dev or staging? Security nightmare. But without realistic data, demos look fake and testing is worthless. There had to be a better way.

The Problem

Here's what I kept seeing:

Management needs a demo ready by Friday. Sales promised a client they'd see the product with "real" data. So someone suggests pulling from production.

Now DevOps is pissed. Security wants to know why we need VPC peering between prod and staging. Someone has to scramble to mask PII. A DBA gets pulled into a meeting. The "quick demo" becomes a two-week project involving three teams and four Slack channels.

All because we needed some realistic-looking data.

And it wasn't just demos. Every new developer onboarding needed a seeded database. Every QA cycle needed fresh test data. Every staging environment refresh became a production of its own.

The answer was always the same: "Can we just copy prod?"

No. No you can't.

The Alternatives Sucked

Faker libraries? Great if you have time to wire up every column, write the boilerplate, and maintain the scripts. You need to map email to fake.email(), phone to fake.phone_number(), created_at to fake.date_time(). For every table. Every time the schema changes.

Nobody has that kind of time when the demo is due Friday.

Prompt an LLM? Sure, it works. Until it formats things differently each time. Until your tests start flaking because the output structure shifted. Until you're paying for tokens on every CI run and your pipeline depends on OpenAI's uptime.

Copy production data? Good luck getting that past security. And even if you do, now you're liable for customer PII sitting in your dev environment. One accidental log, one screenshot in a bug report, and you've got a compliance incident.

None of these options made sense for the simple ask: "I need realistic data that isn't production data."

The Realization

Then it hit me.

The schema already defines what the data should look like.

email VARCHAR(100) should be an email. phone VARCHAR(20) should be a phone number. created_at TIMESTAMP should be a timestamp. first_name VARCHAR(50) should be a first name.

The structure is right there in the DDL. Why were we fighting security, begging DevOps, and risking compliance just to get data that the schema could already describe?

I didn't need production data. I needed production-shaped data.
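To make that idea concrete, here's a minimal Python sketch of name-based inference. It is not DDL to Data's implementation. The rule table, the `parse_columns` and `classify` helpers, and the generator kind names are all hypothetical. The sketch pulls column names and declared types out of a CREATE TABLE statement and maps each column to a kind of value. It checks the column name first and falls back to the SQL type when the name gives no hint.

```python
import re

# Hypothetical name rules, checked in order; the first match wins.
NAME_RULES = [
    (re.compile(r"(^|_)email($|_)"), "email"),
    (re.compile(r"(^|_)phone($|_)"), "phone"),
    (re.compile(r"^first_name$"), "first_name"),
    (re.compile(r"^last_name$"), "last_name"),
    (re.compile(r"_at$"), "timestamp"),
]

# Leading keywords of table-level constraints, which are not columns.
TABLE_CONSTRAINTS = {"PRIMARY", "CONSTRAINT", "FOREIGN", "UNIQUE", "CHECK"}


def split_columns(ddl):
    # Take the text between the outermost parentheses and split it on commas
    # that are not nested inside parentheses, so NUMERIC(10,2) stays intact.
    body = ddl[ddl.index("(") + 1 : ddl.rindex(")")]
    parts, depth, current = [], 0, []
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def parse_columns(ddl):
    # Return (column_name, declared_type) pairs, skipping table-level constraints.
    columns = []
    for part in split_columns(ddl):
        tokens = part.split()
        if len(tokens) < 2 or tokens[0].upper() in TABLE_CONSTRAINTS:
            continue
        columns.append((tokens[0].lower(), tokens[1].upper()))
    return columns


def classify(name, sql_type):
    # Prefer the column name; fall back to the declared type.
    for pattern, kind in NAME_RULES:
        if pattern.search(name):
            return kind
    base = sql_type.split("(")[0]
    if base in {"SERIAL", "BIGSERIAL"}:
        return "sequence"
    if base in {"INT", "INTEGER", "BIGINT", "SMALLINT"}:
        return "integer"
    if base.startswith("TIMESTAMP") or base == "DATE":
        return "timestamp"
    return "text"


USERS_DDL = """
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  first_name VARCHAR(50),
  last_name VARCHAR(50),
  email VARCHAR(100),
  phone VARCHAR(20),
  created_at TIMESTAMP
);
"""

if __name__ == "__main__":
    for name, sql_type in parse_columns(USERS_DDL):
        print(f"{name:<12} {sql_type:<14} -> {classify(name, sql_type)}")
```

Because the first matching rule wins, more specific patterns belong near the top of the list. A hand-rolled parser like this only copes with simple DDL. It reads the type from a single token, so multi-word types such as DOUBLE PRECISION get truncated. It also doesn't handle comments or quoted identifiers; real-world schemas would need a proper SQL parser.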

What I Built

DDL to Data takes your CREATE TABLE statement and generates realistic data instantly.

Paste your schema, get JSON back. Emails look like emails. Phone numbers look like phone numbers. Names look like names. All inferred from column names automatically.

No production access needed. No VPC peering. No PII concerns. No security review.

And it's not an LLM — it's deterministic pattern matching. Same schema, same structure, every time. Fast enough for CI/CD (milliseconds, not seconds), realistic enough for demos, consistent enough for tests.

CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  first_name VARCHAR(50),
  last_name VARCHAR(50),
  email VARCHAR(100),
  phone VARCHAR(20),
  created_at TIMESTAMP
);

Becomes:

[
  {
    "id": 1,
    "first_name": "Sarah",
    "last_name": "Chen",
    "email": "sarah.chen@techstartup.io",
    "phone": "+1-555-234-5678",
    "created_at": "2024-03-15T09:23:41"
  },
  {
    "id": 2,
    "first_name": "Marcus",
    "last_name": "Johnson",
    "email": "m.johnson@company.com",
    "phone": "+1-555-876-5432",
    "created_at": "2024-03-14T14:56:12"
  }
]

No setup. No configuration. No prompts to engineer.
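Repeatability is what makes generated data usable in CI. One common way to get it is to seed a private pseudo-random generator from a serialized form of the schema, so the same input always yields the same rows. The sketch below illustrates that technique under stated assumptions. Whether DDL to Data works this way isn't stated here. The name pools, the `USERS_COLUMNS` mapping, and the `seed_for` and `generate_rows` helpers are hypothetical. The column kinds mirror the users table above.

```python
import hashlib
import json
import random
from datetime import datetime, timedelta

# Hypothetical sample pools; a real generator would draw from far larger lists.
FIRST_NAMES = ["Sarah", "Marcus", "Priya", "Diego", "Aiko", "Liam"]
LAST_NAMES = ["Chen", "Johnson", "Patel", "Garcia", "Tanaka", "Murphy"]
DOMAINS = ["techstartup.io", "company.com", "example.org"]

# Column -> generator kind for the users table (hypothetical kind names).
USERS_COLUMNS = [
    ("id", "sequence"),
    ("first_name", "first_name"),
    ("last_name", "last_name"),
    ("email", "email"),
    ("phone", "phone"),
    ("created_at", "timestamp"),
]


def seed_for(schema_text):
    # Derive a stable 64-bit seed from the schema text. hashlib digests are the
    # same in every process, unlike str hash(), which Python randomizes.
    digest = hashlib.sha256(schema_text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def generate_rows(columns, count, seed):
    # A private Random instance keeps output independent of global random state.
    rng = random.Random(seed)
    base_time = datetime(2024, 1, 1)
    rows = []
    for i in range(1, count + 1):
        # Pick the person once per row so the email agrees with the name columns.
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        row = {}
        for name, kind in columns:
            if kind == "sequence":
                row[name] = i
            elif kind == "first_name":
                row[name] = first
            elif kind == "last_name":
                row[name] = last
            elif kind == "email":
                row[name] = f"{first}.{last}@{rng.choice(DOMAINS)}".lower()
            elif kind == "phone":
                row[name] = f"+1-555-{rng.randint(200, 999)}-{rng.randint(0, 9999):04d}"
            elif kind == "timestamp":
                # Whole-second offset within roughly six months of base_time.
                offset = timedelta(seconds=rng.randint(0, 180 * 24 * 3600))
                row[name] = (base_time + offset).isoformat()
            else:
                row[name] = f"{name}_{i}"
        rows.append(row)
    return rows


if __name__ == "__main__":
    seed = seed_for(json.dumps(USERS_COLUMNS))
    first_run = generate_rows(USERS_COLUMNS, 2, seed)
    assert first_run == generate_rows(USERS_COLUMNS, 2, seed)  # same input, same rows
    print(json.dumps(first_run, indent=2))
```

Hashing with `hashlib.sha256` instead of the built-in `hash()` matters here. Python randomizes string hashing per process unless `PYTHONHASHSEED` is set, so `hash()` would produce a different seed on every CI run.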

Who It's For

This is for:

Teams who need demo data without the security headache
Dev environments that shouldn't touch production
QA engineers who need realistic data, not test@test.com and John Doe everywhere
CI/CD pipelines that need consistent, fast data generation
Anyone who's been on a call where management asks "why can't we just copy prod?"

If you've ever spent more time getting test data than actually testing, this is for you.

What's Next

I'm building two features that extend this further:

Story Mode — Describe the scenario you want, and the data adapts to match. "A growing B2B SaaS with enterprise clients and some churned accounts" or "an e-commerce store with seasonal holiday trends." The data tells a coherent story instead of just being random values.

Direct Database Seeding — Skip the copy-paste. Connect DDL to Data directly to your PostgreSQL database and populate tables in one click. Seed staging without ever touching production.
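Direct seeding isn't described in detail yet, so what follows is only a hypothetical illustration of one step any seeding path needs: turning generated JSON rows into SQL that PostgreSQL can run. The `rows_to_insert_sql`, `quote_ident`, and `quote_literal` helpers are invented for this sketch and are not part of DDL to Data.

```python
def quote_ident(name):
    # Double-quote an identifier, doubling any embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    # Render a Python value as a PostgreSQL literal.
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    # Strings (and anything else) become single-quoted literals with embedded
    # quotes doubled; assumes standard_conforming_strings = on, the PostgreSQL default.
    return "'" + str(value).replace("'", "''") + "'"


def rows_to_insert_sql(table, rows):
    # Build one multi-row INSERT. Column order follows the first row's keys,
    # and every row is expected to have the same keys.
    if not rows:
        return ""
    columns = list(rows[0].keys())
    col_list = ", ".join(quote_ident(c) for c in columns)
    values = ",\n".join(
        "  (" + ", ".join(quote_literal(row[c]) for c in columns) + ")"
        for row in rows
    )
    return f"INSERT INTO {quote_ident(table)} ({col_list}) VALUES\n{values};"


if __name__ == "__main__":
    # Sample rows shaped like the users output; the apostrophe shows escaping.
    sample = [
        {"id": 1, "first_name": "Sarah", "last_name": "Chen", "phone": "+1-555-234-5678"},
        {"id": 2, "first_name": "Marcus", "last_name": "O'Neil", "phone": None},
    ]
    print(rows_to_insert_sql("users", sample))
```

For anything beyond a quick script, passing values through your database driver's parameter binding is safer than building SQL text by hand. Also note that inserting explicit id values into a SERIAL column does not advance its sequence. You would either drop id from the rows or reset the sequence with setval afterwards.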

Try It

If you've ever been stuck between "we need realistic data" and "we can't touch production," that's exactly the situation I built it for.

Free tier. No credit card. Just paste your schema and see what comes out.

Try DDL to Data →

Built by Travis. Questions or feedback? Reach out at travis@ddltodata.com or @DDLTODATA on X.
