# Synthetic Data Factory

Multi-module Python pipeline that generates synthetic relational datasets (users, products, transactions), validates them with Pydantic, runs quality checks, and exports each table to Parquet, JSONL, and CSV alongside an HTML quality report.
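
In a relational dataset, transactions refer back to users and products, so a natural quality check is foreign-key integrity. The sketch below is a minimal, stdlib-only illustration. It assumes hypothetical field names (`user_id`, `product_id`, `amount`) and hypothetical helpers (`generate_transactions`, `check_foreign_keys`), and it uses a plain dataclass in place of the factory's Pydantic models. The project's real schema and checks live in its own modules.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    # Hypothetical schema: each transaction points at one user and one product.
    transaction_id: int
    user_id: int
    product_id: int
    amount: float


def generate_transactions(n, user_ids, product_ids, seed):
    """Draw n transactions whose foreign keys come only from existing ids."""
    rng = random.Random(seed)  # local RNG: the same seed yields the same rows
    return [
        Transaction(
            transaction_id=i,
            user_id=rng.choice(user_ids),
            product_id=rng.choice(product_ids),
            amount=round(rng.uniform(1.0, 500.0), 2),
        )
        for i in range(n)
    ]


def check_foreign_keys(transactions, user_ids, product_ids):
    """Return ids of transactions whose user_id or product_id is unknown."""
    users, products = set(user_ids), set(product_ids)
    return [
        t.transaction_id
        for t in transactions
        if t.user_id not in users or t.product_id not in products
    ]


if __name__ == "__main__":
    users = list(range(10))
    products = list(range(5))
    txns = generate_transactions(20, users, products, seed=42)
    print(check_foreign_keys(txns, users, products))  # → []
```

Because keys are only ever sampled from existing ids, the check returns an empty list for generated data. It still catches rows corrupted or hand-edited after generation.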

## Quick start

```bash
pip install -r requirements.txt
python job.py
```

## Environment variables

| Variable     | Description                          | Default    |
|--------------|--------------------------------------|------------|
| `OUTPUT_DIR` | Directory where results are written  | `./output` |
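
The sketch below shows one way a script could resolve this variable, falling back to `./output` as the table states. `resolve_output_dir` is a hypothetical helper for illustration and is not necessarily how `job.py` reads it.

```python
import os
from pathlib import Path


def resolve_output_dir(env=None):
    """Return OUTPUT_DIR from the environment, defaulting to ./output."""
    # Accepting a mapping makes the lookup easy to test without touching os.environ.
    env = os.environ if env is None else env
    return Path(env.get("OUTPUT_DIR", "./output"))
```

To redirect a single run from the shell, set the variable inline, for example `OUTPUT_DIR=/tmp/synth python job.py`.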

## Output

```
$OUTPUT_DIR/
  users/          users.parquet, users.jsonl, users.csv
  products/       products.parquet, products.jsonl, products.csv
  transactions/   transactions.parquet, transactions.jsonl, transactions.csv
  report.html     Visual quality report with embedded charts
```
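
A quick post-run sanity check is to confirm that every file in the layout above exists, then load one of the JSONL files (one JSON object per line). The helpers `expected_outputs`, `missing_outputs`, and `read_jsonl` are hypothetical and use only the standard library. They assume the directory layout shown above.

```python
import json
from pathlib import Path

TABLES = ("users", "products", "transactions")
FORMATS = ("parquet", "jsonl", "csv")


def expected_outputs(output_dir):
    """List every file the documented layout says should exist."""
    root = Path(output_dir)
    files = [root / t / f"{t}.{ext}" for t in TABLES for ext in FORMATS]
    files.append(root / "report.html")
    return files


def missing_outputs(output_dir):
    """Return the expected files that are not present on disk."""
    return [p for p in expected_outputs(output_dir) if not p.is_file()]


def read_jsonl(path):
    """Parse a JSONL file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

After `python job.py`, `missing_outputs("./output")` should return an empty list.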

## Configuration

Edit `synthetic_factory/config.py` to change:

- `SEED` — random seed for reproducibility
- `NUM_USERS`, `NUM_PRODUCTS`, `NUM_TRANSACTIONS` — record counts
- `CATEGORIES`, `PAYMENT_METHODS` — domain values
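
For orientation, the sketch below shows the general shape such a config module might take and why a fixed `SEED` makes runs repeatable. Every value here is a placeholder, not the project's actual default. `sample_categories` is a hypothetical helper that does not come from `synthetic_factory`.

```python
"""Hypothetical shape of synthetic_factory/config.py; all values are placeholders."""
import random

SEED = 42                  # placeholder: any fixed int makes runs repeatable
NUM_USERS = 1_000          # placeholder record counts
NUM_PRODUCTS = 200
NUM_TRANSACTIONS = 10_000
CATEGORIES = ["electronics", "books", "home"]           # placeholder domain values
PAYMENT_METHODS = ["card", "paypal", "bank_transfer"]   # placeholder domain values


def sample_categories(n, seed=SEED):
    """Draw n categories; the same seed always yields the same sequence."""
    rng = random.Random(seed)  # seeded instance, independent of the global RNG
    return [rng.choice(CATEGORIES) for _ in range(n)]
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps results stable even if other code consumes the global RNG.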
