FOUNDRY/7 · BUILD 0507·26 · v1.0 · prin7r.com

Schema in.
Dataset out.

On-demand synthetic data for ML teams. Declare the schema, the constraints, and the bias profile you want. Receive a labeled dataset with provenance, audit log, and a reproducibility manifest you can hand to a regulator.

  • 01 Cold-start training sets
  • 02 Edge-case augmentation
  • 03 Eval benchmarks & contrast sets
  • 04 Privacy-safe production clones

Pay in USDT / USDC · NOWPayments hosted invoice · 48-hour first dataset

VERIFIED · MANIFEST 0x8c4f
Input · schema.foundry.yml · COMPILING
dataset: support_tickets_emea
domain: customer_support
locale: [en-GB, de-DE, fr-FR, nl-NL]
volume: 50_000

schema:
  ticket_id:    uuid
  channel:      enum[email, chat, voice_transcript]
  product_sku:  ref(./skus.csv, weighted)
  intent:       enum[refund, defect, billing, how_to, escalation]
  message:      text(min=24, max=420, lang=$locale, register=mixed)
  sentiment:    float(-1..1, dist=skew_neg)
  escalated:    bool(p=0.18 if intent in [defect, escalation])

bias_profile:
  age_distribution:    uniform(22, 71)
  income_distribution: lognormal(mu=10.6, sigma=0.42)
  edge_cases:
    - sarcasm_density:        0.06
    - codeswitching_density:  0.04
    - non_native_grammar:     0.11

constraints:
  - escalated == true => sentiment < -0.3
  - if locale == "de-DE": message.formal_register >= 0.7
  - product_sku ∈ skus.csv  # foreign key

reproducibility:
  seed: 0x4f5c
  manifest: signed
  emit:    [jsonl, parquet]
Output · row 14,492 / 50,000 · jsonl · ✓ MANIFEST: 0x8c4f9aa1
{
  "ticket_id":   "0c9e2a4b-7e54-4f2a-9e01-2b8a4...",
  "channel":     "voice_transcript",
  "product_sku": "AX-740-BLK",
  "intent":      "escalation",
  "message":     "Hi, das ist jetzt das dritte Mal — der
                  Knopf hängt komplett. Ich brauche
                  einen Vorgesetzten, sofort.",
  "sentiment":    -0.78,
  "escalated":    true,

  "_provenance": {
    "lineage":   "sha256:8c4f…",
    "model":     "foundry-text-3.1 + foundry-policy-1",
    "seed":      "0x4f5c",
    "row_proof": "0x4a31d2ef…"
  }
}
RECORDS
50,000
CONSTRAINTS PASSED
3 / 3 ✓
BIAS DRIFT
0.014 σ
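As a rough illustration of what a constraint pass means for the sample row above, here is a minimal Python sketch. It checks two of the three declared constraints: the escalation rule and the foreign key. The `check_row` helper and the in-memory `known_skus` set (standing in for `skus.csv`) are hypothetical, and the `de-DE` register rule is left out because the source does not say how `formal_register` is scored.

```python
def check_row(row: dict, known_skus: set) -> list:
    """Return the names of violated constraints; an empty list means the row passes."""
    violations = []
    # escalated == true => sentiment < -0.3
    if row["escalated"] and not row["sentiment"] < -0.3:
        violations.append("escalated_implies_negative_sentiment")
    # product_sku must exist in skus.csv (foreign key)
    if row["product_sku"] not in known_skus:
        violations.append("product_sku_foreign_key")
    return violations


# Values copied from the sample output row; the SKU set is an assumed stand-in.
sample_row = {"product_sku": "AX-740-BLK", "sentiment": -0.78, "escalated": True}
known_skus = {"AX-740-BLK"}

print(check_row(sample_row, known_skus))
```

A row that is escalated but carries a neutral sentiment would come back with `escalated_implies_negative_sentiment` in the list.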
WE FORGE FOR · ML TEAMS · EVAL LEADS · APPLIED SCIENTISTS · RED-TEAMERS · SAFETY ENGINEERS · DATA PRODUCT MANAGERS · ROBOTICS LABS · FINTECH RISK DESKS · MEDICAL AI · LEGAL AI · EDU AI
03 · What we generate

Four shapes of dataset, one foundry floor.

Every Foundry/7 run starts as a schema and ends as a signed dataset. The shape changes; the rigor doesn't.

01 · COLD-START

Boot a model when you have zero real data.

You have a product idea but no users yet — and no data to fine-tune on. We generate the first 100k examples that look like the world your model will see, with intent and tone distributions you control.

  • First-touch fine-tunes
  • Pre-launch benchmark dummies
  • Cold-start RLHF prompts
02 · EDGE-CASE AUGMENTATION

Find the failure modes before production does.

You already have data, but it's smooth. We synthesize the long tail: code-switching, sarcasm, OCR noise, adversarial typos, near-OOD inputs. Boost your eval set so the next regression is caught in CI.

  • Adversarial contrast sets
  • Counterfactuals (one-attribute flips)
  • OOD perturbations with manifest
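To make "one-attribute flips" concrete, the sketch below enumerates counterfactual copies of a record that differ from the original in exactly one field. It is an illustration only: the `one_attribute_flips` helper is hypothetical, and a real counterfactual would also need the message text regenerated to match the flipped attribute.

```python
def one_attribute_flips(record: dict, alternatives: dict):
    """Yield copies of `record` that differ from it in exactly one attribute."""
    for field, values in alternatives.items():
        for value in values:
            if value != record[field]:
                flipped = dict(record)  # shallow copy so the original stays intact
                flipped[field] = value
                yield flipped


# Enum values taken from the schema shown earlier on this page.
record = {"channel": "email", "intent": "refund"}
alternatives = {
    "channel": ["email", "chat", "voice_transcript"],
    "intent": ["refund", "defect", "billing", "how_to", "escalation"],
}

flips = list(one_attribute_flips(record, alternatives))
print(len(flips))
```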
03 · EVAL BENCHMARKS

Build the benchmark that proves your claim.

Your model is shipped — now prove it. We forge labeled benchmark sets aligned to a rubric you supply: ground truth, gold labels, calibration buckets, plus a reproducibility manifest auditors will accept.

  • Domain-specific gold sets
  • Bias / fairness slices
  • Versioned scorers + rubric
04 · PRIVACY-SAFE CLONES

Replace production data when legal won't let you ship it.

We take a real schema and synthesize a statistically faithful clone — same joint distributions, same correlations, zero PII. Train, demo, or share with a vendor without DPA review purgatory.

  • k-anonymous + l-diverse outputs
  • PII scrubber + signed attestation
  • GDPR / HIPAA artefact pack
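For readers who want to sanity-check a clone themselves, k-anonymity and l-diversity can be measured in a few lines. The sketch below uses the textbook definitions; the column names and toy rows are made up for illustration, and it says nothing about how Foundry/7 enforces these properties internally.

```python
from collections import Counter, defaultdict


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """Fewest distinct sensitive values found inside any quasi-identifier group."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[q] for q in quasi_identifiers)].add(row[sensitive])
    return min(len(values) for values in groups.values())


# Toy rows, invented for this example.
rows = [
    {"age_band": "30-39", "locale": "de-DE", "intent": "refund"},
    {"age_band": "30-39", "locale": "de-DE", "intent": "billing"},
    {"age_band": "40-49", "locale": "fr-FR", "intent": "refund"},
    {"age_band": "40-49", "locale": "fr-FR", "intent": "refund"},
]
qi = ["age_band", "locale"]

print(k_anonymity(rows, qi), l_diversity(rows, qi, "intent"))
```

Here every group has two rows, so the toy table is 2-anonymous, but the second group holds only one distinct intent, so it is only 1-diverse.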
04 · The manifest

Every dataset arrives with its receipt.

Synthetic data without a manifest is gossip. Foundry/7 ships every run with a signed reproducibility receipt — schema, seed, model versions, bias profile, constraints, and a row-level provenance hash chain.

# manifest.toml — signed by foundry/7
[run]
id              = "run_2026-05-08_0c9e"
schema_hash     = "sha256:e1d3…f4a2"
seed            = "0x4f5c"
records_emitted = 50_000
started_at      = "2026-05-08T11:02:14Z"
finished_at     = "2026-05-08T11:38:46Z"
duration_s      = 2192

[models]
text          = "foundry-text-3.1@4f9c"
policy        = "foundry-policy-1@1.0.4"
labeler       = "claude-3.5-sonnet (read-only adjudication)"
human_loop    = "panel_emea_v3 (3 reviewers)"

[bias_profile]
hash    = "sha256:bb04…7c91"
input   = "./bias_profile.yml"
checked = ["age", "gender", "locale", "register"]

[constraints]
declared = 8
passed   = 8
failed   = 0
report   = "./constraints.html"

[provenance]
lineage      = "merkle"
chain_root   = "0x8c4f9aa1…"
row_proofs   = "row_proofs.parquet"
signature    = "ed25519:foundry-7-prod"

[license]
output_terms = "client_owned, non-exclusive transfer"
REPRODUCIBLE · SEED + LINEAGE

Re-run a dataset bit-for-bit from the manifest. Auditors love it. Your MLOps team loves it more.
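The idea behind a seeded re-run can be shown with a toy generator: pin the seed, regenerate, and compare digests. This is a sketch under heavy assumptions. The `forge` function below is a stand-in built on Python's `random` module, not the Foundry/7 generator, and real bit-for-bit reproduction also depends on pinned model versions, as the manifest records.

```python
import hashlib
import json
import random

INTENTS = ["refund", "defect", "billing", "how_to", "escalation"]


def forge(seed: int, n: int) -> list:
    """Toy generator: all randomness flows from one seeded RNG instance."""
    rng = random.Random(seed)
    return [
        {"intent": rng.choice(INTENTS), "sentiment": round(rng.uniform(-1, 1), 4)}
        for _ in range(n)
    ]


def digest(rows: list) -> str:
    """Stable SHA-256 over a canonical JSON serialization of the rows."""
    return hashlib.sha256(json.dumps(rows, sort_keys=True).encode("utf-8")).hexdigest()


first = digest(forge(0x4F5C, 1_000))
second = digest(forge(0x4F5C, 1_000))
print(first == second)
```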

BIAS-AUDITED · DECLARATIVE PROFILE

You declare the distribution. We measure deviation. The manifest reports drift in σ — not vibes.
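The page does not spell out how drift is computed, so the sketch below uses one plausible reading as an assumption: the absolute gap between the sample mean and the declared mean, expressed in units of the declared standard deviation. It uses the `uniform(22, 71)` age profile from the schema and a toy sample drawn with Python's `random` module.

```python
import math
import random
import statistics


def drift_in_sigma(samples, declared_mean: float, declared_std: float) -> float:
    """Assumed drift metric: |sample mean - declared mean| / declared std."""
    return abs(statistics.fmean(samples) - declared_mean) / declared_std


# Moments of a uniform(low, high) distribution.
low, high = 22, 71
declared_mean = (low + high) / 2
declared_std = (high - low) / math.sqrt(12)

# Toy sample standing in for the generated age column.
rng = random.Random(0x4F5C)
ages = [rng.uniform(low, high) for _ in range(50_000)]

drift = drift_in_sigma(ages, declared_mean, declared_std)
print(f"{drift:.4f} sigma")
```

A mean-only metric misses changes in spread or shape, so a fuller audit would compare the whole distribution, for example with a Kolmogorov-Smirnov statistic.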

ROW-LEVEL PROVENANCE · MERKLE CHAIN

Every record carries a proof linking back to the seed. Drop a row, prove a row, kill a row — without re-running the dataset.
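Row-level proofs of this kind are commonly built on a Merkle tree: hash every row, hash pairs of hashes up to a single root, and hand out the sibling hashes as the proof for one row. The sketch below is a generic SHA-256 construction offered as an illustration. The exact layout Foundry/7 uses is not documented on this page, and all function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves):
    """All tree levels, from the hashed leaves up to the single root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # odd count: pair the last node with itself
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index: int):
    """Sibling hashes from leaf to root, each flagged with whether it sits on the right."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):  # last node of an odd level is its own sibling
            sibling = index
        proof.append((level[sibling], index % 2 == 0))
        index //= 2
    return proof


def verify(leaf: bytes, proof, root: bytes) -> bool:
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root


rows = [f"row-{i}".encode("utf-8") for i in range(5)]
levels = merkle_levels(rows)
root = levels[-1][0]
print(verify(rows[4], prove(levels, 4), root))
```

Verifying one row needs only that row, its proof, and the root, which is why a single record can be checked without regenerating the dataset.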

CLIENT-OWNED OUTPUT

Datasets are yours. We don't train downstream foundation models on your domain. Details are in the FAQ (section 09).

05 · API quickstart

POST a schema. GET a dataset.

REST API + Python and Node clients. Idempotent runs, streaming progress, signed download URLs that expire. Sandbox key ships free with every account.

curl · create a run · shell
curl https://api.synthetic-data-factory.prin7r.com/v1/runs \
  -H "Authorization: Bearer $FOUNDRY_API_KEY" \
  -H "Content-Type: application/yaml" \
  --data-binary @schema.foundry.yml

# → 202 Accepted
# {
#   "run_id":  "run_2026-05-08_0c9e",
#   "status":  "compiling",
#   "stream":  "wss://api.../v1/runs/run_…/stream",
#   "manifest":"https://.../runs/run_…/manifest.toml"
# }
python · poll & download · python
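import os

from foundry7 import Foundry

# Read the API key from the environment rather than hard-coding it.
f = Foundry(api_key=os.environ["FOUNDRY_API_KEY"])
run = f.runs.create_from_file("schema.foundry.yml")

# Stream progress events while the run is in flight.
for event in run.stream():
    print(event.phase, event.records, event.bias_drift)

# Block until the run finishes, then download and inspect the manifest.
dataset = run.wait_for_completion()
dataset.download("./out/", format="parquet")
print(dataset.manifest.chain_root)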
node · ergonomic SDK · ts
import { Foundry } from "@foundry7/sdk";

const f = new Foundry({ apiKey: process.env.FOUNDRY_API_KEY });
const run = await f.runs.create({ schemaPath: "./schema.foundry.yml" });

for await (const ev of run.stream()) {
  console.log(ev.phase, ev.records, ev.constraints.passed);
}

const ds = await run.complete();
await ds.download({ to: "./out", format: "jsonl" });
webhook · run completed · json
POST https://you.example.com/foundry7

{
  "event": "run.completed",
  "run_id": "run_2026-05-08_0c9e",
  "records": 50000,
  "constraints": { "passed": 8, "failed": 0 },
  "bias_drift_sigma": 0.014,
  "manifest_url": "https://.../manifest.toml",
  "download_urls": {
    "jsonl":   "https://.../signed?...",
    "parquet": "https://.../signed?..."
  },
  "signature": "ed25519:6a4c…"
}
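On the receiving side, a webhook handler will typically gate on the payload before it kicks off a download. The sketch below parses a payload shaped like the one above and accepts it only if no constraint failed and drift stays under a threshold. The `accept_run` helper and the 0.05 threshold are assumptions for illustration. Verifying the `signature` field is deliberately left out: Ed25519 needs a third-party library, and the signing scheme is not documented on this page.

```python
import json


def accept_run(payload_json: str, max_drift_sigma: float = 0.05) -> bool:
    """Accept a run.completed event only if constraints passed and drift is small."""
    event = json.loads(payload_json)
    if event.get("event") != "run.completed":
        return False
    if event["constraints"]["failed"] != 0:
        return False
    return event["bias_drift_sigma"] <= max_drift_sigma


# Fields copied from the example payload; URLs and signature omitted.
payload = json.dumps({
    "event": "run.completed",
    "run_id": "run_2026-05-08_0c9e",
    "records": 50000,
    "constraints": {"passed": 8, "failed": 0},
    "bias_drift_sigma": 0.014,
})

print(accept_run(payload))
```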
06 · Inside a run

From schema to signed dataset in four phases.

  1. T+0

    Compile

    Parse schema, lint bias profile, resolve $refs and FK joins. Reject contradictory constraints before a single token is generated.

  2. T+8m

    Forge

    Run the generator at the volume you asked for, with seed and model versions pinned. Stream progress over WS or SSE, including running bias drift.

  3. T+34m

    Adjudicate

    Sample 0.5–2% of records into a labeler adjudication loop. Re-run the rows that fail. Optionally route into a human panel for the high-stakes verticals.

  4. T+38m

    Stamp

    Assemble the manifest, sign the lineage, mint signed download URLs, and emit `run.completed` to your webhook.
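The Compile phase above rejects contradictory constraints before generation starts. One simple way to catch the numeric case is to reduce every bound on a field to a single interval and flag the field when that interval is empty. The sketch below does this for plain comparison bounds. It is an illustration with a hypothetical `find_contradictions` helper, not the Foundry/7 compiler, and it does not handle conditional constraints such as implications.

```python
import math


def find_contradictions(bounds):
    """bounds: (field, op, value) triples with op in '<', '<=', '>', '>='.

    Returns the sorted list of fields whose combined bounds admit no value.
    """
    lower, upper = {}, {}  # field -> (value, strict)
    for field, op, value in bounds:
        strict = op in ("<", ">")
        if op in ("<", "<="):
            current = upper.get(field, (math.inf, False))
            # A smaller upper bound is tighter; at equal value, strict is tighter.
            if (value, not strict) < (current[0], not current[1]):
                upper[field] = (value, strict)
        elif op in (">", ">="):
            current = lower.get(field, (-math.inf, False))
            # A larger lower bound is tighter; at equal value, strict is tighter.
            if (-value, not strict) < (-current[0], not current[1]):
                lower[field] = (value, strict)
        else:
            raise ValueError(f"unsupported operator: {op}")
    contradictory = []
    for field in lower.keys() & upper.keys():
        (lo, lo_strict), (hi, hi_strict) = lower[field], upper[field]
        if lo > hi or (lo == hi and (lo_strict or hi_strict)):
            contradictory.append(field)
    return sorted(contradictory)


print(find_contradictions([("sentiment", "<", -0.3), ("sentiment", ">=", 0.0)]))
```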

07 · Pricing

Three credit shapes. One foundry.

Buy a single bench run, a monthly production credit, or a quarterly managed program. Pay in USDT or USDC. The invoice is hosted by NOWPayments, which also offers a fiat-partner card on-ramp.

PROOF · ONE RUN

Bench

One run, one schema, one purpose. For benchmarks and cold-start training sets.

$480 · single dataset run
  • Up to 50,000 labeled records
  • Schema validator + bias-profile linter
  • Reproducibility manifest (signed)
  • Single export: JSONL, Parquet, CSV
  • 48-hour delivery, async review

Pay in USDT / USDC · NOWPayments hosted invoice

RECURRING · 5 RUNS/MONTH · Most teams pick this

Production

Five runs per month, branched schemas, and a dedicated foundry engineer in your Slack.

$2,400 · monthly credit
  • 5 runs/month, up to 250,000 records each
  • Branching schemas + edge-case augmentation
  • Contrast sets, counterfactuals, drift sims
  • Private S3 / GCS / Azure export
  • 4 hours of foundry-engineer time / month
  • PII-scrubbed clones of production data

Pay in USDT / USDC · NOWPayments hosted invoice

MANAGED · QUARTERLY

Enterprise

Managed dataset program with vertical specialists and a regulator-ready audit trail.

$12,000 · quarterly · 90 days
  • Unlimited schemas across the quarter
  • Vertical specialists: medical, legal, fintech, robotics
  • Human-in-the-loop adjudication
  • SOC-friendly audit log + signed lineage
  • SLA-backed turnaround (24h critical path)
  • Named foundry team + on-call review

Pay in USDT / USDC · NOWPayments hosted invoice

Need an invoice in EUR / GBP / wire? Email foundry@synthetic-data-factory.prin7r.com

08 · Provenance ledger

Every row, every reason.

Foundry/7 keeps a per-row Merkle proof linking each record to its seed, schema version, and adjudication. When a customer or regulator says "why was this record in your training data?" you have an answer with a hash on it.

ROW     RUN                  CHAIN ROOT  CONSTRAINT               VERIFY
14,492  run_2026-05-08_0c9e  0x8c4f…aa1  ✓ all 8                  VIEW
14,493  run_2026-05-08_0c9e  0x8c4f…aa1  ✓ all 8                  VIEW
14,494  run_2026-05-08_0c9e  0x8c4f…aa1  ✓ all 8                  VIEW
14,495  run_2026-05-08_0c9e  0x8c4f…aa1  ↺ re-rolled (sentiment)  VIEW
14,496  run_2026-05-08_0c9e  0x8c4f…aa1  ✓ all 8                  VIEW
14,497  run_2026-05-08_0c9e  0x8c4f…aa1  ✓ all 8                  VIEW
09 · Frequently asked

Honest answers, no hand-waving.

Are the datasets actually useful for training?

Yes — within the bounds you declare. Foundry/7 is best for cold-start, eval, edge-case augmentation, and privacy-safe clones. We are honest about limits: synthetic data is not a substitute for representative real-world data when subtle distribution shifts matter. The manifest tells you exactly where the dataset can be trusted.

What model do you generate from?

A pinned ensemble: a domain-specialized text generator (foundry-text-3.1), a constraint-aware policy network (foundry-policy-1), and an adjudication labeler. The exact versions are recorded in every run's manifest. Vertical packs swap in domain specialists for medical, legal, fintech, and robotics.

Do you train on my schema or my data?

No. Schemas you submit, samples you upload, and any production clones we generate are scoped to your account. We do not fold customer schemas back into the foundation models. Output is client-owned with a non-exclusive transfer; the license terms ship in the manifest.

How do you handle PII when I upload a real schema or sample?

PII is scrubbed at the door. We run a tagger over your input, strip what's tagged, and only forward statistical summaries to the generator. The signed attestation in the run's manifest documents what was scrubbed and when. Enterprise plans add SOC-friendly retention windows.
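The tag, strip, summarize flow described above can be pictured with a deliberately small example. The sketch below tags only email addresses with a regular expression and forwards nothing but length statistics. Both the `scrub` and `summarize` helpers and the single-pattern tagger are assumptions for illustration; a production tagger would cover many more PII types.

```python
import re
import statistics

# Simplified email pattern; real PII tagging needs far broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def scrub(text: str) -> str:
    """Replace every tagged email address with a placeholder token."""
    return EMAIL.sub("[EMAIL]", text)


def summarize(messages) -> dict:
    """Statistical summary of scrubbed messages; no raw text leaves this function."""
    lengths = [len(scrub(message)) for message in messages]
    return {
        "count": len(lengths),
        "mean_length": statistics.fmean(lengths),
        "max_length": max(lengths),
    }


messages = [
    "Contact jane.doe@example.com now",
    "Refund still missing after two weeks.",
]

print(scrub(messages[0]))
print(summarize(messages))
```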

Can I bring my own evaluator?

Yes. Drop a Python or JS scorer into the run and we'll execute it inside the foundry sandbox. The manifest reports your scorer's results next to ours so audits hold up.

How fast?

Bench tier: 48 hours for the first run. Production tier: a typical 50k-row run lands in 30–45 minutes; a 250k-row run in 2–3 hours. Enterprise has SLA-backed turnarounds for critical-path runs.

Crypto-only payment? What about wires and cards?

The hosted invoice is run by NOWPayments, which accepts USDT/USDC and a fiat-partner card on-ramp. Need EUR/GBP/USD wire or a Stripe-style card path? Email foundry@synthetic-data-factory.prin7r.com and we'll cut a custom invoice.

Refunds and abandoned runs?

If a run fails to compile (constraint contradiction, invalid schema, etc.), we credit the full amount back to your foundry balance. If a run completes but you reject the manifest within 7 days, we re-run it at our expense. Full terms in §07 of /docs/07-sales-strategy.md.

How is this different from Tonic / Mostly / Gretel / etc.?

Foundry/7 is opinionated about three things: (1) reproducibility manifests as a first-class artifact, (2) row-level provenance instead of dataset-level provenance, and (3) vertical specialists rather than one model for everything. We are smaller, narrower, and more auditable. Read /docs/04-pain-points.md for the head-to-head.