Use a powerful LLM without handing it your users' personal data. Personal information is replaced with opaque tokens before any model sees it — and a watcher blocks anything raw from ever crossing.
This is the honest answer to the objection "but everything just goes to OpenAI, PII and all." With CongoSky it doesn't: the identities stay home, the model reasons over tokens, and we can prove nothing raw left.
How it works
1. Detect & tokenize
Each piece of PII in your text → a stable GUID. The GUID-to-value mapping is kept in a local vault that never leaves your tenant. Detection covers emails, ID numbers, phone numbers and card numbers (pattern-based), plus any names you supply.
2. Send only tokens
The model receives opaque GUIDs, never the real values. Same value → same token, so it can still reason about one person without knowing who that person is.
3. The brain reasons blind
It returns its answer still in tokens. It never held a name, a number, or an address.
4. Re-hydrate locally
Tokens are swapped back to the real values inside your tenant, for your final answer only.
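The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CongoSky implementation: the `Vault` class, the `tokenize` function, the two regular expressions and the `⟦kind-guid⟧` token layout are all hypothetical names and choices made for this example.

```python
import re
import uuid

# Hypothetical, deliberately loose patterns; real detectors would be stricter.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "za_id": re.compile(r"\b\d{13}\b"),
}


class Vault:
    """In-memory stand-in for the local vault (token <-> value map)."""

    def __init__(self):
        self._by_value = {}
        self._by_token = {}

    def token_for(self, kind, value):
        # Same value -> same token, so the model can follow one person.
        if value not in self._by_value:
            token = f"⟦{kind}-{uuid.uuid4()}⟧"
            self._by_value[value] = token
            self._by_token[token] = value
        return self._by_value[value]

    def rehydrate(self, text):
        # Step 4: swap tokens back to real values, locally.
        for token, value in self._by_token.items():
            text = text.replace(token, value)
        return text


def tokenize(text, vault, names=()):
    """Step 1: replace supplied names and pattern matches with tokens."""
    for name in names:
        text = text.replace(name, vault.token_for("name", name))
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(lambda m, k=kind: vault.token_for(k, m.group()), text)
    return text


vault = Vault()
masked = tokenize(
    "Jane Example (ID 0000000000000) asks about her loan.",
    vault,
    names=["Jane Example"],
)
restored = vault.rehydrate(masked)  # identical to the original text
```

Only `masked` would ever be handed to the model; the `Vault` instance stays on the calling side, which is the point of steps 2 and 3.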
⛔ The watcher: fail closed
Every outbound payload is checked. If any raw PII is still present — even something the tokenizer missed — egress is BLOCKED. Nothing raw crosses, by default. The backstop, not an afterthought.
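One way to picture a fail-closed egress check is the sketch below. It assumes a hypothetical `guard_egress` wrapper, an `EgressBlocked` exception and two illustrative patterns; none of these names come from the product. The design point it tries to show is that the payload is sent only when the check positively passes, and that an error inside the check also blocks.

```python
import re

# Hypothetical raw-PII patterns, independent of whatever the tokenizer uses.
RAW_PII = [
    re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),  # email address
    re.compile(r"\b\d{13}\b"),                     # 13-digit ID number
]


class EgressBlocked(Exception):
    """Raised instead of sending when a payload may contain raw PII."""


def guard_egress(payload, send, known_names=()):
    """Call send(payload) only if no raw PII is detected in payload."""
    try:
        leaked = any(p.search(payload) for p in RAW_PII) or any(
            name in payload for name in known_names
        )
    except Exception as exc:
        # Fail closed: if the check itself breaks, nothing is sent.
        raise EgressBlocked("check failed; refusing to send") from exc
    if leaked:
        raise EgressBlocked("raw PII detected in outbound payload")
    return send(payload)


sent = []
guard_egress("⟦name-673d⟧ asks if the loan is approved.", sent.append)
try:
    guard_egress("Contact jane@example.invalid about the loan.", sent.append)
except EgressBlocked:
    pass  # the raw email never reached send()
```

After this runs, `sent` holds only the tokenized payload.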
What the model actually sees
A worked example with a fictional person. (Real values never appear — not in our logs, not on this page.)
Your text: Jane Example (ID 0000000000000, [email protected], +27 00 000 0000) asks if her R5 000 loan is approved.
What the brain sees: ⟦name-673d⟧ (ID ⟦za_id-44cd⟧, ⟦email-3141⟧, +⟦phone-e1ff⟧) asks if her R5 000 loan is approved.
Brain replies: Yes, ⟦name-673d⟧'s loan is approved.
You receive: Yes, Jane Example's loan is approved.
↑ Every value here is invented (note the .invalid email, the all-zero number). This page contains no real personal data — the same rule the firewall enforces.
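The same round trip can be traced in code with a stub in place of the remote model. This is only a sketch: `VAULT`, `mask`, `rehydrate`, `stub_model` and `ask` are hypothetical names, the tokens are copied from the example above, and the email address is an invented stand-in for illustration.

```python
# Token -> value map; in the design described here it lives only in the tenant.
VAULT = {
    "⟦name-673d⟧": "Jane Example",
    "⟦za_id-44cd⟧": "0000000000000",
    "⟦email-3141⟧": "jane@example.invalid",  # made-up address for this sketch
}

seen_by_model = []  # everything that crosses the boundary is recorded here


def mask(text):
    """Replace every known raw value with its token."""
    for token, value in VAULT.items():
        text = text.replace(value, token)
    return text


def rehydrate(text):
    """Replace every token with its raw value, locally."""
    for token, value in VAULT.items():
        text = text.replace(token, value)
    return text


def stub_model(prompt):
    """Stand-in for the remote LLM: records its input, answers in tokens."""
    seen_by_model.append(prompt)
    return "Yes, ⟦name-673d⟧'s loan is approved."


def ask(text):
    return rehydrate(stub_model(mask(text)))


answer = ask(
    "Jane Example (ID 0000000000000, jane@example.invalid) "
    "asks if her R5 000 loan is approved."
)
print(seen_by_model[0])
print(answer)
```

The first printed line is what the stub received, which contains tokens only; the second is the re-hydrated answer returned to the caller.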
⚖️ Honest scope — what it does and doesn't promise
Detection is pattern-based for structured PII (email, SA ID, phone, card) plus a hook for names you supply. It does not magically find every name in free-form prose.
The watcher is the backstop. Because detection isn't perfect, the watcher refuses egress on anything still raw — so a missed class is blocked, not leaked. Defence in depth, fail closed — a named limit, not a promise of perfection.
The vault never leaves. The token→value map lives inside your tenant; the mapping itself is never sent to the model.
A boundary, not a bolt-on. It travels with every model call — the same posture as our assume-breach default and the four-gate policeman.
Proven, runnable, deterministic: the420code/proofs/pii-firewall/ — all checks pass (tokenize round-trip, stable tokens, watcher blocks raw / allows clean). The live-LLM wiring reuses the gated OpenRouter path; the round-trip guarantee is proven first.