Question 1

How is this different from pasting text into ChatGPT?

Accepted Answer

When you paste sensitive text into ChatGPT, Claude, Gemini, or any other cloud LLM, that text leaves your device and ends up on a third-party server, and depending on your settings and tier, it may be retained for training. The Privacy Scanner runs the detection model entirely in your browser. The text never leaves your device, there is no signup or account, and no server has any record of what you scanned. The whole point of the tool is to scrub the prompt before it leaves your device.

Question 2

What kinds of PII does it detect?

Accepted Answer

It detects names of people (PERSON), email addresses (EMAIL), phone numbers (PHONE), postal and street addresses (ADDRESS), dates (DATE), account numbers such as IBANs and credit card numbers (ACCOUNT), identification numbers such as passport numbers and SSNs (ID), URLs, organization names (ORG), and secret-like patterns such as passwords or API keys (SECRET). The underlying model is OpenAI's privacy-filter, trained specifically for this task. It errs on the side of recall, which means it may flag some text that is not actually sensitive, so review the highlights and copy whichever cleaned version fits your downstream use.

Question 3

Does it work in languages other than English?

Accepted Answer

Yes. The privacy-filter model is multilingual (built on XLM-RoBERTa) and identifies PII across many languages. Quality is best on Latin-alphabet languages such as Spanish, Portuguese, German, French, Italian, and Dutch, and weaker on highly inflected languages and non-Latin scripts. Japanese, Chinese, and Arabic work, but with lower recall. If you scan a non-English document and find that the model missed something, redact what it did detect (Label mode is the safest choice here) and review the rest of the text manually.

Question 4

Is the scanning really private?

Accepted Answer

Yes. The model is downloaded once from Hugging Face's public CDN and cached by your browser. From that point on, every scan runs entirely on your device — no text or scan results are ever sent to FormatFuse, OpenAI, Google, or any other server. You can verify this by opening your browser's Network tab while you scan: after the first-time model download, there are zero outbound requests. We have no server that could log your text even if we wanted to.

Question 5

What should I do with the redacted output?

Accepted Answer

Pick the redaction mode that fits your downstream use. "Mask" replaces each entity with [REDACTED], which is the safest default when a human reviewer needs to see that something was removed. "Label" replaces it with a type tag such as [PERSON] or [EMAIL], which is best when an LLM still needs to know the structural role of what used to be there. "Remove" strips the entity entirely. Always do a final manual reread before sending: no model is perfect, and the context you add around a redaction (like "the customer mentioned in the [REDACTED] ticket") can sometimes leak information indirectly.
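
The three modes can be illustrated with a short Python sketch. This is not the tool's actual code: the `redact` function and the `(start, end, label)` span format are assumptions made up for this example, and the spans are assumed not to overlap.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end, label), with `end` exclusive.
Entity = Tuple[int, int, str]


def redact(text: str, entities: List[Entity], mode: str = "mask") -> str:
    """Apply one of the three redaction modes to non-overlapping spans."""
    if mode not in {"mask", "label", "remove"}:
        raise ValueError(f"unknown mode: {mode}")
    # Replace from the last span to the first so earlier offsets stay valid.
    for start, end, label in sorted(entities, reverse=True):
        if mode == "mask":
            replacement = "[REDACTED]"
        elif mode == "label":
            replacement = f"[{label}]"
        else:  # "remove"
            replacement = ""
        text = text[:start] + replacement + text[end:]
    return text


sample = "Contact Jane Doe at jane@example.com."
found = [(8, 16, "PERSON"), (20, 36, "EMAIL")]

print(redact(sample, found, "mask"))    # Contact [REDACTED] at [REDACTED].
print(redact(sample, found, "label"))   # Contact [PERSON] at [EMAIL].
print(redact(sample, found, "remove"))  # Contact  at .
```

Note how "Remove" leaves a sentence that no longer reads naturally, which is one reason Mask or Label is usually easier for a reviewer or an LLM to work with.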

Question 6

Why is the first scan slow?

Accepted Answer

On first use the tool downloads the privacy-filter model, which is about 290 MB at q4 quantization and is served from Hugging Face's CDN. Your browser caches it after the download, so every subsequent scan starts without a download and typically takes under a second for a few thousand characters of text. While the model is downloading, the progress bar in the Scan button shows the download percentage. The download happens entirely between you and Hugging Face's CDN, so FormatFuse never sees the request.
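
For a rough sense of how long that first download takes, here is a back-of-the-envelope estimate in Python. It assumes an idealized connection with no protocol overhead or throttling, and the bandwidth figures are illustrative, so treat the results as lower bounds rather than measurements.

```python
MODEL_SIZE_MB = 290  # approximate model size stated above


def download_seconds(bandwidth_mbps: float, size_mb: float = MODEL_SIZE_MB) -> float:
    """Idealized transfer time: megabytes * 8 bits, divided by megabits per second."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth must be positive")
    return size_mb * 8 / bandwidth_mbps


# Illustrative connection speeds in Mbit/s (assumed, not measured).
for mbps in (10, 50, 200):
    print(f"{mbps:>4} Mbit/s: about {download_seconds(mbps):.0f} s")
```

At 10 Mbit/s this works out to roughly four minutes, while a fast connection gets through it in seconds. This one-time cost is only paid on first use.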

Question 7

Are there any limits on how much I can scan?

Accepted Answer

Each scan is limited to 50,000 characters of input, mostly so that very long inputs don't hang the browser. For most use cases (emails, support tickets, contract clauses, chat exports, code comments, CSV rows) that's plenty. For longer documents, split them into chunks and scan one at a time. There's no per-day quota, no signup, no account, and no cap on the number of scans: the tool runs on your device, so we have no costs to pass on.
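
If you need to split a long document yourself, something like the following Python sketch would do it. The `split_for_scanning` helper is hypothetical and not part of the tool. It prefers to cut at a blank line, then at a line break, then at a space, and only falls back to a hard cut when a window contains none of these.

```python
from typing import List

SCAN_LIMIT = 50_000  # per-scan character limit stated above


def split_for_scanning(text: str, limit: int = SCAN_LIMIT) -> List[str]:
    """Split text into chunks of at most `limit` characters each."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: List[str] = []
    rest = text
    while len(rest) > limit:
        window = rest[:limit]
        cut = limit  # hard cut if no separator is found
        # Prefer paragraph breaks, then line breaks, then spaces.
        for sep in ("\n\n", "\n", " "):
            pos = window.rfind(sep)
            if pos > 0:
                cut = pos + len(sep)  # keep the separator with the earlier chunk
                break
        chunks.append(rest[:cut])
        rest = rest[cut:]
    if rest:
        chunks.append(rest)
    return chunks


long_doc = "word " * 30_000  # 150,000 characters
parts = split_for_scanning(long_doc)
print(len(parts), [len(p) for p in parts])
```

Joining the chunks gives back the original text unchanged. Cutting at whitespace keeps single tokens such as email addresses intact, but a multi-word name or address can still straddle a boundary, so it is worth checking the text around each cut by hand.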

Question 8

What about PDFs and other documents?

Accepted Answer

For PDFs, use our PDF Redact tool — it runs the same on-device privacy-filter engine but applies the findings as black redaction rectangles and rasterizes affected pages on save, so the underlying text is destroyed (not just visually covered). For images of text, use our Image to Text (OCR) tool to extract the text, then paste it here. For Word documents and plain .txt files, copy the contents into the textarea above.

Privacy Scanner — Find & Redact PII in Any Text
