
read receipts
Your Spam Filter Doesn't Read Your Email. It Counts the Tells.
The model isn't a nosy intern skimming your love letters — it's a statistician betting on patterns, and that changes what you should actually worry about.
Google blocks nearly 15 billion unwanted messages a day, and it does it without anything you'd recognize as reading. No model is sitting there mouthing the words of your dentist's reminder. That mental picture — a digital snoop hunched over your inbox, absorbing your secrets — is the fear most people carry, and it's aimed at the wrong target.
Here's what's really happening. When mail hits a filter, the system doesn't start with your prose. It starts with the envelope. Who sent this, and does the sending server actually have permission to use that domain? That's the job of SPF, DKIM, and DMARC: SPF checks that the sending server is authorized for the domain, DKIM verifies a cryptographic signature on the message, and DMARC checks that the domain in the From line lines up with what those two authenticated. Together they confirm whether a message claiming to be from your bank came from your bank's infrastructure or from a rented server in someone's basement. A scammer can copy a logo perfectly and still fail DKIM, and that single cryptographic mismatch tells the filter more than any sentence in the body ever could.
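To make the envelope check concrete: a receiving server typically records the outcome of these checks in an Authentication-Results header (RFC 8601). The sketch below only reads such a header; it performs no DNS lookups or signature verification. The sample header, the function names, and the "anything but a DMARC pass is suspicious" rule are illustrative assumptions, not any provider's actual policy.

```python
import re

def auth_verdicts(header_value: str) -> dict[str, str]:
    """Pull the spf/dkim/dmarc results out of an Authentication-Results value."""
    verdicts: dict[str, str] = {}
    pattern = r"\b(spf|dkim|dmarc)=(\w+)"
    for method, result in re.findall(pattern, header_value, flags=re.IGNORECASE):
        # Keep the first result reported for each method.
        verdicts.setdefault(method.lower(), result.lower())
    return verdicts

def envelope_looks_forged(header_value: str) -> bool:
    """Hypothetical rule: anything short of an explicit DMARC pass is suspicious."""
    return auth_verdicts(header_value).get("dmarc") != "pass"

# Made-up header for a message that copies a bank's branding but is sent
# from someone else's server.
sample = (
    "mx.example.net; spf=pass smtp.mailfrom=mail.example.org; "
    "dkim=fail header.d=bank.example; dmarc=fail header.from=bank.example"
)

print(auth_verdicts(sample))
print(envelope_looks_forged(sample))
```

Note that nothing in this check touches the body of the message. The verdict comes entirely from headers.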
Then come the cheaper signals. Sender reputation: has this IP address been blasting millions of identical messages? Link analysis: does the "click here" button point somewhere the visible text doesn't admit? URL age, redirect chains, attachment types. Most of a spam verdict is settled before the model thinks hard about language at all.
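One of those cheap signals, link analysis, can be sketched in a few lines. This toy version handles only the simplest case: the visible link text is itself a web address, and it names a different host than the real destination. The class and function names are invented for illustration, and a production filter would also follow redirects and consult reputation data, which this does not.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collects (href, visible_text) pairs for every <a href=...> element."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def host_of(text: str) -> str:
    """Best-effort hostname from a URL or a bare domain such as 'www.bank.example'."""
    candidate = text if "://" in text else "//" + text
    return (urlparse(candidate).hostname or "").lower()

def mismatched_links(html: str) -> list[tuple[str, str]]:
    """Return (visible_text, href) pairs where the shown host differs from the real one."""
    parser = LinkCollector()
    parser.feed(html)
    flagged = []
    for href, text in parser.links:
        # Only compare when the visible text looks like an address, not a phrase.
        shown = host_of(text) if "." in text and " " not in text else ""
        actual = host_of(href)
        if shown and actual and shown != actual:
            flagged.append((text, href))
    return flagged

sample_html = (
    '<a href="http://login.evil.example/x">www.bank.example</a> '
    '<a href="https://news.example/a">Read more</a>'
)
print(mismatched_links(sample_html))
```

A plain "click here" label passes this particular test, which is why real filters layer it with URL age, redirect chains, and sender reputation.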
When language does enter, it's not comprehension — it's math. Old-school filters used Bayesian classifiers, the approach Paul Graham popularized in his 2002 essay "A Plan for Spam," which scored messages on the probability that certain words showed up in junk versus real mail. The word "Viagra" carried a number. Modern systems go further with transformer-based models that turn text into embeddings — long lists of numbers that capture meaning without storing the meaning as words. To the classifier, your email isn't a story. It's a point in high-dimensional space, and the question is whether that point sits near the cluster labeled "phishing."
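For a feel of the word-counting approach, here is a minimal naive Bayes sketch with add-one smoothing. It is a simplified textbook version, not the exact formula from Graham's essay, and the four training messages are made up for the example.

```python
import math
from collections import Counter

def train(messages):
    """messages: iterable of (text, is_spam). Returns word counts and message counts per class."""
    word_counts = {True: Counter(), False: Counter()}
    msg_counts = {True: 0, False: 0}
    for text, is_spam in messages:
        word_counts[is_spam].update(text.lower().split())
        msg_counts[is_spam] += 1
    return word_counts, msg_counts

def spam_probability(text, word_counts, msg_counts):
    """Combine per-word evidence into one probability that the message is spam."""
    vocab_size = len(set(word_counts[True]) | set(word_counts[False]))
    spam_total = sum(word_counts[True].values())
    ham_total = sum(word_counts[False].values())
    # Start from the prior odds, then add each word's log-likelihood ratio.
    log_odds = math.log(msg_counts[True] / msg_counts[False])
    for word in text.lower().split():
        p_word_spam = (word_counts[True][word] + 1) / (spam_total + vocab_size)
        p_word_ham = (word_counts[False][word] + 1) / (ham_total + vocab_size)
        log_odds += math.log(p_word_spam / p_word_ham)
    return 1 / (1 + math.exp(-log_odds))

# Tiny invented corpus: two junk messages, two real ones.
corpus = [
    ("cheap pills buy now", True),
    ("buy cheap watches now", True),
    ("lunch meeting at noon", False),
    ("see you at the meeting", False),
]
word_counts, msg_counts = train(corpus)

print(spam_probability("buy cheap pills", word_counts, msg_counts))
print(spam_probability("meeting at noon", word_counts, msg_counts))
```

Every word carries a number, and the verdict is the sum of those numbers. At no point does the classifier need to know what a pill or a meeting is.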
That distinction matters, because "the AI sees my words" and "the AI converts my words into a vector and compares it to a fraud pattern" are not the same privacy event. One sounds like eavesdropping. The other is closer to a metal detector: it reacts to a signature, not a confession.
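The "compare a vector to a fraud pattern" step can be illustrated with cosine similarity. Everything numeric here is invented: real embeddings have hundreds or thousands of dimensions and come from a trained model, while these three-number vectors, the centroid, and the 0.9 threshold are stand-ins chosen to keep the arithmetic visible.

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical centre of the "phishing" cluster in a toy 3-dimensional space.
phishing_centroid = [0.9, 0.1, 0.8]

# Hypothetical embeddings for two incoming messages.
incoming = {
    "dentist reminder": [0.1, 0.9, 0.2],
    "fake invoice": [0.8, 0.2, 0.9],
}

THRESHOLD = 0.9  # assumed cutoff, picked only for this example

flags = {
    name: cosine(vector, phishing_centroid) >= THRESHOLD
    for name, vector in incoming.items()
}
print(flags)
```

The classifier's output is a distance and a yes-or-no, which is the metal-detector behavior described above.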
So the real question isn't whether a model touches your text. Of course it does; that's the job. The question is twofold, and it's where you should actually point your suspicion.
First: retention. Does the content get stored, and for how long? OpenAI's API policy, for example, states it does not use business API data to train its models and retains it for a limited window before deletion. Anthropic publishes similar commercial terms. Those policies are the thing worth reading, not reassurances about how "private" a logo looks.
Second: where the inference happens. A filter running classification inside your provider's own walls carries a different risk profile from one that ships your text to a third-party model you've never heard of. "On-device or in-tenant" versus "sent somewhere else" is the privacy fork that decides who could, in theory, ever see anything.
This is also why the "it scans everything" panic backfires. The systems scanning everything are the ones protecting you — the classifiers blocking the fake invoice and the lookalike login page. The dangerous scanning is the quiet kind: the tracking pixel in a newsletter that pings a server the moment you open it, logging your IP and the time you read it. That's a genuine snoop, and it's not the spam filter.
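Tracking pixels are usually easy to spot once you look at the HTML. The sketch below flags remote images declared as 0 or 1 pixel in both dimensions. That heuristic is an assumption for illustration: trackers can also be hidden with CSS or served at normal sizes, and the class name and sample markup are made up.

```python
from html.parser import HTMLParser

class PixelFinder(HTMLParser):
    """Collects the src of remote <img> tags declared as 0x0 or 1x1."""
    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        is_tiny = (
            attributes.get("width") in ("0", "1")
            and attributes.get("height") in ("0", "1")
        )
        # A remote fetch is what lets the sender log your IP and open time.
        if is_tiny and src.startswith(("http://", "https://")):
            self.suspects.append(src)

newsletter_html = (
    '<p>This week in gardening</p>'
    '<img src="https://cdn.example/logo.png" width="200" height="50">'
    '<img src="https://track.example/o.gif?u=123" width="1" height="1">'
)

finder = PixelFinder()
finder.feed(newsletter_html)
print(finder.suspects)
```

Mail clients that block remote images by default defeat this kind of tracking, because the pixel is never requested.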
Worry about retention. Worry about destination. The model reading your words for clues is the least of it.
Sources
- Google Workspace Blog — Gmail blocks roughly 15B unwanted messages daily; ML spam filtering overview
- Paul Graham — 'A Plan for Spam' (2002) — Bayesian spam classification
- OpenAI — API data not used for training; limited retention then deletion
- Anthropic — Commercial data handling and retention terms