Methodology
Why every AI detector will eventually lie to you
The graph nobody in the detection business publishes plots accuracy against model release dates. The line slopes down. Not a bug. The structure of the problem.
The first slide in every AI-detection pitch is a benchmark. The vendor names the model they tested against, names their accuracy, and frames the rest of the conversation around making that number larger. Nobody on the buyer side asks the only useful question, which is whether the model in the benchmark is the same one the vendor's customers will be defending against next quarter.
It will not be. The detector is being sold against a target that has already moved.
What actually decays in a detector
A detector is a classifier trained on samples from a set of generators. The training set is a frozen distribution. Every new generator release, every fine-tune, every alignment shift, every change in sampler temperature perturbs that distribution. A detector trained on GPT-4 and Claude 3 outputs from 2024 is doing different work on the 2026 fleet, even when the underlying transformer architecture is approximately the same.
The decay is faster than the publication cycle. Detection vendors publish accuracy figures against the model they had the most labeled data for, usually four to six months before launch. By the time a procurement department signs a contract, the figure is already covering a different distribution than the one the contract has to defend against.
This is a structural fact of supervised learning on a moving target. Better engineering does not fix it; better engineering is what produces the next number that is also going to decay.
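A toy model makes the decay concrete. The sketch below assumes a detector that reduces every file to a single score and flags anything above a threshold frozen at training time. The score distributions, the threshold, and the generator means are all hypothetical numbers chosen for illustration, not measurements from any real detector.

```python
from statistics import NormalDist

# Hypothetical one-dimensional "detector score"; every number here is illustrative.
HUMAN_SCORES = NormalDist(mu=0.0, sigma=1.0)
THRESHOLD = 1.5  # frozen at training time, never updated


def recall(generator_mean: float) -> float:
    """Share of generator output that still scores above the frozen threshold."""
    return 1.0 - NormalDist(mu=generator_mean, sigma=1.0).cdf(THRESHOLD)


def false_positive_rate() -> float:
    """Share of human output above the threshold; independent of the generator."""
    return 1.0 - HUMAN_SCORES.cdf(THRESHOLD)


def balanced_accuracy(generator_mean: float) -> float:
    """Accuracy on a 50/50 test set of human and generator samples."""
    return 0.5 * (recall(generator_mean) + (1.0 - false_positive_rate()))


if __name__ == "__main__":
    # Each step models a newer generator whose output sits closer to human work.
    for mean in (3.0, 2.5, 2.0, 1.5):
        print(f"generator mean {mean:.1f}: balanced accuracy {balanced_accuracy(mean):.3f}")
```

Nothing in the detector changes between rows. Only the generator moves, and the benchmark number falls with it.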
Why the false-positive rate is the real story
A vendor saying 94.0% accuracy on a balanced test set is also saying 6.0% misclassifications. Balanced test sets rarely match the real-world base rate. In production, the prior probability that any given file is AI-generated varies wildly: a stock photography library sees one rate, an undergraduate term-paper submission queue another, a courtroom exhibit a third.
Suppose those errors split evenly across both classes, so the false-positive rate is also 6.0%. Apply that rate to an inbox of one million human-authored files and the system has flagged sixty thousand humans as machines. The vendor metric described one thing. The operational reality is something else.
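The arithmetic can be checked in a few lines. This is a minimal sketch that assumes the 6.0% error splits evenly across both classes (a 94% true-positive rate and a 6% false-positive rate); the function name `flag_outcomes` and the 1% AI share in the last call are hypothetical, introduced only for illustration.

```python
def flag_outcomes(total_files: int, ai_share: float,
                  true_positive_rate: float, false_positive_rate: float) -> dict:
    """Count correct and incorrect flags for a given mix of AI and human files."""
    ai_files = round(total_files * ai_share)
    human_files = total_files - ai_files
    true_flags = round(ai_files * true_positive_rate)        # AI files correctly flagged
    false_flags = round(human_files * false_positive_rate)   # humans flagged as machines
    flagged = true_flags + false_flags
    # Precision: of everything flagged, how much was actually AI-generated.
    precision = true_flags / flagged if flagged else 0.0
    return {"true_flags": true_flags, "false_flags": false_flags, "precision": precision}


if __name__ == "__main__":
    # The all-human inbox from the text.
    print(flag_outcomes(1_000_000, 0.0, 0.94, 0.06))
    # The balanced benchmark the vendor reports.
    print(flag_outcomes(1_000_000, 0.5, 0.94, 0.06))
    # A hypothetical queue where only 1% of files are AI-generated.
    print(flag_outcomes(1_000_000, 0.01, 0.94, 0.06))
```

On the all-human inbox every one of the sixty thousand flags is wrong. In the hypothetical 1% queue, the same detector is right about roughly one flag in seven.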
The detector industry's response is to publish precision and recall figures with confidence intervals. This is correct statistical practice and irrelevant to the policy question. The platform building a moderation strategy on detector output does not get to plead the confidence interval to the falsely flagged creator. The creator wants to know why their account got suspended.
How attestation changes the question
A detector tries to answer "is this AI?" An attestation registry answers "did a specific human sign this?" The first question is probabilistic and unstable. The second is binary and stable.
The Pulse Signature design takes the second path. A creator signs a file using their own biometric signal on their own device. The signing event produces a cryptographic seal that includes the file hash, the witness identifier, and a timestamp. The seal is committed to an append-only registry. The biometric signal itself never leaves the device.
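The shape of such a seal can be sketched with the standard library alone. This is not the Pulse Signature implementation. The names `make_seal` and `AppendOnlyRegistry` are hypothetical, the HMAC is a stand-in for whatever signature scheme the real design uses, and `device_key` stands in for a key that the biometric signal unlocks on the device and that is never transmitted.

```python
import hashlib
import hmac
import json
import time


def make_seal(file_bytes, witness_id, device_key, timestamp=None):
    """Build a seal over the file hash, witness identifier, and timestamp."""
    payload = {
        "file_hash": hashlib.sha256(file_bytes).hexdigest(),
        "witness_id": witness_id,
        "timestamp": int(time.time()) if timestamp is None else timestamp,
    }
    # Serialize with sorted keys so the same payload always yields the same bytes.
    message = json.dumps(payload, sort_keys=True).encode("utf-8")
    # HMAC is an assumption standing in for a real on-device signature.
    payload["signature"] = hmac.new(device_key, message, hashlib.sha256).hexdigest()
    return payload


class AppendOnlyRegistry:
    """Toy hash-chained log: each entry commits to the one before it."""

    GENESIS = "0" * 64

    def __init__(self):
        self._entries = []

    @staticmethod
    def _entry_hash(prev_hash, seal):
        body = json.dumps(seal, sort_keys=True)
        return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

    def commit(self, seal):
        prev = self._entries[-1]["entry_hash"] if self._entries else self.GENESIS
        entry_hash = self._entry_hash(prev, seal)
        self._entries.append({"seal": seal, "prev_hash": prev, "entry_hash": entry_hash})
        return entry_hash

    def chain_is_intact(self):
        """Recompute every link; any edit to an earlier seal breaks the chain."""
        prev = self.GENESIS
        for entry in self._entries:
            if entry["prev_hash"] != prev:
                return False
            if entry["entry_hash"] != self._entry_hash(prev, entry["seal"]):
                return False
            prev = entry["entry_hash"]
        return True
```

Only the file hash, the witness identifier, the timestamp, and the signature enter the registry. Neither the file nor the key material behind the signature is part of the record.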
A platform with a registry lookup can verify the seal in milliseconds. There is no probability. The lookup either resolves to a valid attestation or it does not. The output is not "is this AI?"; it is "did this creator commit to this file?" Those are different products with different operational consequences.
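The binary character of the lookup is easy to illustrate. In this minimal sketch the registry is a plain dictionary keyed by file hash, and `file_fingerprint`, `resolve`, the witness identifier, and the timestamp are all hypothetical. Checking the seal's signature, which a real verifier would also do, is omitted.

```python
import hashlib
from typing import Optional


def file_fingerprint(file_bytes: bytes) -> str:
    """SHA-256 hex digest used as the registry key."""
    return hashlib.sha256(file_bytes).hexdigest()


def resolve(registry: dict, file_bytes: bytes, witness_id: str) -> Optional[dict]:
    """Return the attestation record for this exact file and witness, or None."""
    record = registry.get(file_fingerprint(file_bytes))
    if record is None or record["witness_id"] != witness_id:
        return None
    return record


# A one-entry registry with made-up values.
original = b"final cut, version 3"
registry = {
    file_fingerprint(original): {"witness_id": "creator-123", "timestamp": 1767225600},
}

print(resolve(registry, original, "creator-123") is not None)         # True
print(resolve(registry, original + b" ", "creator-123") is not None)  # False
```

A single added byte changes the hash, so the second lookup finds nothing. There is no score to threshold and no confidence interval to interpret.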
The detector industry has spent six years trying to turn probabilistic content classification into a moderation primitive. The attestation institutions are building the primitive that should have been there from the start.
What happens when the legal pressure arrives
The EU AI Act's machine-readable provenance language comes into operational force across 2026 for the largest deployments. The legal text does not ask platforms to detect AI; it asks them to disclose. Detection is one possible answer. Provenance manifests are another. Procurement departments at major publishers in Europe have started running both options through their counsel.
Detection is producing the slower legal-review cycle. The counsel question is what happens when the detector is wrong about an enterprise customer. The provenance answer is more tractable: a manifest is either present and verifiable or it is not.
A platform routing serious moderation decisions through detector output is one bad case away from a published opinion that reframes detector confidence as the basis for a wrongful-accusation finding. That opinion will land somewhere in the next two years. Every detector vendor's brochure becomes a defense exhibit on the day it does.
What an honest detector vendor would say
There is a version of the detector pitch that survives a fact check. It runs roughly: this tool is useful for low-stakes pre-sorting at scale. It is not a basis for individual adjudication. Its accuracy is calibrated against a generator distribution that will be out of date the day you ship. Combine it with attestation primitives, where present, before you give a finding any weight.
No vendor sells this version. The market does not pay for it. The market pays for the next benchmark improvement, the next category-leader claim, the next chart that shows a higher number than last quarter. The contracts that reward those charts are six-to-eighteen-month deals. The platform's legal team is left holding the bag for the consequences after the vendor has booked the revenue.
What survives the next release cycle
In eighteen months a new state-of-the-art generator will ship. Every detector benchmark currently on the market will need recalibrating. A small number of detectors will recover most of their accuracy within a quarter, after retraining on labeled samples from the new model. None of them will solve the false-positive problem, because that is not a model problem; that is a base-rate problem.
The same eighteen months will not change a single attestation record. A Pulse Signature signed today resolves identically in 2027, 2032, and 2056. The seal does not depreciate against model updates. It depreciates only if the keys are lost, which is a different operational problem with known mitigations.
A platform building for the five-year horizon is choosing between two depreciation curves. One slopes downward with every model release. The other does not slope at all.
Frequently asked questions
Are AI detectors useless?
- They are useful for sorting at scale when stakes are low and base rates are favorable. They are unsuited to adjudication, where an accusation has consequences. A spam filter and a courtroom finding are different products even when they share a math layer.
Does Humark try to detect AI-generated content?
- No. The platform records human attestation. If a creator chose not to sign a piece, the registry has nothing to say about it. The absence of a signature is silence, not a verdict.
What changes when an AI lab itself wants provenance?
- The same record. Model developers have a different problem, demonstrating what their model did and did not produce, but the architecture for that record is the same one a human creator needs. Attestation is brand-agnostic once it is identity-bound.