Essay
The quiet math of adversarial hardening, and why it isn't the moat
A specific number, 0.05, decides whether a perturbation stays invisible to a viewer and poisonous to a model. The math is elegant. Building a company on the math is a different problem.
A Chicago illustrator publishes a portfolio piece in March 2026 with Glaze applied at the default settings. The output is statistically poisoned for any training run that ingests the pixels. By September of the same year, a fine-tuning pipeline at a model vendor has incorporated a cloak-aware augmentation step, and the September training runs no longer choke on the cloak. The illustrator has bought six months. The cloak did exactly what it was designed to do. The cloak did not buy a moat.
That arithmetic is the entire story of adversarial defence as a category, and it is the reason every company built on the math alone has hit the same wall on roughly the same timeline.
What does the 0.05 threshold actually do?
The threshold is the L-infinity epsilon: the maximum amount any single pixel channel is allowed to deviate from the original image in normalized colour space. At 0.05, the deviation is below the consensus threshold for human visual detection in typical viewing conditions, which sits around 0.07 to 0.09 depending on display calibration and surrounding context. The perturbation pattern is computed by gradient ascent against a surrogate model's loss function, which means the perturbation is engineered to maximize confusion in feature-extraction layers without altering the pixel statistics in a way the viewer notices.
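The mechanism described above can be sketched in a few lines. This is a toy illustration, not Glaze's or Nightshade's actual algorithm: the "surrogate model" is a single hypothetical linear feature, and the step size, iteration count, and 64-value "image" are assumptions chosen only to make the example run. What it shows is the shape of the technique: step in the direction that raises the surrogate's loss, then project every pixel channel back inside the 0.05 box around the original.

```python
import random

EPSILON = 0.05   # L-infinity budget from the essay, in [0, 1] colour space
STEP = 0.01      # per-iteration step size (assumed for illustration)
ITERATIONS = 10  # number of ascent steps (assumed for illustration)

def surrogate_loss(pixels, weights, target):
    """Squared error of a toy linear 'feature' against a target value."""
    score = sum(w * p for w, p in zip(weights, pixels))
    return (score - target) ** 2

def loss_gradient(pixels, weights, target):
    """Analytic gradient of surrogate_loss with respect to each pixel."""
    score = sum(w * p for w, p in zip(weights, pixels))
    return [2.0 * (score - target) * w for w in weights]

def sign(x):
    return (x > 0) - (x < 0)

def cloak(original, weights, target):
    """Projected gradient ascent: raise the loss, stay inside the epsilon box."""
    cloaked = list(original)
    for _ in range(ITERATIONS):
        grad = loss_gradient(cloaked, weights, target)
        for i, g in enumerate(grad):
            stepped = cloaked[i] + STEP * sign(g)
            # Project back into the L-infinity ball around the original pixel.
            stepped = max(original[i] - EPSILON, min(original[i] + EPSILON, stepped))
            # Keep the value a valid colour intensity.
            cloaked[i] = max(0.0, min(1.0, stepped))
    return cloaked

# Hypothetical stand-ins for an image and a surrogate model.
rng = random.Random(0)
original = [rng.random() for _ in range(64)]
weights = [rng.uniform(-1.0, 1.0) for _ in range(64)]
# Offset the target slightly so the starting gradient is non-zero.
target = sum(w * p for w, p in zip(weights, original)) + 0.1

cloaked = cloak(original, weights, target)
max_change = max(abs(c - o) for c, o in zip(cloaked, original))
```

After the loop, `max_change` never exceeds the epsilon budget, while the surrogate's loss on `cloaked` is higher than on `original`. A real cloak replaces the linear feature with a deep feature extractor and an automatic-differentiation framework, but the project-after-every-step structure is the same.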
The result is a piece of art that looks identical to the human viewer and that corrupts the training signal for any vision model that ingests it. The arithmetic is reproducible. The reference techniques (Glaze from the University of Chicago, and the Nightshade research) are publicly documented. A determined researcher can reimplement the technique from the publication in a week.
The implication is that the moat is not in the math. Everyone competent in adversarial machine learning can produce the same defence. The product of the defence is a transient cryptographic-style primitive. It works for as long as the upstream training pipelines remain naive to it, and not a day longer.
Why has every adversarial cloak been defeated within months?
The defence and the offence are running on the same general-purpose hardware, against the same general-purpose loss surface, with the same body of published research available to both sides. The asymmetry that exists in classical cryptography (the cost of computing a hash versus inverting it) does not exist here. A vendor that wants to train on cloaked images has three options: filter the cloaked images out by detecting the cloaking signature, augment the training set with cloaking-style perturbations so the model learns to ignore them, or fine-tune a small detector that operates upstream of the main pipeline.
All three approaches are public, and all three have been implemented within roughly four to six months of any new cloaking technique appearing. Glaze, first released in early 2023, was effectively neutralized for most production training pipelines by mid-2023. Nightshade, published in October 2023, was being defeated in published research papers by summer 2024. The pattern is not malice; it is the structural reality of an open research field where attack and defence are dual.
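To make the second of the three options concrete, here is a minimal sketch of perturbation-style augmentation. It is an assumption-laden stand-in: uniform random noise inside the epsilon box substitutes for the adversarially computed perturbations a real pipeline would more plausibly use, and the function names and the `(pixels, label)` dataset shape are hypothetical.

```python
import random

EPSILON = 0.05  # same L-infinity budget the cloak is assumed to use

def perturb(pixels, rng, epsilon=EPSILON):
    """Copy the image, adding per-channel noise bounded by epsilon."""
    return [min(1.0, max(0.0, p + rng.uniform(-epsilon, epsilon))) for p in pixels]

def augment(dataset, rng, copies=1):
    """Return each (pixels, label) pair followed by `copies` perturbed variants.

    Every variant keeps the original label, so the model is trained to give
    the same answer regardless of small bounded perturbations.
    """
    augmented = []
    for pixels, label in dataset:
        augmented.append((pixels, label))
        for _ in range(copies):
            augmented.append((perturb(pixels, rng), label))
    return augmented

# Usage with two tiny hypothetical "images".
rng = random.Random(1)
dataset = [([0.0, 0.5, 1.0], "landscape"), ([0.2, 0.4, 0.6], "portrait")]
training_set = augment(dataset, rng, copies=2)
```

The point of the sketch is the cost asymmetry the paragraph describes: the counter-measure is a few lines added to a data loader, and it runs on the same hardware the vendor already owns.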
The cloak is a delay, not a defence. The delay is real and valuable. A six-month window in which an artist's work cannot be ingested cheaply is a six-month window. Treat it that way and the math is excellent. Treat it as a long-term moat and the math is heartbreak.
What is the actual long-term primitive?
The long-term primitive is positive attestation. The cloak says, in effect, do not train on this. The attestation says, in effect, this human made this. The two operate on entirely different axes.
A negative obstruction is defeated the moment the obstacle is removed. A positive attestation is defeated only if the underlying record is destroyed or the original signing event is revealed to be fraudulent. The cost of attacking a registry-backed attestation is orders of magnitude higher than the cost of training around a pixel-level cloak. The economics differ because the architectures differ.
A Pulse Signature records the binding between a specific human and a specific work at a specific time. The signature is generated by the creator's local biometric protocol against the file's content hash, and the record is anchored into an append-only registry within seconds of the signing event. The signature does not assert that the work is unique, copyrightable, or worth anything in particular. It asserts that the binding occurred. Two years later, when a model output is alleged to have ingested the work without consent, the signature is the artifact that proves the work existed, signed and dated, before the alleged ingestion. That is the kind of evidence a court can act on. The pixel cloak is the kind of evidence that has long since aged out of relevance.
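The general shape of such a record can be sketched as follows. This is not the Pulse Signature protocol, whose internals the essay does not specify. Everything here is an illustrative assumption: a SHA-256 content hash, an HMAC keyed by a local secret standing in for the biometric-bound signing step (a real system would more plausibly use an asymmetric signature), and a hash-chained in-memory list standing in for the append-only registry.

```python
import hashlib
import hmac
import json

def content_hash(data: bytes) -> str:
    """SHA-256 digest of the work's bytes."""
    return hashlib.sha256(data).hexdigest()

def sign_work(data: bytes, creator_key: bytes, creator_id: str, timestamp: int) -> dict:
    """Bind creator, content hash, and time into one signed record."""
    record = {
        "creator": creator_id,
        "content_sha256": content_hash(data),
        "timestamp": timestamp,
    }
    message = json.dumps(record, sort_keys=True).encode()
    # HMAC is a symmetric stand-in for the (unspecified) biometric signing step.
    record["signature"] = hmac.new(creator_key, message, hashlib.sha256).hexdigest()
    return record

class AppendOnlyRegistry:
    """Hash-chained log: each entry commits to the one before it."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def _entry_hash(self, prev_hash: str, record: dict) -> str:
        body = json.dumps(record, sort_keys=True)
        return hashlib.sha256((prev_hash + body).encode()).hexdigest()

    def append(self, record: dict) -> str:
        prev = self.entries[-1]["entry_hash"] if self.entries else self.GENESIS
        entry_hash = self._entry_hash(prev, record)
        self.entries.append({"prev_hash": prev, "record": record, "entry_hash": entry_hash})
        return entry_hash

    def verify_chain(self) -> bool:
        """Recompute every link; any edited record breaks the chain."""
        prev = self.GENESIS
        for entry in self.entries:
            expected = self._entry_hash(prev, entry["record"])
            if entry["prev_hash"] != prev or entry["entry_hash"] != expected:
                return False
            prev = entry["entry_hash"]
        return True

# Usage: sign a hypothetical work and anchor it.
registry = AppendOnlyRegistry()
record = sign_work(b"portfolio piece bytes", b"local-secret", "artist-001", 1772323200)
registry.append(record)
chain_ok = registry.verify_chain()
```

Altering any stored field afterwards, such as backdating the timestamp, makes `verify_chain()` return `False`. That property is what the essay means by a record that is defeated only if it is destroyed or shown to be fraudulent at signing time.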
What should an artist working in 2026 actually do?
Two layers. The first is the cloak, applied as a default to portfolio publication and updated against the current state of the defeat literature. Treat the cloak the way a homeowner treats locks on a front door: useful, expected, not a substitute for insurance. The cloak buys time and signals the work was published with intent to remain attributed. The cost is the compute of running the cloaking pass over each image.
The second is the signature, applied as a default to every finished work before publication. The signature is generated locally, anchored to the registry, and bound to the creator's biometric signal. The cost is a few seconds per work and a single registration with the platform that operates the protocol. The signature is the artifact that survives.
The temptation in 2026 will be to pick one layer. Cloak-only is the strategy of an artist who has not yet been falsely accused. Signature-only is the strategy of an artist who has accepted that ingestion will happen and wants the standing to sue when it does. Using both layers together is the position that holds across the timescales an actual career runs on.
The math at 0.05 is real. The math is not the moat. The moat is the institutional discipline that turns a signed work into a record that lasts. The cloak slows the attack. The signature is what wins the case.
Frequently asked questions
What does the 0.05 number actually mean?
- It is the cap on the L-infinity norm of the perturbation, measured in normalized colour space where each channel runs from 0.0 to 1.0. A 0.05 epsilon means no pixel channel is allowed to change by more than 5% of the full colour range. Below that ceiling, the human eye does not reliably detect the difference between the cloaked image and the original under typical viewing conditions.
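As a worked example of that arithmetic, assuming ordinary 8-bit channels (0 to 255), the 0.05 budget is 12.75 intensity levels, which means at most 12 whole steps per channel. The helper name and the four-channel sample values below are hypothetical.

```python
EPSILON = 0.05

def linf_norm(original_8bit, cloaked_8bit):
    """Largest per-channel change, rescaled from 0..255 to the 0.0..1.0 range."""
    return max(abs(c - o) for o, c in zip(original_8bit, cloaked_8bit)) / 255.0

# 0.05 * 255 is 12.75, so 12 is the largest whole 8-bit step inside the budget.
max_levels = int(EPSILON * 255)

original = [200, 17, 96, 255]
cloaked = [212, 5, 96, 243]  # each channel moved by at most 12 levels

within_budget = linf_norm(original, cloaked) <= EPSILON
```

Here the largest change is 12 levels, or about 0.047 in normalized terms, so the pair stays inside the budget. A 13-level change on any single channel would already exceed it.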
Why doesn't a higher epsilon just work better?
- It does work better against the model, in the sense of degrading training quality further. The cost is that at 0.08 and above, the perturbation begins producing visible colour artifacts in flat regions of the image. The artist sees the damage to their own work. The cloak that succeeds against the model fails against the human, which is the wrong direction for a defensive tool that artists must voluntarily apply to their portfolios.
If adversarial defence is transient, what should an artist actually do?
- Apply it as the first layer, because it raises the cost of unauthorized training. Then bind the work to a positive attestation that records the act of signing. A Pulse Signature records a specific human signing a specific file at a specific time, witnessed by the local biometric protocol and timestamped into an append-only registry. The signature outlives the cloak by years.