Suspicion accumulates across sequential prompts. Rapid-fire probing costs more than patient conversation.
How it works: Each message adds to a session-level suspicion score based on:
- Content risk: adversarial patterns detected per-message
- Temporal velocity: rapid-fire messages are suspicious (probing behavior)
- Pattern escalation: each adversarial message after the first costs MORE
- Decay: suspicion slowly decays over time if messages are benign
This means a slow, patient attacker who sends one bad prompt per hour gets treated differently
than a bot blasting 10 injection attempts per second. The temporal dimension makes brute-force probing self-defeating.
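The four factors above can be sketched as a small session-level accumulator. This is a minimal illustration, not the project's actual implementation: the class name `SessionSuspicion`, its parameters (`half_life_s`, `velocity_window_s`, `velocity_penalty`, `escalation_factor`), and the exponential forms of decay and escalation are all assumptions made for the example. The per-message `content_risk` value is assumed to come from a separate detector.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionSuspicion:
    """Hypothetical session-level suspicion accumulator (illustrative only)."""

    half_life_s: float = 600.0       # assumed: score halves per 10 benign minutes
    velocity_window_s: float = 2.0   # assumed: shorter gaps count as rapid-fire
    velocity_penalty: float = 0.5    # assumed flat cost per rapid-fire message
    escalation_factor: float = 1.5   # assumed multiplier per prior adversarial message
    score: float = 0.0
    adversarial_count: int = 0
    last_seen: Optional[float] = None

    def observe(self, timestamp: float, content_risk: float) -> float:
        """Fold one message into the score and return the updated score.

        timestamp is in seconds; content_risk is 0.0 for a benign message
        and positive when adversarial patterns were detected.
        """
        gap = None
        if self.last_seen is not None:
            gap = max(0.0, timestamp - self.last_seen)

        # Temporal velocity: messages arriving too quickly add a penalty.
        if gap is not None and gap < self.velocity_window_s:
            self.score += self.velocity_penalty

        if content_risk > 0:
            # Pattern escalation: each adversarial message costs more than the last.
            self.score += content_risk * self.escalation_factor ** self.adversarial_count
            self.adversarial_count += 1
        elif gap is not None:
            # Decay: only benign messages let elapsed time reduce the score.
            self.score *= 0.5 ** (gap / self.half_life_s)

        self.last_seen = timestamp
        return self.score
```

Under these assumed parameters, ten adversarial messages sent 0.1 seconds apart end with a higher score than the same ten messages sent an hour apart, because only the fast sender pays the velocity penalty on every message after the first. A real system would compare the score against a blocking threshold, which is left out here.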