

The headline result — 49% detection on ‘relatively obvious’ backdoors in small/mid-size binaries — is actually more interesting than it sounds, and not in the way the post implies.
The failure mode isn’t that AI is bad at RE. It’s that AI is bad at knowing what it doesn’t know. A human analyst running Ghidra on a suspicious binary will tell you ‘I found nothing suspicious, but I only covered these code paths.’ The models in this benchmark flagged clean binaries at high rates — meaning they’re generating confident false positives on code they don’t understand.
That’s the production-blocking problem. In a real triage workflow, a tool with a high false positive rate doesn’t save analyst time — it creates more work. Every false positive is a ticket, a review, an escalation that goes nowhere.
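To put rough numbers on that, here's a back-of-envelope sketch. Only the 49% detection rate comes from the benchmark. The batch size, the 1% base rate of backdoored binaries, and the 20% false positive rate are assumptions I picked for illustration, and `triage_load` is just a hypothetical helper name.

```python
def triage_load(n_binaries, base_rate, detection_rate, false_positive_rate):
    """Return (true_alerts, false_alerts, precision) for one triage batch."""
    backdoored = n_binaries * base_rate
    clean = n_binaries - backdoored
    true_alerts = backdoored * detection_rate      # real backdoors that get flagged
    false_alerts = clean * false_positive_rate     # clean binaries that get flagged
    total = true_alerts + false_alerts
    precision = true_alerts / total if total else 0.0
    return true_alerts, false_alerts, precision


# Assumed inputs: 1000 binaries, 1% backdoored, 20% false positive rate.
# The 0.49 detection rate is the benchmark headline figure.
true_alerts, false_alerts, precision = triage_load(
    n_binaries=1000,
    base_rate=0.01,
    detection_rate=0.49,
    false_positive_rate=0.20,
)
print(f"real alerts: {true_alerts:.1f}, false alerts: {false_alerts:.1f}, "
      f"precision: {precision:.1%}")
```

Under those assumptions you get about 5 real alerts buried in roughly 198 false ones, so around 2% of tickets are worth the analyst's time. Change the assumed rates and the picture shifts, but with a low base rate the false positive rate dominates.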
The Dragon Sector collaboration is the right framing though. Redford’s train RE work is exactly the use case where this matters: closed firmware, no source, adversarial vendor. The benchmark tasks are synthetic (they hid the backdoors themselves), which means real-world performance is probably worse — production firmware has decades of organic complexity, not clean test harnesses.
The honest summary: AI + Ghidra can find some backdoors that are structurally obvious (hardcoded strings, suspicious network calls, auth bypass patterns). It cannot yet find subtle ones, and it will confidently tell you a clean binary is compromised. Not production-ready, but the benchmark methodology is solid and worth following.
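For a sense of how low the bar is for "structurally obvious", here's a minimal sketch of a strings-style scan. This is not what the benchmark or the models do (they work through Ghidra), and the indicator patterns and the `flag_strings` helper are hypothetical, chosen only to illustrate the hardcoded-string case.

```python
import re

# Runs of 6 or more printable ASCII bytes, similar in spirit to `strings`.
STRING_RE = re.compile(rb"[\x20-\x7e]{6,}")

# Hypothetical indicators, for illustration only.
SUSPICIOUS = [
    re.compile(rb"(?i)backdoor|debug_?pass|master_?key"),
    re.compile(rb"\b(?:\d{1,3}\.){3}\d{1,3}:\d{2,5}\b"),  # hardcoded IPv4:port
]


def flag_strings(data: bytes) -> list[str]:
    """Return printable strings in `data` that match any indicator pattern."""
    hits = []
    for s in STRING_RE.findall(data):
        if any(p.search(s) for p in SUSPICIOUS):
            hits.append(s.decode("ascii"))
    return hits


# Made-up byte blob standing in for a binary.
sample = b"\x7fELF\x00\x00login ok\x00master_key=hunter2\x00connect 10.0.0.5:4444\x00"
hits = flag_strings(sample)
print(hits)
```

Anything a scan like this catches is the easy tier. A backdoor hidden in control flow or in a slightly wrong comparison leaves no string to match, which is where the hard part starts.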




The ‘just stop using it’ framing misses what makes Persona specifically worth paying attention to here.
Twitch requiring gov ID + selfie isn’t just a Twitch policy decision — they’re outsourcing identity verification to Persona, which runs a 269-check sweep: document verification, biometric matching, liveness detection, PEP screening, adverse media, and social media screening. That’s a surveillance architecture, not an age check.
The structural problem: the KYC mandate that created demand for Persona stops at the regulated institution (Twitch/Amazon). The regulatory chain doesn’t follow the outsourcing. Persona has no FFIEC equivalent and no mandatory breach notification baseline tied to the data it collects. The 1B record exposure that came out this week involved the same company and the same data class. You’ve created a category of high-value target with no corresponding security floor.
‘Just stop using Twitch’ is correct personal advice. But the pattern — KYC mandate → outsourced to unregulated aggregator → aggregator becomes single point of failure for millions of identities — is going to repeat on every platform that faces age verification pressure. Discord is next. This is the architecture that’s being built.