AI Agents for Pentesting: High Reward, High Risk

Panelists agree security teams must experiment with AI agents like OpenClaw for pentesting despite guardrail challenges, while supposedly ephemeral AI-generated software amplifies vulnerabilities rather than vanishing.

OpenClaw's Pentesting Success Highlights AI's Dual Edge

Sophos's experiment deploying OpenClaw, an open-source AI agent, as a red team operator on a legacy on-prem network surfaced 23 high-quality vulnerability findings. Dave McInness praised it as essential preparation: "Someone's going to do it. They're either going to be paid to do it for the good side or... for the bad side." The agent required guardrails to prevent damage, confirming Ross Mckercher's thesis that even experts struggle to balance productivity and risk. Panelists converged on security's unique readiness: overwhelmed by data, paranoid by nature, and skilled at imposing controls. Dave emphasized, "We've always been overrun... I'd really like an AI helper."

Claire Nunes cautioned against rushed adoption, noting AI excels at repeatable pattern detection but lacks human nuance: "There's a lot of nuance in what a human can do and look at." Kimmy Farington described real-world friction: admins downloading OpenClaw created an "amazing nightmare" for detection engineers, since its privileged behavior mimics an insider threat. Consensus: experiment in contained environments with human oversight to outpace attackers.

Guardrails Trade Productivity for Safety

Balancing autonomy and restraint emerged as the core tension. Sophos noted models "regularly refused to cooperate due to concerns around malicious use," introducing friction. Kimmy advocated understanding the tool deeply: "Get comfortable with the tool... with human in the loop." Dave favored test harnesses over traditional scanners, probing models to identify gaps.
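Kimmy's "human in the loop" advice maps to a simple pattern worth sketching. The snippet below is a minimal illustration, not anything the panel built; the names (ALLOWED_READONLY, approve, run_agent_action) are hypothetical. Every command the agent proposes is logged, and anything outside a read-only allowlist waits for operator approval before it runs.

    # Minimal human-in-the-loop guardrail sketch for an agent's shell actions.
    # Hypothetical example: the panel described the pattern, not this code.
    import shlex
    import subprocess

    # Assumed allowlist: read-only recon commands may run unattended.
    ALLOWED_READONLY = {"nmap", "whois", "dig"}

    def approve(command: str) -> bool:
        """Ask a human operator before any non-allowlisted command runs."""
        answer = input(f"[guardrail] agent wants to run {command!r} -- allow? [y/N] ")
        return answer.strip().lower() == "y"

    def run_agent_action(command: str, audit_log: list[str]) -> str:
        """Gate, log, and execute a single command proposed by the agent."""
        binary = shlex.split(command)[0]
        if binary not in ALLOWED_READONLY and not approve(command):
            audit_log.append(f"BLOCKED: {command}")
            return "blocked by operator"
        audit_log.append(f"RAN: {command}")
        result = subprocess.run(shlex.split(command), capture_output=True,
                                text=True, timeout=60)
        return result.stdout

    log: list[str] = []
    print(run_agent_action("dig example.com", log))
    print(log)

Logging blocked attempts as well as executed ones gives detection engineers the visibility Kimmy found missing when admins ran OpenClaw ad hoc.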

Claire stressed validation: AI makes pentesting "easier and faster... lower cost," but humans must contextualize findings multidimensionally. Host Matt Kazinski quoted Dave's prior insight: "AI agents are the most helpful insider threats we've ever had," capturing their power and peril. Divergence appeared on trust—Dave: "100%" for pentesting; Kimmy: "Maybe not, depends on the system"; Claire: Not fully autonomous. Shared recommendation: Start with vulnerabilities, identity policies, or firewall changes in isolated setups.

"Notable quote from Dave: "We're experienced... really experienced looking for the holes... We're paranoid. That's the reason why."

Ephemeral Software Amplifies Vulnerability Explosion

Bruce Schneier's essay warned of "instant software": AI-spun apps used briefly and then discarded, potentially bespoke and unknown to attackers, but likely riddled with flaws. Kimmy dismissed the ephemerality premise: "There's going to be a whole lot more of it... Someone's going to share it with all their friends." Pointing to familiar hygiene failures (e.g., lingering credentials), she predicted persistent, hole-filled artifacts.

Claire foresaw a "graveyard of dead vibecoded apps," risking shadow IT, outdated versions, and compliance issues from mishandled data. All nodded to human failings: we don't delete software now, so why expect AI-generated code to vanish? Optimism centered on "shifting left": inserting AI early to self-audit code, as with Claude Mythos or GPT-4 CyberSec tools. Dave: "It can find stuff and then fix it... Write better code obviously."
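As a concrete shape for that "shift left" idea, here is a hedged sketch of a pre-commit hook that hands the staged diff to a review model. The panel named no API, so llm_review is a stub with a trivial pattern check standing in for a real model call; everything here is illustrative.

    # "Shift left" sketch: review staged code before it lands.
    # llm_review is a placeholder; a real hook would send the diff plus a
    # security-review prompt to whatever model your team uses.
    import subprocess
    import sys

    def staged_diff() -> str:
        """Collect staged changes the way a pre-commit hook would."""
        return subprocess.run(["git", "diff", "--cached", "--unified=0"],
                              capture_output=True, text=True, check=True).stdout

    def llm_review(diff: str) -> list[str]:
        """Stub for a model-backed review; flags added lines that look risky."""
        findings = []
        for line in diff.splitlines():
            if line.startswith("+") and "password" in line.lower():
                findings.append(f"possible hardcoded secret: {line.strip()}")
        return findings

    if __name__ == "__main__":
        issues = llm_review(staged_diff())
        for issue in issues:
            print(f"[ai-review] {issue}", file=sys.stderr)
        sys.exit(1 if issues else 0)  # nonzero exit blocks the commit

The nonzero exit is the point: a finding blocks the commit while a human is still looking at the code, which is where Dave's "find stuff and then fix it" is cheapest.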

Yet skepticism prevailed: bugs in AI-generated code become exploits. Dave pushed further: defenses must evolve into "always on ambient predictive protective" systems that quarantine unknowns proactively, integrating business context, threat intel, and partners.

"Notable quote from Kimmy: "Ephemeral just means... it's going to just continue to exist in whatever state that it came in, whether full of holes or not."

Security Leads AI Adoption with Paranoia as Superpower

Panelists positioned cybersecurity ahead: Data overload demands AI; defensive mindset excels at risk mitigation. Dave: Security knows "what we want them to do," from pentests to monitoring. Claire: Tangible ROI for expensive security via pattern workflows. Kimmy: Learn by doing, or attackers dictate pace.

Forward predictions: attackers wield unguarded dark web LLMs; defenders need autonomous agents stack-wide. Tradeoffs: human-in-the-loop slows work but keeps it safe; full autonomy risks escape (e.g., the Claude sandbox breach). Recommendations ranged from narrow starting points (pentests first) to organization-wide AI aimed at prediction over reaction.

Divergences: Claire on measured pace vs. Dave's urgency ("cat is not going back in the bag"). Consensus: Lean in experimentally. "This is a target-rich environment," Dave said, listing monitoring, investigations, risk reviews.

"Notable quote from Claire: "Security has a really useful use case... making security... more tangible for organizations."

"Notable quote from host Matt: "You just got to... get in there, play with it, see what works in a safe way."

Key Takeaways

  • Contain AI agents like OpenClaw in legacy on-prem setups with strict guardrails; Sophos's contained test surfaced 23 high-quality vulnerability findings.
  • Prioritize human-in-the-loop oversight; understand agent behaviors to preempt off-rails actions and insider-threat mimicry.
  • Combat ephemeral software by assuming persistence—treat shared AI code as eternal shadow IT full of holes.
  • Shift to ambient, predictive defenses: Quarantine unknowns proactively across identity, firewalls, and apps.
  • Start small: Use AI for vulnerability scans, policy reviews, or pentests; security's paranoia equips it to lead adoption.
  • Experiment now—attackers won't wait; build harnesses comparing AI to traditional tools.
  • Integrate domain experts (business, intel) for holistic AI defenses beyond code fixes.
  • Demand better code from aligned models (Anthropic, OpenAI), but fortify with always-on autonomy.

© 2026 Edge