I Used to Ask Claude to Audit My Code. Here Is Why I Stopped.
Mr. Ballaz
- Focus: Founder Story
- Risk: High
- Stack: Supabase/Next.js
- Detection: Ubserve Runtime Simulation
Using AI prompts to check your own security is like asking the person who packed your parachute if they did a good job. Here is what I learned the hard way.
For six months I would paste my code into Claude and ask if it looked secure. It always said yes, with caveats. I felt good. I was wrong to.
The entire workflow was: paste the relevant code into Claude, ask "does this look secure?", read the response, feel reassured, move on.
It felt like due diligence. It was not.
The Problem With Prompting Your Way to Security
When you paste code into an AI and ask if it is secure, you are asking it to review what you chose to show it. The AI has no idea what else is in your app. It cannot see your compiled frontend bundle. It cannot check your HTTP response headers. It cannot probe whether your Supabase RLS policies actually block what you think they block at runtime.
It is reviewing a description of your app, not your app.
This sounds obvious when you say it out loud. It was not obvious to me when I was in the middle of a shipping sprint at 1am trying to convince myself the auth logic was solid.
Claude would respond with something like: "This looks generally fine. You might want to consider adding rate limiting to this endpoint, and make sure your JWT secret is stored in an environment variable rather than hardcoded." Good advice. Useful advice. But it was responding to the code I showed it, not the five other routes I did not think to include, or the environment variable that was technically in .env.local but had already made it into a public commit three weeks earlier.
What AI Prompts Cannot See
There are entire categories of vulnerability that live outside the code you write, in the runtime behaviour of your app:
Compiled bundle exposure. API keys that end up in your JavaScript bundle are not something you can spot by reading your source files, because they get inlined during the build process. The only way to know whether a key is in your bundle is to scan the bundle itself, the way an attacker would: by loading your deployed app and searching the downloaded assets.
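To make that concrete, here is a minimal sketch of what "scan the bundle the way an attacker would" looks like, assuming Node 18+ with built-in fetch. The deploy URL and the secret patterns are placeholders for illustration; a real scanner covers far more patterns and asset types.

```typescript
// Sketch: fetch a deployed page, collect its script URLs, and search each
// bundle for secret-looking strings. URL and patterns are placeholders.
const DEPLOY_URL = "https://your-app.example.com"; // assumption: your deployed app

// Well-known secret formats that should never appear in client-side JavaScript.
const SECRET_PATTERNS: RegExp[] = [
  /sk_live_[A-Za-z0-9]+/,               // Stripe live secret key
  /AKIA[0-9A-Z]{16}/,                    // AWS access key ID
  /-----BEGIN (RSA )?PRIVATE KEY-----/,  // PEM private key material
];

async function scanBundles(): Promise<void> {
  const html = await (await fetch(DEPLOY_URL)).text();

  // Pull every <script src="..."> the page references, resolving relative paths.
  const scriptUrls = [...html.matchAll(/<script[^>]+src="([^"]+)"/g)]
    .map((m) => new URL(m[1], DEPLOY_URL).href);

  for (const url of scriptUrls) {
    const js = await (await fetch(url)).text();
    for (const pattern of SECRET_PATTERNS) {
      const match = js.match(pattern);
      if (match) {
        console.log(`Possible secret in ${url}: ${match[0].slice(0, 12)}...`);
      }
    }
  }
}

scanBundles().catch(console.error);
```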
HTTP header configuration. Content Security Policy, Strict-Transport-Security, X-Frame-Options, Permissions-Policy — none of these exist in your code. They exist in your server configuration and middleware. An AI reviewing your Next.js route handlers cannot tell you whether your deployed app is actually setting these headers correctly.
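A quick way to see this for yourself is to request the deployed page and list which of those headers actually come back. A rough sketch, with the URL as a placeholder:

```typescript
// Sketch: check which security headers a deployed URL actually sends.
// The header list is a starting point, not a complete standard.
const DEPLOY_URL = "https://your-app.example.com"; // assumption: your deployed app

const EXPECTED_HEADERS = [
  "content-security-policy",
  "strict-transport-security",
  "x-frame-options",
  "permissions-policy",
];

async function checkHeaders(): Promise<void> {
  const res = await fetch(DEPLOY_URL); // follows redirects by default
  for (const name of EXPECTED_HEADERS) {
    const value = res.headers.get(name);
    console.log(value ? `present: ${name}: ${value}` : `MISSING: ${name}`);
  }
}

checkHeaders().catch(console.error);
```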
RLS policy gaps. You can write a Supabase RLS policy that looks correct and is still wrong. Policies can be enabled on a table without actually covering every operation. They can pass in your local tests and fail under specific conditions in production. The only way to know is to probe the live database with the correct session context, not to read the SQL.
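Here is roughly what probing a policy at runtime means, using the supabase-js client with the public anon key. The table, column, and test accounts are assumptions for the sake of the example; the point is that you act as a real, non-privileged user and attempt the operation you believe is blocked.

```typescript
import { createClient } from "@supabase/supabase-js";

// Sketch of a runtime RLS probe. Assumes a "profiles" table and a test user;
// all names and credentials here are placeholders.
const supabase = createClient(
  process.env.SUPABASE_URL!,      // your project URL
  process.env.SUPABASE_ANON_KEY!  // the public anon key, same as the browser uses
);

async function probeRls(): Promise<void> {
  // Authenticate as an ordinary test user, never the service role.
  const { error: authError } = await supabase.auth.signInWithPassword({
    email: "attacker@example.com",
    password: "test-password",
  });
  if (authError) throw authError;

  // Try to update a row that belongs to a different user.
  const { data, error } = await supabase
    .from("profiles")
    .update({ display_name: "pwned" })
    .eq("id", "some-other-users-id")
    .select();

  // With a correct policy, this should error or affect zero rows.
  if (error) {
    console.log("Blocked as expected:", error.message);
  } else if (data && data.length > 0) {
    console.log("RLS GAP: update on another user's row succeeded", data);
  } else {
    console.log("No rows updated: policy held for this case.");
  }
}

probeRls().catch(console.error);
```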
Exposed secrets in previous commits. If a secret was ever committed to your repository, even if you deleted it in a later commit, it still exists in your git history. An AI reviewing your current code cannot see your git history.
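You can check your own history for a specific value with git's pickaxe search. A small sketch, shelling out from Node; the search term is a placeholder:

```typescript
// Sketch: search the full git history for a string that should never have
// been committed, using git's pickaxe (-S) search across all refs.
import { execFileSync } from "node:child_process";

const needle = "SUPABASE_SERVICE_ROLE_KEY"; // or part of the leaked value itself

const output = execFileSync(
  "git",
  ["log", "--all", "--oneline", "-S", needle],
  { encoding: "utf8" }
);

console.log(
  output ? `Commits touching "${needle}":\n${output}` : "No hits in history."
);
```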
The False Confidence Problem
The specific danger of AI security reviews is not that they give you bad advice — usually the advice is reasonable. The danger is that they give you a feeling of having done something when you have not actually scanned your app.
I shipped three features with this workflow before I ran an actual scan. When I finally did, it found two things I would never have thought to ask Claude about: a leaked key in my compiled bundle that came from a dependency, not my own code, and an endpoint whose RLS policy did not cover update operations even though it covered selects.
Both were invisible to any prompt-based review. Both would have been exploitable.
What Actually Works
I am not arguing that AI is useless for security. AI is genuinely good at explaining vulnerabilities, suggesting fixes, reviewing code patterns, and helping you understand why something is dangerous.
What it cannot do is replace scanning the live app. Those are two different things, and conflating them is where founders get into trouble.
The workflow that actually works:
- Write your code, use AI to help where it is useful
- Before every deploy, run a scan against the live or staging app — not the source code
- Use AI to understand and fix what the scan finds
The scan tells you what is actually exposed. The AI helps you fix it. Neither replaces the other.