Guide · 8 min read · updated May 9, 2026

Inside the exploitation phase

Most static scanners flood you with maybes. Early on we had to decide what would count as a finding in a Violet report, and we knew the answer would shape every scan that followed.

We landed on a simple rule: a finding is a reproducible request paired with a verifiable effect. The agents in the exploitation phase don't stop at “this code path looks suspicious.” They have to send the request and observe the result — a leaked record, an authentication bypass, a state change that shouldn't have been possible. If the loop can't produce that pairing, the hypothesis gets dropped before it ever reaches the report.

The trade-off is real. Dropping unverified findings makes Violet's reports shorter than the ones you'd get from a scanner that prints every theoretical vulnerability it suspects. We think that's the right call. False positives erode trust faster than missed bugs do — once a report contains three things that aren't real, the reader stops believing the seven that are. We'd rather hand you a smaller report you can act on than a larger one you have to second-guess.

Where exploitation sits

The exploitation phase is the fifth of six phases in Violet's pipeline. Understanding where it sits explains why it works the way it does.

Pre-recon — Initial reconnaissance against the target. Port scanning, subdomain enumeration, technology fingerprinting.
Recon — Deeper discovery of the attack surface. Every endpoint, every authentication boundary, every parameter that accepts input.
Orchestration — Violet reviews the recon output and decides which attack categories are relevant to this target.
Vulnerability analysis — The agents form hypotheses about specific vulnerabilities. This phase is allowed to be wrong. It produces a queue of things worth testing, not a list of confirmed findings.
Exploitation — The agents test those hypotheses against the live application. This phase is not allowed to be wrong. A hypothesis that survives analysis but fails exploitation gets dropped here.
Reporting — The final report gets written from the confirmed findings produced by exploitation.

What separates exploitation from analysis is exactly that distinction: analysis produces candidates; exploitation produces evidence. A hypothesis that the analysis phase forwards to exploitation is a suspicion, not a finding. It becomes a finding only if the exploitation phase can produce a reproducible request and a verifiable effect.
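One way to picture the candidate-versus-evidence split is as two separate types, where the second can only be built from the first plus proof. This is a minimal sketch: the `Hypothesis`, `Finding`, and `promote` names and their fields are hypothetical illustrations, not Violet's internal data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hypothesis:
    """A suspicion forwarded by the analysis phase (hypothetical shape)."""
    endpoint: str
    bug_class: str


@dataclass(frozen=True)
class Finding:
    """A hypothesis plus the evidence that confirmed it."""
    hypothesis: Hypothesis
    request: str  # the exact request that triggers the bug
    effect: str   # what the response proved, e.g. "leaked one user row"


def promote(hypothesis: Hypothesis,
            request: Optional[str],
            effect: Optional[str]) -> Optional[Finding]:
    """Return a Finding only when both halves of the pairing exist."""
    if not request or not effect:
        return None  # dropped: a suspicion without proof never becomes a finding
    return Finding(hypothesis, request, effect)
```

Because `promote` is the only path from one type to the other in this sketch, a suspicion with a missing request or a missing effect has no way to reach the report.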

If you've used a static-only scanner before, the difference is that those tools never get past analysis. They produce a list of suspicions and hand it to you. We hand you a list of confirmed bugs and the requests that prove them.

Why find + prove beats find alone

The "find" phase of any security tool generates candidates. The number of candidates is large and the precision is mixed. A static analyzer might flag two hundred potentially vulnerable code paths in a typical codebase. Most of them are actually fine: the input is validated upstream, the output is encoded downstream, or the framework handles the case automatically. Engineers learn quickly that scanner output is mostly noise. After enough false alarms, they stop reading.

"Find + prove" inverts the economics. If we say a vulnerability exists, we have a request that triggers it and a response that proves it. The reader doesn't have to verify our claim from scratch — the evidence block in the report shows them how we verified it. Reading a finding becomes thirty seconds of skimming the proof, not thirty minutes of investigation. The cost of a false positive becomes effectively zero, because we don't ship false positives.

The trade-off, again: we ship fewer findings than a tool that ships everything. We think that's a feature. A twenty-finding report you can act on in an afternoon is more useful than a two-hundred-finding report that requires a week of triage before anyone knows what to fix first.

The loop

The exploitation agents follow a four-step loop on every hypothesis:

Hypothesize — Read the analysis output. Pick one hypothesis to test.
Probe — Send a small, targeted request that distinguishes vulnerable from not-vulnerable.
Confirm — Interpret the response. Did the probe behave the way a vulnerable system would?
Evidence — Capture the request, the response, and the interpretation. Save it.

Here's what that looks like against a concrete hypothesis: SQL injection in a login form. The hypothesis from analysis is "this login endpoint may be vulnerable to SQL injection." The probe: send a request with a single-quote character in the username field. The confirm step: did the server return a SQL error, or did it return "Invalid credentials"? A SQL error tells us the quote reached the query unescaped. That is a lead, not yet a finding, so we follow up with conditional payloads and use the difference in response shape and timing as a boolean oracle to infer database contents. The evidence: the original request, the error response, and the follow-up probes that demonstrated we could extract data.
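The confirm step for the single-quote probe can be sketched as a small classifier over the response body. The error fragments below are a short, illustrative sample of messages common database engines emit, and `classify_quote_probe` is a hypothetical helper, not part of Violet. A real check would need a longer, engine-specific list.

```python
import re

# Illustrative fragments only; assumed to be representative, not exhaustive.
SQL_ERROR_PATTERNS = [
    re.compile(r"you have an error in your sql syntax", re.I),  # MySQL
    re.compile(r"unterminated quoted string", re.I),            # PostgreSQL
    re.compile(r"unclosed quotation mark", re.I),               # SQL Server
    re.compile(r"unrecognized token", re.I),                    # SQLite
]


def classify_quote_probe(body: str) -> str:
    """Classify the response to a single-quote probe on a login form."""
    if any(pattern.search(body) for pattern in SQL_ERROR_PATTERNS):
        return "sql_error"        # worth following up with conditional payloads
    if "invalid credentials" in body.lower():
        return "clean_rejection"  # input was handled safely; drop the hypothesis
    return "inconclusive"         # neither signal; not enough to proceed on


print(classify_quote_probe("Invalid credentials"))
```

Only the `sql_error` branch justifies further probing, and even then the error is just the starting point: the finding is whatever the follow-up probes manage to extract.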

Each step is a checkpoint. If the probe doesn't show vulnerable behavior, the loop terminates and the hypothesis is dropped. There is no "well, it might still be exploitable if you squint" branch. Either we confirmed it, or it goes away.
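The four steps and their drop-on-failure behavior can be sketched as a plain loop. Everything here is an assumption made for illustration: the `Evidence` record, the `run_loop` function, and the three injected callables stand in for whatever the agents actually use to build, send, and interpret probes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Evidence:
    hypothesis: str
    request: str
    response: str
    interpretation: str


def run_loop(
    hypotheses: Iterable[str],
    build_probe: Callable[[str], str],
    send: Callable[[str], str],
    interpret: Callable[[str, str], Optional[str]],
) -> List[Evidence]:
    """Run hypothesize, probe, confirm, evidence once per hypothesis."""
    confirmed: List[Evidence] = []
    for hypothesis in hypotheses:                         # Hypothesize
        request = build_probe(hypothesis)                 # Probe
        response = send(request)
        interpretation = interpret(hypothesis, response)  # Confirm
        if interpretation is None:
            continue  # no vulnerable behavior observed: the hypothesis is dropped
        confirmed.append(                                 # Evidence
            Evidence(hypothesis, request, response, interpretation)
        )
    return confirmed
```

Note that the loop has no third outcome. `interpret` either returns an interpretation or returns `None`, which mirrors the absence of a "might still be exploitable" branch.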

Browser vs CLI probing

Some bugs require a real browser to confirm. Stored XSS, for example: the payload has to be rendered in a real DOM, and a real script has to fire in that context. Confirming it with a raw HTTP request alone isn't sufficient — you haven't proven client-side execution. Other bugs only need a raw HTTP request — SQL injection, command injection, server-side request forgery. A browser adds nothing to those proofs; it just adds latency.

The agents pick the right tool for the bug class:

Browser-driven probing — for client-side execution proofs: XSS, DOM-based attacks, content injection that requires rendering in a real page context.
HTTP request tools — for server-side proofs that don't require a browser: SQL injection, server-side request forgery, insecure direct object reference, mass assignment, path traversal, command injection.

Choosing the wrong tool produces inconclusive results. A stored XSS payload that shows up in a raw HTTP response isn't a finding: it only shows the payload came back unencoded, not that a script executed. We use the browser when the browser is the only way to prove the impact.
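The tool choice described above amounts to a lookup from bug class to probe type. The sketch below uses the bug classes named in this section; the identifiers (`BROWSER_CLASSES`, `HTTP_CLASSES`, `pick_tool`) and the string keys are hypothetical.

```python
# Bug classes whose proof requires script execution in a real page context.
BROWSER_CLASSES = {"xss", "dom_based", "content_injection"}

# Bug classes whose proof is visible in the raw HTTP response.
HTTP_CLASSES = {
    "sql_injection",
    "ssrf",
    "idor",
    "mass_assignment",
    "path_traversal",
    "command_injection",
}


def pick_tool(bug_class: str) -> str:
    """Return which kind of probe can actually prove this bug class."""
    if bug_class in BROWSER_CLASSES:
        return "browser"
    if bug_class in HTTP_CLASSES:
        return "http"
    # Unknown classes are rejected instead of guessed at.
    raise ValueError(f"no probe tool mapped for bug class: {bug_class}")


print(pick_tool("xss"), pick_tool("sql_injection"))
```

Raising on an unmapped class is a design choice in this sketch: guessing a tool would risk exactly the inconclusive results the section warns about.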

What counts as confirmed

Two requirements. Both must hold.

Reproducibility. The same request, sent again, produces the same result. We do not report intermittent or first-try-only behavior. If the response varies on repeated probes — different error, different status code, inconsistent output — we don't have a confirmed finding. We may have found an interesting behavior worth investigating manually, but it doesn't clear our bar.

Verifiable effect. The response must include something an attacker would actually want — leaked data, a privilege change, an authentication bypass, a successful action that should have been blocked. A 500 error doesn't qualify on its own. A 500 error means something broke; it doesn't prove anything was exploitable. A leaked database row qualifies. An authentication bypass qualifies. A command execution output where there shouldn't have been any command execution qualifies.
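Both requirements can be sketched as a single check. This is a simplified illustration under stated assumptions: `is_confirmed` is a hypothetical name, responses are modeled as `(status, body)` tuples, repeatability is tested by exact equality (real responses often carry timestamps or tokens that would need normalizing first), and the "effect" is reduced to a marker string such as a leaked value.

```python
from typing import Callable, Tuple

Response = Tuple[int, str]  # (status code, body)


def is_confirmed(send: Callable[[], Response],
                 marker: str,
                 attempts: int = 3) -> bool:
    """Require the same response on every attempt AND proof of impact in it."""
    if attempts < 2:
        raise ValueError("reproducibility needs at least two attempts")
    responses = [send() for _ in range(attempts)]
    reproducible = all(r == responses[0] for r in responses)
    # A status code alone never counts; the body must contain the marker.
    has_effect = marker in responses[0][1]
    return reproducible and has_effect
```

Under this check, a stable 500 error fails because it carries no marker, and a leaked row that appears only on some attempts fails because it is not reproducible.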

The non-destructive boundary

The hard rules, exactly as the agents follow them:

No modification of existing data. No UPDATE, DELETE, DROP, or TRUNCATE statements against production tables. No deleting files outside temporary directories. No changing existing user accounts, session state, or stored records.
No bulk traffic. No denial-of-service payloads, no resource-exhaustion attacks, no concurrent connection storms. We respect HTTP 429 responses and back off.
Minimum extraction. When we prove data exfiltration, we extract one or two rows — enough to demonstrate the capability, not a full table dump.
Read-only command execution. When we prove command injection, we run identity commands — the kind that reveal who we're running as and where we are. We don't run anything destructive, and we don't leave persistent processes behind.

These rules apply automatically. Operators don't have to opt in. The agents will not pursue a hypothesis if doing so would require breaking one of them — they drop the hypothesis instead.
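A pre-flight guard for two of these rules might look like the sketch below. It is deliberately crude and entirely hypothetical: the function names are invented, the SQL check is a keyword match that errs on the side of rejecting, and the command allowlist (`id`, `whoami`, `pwd`, `hostname`) is an assumed example of "identity commands", not a list taken from Violet.

```python
import re

# Statements that would modify or remove existing data.
FORBIDDEN_SQL = re.compile(r"\b(UPDATE|DELETE|DROP|TRUNCATE)\b", re.IGNORECASE)

# Assumed examples of read-only identity commands.
ALLOWED_COMMANDS = {"id", "whoami", "pwd", "hostname"}


def sql_allowed(statement: str) -> bool:
    """Reject any statement that mentions a destructive keyword."""
    return FORBIDDEN_SQL.search(statement) is None


def command_allowed(command: str) -> bool:
    """Allow only an exact match against the identity-command allowlist."""
    return command.strip() in ALLOWED_COMMANDS


print(sql_allowed("SELECT name FROM users LIMIT 1"), command_allowed("whoami"))
```

The keyword match will also reject harmless statements that merely contain one of the words as a literal. In this sketch that is acceptable, because the cost of a false rejection is a dropped hypothesis, while the cost of a false acceptance is damaged data.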

We don't accept a customer's permission to be destructive. The rules apply to every scan, with no opt-out.

What we may do as part of proof

We have to be honest about the inverse. Some proofs require adding data. The non-destructive policy doesn't forbid this — it forbids modifying or deleting existing data. Creating new data for the purpose of a proof is a different operation.

Specifically, we may:

Register new accounts. Proving a mass-assignment vulnerability or a registration logic flaw often requires creating a fresh account with elevated or unexpected attributes. We do. The new account is listed in the finding so you can clean it up.
Insert into non-critical tables. Proving SQL write access requires an INSERT. We use a clearly-named test row, and the row is listed in the finding.
Upload new files. Proving a file-upload flaw requires an upload. The file is small, named with a pentest- prefix, and listed in the finding.

Every artifact we create gets listed in the report. Cleanup is on you, but the inventory is on us.
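The inventory idea can be sketched as a small recorder that refuses to track anything not clearly named. The `ArtifactInventory` class and its fields are hypothetical, and enforcing the `pentest-` prefix on every artifact kind is an assumption made for the example; the text above only states it for uploaded files.

```python
from dataclasses import dataclass, field
from typing import Dict, List

PREFIX = "pentest-"


@dataclass
class ArtifactInventory:
    """Collects everything created during a proof so it can be listed later."""
    items: List[Dict[str, str]] = field(default_factory=list)

    def record(self, kind: str, name: str, location: str) -> None:
        # Assumption: every artifact must be recognizable by its name.
        if not name.startswith(PREFIX):
            raise ValueError(f"artifact name must start with {PREFIX!r}: {name}")
        self.items.append({"kind": kind, "name": name, "location": location})

    def report_lines(self) -> List[str]:
        """One human-readable line per artifact, for the finding's cleanup list."""
        return [f"{a['kind']}: {a['name']} ({a['location']})" for a in self.items]
```

Recording at creation time, instead of reconstructing the list afterwards, is what makes the cleanup list trustworthy: nothing can be created without also being listed.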

If you find an account named pentest-violet-mass-assignment after a scan, that's us. Delete it.

What happens to unconfirmed hypotheses

Hypotheses from the analysis phase that don't survive exploitation get dropped. They don't appear in the report. They aren't surfaced as "potential issues to investigate." They don't show up in a separate tier of lower-confidence findings. They're gone.

This is a deliberate choice. A "potential issues" section trains readers to ignore findings, because the maybes mix in with the confirmed. Teams that see enough unverified findings in the same list as confirmed ones start treating all of them skeptically, which defeats the purpose of confirming anything. We'd rather report twenty-one confirmed findings than twenty-one confirmed plus fifty maybes, even though some of those fifty maybes may be real bugs we couldn't reach within the time budget.

If you want to know what we hypothesized but couldn't confirm, our per-scan logs preserve the full reasoning chain. Most operators never look. The report is for what we proved.

FAQ

Does it create accounts?

Sometimes. When proving an authentication, authorization, or registration flaw requires a fresh account, we create one. The accounts we create are always listed in the finding so you can find and remove them after the scan completes.

Does it write to my database?

When proving SQL write access requires it, it will INSERT one clearly named test row into a non-critical table. It never UPDATEs, DELETEs, DROPs, or TRUNCATEs existing data. The row it inserts is listed in the finding.

Does it leave files behind?

It may write to the server's temporary directory to prove command execution. It never modifies or deletes files outside that directory. Uploaded files use a pentest- prefix and are listed in the finding.

Does it delete data?

No. Never.

Will it bring down my application?

No. We don't send denial-of-service payloads, we don't generate bulk traffic, and we respect HTTP 429 rate-limit responses. Production scans are routine — the application shouldn't notice anything beyond normal traffic patterns.
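Respecting a 429 usually means waiting before the next probe. The sketch below shows one common way to compute that wait: honor a `Retry-After` value given in seconds when the server sends one, otherwise fall back to capped exponential backoff. The function name, the defaults, and the exact-case header lookup are assumptions for illustration, and the HTTP-date form of `Retry-After` is not handled.

```python
from typing import Mapping, Optional


def backoff_seconds(status: int,
                    headers: Mapping[str, str],
                    attempt: int,
                    base: float = 1.0,
                    cap: float = 60.0) -> Optional[float]:
    """Return how long to pause before the next probe, or None if no pause is needed."""
    if status != 429:
        return None
    retry_after = headers.get("Retry-After", "")
    if retry_after.isdigit():
        # The server said how long to wait; take it at its word.
        return float(retry_after)
    # No usable hint: double the wait on each attempt, up to the cap.
    return min(base * (2 ** attempt), cap)


print(backoff_seconds(429, {"Retry-After": "30"}, attempt=0))
```

The cap applies only to the fallback path, so a server that asks for a long pause still gets it.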

Next up

How to read a pentest report

Now that you know how findings get confirmed, this is the guide that explains how the report presents them.
