Connecting your repo to Violet is optional. Plenty of scans run black-box — Violet hits the live application from the outside, with no source code access, the way an external attacker would. The trade-off is real, and we'll explain it. The short version: if we can read your code, we find more, find it faster, and tell your engineers exactly which line to fix. We default to recommending repo access for that reason.
Two modes, both valid
In black-box mode, Violet sees only what a user can see from the outside: an unauthenticated user by default, or an authenticated one if you provide credentials. We treat the application as opaque. There is no access to routing logic, ORM queries, or service wiring. We probe inputs and observe outputs, exactly as an external attacker would.
In source-connected mode, Violet has read-only access to your repo through a GitHub or GitLab integration. The recon and analysis agents read the source while the exploitation agents probe the live application. The two halves correlate — a suspicious code path in the analysis becomes a targeted probe in exploitation. We're operating with the same information a well-prepared insider attacker would have.
Both modes produce real findings. Source-connected scans produce more findings, with more precision, in the same wall-clock time.
What source access unlocks
False-positive reduction. Without code, we sometimes have to flag a hypothesis as suspected because we can't see whether the input is sanitized before it reaches the database. With code, we can read the path from request handler to query and confirm or rule out the hypothesis before we even send the first probe. The result is fewer suspected findings: a hypothesis either lands in the report as confirmed or gets dropped before it ever reaches you.
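The triage described above can be sketched as a three-way classification. This is a minimal illustration, not Violet's actual logic: the `Status` enum, the `classify` function, and the `source_says_sanitized` flag are hypothetical names introduced here.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    CONFIRMED = "confirmed"
    SUSPECTED = "suspected"
    RULED_OUT = "ruled_out"


def classify(source_says_sanitized: Optional[bool]) -> Status:
    """Classify an injection hypothesis (hypothetical helper).

    None  -> no source access (black-box), so the path cannot be read.
    True  -> source shows the input is sanitized before the query.
    False -> source shows the input reaches the query unsanitized.
    """
    if source_says_sanitized is None:
        return Status.SUSPECTED
    return Status.RULED_OUT if source_says_sanitized else Status.CONFIRMED
```

In this sketch, only the black-box case produces a suspected status; with source access, every hypothesis resolves one way or the other.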
Exact line references. Findings can name app/routes/login.py:167 instead of "the login endpoint somewhere." Engineers fix faster when they don't have to grep. A line reference collapses a 30-minute investigation into a 30-second lookup.
Cross-service dataflow. Modern applications span multiple services. Reading source lets us trace user input across service boundaries — through queues, through helper modules, through ORM layers. A black-box scan observes the entry point and the exit point. It cannot follow the path in between. A source-connected scan can walk that path and identify where sanitization breaks down, even if no single step looks dangerous in isolation.
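As a rough illustration of walking that path, the sketch below models each step as a hop and reports the first sink that still receives unsanitized input. The `Hop` type, the `first_unsafe_sink` function, and the service and queue names are all hypothetical; real dataflow analysis is considerably more involved.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Hop:
    name: str
    sanitizes: bool = False  # this step validates or escapes the input
    is_sink: bool = False    # this step hands the input to something dangerous


def first_unsafe_sink(path: Iterable[Hop]) -> Optional[str]:
    """Return the first sink reached while the input is still tainted."""
    tainted = True  # user input starts out untrusted
    for hop in path:
        if hop.sanitizes:
            tainted = False
        if hop.is_sink and tainted:
            return hop.name
    return None


# Hypothetical path: no single hop looks dangerous until the last one.
path = [
    Hop("api: POST /reports handler"),
    Hop("queue: report-jobs"),
    Hop("worker: build_filter() helper"),
    Hop("worker: raw SQL query", is_sink=True),
]
print(first_unsafe_sink(path))  # prints "worker: raw SQL query"
```

A black-box scan would only see the first and last hops. The intermediate hops are what source access adds.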
Style-matched fixes. When we propose a fix, we can write it in the same style as your codebase. If your team uses Pydantic, the fix uses Pydantic. If your team uses raw psycopg2, the fix uses parameterized psycopg2. Engineers are more likely to merge a fix that already looks like their code. A fix written in an unfamiliar pattern adds a review burden on top of the security problem.
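For a team on raw psycopg2, a style-matched fix might look like the sketch below. The `find_user_by_email` function and the `users` table are hypothetical; the `%s` placeholder with a parameter tuple is psycopg2's standard way of passing values separately from the SQL text.

```python
def find_user_by_email(cursor, email):
    """Look up a user with a parameterized query (hypothetical example).

    `cursor` is assumed to be a psycopg2 cursor.
    """
    # Vulnerable pattern this would replace:
    #   cursor.execute(f"SELECT id, email FROM users WHERE email = '{email}'")
    # Parameterized version: the driver handles quoting, so the value
    # is never spliced into the SQL string.
    cursor.execute(
        "SELECT id, email FROM users WHERE email = %s",
        (email,),
    )
    return cursor.fetchone()
```

The fix stays in the idiom the team already uses, so the review is about the security change and nothing else.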
Each of those improvements compounds. A scan with five fewer false positives, line-precise references, dataflow context, and style-matched fixes turns into a triage session that takes a Monday morning instead of a Monday.
When black-box is the right call
There are real scenarios where black-box is the correct mode. They're worth naming plainly.
- You're testing third-party software you don't own — a SaaS application you licensed, a vendor's portal, a partner integration. You can't share their code anyway, and the question you're answering is "can an outsider exploit this," not "is the code well-written."
- You're running a compliance audit that explicitly requires testing from an external attacker's perspective. Some frameworks require demonstrating that the application can withstand attack from a party with no privileged access. In that case, black-box is the mode the audit accepts.
- You're red-teaming. The point of a red-team engagement is to simulate what an outside attacker actually does — no source access, no insider knowledge. Giving us the code undermines the premise of the exercise.
- You're operating in an environment where the codebase contains regulated data and you cannot share it with any third party. This is uncommon, but it's a real constraint in certain government and healthcare contexts.
- The codebase is in a language or framework we don't support yet. We support most of what teams are likely running. If you're on something unusual, reach out and we'll tell you the current coverage.
Outside those scenarios, we recommend connecting the repo. The information asymmetry is too large to throw away.
The trust model
We use OAuth with read-only scope. We can read your repository contents. We cannot push, open pull requests, change settings, or modify branches. The access is deliberately minimal — we need to read the code, nothing else, and the OAuth scope reflects that.
We pull the repository at scan start, store it in an isolated workspace for the duration of the scan, and remove it when the scan completes. The workspace is per-scan. One customer's workspace cannot see another's. Isolation is structural, not policy-based.
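The per-scan lifecycle can be pictured with a small sketch. It only illustrates the create-use-remove pattern with a temporary directory; it says nothing about how Violet actually enforces isolation between customers, and `scan_workspace` and `scan_id` are hypothetical names.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def scan_workspace(scan_id: str):
    """Yield a fresh directory for one scan and delete it afterwards."""
    # TemporaryDirectory removes the directory and its contents on exit,
    # including when the scan raises an exception.
    with tempfile.TemporaryDirectory(prefix=f"scan-{scan_id}-") as tmp:
        yield Path(tmp)


with scan_workspace("example") as workspace:
    # The repository would be pulled into `workspace` here.
    (workspace / "README.md").write_text("placeholder checkout")
# At this point the workspace no longer exists on disk.
```

Each call produces a distinct directory, so two scans never share a checkout in this sketch.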
We do not retain repository contents beyond the scan window. The artifacts produced by the analysis — deliverables, evidence files, the final report — reference code by file path and line number, not by including full source dumps. The report cites your code; it does not embed it.
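To make "cites, not embeds" concrete, here is a hedged sketch of what a report entry could look like. The `Finding` and `CodeReference` types, their field names, and the finding title are hypothetical; the actual report schema is not specified here.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeReference:
    path: str
    line: int

    def __str__(self) -> str:
        return f"{self.path}:{self.line}"


@dataclass(frozen=True)
class Finding:
    title: str
    status: str
    reference: CodeReference

    def to_report_entry(self) -> dict:
        # Only the location is stored; no source text is copied.
        return {
            "title": self.title,
            "status": self.status,
            "reference": str(self.reference),
        }


finding = Finding(
    title="Injection in login handler",  # illustrative title
    status="confirmed",
    reference=CodeReference("app/routes/login.py", 167),
)
print(json.dumps(finding.to_report_entry(), indent=2))
```

An entry shaped like this stays meaningful after the workspace is gone, because it points into your repository instead of carrying a copy of it.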
You can disconnect the integration at any time from the dashboard. Disconnecting revokes future access. It does not delete past report artifacts, because those artifacts are structured references, not source copies. We also support per-scan repo selection — you can connect five repos and tell a specific scan to use only one of them.
Connecting a repo
The process takes under two minutes:
- Open Settings → Integrations in the dashboard.
- Choose GitHub or GitLab. Click Authorize. You'll be redirected to the provider's consent screen.
- On the consent screen, choose which organizations or repositories to grant access to. The default is all repositories in the selected organization. You can restrict this to specific repositories — that restriction is enforced by the provider, not by us.
- Once connected, the New Pentest wizard's "Source code" step shows your connected repositories. Select up to five per scan.
The maximum of five repositories per scan is deliberate. Connecting more than that usually signals that the scan's scope is too broad. We'd rather run multiple targeted scans — each with a focused repository selection — than one large scan where the analysis surface exceeds what the exploitation phase can cover meaningfully in the same time budget.
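The selection rule amounts to a simple check, sketched below. `select_repos` and `MAX_REPOS_PER_SCAN` are hypothetical names and the repository names are made up; only the limit of five and the idea of choosing a subset of connected repositories come from the text above.

```python
MAX_REPOS_PER_SCAN = 5


def select_repos(connected, requested):
    """Validate a per-scan repository selection (hypothetical helper)."""
    unknown = [repo for repo in requested if repo not in connected]
    if unknown:
        raise ValueError(f"not connected: {', '.join(unknown)}")
    # Drop duplicates while keeping the order the user chose.
    unique = list(dict.fromkeys(requested))
    if len(unique) > MAX_REPOS_PER_SCAN:
        raise ValueError(
            f"select at most {MAX_REPOS_PER_SCAN} repositories per scan; "
            "consider splitting the scope into several targeted scans"
        )
    return unique


connected = ["acme/api", "acme/web", "acme/worker"]
print(select_repos(connected, ["acme/api"]))  # prints ['acme/api']
```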
When code and live app disagree
This happens. Code says one thing; the live application behaves another way. Two reasons it usually happens:
- The deployed version isn't the latest commit on the main branch. This is very common: branch deploys, hotfixes that didn't get back-merged, staged rollouts that aren't complete. What we read in source reflects the repository; what we probe in exploitation reflects what's actually serving traffic.
- Runtime configuration overrides what's in code. Environment variables, feature flags, a WAF sitting in front of the application — all of these can cause behavior that differs from what the source code suggests.
We handle this by trusting the live application for behavior and the code for explanation. If the live application accepts an injection payload, that's the truth, regardless of what the code says. If the code shows a fix that the live application doesn't appear to have, we report the live behavior — with a note that the code looks correct — so your team can investigate the deployment gap.
If you see a finding where the code seems to handle the case but the live application doesn't, check what's actually deployed.
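The rule of trusting the live application for behavior and the code for explanation can be written down as a small decision function. This is a sketch under assumptions: `reconcile`, `Verdict`, and the two boolean inputs are hypothetical, and the case where the code looks vulnerable but the live application resists the probe is not covered by the text above, so the sketch simply leaves it unreported.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    report: bool
    note: Optional[str] = None


def reconcile(live_exploitable: bool, code_looks_fixed: bool) -> Verdict:
    """Live behavior decides whether to report; code only adds context."""
    if live_exploitable:
        if code_looks_fixed:
            # Likely a deployment gap: the fix exists but is not serving traffic.
            return Verdict(True, "Code looks correct; check what is deployed.")
        return Verdict(True)
    # Assumption: nothing is reported when the live application is not exploitable.
    return Verdict(False)
```

Note that `code_looks_fixed` never changes whether a finding is reported in this sketch. It only changes the note attached to it.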
A one-line decision rule
We've stated the trade-offs honestly. The cases where black-box is the right call are real but narrow. For most teams running scans against software they own, source access is a free upgrade: same scan, same time, more findings, more precision. The recommendation isn't subtle: if you own the code, connect the repo.