Vulnerability SLA
What Is a Vulnerability SLA?
A Vulnerability SLA (Service Level Agreement) is a formal policy that defines the maximum allowed time to remediate a vulnerability after it’s discovered.
“Remediate” is broader than “patch.” In mature programs, the SLA recognizes three valid closure states:
- Fix: Apply a patch or upgrade that removes the vulnerability.
- Mitigate: Implement compensating controls that materially reduce exploitability or exposure (for example, disabling a vulnerable feature, restricting access, adding filtering).
- Accept (Exception): Formally accept the risk for a limited time, with documented rationale, compensating controls (when possible), and an expiration date.
A vulnerability SLA also defines critical operational details:
- When does the clock start? (first detection, vendor disclosure, ticket creation)
- When does the clock stop? (deployment verified, mitigation verified, exception approved)
- What assets are in scope? (prod vs non-prod, internet-facing, regulated systems)
- How do you escalate misses? (leadership notification, gating, forced review)
In plain language: a vulnerability SLA is the organization’s “patching contract with itself.”
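The closure states and operational details above can be sketched as data. This is a minimal illustration, not a standard schema; the field names and example values are assumptions.

```python
from dataclasses import dataclass
from enum import Enum

class ClosureState(Enum):
    """The three valid closure states a mature SLA recognizes."""
    FIX = "fix"            # patch or upgrade removes the vulnerability
    MITIGATE = "mitigate"  # compensating controls reduce exploitability/exposure
    ACCEPT = "accept"      # time-bounded, documented risk acceptance

@dataclass(frozen=True)
class SlaPolicy:
    """Minimal sketch of a vulnerability SLA's operational details."""
    clock_starts_on: str       # e.g. "first_detection", "ticket_created"
    clock_stops_on: tuple      # verified closure states only
    in_scope_assets: tuple     # e.g. prod vs non-prod, internet-facing
    escalation_path: str       # what happens on a miss

policy = SlaPolicy(
    clock_starts_on="first_detection",
    clock_stops_on=("fix_verified", "mitigation_verified", "exception_approved"),
    in_scope_assets=("prod_internet_facing", "prod_internal", "endpoints"),
    escalation_path="notify_engineering_leadership",
)
print(policy.clock_starts_on)  # first_detection
```

Encoding the policy as data (rather than prose alone) makes it possible to report against it consistently.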
Vulnerability SLA vs Policy vs SLO
These terms get mixed up, and confusion leads to weak enforcement.
- Security policy/standard: The rulebook (the “what” and “why”).
- Vulnerability SLA: The measurable time commitment (the “by when”).
- SLO (Service Level Objective): An internal performance goal you aim for (often stricter than the SLA), used to improve over time.
Many teams say “SLA” even when it’s internal. That’s fine as long as it’s measurable and enforced consistently.
Why Vulnerability SLAs Matter
1) They reduce “time at risk” in the real world
Vulnerabilities are not static. Over time, public write-ups appear, exploit code matures, scanners look for it, and attackers automate campaigns. An SLA reduces the window where exploitation is likely to succeed.
2) They prevent the “infinite backlog” problem
Without deadlines, vulnerability backlogs rarely shrink. They simply age. SLAs force prioritization and create a routine cadence for keeping the backlog healthy.
3) They align teams on urgency without constant negotiation
Security, engineering, and operations can disagree on what “urgent” means. An SLA makes urgency a shared agreement instead of a recurring debate.
4) They support customer trust and compliance expectations
Many customer security questionnaires ask about patch timelines. Even when a framework doesn’t mandate exact timeframes, being able to show a documented SLA and evidence of adherence is a strong maturity signal.
What a Good Vulnerability SLA Must Include
1) Scope and asset categories
Define what the SLA covers. If you don’t specify scope, you’ll get loopholes and inconsistent reporting. Common categories:
- Production, internet-facing
- Production, internal-only
- Non-production (dev/test/staging)
- Endpoints (laptops, workstations)
- Servers/VMs
- Kubernetes nodes and cluster components
- Container images in registries
- CI/CD and build infrastructure
- Network devices and security appliances
- SaaS configurations and integrations
Many organizations also define “crown jewel” systems (authentication, customer data stores, payment flows) with stricter timelines.
2) A severity model that reflects real risk (not just a number)
If you prioritize purely by CVSS, you’ll waste time. A strong SLA uses severity plus contextual modifiers, such as:
- Exploit status (exploited in the wild, public exploit availability)
- Exposure (internet-facing vs internal, authenticated vs unauthenticated)
- Reachability (is the vulnerable code path actually used?)
- Asset criticality (data sensitivity, business impact)
- Blast radius (how widely deployed the component is)
This is how you avoid patching a theoretical issue in a dev tool ahead of an exploited vulnerability on a public-facing service.
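A triage function can make that logic concrete. The thresholds, weights, and tier names below are illustrative assumptions, not an industry standard:

```python
def risk_tier(cvss: float, exploited_in_wild: bool, internet_facing: bool,
              reachable: bool, crown_jewel: bool) -> str:
    """Combine base severity (CVSS) with contextual modifiers.

    Weights and cutoffs are illustrative; tune them to your environment.
    """
    if not reachable and not exploited_in_wild:
        return "deprioritized"      # present but not reachable in practice
    score = cvss
    if exploited_in_wild:
        score += 3.0                # active exploitation dominates everything else
    if internet_facing:
        score += 1.5
    if crown_jewel:
        score += 1.0
    if score >= 10.0:
        return "critical"
    if score >= 7.0:
        return "high"
    return "standard"

# An exploited flaw on a public service outranks a higher-CVSS dev-tool issue:
print(risk_tier(7.5, True, True, True, False))     # critical
print(risk_tier(9.8, False, False, False, False))  # deprioritized
```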
3) Clear start/stop rules for the SLA clock
Ambiguity here makes SLA metrics meaningless. Common start choices:
- First detected by tooling (best for accuracy)
- Ticket created (easier, but can be gamed)
- Vendor disclosure date (useful for certain classes, but inconsistent)
Common stop choices:
- Fix deployed and verified in the affected environment
- Mitigation applied and verified
- Exception approved (with expiration)
Avoid counting “PR opened” or “upgrade scheduled” as stop conditions.
4) Ownership and routing
Every finding must have an owner. A typical mapping:
- App library dependency CVEs → application team
- Base image OS package CVEs → platform or image owners (shared model)
- Node OS/kernel CVEs → SRE/platform
- Cloud misconfigurations → cloud platform/security engineering
- SaaS vulnerabilities (vendor-managed) → security + vendor management, with an internal mitigation owner
Ownership rules prevent “ticket ping-pong.”
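The mapping above can live in a small routing table so findings land with an owner automatically. Category and team names here are examples, not a scanner schema:

```python
# Illustrative routing table: finding category -> owning team.
ROUTING = {
    "app_dependency_cve": "application-team",
    "base_image_os_cve": "platform-team",
    "node_kernel_cve": "sre-platform",
    "cloud_misconfig": "cloud-security-engineering",
    "saas_vendor_vuln": "security-and-vendor-management",
}

def route(category: str) -> str:
    """Every finding gets an owner; unknown categories go to security
    for triage instead of bouncing between teams."""
    return ROUTING.get(category, "security-triage")

print(route("base_image_os_cve"))  # platform-team
print(route("something_new"))      # security-triage
```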
5) Exceptions that are controlled, time-bounded, and visible
You will have legitimate blockers: vendor delays, legacy systems, fragile apps, regulated change windows. Without a real exception process, teams will either ignore the SLA or hide issues.
A good exception process defines:
- who can approve (by risk level)
- what evidence is required
- what compensating controls must be applied
- expiration date and re-review requirements
- maximum exception duration (to prevent permanent deferral)
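The requirements above translate naturally into an exception record that enforces a maximum duration and an expiry. The 90-day cap and field names are illustrative assumptions:

```python
from dataclasses import dataclass
from datetime import date

MAX_EXCEPTION_DAYS = 90  # illustrative cap to prevent permanent deferral

@dataclass
class RiskException:
    finding_id: str
    approver: str                     # approval authority varies by risk level
    justification: str
    compensating_controls: list
    approved_on: date
    expires_on: date

    def __post_init__(self):
        if (self.expires_on - self.approved_on).days > MAX_EXCEPTION_DAYS:
            raise ValueError("exception exceeds maximum allowed duration")

    def needs_rereview(self, today: date) -> bool:
        """Expired exceptions must be re-reviewed, not silently extended."""
        return today >= self.expires_on

exc = RiskException(
    finding_id="VULN-1234",
    approver="security-lead",
    justification="vendor patch not yet released",
    compensating_controls=["ip-allowlist", "waf-rule"],
    approved_on=date(2024, 3, 1),
    expires_on=date(2024, 5, 1),
)
print(exc.needs_rereview(date(2024, 5, 2)))  # True
```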
6) Escalation and enforcement
If an SLA miss has no consequence, it’s not an SLA. Enforcement doesn’t need to be punitive, but it must be real:
- auto-escalate overdue Critical/High to leadership
- require weekly review of aging high-risk vulnerabilities
- deployment gating for specific categories (for example, production internet-facing Critical)
- scorecards and reporting by team/service
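An auto-escalation check can be as simple as filtering overdue high-risk findings for the weekly review. The dict keys below are illustrative, not a real scanner's export format:

```python
from datetime import date

def overdue_for_escalation(findings: list, today: date) -> list:
    """Return overdue Critical/High findings, oldest due date first,
    for leadership escalation and weekly review."""
    overdue = [
        f for f in findings
        if f["severity"] in ("critical", "high") and today > f["due"]
    ]
    return sorted(overdue, key=lambda f: f["due"])

findings = [
    {"id": "V-1", "severity": "critical", "due": date(2024, 4, 1)},
    {"id": "V-2", "severity": "low",      "due": date(2024, 1, 1)},
    {"id": "V-3", "severity": "high",     "due": date(2024, 3, 1)},
]
for f in overdue_for_escalation(findings, date(2024, 4, 10)):
    print(f["id"])  # V-3, then V-1; V-2 is low severity and excluded
```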
Example Vulnerability SLA Table
This template is intentionally conservative and reflects timelines common in many programs. The numbers below are illustrative; tailor them to your deployment frequency and risk appetite.

| Severity | Production, internet-facing | Production, internal-only | Non-production |
| --- | --- | --- | --- |
| Critical (or exploited in the wild) | 7 days | 14 days | 30 days |
| High | 30 days | 30 days | 60 days |
| Medium | 60 days | 90 days | 90 days |
| Low | 90 days | 180 days | Best effort / next upgrade cycle |
Recommended “override rules” (to make the SLA accurate)
Tighten timelines if:
- exploited in the wild
- internet-facing exposure
- no authentication required
- affects identity/auth, secrets, cryptography, or remote management interfaces
- impacts a shared component used across many services
Allow measured flexibility if:
- feature is disabled and cannot be enabled by an attacker
- strong compensating controls exist (segmentation, WAF, strict auth)
- the vulnerable component is present but not reachable in practice (with evidence)
These override rules should be written down so triage is consistent and auditable.
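Writing the override rules as a function is one way to keep triage consistent and auditable. The specific adjustments (a 7-day floor, halving, a 1.5x relaxation) are illustrative assumptions:

```python
def adjusted_sla_days(base_days: int, *, exploited: bool = False,
                      internet_facing: bool = False,
                      unauthenticated: bool = False,
                      feature_disabled: bool = False,
                      compensating_controls: bool = False) -> int:
    """Apply documented tighten/relax rules to a base timeline."""
    days = base_days
    # Tighten: active exploitation and unauthenticated public exposure.
    if exploited:
        days = min(days, 7)
    if internet_facing and unauthenticated:
        days //= 2
    # Measured flexibility: never for actively exploited findings.
    if not exploited and (feature_disabled or compensating_controls):
        days = int(days * 1.5)
    return max(days, 1)

print(adjusted_sla_days(30, exploited=True))                              # 7
print(adjusted_sla_days(30, internet_facing=True, unauthenticated=True))  # 15
print(adjusted_sla_days(30, compensating_controls=True))                  # 45
```

Because every adjustment is explicit in code (or equivalently, in a written table), two triagers given the same finding will reach the same deadline.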
What Counts as “Remediation”
Fix: patch or upgrade
A fix removes the underlying vulnerable code or configuration. Examples:
- upgrading a library version
- applying an OS package update
- rebuilding and redeploying a container image with patched packages
- updating network device firmware
Fixes are best, but they may require testing, maintenance windows, or coordination.
Mitigation: reduce exploitability or exposure
Mitigation is essential for meeting SLAs when patching takes time. Examples:
- disabling the vulnerable endpoint or feature flag
- restricting network access (VPN-only, IP allowlist, internal-only)
- adding WAF protections for a specific route
- enforcing authentication where it was optional
- segmenting network paths to reduce reachable attack surface
- tightening egress to prevent data exfiltration even if exploited
Mitigations must be verifiable. A mitigation that exists only in a ticket comment doesn’t reduce risk.
Acceptance: time-bounded exceptions with governance
Acceptance is a formal decision that the organization will tolerate the risk temporarily. A strong acceptance includes:
- business justification (why it cannot be fixed now)
- security risk assessment
- compensating controls (where feasible)
- an expiration date and re-review plan
If exceptions don’t expire, SLAs quietly collapse.