Vulnerability SLA
What Is a Vulnerability SLA?
A Vulnerability SLA (Service Level Agreement) is a formal policy that defines the maximum allowed time to remediate a vulnerability after it’s discovered.
“Remediate” is broader than “patch.” In mature programs, the SLA recognizes three valid closure states:
- Fix: Apply a patch or upgrade that removes the vulnerability.
- Mitigate: Implement compensating controls that materially reduce exploitability or exposure (for example, disabling a vulnerable feature, restricting access, adding filtering).
- Accept (Exception): Formally accept the risk for a limited time, with documented rationale, compensating controls (when possible), and an expiration date.
A vulnerability SLA also defines critical operational details:
- When does the clock start? (first detection, vendor disclosure, ticket creation)
- When does the clock stop? (deployment verified, mitigation verified, exception approved)
- What assets are in scope? (prod vs non-prod, internet-facing, regulated systems)
- How do you escalate misses? (leadership notification, gating, forced review)
In plain language: a vulnerability SLA is the organization’s “patching contract with itself.”
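The closure states and operational details above can be sketched as data. This is a minimal illustration, not a standard schema; the field names and example values are assumptions.

```python
from dataclasses import dataclass
from enum import Enum

class ClosureState(Enum):
    """The three valid closure states a mature SLA recognizes."""
    FIX = "fix"            # patch or upgrade removes the vulnerability
    MITIGATE = "mitigate"  # compensating controls reduce exploitability/exposure
    ACCEPT = "accept"      # time-bounded, documented risk acceptance

@dataclass(frozen=True)
class SlaPolicy:
    """Minimal sketch of a vulnerability SLA's operational details."""
    clock_starts_on: str       # e.g. "first_detection", "ticket_created"
    clock_stops_on: tuple      # verified closure states only
    in_scope_assets: tuple     # e.g. prod vs non-prod, internet-facing
    escalation_path: str       # what happens on a miss

policy = SlaPolicy(
    clock_starts_on="first_detection",
    clock_stops_on=("fix_verified", "mitigation_verified", "exception_approved"),
    in_scope_assets=("prod_internet_facing", "prod_internal", "endpoints"),
    escalation_path="notify_engineering_leadership",
)
print(policy.clock_starts_on)  # first_detection
```

Encoding the policy as data (rather than prose alone) makes it possible to report against it consistently.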
Vulnerability SLA vs Policy vs SLO
These terms get mixed up, and confusion leads to weak enforcement.
- Security policy/standard: The rulebook (the “what” and “why”).
- Vulnerability SLA: The measurable time commitment (the “by when”).
- SLO (Service Level Objective): An internal performance goal you aim for (often stricter than the SLA), used to improve over time.
Many teams say “SLA” even when it’s internal. That’s fine as long as it’s measurable and enforced consistently.
Why Vulnerability SLAs Matter
1) They reduce “time at risk” in the real world
Vulnerabilities are not static. Over time, public write-ups appear, exploit code matures, scanners look for it, and attackers automate campaigns. An SLA reduces the window where exploitation is likely to succeed.
2) They prevent the “infinite backlog” problem
Without deadlines, vulnerability backlogs rarely shrink. They simply age. SLAs force prioritization and create a routine cadence for keeping the backlog healthy.
3) They align teams on urgency without constant negotiation
Security, engineering, and operations can disagree on what “urgent” means. An SLA makes urgency a shared agreement instead of a recurring debate.
4) They support customer trust and compliance expectations
Many customer security questionnaires ask about patch timelines. Even when a framework doesn’t mandate exact timeframes, being able to show a documented SLA and evidence of adherence is a strong maturity signal.
What a Good Vulnerability SLA Must Include
1) Scope and asset categories
Define what the SLA covers. If you don’t specify scope, you’ll get loopholes and inconsistent reporting. Common categories:
- Production, internet-facing
- Production, internal-only
- Non-production (dev/test/staging)
- Endpoints (laptops, workstations)
- Servers/VMs
- Kubernetes nodes and cluster components
- Container images in registries
- CI/CD and build infrastructure
- Network devices and security appliances
- SaaS configurations and integrations
Many organizations also define “crown jewel” systems (authentication, customer data stores, payment flows) with stricter timelines.
2) A severity model that reflects real risk (not just a number)
If you prioritize purely by CVSS, you’ll waste time. A strong SLA uses severity plus contextual modifiers, such as:
- Exploit status (exploited in the wild, public exploit availability)
- Exposure (internet-facing vs internal, authenticated vs unauthenticated)
- Reachability (is the vulnerable code path actually used?)
- Asset criticality (data sensitivity, business impact)
- Blast radius (how widely deployed the component is)
This is how you avoid patching a theoretical issue in a dev tool ahead of an exploited vulnerability on a public-facing service.
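A triage function can make that logic concrete. The thresholds, weights, and tier names below are illustrative assumptions, not an industry standard:

```python
def risk_tier(cvss: float, exploited_in_wild: bool, internet_facing: bool,
              reachable: bool, crown_jewel: bool) -> str:
    """Combine base severity (CVSS) with contextual modifiers.

    Weights and cutoffs are illustrative; tune them to your environment.
    """
    if not reachable and not exploited_in_wild:
        return "deprioritized"      # present but not reachable in practice
    score = cvss
    if exploited_in_wild:
        score += 3.0                # active exploitation dominates everything else
    if internet_facing:
        score += 1.5
    if crown_jewel:
        score += 1.0
    if score >= 10.0:
        return "critical"
    if score >= 7.0:
        return "high"
    return "standard"

# An exploited flaw on a public service outranks a higher-CVSS dev-tool issue:
print(risk_tier(7.5, True, True, True, False))     # critical
print(risk_tier(9.8, False, False, False, False))  # deprioritized
```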
3) Clear start/stop rules for the SLA clock
Ambiguity here makes SLA metrics meaningless. Common start choices:
- First detected by tooling (best for accuracy)
- Ticket created (easier, but can be gamed)
- Vendor disclosure date (useful for certain classes, but inconsistent)
Common stop choices:
- Fix deployed and verified in the affected environment
- Mitigation applied and verified
- Exception approved (with expiration)
Avoid counting “PR opened” or “upgrade scheduled” as stop conditions.
4) Ownership and routing
Every finding must have an owner. A typical mapping:
- App library dependency CVEs → application team
- Base image OS package CVEs → platform or image owners (shared model)
- Node OS/kernel CVEs → SRE/platform
- Cloud misconfigurations → cloud platform/security engineering
- SaaS vulnerabilities (vendor-managed) → security + vendor management, with an internal mitigation owner
Ownership rules prevent “ticket ping-pong.”
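The mapping above can live in a small routing table so findings land with an owner automatically. Category and team names here are examples, not a scanner schema:

```python
# Illustrative routing table: finding category -> owning team.
ROUTING = {
    "app_dependency_cve": "application-team",
    "base_image_os_cve": "platform-team",
    "node_kernel_cve": "sre-platform",
    "cloud_misconfig": "cloud-security-engineering",
    "saas_vendor_vuln": "security-and-vendor-management",
}

def route(category: str) -> str:
    """Every finding gets an owner; unknown categories go to security
    for triage instead of bouncing between teams."""
    return ROUTING.get(category, "security-triage")

print(route("base_image_os_cve"))  # platform-team
print(route("something_new"))      # security-triage
```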
5) Exceptions that are controlled, time-bounded, and visible
You will have legitimate blockers: vendor delays, legacy systems, fragile apps, regulated change windows. Without a real exception process, teams will either ignore the SLA or hide issues.
A good exception process defines:
- who can approve (by risk level)
- what evidence is required
- what compensating controls must be applied
- expiration date and re-review requirements
- maximum exception duration (to prevent permanent deferral)
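The requirements above translate naturally into an exception record that enforces a maximum duration and an expiry. The 90-day cap and field names are illustrative assumptions:

```python
from dataclasses import dataclass
from datetime import date

MAX_EXCEPTION_DAYS = 90  # illustrative cap to prevent permanent deferral

@dataclass
class RiskException:
    finding_id: str
    approver: str                     # approval authority varies by risk level
    justification: str
    compensating_controls: list
    approved_on: date
    expires_on: date

    def __post_init__(self):
        if (self.expires_on - self.approved_on).days > MAX_EXCEPTION_DAYS:
            raise ValueError("exception exceeds maximum allowed duration")

    def needs_rereview(self, today: date) -> bool:
        """Expired exceptions must be re-reviewed, not silently extended."""
        return today >= self.expires_on

exc = RiskException(
    finding_id="VULN-1234",
    approver="security-lead",
    justification="vendor patch not yet released",
    compensating_controls=["ip-allowlist", "waf-rule"],
    approved_on=date(2024, 3, 1),
    expires_on=date(2024, 5, 1),
)
print(exc.needs_rereview(date(2024, 5, 2)))  # True
```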
6) Escalation and enforcement
If an SLA miss has no consequence, it’s not an SLA. Enforcement doesn’t need to be punitive, but it must be real:
- auto-escalate overdue Critical/High to leadership
- require weekly review of aging high-risk vulnerabilities
- deployment gating for specific categories (for example, production internet-facing Critical)
- scorecards and reporting by team/service
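An auto-escalation check can be as simple as filtering overdue high-risk findings for the weekly review. The dict keys below are illustrative, not a real scanner's export format:

```python
from datetime import date

def overdue_for_escalation(findings: list, today: date) -> list:
    """Return overdue Critical/High findings, oldest due date first,
    for leadership escalation and weekly review."""
    overdue = [
        f for f in findings
        if f["severity"] in ("critical", "high") and today > f["due"]
    ]
    return sorted(overdue, key=lambda f: f["due"])

findings = [
    {"id": "V-1", "severity": "critical", "due": date(2024, 4, 1)},
    {"id": "V-2", "severity": "low",      "due": date(2024, 1, 1)},
    {"id": "V-3", "severity": "high",     "due": date(2024, 3, 1)},
]
for f in overdue_for_escalation(findings, date(2024, 4, 10)):
    print(f["id"])  # V-3, then V-1; V-2 is low severity and excluded
```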
Example Vulnerability SLA Table
This template is intentionally conservative and reflects timelines common in many programs. The numbers below are illustrative; tailor them to your deployment frequency and risk appetite.

| Severity | Production, internet-facing | Production, internal-only | Non-production |
| --- | --- | --- | --- |
| Critical (or exploited in the wild) | 7 days | 14 days | 30 days |
| High | 30 days | 30 days | 60 days |
| Medium | 60 days | 90 days | 90 days |
| Low | 90 days | 180 days | Best effort / next upgrade cycle |
Recommended “override rules” (to make the SLA accurate)
Tighten timelines if:
- exploited in the wild
- internet-facing exposure
- no authentication required
- affects identity/auth, secrets, cryptography, or remote management interfaces
- impacts a shared component used across many services
Allow measured flexibility if:
- feature is disabled and cannot be enabled by an attacker
- strong compensating controls exist (segmentation, WAF, strict auth)
- the vulnerable component is present but not reachable in practice (with evidence)
These override rules should be written down so triage is consistent and auditable.
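Writing the override rules as a function is one way to keep triage consistent and auditable. The specific adjustments (a 7-day floor, halving, a 1.5x relaxation) are illustrative assumptions:

```python
def adjusted_sla_days(base_days: int, *, exploited: bool = False,
                      internet_facing: bool = False,
                      unauthenticated: bool = False,
                      feature_disabled: bool = False,
                      compensating_controls: bool = False) -> int:
    """Apply documented tighten/relax rules to a base timeline."""
    days = base_days
    # Tighten: active exploitation and unauthenticated public exposure.
    if exploited:
        days = min(days, 7)
    if internet_facing and unauthenticated:
        days //= 2
    # Measured flexibility: never for actively exploited findings.
    if not exploited and (feature_disabled or compensating_controls):
        days = int(days * 1.5)
    return max(days, 1)

print(adjusted_sla_days(30, exploited=True))                              # 7
print(adjusted_sla_days(30, internet_facing=True, unauthenticated=True))  # 15
print(adjusted_sla_days(30, compensating_controls=True))                  # 45
```

Because every adjustment is explicit in code (or equivalently, in a written table), two triagers given the same finding will reach the same deadline.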
What Counts as “Remediation”
Fix: patch or upgrade
A fix removes the underlying vulnerable code or configuration. Examples:
- upgrading a library version
- applying an OS package update
- rebuilding and redeploying a container image with patched packages
- updating network device firmware
Fixes are best, but they may require testing, maintenance windows, or coordination.
Mitigation: reduce exploitability or exposure
Mitigation is essential for meeting SLAs when patching takes time. Examples:
- disabling the vulnerable endpoint or feature flag
- restricting network access (VPN-only, IP allowlist, internal-only)
- adding WAF protections for a specific route
- enforcing authentication where it was optional
- segmenting network paths to reduce reachable attack surface
- tightening egress to prevent data exfiltration even if exploited
Mitigations must be verifiable. A mitigation that exists only in a ticket comment doesn’t reduce risk.
Acceptance: time-bounded exceptions with governance
Acceptance is a formal decision that the organization will tolerate the risk temporarily. A strong acceptance includes:
- business justification (why it cannot be fixed now)
- security risk assessment
- compensating controls (where feasible)
- an expiration date and re-review plan
If exceptions don’t expire, SLAs quietly collapse.