Introduction
On March 8, 2017, Apache disclosed CVE-2017-5638, a remote code execution vulnerability in Apache Struts. A patch was available the same day. Equifax's security team received the disclosure. Their internal process required scanning for the vulnerability within 48 hours of notification and patching within 30 days. The scan ran. The vulnerable system was not identified because it was not in scope for the scan. On May 13, 2017, an attacker began exploiting the vulnerability on that system. The breach was not detected until July 29. By then, the personal data of 147 million people had been exfiltrated.
The Equifax breach is the canonical example of a remediation program failure, and not because the organization had no program. They had a documented process with defined SLAs. The process failed because the asset was out of scope, the scan did not cover it, and the verification step (confirming the patch had been applied across all assets) was not sufficient to catch the gap.
An external attack surface remediation plan has to solve two problems that most security programs treat as one. The first is finding the right things to fix. The second is ensuring that fixes actually get applied, verified, and stay applied. Most teams have a solution to the first problem. Scanners, EASM platforms, and vulnerability tools produce findings reliably. The second problem is where remediation programs routinely break down, and it breaks down in specific, predictable ways. This guide covers how to build a remediation plan that addresses both.
Why remediation plans fail
Before designing a plan, it is worth being specific about the failure modes. Each one has a structural cause that a well-designed plan addresses directly.
- No named owner — A finding assigned to a team is a finding assigned to nobody. Teams diffuse responsibility. When a critical finding ages past its SLA and someone asks why it was not fixed, the answer from a team assignment is always some version of 'I thought someone else was handling it.' Every finding needs a named individual responsible for the fix, even if they need to coordinate with others to implement it.
- CVSS severity does not translate to remediation effort — A CVSS 9.8 finding (critical) for a missing DMARC record takes approximately two minutes to fix: add a DNS TXT record. A CVSS 5.3 finding (medium) for a deprecated TLS 1.0 configuration on a legacy payment processing endpoint may require a 3-week coordinated change involving compliance sign-off. Prioritizing by CVSS alone produces a remediation queue where low-effort high-severity fixes compete with high-effort medium-severity fixes, and teams default to the path of least resistance rather than actual risk reduction. The remediation plan needs a two-axis view: severity and effort.
- No SLA enforcement — A finding with no due date has an implicit due date of never. Teams working through sprint cycles, customer escalations, and engineering priorities will deprioritize security findings that have no stated urgency. SLAs that exist in a policy document but are never tracked or reported on are not enforced. The plan needs SLAs that are visible in the same tools developers use and that are tracked as a program metric.
- Verification via self-attestation — A developer marks a ticket as resolved. The same finding appears in the next scan and gets re-assigned, and the developer says it was already fixed. Nobody checked whether the fix actually took effect. Self-attestation is not verification. The only reliable verification for an external attack surface finding is a re-scan of the specific asset after the fix is applied.
- No formal risk acceptance process — Not every finding gets fixed. A finding on a system that is scheduled for decommission in six weeks may be a reasonable accept. A finding that cannot be fixed without breaking a critical integration may need a compensating control. Without a formal, documented risk acceptance process, accepted-risk findings stay open indefinitely in the same queue as actionable findings, which contaminates the remediation metrics and makes the program look worse than it is.
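The two-axis view of severity and effort can be sketched as a small sorting routine. This is a minimal illustration, not a prescribed algorithm: the `Finding` record, the `effort_hours` estimate, and the ordering rule (severity first, then lowest effort) are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical ranking: a lower number sorts earlier in the queue.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Finding:
    title: str
    severity: str        # "critical", "high", "medium", or "low"
    effort_hours: float  # rough estimate from the owner (assumed field)
    owner: str           # a named individual, not a team


def prioritize(findings):
    """Sort by severity, then by effort so quick fixes surface within each level."""
    return sorted(findings, key=lambda f: (SEVERITY_RANK[f.severity], f.effort_hours))


queue = prioritize([
    Finding("TLS 1.0 on legacy payment endpoint", "medium", 120.0, "alice"),
    Finding("Missing DMARC record", "critical", 0.1, "bob"),
    Finding("Exposed admin panel", "critical", 16.0, "carol"),
])

for f in queue:
    print(f"{f.severity:<8} {f.effort_hours:>6.1f}h  {f.owner:<6} {f.title}")
```

Keeping effort as a separate field, rather than folding it into a single score, leaves both axes visible when the queue is reviewed.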
Step 1: triage before you plan
A remediation plan without triage produces an overwhelming backlog. Triage happens before assignment and before SLA calculation. Every finding gets placed into one of four buckets, and that placement determines what happens next.
The four buckets are not severity levels. They are action categories. A critical-severity finding can legitimately land in the 'scheduled' bucket if the system is being decommissioned. A low-severity finding can land in 'fix now' if it takes two minutes and eliminates a monitoring gap.
- Fix now — High-confidence findings with confirmed impact and a fix that can be applied within 24 to 48 hours without significant coordination. Examples: a CNAME record pointing to an unclaimed Heroku app (delete the DNS record), a DMARC policy at p=none (update the DNS TXT record to p=quarantine or p=reject), a TLS certificate expiring in 7 days (renew the certificate). These findings go directly to a named owner with a same-day or next-day due date.
- Fix in sprint — Confirmed findings that require engineering work, coordination, or testing before deployment. Examples: disabling TLS 1.0 on a legacy endpoint (requires testing that nothing depends on TLS 1.0), removing a publicly accessible admin panel (requires verifying the panel is not actively used and adding authentication). These findings get written as engineering tickets and assigned to the next available sprint with an agreed completion target.
- Scheduled maintenance — Findings that require significant architectural or infrastructure changes, vendor coordination, or scheduled maintenance windows. Examples: migrating a legacy service to TLS 1.2 (requires vendor support), rotating all API keys exposed in a historical Git commit (requires coordinating with every service that uses those keys). These findings get a project-level owner, a milestone date, and interim compensating controls where possible.
- Accepted risk — Findings where the cost or complexity of remediation exceeds the risk level in the current business context, or where the system is scheduled for replacement. Risk acceptance requires a documented decision: the finding, the reason for acceptance, the accepting party (a named person with appropriate authority), the review date, and any compensating controls in place. Accepted-risk findings leave the active remediation queue but are re-evaluated at the review date.
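The four buckets and the risk acceptance record can be modeled in a few lines. The sketch below is only one way to encode them: the class names, the two yes/no triage questions, and the order in which they are checked are assumptions for illustration, and the judgment behind each answer still belongs to the person doing triage.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class Bucket(Enum):
    FIX_NOW = "fix now"
    FIX_IN_SPRINT = "fix in sprint"
    SCHEDULED = "scheduled maintenance"
    ACCEPTED_RISK = "accepted risk"


@dataclass
class RiskAcceptance:
    """The documented decision required before a finding leaves the active queue."""
    finding: str
    reason: str
    accepted_by: str   # a named person with appropriate authority
    review_date: date
    compensating_controls: List[str] = field(default_factory=list)

    def due_for_review(self, today: date) -> bool:
        return today >= self.review_date


def triage(needs_engineering_work: bool,
           needs_vendor_or_window: bool,
           acceptance: Optional[RiskAcceptance] = None) -> Bucket:
    """Map the triage reviewer's answers to an action category (assumed rule order)."""
    if acceptance is not None:
        return Bucket.ACCEPTED_RISK
    if needs_vendor_or_window:
        return Bucket.SCHEDULED
    if needs_engineering_work:
        return Bucket.FIX_IN_SPRINT
    return Bucket.FIX_NOW


acceptance = RiskAcceptance(
    finding="TLS 1.0 enabled on legacy-app.example.com",
    reason="System is scheduled for decommission in six weeks",
    accepted_by="Jane Doe (hypothetical approver)",
    review_date=date(2026, 7, 27),
)
print(triage(False, False).value)
print(triage(True, False, acceptance).value)
```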
Step 2: establish SLAs by severity
SLAs give findings a defined urgency that persists through sprint planning, quarterly reviews, and team priority shifts. The specific time bounds matter less than their consistent application and visibility. The following SLA structure is a starting point; adjust the time bounds to match your organization's deployment cadence and risk tolerance.
SLAs should be stated in calendar days from the triage date, not from the scan date. The scan date is when the finding was identified. The triage date is when a human reviewed it and made a decision. Starting the SLA clock at the scan date creates pressure to rush triage, which leads to miscategorization.
- Critical severity: 24 hours — Reserved for findings with immediate exploitation potential and confirmed impact: active subdomain takeover candidates where the resource is claimable right now, exposed databases with no authentication on internet-facing ports, credentials confirmed active in public-facing JavaScript. A critical finding at 24 hours means work starts today, not this sprint.
; SLA tracking in Jira (example custom field)
; Finding: CNAME Takeover - old-app.example.com
; Severity: Critical
; Triage date: 2026-06-15
; SLA due: 2026-06-16 (24 hours)
; Owner: @engineer-name
; Status: In Progress
- High severity: 7 days — High-confidence findings with significant impact that require more than a one-line fix: exposed admin panels without authentication, DMARC missing entirely on primary sending domains, open ports for management services (RDP, SSH) accessible from the internet without VPN requirement, S3 buckets publicly readable.
- Medium severity: 30 days — Findings with real impact but lower exploitation urgency or higher fix complexity: deprecated TLS 1.0/1.1 support, missing security headers (CSP, HSTS with short max-age), DMARC at p=none on secondary domains, SPF with ~all rather than -all.
- Low severity: 90 days — Informational or low-impact findings: missing optional security headers on non-critical assets, certificate expiry beyond 60 days, subdomains serving redirect-only responses with minor configuration gaps. These findings rarely require dedicated engineering time; they are typically addressed in maintenance windows or as part of broader configuration reviews.
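The SLA table above can be turned into a due-date calculation that starts the clock at the triage timestamp. This is a minimal sketch using the starting-point time bounds from this section; the function names are hypothetical, and the bounds should be replaced with whatever your organization adopts.

```python
from datetime import datetime, timedelta

# Starting-point SLA bounds from the list above; adjust to your own policy.
SLA = {
    "critical": timedelta(hours=24),
    "high": timedelta(days=7),
    "medium": timedelta(days=30),
    "low": timedelta(days=90),
}


def sla_due(triaged_at: datetime, severity: str) -> datetime:
    """Due date counted from the triage timestamp, not the scan timestamp."""
    return triaged_at + SLA[severity.lower()]


def is_breached(triaged_at: datetime, severity: str, now: datetime) -> bool:
    """True once the current time has passed the SLA due date."""
    return now > sla_due(triaged_at, severity)


triaged = datetime(2026, 6, 15, 9, 0)
for level in SLA:
    print(f"{level:<8} due {sla_due(triaged, level):%Y-%m-%d %H:%M}")
```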
Step 3: write remediation tickets developers will act on
The most common reason a valid security finding does not get fixed is that the ticket written to track it does not give the developer enough information to act. A ticket that says 'fix DMARC misconfiguration on example.com' requires the developer to understand what DMARC is, look up the current record, determine what the correct policy should be, find the documentation for their DNS provider, and implement the fix. At any point in that chain, the ticket gets deprioritized because it requires research the developer does not have time to do during an active sprint.
A ticket that gets acted on contains six specific elements.
- 1. The finding in plain language — What is wrong, stated without security jargon. Not 'DMARC policy enforcement level is insufficient.' Instead: 'Anyone can send email that appears to come from @example.com. Your DMARC record is set to monitoring-only, which means your domain can be used in phishing emails targeting your customers and your employees without any email server rejecting them.'
- 2. The affected asset — The exact subdomain, IP, port, or DNS record. Not 'your email infrastructure.' The specific record: _dmarc.example.com currently returns: v=DMARC1; p=none; rua=mailto:dmarc@example.com
- 3. The exact fix — The specific configuration change, command, or DNS record to apply. Not 'update your DMARC policy.' The exact record to set:
; Current (vulnerable):
_dmarc.example.com TXT "v=DMARC1; p=none; rua=mailto:dmarc@example.com"
; Fix: update to quarantine or reject
_dmarc.example.com TXT "v=DMARC1; p=quarantine; pct=100; rua=mailto:dmarc@example.com"
; Verification command (run after fix):
dig TXT _dmarc.example.com
; Expected output: p=quarantine or p=reject
- 4. The verification step — The exact command the developer runs after the fix to confirm it worked. Including this in the ticket removes the ambiguity about how to verify, makes the developer's job feel complete when they run it, and produces the evidence needed to close the ticket with confidence.
- 5. The SLA due date — The calendar date by which this finding must be resolved, visible in the ticket's due date field (not buried in a description). This makes the urgency visible in sprint planning views and backlog grooming without requiring a separate reminder.
- 6. Links and references — A link to the finding in the EASM platform, a link to the relevant documentation for the fix, and if applicable a link to the policy or compliance requirement that makes this mandatory. Tickets with supporting links get actioned faster because the developer does not need to leave the ticket to understand the context.
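The six elements can be enforced with a small template that refuses to render an incomplete ticket. The sketch below is illustrative only: the `RemediationTicket` class, its field names, and the link URLs are hypothetical, and the rendered text would still need to be mapped onto your issue tracker's own fields.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class RemediationTicket:
    plain_language: str  # 1. the finding without jargon
    asset: str           # 2. exact subdomain, IP, port, or DNS record
    exact_fix: str       # 3. the specific change to apply
    verification: str    # 4. the command that proves the fix worked
    sla_due: str         # 5. calendar due date (also set the tracker's due field)
    links: List[str]     # 6. finding link, fix documentation, policy reference


def render(ticket: RemediationTicket) -> str:
    """Return the ticket body, or raise if any of the six elements is empty."""
    missing = [f.name for f in fields(ticket) if not getattr(ticket, f.name)]
    if missing:
        raise ValueError("ticket is missing: " + ", ".join(missing))
    return "\n".join([
        "What is wrong: " + ticket.plain_language,
        "Affected asset: " + ticket.asset,
        "Exact fix: " + ticket.exact_fix,
        "Verify with: " + ticket.verification,
        "Due: " + ticket.sla_due,
        "References: " + ", ".join(ticket.links),
    ])


ticket = RemediationTicket(
    plain_language="Anyone can send email that appears to come from @example.com.",
    asset="_dmarc.example.com",
    exact_fix='Set TXT to "v=DMARC1; p=quarantine; pct=100; rua=mailto:dmarc@example.com"',
    verification="dig TXT _dmarc.example.com",
    sla_due="2026-07-15",
    links=["https://easm.example.com/findings/123"],  # placeholder URL
)
print(render(ticket))
```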
Step 4: integrate with your existing workflow
A remediation plan that lives in a separate security tool that developers do not open will not get executed. The findings need to appear in the same place developers do their work: Jira, GitHub Issues, Linear, or whichever issue tracker your engineering team uses for sprint planning.
Most EASM platforms, including Externalsight, support webhook output that can feed into issue trackers automatically. A finding that triggers a webhook can create a Jira ticket, assign it to the correct team based on the affected service, and set the due date based on severity. This automation eliminates the manual step where a security engineer exports findings to a spreadsheet and then re-enters them into Jira, which is the step where information gets lost and remediation velocity drops.
For teams without webhook integration, a consistent export cadence works: export new findings weekly, create tickets in the engineering backlog using the six-element format above, and assign them in the weekly sprint planning meeting. The key is that security findings go into the same backlog prioritization process as engineering work, not a separate queue that engineers never look at.
- Externalsight webhook integration for automatic ticket creation — Externalsight's Fortress plan supports per-domain webhook overrides. Configure a webhook pointing to your issue tracker's API to automatically create tickets when new findings are detected.
# Example: Externalsight webhook payload (new finding)
{
  "event": "finding_detected",
  "domain": "example.com",
  "asset": "old-app.example.com",
  "finding": {
    "code": "SUBDOMAIN_TAKEOVER",
    "severity": "high",
    "title": "Subdomain takeover candidate: old-app.example.com",
    "description": "CNAME points to example-old-app.herokuapp.com which is unclaimed",
    "remediation": "Delete DNS record: old-app.example.com CNAME example-old-app.herokuapp.com",
    "verification": "dig CNAME old-app.example.com - expected NXDOMAIN after fix"
  },
  "sla_due": "2026-06-22",
  "scan_id": "scan_abc123"
}
# This payload can be parsed by a middleware service that creates
# Jira tickets, GitHub issues, or Linear tasks automatically
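The middleware step can be sketched as a function that turns the example payload into a tracker-neutral ticket. This is an assumption-heavy illustration: the input field names are taken from the example payload above, while the output keys (`title`, `description`, `due_date`, `labels`) are hypothetical and would need mapping to whatever your issue tracker's API expects. No tracker API is called here.

```python
import json


def ticket_from_payload(raw: str) -> dict:
    """Convert a 'finding_detected' webhook body into a generic ticket dict."""
    payload = json.loads(raw)
    if payload.get("event") != "finding_detected":
        raise ValueError("unsupported event: %r" % payload.get("event"))
    finding = payload["finding"]
    description = "\n".join([
        "Asset: " + payload["asset"],
        "What is wrong: " + finding["description"],
        "Fix: " + finding["remediation"],
        "Verify: " + finding["verification"],
        "Scan: " + payload["scan_id"],
    ])
    return {
        "title": "[%s] %s" % (finding["severity"].upper(), finding["title"]),
        "description": description,
        "due_date": payload["sla_due"],
        "labels": ["security", finding["code"].lower()],
    }


sample = json.dumps({
    "event": "finding_detected",
    "domain": "example.com",
    "asset": "old-app.example.com",
    "finding": {
        "code": "SUBDOMAIN_TAKEOVER",
        "severity": "high",
        "title": "Subdomain takeover candidate: old-app.example.com",
        "description": "CNAME points to example-old-app.herokuapp.com which is unclaimed",
        "remediation": "Delete DNS record: old-app.example.com CNAME example-old-app.herokuapp.com",
        "verification": "dig CNAME old-app.example.com - expected NXDOMAIN after fix",
    },
    "sla_due": "2026-06-22",
    "scan_id": "scan_abc123",
})

ticket = ticket_from_payload(sample)
print(ticket["title"])
print(ticket["description"])
```

Rejecting unknown event types up front keeps the middleware from creating malformed tickets if the platform later sends other kinds of notifications.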
Step 5: verify with a re-scan, not a checkbox
A finding should only be closed when a re-scan of the specific asset confirms the fix is in place. Developer self-attestation ('I fixed it, closing the ticket') is not sufficient because the fix may be incomplete, the deployment may have failed silently, or the change may have been applied to the wrong environment.
For external attack surface findings, verification is fast and unambiguous. A DMARC record fix takes 30 seconds to verify with a dig query. A closed port takes 30 seconds to verify with nmap or naabu. A removed CNAME takeover candidate takes 10 seconds with a dig query. The verification step takes less time than writing the ticket update.
When the EASM platform's continuous monitoring is running, verification happens automatically: if the finding is absent from the next scheduled scan after the fix date, it is confirmed resolved. If it reappears, either the fix was incomplete or a subsequent change regressed it. The reappearance of a previously closed finding is a high-priority signal: it indicates a deployment process that is overwriting the fix, a miscommunication about which environment was patched, or a change management gap.
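The reconciliation between ticket state and the next scan can be expressed with set operations. This is a minimal sketch under stated assumptions: each finding is identified by a hypothetical `(asset, finding_code)` pair, and the three input sets come from your own tracker and scan exports.

```python
def reconcile(awaiting_verification, verified_closed, latest_scan):
    """Compare ticket state against the findings present in the latest scan.

    All three arguments are sets of (asset, finding_code) tuples.
    """
    return {
        # Marked fixed and absent from the scan: safe to close.
        "verified": awaiting_verification - latest_scan,
        # Marked fixed but still detected: reopen the ticket.
        "still_present": awaiting_verification & latest_scan,
        # Closed earlier and detected again: treat as a regression.
        "regressed": verified_closed & latest_scan,
    }


result = reconcile(
    awaiting_verification={
        ("old-app.example.com", "SUBDOMAIN_TAKEOVER"),
        ("example.com", "DMARC_POLICY_NONE"),
    },
    verified_closed={("api.example.com", "TLS_1_0_ENABLED")},
    latest_scan={
        ("example.com", "DMARC_POLICY_NONE"),
        ("api.example.com", "TLS_1_0_ENABLED"),
    },
)
for state, items in result.items():
    print(state, sorted(items))
```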
- Verification commands by finding type — Fast verification commands for the most common external attack surface finding types.
# DMARC policy fix (the \b keeps grep from matching the sp= tag)
dig +short TXT _dmarc.example.com | grep -oP '\bp=\w+'
# Expected: p=quarantine or p=reject
# CNAME takeover remediation
dig CNAME old-app.example.com
# Expected: NXDOMAIN (record deleted) or response pointing to live resource
# TLS 1.0/1.1 disabled (repeat with -tls1_1 to test TLS 1.1)
openssl s_client -tls1 -connect example.com:443 < /dev/null 2>&1 | grep -E 'error|Protocol'
# Expected: handshake failure error
# Port closed
nmap -p 6379 target.example.com
# Expected: 6379/tcp filtered or closed
# Security header added
curl -sI https://example.com | grep -i 'strict-transport-security'
# Expected: header present with correct max-age
# S3 bucket restricted
curl -s https://example-assets.s3.amazonaws.com
# Expected: AccessDenied XML response, not 200 with file listing
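When a check like the DMARC one is automated, parsing the record's tags explicitly is less fragile than pattern matching, because the `sp=` (subdomain policy) tag also contains the text `p=`. The sketch below works on a TXT record string you have already retrieved; the function names are hypothetical and no DNS lookup is performed.

```python
def dmarc_tags(record: str) -> dict:
    """Split a DMARC TXT record into a {tag: value} dict."""
    tags = {}
    for part in record.strip().strip('"').split(";"):
        name, sep, value = part.partition("=")
        if sep:  # ignore empty fragments such as a trailing ';'
            tags[name.strip().lower()] = value.strip()
    return tags


def is_enforced(record: str) -> bool:
    """True when the domain policy is quarantine or reject."""
    return dmarc_tags(record).get("p", "").lower() in {"quarantine", "reject"}


before = '"v=DMARC1; p=none; rua=mailto:dmarc@example.com"'
after = '"v=DMARC1; p=quarantine; pct=100; rua=mailto:dmarc@example.com"'
print(is_enforced(before), is_enforced(after))
```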
How to assess your current remediation posture
Before building a new plan, a brief assessment of the current state tells you which failure mode you are primarily solving. Run a scan of your external attack surface and answer these four questions against the current finding list.
First: of the findings reported by scans run more than 30 days ago, what percentage are still open? A high percentage indicates either an SLA problem or a workflow integration problem. Second: for findings that were closed in the last quarter, how many have reappeared since closure? Reappearances indicate verification failures or regression without monitoring. Third: what is the age of your oldest open finding? If the oldest finding predates your current engineering team, the plan has never been operationally functional. Fourth: how many open findings have no assigned owner? Any non-zero count indicates the ownership model is broken.
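The four questions can be answered from a findings export with a short script. The sketch below assumes a hypothetical record shape (`first_seen`, `status`, `owner`, `closed_date`, `reappeared`) and treats "the last quarter" as the previous 90 days; adapt both to whatever your tracker actually exports.

```python
from datetime import date


def assess(findings, today):
    """Answer the four posture questions for a list of finding dicts."""
    def age(f):
        return (today - f["first_seen"]).days

    older = [f for f in findings if age(f) > 30]
    open_findings = [f for f in findings if f["status"] == "open"]
    recent_closed = [
        f for f in findings
        if f.get("closed_date") and (today - f["closed_date"]).days <= 90
    ]
    return {
        "pct_older_than_30d_still_open": (
            100.0 * sum(1 for f in older if f["status"] == "open") / len(older)
            if older else 0.0
        ),
        "reappeared_after_recent_close": sum(1 for f in recent_closed if f.get("reappeared")),
        "oldest_open_age_days": max((age(f) for f in open_findings), default=0),
        "open_without_owner": sum(1 for f in open_findings if not f.get("owner")),
    }


sample = [
    {"first_seen": date(2026, 1, 10), "status": "open", "owner": "alice"},
    {"first_seen": date(2026, 4, 1), "status": "open", "owner": None},
    {"first_seen": date(2026, 3, 1), "status": "closed", "owner": "bob",
     "closed_date": date(2026, 5, 1), "reappeared": True},
    {"first_seen": date(2026, 6, 10), "status": "open", "owner": "bob"},
]
report = assess(sample, today=date(2026, 6, 15))
for question, answer in report.items():
    print(question, answer)
```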
Measuring whether the program is working
A remediation program without metrics cannot be improved. Three metrics cover the operational health of an external attack surface remediation program without requiring complex tooling.
- Mean time to remediate (MTTR) by severity — The average number of days between triage date and verified close date, calculated separately for each severity level. MTTR for critical findings should be below your critical SLA (24 hours). MTTR for high findings should trend toward your 7-day SLA. Track this monthly. A rising MTTR for a specific severity level indicates either a capacity problem (too many findings, not enough engineering time) or a workflow problem (findings not reaching the right owner fast enough).
# Calculate MTTR from your issue tracker export
# Assumes a findings CSV with columns:
# finding_id, severity, triage_date, close_date
python3 << 'EOF'
import csv
from collections import defaultdict
from datetime import datetime

mttr = defaultdict(list)
with open('findings.csv', newline='') as f:
    for row in csv.DictReader(f):
        if row['close_date']:  # skip findings that are still open
            triage = datetime.strptime(row['triage_date'], '%Y-%m-%d')
            closed = datetime.strptime(row['close_date'], '%Y-%m-%d')
            mttr[row['severity']].append((closed - triage).days)

# Report the average per severity level
for severity, days_list in mttr.items():
    avg = sum(days_list) / len(days_list)
    print(f'{severity}: {avg:.1f} days average MTTR ({len(days_list)} findings)')
EOF
- Open finding count trend by severity — The total count of open findings per severity level, tracked weekly or monthly. A declining trend means the program is closing findings faster than new ones are being discovered. A flat trend means the program is keeping pace. A rising trend means either the organization's attack surface is growing faster than remediation velocity, or the program is not working. This metric should be reported to engineering leadership monthly alongside the MTTR.
- Repeat finding rate — The percentage of closed findings that reappear in a subsequent scan within 90 days. A repeat finding means either the fix was not applied correctly, the fix was applied to one environment but not all environments, or a subsequent deployment regressed the configuration. A repeat finding rate above 10 percent indicates a systematic deployment or change management problem that the remediation program cannot solve on its own. It requires a conversation with the engineering team about how configuration changes get applied and verified across environments.
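The repeat finding rate can be computed alongside MTTR from the same kind of export. This is a minimal sketch assuming hypothetical `closed_on` and `reappeared_on` fields on each closed finding; the 90-day window comes from the definition above.

```python
from datetime import date


def repeat_finding_rate(closed_findings, window_days=90):
    """Percentage of closed findings that reappeared within the window."""
    if not closed_findings:
        return 0.0
    repeats = sum(
        1 for f in closed_findings
        if f["reappeared_on"] is not None
        and 0 <= (f["reappeared_on"] - f["closed_on"]).days <= window_days
    )
    return 100.0 * repeats / len(closed_findings)


closed = [
    {"closed_on": date(2026, 1, 5), "reappeared_on": date(2026, 1, 25)},   # repeat
    {"closed_on": date(2026, 1, 5), "reappeared_on": date(2026, 5, 5)},    # outside window
    {"closed_on": date(2026, 2, 1), "reappeared_on": None},
    {"closed_on": date(2026, 3, 1), "reappeared_on": None},
]
rate = repeat_finding_rate(closed)
print(f"Repeat finding rate: {rate:.1f}%")
```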
Real-world context: the Equifax breach (2017)
The Equifax breach that exposed 147 million records in 2017 is the most comprehensively documented remediation program failure in the public record. CVE-2017-5638, a remote code execution vulnerability in Apache Struts 2, was disclosed on March 8, 2017. A patch was available the same day. Equifax's internal security policy required scanning for the vulnerability within 48 hours of notification and applying the patch within 30 days of disclosure.
The scan ran. The vulnerable system was not identified because it was running a version of Apache Struts that the scanner did not detect correctly, and the system was not in the scan scope maintained by the team responsible for patching it. The system belonged to a legacy web application that had been acquired through a business unit acquisition and was not fully integrated into the main vulnerability management program's asset inventory.
Exploitation began on May 13, 2017, 66 days after the patch was available. The breach ran undetected for 77 days, until July 29. The failure was not a missing patch. The failure was an asset that was not in scope, a scan that did not cover it, and a verification process that did not confirm the patch had been applied across all affected systems. The Senate Commerce Committee hearing and the FTC's subsequent investigation both identified the asset inventory and scan scope gap as the primary causal factor.
The direct applicability to external attack surface management: a remediation program that operates against a manually maintained asset list will always have gaps. The asset that causes the breach is consistently the one that was not in scope. Automatic asset discovery, continuous monitoring, and verification via re-scan rather than manual checklist address each of the three failure points the Equifax investigation identified.
Key takeaways
- Triage before assignment. Every finding goes into one of four categories (fix now, fix in sprint, scheduled, accepted risk) before it gets assigned to an owner. Skipping triage produces a backlog where critical two-minute fixes compete with six-week infrastructure projects.
- Every finding needs a named individual owner, not a team. Team assignments diffuse responsibility to the point where no one acts. The named owner coordinates with others but is accountable for the finding reaching verified closure.
- CVSS severity alone does not determine remediation priority. A two-axis view (severity and estimated effort) produces a more actionable queue. Fix low-effort critical findings immediately. Track high-effort medium findings through project planning.
- Remediation tickets written for developers need six elements: the finding in plain language, the exact affected asset, the specific fix with syntax, the verification command, the SLA due date, and links to the finding and supporting documentation. Tickets missing any of these elements consistently get deprioritized.
- Verification via re-scan, not self-attestation. A finding is closed when a scan confirms it is gone, not when a developer updates a ticket status. Repeat findings (findings that reappear after closure) indicate a systematic deployment or change management problem, not a security problem.
- MTTR by severity, open finding count trend, and repeat finding rate are the three metrics that tell you whether the remediation program is working. Any other reporting is noise until these three are healthy.
Frequently asked questions
- How do we handle a finding that keeps reappearing after we fix it?
- A finding that reappears after a confirmed fix is a deployment problem, not a security problem. The fix was applied to one environment and a subsequent deployment overwrote it, or the fix was applied to one of several servers behind a load balancer and the others were missed, or the configuration change was not included in the infrastructure-as-code so it gets reset on the next provisioning run. The remediation step is to find where the configuration lives at the source level (the IaC template, the nginx config in the base AMI, the DNS record in the Terraform module) and fix it there. Fixing it at the instance level without fixing the source produces exactly the reappearance pattern you are seeing.
- We have hundreds of open findings. Where do we start?
- Start with triage, not remediation. Take every open finding and place it into one of the four buckets: fix now, fix in sprint, scheduled, accepted risk. You will likely find that 70 to 80 percent of open findings are low-severity informational items that can be accepted or scheduled for quarterly maintenance. The fix-now bucket for most organizations has 5 to 15 findings. Start there. Getting critical and high findings under control before addressing the medium and low backlog is more risk-effective than working through the full list in order.
- Should we fix everything before doing the next scan?
- No. The next scan should run on its scheduled cadence regardless of whether the previous findings are resolved. Running scans only when the previous cycle is complete means your monitoring cadence degrades to match your slowest remediation time. New assets appear, configurations drift, and new findings emerge continuously. The scan cadence and the remediation cadence are independent. What matters is that you have a clear picture of what is open, what is being worked on, and what the trend line looks like.
- How do we handle findings on systems owned by another team?
- Assign ownership to the team lead of the team that owns the system, not to your security team. Security engineers do not have production access to every system in the organization and should not be the ones implementing fixes on systems they do not own. Your role is to communicate the finding clearly (using the six-element ticket format), set the SLA, track the ticket, and escalate if the SLA is missed. If the other team disagrees with the severity or the fix, that is a conversation to have in triage, not after the SLA has passed.
Track remediation across your full external attack surface
Externalsight surfaces findings across your full external attack surface with per-finding severity ratings, remediation steps, and verification-ready output. Continuous monitoring flags regressions when a finding reappears after closure, so you know immediately whether a fix held or was overwritten by a subsequent deployment.