Introduction

Most breaches do not start with a zero-day. They start with reconnaissance. An attacker spends anywhere from thirty minutes to several days mapping your infrastructure using tools that are freely available, documented in public tutorials, and require no special skill to operate. By the time they identify a vulnerable target, they already know more about your external surface than your internal team does.

This post walks through real attacker reconnaissance, technique by technique. Not at a theoretical level, but with the actual tools, commands, and data sources used in practice. The goal is not to teach offensive security. The goal is to show you exactly what someone running targeted recon against your organization sees, so you can find those same assets first.

What you need before you start

This guide assumes you are running recon against a domain you own or have written authorization to test. Every technique described here is detectable and, run against infrastructure you do not own, creates legal liability regardless of intent. The tools referenced are: crt.sh (browser), Shodan (browser or CLI), Google (browser), subfinder (CLI), dnsx (CLI), httpx (CLI), nuclei (CLI), GitHub (browser), and theHarvester (CLI). None require paid licenses for basic use.

Step 1: map the certificate footprint

The first thing an attacker does is pull certificate transparency logs. This takes about thirty seconds and requires nothing but a browser. Every TLS certificate ever issued for your domain is publicly logged. The SANs on those certificates reveal subdomains, internal naming conventions, and infrastructure patterns that were never meant to be public.

Go to crt.sh and search for %.example.com. The wildcard prefix returns all certificates where your domain appears anywhere in the SAN field, including certificates issued for subdomains of subdomains. Sort by date to see recent issuances first.

  • CLI version with deduplication — Returns all unique subdomain names from CT logs, sorted alphabetically. This is the baseline asset list that everything else builds on.

    curl -s 'https://crt.sh/?q=%.example.com&output=json' | jq -r '.[].name_value' | sed 's/\*\.//g' | sort -u
  • What attackers look for in the output — Naming patterns reveal infrastructure architecture. Subdomains like api-staging, admin-old, jenkins, grafana, kibana, vpn, and dev immediately signal high-value targets. Any subdomain with a date or version number in the name is likely a forgotten environment.
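Those naming patterns are easy to flag mechanically. A minimal sketch: filter the CT-log subdomain list for high-value keywords (ct_subs.txt is an assumed filename holding one name per line, e.g. the output of the crt.sh one-liner above; the sample data is illustrative):

```shell
# Sample CT-log output; in practice this comes from the crt.sh query.
cat > ct_subs.txt <<'EOF'
www.example.com
api-staging.example.com
jenkins.example.com
shop.example.com
EOF

# Flag names containing high-value tokens, bounded by dots/dashes so
# "dev" does not match inside unrelated words.
grep -E '(^|[.-])(admin|staging|dev|test|jenkins|grafana|kibana|vpn|old)([.-]|$)' ct_subs.txt
```

On the sample data this surfaces api-staging and jenkins while skipping the ordinary public-facing names.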

Step 2: run passive subdomain enumeration

CT logs cover certificates. Passive DNS covers everything that ever resolved, certificate or not. An attacker queries multiple passive DNS databases simultaneously to build a more complete picture than CT logs alone provide.

  • Subfinder against all passive sources — Subfinder queries over 40 passive sources in parallel: SecurityTrails, VirusTotal, Censys, Shodan, AlienVault, HackerTarget, and more. The output is a flat list of every subdomain found across all sources.

    subfinder -d example.com -silent -o passive_subs.txt
  • Resolve to confirm what is live — Passive results include historical data. Running the list through dnsx confirms which subdomains currently have active DNS records and returns their IP addresses.

    dnsx -l passive_subs.txt -silent -a -resp -o live_subs.txt
  • What attackers look for at this stage — The gap between CT results and passive DNS results is significant. Subdomains that appear in passive DNS but not in CT logs were running without valid TLS certificates at some point. That often means internal tooling, development environments, or services that bypassed normal deployment processes.
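That gap can be computed directly with comm, assuming both lists hold one subdomain per line. A sketch with illustrative sample data; in practice ct_subs.txt and passive_subs.txt come from the crt.sh and subfinder steps above:

```shell
# Sample data standing in for real CT and passive DNS results.
cat > ct_subs.txt <<'EOF'
api.example.com
www.example.com
EOF
cat > passive_subs.txt <<'EOF'
api.example.com
legacy-ftp.example.com
www.example.com
EOF

# comm requires sorted input; -13 keeps lines unique to the second file,
# i.e. names seen in passive DNS but never in CT logs.
sort -u ct_subs.txt -o ct_subs.txt
sort -u passive_subs.txt -o passive_subs.txt
comm -13 ct_subs.txt passive_subs.txt > no_cert_subs.txt
```

Anything landing in no_cert_subs.txt ran without a logged certificate at some point and deserves a closer look.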

Step 3: search Shodan and Censys for exposed infrastructure

Shodan and Censys continuously scan the entire internet and index every service they find by IP address, port, banner, and SSL certificate. An attacker can search for your organization by name, IP range, or domain and get a list of every internet-facing service you're running, including ones that were never supposed to be public.

This step is entirely passive from the attacker's perspective. No traffic reaches your infrastructure. They are reading a cached index.

  • Shodan search by organization name — Returns all services Shodan has indexed that advertise your organization name in their SSL certificate or banner. This surfaces cloud instances, exposed databases, and services running on non-standard ports.

    shodan search 'org:"Example Corp"' --fields ip_str,port,hostnames,org
  • Shodan search by domain in SSL certificate — Returns all IPs serving SSL certificates that contain your domain name. This catches origin servers sitting behind CDNs and load balancers that mask the real IP from DNS.

    shodan search 'ssl.cert.subject.cn:example.com' --fields ip_str,port,hostnames
  • Censys query for your ASN range — If your organization controls an ASN, Censys can return every IP in that range that has open ports indexed. This finds assets that have no DNS record at all.

    censys search 'autonomous_system.asn:12345' --index-type hosts
  • What attackers look for — Open ports 27017 (MongoDB), 5432 (PostgreSQL), 6379 (Redis), 9200 (Elasticsearch), and 8080/8443 (admin panels and development servers) are primary targets. Any of these appearing on an IP associated with your organization without authentication is a critical finding.
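Triage of an export for exactly those ports is a one-liner. A sketch, with fabricated rows standing in for real tab-separated shodan search --fields ip_str,port output:

```shell
# Fabricated export rows (ip<TAB>port); replace with a real Shodan export.
printf '203.0.113.10\t443\n203.0.113.11\t27017\n203.0.113.12\t6379\n203.0.113.13\t8080\n' > shodan_out.tsv

# Keep only rows whose port is in the high-risk set named above.
awk -F'\t' '$2 ~ /^(27017|5432|6379|9200|8080|8443)$/' shodan_out.tsv > high_risk.tsv
```

On the sample data, the MongoDB, Redis, and admin-panel ports survive the filter and the ordinary HTTPS row drops out.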

Step 4: Google dorking for exposed files and panels

Google indexes everything it can crawl. Misconfigured web servers, exposed directory listings, login panels, and sensitive files all get indexed if they are publicly reachable. Attackers use search operators to find these without sending any traffic to your servers.

  • Find exposed admin and login panels — Returns pages under your domain that contain admin or login UI elements. These are often legitimate, but the list frequently includes staging panels, internal dashboards, and development tools that should not be internet-facing.

    site:example.com inurl:admin OR inurl:login OR inurl:dashboard OR inurl:portal
  • Find exposed configuration and environment files — Developers sometimes push .env, config.php, or wp-config.php files to public web roots. Google indexes them if the server does not restrict access.

    site:example.com ext:env OR ext:config OR ext:conf OR ext:xml
  • Find open directory listings — Web servers with directory listing enabled expose file trees publicly. This often surfaces backup files, log files, and build artifacts.

    site:example.com intitle:"index of /" OR intitle:"directory listing"
  • Find exposed API documentation — Swagger and OpenAPI UIs are commonly deployed without authentication on staging and internal environments that get accidentally exposed to the internet.

    site:example.com inurl:swagger OR inurl:api-docs OR inurl:openapi
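When you monitor more than one domain, the dorks above can be templated rather than retyped. A small sketch that only prints query strings ready to paste into a browser; nothing is sent anywhere, and domains.txt is an assumed input file:

```shell
# Domains in scope, one per line.
cat > domains.txt <<'EOF'
example.com
example.org
EOF

# Emit the four dork families above for each domain.
while read -r d; do
  printf 'site:%s inurl:admin OR inurl:login OR inurl:dashboard\n' "$d"
  printf 'site:%s ext:env OR ext:config OR ext:conf\n' "$d"
  printf 'site:%s intitle:"index of /"\n' "$d"
  printf 'site:%s inurl:swagger OR inurl:api-docs\n' "$d"
done < domains.txt > dorks.txt
```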

Step 5: mine GitHub for leaked credentials and infrastructure details

GitHub is one of the most consistently productive sources of sensitive information about an organization's infrastructure. Developers commit API keys, database connection strings, internal hostnames, and deployment scripts to public repositories, usually accidentally and usually without realizing the information is there.

Attackers search GitHub using the organization name, domain name, and any product names they identified in earlier steps. They are specifically looking for committed secrets that are still valid.

  • Search for committed secrets by domain — Searches all public GitHub content for your domain name appearing alongside common credential patterns. Check both the organization's own repositories and repositories from individual employees.

    site:github.com "example.com" password OR apikey OR secret OR token OR credentials
  • Search for internal infrastructure references — Internal hostnames, database names, and deployment configuration in public repos reveal infrastructure that is otherwise invisible to external enumeration.

    site:github.com "example.com" "staging" OR "internal" OR "prod" OR "deploy"
  • Automated scanning with truffleHog — truffleHog scans Git repositories for high-entropy strings and known secret patterns. Run against your own organization's public repositories to catch what attackers would find.

    trufflehog github --org=example-corp --only-verified
  • What attackers do with found credentials — An API key found in a public commit from two years ago may still be valid. Developers rotate credentials inconsistently. Attackers test every credential they find, even from old commits, against the corresponding service. A working AWS key found in a public repo provides direct access to cloud infrastructure.
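Even before reaching for truffleHog, a crude pattern sweep over files about to be committed catches the obvious cases. A minimal sketch; sample.env and the key value are fabricated, and this is no substitute for a real secret scanner:

```shell
# Fabricated file with one credential-looking line.
cat > sample.env <<'EOF'
DB_HOST=db.internal.example.com
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
DEBUG=true
EOF

# Flag assignments whose variable name looks credential-like.
grep -inE '(secret|password|token|api[_-]?key)[^=]*=' sample.env
```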

Step 6: fingerprint live services for vulnerabilities

By this point an attacker has a list of live subdomains, their IP addresses, any additional IPs found through Shodan, and a set of potentially valid credentials from GitHub. The next step is fingerprinting each live service to identify the software stack, version numbers, and any immediately obvious vulnerabilities.

  • HTTP fingerprinting with httpx — httpx probes every live subdomain for status codes, response titles, tech stack, and server headers. This tells the attacker what software is running and whether it returns anything interesting.

    cat live_subs.txt | httpx -silent -status-code -title -tech-detect -server -o fingerprint_results.txt
  • Port scanning with naabu — naabu is a fast port scanner designed for use with large subdomain lists. It identifies open ports on each live host that httpx did not probe.

    cat live_subs.txt | naabu -top-ports 1000 -silent -o open_ports.txt
  • Automated vulnerability detection with nuclei — nuclei runs a large template library against each live target looking for misconfigurations, exposed panels, takeover candidates, and known CVEs. Security researchers and attackers both use it.

    cat live_subs.txt | nuclei -t exposures/ -t takeovers/ -t misconfiguration/ -silent -o nuclei_findings.txt
  • Checking email security posture — SPF, DMARC, and DKIM configuration is checked for every discovered domain. A missing or permissive DMARC policy means the domain can be spoofed in phishing emails with no authentication failure. Note that DMARC records live at the _dmarc label, so they must be queried there rather than on the host itself.

    cat live_subs.txt | dnsx -silent -txt -resp | grep -iE 'v=spf1|v=dkim1'
    dig +short txt _dmarc.example.com
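Once a DMARC record is retrieved, the p= policy tag decides whether spoofed mail actually fails. A sketch of the evaluation logic with a canned record string; in practice the value would come from a TXT lookup of _dmarc.<domain>:

```shell
# Canned record for illustration; p=none is the permissive case.
record='v=DMARC1; p=none; rua=mailto:dmarc@example.com'

# reject/quarantine enforce; none only reports; anything else = no policy.
case "$record" in
  *p=reject*|*p=quarantine*) verdict='enforced' ;;
  *p=none*)                  verdict='monitoring only (spoofable)' ;;
  *)                         verdict='no DMARC policy' ;;
esac
echo "$verdict"
```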

Common mistakes that make this easier for attackers

Several patterns make an attacker's recon significantly faster and more productive.

  • Predictable subdomain naming: using the same naming pattern across environments (api-dev, api-staging, api-prod) makes permutation scanning trivial. An attacker who finds api-dev immediately generates api-staging and api-prod as high-confidence guesses.
  • Leaving DNS records after deprovisioning services: CNAME records that outlive their target service are the single most common finding in external recon. The DNS record is the only signal needed to confirm the subdomain exists and check it for takeover.
  • Committing to public repos without secret scanning: secrets committed even briefly to a public repo are permanently accessible through Git history. Deleting the file does not remove it from history. Rotating the credential is the only fix.
  • Running development tooling on internet-facing infrastructure: Grafana, Kibana, Jupyter notebooks, phpMyAdmin, and similar tools are frequently deployed for convenience on the same hosts as production services, then left accessible from the public internet.
  • Using the same ASN or IP block for production and development: cloud resources in the same IP range as production systems get swept in the same Shodan and Censys queries. Development instances with weaker security controls sit next to production in the index.
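The permutation trick in the first bullet is a few lines of shell. A sketch, assuming one confirmed name in an api-<env> pattern (the environment token list is illustrative):

```shell
confirmed='api-dev.example.com'
base=${confirmed%%-*}        # strip from the first dash  -> api
domain=${confirmed#*.}       # strip through the first dot -> example.com

# Generate high-confidence guesses by swapping the environment token.
for env in dev staging prod test qa; do
  echo "${base}-${env}.${domain}"
done > guesses.txt
```

Each guess then goes through the same dnsx resolution step from Step 2 to confirm which ones exist.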

How to automate and monitor this continuously

Running this recon once gives you a snapshot. Your infrastructure changes constantly. New subdomains get created, new services get deployed, credentials get committed and forgotten. The recon process needs to run on a schedule to stay useful.

For teams running their own tooling, a basic automation pipeline can be built with subfinder and httpx running on a cron schedule, with results diffed against the previous run and new findings sent to a Slack channel. Tools like Amass support scheduled passive enumeration natively.
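A minimal version of that diff step, with canned data standing in for a live subfinder run (on a real schedule, replace the heredocs with subfinder -d example.com -silent | sort -u, and hang the Slack alert, e.g. a curl POST to a webhook, off the non-empty check):

```shell
# Last run's baseline and this run's results, both sorted.
cat > subs_prev.txt <<'EOF'
api.example.com
www.example.com
EOF
cat > subs_new.txt <<'EOF'
api.example.com
staging.example.com
www.example.com
EOF

# Lines unique to the new run are newly discovered assets.
new=$(comm -13 subs_prev.txt subs_new.txt)
if [ -n "$new" ]; then
  printf 'New since last run:\n%s\n' "$new"
fi
mv subs_new.txt subs_prev.txt    # new baseline for the next run
```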

SurfaceGuard runs continuous subdomain discovery, Shodan-equivalent port and service detection, credential exposure scanning, and email security checks across all your monitored domains automatically. When a new subdomain appears or an existing asset changes, an alert fires immediately rather than waiting for the next manual recon run.

Key takeaways

  • Attacker recon starts passively and leaves no footprint on your infrastructure. CT logs, Shodan, Censys, and GitHub searches all run without sending a single request to your servers.
  • The most valuable targets found during recon are usually the ones nobody on your team remembered: forgotten staging environments, deprovisioned services with live DNS records, and development tools left internet-facing.
  • GitHub is consistently one of the most productive recon sources. Committed credentials from months or years ago are often still valid because rotation is inconsistent.
  • Predictable naming conventions are an attacker advantage. If your team uses a standard pattern for environment naming, an attacker can generate a full list of likely subdomains from a handful of confirmed examples.
  • Running this process against your own infrastructure before someone else does is the most direct way to understand what your external exposure actually looks like. A manual audit once a quarter is not enough given how fast infrastructure changes.
  • Continuous monitoring closes the gap between what you know about your surface and what an attacker can find. The window between a new asset appearing and it being exploited is sometimes measured in hours.

Frequently asked questions

Is passive recon against my own domain legal to run?
Yes. CT log queries, Shodan lookups, and Google dorking are entirely passive. They query third-party databases that have already indexed public information. No traffic reaches your infrastructure. Active techniques like port scanning and DNS brute-force are also lawful when run against infrastructure you own or are authorized to test. The legal risk arises only when these techniques are run against domains you do not own and have no authorization to test.
How long does a full external recon take?
A passive-only recon covering CT logs, passive DNS, and Shodan can be completed in under 30 minutes for most organizations. Adding active DNS brute-force against a large wordlist takes 1 to 3 hours depending on the domain size. A full pipeline including nuclei scanning across all live subdomains can run for several hours for organizations with hundreds of subdomains. Attackers typically run the passive phase first and then selectively run active techniques against the most promising targets.
What should I do if I find an exposed asset during my own recon?
Treat it as a confirmed finding requiring immediate action. Document the asset, identify the owner, and take one of three actions: take it offline if it should not be public, apply authentication if it needs to remain accessible, or remove the DNS record if the underlying service is already deprovisioned. Then add it to your monitored asset inventory so it does not get forgotten again.
Does a WAF protect against recon?
No. Passive recon never touches your infrastructure, so a WAF has nothing to intercept. Active techniques like port scanning and HTTP probing will be seen by a WAF, but the information those scans collect (open ports, response banners, tech stack) is often available from Shodan's cached index anyway. A WAF reduces exploitability of some findings but does not prevent the discovery of your attack surface.

See your infrastructure the way attackers do

SurfaceGuard runs the same discovery pipeline described in this post against every domain you monitor, continuously. Subdomain enumeration from CT logs and passive DNS, service fingerprinting, credential exposure detection, email security analysis, and subdomain takeover checks all run automatically on every scan cycle. When something new appears or changes, you get alerted before anyone else finds it.