Subdomain Enumeration - What it is, why you care and how to do it well (free!)

Hi everybody! Welcome back to Open Sourcery, where we share sources, methods, tips and tricks for cyber analysts, threat hunters and anyone else who wants to learn about OSINT.

In today’s post, we’re talking about subdomain enumeration, and as usual, we’re going to talk about what it is, why you might care, and how to try your hand at it at little or no cost, with tools that require no programming skills (though we also mention a few of those).

Q: What is it?

A: Subdomain enumeration is just what it sounds like: “enumerating,” or attempting to identify and count, all the subdomains or “fully qualified domain names” (FQDNs) that exist under a given registered second-level (or “apex”) domain. For the apex “example.com”, that means hostnames like “www.example.com” or “mail.example.com”. If those terms are unfamiliar, see our “Domain Name Basics” post here.
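To make the apex/subdomain relationship concrete, here is a minimal Python sketch that checks whether a hostname sits beneath a given apex domain. The function name and the sample hostnames are made up for illustration; the only real trick is matching on ".apex" rather than just "apex", so look-alike domains don't slip through.

```python
def is_subdomain_of(fqdn, apex):
    """Return True if fqdn sits strictly beneath apex (the apex itself does not count)."""
    # Normalize: trim whitespace, drop a trailing root dot, ignore case.
    fqdn = fqdn.strip().rstrip(".").lower()
    apex = apex.strip().rstrip(".").lower()
    # Require a dot boundary so "notexample.com" is not treated as a match.
    return fqdn.endswith("." + apex)


# Hypothetical hostnames, for illustration only.
for host in ["www.example.com", "api.staging.example.com",
             "notexample.com", "example.com.evil.net"]:
    print(host, is_subdomain_of(host, "example.com"))
```

The last two hostnames are the cases worth remembering: both contain the string “example.com” but neither belongs to that apex.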

Q: As an OSINT practitioner, why would I care?

A: There are many use cases for subdomain enumeration, but let’s talk briefly about three of the most common.

Reason 1: Discovery of under-protected attack surface - As a security or intel person, you may well wake up every day thinking about how security activity affects the business, but the business folks generally do not wake up every day and think about how business activity affects security. As such, it is common for businesses to set up new subdomains for all manner of business purposes and not think to inform the security team.

A frequent result? Those FQDNs may not be properly secured. They might lack a TLS/SSL certificate, may not sit behind a proper Web Application Firewall, or they might be hosted outside the IP space covered by your DDoS protection. They may be a pre-prod environment like QA or staging that lacks the top-notch protections given to “prod” environments. If your network is flat and unsegmented, an unsecured QA box could offer an entryway to an attacker who can then move into or disrupt Production.

Regularly and proactively checking for all extant FQDNs under an apex domain is a good way to keep an eye out for new or insecure parts of your own attack surface that the business may have created without letting the Security team in on that fact.

Reason 2: Discovery of “Shadow IT” - IT and engineering teams, constantly under pressure to deliver results and capabilities to the business, will often spin up a new FQDN for any number of purposes, from a one-weekend internal hackathon to a new environment for remote, outsourced QA testers to bang on draft code. Often undocumented, these FQDNs can live on for years, even long after the original use is forgotten, or the creators and users have left the company. Still, that host may continue to sit out there on the public internet, still accessible, going unpatched and growing more obsolete by the day, and often containing juicy bits of intel.

Reason 3: It’s what the bad guys are going to do - Subdomain enumeration is an absolutely fundamental step in the Recon phase of a cyber attack. For exactly the reasons noted above, small, outdated and unused FQDNs are a rich stalking ground for adversaries. See your own domain the way they do. Whenever I’m playing on the “red” team, one of my favorite things to do is enumerate the subdomains for a target web site exactly as outlined here, then do a simple text search on the list of subdomains I’ve gathered for strings like Dev, Test, QA, Stage and Stg. I’ve rarely seen a large or even medium-sized company that didn’t have at least a few lower-environment servers - often less well-patched or attended to than “Prod” - exposed on the public internet.

Recent vulnerabilities in Atlassian products also demonstrate the value of checking for various obvious brand and product names that can tell you if a domain uses certain third-party technologies. For example, I would also check for “jira” and “confluence” as well as Christmas gifts like “owa.example.com” and “webmail.example.com” which often require only a user email and password (WAY too many companies still haven’t enabled MFA on email). At least a few recent compromised credentials for almost any domain can often be found on Pastebin, the dark web or hacking forums in a few minutes.

None of this is good news, mind you, but better we do it to ourselves than wait for the bad guys to find the holes for us, right?
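If you do have a little Python handy, the simple text search described under Reason 3 can be sketched in a few lines. This is only an illustration: the keyword lists come from the strings mentioned above, while the function name and sample hostnames are made up. It is a plain substring match, so expect some false positives (a host like “latest.example.com” contains “test”).

```python
# Keywords taken from the post; extend them to suit your own environment.
ENV_KEYWORDS = ["dev", "test", "qa", "stage", "stg"]
PRODUCT_KEYWORDS = ["jira", "confluence", "owa", "webmail"]


def flag_interesting(hostnames, keywords):
    """Return (hostname, matched_keywords) pairs for hostnames containing any keyword."""
    hits = []
    for host in hostnames:
        name = host.strip().lower()
        # Plain substring search, the same thing a Ctrl+F would do.
        matched = [kw for kw in keywords if kw in name]
        if matched:
            hits.append((name, matched))
    return hits


# Hypothetical enumeration results, for illustration only.
sample = [
    "www.example.com",
    "qa-portal.example.com",
    "jira.example.com",
    "Dev-API.example.com",
    "webmail.example.com",
]

for host, matched in flag_interesting(sample, ENV_KEYWORDS + PRODUCT_KEYWORDS):
    print(host, matched)
```

Note that the search runs over the whole hostname, so if the apex domain itself happens to contain a keyword, every row will match; in that case, strip the apex off before searching.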

Q: OK, I get it! It’s important! So, HOW do I do it?

A: While there are certainly paid platforms and sources, you can often do a good-to-excellent job of subdomain enumeration without spending a penny. There are command line tools like Amass, Sublist3r and Subfinder available on GitHub if you’re comfortable with a CLI and with installing Python or Go tools, but as always, we focus here primarily on approaches that almost anyone can use.

Since free and open-source methods are typically not as complete as some of the commercial options, we’re going to use not one but several free tools and deduplicate the overlapping results. I suggest using all five of the sources covered in this post (C99.nl, WHOISXML, DNSDumpster, HackerTarget and OSINT.sh) to ensure the most complete discovery.

Put your domain of interest into the search on each of these pages and then export (some offer it) or screen-scrape the results with your mouse. Since most sources will include a bunch of additional data/columns, I suggest you open an Excel workbook and paste the results from each source into its own tab or column using “Paste Special - Plain Text”.

Output formatting is highly varied, but you can use Excel’s Data menu including SORT, FILTER and CUSTOM FILTER options along with “Remove Duplicates” to clean up each list. Then grab just the subdomain list from each cleaned up source and paste them ALL into Column A on a “merged sources” tab. (You can download an example workbook here where I’ve done exactly this for the Twitter.com domain so you can see how it’s done.)

Using the “Data” menu to deduplicate again, sort the results in Column A on the merged tab. Presto!
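For readers who would rather script the merge than do it in Excel, here is a rough Python equivalent of the “merged sources” tab. It is a sketch under simple assumptions: each source has already been cleaned down to a plain list of hostnames, and the source names and hostnames below are placeholders rather than real results.

```python
def normalize(hostname):
    """Trim whitespace, drop a trailing root dot, and lowercase the hostname."""
    return hostname.strip().rstrip(".").lower()


def merge_sources(sources):
    """Combine per-source hostname lists into one sorted, deduplicated list."""
    merged = set()
    for hostnames in sources.values():
        for raw in hostnames:
            name = normalize(raw)
            if name:  # skip blank rows left over from copy and paste
                merged.add(name)
    return sorted(merged)


# Placeholder data standing in for the cleaned-up list from each source.
sources = {
    "source_a": ["www.example.com", "API.example.com", "dev.example.com"],
    "source_b": ["api.example.com.", "mail.example.com", ""],
}

merged = merge_sources(sources)
print(len(merged), "unique hostnames")
for name in merged:
    print(name)
```

Normalizing before deduplicating matters because different sources disagree on case and trailing dots, and “Remove Duplicates” style matching would otherwise count the same host twice.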

By using the differing and usually-incomplete results from these multiple sources, you should get a pretty darn good estimate of all subdomains for that apex domain, certainly one similar to what an outside adversary would see, since many attackers will use a similar approach. For example, here are the totals by source for the “twitter.com” exercise:

  • C99.nl: 424 subdomains
  • WHOISXML: 696 subdomains
  • DNSDumpster: 100 subdomains (capped, so not great for large domains)
  • HackerTarget: 246 subdomains
  • OSINT.sh: 424 subdomains

MERGED, UNIQUE TOTAL: 810

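In other words, even the best single source, in this case WHOISXML, found only about 86% (696 of 810) of the merged total, and the others found roughly half or less. Keep in mind that the merged total is itself only an estimate, so the true number may be higher still. That’s why we use multiple sources. I’ve also highlighted in the sample file more than a dozen rows that contain potentially interesting examples like “dev”, “test”, etc. in the hostname, as discussed above.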
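If you want to check the coverage of each source yourself, the arithmetic is just each source’s count divided by the merged unique total. Here is a small Python sketch using the numbers from the twitter.com exercise above; the variable and function names are ours, for illustration only.

```python
# Per-source counts and merged total from the twitter.com exercise.
counts = {
    "C99.nl": 424,
    "WHOISXML": 696,
    "DNSDumpster": 100,
    "HackerTarget": 246,
    "OSINT.sh": 424,
}
MERGED_UNIQUE_TOTAL = 810


def coverage(counts, merged_total):
    """Return each source's share of the merged total, as a percentage."""
    return {source: round(100 * n / merged_total, 1) for source, n in counts.items()}


# Print sources from best to worst coverage.
shares = coverage(counts, MERGED_UNIQUE_TOTAL)
for source, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{source}: {pct}% of merged total")
```

WHOISXML comes out on top at roughly 86%, which still means about one subdomain in seven was only found by some other source.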

If you want to go even deeper, sign up for a free account on Censys.io and check out our post on “Using Certificate data to find fraud and security holes.”

Happy (subdomain) hunting!
