Thousands of Corporate Secrets Were Left Exposed. This Guy Found Them All

If you know where to look, plenty of secrets can be found online. Since the fall of 2021, independent security researcher Bill Demirkapi has been building ways to tap into huge data sources, which are often overlooked by researchers, to find masses of security problems. This includes automatically finding developer secrets—such as passwords, API keys, and authentication tokens—that could give cybercriminals access to company systems and the ability to steal data.

Today, at the Defcon security conference in Las Vegas, Demirkapi is unveiling the results of this work, detailing a massive trove of leaked secrets and wider website vulnerabilities. Among at least 15,000 developer secrets hard-coded into software, he found hundreds of username and password details linked to Nebraska’s Supreme Court and its IT systems; the details needed to access Stanford University’s Slack channels; and more than a thousand API keys belonging to OpenAI customers.

A major smartphone manufacturer, customers of a fintech company, and a multibillion-dollar cybersecurity company are counted among the thousands of organizations that inadvertently exposed secrets. As part of his efforts to stem the tide, Demirkapi hacked together a way to automatically get the details revoked, making them useless to any hackers.

In a second strand to the research, Demirkapi also scanned data sources to find 66,000 websites with dangling subdomain issues, making them vulnerable to various attacks including hijacking. Some of the world’s biggest websites, including a development domain owned by The New York Times, had the weaknesses.

While the two security issues he looked into are well-known among researchers, Demirkapi says that turning to unconventional datasets, which are usually reserved for other purposes, allowed thousands of issues to be identified en masse and, if expanded, offers the potential to help protect the web at large. “The goal has been to find ways to discover trivial vulnerability classes at scale,” Demirkapi tells WIRED. “I think that there’s a gap for creative solutions.”

Spilled Secrets; Vulnerable Websites

It is relatively trivial for a developer to accidentally include their company’s secrets in software or code. Alon Schindel, the vice president of AI and threat research at the cloud security company Wiz, says there’s a huge variety of secrets that developers can inadvertently hard-code, or expose, throughout the software development pipeline. These can include passwords, encryption keys, API access tokens, cloud provider secrets, and TLS certificates.

“The most acute risk of leaving secrets hard-coded is that if digital authentication credentials and secrets are exposed, they can grant adversaries unauthorized access to a company’s code bases, databases, and other sensitive digital infrastructure,” Schindel says.

The risks are high: Exposed secrets can result in data breaches, hackers breaking into networks, and supply chain attacks, Schindel adds. Previous research in 2019 found thousands of secrets were being leaked on GitHub every day. And while various secret scanning tools exist, these are largely focused on specific targets and not the wider web, Demirkapi says.

During his research, Demirkapi, who first found prominence for his teenage school-hacking exploits five years ago, hunted for these secret keys at scale—as opposed to selecting a company and looking specifically for its secrets. To do this, he turned to VirusTotal, the Google-owned website, which allows developers to upload files—such as apps—and have them scanned for potential malware.

VirusTotal’s Retrohunt feature allows a year’s worth of uploaded files to be scanned and uses YARA rules, which can look for specific patterns in data. “What if we reuse those tools and VirusTotal’s petabytes of data, and now we look for secrets instead,” Demirkapi says. Using a complex serverless setup, Demirkapi says he scanned through more than 1.5 million samples for secrets and validated that the patterns he found were active secret keys. To determine the secrets and keys hadn’t expired, he performed API calls on them. In total, Demirkapi has found more than 15,000 active secrets of all kinds.
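The pattern-matching step can be illustrated with a minimal Python sketch. It is not Demirkapi's tooling: it swaps YARA rules for ordinary regular expressions, the two patterns are simplified assumptions based on widely known key prefixes, and the name `find_candidate_secrets` is hypothetical. It only flags candidates; the validation step, in which each candidate is checked against the provider's API, is left out.

```python
import re

# Illustrative patterns only (assumptions, not the rules used in the research).
# Real scanners use far more patterns and stricter checks.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(rb"AKIA[0-9A-Z]{16}"),
    "slack_token": re.compile(rb"xox[baprs]-[0-9A-Za-z-]{10,}"),
}


def find_candidate_secrets(data):
    """Return (pattern_name, matched_text) pairs found in a bytes blob.

    A match is only a candidate: it may be expired, revoked, or a placeholder.
    """
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(data):
            hits.append((name, match.group().decode("ascii")))
    return hits


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
    sample = b'config = {"key": "AKIAIOSFODNN7EXAMPLE"}'
    for name, value in find_candidate_secrets(sample):
        print(name, value)
```

Scanning raw bytes rather than decoded text matters here, because uploaded samples are often compiled apps rather than source files.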

Within the vast number of exposed keys were those that could give an attacker access to the digital assets of companies and organizations, including the potential to obtain sensitive data. For instance, a member of Nebraska’s Supreme Court had uploaded details of usernames and passwords linked to its IT systems, and Stanford University Slack channels could be accessed using API keys.

Nebraska State Court Administrator Corey R. Steel says all the exposed details were immediately changed, there is no evidence that the details were abused, and policies have been changed to stop similar future instances. Stanford University did not respond to a request for comment; however, correspondence seen by WIRED indicates the issues were quickly fixed after they were reported.

Demirkapi also scoured passive DNS replication data to search for websites with dangling subdomain issues. A subdomain is dangling when its DNS record still points to a cloud resource that has since been deleted, so anyone who claims that resource can control what the subdomain serves. Vulnerable websites can be impersonated, used to deploy malware or phishing pages, used to steal cookies, and more. “Dangling domains are widespread, and it’s pretty easy for attackers to find high-valuable targets,” says Daiping Liu, a senior research manager at Palo Alto Networks. Liu says tens of thousands of dangling records are exposed at any one time, adding that larger domains can be more susceptible to the issue as they’re harder to manage and there’s more chance for human error.

For example, Demirkapi briefly published an (almost convincing) satirical article on a New York Times production domain with the headline “U.S. Declares War Against Russia Amid Escalating Tensions, Sending Shockwaves Through International Community.” This was removed after around a week, Demirkapi says. A spokesperson for The New York Times declined to comment.

The researcher says that starting with dangling cloud resources, instead of looking for issues with a specific domain or set of domains, allows issues to be discovered systematically. Overall, he found more than 78,000 dangling cloud resources linked to 66,000 apex domains. Pointing to academic research that followed a similar technique using passive DNS replication data, but starting with URLs, Demirkapi says his approach was able to find orders of magnitude more issues.
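The resource-first idea can be sketched in a few lines of Python. The sketch rests on several assumptions: the passive DNS data is reduced to simple (subdomain, CNAME target) pairs, the set of released cloud hostnames is supplied as input (the article does not describe how it is built), and the apex domain is approximated as the last two labels, where a real implementation would consult the Public Suffix List. The function names are hypothetical.

```python
from collections import defaultdict


def naive_apex(hostname):
    """Approximate the apex domain as the last two labels (an assumption;
    this is wrong for suffixes such as co.uk)."""
    labels = hostname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def dangling_by_apex(cname_records, released_targets):
    """Group subdomains whose CNAME target is a released cloud resource.

    cname_records: iterable of (subdomain, cname_target) pairs.
    released_targets: hostnames of cloud resources known to be deleted.
    """
    released = {t.rstrip(".").lower() for t in released_targets}
    result = defaultdict(list)
    for name, target in cname_records:
        # Start from the resource: any record pointing at it is dangling.
        if target.rstrip(".").lower() in released:
            result[naive_apex(name)].append(name)
    return dict(result)


if __name__ == "__main__":
    records = [
        ("dev.example.com", "old-app.cloudhost.test"),
        ("www.example.com", "live.cloudhost.test"),
        ("beta.example.org", "old-app.cloudhost.test."),
    ]
    print(dangling_by_apex(records, {"old-app.cloudhost.test"}))
```

Because the lookup begins with the released resource rather than a chosen domain, one pass over the records surfaces every affected apex domain at once.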

No Easy Fixes

Finding thousands of vulnerable websites and exposed secrets is one thing; getting them fixed is another. While Demirkapi says it has not been possible to alert all websites with dangling domain issues to the problems, he has managed to find ways to clean up the 15,000 hard-coded secrets.

Demirkapi reported some directly to the impacted companies. But he also turned to those providing credentials to their customers to see if there was a more efficient way to report the exposed secrets. In February, the researcher reported more than 1,000 exposed OpenAI API keys. The firm provided him with a public self-service API key that allows the exposed details to be automatically revoked. (OpenAI company spokesperson Niko Felix says the API “enables automatic deactivation of any keys detected as compromised” and allows customers to be kept safe.)

Other instances didn’t go so smoothly. GitHub, which hosts more than 420 million code repositories, has for years run its own “secret scanning” tool that can detect tokens and keys that are uploaded to its website. It partners with external companies so these keys can be reported and potentially revoked. Demirkapi asked GitHub, in March, if it had a publicly available endpoint where he could report secrets so the thousands he found could be quickly flagged. A company spokesperson says it doesn’t have systems available for individuals.

Demirkapi turned to Amazon Web Services, but the company refused to provide him with access to existing reporting tools it has for its vendors. “We believe firmly that customer credentials, including security keys, belong solely to customers. AWS does not grant external users access to manage or revoke security keys as that would violate security policies and erode customer trust,” says Aisha Johnson, an AWS spokesperson, adding people can email its security team and it will tell customers when it becomes aware of exposed keys.

To get around the limitations, Demirkapi turned to GitHub and started uploading secrets to trigger the company’s secret scanning and get them reported. “I found a way of not making it exposed to the public at all,” Demirkapi says of the automated method he hacked together to upload secrets in notes.

Ultimately, Demirkapi says he picked low-hanging fruit for the research. “Detecting a hard-coded secret or detecting if a resource is dangling, those are fairly trivial classes of vulnerability,” he says, adding that more complex vulnerabilities could potentially be detected in big data sources. There may be plenty of untapped databases that can help fix security issues. “I think that we need to think more about leveraging these large data sources to derive value from them in unconventional ways,” Demirkapi says.