Why We Shouldn’t Commit Secrets into Source Code Repositories

Committing secrets into source code repositories is one of the most frequent problems I see in application security code review, and has been so for at least 5 years. I’m speaking as one who has reviewed numerous code repositories for a variety of different companies. It is a problem that never seems to go away.

Like other common security problems, education and tooling can help improve the situation. This post addresses the education side. It lists examples where secrets in source code repositories were exploited, or could have been exploited, to cause serious damage. It also covers related information, such as how these secrets tend to be found and reports on how often this development sin occurs.

This problem is not yet in the OWASP top 10, but maybe some day it will make it if things continue the present way.

Attackers love finding secrets in source code because it enables lateral movement: the compromise of one system leads to the compromise of another. It can also sometimes lead to privilege escalation, particularly when a secret grants higher privileges than the developers themselves are supposed to have.

Companies that suffered for committing secrets into source code repos

We start out with three juicy examples, where it was shown that attackers abused the commit of secrets into source code. They are Uber, Stack Overflow, and Ashley Madison. In all three cases here, the repositories were private, which did not stop the attackers.

Details of the Uber incident were given by Uber CISO John Flynn in his US Senate testimony. The attacker gained access to a private Uber GitHub repository; how the intruder got access has not been published. Within the repo was a commit of AWS S3 credentials, and that S3 bucket contained private data of approximately 57 million Uber users. Uber was so concerned about the seriousness of the breach that it made the poor decision to hide it and pay off the attackers in exchange for their promise to delete the stolen data. Uber later acknowledged that this was a big mistake. More about the Uber attack can be found in the following articles: 3 lessons learned from Uber’s 57-million user hack, Uber Hack: Software Code Repository/VCS Leaked Credential Usage Detection, Uber Paid Hackers to Delete Stolen Data on 57 Million People.

Information about the Stack Overflow breach was provided in a detailed Stack Overflow blog post in January 2021. Starting at the end of April and continuing through much of May 2019, an attacker gained moderator and developer level access across all of the sites in the Stack Exchange Network, and then exfiltrated the source code and the personally identifiable information of 184 users of the Stack Exchange Network. The attacker repeatedly took advantage of secrets in source code, and of other bad practices for protecting secrets, for both lateral movement and privilege escalation. The blog post states:

“This incident brought to light shortcomings in how we’d structured access to some of our systems and how we managed secrets, both at build time and in source code.”

“Bad secret hygiene—we had secrets sprinkled in source control, in plain text in build systems and available through settings screens in the application.”

Stack Overflow blog

Their advice to others includes:

“Guard secrets better. TeamCity has a way to protect secrets, but we found we weren’t using it consistently. Educate engineers that “secrets aren’t just passwords.” Protect SSH keys and database connection strings too. When in doubt, protect it. If you must store secrets in a Git repo, protect them with git-crypt or Blackbox.”

Stack Overflow blog

The third example is Ashley Madison, which was hacked in July of 2015 by a group calling themselves “The Impact Team.” Ashley Madison is an online dating service targeted towards married people who want to cheat on their spouses. In this attack, The Impact Team leaked private details of 30 million users as a punishment for their infidelity, and also leaked the website source code, which included hardcoded database passwords, AWS S3 credentials, secret tokens for other applications, and private TLS keys. It has not been proven that the source code led to the theft of the other data, but it appears highly likely given what the code contained. More information here: Credentials stored in Ashley Madison’s source code might have helped attackers, Credentials in the Ashley Madison Sources.

Sometimes ethical hackers find the problems first, sometimes they’re not first

In the awesome report No need to hack when it’s leaking by Jelle Ursem & DataBreaches.net, nine healthcare-related companies are identified that committed secrets to source code, which led to the leakage of personally identifiable information and private health information of “150,000 – 200,000 patients, and possibly many more” (see screenshot below). While some of these leaks appear to have been first discovered by the ethical hackers, the authors note that one company (Texas Physician House Calls) had been hacked in the past and had malware on its live servers. There may be some cases where we never know whether the ethical hackers were there first.

Excerpt from “No need to hack when it’s leaking” report

Another example comes from SolarWinds, which supplies network monitoring software to the US government and a number of Fortune 500 companies. A supply chain attack in 2020 led to a foreign actor intruding on thousands of organisations, including the US federal government. An ethical researcher found that a SolarWinds developer had committed FTP credentials to a public repository on GitHub in 2018, although it is not known whether these credentials were used in the attack. More info here: SolarWinds: Intern leaked passwords on GitHub.

Moreover, Microsoft was a victim of the SolarWinds breach, and reported that the intruders had accessed source code for 3 products. While accessing these files, the intruders searched for secrets in the source code:

“The search terms used by the actor indicate the expected focus on attempting to find secrets. Our development policy prohibits secrets in code and we run automated tools to verify compliance. Because of the detected activity, we immediately initiated a verification process for current and historical branches of the repositories. We have confirmed that the repositories complied and did not contain any live, production credentials.”

Microsoft Blog

Given that they were searching for credentials in source code when they got hold of Microsoft’s source, it is hard to believe that they did not use the freely available FTP server credentials in some way when they were targeting SolarWinds.

In another example, the United Nations had a .git directory exposed on a production website. An ethical hacker group reports finding: “multiple sets of hardcoded credentials which allowed them to take control of a MySQL data and an internal survey management platform.” They later obtained access to another private GitHub repository that contained secrets for 7 databases belonging to the UNEP.

A similar event happened in 2018, when an ethical hacker got access to the source code for eBay Japan. The person reports:

“I found out, that http://www.ebay.co.jp runs a version control repository on their production site which was not properly secured. This gave me access to the .git folder in the webroot directory which further allowed me to download the entire source code of http://www.ebay.co.jp including database passwords and much more.”

Last, it’s worth reading how Komodo Research performs red team exercises. In their research, they looked into some Fortune 500 companies to see how easy it is to exploit secrets committed to source code at such big organisations. With only a few hours of effort, they had obtained critical data from 10 of them, including:

“Enterprise admin creds, Domain admin creds, many more ‘regular’ AD creds, multiple database credentials (there is something about these connection strings that just keep popping up in repositories), SAP creds, mainframe passwords, SMTP, FTP, SSH – you name it, we had it.”

Komodo Research blog

To emphasize, your private repos are not safe to hold secrets!

Many of the examples above involved private repositories, while others involved public ones. Sometimes an attacker finds their way to your code; other times, mistakes give your code away for free. Examples include GitHub giving valid session cookies to the wrong users, the Nissan source code leak through a repo misconfiguration, and the Mercedes source code leak, which was due to an error that let anybody register with a fake email address belonging to that company. For more motivation, see the section on Misuse of GitHub in the No need to hack when it’s leaking report. Bottom line: never assume that only good guys are reading your code!

Of course for public repos, the situation is worse: bots are scraping code repository websites like GitHub all the time in search of private data. In 2014, a developer named Rich described his $500 screwup — accidentally committing AWS access keys to GitHub. Related: Don’t upload your important passwords to GitHub, Dev put AWS keys on Github. Then BAD THINGS happened, Slack bot token leakage exposing business critical information.
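As a rough illustration of what such bots look for, here is a minimal sketch that searches text for strings shaped like AWS access key IDs. It relies on the documented format of long-term IAM access key IDs (the prefix AKIA followed by 16 uppercase letters or digits). The function name find_aws_key_ids and the sample text are hypothetical, and the key shown is the placeholder used in AWS documentation. Real scrapers and scanners match many more credential formats than this.

```python
import re

# Long-term AWS access key IDs start with "AKIA" followed by 16 uppercase
# letters or digits. This single pattern is an illustrative assumption about
# what a scraper might match; real tools use many more patterns.
AWS_ACCESS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_aws_key_ids(text: str) -> list[str]:
    """Return every substring of text that looks like an AWS access key ID."""
    return AWS_ACCESS_KEY_ID.findall(text)


# Hypothetical file content, using the placeholder key from AWS documentation.
sample = 'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\nregion = "us-east-1"\n'
print(find_aws_key_ids(sample))  # ['AKIAIOSFODNN7EXAMPLE']
```

A few lines of code are enough to find the key, which is why a key pushed to a public repo should be treated as compromised.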

The State of Secret Sprawl on GitHub

The reader may observe that GitHub is often (not always) the holder of source code repositories that have credentials taken from them. This is a consequence of its size: GitHub has more than 50 million developers and more than 60 million repositories hosted on it. GitHub has responded by building tools to help developers, such as GitHub secret scanning. GitGuardian has also published a report about secrets committed to GitHub.

I find two things surprising in this report: (1) developers may expose corporate secrets through their own personal public repositories, and (2) more often than not, secrets are committed by accident:

“Secrets present in all these repositories can be either personal or corporate and this is where the risk lies for organizations as some of their corporate secrets are exposed publicly through their current or former developers’ personal repositories”

GitHub Secret Sprawl report

“A user that first writes his code with credentials in the code so that it is easier to write/debug, he then forgets to remove it from all his files after his work is done. He then commits and pushes his changes. When he understands that he made a mistake, either he does a deletion commit or a push force so that the secrets do not appear in his current version. Most of the time, he forgets that git and the internet are not forgiving: Secrets can be accessed in the git history even if they aren’t in the current version of code anymore, and public data hosted on GitHub can be duplicated and cloned into multiple different locations.”

GitHub Secret Sprawl report
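To make the point about git history concrete, here is a minimal sketch that scans the text of a unified diff, such as the output of git log -p --all, for secret-shaped strings. The function name secrets_in_patch, the single AWS-style pattern, and the sample diff are all hypothetical, chosen only for illustration. The sketch is also naive: it would skip a removed line whose own content begins with two dashes.

```python
import re

# Illustrative assumption: only AWS-style access key IDs are searched for.
SECRET_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")


def secrets_in_patch(patch_text: str) -> list[tuple[str, str]]:
    """Return (change, secret) pairs found on added or removed diff lines."""
    hits = []
    for line in patch_text.splitlines():
        # Skip file headers such as "--- a/config.py" and "+++ b/config.py".
        if line.startswith(("+++", "---")):
            continue
        if line.startswith("+"):
            change = "added"
        elif line.startswith("-"):
            change = "removed"
        else:
            continue  # context lines and hunk headers
        for match in SECRET_PATTERN.findall(line):
            hits.append((change, match))
    return hits


# Hypothetical "deletion commit": the developer replaces the hardcoded key.
deletion_commit = """\
--- a/config.py
+++ b/config.py
@@ -1,2 +1,2 @@
-AWS_KEY = "AKIAIOSFODNN7EXAMPLE"
+AWS_KEY = os.environ["AWS_KEY"]
 REGION = "us-east-1"
"""
print(secrets_in_patch(deletion_commit))  # [('removed', 'AKIAIOSFODNN7EXAMPLE')]
```

The commit that removes the secret still records it in the history, so anyone who can read the history can read the key. Once a secret has been pushed, it should be revoked and rotated.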

Final Remarks

As I said at the beginning, I believe that education and tooling are important for improving this frequent and very serious coding problem. This post addresses only the education side by providing many examples of the devastating consequences of committing secrets to source code repositories. It does not address tooling to prevent it (such as pre-commit hooks) or what developers should do instead of committing secrets to source code repositories (such as injecting secrets during the production build using environment variables or enterprise secret management solutions). Both of these are substantial topics that can be covered in other posts.
