Why We Shouldn’t Commit Secrets into Source Code Repositories

Committing secrets into source code repositories is one of the most frequent problems I see in application security code review, and it has been for at least 5 years. I’m speaking as someone who has reviewed numerous code repositories for many different companies. It is a problem that never seems to go away.

Like other common security problems, this one can be improved through education and tooling. This blog focuses on the education side. It provides a list of examples where secrets in source code repositories were exploited, or could have been exploited, for serious damage. Related information is also included, such as how these secrets tend to be found and reports on how often this development sin occurs.

This problem is not yet in the OWASP Top 10, but maybe some day it will make the list if things continue the way they are.

Attackers love finding secrets in source code because they enable lateral movement: compromise of one system leading to compromise of another. Secrets can also lead to privilege escalation, particularly when a secret grants higher privileges than the developers themselves are supposed to have.

Companies that suffered for committing secrets into source code repos

We start with three juicy examples where attackers were shown to have abused secrets committed to source code: Uber, Stack Overflow, and Ashley Madison. In all three cases the repositories were private, which did not stop the attackers.

In the Uber incident, details were given by Uber CISO John Flynn in his testimony to the US Senate. The attacker somehow gained access to a private Uber GitHub repository; how the intruder got access has not been published. Within the repo was a commit of AWS S3 credentials, and that S3 bucket contained private data of approximately 57 million Uber users. Uber was so concerned about the seriousness of the breach that it made the poor decision to hide it and pay off the attackers in exchange for their good-will promise to delete the stolen data. Uber later acknowledged that this was a big mistake. More about the Uber attack can be found in the following articles: 3 lessons learned from Uber’s 57-million user hack, Uber Hack: Software Code Repository/VCS Leaked Credential Usage Detection, Uber Paid Hackers to Delete Stolen Data on 57 Million People.

Information about the Stack Overflow breach was provided in a detailed Stack Overflow blog in January 2021. From the end of April through much of May 2019, an attacker gained moderator and developer level access across all of the sites in the Stack Exchange Network. The attacker then exfiltrated the source code as well as personally identifiable information of 184 users of the Stack Exchange Network. On several occasions the attacker took advantage of secrets in source code and other poor practices for protecting secrets, for both lateral movement and privilege escalation. The blog writes:

“This incident brought to light shortcomings in how we’d structured access to some of our systems and how we managed secrets, both at build time and in source code.”

“Bad secret hygiene—we had secrets sprinkled in source control, in plain text in build systems and available through settings screens in the application.”

Stack Overflow blog

Their advice to others includes:

“Guard secrets better. TeamCity has a way to protect secrets, but we found we weren’t using it consistently. Educate engineers that ‘secrets aren’t just passwords.’ Protect SSH keys and database connection strings too. When in doubt, protect it. If you must store secrets in a Git repo, protect them with git-crypt or Blackbox.”

Stack Overflow blog

The third example is Ashley Madison, which was hacked in July 2015 by a group calling themselves “The Impact Team.” Ashley Madison is an online dating service targeted at married people who want to cheat on their spouses. In this attack, The Impact Team leaked private details of 30 million users as punishment for their infidelity. They also leaked the website source code, which included hardcoded database passwords, AWS S3 credentials, secret tokens for other applications, and private TLS keys. It has not been proven that the source code led to the theft of the other data, but based on what was in it, that appears highly likely. More information here: Credentials stored in Ashley Madison’s source code might have helped attackers, Credentials in the Ashley Madison Sources.

Sometimes ethical hackers find the problems first, sometimes they’re not first

In the awesome report No need to hack when it’s leaking by Jelle Ursem & DataBreaches.net, nine health care related companies are identified that committed secrets to source code. This led to the leakage of personally identifiable information and private health information of “150,000 – 200,000 patients, and possibly many more” (see screenshot below). Some of these leaks appear to have been discovered first by the ethical hackers. However, the authors note that one company (Texas Physician House Calls) had been hacked in the past and had malware on its live servers. In some cases we may never know whether the ethical hackers got there first.

Excerpt from “No need to hack when it’s leaking” report

Another example comes from SolarWinds, which supplies network monitoring software to the US government and a number of Fortune 500 companies. A supply chain attack in 2020 let a foreign actor intrude on thousands of organisations, including the US federal government. A security researcher found that a SolarWinds developer had committed FTP credentials to a public GitHub repository in 2018. It is not known whether these credentials were used in the attack. More info here: SolarWinds: Intern leaked passwords on GitHub.

Moreover, Microsoft was a victim of the SolarWinds breach and reported that the intruders had accessed source code for 3 products. Once they had access to these source code files, the intruders searched them for secrets:

“The search terms used by the actor indicate the expected focus on attempting to find secrets. Our development policy prohibits secrets in code and we run automated tools to verify compliance. Because of the detected activity, we immediately initiated a verification process for current and historical branches of the repositories. We have confirmed that the repositories complied and did not contain any live, production credentials.”

Microsoft Blog

Given that the intruders were searching for credentials in Microsoft’s source code, it is hard to believe that they did not use the freely available FTP server credentials in some way when targeting SolarWinds.

In another example, the United Nations had a .git directory exposed on a production website. An ethical hacker group reported finding “multiple sets of hardcoded credentials which allowed them to take control of a MySQL data and an internal survey management platform.” They later obtained access to another private GitHub repository that contained secrets for 7 databases belonging to the United Nations Environment Programme (UNEP).

A similar event happened in 2018, when an ethical hacker got access to the source code for eBay Japan. The researcher reported:

“I found out, that http://www.ebay.co.jp runs a version control repository on their production site which was not properly secured. This gave me access to the .git folder in the webroot directory which further allowed me to download the entire source code of http://www.ebay.co.jp including database passwords and much more.”

Finally, it is worth reading how Komodo Research performs red team exercises. In their research, they examined a number of Fortune 500 companies to see how easily secrets committed to source code could be exploited at such large organisations. With only a few hours of effort, they obtained critical data from 10 of them, including:

“Enterprise admin creds, Domain admin creds, many more ‘regular’ AD creds, multiple database credentials (there is something about these connection strings that just keep popping up in repositories), SAP creds, mainframe passwords, SMTP, FTP, SSH – you name it, we had it.”

Komodo Research blog

To emphasize, your private repos are not safe to hold secrets!

Many of the examples above came from private repositories, but others came from public ones. Sometimes an attacker is able to find his way to your code; other times, mistakes give your code away for free. For example, GitHub once gave valid session cookies to the wrong users. Nissan’s source code leaked through a repository misconfiguration. Mercedes source code leaked because of an error that let anybody register a fake email address belonging to the company. For more motivation, see the section on Misuse of GitHub in the No need to hack when it’s leaking report. Bottom line: never assume that only good guys are reading your code!

Of course, for public repos the situation is worse: bots constantly scrape code repository websites like GitHub in search of private data. In 2014, a developer named Rich described his $500 screwup: accidentally committing AWS access keys to GitHub. Related: Don’t upload your important passwords to GitHub, Dev put AWS keys on Github. Then BAD THINGS happened, Slack bot token leakage exposing business critical information.
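To show how cheaply such bots can spot credentials, here is a minimal Java sketch of the pattern matching a scraper might run over each file it downloads. It is an illustration, not any particular bot or scanner. It checks a single rule: the well-known shape of AWS access key IDs, which is “AKIA” followed by 16 uppercase letters or digits. Real secret scanners combine many such rules with entropy checks. The class name SecretScanSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SecretScanSketch {
    // Long-term AWS access key IDs: "AKIA" plus 16 uppercase letters/digits (20 chars total).
    private static final Pattern AWS_ACCESS_KEY_ID = Pattern.compile("\\bAKIA[0-9A-Z]{16}\\b");

    /** Returns every candidate AWS access key ID found in a scraped file's text. */
    static List<String> findAwsKeyIds(String fileText) {
        List<String> hits = new ArrayList<>();
        Matcher m = AWS_ACCESS_KEY_ID.matcher(fileText);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
        String committedFile = "aws_access_key_id = AKIAIOSFODNN7EXAMPLE\n";
        System.out.println(findAwsKeyIds(committedFile)); // [AKIAIOSFODNN7EXAMPLE]
    }
}
```

A regular expression and a loop are all it takes.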

The State of Secret Sprawl on GitHub

The reader may notice that GitHub often (though not always) hosts the repositories that credentials were taken from. This is a consequence of its size: GitHub has more than 50 million developers and more than 60 million repositories hosted on it. GitHub has responded by building tools to help developers, such as GitHub secret scanning. GitGuardian has also published a report about secrets committed to GitHub.

I find two things surprising in this report: (1) developers may expose corporate secrets through their own personal public repositories, and (2) more often than not, secrets are committed by accident:

“Secrets present in all these repositories can be either personal or corporate and this is where the risk lies for organizations as some of their corporate secrets are exposed publicly through their current or former developers’ personal repositories”

GitHub Secret Sprawl report

“A user that first writes his code with credentials in the code so that it is easier to write/debug, he then forgets to remove it from all his files after his work is done. He then commits and pushes his changes. When he understands that he made a mistake, either he does a deletion commit or a push force so that the secrets do not appear in his current version. Most of the time, he forgets that git and the internet are not forgiving: Secrets can be accessed in the git history even if they aren’t in the current version of code anymore, and public data hosted on GitHub can be duplicated and cloned into multiple different locations.”

GitHub Secret Sprawl report

Final Remarks

As I said at the beginning, I believe that education and tooling are important for improving this frequent and very serious coding problem. This blog addresses only the education side, by providing many examples of the devastating consequences of committing secrets to source code repositories. It does not address tooling to prevent it (such as pre-commit hooks). Nor does it address what developers should do instead (such as injecting secrets during the production build using environment variables, or using enterprise secret management solutions). Both are substantial topics that can be covered in other blogs.

No, Java is not a Secure Programming Language

If you ask Google, you will be brought to a fantasy land of fairies, unicorns, and Java being the quintessential example of a secure programming language. Whoever is writing these web pages clearly does not live in the same world as I do. I am an Application Security Specialist (there is no acronym for that title, BTW) who spends every day with developers, helping them uplift secure coding practices.

This blog is intended to correct one of the most persistent security myths by showing why Java lags far behind modern competing languages in security design. The problems are twofold:

  • Many Java security bugs are due to insecure defaults. As a consequence, developers need to have advanced development knowledge just to write simple code that cannot be easily exploited.
  • Java has really poor documentation: it is not hard to make things work, but it is often very unclear how to do things the ‘right way.’

To illustrate this, we go through prominent examples from the OWASP Top 10 and compare Java to the .Net framework. The problems with Java security are not restricted to web development, but this should give a pretty good indication of the security state. We also provide a few other examples outside the OWASP Top 10.

Note to Java developers: This is not a “Let’s bash Java because it’s fun” blog. Java developers will no doubt dislike the title, but it is hoped that they get something useful out of it. That means not just a display of the problems, but pointers on how to work around them, and hence less insecure software.

Java and the OWASP Top 10

We start with 4 prominent examples of Java security failures, each compared with the .Net framework. These are very common problems that often have very serious consequences:

  • XXE (XML External Entity), which typically lets the attacker read any file on your server (example). By default, every XML parser in Java has external entities enabled. This feature is very rarely needed legitimately, yet it is on by default. If you load or parse XML documents, you had better follow this wonderful OWASP guide. The .Net framework also had insecure defaults in old versions, but Microsoft fixed them in newer versions.
  • Deserialization vulnerabilities, which are wonderfully described in detail here. If you’re hit by this in Java, you’re toast: an attacker has a shell on your system. Unfortunately, Java does not offer a fix, so it’s up to you to figure out how to protect yourself. Deserialization vulnerabilities exist in other languages too, but if we focus on .Net, it is secure by default. How do you avoid this in Java? It’s not so easy. The best approach is not to deserialize untrusted input, but that may require a huge redesign for legacy applications. One idea that should work is to bundle the serialized data with an HMAC signature and verify the signature before deserialization.
  • Cross site scripting, which allows an attacker to run JavaScript of his choice in other users’ browsers on your website. If you write JSP code, then you cannot output untrusted data the default way. Instead, you need to use c:out or fn:escapeXml, or else you’re in trouble. Compare that to the Razor engine in .Net Core, which is secure by default and shows very visible warnings about bypassing the default.
  • Sensitive data exposure, and my issue here is specifically about cryptography. As a former cryptographic researcher, I hold this one dear to my heart. Where to start? Well, let’s look at the warning in bold at the end of the JCA introduction (see screenshot below). Yep, Java is not here to help you: go get your PhD in crypto, because you’re on your own (and don’t copy the insecure examples in their documentation!). Think you can write secure crypto in Java? I’ll bet good money that you’re wrong. Have a look at my blog Top 10 Developer Crypto Mistakes (see also reddit /r/programming and /r/netsec comments). The Java developer crypto problem is well documented in academia as well, including in mobile applications.
    This is in stark contrast to .Net, where they have put an enormous amount of effort into making their APIs really well documented and generally less clumsy. It is safe to copy-and-paste code from .Net documentation and they always provide examples to make your life easier. Java is the exact opposite.
    Unfortunately, this topic is too large to give Java developers a single right answer here, but a good start is Luke Park’s Secure Compatible Encryption Examples. For other crypto topics, follow the advice of Maarten Bodewes on Stack Overflow; he knows it better than anyone I have seen. And of course, there is always the libsodium option, which is idiot-proof by design.
    Fortunately, most of the implementation flaws are not exploited in practice simply because cryptanalysis is a niche skill.
Warning from the JCA documentation page
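For the XXE item, here is a minimal sketch of a hardened DocumentBuilderFactory. It uses the feature flags the OWASP XXE Prevention Cheat Sheet recommends for Xerces-based parsers, which include the JDK’s built-in parser. It covers DOM parsing only. SAXParserFactory, XMLInputFactory and TransformerFactory each need their own settings, so treat this as a starting point rather than a complete fix. The class name SafeXmlSketch is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class SafeXmlSketch {

    /** DOM parser with DOCTYPEs and external entities switched off. */
    static DocumentBuilder newSafeBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Primary defence: refuse any document that contains a DOCTYPE at all.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defence in depth, in case the DOCTYPE ban is ever relaxed.
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf.newDocumentBuilder();
    }

    static Document parse(String xml) throws ParserConfigurationException, SAXException, IOException {
        return newSafeBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<order><id>42</id></order>").getDocumentElement().getTextContent()); // 42
        String evil = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE x [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]><x>&xxe;</x>";
        try {
            parse(evil);
            System.out.println("parsed: this should not happen");
        } catch (SAXException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The harmless document parses as usual. The document that declares an external entity is rejected with a SAXException before any file is read.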
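The HMAC idea from the deserialization item can be sketched as follows. The sketch assumes the server is the only party that ever signs serialized objects, and that the HMAC key (a caller-supplied key, at least 32 bytes here) is kept secret, for example outside the source code repository. The important part is ordering: the tag is checked before a single byte reaches ObjectInputStream, so tampered or forged input is rejected without being deserialized. Class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class SignedSerializationSketch {
    private static final String ALG = "HmacSHA256";
    private static final int TAG_LEN = 32; // HMAC-SHA256 output size in bytes

    static byte[] hmac(byte[] key, byte[] data) throws GeneralSecurityException {
        Mac mac = Mac.getInstance(ALG);
        mac.init(new SecretKeySpec(key, ALG));
        return mac.doFinal(data);
    }

    /** Serializes obj and appends an HMAC tag: result = payload || tag. */
    static byte[] serializeAndSign(Serializable obj, byte[] key) throws IOException, GeneralSecurityException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        byte[] payload = bos.toByteArray();
        byte[] signed = Arrays.copyOf(payload, payload.length + TAG_LEN);
        System.arraycopy(hmac(key, payload), 0, signed, payload.length, TAG_LEN);
        return signed;
    }

    /** Checks the tag first; only authentic bytes ever reach ObjectInputStream. */
    static Object verifyAndDeserialize(byte[] signed, byte[] key)
            throws IOException, ClassNotFoundException, GeneralSecurityException {
        if (signed.length < TAG_LEN) {
            throw new SecurityException("input too short to carry a tag");
        }
        byte[] payload = Arrays.copyOfRange(signed, 0, signed.length - TAG_LEN);
        byte[] tag = Arrays.copyOfRange(signed, signed.length - TAG_LEN, signed.length);
        if (!MessageDigest.isEqual(hmac(key, payload), tag)) {
            throw new SecurityException("bad signature: refusing to deserialize");
        }
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(payload))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[32];
        new SecureRandom().nextBytes(key);
        byte[] signed = serializeAndSign("hello", key);
        System.out.println(verifyAndDeserialize(signed, key)); // hello
    }
}
```

This only authenticates where the bytes came from. If an attacker can get the server to sign data of his choosing, the protection is gone, so not deserializing untrusted input remains the better option.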
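For the cryptography item, here is a small example of what the JCA makes you assemble yourself. It does authenticated encryption with AES-GCM, uses a fresh random 96-bit IV per message, and prepends the IV to the ciphertext. This is a generic sketch using only standard JCA classes, not a port of Luke Park’s examples, and it assumes key management is handled elsewhere. Note the explicit transformation string. Cipher.getInstance("AES") on its own gives you ECB mode with the default provider, which is exactly the kind of insecure default discussed above. The class name AesGcmSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class AesGcmSketch {
    private static final int IV_LEN = 12;    // 96-bit IV, the size recommended for GCM
    private static final int TAG_BITS = 128; // full-length authentication tag
    private static final SecureRandom RNG = new SecureRandom();

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        return kg.generateKey();
    }

    /** Returns IV || ciphertext || tag. Never reuse an IV with the same key. */
    static byte[] encrypt(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        RNG.nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(plaintext);
        byte[] out = new byte[IV_LEN + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_LEN);
        System.arraycopy(ct, 0, out, IV_LEN, ct.length);
        return out;
    }

    /** Throws AEADBadTagException if any bit of the message was altered. */
    static byte[] decrypt(SecretKey key, byte[] message) throws GeneralSecurityException {
        byte[] iv = Arrays.copyOfRange(message, 0, IV_LEN);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        return c.doFinal(message, IV_LEN, message.length - IV_LEN);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey key = newKey();
        byte[] msg = encrypt(key, "attack at dawn".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(key, msg), StandardCharsets.UTF_8)); // attack at dawn
    }
}
```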

Other Examples of Problems With Java

I’ve spent so many years puzzling over Java documentation, trying to understand what is really happening under the hood. It’s not easy.

A great example is SecureRandom( ), which is exactly what it says it is. This is one case where Java is secure by default, but there are so many ways of screwing it up, and I have seen it happen too often. The Java documentation does not tell you the right way to use it, so we are left reading websites that deep-dive into how to use it properly and how not to, such as this and this. (Another one I just came across is here. I have yet to review it in detail, but it should give you an idea of the complexity.) Please, developers, do not make it too complicated, and avoid fiddling with APIs such as setSeed( ), because you’re only making things worse.
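Following that advice, the simplest safe pattern is usually the right one. Create a SecureRandom with the no-argument constructor, never seed it yourself, and ask it for bytes. The sketch below generates URL-safe session or reset tokens that way. The class name and the 32-byte token size are assumptions for illustration. The SecureRandom Javadoc describes SecureRandom objects as safe for use by multiple concurrent threads, so one shared instance is fine.

```java
import java.security.SecureRandom;
import java.util.Base64;

class RandomTokenSketch {
    // One shared instance from the no-argument constructor: it seeds itself,
    // and setSeed() is deliberately never called.
    private static final SecureRandom RNG = new SecureRandom();

    /** URL-safe token carrying numBytes of SecureRandom output. */
    static String newToken(int numBytes) {
        byte[] bytes = new byte[numBytes];
        RNG.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        System.out.println(newToken(32)); // 43 URL-safe characters, different every run
    }
}
```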

Another common problem with Java is logging, which always seems to be vulnerable to carriage return/line feed (CRLF) injection attacks. To my knowledge, you need to provide your own protection against this, because Java does not have one out of the box. This short guide should be helpful.
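Here is a minimal sketch of such a protection. It assumes you want to keep logging the untrusted value rather than drop it. CR and LF are replaced with visible escape sequences, and the backslash itself is escaped so the output stays unambiguous. Other control characters are written as four-digit numeric escapes. The helper name forLog is hypothetical. If your logging framework offers output encoding, prefer that.

```java
class LogSanitizerSketch {

    /** Makes untrusted text safe to place on a single log line. */
    static String forLog(String untrusted) {
        if (untrusted == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder(untrusted.length());
        for (int i = 0; i < untrusted.length(); i++) {
            char ch = untrusted.charAt(i);
            if (ch == '\r') {
                sb.append("\\r");
            } else if (ch == '\n') {
                sb.append("\\n");
            } else if (ch == '\\') {
                sb.append("\\\\"); // escape the escape character so output is unambiguous
            } else if (Character.isISOControl(ch)) {
                sb.append(String.format("\\u%04x", (int) ch)); // other control characters
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String username = "bob\nINFO admin login succeeded";
        // Without forLog, this would print a second, forged log line.
        System.out.println("Failed login for user " + forLog(username));
    }
}
```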

The poor state of Java security is most pronounced when we compare it with .Net, which keeps making it easier for developers to write safe code without being security experts. Microsoft is even putting in protections like CSRF tokens by default, which is wonderful. In contrast, Java never seems to evolve from a security perspective: same language, same defaults, same poor documentation. Good luck, you’re on your own with Java!

But the Web Says Java is Secure!

The examples above are indisputable: Java has problems. Despite this, many websites tout the good properties of Java while ignoring all the real-world security problems. It’s quite like this short video clip, where we need to rename “CiSO” to “Java”:

Am I just a Microsoft Fanboy?

My programmer days were in the 1990s. Believe it or not, back then I ranked in the top 10 producers of obscenities directed at Bill Gates and the Microsoft monopoly. Actually, I just made that up, but if there were such a list, I think I would have been on it.

Back in the 1990s, Java was a breath of fresh air: a break away from the Microsoft monopoly, and a development language that was not prone to buffer overflow vulnerabilities like C was. It had cryptography as part of the language, freeing people from paying excessively high costs to vendors like RSA. It was wonderful for its time.

The problem with Java from a security perspective is that it has never evolved: the same problems exist from version to version and they never get fixed. The documentation never gets better. Java will always be Java, whereas its competition has chosen not to suffer the same fate.

Fighting Bots with the Client-Puzzle Protocol

In 1999, Ari Juels and John Brainard came up with an elegant protection against denial of service attacks, known as the client-puzzle protocol. Their idea was patented (US patent 7197639), which might have inhibited its uptake. However, that patent expired in early 2020, so it is now free for anybody to use. And it should be used.

The client-puzzle protocol is not widely known or implemented, but we do note that Akamai adopted the concept for its bot manager very soon after the patent expired. Akamai adds advanced intelligence to it (remark: it appears that Cloudflare may be doing the same; see also Kasada), but the basic client-puzzle protocol is easy to implement and can be used by anybody without spending a dime.

This blog:

  • Explains the basic idea (slightly simplified) behind the Juels-Brainard client-puzzle protocol, with a focus on web applications,
  • Links to proof-of-concept source code,
  • Links to a demo of the source code hosted on Heroku,
  • Explains why it is preferable over rate limiting when protecting against malicious bots,
  • Suggests implementation enhancements to get the most benefit out of it,
  • Provides references to related work.

Client-Puzzle Protocol

The objective of the client-puzzle protocol is to slow bots down to near human speed. Slowing them down impedes a number of different attacks, such as web scraping, brute forcing, and certain types of denial of service.

To accomplish this, the proof-of-work concept is used. Many people may now be thinking of Bitcoin, but the proof-of-work concept existed in the cryptographic literature more than a decade before Bitcoin was invented. In fact, Ari Juels was one of the pioneers of the concept.

The client-puzzle protocol serves each client a cryptographic puzzle that must be solved before its request is served. The puzzle may take a fraction of a second to solve, which has little impact on legitimate humans but slows bots down significantly. Verifying the puzzle solution is very fast, so the protocol imposes negligible load on the server. Also, the protocol is entirely stateless.

To be concrete, the puzzle typically involves finding part of the pre-image of a cryptographic hash function. See the diagram below.

In this construct, two levels of hashing are used. First, the client request data (query parameters/request body) along with a server-side secret and timestamp are hashed. This produces hash h1, which is hashed to produce hash h2. The puzzle consists of h2 and most of the bits of h1.

Cryptographic hash functions are designed to be pre-image resistant, so if you are given h2, it would be very difficult and time consuming to find h1. On the other hand, if you are given h2 along with most of the bits of h1, you could brute force the remaining bits, provided that the number of remaining bits is not too large. You would simply try each possibility for the remaining bits, hash the candidate pre-image, and see if the hash matches h2. If k bits are missing, this requires up to 2^k trials. This is easily done for small k, but it still requires some computational effort.
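To put numbers on that cost, here is the arithmetic. It assumes the k missing bits are uniformly random and that the client tries candidates one by one until it hits the right one. Each additional hidden bit doubles the work. The rows use the two strengths that appear later in this blog (16 and 17 bits).

```latex
\begin{aligned}
T_{\text{worst}}(k) &= 2^{k}, \qquad T_{\text{avg}}(k) = \frac{2^{k}+1}{2} \approx 2^{k-1} \\
k = 16:&\quad T_{\text{worst}} = 65{,}536, \quad T_{\text{avg}} \approx 32{,}768 \\
k = 17:&\quad T_{\text{worst}} = 131{,}072, \quad T_{\text{avg}} \approx 65{,}536
\end{aligned}
```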

In the client-puzzle protocol, h2 and most of the bits of h1 are given to the client. The client must brute force the remaining bits. When the client succeeds, it sends back the puzzle solution (h1), the timestamp that the server provided, and the request data.

Now consider how the server verifies the result. This is the most elegant part because the server does not need to remember h2. Instead, it just recomputes h1 by hashing its secret along with the client provided request data and timestamp. If the hash matches what the client provided, then the puzzle is solved!

Clients cannot forge fake puzzles as long as they do not know the server secret. Attempts at providing fake puzzles are easily caught and rejected by the computation just described. This computation is quick and easy for the server: a single hash computation with no database usage. Similarly, clients cannot lie about the timestamp because the hash computation will not check out.

What’s the purpose of the timestamp? It lets you set an expiry time by which the puzzle must be solved. This gives the option of denying the request if the solution took too long, which is especially useful if the response to the request may change over time. For example, consider prices of items on an ecommerce website. Without the timestamp, the attacker can submit the same puzzle solution every week, without recomputing it, to get weekly price updates. With the timestamp, he is forced to recompute every time.
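The whole exchange fits in a few dozen lines. Below is a minimal Java sketch of the construction described above. It is not a port of the Node.js proof of concept discussed later. The encoding choices are assumptions:

  • SHA-256 for both hash levels,
  • a fixed 32-byte secret,
  • the timestamp as 8 big-endian bytes of milliseconds,
  • the k hidden bits taken to be the lowest bits of h1.

All class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

class ClientPuzzleSketch {
    /** Sent to the client: h2, h1 with its low k bits zeroed, k, and the timestamp. */
    static final class Puzzle {
        final byte[] h2, partialH1;
        final int k;
        final long timestamp;
        Puzzle(byte[] h2, byte[] partialH1, int k, long timestamp) {
            this.h2 = h2; this.partialH1 = partialH1; this.k = k; this.timestamp = timestamp;
        }
    }

    static byte[] sha256(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] p : parts) md.update(p);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** h1 = SHA-256(secret || requestData || timestamp). */
    static byte[] computeH1(byte[] secret, byte[] requestData, long timestamp) {
        byte[] ts = ByteBuffer.allocate(Long.BYTES).putLong(timestamp).array();
        return sha256(secret, requestData, ts);
    }

    /** Copy of h whose lowest k bits are replaced by the lowest k bits of value (k < 31). */
    static byte[] setLowBits(byte[] h, int k, int value) {
        byte[] out = h.clone();
        for (int i = 0; i < k; i++) {
            int idx = out.length - 1 - i / 8;
            int mask = 1 << (i % 8);
            if (((value >>> i) & 1) == 1) out[idx] |= mask; else out[idx] &= ~mask;
        }
        return out;
    }

    /** Server: build the puzzle; nothing is stored. */
    static Puzzle computePuzzle(byte[] secret, byte[] requestData, long timestamp, int k) {
        byte[] h1 = computeH1(secret, requestData, timestamp);
        return new Puzzle(sha256(h1), setLowBits(h1, k, 0), k, timestamp);
    }

    /** Client: brute force the k hidden bits, up to 2^k hash computations. */
    static byte[] solve(Puzzle p) {
        for (int guess = 0; guess < (1 << p.k); guess++) {
            byte[] candidate = setLowBits(p.partialH1, p.k, guess);
            if (MessageDigest.isEqual(sha256(candidate), p.h2)) return candidate;
        }
        throw new IllegalStateException("no solution found");
    }

    /** Server: recompute h1 from the secret plus the client's data and timestamp; h2 is not needed. */
    static boolean checkSolution(byte[] secret, byte[] requestData, long timestamp,
                                 byte[] claimedH1, long nowMillis, long timeLimitMillis) {
        if (timestamp > nowMillis || nowMillis - timestamp > timeLimitMillis) {
            return false; // from the future, or expired
        }
        return MessageDigest.isEqual(computeH1(secret, requestData, timestamp), claimedH1);
    }

    public static void main(String[] args) {
        byte[] secret = new byte[32];
        new SecureRandom().nextBytes(secret);
        byte[] query = "q=cats".getBytes(StandardCharsets.UTF_8);
        long issued = System.currentTimeMillis();
        Puzzle puzzle = computePuzzle(secret, query, issued, 16);
        byte[] answer = solve(puzzle);
        System.out.println(checkSolution(secret, query, issued, answer, System.currentTimeMillis(), 5000));
    }
}
```

The server keeps no state between computePuzzle and checkSolution. A forged h1 or an altered timestamp simply fails the recomputation, and the time limit implements the expiry. The secret and timestamp are fixed-length and bracket the variable-length request data, so the hashed bytes cannot be reinterpreted with a different split.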

Generally, the use of this protocol would look like this for a web application:

When Juels and Brainard wrote about this protocol in 1999, they described protecting against threats such as TCP SYN flooding, email bombs, and so on. The internet has flourished since then. Today it is clearer that this technology has many applications beyond preventing denial of service attacks, and more generally can impede bot activity, as Akamai is doing.

Source Code and Demo

I am not a professional programmer and I am just learning Node.js, but it still took me little effort to build a proof of concept. You can see the source code on GitHub. This PoC takes user input for a search query and returns a random gif from Giphy related to the search query. The server side uses a Giphy API key. With the client-puzzle protocol, clients need to do a computation every time a request is made through the server to Giphy, which limits the number of requests going to Giphy through the server.

The main functions on the server side are compute_puzzle( ), which computes the puzzle from the original request, and check_puzzle_solution( ), which verifies the solution. The client side has code to brute force the puzzle solution.

You can see the demo here. The puzzle strength is set at 2^17 and the expiry time is 10 seconds. Solving usually takes less than a second on most of my devices, with the exception of my old iPad 2, where it can take a few seconds. Note that every request uses my Giphy development API key, which is rate limited by Giphy. I do not know what the rate limit settings are, so I’m counting on the client-puzzle protocol to (hopefully) keep my anonymously accessible demo application below the bar.

If you’re going to use my GitHub code, here are a few remarks you should be aware of:

  • The code generates the secret from crypto.randomBytes( ) upon startup, which is fine if you have only a single backend server. If you’re using multiple servers, the secret should be shared among them, because the server that checks a puzzle solution might not be the same server that created the puzzle.
  • The whole security of the system depends upon the strength of the secret, so don’t use something silly like P@55w0rd123. The secret needs to be long and not brute-forceable.
  • The program is most entertaining with a Giphy API key, which you can obtain quickly and for free by following the guidance here. Once you have the key, set it as the environment variable giphy_api_key.
  • You can also set the puzzle strength (environment variable puzzle_strength, 16 by default, which means 2^16 effort) and the expiry time (environment variable time_limit, 5000 milliseconds by default).
  • Ultimately, the client-side JavaScript code should be optimised. This is because the JavaScript code is what legitimate users will run, whereas a good hacker would use custom code to solve the puzzle as fast as possible. The faster his code is compared with the JavaScript code, the more he benefits (more details in the last section below). To optimise the legitimate client code, the focus should be on making SHA-256 as fast as possible.
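Several of the remarks above can be sketched as a small configuration loader, shown in Java rather than the PoC’s Node.js. It reads puzzle_strength and time_limit with the PoC’s defaults. It takes a shared secret from a hypothetical puzzle_secret variable (Base64, at least 32 bytes), so that all backend servers can use the same secret. Only if that variable is absent does it fall back to a random per-process secret, which is correct only with a single server. The class name is hypothetical. The loader takes a Map so that it can be fed System.getenv() in production or a plain map in tests.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;

class PuzzleConfigSketch {
    final byte[] secret;
    final int strength;          // number of hidden bits; work is up to 2^strength hashes
    final long timeLimitMillis;  // how long an issued puzzle stays valid

    private PuzzleConfigSketch(byte[] secret, int strength, long timeLimitMillis) {
        this.secret = secret;
        this.strength = strength;
        this.timeLimitMillis = timeLimitMillis;
    }

    static PuzzleConfigSketch fromEnv(Map<String, String> env) {
        byte[] secret;
        String encoded = env.get("puzzle_secret"); // hypothetical name, not used by the PoC
        if (encoded != null) {
            secret = Base64.getDecoder().decode(encoded);
            if (secret.length < 32) {
                throw new IllegalArgumentException("puzzle_secret must decode to at least 32 bytes");
            }
        } else {
            // Per-process random secret, like crypto.randomBytes() at startup:
            // only correct when a single server both issues and checks puzzles.
            secret = new byte[32];
            new SecureRandom().nextBytes(secret);
        }
        int strength = Integer.parseInt(env.getOrDefault("puzzle_strength", "16"));
        long timeLimit = Long.parseLong(env.getOrDefault("time_limit", "5000"));
        return new PuzzleConfigSketch(secret, strength, timeLimit);
    }

    public static void main(String[] args) {
        PuzzleConfigSketch cfg = fromEnv(System.getenv());
        System.out.println("strength 2^" + cfg.strength + ", time limit " + cfg.timeLimitMillis + " ms");
    }
}
```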

The limits of rate limiting

Rate limiting is very important, but there are certain attack scenarios where rate limiting does not provide sufficient protection.

If you have an API that is accessible anonymously, the typical approach to rate limiting is to limit by IP address. However, hackers nowadays easily get around that restriction by rotating their IP address using tools such as Fireprox. Hackers can also spoof geographical locations and easily blend in to look like legitimate users. As a consequence, it is hard to distinguish the bad guys from legitimate users: you either let the bad guys’ requests in for free or risk dropping legitimate users’ requests.

When battling malicious clients that are accessing anonymously accessible content, rate limiting is most effective if a decision can be made with certainty on whether the client is malicious. This certainty will not exist against a clever adversary.

The client-puzzle protocol handles the uncertainty much better by requiring every client to solve a puzzle. Legitimate users (people) do not make numerous requests per second, so they will see little impact. Malicious bots on the other hand will see a huge impact because they are forced to do computations for each of their numerous requests, and that builds up. With the client-puzzle protocol, making requests is no longer free.

Of course, there could be legitimate bots that need to make many requests per second. Those can be identified in any number of ways (API key, known IP address, etc…) and whitelisted to allow them through. Everybody else must do the computation per request.

Implementation enhancements

Before talking about enhancements, we must talk about limits. The client-puzzle protocol does not stop bots, it only slows them down. A malicious entity can still get requests through, but suddenly it has to “pay” per request, where the payment is by computation time. If the entity has substantial computing power, it can erase much of the benefit that the protocol offers.

In the original publication, Juels and Brainard suggested only using the protocol when the server is under attack and tuning the parameters according to the severity of attack. They also had in mind attacks such as TCP SYN flooding, where a server can only handle so many connections at once. Our focus is more at the application layer, so we will discuss application-specific enhancements. Of course tuning parameters according to severity is still a valid protection.

The following implementation enhancements can also be considered:

  • Similar to Akamai/Cloudflare/Kasada, we might have some intelligence about our adversary, allowing us to adjust the puzzle strength according to the likelihood of the request being malicious. For example, maybe we know that the attacker is using a particular API key or user agent header; we can offer tougher puzzles to those requests than to other requests.
  • If we are trying to defend against an attacker brute forcing specific targets, then we can cache failed attempts and increase puzzle strength for those specific targets upon each failure. These increases in strength should be temporary: long enough to frustrate the enemy, but not so long that they lock out the good guys.
  • We can always whitelist known good clients to allow them through with little effort, while requiring higher puzzle strength for the unknown.
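The second enhancement can be sketched as a small in-memory policy. The specific rule is an assumption, not something from Juels and Brainard or the PoC. Each recent failure against a target (for example, a username) adds one bit of puzzle strength, doubling the work, up to a cap. The penalty lapses once no failure has been seen for a configurable period. Expired entries are never evicted in this sketch, so a real deployment would need a bounded cache. All names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

class AdaptiveStrengthSketch {
    private static final class Penalty {
        final int failures;
        final long expiresAtMillis;

        Penalty(int failures, long expiresAtMillis) {
            this.failures = failures;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Penalty> penalties = new ConcurrentHashMap<>();
    private final int baseStrength;
    private final int maxExtraBits;
    private final long penaltyMillis;

    AdaptiveStrengthSketch(int baseStrength, int maxExtraBits, long penaltyMillis) {
        this.baseStrength = baseStrength;
        this.maxExtraBits = maxExtraBits;
        this.penaltyMillis = penaltyMillis;
    }

    /** Call after a failed attempt against target; each failure also refreshes the expiry. */
    void recordFailure(String target, long nowMillis) {
        penalties.compute(target, (key, old) -> {
            int count = (old == null || old.expiresAtMillis <= nowMillis) ? 1 : old.failures + 1;
            return new Penalty(count, nowMillis + penaltyMillis);
        });
    }

    /** Puzzle strength (number of hidden bits) for the next request against target. */
    int strengthFor(String target, long nowMillis) {
        Penalty p = penalties.get(target);
        if (p == null || p.expiresAtMillis <= nowMillis) {
            return baseStrength;
        }
        return baseStrength + Math.min(p.failures, maxExtraBits);
    }

    public static void main(String[] args) {
        AdaptiveStrengthSketch policy = new AdaptiveStrengthSketch(16, 4, 60_000);
        long now = System.currentTimeMillis();
        policy.recordFailure("alice", now);
        policy.recordFailure("alice", now);
        System.out.println(policy.strengthFor("alice", now)); // 18
        System.out.println(policy.strengthFor("bob", now));   // 16
    }
}
```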

Remark: Although it will reduce impact, the client-puzzle protocol is not by itself strong enough to prevent credential stuffing attacks because the attacker has a significant probability of success per request. For a more appropriate defence, see our OT2FA blog. The client-puzzle protocol does help impede other forms of brute force where the likelihood of success per request is smaller.

Related work

One of the shortcomings of the client-puzzle protocol is that the attacker might have significantly more computational resources than legitimate users. This is known as the resource disparity problem. Guided Tour Puzzle Protocols were designed to address this disparity. I have not yet researched their practicality or intellectual property considerations.

In private communication, Ari Juels suggested that client puzzles could potentially be used for mining a cryptocurrency such as Monero (having an ASIC-resistant proof-of-work scheme), “meaning that users are effectively paying for services.” After some number crunching, he said it would unfortunately not yield much currency because the clients on the whole aren’t doing a huge amount of work. Ari also pointed out how this was discussed in a related paper from 1999 with Markus Jakobsson where they applied the idea to a cryptocurrency protocol known as MicroMint.

Understanding Certificate Pinning

Certificate pinning (“cert pinning” for short) is a technique used in mobile applications to add an extra layer of protection to secure communications. Some people also use it to stop others from reverse engineering their APIs via intercepting proxies; however, this latter objective is hard to achieve against a determined hacker.

Certificate pinning offers very high security, but it does come with some downsides that need to be considered by the business. This blog explains the security and business considerations for certificate pinning, and shows the trade-offs that can be made according to the needs of the organisation implementing it.

The takeaways from this blog are:

  • Understanding why we do cert pinning
  • Understanding the public key infrastructure (PKI)
  • Making an analogy with browser security
  • Certificate pinning typically comes at the cost of forced mobile app upgrades, which can be a particularly painful user experience for short-lived certificates such as those provided by Let’s Encrypt
  • Apps that are cert pinned need to roll out new versions in coordination with the operations teams that rotate certificates, because upgrading an app takes time (including App Store review); otherwise there will be downtime
  • The option of Certificate Authority Pinning is less strict than full Certificate Pinning. It requires less maintenance while offering slightly less but still strong security

What is Cert Pinning and Why Would I Use it?

In order to understand certificate pinning, one must understand PKI. An excellent blog on this is provided by Technophile: SSL/TLS for dummies part 3 – Understanding Certificate Authority. Embarrassingly, their certificate had expired at the time of writing, so you may prefer to just read the summary below.

The short summary is that secure communication over the internet is done via TLS (people often wrongly say “SSL”, which was a predecessor of TLS that was deprecated due to security weaknesses). The security of TLS depends upon a small set of organisations called certificate authorities (CAs), whose role is to verify the identity of websites and then issue them certificates, which allow them to use TLS security. To make this all work, operating systems and potentially other software providers need to be supplied with a small set of certificates from the certificate authorities: these are the “root level” certificates that everything else in TLS depends upon. When you visit a website via TLS, the certificate provided by that website must have its signature chain checked back to one of the root level CAs to verify its validity. This asserts that you are visiting the website that you think you are visiting and that communication with that website is secure, assuming that the CAs have not been breached, the OS/software vendor has not been breached, you do not have a bogus CA installed, and the website you are visiting has adequately protected its private cryptographic keys. If any of those assumptions are wrong, all bets are off.

As mentioned, there are many ways that TLS can break, but the most severe would be a breach of a certificate authority, since it affects everybody. Such security incidents have happened, the most prominent example being DigiNotar. Short of a CA breach, there are other examples where PKI security shortcomings have affected a large number of people, such as the Lenovo Superfish incident.

It must be emphasised that if any single CA is breached, then all of TLS breaks. It doesn’t matter what CA you used for your website; it only matters what CA your attacker uses. I’ve seen major organisations that are supposed to be security savvy having policies of only getting certificates from a subset of CAs that they trust the most. They entirely miss the point that the CAs they choose don’t matter: the design of PKI means we all depend upon every single one of them not being breached. At the end of this blog, we will show that there is one scenario that can make such policies meaningful; however, I have yet to see it implemented in practice.
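To make the “any CA” point concrete, here is a minimal sketch of an ordinary TLS client configuration using Python's standard ssl module (chosen for illustration; mobile platforms have their own APIs). A default client context validates the chain against the whole system trust store and checks the hostname, but nothing in it records which CA your site actually uses, so a certificate for your hostname from any trusted CA would pass.

```python
import ssl

# Standard client-side TLS: validate the chain against the system trust store
# and check that the certificate matches the requested hostname.
ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)

print("verify_mode is CERT_REQUIRED:", ctx.verify_mode == ssl.CERT_REQUIRED)
print("check_hostname:", ctx.check_hostname)

# Every CA certificate counted here can vouch for *any* hostname. The count may be 0
# on systems where CA certificates are loaded lazily from a directory.
print("CA certificates loaded:", ctx.cert_store_stats()["x509_ca"])
```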

The fact that every CA needs to be completely secure for PKI to work is one reason that some people do not trust PKI. However, this is where certificate pinning comes to the rescue for mobile apps: the main reason to use it is to make up for this shortcoming of PKI. The concept of certificate pinning is to make the mobile app trust only the exact certificate that is used on the website it is communicating with, and thus no longer depend upon the security of CAs. One way to implement this (other, more practical ways will be mentioned later) is to include the exact certificate (or its cryptographic hash) in the mobile app source files during development, and write code that verifies that this certificate matches what the website sends during the initial stages of the TLS communication. If there is not an exact match, reject the communication and do not proceed. This provides very strong security, but it does come with some potential gotchas that we will explain later.
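A minimal sketch of full certificate pinning, written in Python with the standard ssl module for illustration (a real app would use its platform's TLS APIs). The names certificate_matches_pin, connect_pinned and PinMismatchError are hypothetical, and the pin is assumed to be the hex SHA-256 of the server's DER-encoded leaf certificate, computed at build time. The check runs immediately after the handshake and before any application data is sent.

```python
import hashlib
import hmac
import socket
import ssl


class PinMismatchError(Exception):
    """Raised when the server's certificate is not the one baked into the app."""


def certificate_matches_pin(der_cert: bytes, pinned_sha256_hex: str) -> bool:
    """Compare the SHA-256 of the DER-encoded certificate with the pinned value."""
    fingerprint = hashlib.sha256(der_cert).hexdigest()
    return hmac.compare_digest(fingerprint, pinned_sha256_hex.strip().lower())


def connect_pinned(host: str, pinned_sha256_hex: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TLS connection and refuse it unless the leaf certificate matches the pin."""
    ctx = ssl.create_default_context()  # normal chain and hostname checks still apply
    raw = socket.create_connection((host, port), timeout=10)
    try:
        tls = ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
    der = tls.getpeercert(binary_form=True)
    if der is None or not certificate_matches_pin(der, pinned_sha256_hex):
        tls.close()  # reject before any application data is sent
        raise PinMismatchError(f"certificate for {host} does not match the pin")
    return tls
```

Because the pin covers the whole certificate, renewing the certificate (even with the same key) changes the hash and breaks the app until a new version ships.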

There is a second reason why some people use certificate pinning, which is to prevent people from reverse engineering your mobile app via an intercepting proxy. Without certificate pinning, anybody can view the communications between their own mobile app and the server using well known techniques (see here and here). This is not breaking TLS: other people cannot inspect your communications; only you can inspect your own communications, by installing an intercepting proxy certificate on your mobile device. By adding a new certificate that the OS trusts, for which you know the private key and nobody else does, you have made it so that you can snoop on the TLS communications but nobody else can. This does not directly work if the certificate is pinned, because the application stops the interception: it does not trust the newly installed certificate, but instead only trusts the exact certificate that is pinned.

However, this defence has limited efficacy because it is possible to bypass certificate pinning using well known techniques (see here and here), and because another approach to understanding how APIs work is the old-fashioned reverse engineering of the binary. As a consequence, to prevent adversaries from understanding your API, you also need obfuscation and jailbreak/root detection. Such defences will stop many hackers, but a skilled and determined hacker with a lot of free time can ultimately succeed.

Concluding, it makes sense to do certificate pinning to prevent potential PKI insecurities, but cert pinning does not stop people from understanding your APIs — it merely slows down the good hackers. For the rest of this blog, our primary focus is on the use of cert pinning to deal with potential PKI problems. Cert pinning to slow down reverse engineering is outside the scope of this blog.

If a CA becomes breached, the manufacturer should push a security update

OS vendors / mobile device manufacturers have a strong interest in ensuring security: people trusting their products is important to their business success. As a consequence, they usually push security updates to their customers when a CA is no longer trusted. For example, Apple released an update when DigiNotar was compromised. Browser manufacturers made similar updates, as described here.

So this gives us some assurance that even without certificate pinning, we can still have decent security.  However there are some catches:

  1. Users don’t always install their security updates promptly, so there is a gap when exploitation is possible.
  2. This only works for known compromises — whereas certificate pinning will protect against the unknown compromises.
  3. The situation for Android is less reassuring because the security updates come from Google via the mobile device vendor, and the vendors have not always been so prompt in pushing security updates to their users.

Related to (3), Google does provide a way for app developers to protect clients against discovered TLS vulnerabilities without relying on security updates from the device vendor: the ProviderInstaller API in Google Play services, via installIfNeeded() or installIfNeededAsync(). It’s not clear to me from the documentation whether this fixes only TLS implementation bugs, or whether it also updates root certificates on the device.

So, while there is some fallback, there is definitely room for improving security with a technique such as certificate pinning. However there is a practical downside to certificate pinning that one needs to be aware of — the app breaks when the certificate changes. We will discuss that more later.

Comparison to Browser Based Security

It’s interesting that we mandate such a high standard for mobile security, but what about the corresponding web browser security? Our ability to apply similar controls in the browser is somewhat restricted, so one might feel that we are holding the mobile app to a higher standard than the browser.

In 2015, the security community tried to close the gap between mobile security guidance and web browser security capabilities by introducing a similar concept to the web browser. A new HTTP header was added that allowed websites to instruct browsers to pin their public keys, so the browser would not accept any other certificate presented by that particular website (for the duration specified in the header). This was known as HTTP Public Key Pinning (HPKP).

It didn’t take long before the security community discovered problems with this proposal. Within 2 years, one of the authors of HPKP announced the intent to deprecate it due to a number of problems that had not been anticipated. The reasons for deprecation include the difficulty of choosing a set of pins that is guaranteed to work, the risk of locking users out if an error was made in pinning, and the risk of an attacker abusing HPKP if he could obtain a misissued certificate. A real example of locking out users is here. With the deprecation, a replacement was recommended: the Expect-CT header, which is formally documented here.

Expect-CT is not as strong as certificate pinning, but is safer to use (still, the authors suggest using it with caution). In a nutshell, it is used to instruct browsers that the website must use a published certificate, in compliance with Google’s Certificate Transparency effort. With some configuration, the browser can be instructed to block connections if the site does not comply. Thus if a malicious actor secretly obtained an unpublished, misissued certificate for a website, he would not be able to abuse it against that website. In other words, for the malicious actor to abuse a misissued certificate, it needs to be one that is published! But hopefully those abuses can be caught by certificate revocation (or maybe not).
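For reference, a minimal Python sketch of building the Expect-CT header value, assuming the directives defined for it: max-age, the optional enforce, and the optional quoted report-uri. The function name and the reporting URL are placeholders. Browsers have since deprecated Expect-CT, so treat this as illustrating the configuration described here rather than a current recommendation.

```python
from typing import Optional, Tuple


def expect_ct_header(max_age: int, enforce: bool = False,
                     report_uri: Optional[str] = None) -> Tuple[str, str]:
    """Build an Expect-CT response header as a (name, value) pair."""
    if max_age < 0:
        raise ValueError("max-age must be non-negative")
    directives = [f"max-age={max_age}"]
    if enforce:
        # Without 'enforce' the browser only reports violations (useful while testing).
        directives.append("enforce")
    if report_uri:
        directives.append(f'report-uri="{report_uri}"')
    return "Expect-CT", ", ".join(directives)


# Report-only first, then enforce once no violations show up (placeholder reporting URL).
print(expect_ct_header(86400, report_uri="https://example.com/ct-report"))
print(expect_ct_header(86400, enforce=True, report_uri="https://example.com/ct-report"))
```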

Expect-CT is not supported by a number of browsers (including Firefox) at the time of writing this blog. It is a big step towards fixing PKI shortcomings, but falls slightly behind the security one gets from mobile certificate pinning.

Implementation and Caveats

The company Infinium has written excellent blogs for developers on how to implement certificate pinning. These guides can be found here:

A key downside of certificate pinning, which needs emphasis, is that changing certificates implies that you need to release a new version of the app (with an exception, to be discussed below) and force upgrade your users to the latest version. This is because the app only works with the exact certificate that was pinned — communication breaks if any other certificate is used. If you are using short-lived certificates, such as those from Let’s Encrypt, then this requires frequent maintenance and is an unpleasant user experience.

Another point is that the rotation of the certificate needs to be coordinated with the release of the updated app to avoid downtime. This means that the development team needs sufficient advance notice to update the app with the new certificate, test it, review it, and get it through App Store review; otherwise users may be locked out for some period of time. This is a very real concern: I’ve seen it happen, and the business was not happy.

The exception mentioned above is that if only the public keys (as opposed to the full certificate) are pinned then one can rotate a certificate without releasing a new app provided that the same public keys are used in the new certificate. That is, you change the certificate when it expires but use the same cryptographic keys as before. This results in a more friendly user experience, but it requires those that manage certificates (typically an operations team) to follow the requirement from the development team. Of course, if any key changes, a new app needs to be released and a forced upgrade needs to happen.
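Public-key pinning can be sketched the same way, assuming the pin is the base64 SHA-256 of the certificate's DER-encoded SubjectPublicKeyInfo (the format HPKP used for pin-sha256). Extracting the SPKI bytes is left out; with the third-party cryptography package, for example, cert.public_key().public_bytes(Encoding.DER, PublicFormat.SubjectPublicKeyInfo) returns them. The function names and stand-in byte strings below are hypothetical.

```python
import base64
import hashlib
from typing import FrozenSet


def spki_pin(spki_der: bytes) -> str:
    """HPKP-style pin: base64 of SHA-256 over the DER-encoded SubjectPublicKeyInfo."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")


def public_key_is_pinned(spki_der: bytes, pins: FrozenSet[str]) -> bool:
    """Accept the server if its public key matches any pin shipped in the app."""
    return spki_pin(spki_der) in pins


# Stand-in byte strings; real code would extract the SPKI bytes from the certificate.
current_key = b"current key SPKI"
backup_key = b"backup key SPKI"
PINS = frozenset({spki_pin(current_key), spki_pin(backup_key)})

# A renewed certificate that reuses the current key yields the same SPKI, so it still passes.
print(public_key_is_pinned(current_key, PINS))
```

Shipping a pin for a pre-generated backup key alongside the current one (HPKP required such a backup pin) gives operations a key to switch to without waiting for an app release, provided that backup was pinned in a version users already have.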

There is another option that offers slightly less security than full certificate pinning but is much more likely to avoid the problem of needing to release a new app and force upgrade the users…

Alternative Option: Certificate Authority Pinning

With the goal of eliminating the need to upgrade the app when a new certificate is released but still wanting better-than-TLS security, developers can pin the certificate authority public key only. In other words, the app will now only accept certificates that are signed by the specific CA (or CAs) that you allow and no others, in addition to all the normal certificate checks that are required for security.

Recall our discussion above about the problems with PKI: if any CA is breached, then everybody becomes vulnerable. However, by pinning the certificate of the CA or CAs that we trust, we no longer have that risk. Instead, the exact CAs that we trust need to be compromised for us to be vulnerable.

This is not as strong as full certificate pinning because if an intermediate key within your chain of trust becomes compromised, then you are still vulnerable. However, it is better than TLS-only because now you do not depend upon all CAs being secure: instead you only depend upon the CA or CAs that you are using being secure.

With this approach, the app only needs to be updated if you change CAs or if the CA changes its public key. If your company has a policy on which CAs they will restrict to, then you can pin the public keys of those exact CAs and quite likely you will not need to worry about updating the app or forced upgrades for a long, long time. This is especially appealing for those using short-lived certificates.
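CA pinning changes only which certificates in the chain are compared. In this sketch (same hypothetical pin format as HPKP's pin-sha256, hypothetical function names), the app accepts the connection only if some certificate above the leaf in the already-validated chain carries a pinned CA key. Obtaining the verified chain depends on the platform or TLS library and is assumed here.

```python
import base64
import hashlib
from typing import FrozenSet, Sequence


def spki_pin(spki_der: bytes) -> str:
    """HPKP-style pin: base64 of SHA-256 over the DER-encoded SubjectPublicKeyInfo."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")


def chain_has_pinned_ca(verified_chain_spki: Sequence[bytes], ca_pins: FrozenSet[str]) -> bool:
    """verified_chain_spki[0] is the leaf; only the issuing certificates above it
    (intermediates and root) may satisfy a CA pin. Call this only after normal
    chain and hostname validation has succeeded."""
    return any(spki_pin(spki) in ca_pins for spki in verified_chain_spki[1:])
```

Any leaf certificate the pinned CA issues will pass, which is why short-lived certificates stop forcing app releases.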

For Android devices, see Pinning the CA public key with the TrustManager (see also this).

For iOS, certificate authority pinning is discussed here.

Some Useful AppSec Resources

While OWASP has no doubt earned the prestige of being the #1 AppSec resource, there are many other good information sources across the web that I have collected over the years, and they have been very helpful to me and to others I have shared them with. I especially enjoy blogs that explain things simply and clearly while at the same time being technically correct. Below is a list of my favourite such resources. I would greatly appreciate it if others could reciprocate with their own lists.

Blogs / General AppSec

Certificate Pinning

Cookie Security

CORS

Cross Site Scripting

Cryptography

Deserialization

DevSecOps

HTTP Security Headers

Input Validation

  • Validating Input – This is old, but is a classic.  For more recent guidance, see the Martin Fowler website blog on The Basics of Web Application Security (linked above)

JWTs

Logging

Mobile Security

OAuth

  • An Illustrated Guide to OAuth and OpenID Connect – Most people want to dive into the technical details of OAuth before they really understand its purpose. Slow down, read this, and then you will have better insight into this complex protocol

Passwords

PHP

Race Conditions

Server Side Request Forgery

SSL/TLS

SQL Injection

 

Thoughts on the Capital One Security Breach

Whenever one reads about a security breach like what happened to Capital One, security experts are eager to find out the anatomy of the attack. Little by little, details have emerged. Initially called a firewall misconfiguration problem, later reports seemed to suggest a Server Side Request Forgery (SSRF) vulnerability. These conflicting stories do not seem to have been publicly resolved. In fact, even a recent story from Wired still suggests firewall misconfiguration. One thing that is clear is that Amazon is sticking to the firewall misconfiguration story while trying to remove themselves from any blame:

It’s inaccurate to argue that the Capital One breach was caused by IAM in any way. The intrusion was caused by a misconfiguration of a web application firewall and not the underlying cloud-based infrastructure

Which is it, firewall misconfiguration or SSRF, and if Amazon is not to blame, then who is?

Firewall Misconfiguration or SSRF?

It is not so common to hear of a web application being taken over due to a misconfigured firewall, so this sounded curious from the beginning.  The closest I have been able to find to make any sense of this comes from Krebs:

“According to a source with direct knowledge of the breach investigation, the problem stemmed in part from a misconfigured open-source Web Application Firewall (WAF) that Capital One was using as part of its operations hosted in the cloud with Amazon Web Services (AWS).

“Known as “ModSecurity,” this WAF is deployed along with the open-source Apache Web server to provide protections against several classes of vulnerabilities that attackers most commonly use to compromise the security of Web-based applications.

“The misconfiguration of the WAF allowed the intruder to trick the firewall into relaying requests to a key back-end resource on the AWS platform.”

Krebs then goes on to explain how the attacker performed SSRF on the web application to access the Amazon instance metadata, which allowed her to access IAM Role credentials and own the EC2 instance.

From this, we are actually starting to get some insight.  Indeed, the root cause of the problem does not seem to be a misconfigured firewall — instead it was an application SSRF vulnerability, which is a common theme for AWS hacks — in fact many tutorials about SSRF talk exactly about abusing AWS EC2 instances.

For those not familiar with SSRF, I strongly recommend this Contra Application Security tutorial that shows how Capital One might have been breached. Assuming it is accurate, I must emphasize that it did not take a hacking genius to find this vulnerability. This attack is low-hanging fruit.

For Amazon and others to suggest that the problem was a misconfigured firewall shows a fundamental misunderstanding of web security. Quite frankly, I find it shocking that one of the top cloud providers is going with this line of argument, so it is time to speak out:

WAFs are not a magic solution to your security problems

What is evident from the above quotes is that a WAF configuration is being blamed for a web application vulnerability. This is entirely the wrong security mindset.

When a web application is built, security has to be part of the design. It is not something that is added on at the end: “Now turn on your WAF, then you’re secure!” Nonsense! Sure, WAF vendors are there to sell a product, so they like to make claims that stretch the truth. But a company like Amazon should know better, and the fact that they point the finger at the WAF says a lot about Amazon’s security immaturity.

WAFs simply do not solve all security problems.  WAFs are a backup protection — if security protections were built into the application itself, then the WAF would offer no value.  In reality, developers make mistakes, so the WAF is a fallback security mechanism that can help when other things fail.  It is however by no means the primary form of defence.

WAF vendors need to be kept honest.  I’ve seen more than one make ridiculous claims: “Turn on your WAF and you are protected from the OWASP Top 10!”  It simply is not true.  WAFs can detect and stop a lot of common attacks, but there are so many things they cannot detect and/or stop.

WAFs work by searching for dangerous (“blacklist”) patterns and blocking requests that fit those patterns.  Blacklist validation can never be perfect, because there are an infinite number of possible inputs yet the blacklist must be finite.  Therefore it is just a matter of time before a good hacker finds the right pattern that gets past the WAF.

Moreover, there are a number of strategies that hackers can use to bypass WAF blacklists, such as changing the encoding. For good overviews, see WAF through the eyes of hackers and WAF Evasion Techniques.
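To illustrate why blacklists lose this race, here is a deliberately naive filter in Python standing in for a WAF rule set. BLACKLIST and naive_waf_allows are hypothetical; real WAFs such as ModSecurity normalise input before matching, and attackers look for the forms the normalisation misses. The same metadata address written as a single decimal integer, and a percent-encoded script tag, both get past it.

```python
import ipaddress
from urllib.parse import unquote

# A deliberately naive blacklist, standing in for a WAF rule set.
BLACKLIST = ("169.254.169.254", "<script")


def naive_waf_allows(raw_input: str) -> bool:
    """Block the request if any blacklisted pattern appears verbatim."""
    lowered = raw_input.lower()
    return not any(pattern in lowered for pattern in BLACKLIST)


# The literal metadata address is blocked...
print(naive_waf_allows("http://169.254.169.254/latest/meta-data/"))   # False
# ...but the same address written as one decimal integer slips through,
decimal_host = "2852039166"
print(naive_waf_allows(f"http://{decimal_host}/latest/meta-data/"))   # True
print(ipaddress.ip_address(int(decimal_host)))                        # 169.254.169.254
# ...as does a percent-encoded script tag that the application later decodes.
encoded = "%3Cscript%3Ealert(1)%3C/script%3E"
print(naive_waf_allows(encoded), unquote(encoded))   # True <script>alert(1)</script>
```

Some HTTP clients and resolvers accept the decimal form of an IPv4 address, so such a request can still reach the metadata endpoint.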

Last, the suggestion that a WAF can stop all OWASP Top 10 issues (which some vendors will claim) is absurd particularly since some of the attacks on the list do not go through the server at all.  For example, DOM-based cross site scripting happens between the attacker and victim without going through the server or WAF.  The vulnerability is present due to servers serving up vulnerable JavaScript to the victim.  As another example, if the server is sending data via http rather than https, then any person can eavesdrop on it without sending any data to the server at all.  WAFs just cannot and do not sprinkle magic fairy dust to make these problems go away.

In the Capital One breach, Amazon is blaming Capital One for not having their WAF stop the SSRF.  The reality is that the WAF is the backup protection, and the primary protection should have been at the application level.  As I explained in my Understanding Input Validation blog from February 2018 (which by the way talks about how SSRF is often abused on Amazon cloud computing), input validation is the proper way to stop SSRF.  That solves the problem exactly where the vulnerability exists — in the code — rather than expecting some add-on security protection to suddenly turn an insecure application into a secure one.
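A minimal sketch of that input validation in Python, assuming the feature only ever needs to fetch from a known set of hosts (the ALLOWED_HOSTS entry and the function names are hypothetical). It uses an allowlist rather than a blacklist, so encoding tricks like the ones that defeat WAF blacklists simply fail to match. is_public_ip is an additional check for the address a hostname actually resolves to.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts this feature is meant to fetch from.
ALLOWED_HOSTS = frozenset({"images.example.com"})


def is_allowed_fetch_url(url: str) -> bool:
    """Positive (allowlist) validation of a user-supplied URL before the server fetches it."""
    parts = urlsplit(url)
    try:
        port = parts.port
    except ValueError:  # malformed port such as ':99999' or ':abc'
        return False
    return (
        parts.scheme == "https"
        and port in (None, 443)
        and (parts.hostname or "") in ALLOWED_HOSTS  # hostname is already lower-cased
    )


def is_public_ip(address: str) -> bool:
    """Reject link-local (e.g. 169.254.169.254), private and loopback addresses."""
    return ipaddress.ip_address(address).is_global
```

If you resolve hostnames yourself, apply is_public_ip to the exact address you then connect to; checking one lookup and connecting after a second lookup leaves room for DNS rebinding.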

So if it wasn’t a WAF misconfiguration, then whom do we blame?

The joy of recriminations!  In fact, I see a lot of failures which are far more significant than the WAF configuration failure.  For example:

  • Was the application penetration tested?  If not, that is a major failure in security process.  If it was, then it is a bit surprising to see that this vulnerability was missed — a good penetration tester should have found it.
  • Was the application security code reviewed?  A good code reviewer with a decent SAST tool could have found the vulnerability.  But we could only say that if we knew whether Capital One invested in application security, and that I do not know.
  • Were developers provided application security education?  While this is one that is harder to depend upon, it is a recommended best practice today.
  • Maybe Amazon is to shoulder some blame for not making SSRF harder to abuse in their infrastructure?  I’ll elaborate on that in the next section.
  • Whoever made the business decision to go with AWS, was security part of that decision?  I’ll elaborate on that in the next section too.

AWS is like a car without seatbelts!

Recently, Evan Johnson from Cloudflare Inc wrote a blog Preventing the Capital One Breach.  That blog hits the nail on the head.

Let’s cut to the chase: The three biggest cloud providers are Amazon AWS, Microsoft Azure, and Google Cloud Platform (GCP).  All three have very dangerous instance metadata endpoints, yet Azure and GCP applications seem to never get hit by this dangerous SSRF vulnerability.  On the other hand, AWS applications continue to get hit over and over and over again.  Does that tell you something?

Microsoft is a company that learned about security the hard way. It took them a long time to understand that it is their responsibility to make products that are secure by default and hard for the user to misuse: putting the responsibility for security in the hands of the user is dangerous. A manifestation of this is their instance metadata endpoint, which has built-in protections to stop SSRF attacks:

[Image: Azure instance metadata endpoint protections]

I can’t say Google has learned security the hard way, instead I say they just hire a lot of smart people and their business clearly understands the importance of security to their business model.  Similar to Microsoft, they built SSRF defences into their instance metadata endpoint:

[Image: GCP instance metadata endpoint protections]

The point is that it is not sufficient for the attacker to be able to just control the URL: the attacker must also be able to set an HTTP header (Metadata: true for Azure, Metadata-Flavor: Google for GCP) in order to access the metadata endpoint. In most cases, that is outside of the attacker’s control, which makes SSRF a lot harder to exploit.
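A simplified Python model of why the header requirement matters (not either provider's actual implementation; metadata_endpoint_accepts is a hypothetical name). An SSRF typically lets the attacker choose the URL while the vulnerable application sends its own ordinary headers, so the required header is missing.

```python
from typing import Mapping

# Header each provider's metadata endpoint requires before answering.
REQUIRED_METADATA_HEADER = {
    "azure": ("Metadata", "true"),
    "gcp": ("Metadata-Flavor", "Google"),
}


def metadata_endpoint_accepts(provider: str, request_headers: Mapping[str, str]) -> bool:
    """Simplified model of the check; header names are compared case-insensitively."""
    name, value = REQUIRED_METADATA_HEADER[provider]
    sent = {k.lower(): v for k, v in request_headers.items()}
    return sent.get(name.lower()) == value


# A typical SSRF: the attacker controls the URL, but the vulnerable app sends its own headers.
ssrf_headers = {"User-Agent": "image-fetcher/1.0", "Accept": "*/*"}
print(metadata_endpoint_accepts("azure", ssrf_headers))   # False
print(metadata_endpoint_accepts("gcp", ssrf_headers))     # False
# Legitimate code running on the instance adds the header deliberately.
print(metadata_endpoint_accepts("gcp", {"Metadata-Flavor": "Google"}))  # True
```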

If Amazon had similar protections like Microsoft and Google, then it is unlikely that we would be talking about the Capital One security breach right now: it simply would not have happened.  So, why won’t Amazon put such protections in their metadata endpoint?  Because Amazon believes it is not their responsibility to make their services hard to abuse, instead it’s the customer’s responsibility to get everything perfect themselves.

And that’s where the seatbelt analogy comes in.  If these vendors were selling cars, the Microsofts and Googles would be selling the cars with seatbelts — understanding that you might crash, but they have designed the systems to reduce the impact to you.  Amazon on the other hand would be the vendor selling the car without seatbelts: if you crash, it’s your own fault.  If you die, don’t blame them.  They provided a car with a lot of nice features and if you read the manual and drove it exactly as you are expected to, you would have no problems.  But let’s be clear, if everything is not perfect, then you accept the consequences and they will let you know it’s your fault and not theirs.  See it in their own words:

“The intrusion was caused by a misconfiguration of a web application firewall and not the underlying infrastructure or the location of the infrastructure.  AWS is constantly delivering services and functionality to anticipate new threats at scale, offering more security capabilities and layers than customers can find anywhere else including within their own datacenters, and when broadly used, properly configured and monitored, offer unmatched security—and the track record for customers over 13+ years in securely using AWS provides unambiguous proof that these layers work.”

Concluding remarks

Amazon lacks security maturity. They do not understand key concepts that those seasoned in the industry have learned over many years of experience. Trying to suggest a firewall is the fix for an application security problem is fundamentally wrong. Relying on people to be experts at configuring firewalls to prevent attacks is a bad strategy: instead, they should learn from the Microsofts and Googles about how to build infrastructure that is less fragile and less dependent upon perfect users. That’s not to say that the other security controls do not have a place, but they need to understand that those are backup defences and not primary defences.

Hosting in AWS is like buying a car without seatbelts.  If your application gets hit as well, then maybe the stakeholders who pushed for AWS should shoulder the vast majority of the blame.  Next time security should be part of the decision making when choosing a cloud provider, which means both Azure and GCP should be preferred over AWS.

 

Don’t Underestimate Grep Based Code Scanning

Static analysis tools (SAST) are perhaps the most common tool for an AppSec team in the endless effort to move security to the left. They can be integrated into development pipelines in order to offer quick feedback to the developer to catch security bugs, resulting in faster remediation times and improved return on investment for developing secure software.

The SAST market is dominated by a small number of big players that charge high licencing fees for tools that do sophisticated analysis. One of the main features of such tools is data flow analysis, which traces vulnerabilities from source to sink, potentially across a number of files. More information about what such tools can do can be found in this Synopsys report. These tools can typically take hours or even days to complete a scan, and may use a large amount of memory. Furthermore, some of the tools have a large number of false positives, and potentially their own eccentricities. For organisations with deep pockets, the benefits outweigh the costs.

There are lower cost, more efficient SAST tools, but they typically lack sophistication in terms of the quality of security bug findings. Perhaps the most popular low-cost alternative is SonarQube, which is quite popular among developers. SonarQube does very quick scans, but lacks the data flow analysis capability that the more expensive tools have. As a consequence, each security finding is reported at a single location (there is no source-to-sink analysis).

In this blog, we’re going to talk about grep-based code scanning, which is an old-fashioned way of SAST scanning, but we argue that good grep-based scanning can do reasonably well compared to expensive SAST tools in terms of the quality of bugs found. While grep-based scanning in the form we present it cannot do data flow analysis, we claim that what we miss as a result is not extensive. We also show examples of things we found that our commercial tool at the time missed. While it is certainly possible for the commercial tools to build up their rule sets to catch everything we catch, it falls upon them to add those rules.

I have experience with three leading SAST tools, but considerable experience with only one of them. I’m not going to name any tools, but I do generally see room for improvement in them.

In order to make this blog the most beneficial to the readers, we will provide a starting pack of grep-based rule sets that we have used. The tools we have developed are not in a state that they can be open sourced, and we do not have the time to fix them up. However, the rules we provide open the door for somebody with that time availability to do so. It takes very little effort to build a PoC in this way.

This is joint work with Jack Healy.

Tool philosophy

False positives seem to be part of the game in the SAST market. While I have heard of one tool trying to put considerable focus into removing false positives, I have not experienced such features in the tools that I have worked with, at least not “out of the box.” As a consequence, many tools seem to require manual inspection of the findings to remove the junk and focus on the issues that seem real. In our grep-based scanner, we play by the same rules: we assume that the results are going to be inspected manually, and that the person running the tool has a basic understanding of security vulnerabilities and how to identify them in code. Thus, grep-based code scanning assumes a certain level of security competence and language expertise by the person running the code scan.

We take the philosophy that it is okay to have a large number of false positives if we are looking for something that is very serious and too often a problem, for example SQL injection. We take the philosophy that it is not okay to have a lot of false positives if the issue is not that serious (by CVSS rating). Generally, we are not trying to find everything, but instead trying to maximise the value we get out of our code scanning under a manual inspection time constraint. It’s worth spending extra time catching the big fish, not the small ones.

Depending upon how the tool is used, we may under some conditions want to report only things that are high confidence, and ignore everything else. A common example is in a CI/CD environment, where you want to break a build only if you are quite confident that something is wrong. So in our case, we have an option to only report very high confidence results. We also allow developers to flag false positives similar to how SonarQube does it, thus making it developer friendly.
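A minimal sketch of that reporting policy in Python. The confidence levels, the Finding record, and the appsec:ignore suppression marker are hypothetical choices for illustration (SonarQube, for comparison, uses a NOSONAR comment). The point is that CI mode only breaks the build on high-confidence findings that no developer has marked as a false positive.

```python
from dataclasses import dataclass
from typing import Iterable, List

CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}
SUPPRESS_MARKER = "appsec:ignore"  # hypothetical marker a developer adds to a reviewed line


@dataclass(frozen=True)
class Finding:
    path: str
    line_no: int
    line: str
    rule: str
    confidence: str  # "low" | "medium" | "high"


def reportable(findings: Iterable[Finding], min_confidence: str = "low") -> List[Finding]:
    """Drop findings below the threshold and lines a developer has flagged as false positives."""
    threshold = CONFIDENCE_RANK[min_confidence]
    return [
        f for f in findings
        if CONFIDENCE_RANK[f.confidence] >= threshold and SUPPRESS_MARKER not in f.line
    ]


def ci_should_fail(findings: Iterable[Finding]) -> bool:
    """In a pipeline, break the build only on high-confidence, unsuppressed findings."""
    return bool(reportable(findings, "high"))
```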

What do we lose by not having data flow analysis?

While the expensive tools have the advantage of data flow analysis, our belief is that we can get a lot of what they get without it. The biggest exception is cross-site scripting, which our grep-based scanner does not find. We consider alternative tooling such as DAST to be an option for finding this serious issue.

Another example of an important issue that we are unlikely to find is logging of sensitive information. That typically requires some type of data flow analysis: you’re almost never going to find code that does something as direct as Log(password).

SQL injection, on the other hand, we have found a number of times. The reason is that we report every SQL query we find. Yes, it’s noisy, but it takes a code reviewer at most 10 seconds of manual inspection to see whether there is a string concatenation in the query; if there is, it may be vulnerable. If not, the reviewer can quickly dismiss it as a false positive. Obviously, more sophisticated code scanning (for example, regular expression code scanning) can make this less noisy, but this is our starting point.
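The 10-second triage can be partly mechanised while still reporting every query. The sketch below is illustrative only: it uses the same keywords as the "sql, query(" rule in the starter pack, and a deliberately crude, assumed heuristic (a quote next to `+` or `%`, `.format(`, or an f-string) to attach a hint, not a verdict.

```python
import re

# Same keywords as the "sql, query(" rule in the starter pack.
SQL_RULE = re.compile(r"sql|query\(", re.IGNORECASE)
# Crude, assumed concatenation heuristic: quote then + or %, + then quote, .format(, or f"...".
CONCAT_HINT = re.compile(r"""["']\s*(\+|%)|\+\s*["']|\.format\(|\bf["']""")

def review_sql(lines):
    """Yield (line_no, line, hint) for every SQL-looking line; every match is still reported."""
    for no, line in enumerate(lines, start=1):
        if SQL_RULE.search(line):
            hint = "concatenation: check for injection" if CONCAT_HINT.search(line) else ""
            yield no, line.strip(), hint

sample = [
    'sql = "SELECT * FROM users WHERE id = " + user_id',
    'rows = db.query("SELECT * FROM users WHERE id = ?", [user_id])',
    'print("hello")',
]
for finding in review_sql(sample):
    print(finding)
```

The parameterised query on the second line is still listed, which keeps the "report everything" philosophy; the hint just tells the reviewer where to look first.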

Certainly, tools with data flow analysis are more efficient for something important like SQL injection. However, the benefit of that efficiency is often lost when weighed against how much manual time is lost on false positives for issues that are unimportant and/or low confidence. Furthermore, some tools do not do so well at rating the seriousness of a vulnerability: they can be over-zealous in calling too many issues critical or high risk (though this may be configurable). Because of this, using our tool, at least for us, seems to require similar manual effort to some of the popular commercial tools despite the simplicity of its design.

Examples of things we found that our main commercial tool missed

Because of the simplicity of our design, and because we assume that a competent reviewer is going to look at the code, we were able to put in rules that help the reviewer find things that big commercial tools tend to miss.

One very fruitful area for finding security bugs is bad crypto. As mentioned in Top 10 Developer Crypto Mistakes, bad crypto is more prevalent than good crypto. Part of the reason for this is bad crypto APIs. Another part of the reason is that the tools don’t help developers catch mistakes. Vendors of these tools could find a lot more problems if they were to listen to cryptographers. Our tool flags any use of “crypt” and any other keywords that are indicative of bad cryptography, such as rc4, arcfour, md5, sha1, TripleDES, etc. We go into more detail about crypto bugs in the rule sets.

Another issue that we find surprisingly often, and that our main commercial tool missed, is http:// URLs. Of course, it should always be https://. Similarly, one can look for ftp:// and other insecure protocols.

There are also a number of dangerous functions that we flag, and these often open up discussions about coding practices, for example the Angular trustAsHtml or React dangerouslySetInnerHTML. In our case, we had to educate the developers on the right approach to handling untrusted data, because they thought it was safe to sanitise data before putting it in the database and then use trustAsHtml upon display. We did not agree with this coding style and worked to change it.

One thing we always like to look at is MVC controller functionality, because sometimes that’s where input validation should be applied (at other times you might do it in the model). We often find a lack of input validation, which does not necessarily imply a vulnerability but does show the need for improved defensive programming. Even more interesting, one time when we looked at a controller, we found that it was obviously not doing the access control checking that it should have been doing, and this was one of our most significant findings. Normally access control problems are not something you would find with a SAST tool, but our different approach to using the tool made it work for us.

Sample rules: starter pack

Rules can go on forever, so here I want to focus on some of the more fruitful ones to get you started. Generally we do a case-insensitive grep to find the keyword, and then we grep out (“grep -v”) things that matched by chance but are not actually the keyword we are looking for. We call these “exceptions” to our rules. For example, “3des” is indicative of the insecure triple DES encryption algorithm, but we wouldn’t want to pull in a variable like “S3Destination”, which our developers might use for Amazon S3 storage (nothing to do with triple DES). We learn the exceptions over time.
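The keyword-plus-exceptions pattern is the equivalent of `grep -i keyword | grep -vi exception`. A minimal sketch of that pipeline in Python, using the “3des” rule and the “S3Destination” exception from the text; the `make_rule` helper is a hypothetical name for illustration.

```python
import re

def make_rule(keywords, exceptions=()):
    """Case-insensitive keyword match, minus lines matching any exception (grep -i | grep -vi)."""
    hit = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    miss = (re.compile("|".join(re.escape(e) for e in exceptions), re.IGNORECASE)
            if exceptions else None)

    def matches(line):
        return bool(hit.search(line)) and not (miss and miss.search(line))

    return matches

# The triple DES rule from the starter pack, with the exception learned from our code base.
triple_des = make_rule(["3des", "des3", "TripleDES"], exceptions=["S3Destination"])

print(triple_des("var alg = TripleDES.Create();"))
print(triple_des("upload(S3Destination, file)"))
```

Like `grep -v`, an exception drops the whole line, so a line containing both “S3Destination” and a real triple DES call would be missed. That is the price of keeping the rule quiet, and it is one reason exceptions should stay narrow.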

A very common grep rule that we need to make exceptions for is “http:”. While we are looking for insecure transport layer communications, “http:” is also used in XML namespaces that have nothing to do with transport layer security. This means we have to grep out things like “xmlns”. Also, writing logic to ignore commented code helps a lot; otherwise you will pull in many false positives, such as license text (http://www.apache.org/licenses) or Stack Overflow links, which often occur in comments.
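The “http:” rule with its “xmlns” exception and rough comment skipping might look like the sketch below. The comment handling is an assumption, deliberately simplistic: it skips lines that start with a common comment marker and strips trailing `//` comments, taking care not to treat the `//` in `http://` as a comment.

```python
import re

HTTP = re.compile(r"http:", re.IGNORECASE)
EXCEPTIONS = re.compile(r"xmlns", re.IGNORECASE)  # XML namespaces are identifiers, not transport
FULL_LINE_COMMENT = re.compile(r"^\s*(//|#|/\*|\*|<!--)")
TRAILING_COMMENT = re.compile(r"(?<!:)//.*")  # a '//' not preceded by ':', so http:// survives

def insecure_http(line):
    """Flag http: in code, ignoring XML namespaces and (roughly) commented-out text."""
    if FULL_LINE_COMMENT.match(line):
        return False
    code = TRAILING_COMMENT.sub("", line)
    return bool(HTTP.search(code)) and not EXCEPTIONS.search(code)

for line in [
    'const api = "http://api.example.com/v1";',
    '<beans xmlns="http://www.springframework.org/schema/beans">',
    '// Licensed under http://www.apache.org/licenses/LICENSE-2.0',
    'x = 1; // see http://stackoverflow.com/q/1',
]:
    print(insecure_http(line), line)
```

Only the first line is flagged. A line-based approach like this can be fooled by multi-line comments and by `*` at the start of code lines, which is acceptable for a scanner whose output is reviewed manually anyway.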

It may look like we omitted some obvious rules, but sometimes there are subtle reasons. For example, even though we have a lot of crypto rules, we don’t search for “ECB” (the insecure electronic code book mode of operation) simply because it is too noisy. The characters “E”, “C”, and “B” are all hexadecimal digits, and we often got false positives from hexadecimal values in code. Such a rule was deemed not fruitful enough to keep (as explained in our design philosophy above).

We also try to group our rules by language to reduce false positives, although some rules apply to all languages. In our case we are primarily working on web software, so we don’t have rule groups for languages like C/C++.
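Grouping by language can be as simple as mapping a file’s extension to a rule group and always adding the “all” group. The extension table below is a hypothetical mapping for illustration; the text only says that rules are grouped by language. Platform groups such as “android” would need a project-level signal rather than a file extension.

```python
from pathlib import Path

# Hypothetical extension-to-group mapping; adjust to the languages in your code base.
EXTENSIONS = {
    ".java": "java", ".jsp": "jsp", ".cs": "c#", ".js": "javascript", ".jsx": "javascript",
    ".ts": "javascript", ".py": "python", ".m": "iOS", ".swift": "iOS", ".sql": "sql",
}

def languages_for(path):
    """Return the rule groups that apply to a file: its language (if known) plus 'all'."""
    lang = EXTENSIONS.get(Path(path).suffix.lower())
    return {"all", lang} if lang else {"all"}

def applicable(rules, path):
    """Select rules whose language set intersects the file's groups."""
    groups = languages_for(path)
    return [r for r in rules if r["languages"] & groups]

RULES = [
    {"grep": "eval(", "languages": {"javascript"}},
    {"grep": "readObject(", "languages": {"java"}},
    {"grep": "http:", "languages": {"all"}},
]
print([r["grep"] for r in applicable(RULES, "src/app.js")])
```

Running `eval(` only against JavaScript files, for instance, avoids matching an unrelated `eval(` helper in another language.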

Below is the starter pack of rules. Some rules are clearly more noisy than others — people can pick and choose the ones they want to focus on.

Grep string | Look for | Languages
password, passwd, credential, passphrase | hardcoded passwords, insecure password storage, insecure password transmission, password policy, etc. | all
sql, query( | SQL injection (string concatenation) | all
strcat, strcpy, strncat, strncpy, sprintf, gets | dangerous C functions used in iOS | iOS
setAllowsAnyHTTPSCertificate, validatesSecureCertificate, allowInvalidCertificates, kCFStreamSSLValidatesCertificateChain | disables TLS certificate checking | iOS
crypt | hardcoded keys, fixed IVs, confusing encryption with message integrity, hardcoded salts, crypto soup, insecure mode of operation for symmetric cipher, misuse of a hash function, confusing a password with a crypto key, insecure randomness, key size too small. See Top 10 Developer Crypto Mistakes | all
CCCrypt | IV is not optional (Apple API documentation is wrong) if security is required | iOS
md5, sha1, sha-1 | insecure, deprecated hash function | all
3des, des3, TripleDES | insecure, deprecated encryption function | all
debuggable | do not ship debuggable code | android
WRITE_EXTERNAL_STORAGE, sdcard, getExternalStorageDirectory, isExternalStorageWritable | check that sensitive data is not being written to insecure storage | android
MODE_WORLD_READABLE, MODE_WORLD_WRITEABLE | should never make files world readable or writeable | android
SSLSocketFactory | dangerous functionality: insecure API, easy to make mistakes | java
SecretKeySpec | verify that crypto keys are not hardcoded | java
PBEParameterSpec | verify salt is not hardcoded and iteration count is at least 10,000 | java
PasswordDeriveBytes | insecure password-based key derivation function (PBKDF1) | c#
rc4, arcfour | deprecated, insecure stream cipher | all
exec( | remote code execution if user input reaches it | java
eval( | remote code execution if user input reaches it | javascript
http: | insecure transport layer security, need https: | all
ftp: | insecure file transfer, need ftps: | all
ALLOW_ALL_HOSTNAME_VERIFIER, AllowAllHostnameVerifier | hostname verification disabled | java
printStackTrace | should not output stack traces (information disclosure) | java, jsp
readObject( | potential deserialization vulnerability if input is untrusted | java
dangerouslySetInnerHTML | dangerous React functionality (XSS) | javascript
trustAsHtml | dangerous Angular functionality (XSS) | javascript
Math.random( | not cryptographically secure | javascript
java.util.Random | not cryptographically secure | java
SAXParserFactory, DOM4J, XMLInputFactory, TransformerFactory, javax.xml.validation.Validator, SchemaFactory, SAXTransformerFactory, XMLReader, SAXBuilder, SAXReader, javax.xml.bind.Unmarshaller, XPathExpression, DOMSource, StAXSource | vulnerable to XXE by default | java
controller | MVC controller functionality: check for input validation | c#, java
HttpServletRequest | check for input validation | java
request.getParameter | check for input validation | jsp
exec | dynamic SQL: potential for SQL injection | sql
getAcceptedIssuers | if null is returned, then TLS host name verification is disabled | android
isTrusted | if it returns true, then TLS validation is disabled | java
trustmanager | could be used to skip certificate checking | java
ServerCertificateValidationCallback | if it returns true, then TLS validation is disabled | c#
checkCertificateName | if set to false, then hostname verification is disabled | c#
checkCertificateRevocationList | if set to false, then CRLs are not checked | c#
NODE_TLS_REJECT_UNAUTHORIZED | certificate checking is disabled | javascript
rejectUnauthorized, insecure, strictSSL, clientPemCrtSignedBySelfSignedRootCaBuffer | certificate checking may be disabled | javascript
NSExceptionDomains, NSAllowsArbitraryLoads, NSExceptionAllowsInsecureHTTPLoads | allows http instead of https traffic | iOS
kSSLProtocol3, kSSLProtocol2, kSSLProtocolAll, NSExceptionMinimumTLSVersion | allows insecure SSL communications | iOS
public-read | publicly readable Amazon S3 bucket: make sure no confidential data is stored | all
AWS_KEY | look for hardcoded AWS keys | all
urllib3.disable_warnings | certificate checking may be disabled | python
ssl_version | can be used to allow insecure SSL communications | python
cookie | make sure cookies have the Secure and HttpOnly attributes set | all
kSecAttrAccessibleAlways | insecure keychain access | iOS
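To turn the starter pack into a working PoC, the rows can be stored as data and applied line by line. A minimal sketch, assuming a hypothetical `Rule` record and transcribing only a handful of rows; exceptions and comment handling are left out to keep it short.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    keywords: tuple
    look_for: str
    languages: frozenset

# A few rows transcribed from the starter pack; extend with the rest as needed.
STARTER_PACK = [
    Rule(("md5", "sha1", "sha-1"), "insecure, deprecated hash function", frozenset({"all"})),
    Rule(("rc4", "arcfour"), "deprecated, insecure stream cipher", frozenset({"all"})),
    Rule(("eval(",), "remote code execution if user input reaches it", frozenset({"javascript"})),
    Rule(("NODE_TLS_REJECT_UNAUTHORIZED",), "certificate checking is disabled", frozenset({"javascript"})),
    Rule(("Math.random(",), "not cryptographically secure", frozenset({"javascript"})),
]

def scan_text(text, language):
    """Return (line_no, look_for) pairs for every rule keyword found (case-insensitive)."""
    findings = []
    for no, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        for rule in STARTER_PACK:
            if not rule.languages & {"all", language}:
                continue  # rule belongs to a different language group
            if any(k.lower() in lowered for k in rule.keywords):
                findings.append((no, rule.look_for))
    return findings

source = 'const h = crypto.createHash("md5");\nconst n = Math.random();\nconsole.log("ok");'
for no, look_for in scan_text(source, "javascript"):
    print(f"line {no}: {look_for}")
```

Keeping the “look for” text with each rule means the reviewer sees why a line was flagged, which matters more here than in a data-flow tool because every finding is inspected by hand.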

Collection of References on Why Password Policies Need to Change

Organisations like NIST and the UK National Cyber Security Centre (NCSC) are pushing password security policies that are very different from those of the past. Most notably, password expiry and character composition rules are being dropped and replaced by other, more user-friendly recommendations. Despite their efforts, many organisations are very slow to adopt the modern guidance, and instead keep password policy practices that are characteristic of 2004 guidance. If you’d like to work towards change in your organisation, it helps to have a useful set of references to pass on to those who write the policies.

Is it worth trying to make the change? In my judgement, an important part of building a positive security culture is weighing pain against value in every decision you make. A lot of historical password security guidance is high in pain and, according to what we have since learned, offers little to no security value; sometimes it causes more harm than good in ways that policy makers never anticipated. We must also remember that availability is one of the three pillars of security: when users get locked out of their accounts for reasons that can at least partially be attributed to password security policies, we are not putting the best face of security forward. This is especially true when there are better ways to do things. Security policy writers need to keep their knowledge up to date.

This blog contains information from various sources, including research papers, reports/white papers, popular blogs, government websites, and news organisations. Publication dates are included so readers can keep in mind that more recent publications are based on more recent knowledge. While there are many existing blogs on password security, the main contributions here are:

  1. Assembling a large number of modern sources together,
  2. Providing compact summaries of why the sources are relevant to the purpose of making password policy change within an organisation,
  3. Including research publications that show where new recommendations are derived from (thus going beyond “appeal to authority” arguments — the research is there for anybody who cares to check it themselves).

Overview of modern password security guidance

  • (WSJ news article – requires subscription, 2017) The Man Who Wrote Those Password Rules Has a New Tip: N3v$r M1^d!. This article talks about how the author of the classic NIST document that proposed composition rules and similar guidance (NIST Special Publication 800-63 Appendix A) now rejects those recommendations. Those recommendations seemed reasonable back then, but based on what we have learned since, they no longer serve their original intent. The new recommendations balance security with usability needs.
  • (Naked Security news article, 2016) NIST’s new password rules – what you need to know. This is a compact summary of the changes to NIST’s password security guidance. Because the article is short, it is a good way to open up the conversation with people who have limited time or attention span. It tells what policy makers need to change to be in compliance with NIST guidelines.
  • (Troy Hunt blog, 2017) Passwords Evolved: Authentication Guidance for the Modern Era. This is a long but very well written article about the changes to NIST’s password guidance and why the changes are being made. It also draws on the UK government’s guidance and Microsoft’s guidance to strengthen the argument. The section “Listen to Your Governments (and Smart Tech Companies)” makes a great appeal-to-authority argument for why companies should take these recommendations seriously. Overall, it is a good read for those who have the time and interest.

The new recommendations — original sources

  • (NIST publication, 2017) NIST Special Publication 800-63B. This is right from the horse’s mouth, but it’s not a document to start the conversation with, because it is large and uses terminology that will take many readers time to interpret. It is nevertheless the proof you will need if people question whether NIST is really making such a recommendation. For example, the requirement not to impose composition rules on passwords (“memorized secrets”) is given in section 10.2.1. That section also says these secrets should not be required to change periodically. The requirement not to allow hints for recovering passwords is given in section 5.1.1.2. The requirement not to allow secret questions for account recovery is easier to find in NIST’s FAQ.
  • (NIST web page FAQ, 2019) NIST Special Publication 800-63: Digital Identity Guidelines Frequently Asked Questions. This is an easier place to start than the NIST original publication, as it is more human readable. It gets right to the point on questions like why composition rules are no longer recommended, why password expiry is no longer recommended, and why secret questions for account recovery are no longer permitted.
  • (NCSC publication, 2018) Password administration for system owners. These recommendations are similar in nature to the NIST recommendations. See the sections “Don’t enforce regular password expiry” and “Do not use complexity requirements”. Also worth pointing out are “Reduce your organisation’s reliance on passwords” (we have too many passwords already) and “Implement technical solutions”, which includes guidance on other ways to help protect user passwords without locking users out of their accounts long-term.
  • (NCSC infographic, 2018) NCSC Password Policy Advice for system owners. A simple graphic that shows the risks and how to help improve password security within an organisation. The right half, in purple, gives password policy recommendations. (Unfortunately, while most of the infographic is good, it contains one serious blunder: it recommends SHA-256 for password hashing. SHA-256 was not designed for this type of operation. Those who know the cryptography understand this; those who don’t can get the understanding from an old Troy Hunt article.)
  • (NCSC blog, 2016) The problems with forcing regular password expiry. A simple blog that explains why password expiry causes more harm than good. In short, users choose passwords similar to their previous ones and only minimally meet the complexity requirements, in an attempt to create a password they can memorise. Users also forget their passwords more often, which imposes a productivity cost on the organisation.
  • (Microsoft publication, 2016) Microsoft Password Guidance. This report is great, because it is easy to read and gets right to the point. On the first page it enumerates 7 recommendations to system administrators, which include things not to do (do not enforce password expiry or composition rules) and what to do (8-character minimum password length, ban common passwords, etc.). Further in, the document goes into more detail and includes references to research reports that justify why the changes are being made. Overall, the clarity, simplicity, and completeness (especially the research references) of this publication make it a top source to refer others to. However, because it predates the NIST and NCSC changes, it lacks references to NIST’s new guidelines. It also lacks details on how organisations can implement the risk-based multi-factor authentication that it recommends.
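For policy writers who want to see how small the modern rule set is, here is a minimal sketch of a password acceptance check in the spirit of the guidance above: an 8-character minimum, a ban on common passwords, and no composition rules or expiry. The tiny blocklist is an illustrative stand-in; a real deployment would check against a large list of common and breached passwords.

```python
# Illustrative stand-in for a large list of common/breached passwords (stored lower case).
COMMON_PASSWORDS = {"password", "password1", "password!1", "123456", "12345678", "qwerty", "letmein"}

MIN_LENGTH = 8  # the minimum length recommended by NIST SP 800-63B and Microsoft's guidance

def check_new_password(candidate):
    """Return a list of reasons to reject the password; an empty list means accept."""
    problems = []
    if len(candidate) < MIN_LENGTH:
        problems.append(f"must be at least {MIN_LENGTH} characters")
    if candidate.lower() in COMMON_PASSWORDS:
        problems.append("is a commonly used password")
    # Deliberately absent: composition rules (digits, upper case, symbols) and expiry dates.
    return problems

print(check_new_password("correct horse battery staple"))
print(check_new_password("Password!1"))
```

Note that “Password!1” would satisfy a typical composition policy, yet a blocklist rejects it, which mirrors the research findings discussed below.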

Research on why password policies need to change

  • (Research paper, 2010) The Security of Modern Password Expiration: An Algorithmic Framework and Empirical Analysis. The main goal of password expiry is to limit the amount of time an attacker has access to an account in the event of password compromise. This paper questions whether that goal is met by analysing a dataset of 7700 accounts to determine whether knowing one password allowed recovering other passwords for that user. The fallacy behind password expiry recommendations is that security administrators assume users choose each password randomly and independently of their other passwords, but in reality the vast majority of users don’t. The paper shows that even for websites where users have an incentive to protect their accounts (for example, one that holds payroll data), new user passwords tend to be strongly related to previous ones. The authors found that they could derive 17% of new passwords within 5 guesses, and 41% of new passwords within seconds in an offline attack, with little effort. The authors write:

    “Combined with the annoyance that expiration causes users, our evidence suggests it may be appropriate to do away with password expiration altogether, perhaps as a concession while requiring users to invest the effort to select a significantly stronger password than they would otherwise (e.g., a much longer passphrase).”

  • (Research paper, 2010) Testing Metrics for Password Creation Policies by Attacking Large Sets of Revealed Passwords. When NIST wrote their 2004 Special Publication 800-63 with its password policy recommendations, they included entropy estimates for how strong passwords complying with the policy would be. This research cracks passwords from sets of real user data to show that those estimates are far too high. Part of the reason is that users tend to follow similar patterns when forced to comply with a password policy. For example, when required to use digits, users tend either to choose all digits or to put the digits at the end of the password (nearly 85%); when required to use an upper case letter, users tend to use either all upper case or only an upper case first letter (89%); and when required to use a special character, users tend to put it at the end of the password (28.5%). An attacker can use known human behaviour to increase his chances of success in password cracking attacks. The paper notes that using a password blacklist of 50,000 passwords helps significantly, but not as much as NIST predicted. The authors conclude:

    “Our findings were that … most common password creation policies remains vulnerable to online attack. This is due to a subset of the users picking easy to guess passwords that still comply with the password creation policy in place, for example ‘Password!1’”

  • (Research paper, 1999) Users are not the Enemy. This paper surveys a large number of users to understand the problems they have with password security. While it is true that many users do not take the responsibility they should for protecting their accounts, the study also finds that part of the problem is the difficulty users have in complying with password security policies. Users have a large number of passwords that need to change regularly, each with different complexity requirements, which makes it much more likely that users will do things they should not do just so they do not lose access to their accounts. The section “Security needs user-centered design” notes that:

    “Many of these [security] mechanisms create overheads for users, or require unworkable user behavior. It is therefore hardly surprising to find, that many users try to circumvent such mechanisms.”

    The paper concludes with a set of recommendations to help users with passwords. However, this paper is old (from 1999), so its recommendations reflect the knowledge of the time.

  • (Research paper, 2010) The True Cost of Unusable Password Policies: Password Use in the Wild. The authors note that users are generally cogent in their understanding of security needs, but found compliance with password policies too difficult. To cope with security demands, users developed their own strategies, which ended up introducing their own problems. As a consequence, the complexity of complying with security policies has an adverse effect on the security posture of the organisations. For example, one user dealt with a policy that forced him to change his password so that it was not similar to any of his 12 previous passwords by simply writing the password down so he would not forget it. In the section Towards Holistic Password Policies, the authors note that looking only at the technical side and ignoring the user side does not encourage security awareness; instead, it introduces problems that antagonise users. Password policies need to be designed for the context in which users use the systems, with an emphasis on eliminating the risks they are likely to face in that context.
  • (Research paper, 2007) Do Strong Web Passwords Accomplish Anything? There are many ways that an attacker may go after a user, and password policies only address defence against brute-force attacks. They do not address phishing, key logging, shoulder surfing, insecure password storage on local machines, or guessing based on special knowledge about the user. Although strong passwords do help in some cases, other, more user-friendly security controls could be put in place of a complex password policy, and these controls make strong password policies less important. The paper argues:

    “Since the cost is borne by the user, but the benefit is enjoyed by the bank user resistance to stronger passwords is predictable. We argue that there are better means of addressing brute-force bulk guessing attacks.”

    However, this paper is from 2007. See the next item, which is related but more recent:

  • (Blog, 2019) Your Pa$$word doesn’t matter. This blog unfortunately has a misleading title that caused many readers to misjudge it before reading it. It is not telling users not to choose strong passwords; instead, it says that password complexity policies have little benefit when considering the various ways that passwords are attacked. In spirit it is similar to the previous item, but this research is more modern. It tells system administrators and security policy writers:

    “Focusing on password rules, rather than things that can really help – like multi-factor authentication (MFA), or great threat detection – is just a distraction.”

  • (Research paper, 2015) Quantifying the Security Advantage of Password Expiration Policies. The abstract gets right to the point:

    “Many security policies force users to change passwords within fixed intervals, with the apparent justification that this improves overall security. However, the implied security benefit has never been explicitly quantified. In this note, we quantify the security advantage of a password expiration policy, finding that the optimal benefit is relatively minor at best, and questionable in light of overall costs.”

    This paper does a mathematical analysis of the benefit of password expiry policies. Setting aside the possibility of new passwords being related to old ones, the paper instead considers an attacker who keeps trying to guess the user’s password in an online attack, even after the password might have changed without the attacker knowing that the change happened. In essence, the authors show that the attacker’s chance of success is not much different from what it would be with no password expiry policy in place, and this holds even if passwords are chosen randomly (most users don’t choose them randomly, which makes attacks easier still). The authors conclude by challenging those who favour such policies to explain why, and in which specific circumstances, a substantial benefit is evident.

  • (Research paper, 2009) It’s no secret: Measuring the security and reliability of authentication via ‘secret’ questions. Historically, requiring users to provide answers to secret questions upon registration was a way to do account recovery in the event of a forgotten password. This paper analyses these practices and finds that 20% of users forget their own answers to secret questions within 6 months, acquaintances can guess the answers to their secret questions 17% of the time, and 13% of the time an attacker could guess an answer within 5 attempts by trying the most popular answers. The authors conclude that these questions are neither reliable nor secure enough. They propose a number of options to improve secret questions or to provide alternative backup authenticators, but ultimately this paper, more than any other, led to the removal of secret questions for account recovery.
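The first study’s central point, that a new password can often be derived from the old one, is easy to see with a toy version of the attacker’s approach. The transformations below (incrementing a trailing number, toggling the case of the first character, appending “!”) are assumptions for illustration, not the transform set the paper used, and the sketch makes no claim to reproduce its success rates.

```python
import re

def likely_successors(old):
    """Generate a few guesses for a user's next password from the previous one."""
    guesses = []
    m = re.search(r"(\d+)$", old)
    if m:  # "Summer2019" -> "Summer2020"; zfill keeps zero padding such as "07" -> "08"
        digits = m.group(1)
        guesses.append(old[: m.start()] + str(int(digits) + 1).zfill(len(digits)))
    else:
        guesses.append(old + "1")
    guesses.append(old[:1].swapcase() + old[1:])  # toggle the case of the first character
    guesses.append(old + "!")
    return list(dict.fromkeys(guesses))  # drop duplicates, keep order

print(likely_successors("Summer2019"))
```

An online attacker who already holds an expired password only needs a handful of guesses like these, which is why forced expiry buys much less than its cost suggests.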

Other references

  • (Microsoft blog, 2019) Security baseline (FINAL) for Windows 10 v1903 and Windows Server v1903. Periodic password expiration is no longer part of Microsoft’s security baseline for Windows 10 and Windows Server. The blog writes:

    “Periodic password expiration is an ancient and obsolete mitigation of very low value, and we don’t believe it’s worthwhile for our baseline to enforce any specific value. By removing it from our baseline rather than recommending a particular value or no expiration, organizations can choose whatever best suits their perceived needs without contradicting our guidance. At the same time, we must reiterate that we strongly recommend additional protections even though they cannot be expressed in our baselines.”

  • (FTC blog, 2016) Time to rethink mandatory password changes.  This is from a former Chief Technologist of the US Federal Trade Commission. The author explains why requiring users to change their passwords does more harm than good, and includes a long list of references like many included here to back up the argument. While there may be reasons for you to change your password (examples given), requiring regular changes in a password policy is not necessarily good practice:

    “Research suggests frequent mandatory expiration inconveniences and annoys users without as much security benefit as previously thought, and may even cause some users to behave less securely. Encouraging users to make the effort to create a strong password that they will be able to use for a long time may be a better approach for many organizations…”

Concluding remarks

The shortcomings of legacy password policies are well documented; however, there are two distinct philosophies for moving forward. On one hand, there remains the “fix the user” philosophy, which pushes education and password managers as the main mechanisms for protecting user accounts. The alternative approach is to design systems that are less reliant on passwords as the sole determinant for authentication, which is the approach that Google and Microsoft seem to be taking (related: see Protecting User Accounts When Usability Matters). I honestly believe that both philosophies have a place going forward. The problem today is that there is far too much emphasis on the former and not enough effort put into the latter. We can’t always count on people to do the right thing, but there are often things we can do to protect them when they are negligent. Putting the burden of security complexity on the user should be the last fallback option, not the default.

Protecting User Accounts When Usability Matters

Scenario: Password guessing attacks are happening on your website. The attacker is performing password spraying: he tries a single password for a user, and if it fails, he moves on to the next user. The attacker is also changing his IP address, including the use of IP addresses that are geolocated where many legitimate users come from.

Because of the attacker’s tactics, blocking IP addresses and account lockouts won’t work. You also work for a business that is very sensitive to security controls that impact usability. Captchas, two-factor authentication, and stronger password security policies are rejected by the business.

What can you do?

This blog is about a security control that largely prevents these types of password guessing attacks with minimal usability impact. We call it One-Time Two Factor Authentication (OT2FA). It’s a simple idea derived from a number of sources, including:

Revisiting Defenses Against Large-Scale Online Password Guessing Attacks by Mansour Alsaleh, Mohammad Mannan, and P.C. van Oorschot.
Securing Passwords Against Dictionary Attacks by Benny Pinkas and Tomas Sander.
Enhanced Authentication In Online Banking by Gregory D. Williamson.

OT2FA should not be considered new, as it has strong similarities to what companies like Google and LastPass are doing (though their implementation details are unpublished). However, too many websites use alternatives that are both less secure and less user friendly.

One-Time Two Factor Authentication

Two-factor authentication (2FA) is an effective security control for preventing password guessing attacks, but it has a large usability impact: users don’t like being challenged for a one-time secret code every time they log in. Businesses trying to acquire new users to win market share are averse to security controls that annoy users.

But what if one could get close to the security of 2FA with little usability impact? This is what OT2FA aims to accomplish.

The idea is simple: the first time a user logs in from a new user agent (i.e. browser or client software), require them to prove their identity via two factors: their password, and a secret code that is emailed or texted to them (note: email is preferable, as SMS security is known to be weak). When the user succeeds in proving their identity, send a digitally signed token, such as a JWT, back to the user agent: “User X signed in from this device before.” Instead of the actual username, something like a UUID should be used for the identity, which is tied to the username inside the server database. This token, called an OT2FA token, marks that user agent as trusted for that user. For web browsers, it is particularly convenient to store it in a cookie.

The next time the same user logs in from that user agent, they only need to provide the username and password; the OT2FA token is sent up transparently as the second-factor proof of identity. The server authenticates the user by verifying that the username and password are correct, that the digital signature on the OT2FA token is valid, and that the identity in the OT2FA token maps to the username provided. If all checks pass, access is granted without a 2FA challenge. In other words, the user sees the second-factor challenge only the first time he logs in from a particular user agent, and never again on that user agent. Hence the name: One-Time Two Factor Authentication.
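The login decision described above can be sketched as follows. This is a hypothetical, self-contained Python illustration: an in-memory `USERS` dict stands in for the database, passwords are hashed with PBKDF2 (`hashlib.pbkdf2_hmac`), and the token is HMAC-tagged rather than signed with a public-key algorithm. The second factor is only considered once the password has been verified, and a missing or foreign token yields `"challenge"` (send a one-time code) rather than an outright denial.

```python
import base64
import hashlib
import hmac
import json
import os
import uuid
from typing import Dict, Optional, Tuple

SIGNING_KEY = os.urandom(32)  # hypothetical server-side signing secret
# username -> (user_id, salt, password_hash); stands in for the user database
USERS: Dict[str, Tuple[str, bytes, bytes]] = {}


def _mac(payload: str) -> str:
    digest = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode()


def issue_token(user_id: str) -> str:
    payload = base64.urlsafe_b64encode(json.dumps({"sub": user_id}).encode()).decode()
    return f"{payload}.{_mac(payload)}"


def token_subject(token: Optional[str]) -> Optional[str]:
    """User id carried by a genuine token, else None."""
    if not token or token.count(".") != 1:
        return None
    payload, tag = token.split(".")
    if not hmac.compare_digest(tag.encode(), _mac(payload).encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))["sub"]


def _hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def register(username: str, password: str) -> str:
    """Create an account; new users get an OT2FA token by default."""
    salt = os.urandom(16)
    user_id = str(uuid.uuid4())
    USERS[username] = (user_id, salt, _hash_password(password, salt))
    return issue_token(user_id)


def login(username: str, password: str, token: Optional[str]) -> str:
    """Return 'granted', 'challenge' (send a one-time code), or 'denied'."""
    record = USERS.get(username)
    if record is None:
        return "denied"
    user_id, salt, stored_hash = record
    if not hmac.compare_digest(_hash_password(password, salt), stored_hash):
        return "denied"
    # Password is correct; the token acts as the transparent second factor.
    if token_subject(token) == user_id:
        return "granted"
    return "challenge"
```

A token belonging to a different user still leads to `"challenge"`, because the identity inside the token must map to the username being logged into.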

This protection is not perfect, but it is a huge improvement in security at little cost to the user. Crucially, it stops large-scale password guessing attacks: an attacker can only succeed against a user if he knows not only the username and password but can also crack the second factor or somehow obtain the user’s OT2FA token. If we assume that no attacker is going to collect a large number of users’ OT2FA tokens (discussed further in the Potential Concerns section below), then large-scale password guessing attacks are stopped.

For emphasis, OT2FA is designed only to prevent password guessing attacks against users. There is no need to challenge the user for a second factor when they sign up for an account – new users should get an OT2FA token by default.

Below, we discuss enhancements and then potential security concerns, but let’s first review where we are. In the context of password guessing attacks, username/password-only is low security but very usable. Two-factor authentication is high security but scores low on the usability scale. OT2FA is not as secure as 2FA nor as user-friendly as username/password-only, but it is not bad in either category, and could arguably be considered good in both. Therefore OT2FA is a realistic security option for websites built with a strong emphasis on usability.

Enhancements

There are many directions in which one can enhance this idea.

For example, by including a unique identifier in each OT2FA token and storing the corresponding value in the database, you can give users the option to revoke OT2FA tokens living on devices / user agents that should no longer be trusted. Although you cannot make the OT2FA token itself disappear from the device, you can enforce on the server side that the specific OT2FA token being sent up has not been revoked by the user. Some subtleties are discussed in Footnote 1 at the bottom.
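One way to add revocation, sketched under assumptions: each token gets a random id in a `jti` claim (the name JWT uses for a token identifier), and the server keeps the set of non-revoked ids per user. `ACTIVE_TOKEN_IDS`, `issue_token`, `verify_token` and `revoke` are hypothetical names, a dict of sets stands in for a database table, and HMAC-SHA256 stands in for a digital signature.

```python
import base64
import hashlib
import hmac
import json
import os
import uuid
from typing import Dict, Optional, Set, Tuple

SIGNING_KEY = os.urandom(32)  # hypothetical server-side signing secret
# user_id -> ids of tokens the user has not revoked (stands in for a database table)
ACTIVE_TOKEN_IDS: Dict[str, Set[str]] = {}


def _mac(payload: str) -> str:
    digest = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode()


def issue_token(user_id: str) -> Tuple[str, str]:
    """Return (token, token_id); the id is recorded so it can be revoked later."""
    token_id = str(uuid.uuid4())
    ACTIVE_TOKEN_IDS.setdefault(user_id, set()).add(token_id)
    claims = {"sub": user_id, "jti": token_id}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    return f"{payload}.{_mac(payload)}", token_id


def verify_token(token: str) -> Optional[str]:
    """Return the user id if the tag is valid AND the token has not been revoked."""
    if token.count(".") != 1:
        return None
    payload, tag = token.split(".")
    if not hmac.compare_digest(tag.encode(), _mac(payload).encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if claims["jti"] not in ACTIVE_TOKEN_IDS.get(claims["sub"], set()):
        return None
    return claims["sub"]


def revoke(user_id: str, token_id: str) -> None:
    """Called when the user marks a device / user agent as no longer trusted."""
    ACTIVE_TOKEN_IDS.get(user_id, set()).discard(token_id)
```

A revoked token still carries a valid tag, so it is the server-side lookup, not the cryptography, that rejects it.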

As an alternative to maintaining server-side state for revocation, one could expire the OT2FA token. Indeed, many implementations (Gmail, Lastpass, Azure DevOps, etc.) do this with a fixed expiry, and challenge the user with 2FA again on a regular basis. The problem with expiring the token after a fixed interval is that it no longer meets the “one time” requirement of this design.

A more user-friendly alternative to a fixed expiry is to set an initial expiry, but issue a new token with an extended expiry each time the user returns. This mimics how “remember me” functionality is often implemented. If a user’s OT2FA token becomes known to an attacker, the user’s only defence is then his password, which needs to be strong enough to keep the attacker out until the compromised OT2FA token expires. Although this is less than ideal, attackers remain largely limited in the number of accounts they can go after, assuming there is no mass OT2FA token leakage.
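A sliding expiry can be sketched like this, assuming an `exp` claim holding a Unix timestamp and a hypothetical 90-day `RENEWAL_WINDOW`. Each successful check returns a replacement token whose expiry is pushed out again; the caller would send it back to the user agent (for example by overwriting the cookie). Time is passed in explicitly to keep the logic testable; in a server it would typically be `time.time()`. As elsewhere, HMAC-SHA256 stands in for a digital signature.

```python
import base64
import hashlib
import hmac
import json
import os
from typing import Optional

SIGNING_KEY = os.urandom(32)        # hypothetical server-side signing secret
RENEWAL_WINDOW = 90 * 24 * 60 * 60  # hypothetical: token lapses after 90 days without a visit


def _mac(payload: str) -> str:
    digest = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode()


def issue_token(user_id: str, now: float) -> str:
    claims = {"sub": user_id, "exp": int(now) + RENEWAL_WINDOW}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    return f"{payload}.{_mac(payload)}"


def check_and_renew(token: str, user_id: str, now: float) -> Optional[str]:
    """If `token` is genuine, unexpired and belongs to `user_id`, return a
    replacement token with a fresh expiry; otherwise return None."""
    if token.count(".") != 1:
        return None
    payload, tag = token.split(".")
    if not hmac.compare_digest(tag.encode(), _mac(payload).encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if claims["sub"] != user_id or claims["exp"] <= now:
        return None
    return issue_token(user_id, now)
```

Without server-side state, the superseded token remains valid until its own `exp`; renewal only extends the life of the newest token.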

Another direction is to combine OT2FA with (temporary) account lockout. Different thresholds for failed password attempts can apply depending on whether or not a valid OT2FA token is present. For example, one can impose a temporary lockout on attempts made without an OT2FA token, while still allowing logins that present a valid OT2FA token, provided that a second (higher) threshold for failed password attempts has not been reached. This lets legitimate users coming from trusted user agents in easily while keeping hackers out. It also mitigates the risk of an attacker deliberately locking a legitimate user out of his account.
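The two-threshold lockout might look like the following sketch. The thresholds and the 15-minute lockout are hypothetical policy values, not recommendations, and the per-account state is kept in memory for illustration. The temporary lockout is modelled as a failure counter that resets once `LOCKOUT_SECONDS` have passed since the most recent failure.

```python
from dataclasses import dataclass

# Hypothetical policy values.
THRESHOLD_WITHOUT_TOKEN = 5  # failures before untrusted user agents are locked out
THRESHOLD_WITH_TOKEN = 20    # higher threshold for user agents holding a valid OT2FA token
LOCKOUT_SECONDS = 15 * 60    # length of the temporary lockout


@dataclass
class LoginThrottle:
    """Failed-password bookkeeping for one account."""
    failed_attempts: int = 0
    last_failure: float = float("-inf")

    def may_attempt(self, has_valid_token: bool, now: float) -> bool:
        if now - self.last_failure >= LOCKOUT_SECONDS:
            self.failed_attempts = 0  # quiet long enough: any temporary lockout has ended
        limit = THRESHOLD_WITH_TOKEN if has_valid_token else THRESHOLD_WITHOUT_TOKEN
        return self.failed_attempts < limit

    def record_failure(self, now: float) -> None:
        self.failed_attempts += 1
        self.last_failure = now

    def record_success(self) -> None:
        self.failed_attempts = 0
```

An attacker hammering the account from untrusted user agents trips the lower threshold, while the owner, whose browser presents a valid token, can keep logging in until the higher threshold is reached.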

Another idea is to allow more than one user to log in from a single user agent, for shared devices/computers. However, there is a risk of over-engineering the implementation for limited benefit.

One can also consider a number of options for rolling this out smoothly, so that it is as transparent as possible to the existing user base when a company first adopts the technology. There are many approaches, but expounding upon them here would be too much of a tangent.

Aside: Clarification on Tokens

OT2FA tokens should not be confused with session ids or session tokens.

Session ids/tokens are high value and relatively short-lived. If an attacker captures a session token, he can then hijack the victim’s account.

OT2FA tokens are long-lived and are insufficient on their own for hijacking an account. They serve one purpose: to limit a hacker’s ability to perform password guessing attacks on user accounts.

Potential Concerns

The implementation may leak the validity of the password: the most straightforward implementation challenges for the second factor only after the username and password are confirmed. If it is not implemented this way, then the user would (under some implementation assumptions) get notified every time somebody malicious tries an incorrect password for that user, which would be annoying and a cause of unnecessary stress.

Leaking the validity of the password is not necessarily a bad thing. In fact, depending on the wording of the email/SMS carrying the second-factor code, it may be a good thing, because it alerts the user that there is good reason to change their password. For example, a notice like:

“A new device has attempted to log in to your account! If this is you, please click this link to prove your identity.
If this was not you, then it means that somebody has your password. Don’t panic: we have protected access to your account. However, we recommend that you change your password to prevent this person from continuing to attempt to access your account.”

Remark: The academic research papers mentioned above are more sophisticated than OT2FA and attempt to hide the validity of the password (see Section 4 of the Pinkas and Sander paper). But they do so using captchas, which we consider a no-no for usability and accessibility reasons, not to mention that machines now seem to be better than humans at solving captchas, which defeats the whole point of the technology.

Brute forcing the second factor: When a hacker gets the username and password correct, he can then focus on brute forcing the second factor. This is only practical for the attacker if the second factor is brute-forceable (for example, a 6-letter code), in which case it needs to be prevented in some way. For example, after too many wrong second-factor guesses, impose a temporary account lockout for devices that do not have a valid OT2FA token for that user. The duration of the lockout should depend on the time needed to brute force the second-factor challenge, and the user should be notified of the lockout via email or SMS.
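To make “depend on the time needed to brute force” concrete, here is one simple model (an assumption for illustration, not taken from the research cited above): the attacker gets `k` wrong guesses per lockout period `T`, the code space has `N` values, and the attacker's success probability over a horizon `H` must stay below `p`. By the union bound the success probability is at most (H / T) * k / N, so it suffices that T >= k * H / (N * p). The hypothetical helper below just evaluates that bound.

```python
def min_lockout_seconds(code_space: int, guesses_per_lockout: int,
                        horizon_seconds: float, max_success_prob: float) -> float:
    """Shortest lockout T keeping the attacker's success probability <= p.

    Model (an assumption): the attacker makes `guesses_per_lockout` wrong guesses,
    waits out a lockout of T seconds, and repeats for `horizon_seconds`. Each guess
    succeeds with probability at most 1/code_space, so by the union bound
        P(success) <= (horizon / T) * guesses_per_lockout / code_space,
    which is <= max_success_prob whenever
        T >= guesses_per_lockout * horizon / (code_space * max_success_prob).
    """
    return guesses_per_lockout * horizon_seconds / (code_space * max_success_prob)


ONE_YEAR = 365 * 24 * 60 * 60
six_digit = min_lockout_seconds(10**6, 5, ONE_YEAR, 0.01)   # numeric 6-digit code
six_letter = min_lockout_seconds(26**6, 5, ONE_YEAR, 0.01)  # 6 letters from A-Z
```

Under these assumptions (5 guesses per lockout, a one-year horizon, at most a 1% chance of success), a 6-digit numeric code needs a lockout of 15,768 seconds (about 4.4 hours), while a 6-letter code needs only about 51 seconds because its code space is roughly 300 times larger.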

Private browsing: Users who browse in private/incognito mode may be forced to do 2FA every time, because the OT2FA token does not persist on the client device. Allowing security- and privacy-conscious individuals to opt out is one compensating control.

Public/shared computers: You would not want to store the OT2FA token on a shared computer, because a hacker could capture it and then brute force the password without facing the second-factor challenge. The first defence is to let the user tick a “shared computer” option at login, which prevents the token from being stored on that machine. Having a revocation mechanism (see the Enhancements section) is a second defence.

Stolen OT2FA token: One way to reduce the risk of cookie theft is to set the HttpOnly flag on cookies. But this alone cannot be relied upon, since cookies may leak in a number of ways, not to mention that you might not even be using cookies to store the token.

If the OT2FA token is stolen, it reduces to the security of password-only for that user – assuming the attacker knows to whom the token belongs. Having a revocation mechanism (as described above) is a compensating control.

Mass OT2FA token leakage: If there is mass token theft, then security reduces to password-only, under the assumption that the attacker can somehow map each token to its username. Usernames should not be put in tokens; instead, tokens should carry unique identifiers that are mapped to users via the database. If the attacker is able to mount password guessing attacks against a large number of these accounts, it implies that the attacker has not only the tokens but also the mapping between tokens and users. This is a bad situation that requires a serious response from the owners of the website. The recommended action is to roll the key used for signing OT2FA tokens, which means each user has to perform the OT2FA challenge again on each of their user agents.
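Rolling the signing key can be made cheap by tagging each token with a key id (`kid`, again borrowing the JWT name) and looking keys up by that id. In this hypothetical sketch, `OT2FASigner.roll_key` discards all previous keys, so every outstanding token fails verification and each user must repeat the second-factor challenge on each user agent. The token layout `<kid>.<payload>.<tag>` is an assumption, and HMAC-SHA256 stands in for a digital signature.

```python
import base64
import hashlib
import hmac
import json
import os
from typing import Dict, Optional


class OT2FASigner:
    """Issues tokens of the form <kid>.<payload>.<tag>; rolling the key
    invalidates every token tagged with a previous key."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}
        self._kid = ""
        self.roll_key()

    def roll_key(self) -> None:
        self._kid = base64.urlsafe_b64encode(os.urandom(6)).decode()  # 8 chars, no dots
        self._keys = {self._kid: os.urandom(32)}  # all older keys are discarded

    @staticmethod
    def _tag(key: bytes, signed_part: str) -> str:
        digest = hmac.new(key, signed_part.encode(), hashlib.sha256).digest()
        return base64.urlsafe_b64encode(digest).decode()

    def issue(self, user_id: str) -> str:
        payload = base64.urlsafe_b64encode(json.dumps({"sub": user_id}).encode()).decode()
        signed_part = f"{self._kid}.{payload}"  # the key id is covered by the tag too
        return f"{signed_part}.{self._tag(self._keys[self._kid], signed_part)}"

    def verify(self, token: str) -> Optional[str]:
        parts = token.split(".")
        if len(parts) != 3:
            return None
        kid, payload, tag = parts
        key = self._keys.get(kid)
        if key is None:  # unknown or rolled-away key
            return None
        if not hmac.compare_digest(tag.encode(), self._tag(key, f"{kid}.{payload}").encode()):
            return None
        return json.loads(base64.urlsafe_b64decode(payload))["sub"]
```

Keeping a dict of keys rather than a single key also leaves room for a gentler, planned rotation in which the previous key is retained for a while; in the mass-leakage case described above it is dropped immediately.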

User loses access to email: If the user loses access to the email address to which the second-factor authentication requests are sent, then he cannot add a new user agent as trusted. However, if the user already has a trusted user agent, he can use it to update his email address on the system, thus working around the problem. When a user’s email address is changed, an email should be sent to the previous address to make sure the change was not made maliciously. Other controls can be added to reduce the risk that a hacker who succeeds in defeating OT2FA locks the user out of his account.

Questions

Is this the same as the OWASP page on device cookies? No. It is similar, but the OWASP description issues device cookies upon a valid username/password without requiring second-factor authentication. As the OWASP page itself notes, device cookies do not stop password spraying attacks, whereas OT2FA does. The OWASP page also credits Marc Heuse and Alec Muffett as the source, whose discussions on the topic came years after the research cited at the top of this blog.

Is it just risk based authentication? Risk based authentication was first described in Enhanced Authentication In Online Banking by Gregory D. Williamson in 2006, which we cited at the beginning. The document recommends a number of ideas for enhancing security, such as:

“Machine Authentication, or PC fingerprinting, is a developing and widely used form of authentication (FFIEC, 2005). This type of authentication uses the customer’s computer as a second form of authentication (PassMark Security, n.d.). Machine authentication is the process of gathering information about the customer’s computer, such as serial numbers, MAC addresses of parts in the computer, system configuration information, and other identifying information that is unique to each machine. A profile is then built for the user and the machine. The profile is captured and stored on the machine for future use by the authentication system (PassMark Security, n.d.). Once the PC fingerprint is gathered, the system knows what machine attributes should be present when the user attempts to access their online bank account (Entrust, 2005). This type of authentication usually requires the user to register the machine at first sign on. If a customer logs in from another computer the system will know to further scrutinize the login attempt. At this point the system can prompt for additional authentication, such as out of band authentication or shared secret questions.”

The concepts here are very similar to the description of OT2FA except they use device fingerprinting instead of digital signatures to identify devices. Indeed, if one Googles for risk based authentication, many websites (example) talk about device fingerprints without mentioning the concept of digital signatures.

Device fingerprints are less preferable than digital signatures. Through reverse engineering, one can determine which device properties are used for the fingerprint. Depending on the exact details, a hacker could potentially use this to brute force a victim’s fingerprint once he knows the password. In contrast, brute forcing a cryptographic digital signature is not practical, assuming the crypto is done correctly.

In general, OT2FA is a special case of risk based authentication. It is a particularly simple and strong way of implementing the concept.

Conclusion

When security does not have an adequate answer, we often transfer the burden to the user. Putting too much burden on the user is bad security practice, as it violates the important security principle of psychological acceptability.

In regard to login security, complex passwords, password rotation, captchas, 2FA, etc… are all poor solutions for a general audience due to the burden they put on users. Users resoundingly reject these ideas and technologies for day-to-day online activities, so something better is needed.

OT2FA is a practical tradeoff between security and usability. It offers much stronger security than username/password-only at very little usability impact. It can also be implemented fairly easily by any organisation. Most importantly, it is a realistic, practical solution for stopping large-scale password guessing attacks without significantly burdening users.

Footnotes

Footnote 1: If one takes the approach of putting a unique identifier in the OT2FA token for revocation purposes, the use of server-side persistent storage can change the whole implementation: rather than using digital signatures on the tokens, one can instead store a copy of each non-revoked token for each user in server-side persistent storage. When a user logs in with an OT2FA token, a simple check that the token is in the server-side database for that user takes the place of any digital signature check. The major downside of this approach is that it requires more storage: one token per device per user, and any user could potentially generate a large number of them for himself using automated means. If this is considered a threat, then obvious mitigating controls can be put in place.
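The storage-based variant described in this footnote can be sketched as follows, under assumptions: tokens are random values from `secrets.token_urlsafe`, and the server stores a SHA-256 digest of each one rather than the raw token (a choice not in the footnote, made so that a database leak does not directly hand out usable tokens). `TrustedAgentStore` and `MAX_TOKENS_PER_USER` are hypothetical; the cap is one example of the “obvious mitigating controls” against a user generating huge numbers of tokens.

```python
import hashlib
import secrets
from typing import Dict, Set

MAX_TOKENS_PER_USER = 20  # hypothetical cap on trusted user agents per user


class TrustedAgentStore:
    """Server-side record of non-revoked OT2FA tokens; no signatures involved."""

    def __init__(self) -> None:
        self._by_user: Dict[str, Set[str]] = {}  # user_id -> digests of live tokens

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode()).hexdigest()

    def issue(self, user_id: str) -> str:
        tokens = self._by_user.setdefault(user_id, set())
        if len(tokens) >= MAX_TOKENS_PER_USER:
            raise RuntimeError("too many trusted user agents for this user")
        token = secrets.token_urlsafe(32)  # unguessable random value
        tokens.add(self._digest(token))
        return token

    def is_trusted(self, user_id: str, token: str) -> bool:
        # This lookup replaces the digital signature check.
        return self._digest(token) in self._by_user.get(user_id, set())

    def revoke(self, user_id: str, token: str) -> None:
        self._by_user.get(user_id, set()).discard(self._digest(token))

    def revoke_all(self, user_id: str) -> None:
        self._by_user.pop(user_id, None)
```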

Acknowledgments

The author would like to thank Dharshin De Silva for feedback on a preliminary version of this document.