One of my favourite authors in the field of computer security is Gary McGraw. If you are not familiar with him, I’d suggest you start by reading his book Software Security: Building Security In. One of the key points he makes is the distinction between security bugs and security flaws: the former are simple problems involving only a small piece of code, such as cross-site scripting, SQL injection, and lack of input sanitisation; the latter are more complex problems at the design level, and thus cannot be pinpointed to a small section of code. Gary observes that about half of the problems he sees in practice are bugs, and the other half are flaws. These odds are not good when you consider that flaws are much harder to fix than bugs, because they are ingrained in the software’s design.
So here I want to talk about how this applies to the recent Ashley Madison hack, though I should be clear that calling what happened to Ashley Madison a “design flaw” may be a stretch by the current state of the art in web security. What I hope is that some time in the future, major systems will be designed with much more thought given to their security and the protection of private information. Our story starts with what Ashley Madison did right: using bcrypt to protect passwords.
Ashley Madison used Bcrypt to Protect Passwords
As noted in a number of online articles, Ashley Madison protected passwords in its database the right way: it used bcrypt. (EDIT: On 10 September, a researcher found that the website had a security bug allowing attackers to bypass the bcrypt computation. Regardless, continue reading, because the real value of this article is the discussion of ephemeral knowledge web applications below.) Ask a leading security practitioner about protecting passwords in databases and they will recommend bcrypt, scrypt, or PBKDF2 (or Argon2 if they have been following the password hashing competition). So Ashley Madison did not have a software bug in its password protection.
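To make the idea concrete, here is a minimal sketch of slow, salted password hashing. Since bcrypt requires a third-party library, this uses PBKDF2 from Python’s standard library, which the recommendation above lists as an equally acceptable choice; the function names and iteration count are my own illustrative choices, not anything from Ashley Madison’s code.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    # A random per-user salt defeats precomputed (rainbow-table) attacks.
    salt = salt if salt is not None else os.urandom(16)
    # A high iteration count makes every guess expensive for an attacker.
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt, digest

def verify_password(password, salt, expected_digest):
    _, digest = hash_password(password, salt)
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(digest, expected_digest)
```

The database stores only the salt and digest; even with a full copy of both, an attacker must grind through the slow hash for every password guess.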
But let’s step back a moment and ask why tools such as bcrypt are recommended, for which we cite the web’s subject-matter expert, Thomas Pornin:
Let’s see the following scenario: you have a Web site, with users who can “sign in” by showing their name and password. Once signed in, users gain “extra powers” such as reading and writing data. The server must then store “something” which can be used to verify user passwords. The most basic “something” consists in the password themselves. Presumably, the passwords would be stored in a SQL database, probably along with whatever data is used by the application.
The bad thing about such “cleartext” storage of passwords is that it induces a vulnerability in the case of an attack model where the attacker could get a read-only access to the server data. If that data includes the user passwords, then the villain could use these passwords to sign in as any user and get the corresponding powers, including any write access that valid users may have. This is an edge case (attacker can read the database but not write to it). However, this is a realistic edge case. Unwanted read access to parts of a Web server database is a common consequence of an SQL injection vulnerability. This really happens.
There are other ways than SQL injection that a database can leak, including insecure backups, an attacker getting a reverse shell on the system (which can happen in a variety of ways), physical break-in, insider threat, and poor network configuration. We don’t know for sure what happened in the case of Ashley Madison, but there is some indication here.
If you have been following security for a while, you will know that database exposures happen all the time. The good news is that with the exception of not having a password complexity policy, Ashley Madison’s website did protect passwords the right way!
So What’s the Problem Here?
Glad you asked. You see, the passwords were protected via bcrypt as a “second line of defence”, i.e. to prevent hackers from getting user passwords in the event of a database leak. But in retrospect, we now know that the hackers who got the data did not give a darn about the users’ passwords: instead, they wanted the users’ private information, such as names, addresses, and email addresses. So why didn’t that private information get the same second line of defence that the passwords got?
Let’s be honest. Almost certainly the answer is that the web designers never thought about it, or perhaps never cared about that question, and instead just put in their best effort to build the website to the budget that they had, and following the general security best practices that they were aware of.
But the honest question is less interesting to me than the research question: how would you design a website that has a second line of defence for protecting members’ private information? In other words, if a hacker can get the database, can the information in it still be protected? This question is not limited to adult websites; for example, it may be of value to websites holding patient medical information.
If you followed me this far, maybe you realise that we are thinking about this from a threat modeling perspective (identifying assets that the system holds and mechanisms for protecting those assets), and we are trying to architect a system that better protects data in the event of a security breach.
Pause and Understand what We are Trying to Solve
If you come from an operational security background, you may be thinking that the problem with the Ashley Madison breach is a lack of operational security defences. You may be thinking about monitoring, alerting, network configuration, patching, intrusion detection, and so on.
Those defences are all fine, but that’s not what we’re trying to do here. Instead, we’re trying to solve it from an application security perspective: building applications that resist attacks in the event of other things failing. Think of it as defence in depth: if everything else fails, we still want the data to be protected, just like the passwords are protected.
It’s Not as Simple as Encrypt the Data
I am always amused when people think the solution to everything is encryption. Encryption is easy; key management is hard. If you are going to encrypt the database, then where do you put the key? Given that the website needs to be able to decrypt content as it is needed, a hacker who gets a shell on the system would also be able to decrypt data as it is needed, so we haven’t really solved anything yet.
Towards a Solution: Basic Concepts
A former colleague of mine, Blair Strang, had a lot of great ideas about protecting private information. What I write here is largely influenced by his ideas (though I take the blame for any errors in presentation).
This is the most important paragraph of the whole blog: read it three times. We start with the concept of a zero-knowledge web application, which is a web application built so that not even the server can decrypt the data, and we relax it slightly: we require only that the server can decrypt a user’s sensitive data while that user has an active session. This means that if the system is hacked at any point in time, only those with active sessions at that moment will have their data compromised; other users will be safe by design. In the event of a breach, most of the data will be protected.
Why do we make this relaxation? Because a zero-knowledge design is overkill, and hard to realise in practice. Zero-knowledge web applications are designed with the goal of making it so that you do not even need to trust the service provider, which has the side effect of limiting the features the server can provide. We instead want a design where we trust the service provider, but where data still remains largely protected in the event of an intrusion. This means that the server is internally making its best effort to enforce the least privilege concept on sensitive data through a clever use of key management and cryptography (disposing of the cryptographic key and unencrypted data at the end of the user’s session as our second line of defence). We will call this concept an ephemeral knowledge web application.
As we go forward, keep in mind that a user will typically have sensitive and non-sensitive information in the database. Taking Ashley Madison as an example, users will have some information that they want to be public (the type of affair they are looking for, their interests, etc…), and other information they want protected (name, email address, address). The non-sensitive information will be unencrypted and the sensitive information will be encrypted in our design.
A Simple Example: Protecting the Email Address
Let’s start simple, so simple that we will not even use cryptographic keys in this example. Suppose the user needs an email address and a password to log in. We already know we are protecting the password in the database with bcrypt, but can we not do the same thing with the email address? Almost, except the salt brings in some trouble: with a random per-user salt, we could not compute the hash of an entered email address without first knowing which user record to read the salt from. But if we use a fixed salt for the email protection (while keeping a per-user salt for the password), then we have already made progress.
Consider how the user would log in: (1) the user enters his email address and password; (2) the system applies bcrypt to the email address with the fixed salt to look up the user in the database; (3) from that record, the system gets the salt that applies to the password, and computes bcrypt on the entered password with that salt to see whether it matches the hashed password in the database; (4) if so, the user is granted access. The system has therefore granted access without ever storing the user’s email address in plaintext in the database. If the user does not exist in the database, see my previous blog for how to handle it properly (preventing account enumeration).
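The login flow above can be sketched as follows. This is a minimal illustration, again with PBKDF2 standing in for bcrypt; the dict database, fixed salt value, and function names are all my own assumptions for the sketch.

```python
import hashlib
import hmac
import os

FIXED_EMAIL_SALT = b"site-wide-email-salt"   # fixed, so the hash is a stable lookup key
ITERATIONS = 100_000

def slow_hash(text, salt):
    return hashlib.pbkdf2_hmac("sha256", text.encode(), salt, ITERATIONS)

db = {}  # keyed by hashed email; each record holds a per-user password salt and hash

def register(email, password):
    email_key = slow_hash(email, FIXED_EMAIL_SALT)
    pw_salt = os.urandom(16)                  # per-user salt for the password
    db[email_key] = {"pw_salt": pw_salt,
                     "pw_hash": slow_hash(password, pw_salt)}

def login(email, password):
    # Step (2): hash the entered email with the fixed salt to find the record.
    record = db.get(slow_hash(email, FIXED_EMAIL_SALT))
    if record is None:
        return False    # respond identically to a wrong password (no enumeration)
    # Steps (3)-(4): hash the entered password with the per-user salt and compare.
    return hmac.compare_digest(record["pw_hash"],
                               slow_hash(password, record["pw_salt"]))
```

Note that the plaintext email address never touches the database: only its slow hash is stored, as the next paragraphs discuss.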
What about a user forgetting his password? No problem: the user enters his email address on the forgot-password page, the system applies bcrypt with the fixed salt to the entered email address to see if the user exists in the database, and, assuming yes, emails the user a secret link for password reset. Within the database, we must associate that secret link with this user, to make sure he can only reset his own password (not somebody else’s!).
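The secret-link bookkeeping might look like this sketch. The single-use, hash-only storage of the token is my own suggested hardening (so a database leak does not expose live reset links), not something the paragraph above requires.

```python
import hashlib
import secrets

reset_tokens = {}   # hash of token -> the hashed-email key it belongs to

def create_reset_token(email_key):
    token = secrets.token_urlsafe(32)   # the secret embedded in the emailed link
    # Store only a hash of the token, so a leaked database does not
    # hand the attacker working password-reset links.
    reset_tokens[hashlib.sha256(token.encode()).digest()] = email_key
    return token

def redeem_reset_token(token):
    # pop() makes each link single-use; returns None for unknown/used tokens.
    return reset_tokens.pop(hashlib.sha256(token.encode()).digest(), None)
```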
What if the database is exposed? Anybody could determine if a specific user is in the database by computing bcrypt on that user’s email address and looking for a match in the database, but nobody is going to be able to reverse the entire collection of email addresses, which is a big improvement over the present situation.
This illustrates the concept for a second line of defence for email address protection, but it’s going to be trickier for other data.
Of course, those who can get shells on servers and scrape memory will find that the user information (username and password) is still in memory somewhere until it gets overwritten. It would be nice if Java had secure storage for sensitive content the way .Net does.
A Second Line of Defence for All the Sensitive Data!
We were able to apply much the same concept to email addresses as is used for passwords because the email address is presented at login, but this won’t work for other private data that the user does not present when logging in. So we are going to have to change our strategy to protect more. I’ll give just the high-level details here; the lower-level details can be worked out.
Suppose that rather than using bcrypt just to verify the user’s password, we also use it to derive an encryption key for that user, K_user. We can imagine, for example, that K_user is derived from bcrypt applied to the username, password, and salt combination. After the user is authenticated, K_user is derived, and that key is used to encrypt and decrypt all of the user’s private data. The data remains decrypted for the session (in a separate database table), but at the end of the session the plaintext data is securely deleted from the database.
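A sketch of the idea, under the same PBKDF2-for-bcrypt substitution as before. The SHA-256-based stream cipher below is a toy stand-in for a real authenticated cipher such as AES-GCM (which needs a third-party library); everything here is illustrative, not a production design.

```python
import hashlib
import os

def derive_user_key(username, password, salt):
    # K_user is derived only from credentials the user presents at login,
    # so the server cannot recompute it once the session ends.
    return hashlib.pbkdf2_hmac("sha256",
                               (username + ":" + password).encode(),
                               salt, 200_000)

def _keystream_xor(key, nonce, data):
    # Toy stream cipher: XOR with SHA-256-derived blocks. Stand-in for
    # AES-GCM; do not use this construction in a real system.
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out.extend(hashlib.sha256(key + nonce +
                                  counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(b ^ k for b, k in zip(data, out))

def encrypt_private_field(k_user, plaintext):
    nonce = os.urandom(16)     # fresh nonce per field, stored alongside it
    return nonce, _keystream_xor(k_user, nonce, plaintext.encode())

def decrypt_private_field(k_user, nonce, ciphertext):
    return _keystream_xor(k_user, nonce, ciphertext).decode()
```

When the session ends, the server discards K_user; only the nonce and ciphertext remain in the database, and neither is useful without the user’s credentials.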
What if one user wants to share information with another user? For example, Bob on Ashley Madison wants to have an affair with Alice (who might be a bot). Bob needs some way to share information, such as his email address, with Alice, and that information has to remain encrypted and inaccessible to the server after he logs out. The solution is to bring in public key cryptography.
Each user needs a public/private key pair. Each user’s private key is encrypted with K_user; the public key is not encrypted (it is not sensitive). Now Bob can send his private information to Alice through the system using her public key. When she logs in, the system can decrypt it and present it to her. Yet when she is not logged in, the system cannot decrypt it, because it does not have her password available to decrypt her private key.
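To show the asymmetry that makes this work, here is textbook RSA with deliberately tiny parameters. This is purely illustrative: real systems would use properly padded RSA (OAEP) or an elliptic-curve scheme from a vetted library, and the byte-at-a-time encryption below exists only because the toy modulus is so small.

```python
# Toy RSA key pair for "Alice" -- never use such parameters in practice.
P, Q = 61, 53
N = P * Q                      # public modulus (3233)
E = 17                         # public exponent
PHI = (P - 1) * (Q - 1)
D = pow(E, -1, PHI)            # private exponent, derivable only from P and Q

def encrypt_to_user(public_n, public_e, message_bytes):
    # Anyone (e.g. Bob, or the server on his behalf) can run this
    # with only Alice's public key.
    return [pow(b, public_e, public_n) for b in message_bytes]

def decrypt_with_private(private_d, public_n, ciphertext):
    # Only possible once Alice's private key has been unwrapped with K_user.
    return bytes(pow(c, private_d, public_n) for c in ciphertext)
```

The server stores Bob’s message encrypted under Alice’s public key, and stores Alice’s private exponent encrypted under her K_user; while she is logged out, neither can be opened.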
Okay, that’s progress, but what if we need administrators to have access to some of that private data when they need it? That functionality can be built in too, using a public/private administrator key pair. But the administrator’s private key should never be on the same system: it needs to be on a separate, network-isolated system, because if the hacker can get access to both that key and the database, you’ve lost. By having a separate, isolated system for administration, you have a much stronger defence in the event of an attack than the current state of the art offers. However, understand the security tradeoff being made: having administrative access to all the data is less secure than not having that feature, but it is still stronger than designs like Ashley Madison’s, where there is no second line of defence for a leaked database.
There is one last gotcha that is quite important: what about users forgetting their passwords? We can still use the forgot-password email secret link for the user to reset his password, but the private data is no longer accessible, because K_user is lost. So we either have to tell users that they lose their information if they forget their password, or users need to involve an administrator in an information-recovery process. There are various complexities in how one might build that, but that’s a story for another time.
Generic security advice is useful and common in the web application security industry, but it only solves part of the problem. You also need to think about the application you are building and the threats specific to it. This is where threat modeling comes in. As we have seen from the Ashley Madison breach, there are a number of systems where protecting personal information in the event of a security breach would have high value, and this is a more difficult task than protecting passwords in the event of a breach. However, we introduced a design pattern called the ephemeral knowledge web application that illustrates how such protection can be achieved. Ephemeral knowledge web applications are applicable to designs where trusting the server is acceptable, yet the server holds sensitive personal data that needs a second line of defence.