Pitfalls in writing cryptographic applications
by: Thomas Olofsson, Chief Technology Officer at BTblock
First, let me state that I am old enough for the abbreviation “crypto” to refer to cryptography and not cryptocurrencies. For the remainder of this article, when I use the word “crypto”, I’m referring to cryptography.
There is a lot of hype around blockchain, distributed ledgers, and cryptocurrencies. But what you have to understand is that it is all based on cryptographic hashes.
Explaining how blockchain works is out of scope for this blog post. Instead, I will try to explain the cryptographic primitives relevant to blockchain applications and share some of my experience from auditing cryptographic applications (both blockchain and traditional ones) as well as from writing and implementing cryptographically secure applications.
Writing secure code is HARD and writing secure cryptography code is even harder. With the explosion of blockchain, cryptocurrencies, and distributed ledgers, developers with real crypto experience are in high demand. The shortage of experienced developers often leads to poor implementations, and the lack of security knowledge sometimes leads to gaping security holes in many applications.
This blog post is written to act as a guide if you are starting either to develop secure cryptographic applications or to look for vulnerabilities in other applications for fun or profit. Without further ado, let me share my views on the most common pitfalls developers fall victim to when implementing cryptographic applications.
Lesson number one: Random is hard
Randomness is the lack of pattern or predictability of events. A random sequence of events, symbols or steps has no order and does not follow an intelligible pattern or combination. Sounds easy, right? Well, it turns out that computers are good at following instructions in a predictable, controlled way that is repeatable every time. And when you think about it, that is the opposite of random.
So how do we get computers to provide us with true randomness? Well, there are many tested ways, but most of them are quite expensive from a computational perspective. Since computers are inherently bad at random, we must look at other sources of randomness.
People have used many sources of ‘random’ to create true randomness. The source considered most random is background radiation and the decay of radioactive isotopes. The problem with sampling the real world is the sampling. Most computers are not equipped to measure the universe, so many applications and operating systems have to resort to sampling other things that are hopefully random. Visually, the coolest approach to true random was the Lavarand generator, which was built by Silicon Graphics and used a video of an array of lava lamps to generate true random numbers. If you have dealt with encryption, you have undoubtedly come across the buzzword "entropy."
Here it is used in a sentence: “This military-grade encryption has 256 bits of entropy”. So what does the buzzword entropy even mean? According to Wikipedia, entropy is defined as follows: Information entropy is the average rate at which information is produced by a stochastic source of data. So what does that even mean?
Well, it means that it is a measurement of a bearer's maximum capacity to encode information given true random input. When people say an encryption algorithm or hash function has a specific entropy, that might be true if the developer has managed to feed it "true random" data to begin with.
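To make the definition a little more concrete, here is a minimal Python sketch that estimates Shannon entropy in bits per byte from the byte frequencies of a sample. The function name is my own and not from any library. Keep in mind that a frequency count can only flag data that is obviously not random; it can never prove that data is unpredictable.

```python
from collections import Counter
from math import log2


def shannon_entropy_bits_per_byte(data: bytes) -> float:
    """Estimate Shannon entropy (bits per byte) from observed byte frequencies."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    # H = sum over observed symbols of p * log2(1 / p), with p = count / total
    return sum((c / total) * log2(total / c) for c in counts.values())


constant = shannon_entropy_bits_per_byte(b"aaaa")            # one symbol only
two_symbols = shannon_entropy_bits_per_byte(b"abab")         # two equally likely symbols
all_bytes = shannon_entropy_bits_per_byte(bytes(range(256))) # every byte value once
```

A constant string carries no information per byte, two equally likely symbols carry one bit, and a sample in which every byte value is equally frequent reaches the maximum of eight bits per byte.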
But, in reality, true random is hard to achieve for stupid computers that follow instructions. Most machines ship with an implementation that approximates it for us, called a “PRNG” or pseudo-random number generator. These differ from what we call “CSPRNGs” or cryptographically secure pseudo-random number generators. They differ in that a plain PRNG might be good enough to make a random choice in a text-based web game but not random enough to secure your 100M dollars' worth of bitcoins.
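In Python the distinction shows up as two different standard library modules. The sketch below is only an illustration: the variable names are made up, and it assumes you have already satisfied yourself about where the platform's entropy comes from, since `secrets` draws on the generator the operating system provides.

```python
import random
import secrets

# Fine for a text-based game: predictability here costs nothing.
door = random.choice(["left", "middle", "right"])

# For key material, use a CSPRNG instead of the `random` module.
private_key_bytes = secrets.token_bytes(32)   # 32 bytes = 256 bits
session_token = secrets.token_hex(16)         # 16 random bytes as 32 hex characters
```

The `random` module documents itself as unsuitable for security purposes, which is exactly the PRNG versus CSPRNG split described above.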
Never blindly trust the operating system's random functions, and make sure that you know where your random entropy comes from.
Lesson number two: Nothing is better than the seed
Encryption is based on secrets, and secrets should be hard to guess. In computers we have functions that generate random numbers for us, and these can be used to create a secret (e.g. a private key). Since all keys are derived from a random seed, the seed you use to initialize your PRNG is really, really important. A secret key can be seen as a seed for a signing algorithm; it is generated by your PRNG, which in turn is initialized by “THE SEED”, and this seed is really hard to obtain with high enough entropy. Badly seeded random number generators are one of the biggest risks when it comes to key creation in crypto applications.
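As a rough illustration of why the seed matters, the sketch below assumes a hypothetical key generator that seeds Python's non-cryptographic `random` module with a Unix timestamp. The function names, the timestamp, and the one-hour search window are all assumptions for the demo. An attacker who can guess roughly when the key was created only has to try a few thousand seeds.

```python
import random


def key_from_seed(seed: int) -> bytes:
    """Hypothetical key generation: seed a non-cryptographic PRNG, read 256 bits."""
    rng = random.Random(seed)
    return rng.getrandbits(256).to_bytes(32, "big")


def brute_force_seed(target_key: bytes, candidates: range):
    """Try every candidate seed and return the one that reproduces the key."""
    for seed in candidates:
        if key_from_seed(seed) == target_key:
            return seed
    return None


# Assumption: the victim seeded with a timestamp from a known one-hour window.
window_start = 1_700_000_000
victim_seed = window_start + 1234
victim_key = key_from_seed(victim_seed)

recovered_seed = brute_force_seed(victim_key, range(window_start, window_start + 3600))
```

The key is 256 bits long, but it carries no more entropy than the seed it came from, which here is under 12 bits.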
Another point worth noting is the common notion that seed entropy can be extended by concatenating the seed with more data in order to get more randomness. This is simply not true. Schemes such as Bitcoin's BIP32 propose deriving child keys from a parent key by “extending the key with an extra 256 bits of entropy” and from that deriving 2^31 child keys.
From that perspective, how much entropy per key would we have left in the last key? While the scheme protects against key reuse, and thereby nonce reuse, it will in fact draw down the entropy pool with every key used.
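A toy experiment makes the concatenation point visible. The sketch below is not BIP32 and does not model its chain code; it only assumes a one-byte seed and an "extension" that the attacker also knows, both invented for the demo. However long the output looks, the number of possible keys never exceeds the number of possible seeds.

```python
import hashlib

# Assumption: 256 bits of "extra" data that is public or otherwise guessable.
PUBLIC_EXTENSION = bytes(range(32))


def extended_key(seed: int) -> bytes:
    """Hypothetical 'extension': hash a 1-byte seed together with known data."""
    return hashlib.sha256(bytes([seed]) + PUBLIC_EXTENSION).digest()


# The seed has only 8 bits of entropy, so an attacker enumerates every seed.
possible_keys = {extended_key(seed) for seed in range(256)}
key_length_bits = len(next(iter(possible_keys))) * 8
```

Each key is 256 bits long, yet there are only 256 candidates to try. Extra data only helps to the extent that it is itself secret and random.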
Lesson number three: Do not reuse nonces
The word “nonce” can be defined as “coined for or used on one occasion.” In cryptography, nonces are often used as the initialization vector of an encryption algorithm. The idea of a nonce is just as the name implies: it should never be reused. However, it is a common mistake in applications to either reuse nonces or at least disregard their importance. Developers who are not familiar with the inner workings of encryption algorithms sometimes miss the significance of the nonce. Since nonces are often just an integer that is increased by one for each step, many developers fail to see the importance of this little digit.
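To see what reuse costs in the encryption case, here is a sketch with a toy stream cipher. The cipher is invented for this illustration (a SHAKE-256 keystream XORed with the plaintext) and must not be used for anything real; the key, nonce value, and messages are placeholders.

```python
import hashlib


def toy_stream_encrypt(key: bytes, nonce: int, plaintext: bytes) -> bytes:
    """Toy stream cipher for illustration only: XOR with a SHAKE-256 keystream."""
    keystream = hashlib.shake_256(key + nonce.to_bytes(8, "big")).digest(len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, keystream))


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


key = bytes(32)                      # placeholder key for the demo
msg1 = b"pay alice 10 BTC"
msg2 = b"pay bob 9999 BTC"

c1 = toy_stream_encrypt(key, 7, msg1)
c2 = toy_stream_encrypt(key, 7, msg2)    # the same nonce is reused

# The identical keystream cancels out: c1 XOR c2 equals msg1 XOR msg2.
leak = xor_bytes(c1, c2)
# Anyone who knows or guesses one plaintext now recovers the other, without the key.
recovered_msg2 = xor_bytes(leak, msg1)
```

The attacker never touches the key. Two ciphertexts under the same key and nonce are enough.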
These types of vulnerabilities are extra sensitive in cases where the same keys are reused multiple times (e.g. ECDSA as implemented in Bitcoin and Ethereum). In ECDSA, if the same nonce is used to sign two different message hashes, h1 and h2, with the same secret key, then the secret key is revealed.
This means that if you find someone that has used a nonce twice on the blockchain you can spend their money.
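The algebra behind this is short. An ECDSA signature is s = k^-1 (h + r*d) mod n, where d is the secret key, k the nonce, and r is derived from k alone. Two signatures with the same k share the same r, so s1 - s2 = k^-1 (h1 - h2), which gives k, and then d = (s1*k - h1) / r. The sketch below replays that on secp256k1 in plain Python. The private key, nonce, and messages are made-up demo values, the function names are mine, and the arithmetic is not constant-time, so treat it as a demonstration only.

```python
import hashlib

# secp256k1 domain parameters (the curve used by Bitcoin and Ethereum).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    return (x3, (slope * (x1 - x3) - y1) % P)


def scalar_mult(k, point):
    """Double-and-add scalar multiplication."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def sign(private_key, message, nonce):
    """Textbook ECDSA signing with a caller-supplied nonce (the mistake under study)."""
    h = int.from_bytes(hashlib.sha256(message).digest(), "big") % N
    r = scalar_mult(nonce, G)[0] % N
    s = pow(nonce, -1, N) * (h + r * private_key) % N
    return h, r, s


def recover_private_key(h1, s1, h2, s2, r):
    """Recover the key from two signatures that share a nonce (and therefore r)."""
    nonce = (h1 - h2) * pow((s1 - s2) % N, -1, N) % N
    return (s1 * nonce - h1) * pow(r, -1, N) % N


private_key = 0x1F2E3D4C5B6A79880123456789ABCDEF      # made-up demo key
reused_nonce = 0xC0FFEE1234567890DEADBEEF              # made-up nonce, used twice

h1, r1, s1 = sign(private_key, b"first transaction", reused_nonce)
h2, r2, s2 = sign(private_key, b"second transaction", reused_nonce)
recovered_key = recover_private_key(h1, s1, h2, s2, r1)
```

Note that the tell-tale sign on a public ledger is simply two signatures from the same key with an identical r value.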
Lesson number four: Do not share randomness between cryptographic functions and other functions
It is quite common, especially in larger projects, to reuse code. Most of the time it is not the same developers who write the encryption layer that write, for example, the network stack or the graphical user interface. I have found that quite often the network developers who need a random function take the approach of: “But wait, the crypto guys already have this shiny random number generator that is already instantiated. I’ll use that for the node ID generation of each packet we send.”
That is not the approach to take. Reusing sources of random opens up many classic side-channel attacks where you can calculate the next number to be generated by analyzing, for example, network packets generated by the application.
Never use the same random number generator for cryptographic functions and other application functions.
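Here is a sketch of how bad this can get when the shared generator is not cryptographically secure. It assumes the shared generator is Python's `random.Random` (a Mersenne Twister) and that packet IDs are its raw 32-bit outputs; both are assumptions made for the demo. After observing 624 IDs, the attacker rebuilds the internal state and predicts whatever the application draws next.

```python
import random


def untemper(y: int) -> int:
    """Invert the MT19937 output tempering to recover one internal state word."""
    y ^= y >> 18                             # undo y ^= y >> 18
    y ^= (y << 15) & 0xEFC60000              # undo y ^= (y << 15) & 0xEFC60000
    tmp = y
    for _ in range(4):                       # undo y ^= (y << 7) & 0x9D2C5680
        tmp = y ^ ((tmp << 7) & 0x9D2C5680)
    y = tmp
    for _ in range(2):                       # undo y ^= y >> 11
        tmp = y ^ (tmp >> 11)
    return tmp


def clone_generator(observed) -> random.Random:
    """Rebuild a random.Random from 624 consecutive 32-bit outputs."""
    state = tuple(untemper(value) for value in observed) + (624,)
    clone = random.Random()
    clone.setstate((3, state, None))
    return clone


shared_rng = random.Random(2024)             # stand-in for the shared generator

# The network code leaks 624 outputs as packet IDs.
packet_ids = [shared_rng.getrandbits(32) for _ in range(624)]

attacker_rng = clone_generator(packet_ids)
predicted_secret = attacker_rng.getrandbits(256)
actual_secret = shared_rng.getrandbits(256)  # what the crypto layer draws next
```

A proper CSPRNG is designed so that outputs do not reveal its state like this, but keeping the cryptographic generator away from everything else means you do not have to bet your keys on that property.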
Lesson number five: Private keys should stay private
This tip may sound obvious, but in auditing code for many applications I have found that this is one of the most common mistakes: passing keys as copies and not by reference. Let's say you invested heavily in a super-duper secure random number generator to generate your private key. This private key is then encrypted with a second layer on disk.
All this protection of the keys is worthless if the application copies the key into RAM in a hundred locations and sprays it all over the stack and the heap. Doing this means that an attacker who can dump the memory of your phone, computer, node or device has many chances to find the key. I have seen wallet and node implementations that copy the keys several times for every signature.
This vulnerability gets even worse when services run in the cloud. There have been several vulnerabilities at cloud providers where it was possible to dump memory outside of the virtual machine, both via hypervisor vulnerabilities and via hardware design flaws.
Always treat your private keys as originals. Never copy them but always reference them.
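The idea can be sketched in Python, with heavy caveats: a managed runtime gives you little real control over memory, and library internals may still copy the key, so languages with explicit memory control are a better fit for this job. The sketch keeps the key in one mutable buffer, hands out references to it, and overwrites it in place after use. HMAC stands in for a real signing operation, and all names are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytearray, message: bytes) -> bytes:
    """Hypothetical signer that borrows the caller's key buffer instead of storing it."""
    return hmac.new(key, message, hashlib.sha256).digest()


def wipe(buffer: bytearray) -> None:
    """Overwrite the buffer in place so the key does not linger after use."""
    for i in range(len(buffer)):
        buffer[i] = 0


key = bytearray(b"\x01" * 32)   # placeholder; a real key comes from a CSPRNG
borrowed = key                  # a second reference to the same buffer, not a copy
tag = sign(borrowed, b"tx-1")
wipe(key)                       # every reference now sees zeros
```

An immutable `bytes` key could not be wiped this way, and every slice or concatenation of it would create another copy for a memory dump to find.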
Lesson number six: Do not copy and paste from Stack Overflow
While doing a code audit of a Bitcoin wallet implemented in Python, I once came across the following line of code: let prng = random.seed(42);
If you’re like me and a lot of other nerds who have read The Hitchhiker’s Guide to the Galaxy by Douglas Adams, you know that 42 can be defined as “the answer to the ultimate question of life, the universe, and everything.” The number 42 is therefore often used in cryptography code examples as a seed for two reasons: 1) it is repeatable, so the keys or string of random will always be the same, and 2) it is a fun reference. Now that we know how pseudo-random number generators work, we can surmise that this is not a good idea.
Never copy and paste code, and if you do, make sure that you understand EVERY line fully.
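The consequence is easy to reproduce. In this sketch the wallet function is hypothetical and only mirrors the pattern of the audited line: with a fixed seed, anyone who runs the same code gets the same "private" key.

```python
import random


def generate_wallet_key() -> bytes:
    """Hypothetical wallet code containing the copy-pasted fixed seed."""
    random.seed(42)
    return random.getrandbits(256).to_bytes(32, "big")


victim_key = generate_wallet_key()
attacker_key = generate_wallet_key()   # the attacker simply runs the same code
```

Both calls return identical bytes, on every machine and on every run.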
Lesson number seven: Do not implement security by obscurity
A while back I was developing an application that would be used for online authentication and was asked to “hide” the signing key for verification of the incoming transaction “inside the app” in a way that made it “hard to extract.” Given my background as a hacker and having friends with weird hobbies such as analyzing and reverse engineering malware, viruses and ransomware, I know that no matter how many times you xor, obfuscate, hide, encode or encrypt a secret in an app it can all be circumvented by a single breakpoint in the right place with a proper debugger or disassembler. If you want to learn how to do this I can really recommend https://securedorg.github.io/RE101/ by @malwareunicorn as a great introduction to the weird and wonderful hobby of reverse engineering assembly code.
The truth is we have asymmetric public key encryption, which eliminates the need for “secrets” encoded in the apps. If your app relies on hidden secrets like this, I suggest that you head back to the drawing board and begin again.
No security by obscurity! Just don’t. It will bite you.
Lesson number eight: Find out how deep the third party rabbit hole goes
In Rust they are called “crates,” in Python “modules,” in Java “packages,” but they are basically the same thing: random code written by someone else that we lazily import into our “secure crypto” application. We often assume that since the code is written by someone else it will be more secure than the code we write, since you do not write a cryptography library unless you are really good at crypto, right? Wrong! Most people are really bad at crypto. Before you import that fast, sexy 256-bit ChaCha implementation, read the code!
As developers we are inherently lazy. We love to reuse code, and in today's ecosystem-driven development you quickly get a lot of dependencies. While doing a line count (“wc -l”) on a Rust project I am currently working on, I counted a staggering 1 million lines of code, while I had only written about five thousand of them. This is due to the plague of package managers.
If your app depends on ten external packages that in turn each depend on ten external packages, that makes your footprint 100 packages. Unfortunately, one package might depend on 10 packages that depend on 10 packages that depend on 10 packages. That puts your footprint on the order of 1,000 packages, dwarfing your original code base. This gets even worse when one package depends on some library at version “0.1.0” and the next depends on the same library at version “0.2.0”; in the end you might depend on several versions of the same code, each with different problems.
Make sure you can map and trace all the security-critical functions you use in your application, and make sure to code audit them.
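Mapping the footprint can start very simply. The sketch below walks a dependency graph to list everything your application pulls in and to flag packages that appear in more than one version. The graph, package names, and versions are invented for the example; in practice you would feed it the output of your package manager's dependency listing.

```python
def transitive_dependencies(graph: dict, root: str) -> set:
    """Return every package reachable from root, excluding root itself."""
    seen = set()
    stack = list(graph.get(root, []))
    while stack:
        package = stack.pop()
        if package not in seen:
            seen.add(package)
            stack.extend(graph.get(package, []))
    return seen


def duplicated_versions(packages: set) -> dict:
    """Group 'name@version' strings and keep names present in several versions."""
    versions = {}
    for package in packages:
        name, _, version = package.partition("@")
        versions.setdefault(name, set()).add(version)
    return {name: found for name, found in versions.items() if len(found) > 1}


# Hypothetical dependency graph: names and versions are made up.
graph = {
    "my-wallet": ["http-lib@1.0.0", "fast-cipher@2.1.0"],
    "http-lib@1.0.0": ["rand-util@0.1.0", "parser@3.0.0"],
    "fast-cipher@2.1.0": ["rand-util@0.2.0"],
}

deps = transitive_dependencies(graph, "my-wallet")
dupes = duplicated_versions(deps)
```

Here two direct dependencies turn into five packages, and `rand-util` shows up in two versions, which is exactly the kind of finding that deserves a closer look in an audit.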
Lesson number nine: Never trust your neighbors
When I started with hacking, one of the first articles I read that really blew my mind was “Smashing the Stack for Fun and Profit” by @aleph1. It described how to crash C programs, hijack the CPU of the target computer, and make it execute any code of your liking. This is possible because of buffer overflows that stem from poor input validation. Since that article was published, many similar attack vectors have been discovered that can be exploited by an attacker.
Most crypto applications are used to perform transactions and/or to protect messages in transit, which means that they are communicating with third parties. These third parties not only have the opportunity to exploit vulnerabilities in the network stack of the application but also good motive. One of the most obvious motives would be to get access to the private keys!
It does not matter how random your keys are if there is a single unchecked buffer that can lead to remote code execution.
In many applications, incoming messages and packets are parsed correctly in the core application, but the developers have depended on a third-party package to implement some web service. Again, do all the third-party libraries perform good input validation?
To test for this, many large companies and organizations employ what is called fuzz testing, a practice where input data is manipulated to find input validation errors.
Assume that every packet that comes from the network is trying to kill you.
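A fuzzer does not have to be sophisticated to be useful. The sketch below throws random packets at a parser for a hypothetical wire format (one length byte followed by the payload) and treats any exception other than a clean rejection as a bug. The format and function names are assumptions for the example, and real projects would normally reach for a coverage-guided fuzzing tool instead.

```python
import random


def parse_message(packet: bytes) -> bytes:
    """Parse a hypothetical wire format: 1 length byte followed by the payload."""
    if len(packet) < 1:
        raise ValueError("empty packet")
    declared = packet[0]
    payload = packet[1:]
    if declared != len(payload):
        raise ValueError("declared length does not match payload")
    return payload


def fuzz(parser, iterations: int = 10_000, seed: int = 0) -> int:
    """Feed random packets to the parser; anything but ValueError escapes as a bug."""
    rng = random.Random(seed)   # reproducible on purpose so failures can be replayed
    accepted = 0
    for _ in range(iterations):
        length = rng.randint(0, 16)
        packet = bytes(rng.getrandbits(8) for _ in range(length))
        try:
            parser(packet)
            accepted += 1
        except ValueError:
            pass                # a clean rejection is the expected outcome
    return accepted


accepted_count = fuzz(parse_message, iterations=2_000)
```

A fixed seed and the non-cryptographic `random` module are fine here: the fuzz inputs are not secrets, and being able to replay a failing run is more valuable than unpredictability.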
Lesson number ten: It takes one to know one
When we design apps we design them to work as we intended. The problem is that attackers tend to use them in ways that were not intended. To find the potential vulnerabilities in an application we need to think like an attacker and perform some kind of threat modeling.
This means we need to think like the bad guys. So instead of asking “What would Batman do?”, you should ask “What would the Penguin or the Joker do?”
This exercise is important to get a clear picture of what information you are trying to protect, who you are protecting it against, and how the protection could be circumvented. Without a base model of what you are trying to protect against, it is easy to focus on the wrong thing or miss obvious security concerns.
This is by no means an exhaustive list but rather a guide outlining what to think about when writing secure crypto applications. The biggest takeaway I hope you get from this post is that to create secure crypto code you actually have to understand cryptography and, most importantly, that random is hard!