Information may be a valuable asset. Do you remember the movie "1917", where Lance Corporal William Schofield takes an incredibly dangerous journey through the front to deliver a message that helps save thousands of lives? Or, in the present context, if your company were the first to figure out the correct way to produce a COVID-19 vaccine, this information would be of high value indeed.
Whenever there is value, there is also someone interested in it. If the Germans had known the implications of Schofield's message, they could have set up a targeted attack against him. When you are close to producing a COVID-19 vaccine, you should be very aware of industrial espionage.
When there is a threat, it is better to be prepared so that potential attacks do not take you by surprise. Is there anything we can do to prevent our valuable information from being harmed? Luckily, yes. General Erinmore did not send Lance Corporal William Schofield to deliver the message alone, but together with Tom Blake. This precaution paid off, as the latter was killed during the mission. In modern terms, we would classify it as an organisational measure to prevent information loss.
The case of industrial espionage against COVID-19 vaccine candidates is a little bit different. The threat here is not so much the loss of information as its being copied without the knowledge of the legitimate owner. In the worst case, the owner would not even notice this before a competing vaccine hits the market, as he would still have the original version of the information.
What could we do to prevent information from being stolen? Sure, we can keep it under lock and key, in a server that has never been attached to the Internet, etc. But the thing is - someone needs access to this information for processing, otherwise gathering it would be useless in the first place. To do the processing, in turn, you need some sort of computer running some sort of software. To prevent information loss, you probably also want to keep data backups off-site.
Already these basic scenarios show potentially vulnerable points. Software may contain vulnerabilities allowing malware to manipulate your data. At the same time, off-site backup storage may suffer from simple theft. Can we prevent such a theft? Not really, but we can do something almost equally good - we can make the data on the backup medium unreadable for a potential thief. The trick here is to do it efficiently.
This is the first example of a magic property that we may achieve with the help of some clever math (called cryptography). Namely, it is possible to select a few hundred bits (called the key) and use them to scramble (encrypt) our sensitive data so that it is indistinguishable from random noise for anyone who does not have access to the key. With the help of the key, on the other hand, all the original data can be restored (decrypted) very quickly.
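To make the idea concrete, here is a minimal Python sketch of scrambling data with a 256-bit key and restoring it again. It is a toy construction, chosen only because it fits into a few lines of standard-library code: a pseudorandom keystream is derived from the key with SHA-256 and XORed with the data. The names `keystream`, `encrypt` and `decrypt` and the 16-byte nonce are assumptions made for this illustration. For real data one would rather use a vetted cipher such as AES-GCM from a maintained library.

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudorandom bytes from key and nonce (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        # Each SHA-256 call yields 32 more bytes of keystream.
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Scramble the plaintext; a fresh random nonce is prepended to the result."""
    nonce = secrets.token_bytes(16)
    stream = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Restore the original data from nonce + scrambled bytes."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = secrets.token_bytes(32)                  # the "few hundred bits": 256 of them
backup = b"Top secret vaccine formula. " * 1000
scrambled = encrypt(key, backup)               # looks like noise without the key
restored = decrypt(key, scrambled)             # identical to `backup`
```

Note that the key stays 32 bytes long no matter how large the backup grows, which is exactly what makes it practical to protect separately.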
Why is it important to keep the key small? The reason is simple - this key also has to be stored somewhere. Smaller keys are easier to protect in, say, chip cards, or by other cryptographic mechanisms. It is important to notice, though, that encrypting the data does not solve all our problems (in this case data confidentiality), but rather converts them to others (in this case, the problem of key management).
Now, imagine a scenario where you need to make the backup of your enormous amounts of encrypted data available at your company's brand-new field office in the Philippines. Pushing a few hundred gigabytes of scrambled files through the Internet is not a problem these days, but the real difficulty is getting those few hundred bits of the key across. You cannot send them along with the encrypted files, because anyone who has access to the data transmission line can take the key and decrypt all your secrets.
Hence, another mechanism is needed for sending the encryption key. This is where the second instance of magic happens. The above-described usage of keys is known as symmetric encryption, meaning that you need the same key to both encrypt and decrypt messages. It turns out that this is not the only option. It is possible to have two mathematically related keys, one of which can be made public (e.g. sent over a public network) and used for encryption, whereas the second one is used only privately for decryption. This kind of asymmetric cryptography is considerably less efficient than its symmetric sibling, but it can still be used to send short messages, say, symmetric keys.
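As an illustration of the two related keys, the following sketch uses "textbook RSA", one well-known asymmetric scheme (the text above does not name a particular one, so this choice is an assumption). The primes are two publicly known Mersenne primes, picked only so that the example is reproducible. Real keys are built from secret random primes, and real RSA adds padding, so this is a toy and not secure. The function names are made up for the example, and `pow(e, -1, ...)` needs Python 3.8 or newer.

```python
import secrets

# Toy parameters: publicly known Mersenne primes, NOT suitable for real use.
P = 2**521 - 1
Q = 2**607 - 1

n = P * Q
e = 65537                               # public exponent
d = pow(e, -1, (P - 1) * (Q - 1))       # private exponent (modular inverse of e)

public_key = (n, e)                     # may be sent over a public network
private_key = (n, d)                    # never leaves its owner


def encrypt(message: bytes, public_key) -> int:
    """Encrypt a short message with the public key."""
    n, e = public_key
    m = int.from_bytes(message, "big")
    if m >= n:
        raise ValueError("message too long for this key")
    return pow(m, e, n)


def decrypt(cryptogram: int, private_key, length: int) -> bytes:
    """Recover a message of `length` bytes with the private key."""
    n, d = private_key
    return pow(cryptogram, d, n).to_bytes(length, "big")


symmetric_key = secrets.token_bytes(32)              # a short message worth protecting
cryptogram = encrypt(symmetric_key, public_key)      # anyone can do this
recovered = decrypt(cryptogram, private_key, 32)     # only the key owner can do this
```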
The whole hybrid protocol for sending the encrypted data backup to the field office in the Philippines could be as follows.
- The office manager in the Philippines generates an asymmetric key pair consisting of a public and a private key. He keeps the private key to himself and sends the public key to you.
- You encrypt your data with a symmetric key, and the symmetric key with the public key you got from the office manager.
- You send both cryptograms to the Philippines.
- The office manager in the Philippines uses his private key to first decrypt the symmetric key, and then the symmetric key to decrypt the actual data.
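The four steps above can be sketched in Python as follows. This is only a toy under stated assumptions: the asymmetric part is textbook RSA built from two publicly known Mersenne primes, the symmetric part is a SHA-256 based keystream XORed with the data, and all names are hypothetical. Neither construction is meant for production use, but the flow of keys and cryptograms is the same as in a real hybrid scheme.

```python
import hashlib
import secrets


def xor_with_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: the same call both encrypts and decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))


# Step 1 (office manager): generate a key pair, publish (n, e), keep d private.
P, Q = 2**521 - 1, 2**607 - 1           # toy primes, publicly known, NOT secure
n, e = P * Q, 65537
d = pow(e, -1, (P - 1) * (Q - 1))       # requires Python 3.8+

# Step 2 (you): encrypt the data with a fresh symmetric key,
# then encrypt that symmetric key with the manager's public key.
backup = b"Enormous amounts of sensitive data. " * 500
symmetric_key = secrets.token_bytes(32)
nonce = secrets.token_bytes(16)
encrypted_data = xor_with_keystream(symmetric_key, nonce, backup)
encrypted_key = pow(int.from_bytes(symmetric_key, "big"), e, n)

# Step 3 (you): send both cryptograms (and the nonce) over the public network.
transmission = (encrypted_key, nonce, encrypted_data)

# Step 4 (office manager): recover the symmetric key, then the data.
received_key, received_nonce, received_data = transmission
recovered_key = pow(received_key, d, n).to_bytes(32, "big")
recovered_backup = xor_with_keystream(recovered_key, received_nonce, received_data)
```

The slow asymmetric operation touches only 32 bytes, while the bulk of the data goes through the fast symmetric cipher.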
Have we now solved all the problems of sensitive data transmission and key management? Of course not. The weakest of the four steps above is the first one. How do you know that the public key really comes from the field office in the Philippines? Just imagine that an attacker generates a key pair himself and convinces you to run the above protocol with his public key, while he holds the matching private key. He would be capable of decrypting your data! Thus, we still need to ensure the authenticity of the public key.
How do we do that? By a third act of crypto magic, of course. Note that some public key encryption schemes (RSA being the classic example) can also be inverted - the private key can be used for encryption and the public key for decryption. With some extra care we can get something pretty similar to a signature - only the private key owner can create a cryptogram (signature), whereas anyone can check its validity.
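A minimal sketch of this inverted use, again with toy textbook RSA: the private exponent is applied to a SHA-256 hash of the message, and anyone holding the public key can check the result. The primes are publicly known Mersenne primes and the function names are invented for the example. The "extra care" of real signature schemes (proper padding, secret random primes) is deliberately left out, so this is not secure.

```python
import hashlib

# Toy key pair from publicly known Mersenne primes, NOT secure.
P, Q = 2**521 - 1, 2**607 - 1
n, e = P * Q, 65537
d = pow(e, -1, (P - 1) * (Q - 1))       # requires Python 3.8+
public_key, private_key = (n, e), (n, d)


def digest_as_int(message: bytes) -> int:
    """Hash the message so that inputs of any length fit below the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def sign(message: bytes, private_key) -> int:
    """Only the private key owner can compute this value."""
    n, d = private_key
    return pow(digest_as_int(message), d, n)


def verify(message: bytes, signature: int, public_key) -> bool:
    """Anyone with the public key can check the signature."""
    n, e = public_key
    return pow(signature, e, n) == digest_as_int(message)


message = b"Call off the attack."
signature = sign(message, private_key)
is_valid = verify(message, signature, public_key)
is_forgery_accepted = verify(b"Proceed with the attack.", signature, public_key)
```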
Assuming there is a global authority we trust, we can ask it to sign the Philippine office manager's public key together with his identity, and obtain a so-called public key certificate. And this is basically how identification works on the present-day Internet. Every actor (webpage, service provider, physical person, etc.) can generate its own signature key, get it certified, and distribute the certificate. This system (called Public Key Infrastructure) has its weaknesses, too, but for many of our everyday needs it more or less works. And all of this is thanks to a lot of crypto magic!
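The certificate idea can be sketched in the same toy setting. A hypothetical authority signs the pair (identity, public key) with textbook RSA, and anyone who trusts the authority's public key can check the binding. Real certificates follow the X.509 format and use properly padded signatures. The dictionary layout, the names and the Mersenne-prime keys below are assumptions made purely for illustration.

```python
import hashlib


def make_keypair(p: int, q: int, e: int = 65537):
    """Toy RSA key pair from two given primes; returns (public, private)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))   # requires Python 3.8+
    return (n, e), (n, d)


# Publicly known Mersenne primes: reproducible, but NOT secure.
ca_public, ca_private = make_keypair(2**607 - 1, 2**1279 - 1)
manager_public, _ = make_keypair(2**127 - 1, 2**521 - 1)
attacker_public, _ = make_keypair(2**89 - 1, 2**107 - 1)


def cert_digest(identity: str, subject_key) -> int:
    """Hash identity and public key together, so the signature binds both."""
    n, e = subject_key
    data = f"{identity}|{n}|{e}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def issue_certificate(identity: str, subject_key, ca_private) -> dict:
    """The trusted authority signs the (identity, public key) pair."""
    n, d = ca_private
    signature = pow(cert_digest(identity, subject_key), d, n)
    return {"identity": identity, "public_key": subject_key, "signature": signature}


def check_certificate(cert: dict, ca_public) -> bool:
    """Anyone who trusts the authority's public key can run this check."""
    n, e = ca_public
    expected = cert_digest(cert["identity"], cert["public_key"])
    return pow(cert["signature"], e, n) == expected


cert = issue_certificate("Field office manager, Philippines", manager_public, ca_private)
genuine_ok = check_certificate(cert, ca_public)

# An attacker swaps in his own key but cannot produce a matching signature.
forged = dict(cert, public_key=attacker_public)
forged_ok = check_certificate(forged, ca_public)
```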
Written by Jan Willemson