Anonymisation 2.0: Sharemind as a Tool for De-Identifying Personal Data - Part 2: Sharemind and anonymisation

In this two-part blog post, we are answering an often-asked question - is Sharemind anonymisation? Or is it something better? Is this comparison actually valid? Triin and Dan combine their legal and technical know-how to tell you more. If you missed the first part, you'll find it here.

Part 2: How does Sharemind work with anonymisation?

Sharemind is a platform for privacy-enhancing data analytics. Depending on its set-up and configuration, there are many ways in which Sharemind can be used to analyse de-identified information. When implemented in its maximum privacy mode, Sharemind enables anonymised processing of personal data. How does Sharemind achieve that?

Anonymous data vs anonymous processing

In part one of this blog post, we noted that there are two well-known approaches to anonymisation: noise addition at the input level (anonymised database) and at the output level (anonymised query result). It is less well known that, in addition to anonymous databases and anonymous query results, anonymisation can also be achieved by means of anonymous processing. In that case, no suppression or noise is needed - the underlying data remains intact and the anonymisation principle is applied at the processing level, not only to the data.

There's a common technology we can use as an analogy. Secure channels on the internet provide end-to-end confidentiality and integrity. TLS (Transport Layer Security) is a popular standard for such communication. When done properly, the content exchanged through a secure channel cannot be manipulated in transit, and nobody listening in on the channel can identify the data subjects. The sender can be sure that only the intended recipient can read the messages. However, secure communication is static - the data is delivered unchanged and cannot be processed along the way.

What if we could go a step further and also process data with end-to-end encryption?

Quick recap of Sharemind

Let's get a quick reminder of what Sharemind does.

Data owners encrypt the data and provide it to Sharemind without giving the Sharemind host access to the decryption key. This takes the reidentification capability out of the hands of the host. The unique selling point of Sharemind is the transformation of encrypted inputs into encrypted results without making the data available to Sharemind. For details on how this is done, look at the product pages.

This is true end-to-end security. From data owners to users with no middlemen seeing the values. Think of it like TLS for analytics. Or we could say that Sharemind provides PLS - Process Layer Security.
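To make "encrypted inputs in, encrypted results out" a little more concrete, here is a toy sketch of additive secret sharing, one technique used in secure multi-party computation. It is an illustration only: the choice of technique, the three-host setup, the modulus and all function names are assumptions made for this example, not Sharemind's API or a description taken from this post. The product pages remain the reference for how Sharemind actually works.

```python
import secrets

# Shares are integers modulo 2^32 (an assumption made for this sketch).
MODULUS = 2**32


def share(value, parties=3):
    """Split value into random shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def add_shared(a_shares, b_shares):
    """Each host adds the two shares it holds; no host sees a or b."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


def reconstruct(shares):
    """Recombine shares; this needs every share, so no single host can do it."""
    return sum(shares) % MODULUS


# Two data owners split their values before upload (hypothetical numbers).
alice_shares = share(4200)
bob_shares = share(3100)

# The hosts compute on shares only.
total_shares = add_shared(alice_shares, bob_shares)

# Only the authorised recipient of the result recombines the output shares.
total = reconstruct(total_shares)
print(total)
```

Each individual share is a uniformly random number, so a host holding one share of each input learns nothing about the inputs, yet the sum comes out exact. Real systems need considerably more machinery for operations such as multiplication or comparison.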

Furthermore, Sharemind provides remote audit and control capabilities that the Sharemind host cannot turn off. This is great for enforcing privacy policies and ensuring that only legitimate processing takes place.

Is Sharemind anonymisation?

Yes and no.

From a regulatory standpoint, Sharemind provides anonymisation guarantees (for example, in the meaning of the GDPR). Read more about why this is the case in part one of this blog post. Sharemind's use of encryption technology achieves de-identification throughout the data flow.

From a technical standpoint, Sharemind has properties that other anonymisation technologies cannot achieve. Let's look at the same service provider example we had in part one of this blog post.

First, assume that a service is built with the Sharemind secure application servers. In that case, the service provider will not have access to the data at all. Re-identification will be nearly impossible, yet linking, aggregation, statistical analysis, AI and other functions will remain possible. From a security analysis standpoint, the main remaining route to re-identification is the exploitation of side channels. In applications where that risk is realistic, special care should be taken to counter side-channel attacks during application preparation.

For the data user, our approach of choice is to apply minimisation. That is, to show the user the absolute minimum amount of data needed to deliver the value from the data. This requires careful analysis during application preparation and a change in the way data analysts are used to working. But the payoff is that the results will be accurate, without the noise that noise-based anonymisation techniques would add.
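As a rough illustration of what a minimised query interface could look like, the sketch below exposes only per-group averages and withholds any group that is too small to publish safely. The function name, the record layout and the threshold of five are all hypothetical choices for this example; a real application would derive them from its own risk analysis.

```python
from statistics import mean

# Hypothetical threshold: groups smaller than this are not reported.
MIN_GROUP_SIZE = 5


def average_by_group(records, group_key, value_key, min_group_size=MIN_GROUP_SIZE):
    """Return exact per-group averages, withholding groups that are too small."""
    groups = {}
    for record in records:
        groups.setdefault(record[group_key], []).append(record[value_key])
    return {
        group: mean(values)
        for group, values in groups.items()
        if len(values) >= min_group_size
    }


# Hypothetical data: six people in sales, one in legal.
records = [{"dept": "sales", "salary": 3000 + 100 * i} for i in range(6)]
records.append({"dept": "legal", "salary": 5000})

result = average_by_group(records, "dept", "salary")
print(result)
```

The user never sees individual rows, and the single-person "legal" group is left out because its average would reveal one person's salary. The reported figure for "sales" is exact, with no noise added.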

However, Sharemind is also compatible with other anonymisation techniques, for example, differential privacy. In this case, the service provider will build anonymisation into the Sharemind application so that anonymised results are calculated just as normal ones would be. The difference is that instead of minimisation and accuracy, the results will be less limited, but with noise added.
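For comparison, here is a minimal sketch of output-level noise in the style of differential privacy, using the Laplace mechanism on a counting query. It runs on plain values to keep the example short; in the set-up described above the same calculation would be part of the secure application. The function names, the data and the epsilon value are assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(values, predicate, epsilon, rng=None):
    """Count matching records, then add Laplace noise with scale 1/epsilon.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity of the query is 1 and the noise scale is 1/epsilon.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for value in values if predicate(value))
    return true_count + laplace_noise(1 / epsilon, rng)


# Hypothetical data and privacy parameter.
ages = [23, 35, 41, 29, 52, 67, 38, 45]
released = noisy_count(ages, lambda age: age >= 40, epsilon=0.5)
print(released)
```

The released value is close to the true count of 4 but differs from run to run. A smaller epsilon means more noise and stronger protection, which is the accuracy trade-off described above.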

A comparison of all approaches

The table below compares all four solutions described in this two-part post. A green cell means that the property is favourable for the respective role. A red cell means a risk to security or utility. A blue cell means that the risk depends on the application, not only on the privacy-enhancing technology.

| Key property | Anonymisation at the service provider | Anonymisation at the data owner | Sharemind - anonymised processing with minimisation | Sharemind - anonymised processing with anonymised results |
|---|---|---|---|---|
| What does the data owner do to protect its data? | Nothing | Adds noise to data, reducing accuracy | Encrypts the data | Encrypts the data |
| What does the service provider do to protect the data? | Adds noise to results, reducing accuracy | Can add further noise, but this decreases accuracy further | Applies secure computing technology to compute encrypted results from encrypted inputs without removing the protection | Applies secure computing technology to compute encrypted results from encrypted inputs without removing the protection, then adds anonymisation to results |
| Are there restrictions to data utility for the service provider? | No restrictions | Depending on the anonymisation technique, certain processing might be impossible | No restrictions | No restrictions |
| Is the resulting data accurate? | No | No | Yes | No |
| Can the service provider identify data records? | Yes | Maybe, with auxiliary data | No | No |
| Can the users identify data records? | Maybe, with auxiliary data | Maybe, with auxiliary data | Depends on the extent of minimisation | Maybe, with auxiliary data |
| Can regulators or data owners remotely audit and/or control processing? | Have to trust the service provider to behave as agreed | Have to trust the service provider to behave as agreed | Can apply machine-enforced privacy policies, also remotely | Can apply machine-enforced privacy policies, also remotely |

Conclusion

The goal of this two-part blog post is to answer the popular question of how Sharemind relates to the concept of anonymisation and to anonymisation technologies.

While Sharemind does not perform anonymisation according to the popular definitions, it may well offer the best possible anonymisation in the meaning of the law. This is because Sharemind helps to reduce the risk of any data processor identifying a person to the minimum, while maintaining the accuracy of the underlying data and making it possible to draw sound conclusions from it.

When building your data-driven service, pick your anonymisation tools based on how much value could be gained from re-identifying the data. To get the most accurate results, we suggest anonymous processing with Sharemind and minimisation of query interfaces. If minimisation seems hard, then anonymous processing with Sharemind combined with result anonymisation through randomisation is another great option.