Differentiating between privacy-sensitive and anonymous data #49
Comments
I read through the PEP paper; it is based on a new encryption algorithm, and implementing it would be a complete infrastructure effort. The issue remains: everything that does not have to be shown in the dashboard or retrieved using the API could be encrypted. This applies especially to privacy-sensitive data. In all cases, we would have to trust the researchers to handle their private keys with care, though. Another note: Tresorit, for example, has a nice key exchange algorithm as well. It does not encrypt the data itself with those keys; instead it uses them to encrypt the encryption keys for the data. That makes the data encryption less heavy, and no re-encryption of the data is needed if keys change. In any case, we'd have to properly follow their protocol (or another well-documented protocol) to avoid the usual pitfalls in encryption.
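To make the key-wrapping idea concrete, here is a minimal sketch. It is not Tresorit's protocol or PEP. The `toy_cipher` function is a hypothetical, insecure stand-in for a real authenticated cipher, and all variable names are assumptions for illustration. It only shows why rotating a researcher's key requires rewrapping the small data key and not re-encrypting the payload.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (NOT secure); stands in for a real AEAD cipher.

    XOR is symmetric, so the same call both encrypts and decrypts.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# One random data key encrypts the (potentially large) payload once.
data_key = secrets.token_bytes(32)
payload = b"lat=51.5,lon=-0.12"
encrypted_payload = toy_cipher(data_key, payload)

# The researcher's key only wraps the small data key, not the payload.
researcher_key_v1 = secrets.token_bytes(32)
wrapped_key = toy_cipher(researcher_key_v1, data_key)

# Key rotation: unwrap with the old key, then rewrap with the new one.
researcher_key_v2 = secrets.token_bytes(32)
rewrapped_key = toy_cipher(
    researcher_key_v2, toy_cipher(researcher_key_v1, wrapped_key)
)

# The payload ciphertext is untouched, yet the new key can still open it.
recovered = toy_cipher(
    toy_cipher(researcher_key_v2, rewrapped_key), encrypted_payload
)
```

In a real deployment the stand-in cipher would be replaced by a vetted library implementation, and the data key would be wrapped separately for each authorised researcher.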
If we encrypt data using a key that is unknown to the Platform, we cannot run any analysis on this data. What kind of data would you encrypt with this method?
Exactly, that's the point. For example, absolute locations, IP addresses and unprocessed voice data are privacy-sensitive. However, we could choose to store them in encrypted form. The platform would not be able to read or process them, but would just provide them as-is in the full data extracted from HDFS. Less sensitive data, such as battery levels, we'd send unencrypted so our platform can process it. We could decide on a stream-by-stream basis whether we want the data encrypted or unencrypted. Also, we could choose to always leave the record keys unencrypted (anonymous patient ID) and encrypt only the values.
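The stream-by-stream decision could be sketched roughly as follows. The stream names, the `STREAM_POLICY` table and the `prepare_record` helper are hypothetical and not part of the platform. The `encrypt` callable is injected so that any real cipher could be plugged in, and treating unknown streams as sensitive is an assumption of this sketch.

```python
from typing import Callable, Dict, Tuple

# Hypothetical per-stream policy: True means the value is encrypted on the device.
STREAM_POLICY: Dict[str, bool] = {
    "location": True,
    "ip_address": True,
    "voice_raw": True,
    "battery_level": False,
}


def prepare_record(
    stream: str,
    patient_id: str,
    value: bytes,
    encrypt: Callable[[bytes], bytes],
) -> Tuple[str, bytes, bool]:
    """Return (key, value, is_encrypted) for one record.

    The key (anonymous patient ID) always stays in plain text; only the
    value of a sensitive stream is encrypted. Streams missing from the
    policy are treated as sensitive (an assumption of this sketch).
    """
    sensitive = STREAM_POLICY.get(stream, True)
    return patient_id, (encrypt(value) if sensitive else value), sensitive
```

With this shape, battery levels pass through unchanged and remain available for aggregation, while location values reach the server only as ciphertext under the same plain-text patient ID.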
Another alternative is to do the data processing on a separate "trusted" host, to which we would also provide the decryption key. Right now, though, I don't think we have the budget or motivation to take on this additional infrastructure cost.
The vast majority of collected variables are privacy-sensitive (HR, Acc, etc.). We can absolutely design something to provide this functionality as well, but we should bring WP8 into the discussion or wait for a clear need or requirement.
As long as the HR and Acc are not coupled to a specific person, I'd consider them anonymised data, which would be fine to process if we don't know the identity. However, something like absolute location can be used to find someone's home and then their identity. The same goes for voice recognition and IP addresses.
We could consider processing privacy-sensitive data but encrypting it before sending it to the server. Keys to the encryption could be provided to researchers who are allowed to access those data (for example, using PEP). For non-sensitive data, we could send the data in plain text as we do now, so that the Kafka streams can aggregate it properly. Using the PEP mechanism, those data could also be encrypted, with the Kafka streams getting a key for only those data.