
Privacy

Linda edited this page Aug 4, 2023 · 8 revisions

"Let(')s audit Learning Analytics" (LaLA) does not store any personal information that could be used to identify a person, either directly or indirectly. There are four reasons for this. First, if the plugin used personal information, the GDPR would require the plugin provider to obtain the data subjects' consent to collect and analyze their data and to share it with third-party auditors. Obtaining meaningful consent adds complexity and time to an audit. Second, under the GDPR, data subjects would have the right to request the deletion of their data, which would change the evidence after it has been collected. For a reproducible and reliable audit, however, evidence should be immutable.
This leads directly to the third reason: if the plugin stored personal information, the evidence could not be offered for download, since the plugin provider cannot ensure that downloaded copies are also updated after a data deletion request. Finally, evidence that contains identifying information could not be shared freely, which would hinder the transparency of an audit.

The following measures are taken to anonymize the data before it is stored as files on the Moodle server and returned to the auditor:

  • All IDs in all evidence are pseudonymized. The internal objects that map the original IDs to their pseudonyms are not persisted after the model version creation finishes. During pseudonymization, care is taken that neither the range of pseudonyms nor the order of the original-ID-to-pseudonym mappings hints at the original identities.
  • The dataset evidence collection requires at least three entities if the audited model is marked as processing user data.
  • Critical columns that directly contain identifying information or other information that should not be shared are filtered out (username, firstname, lastname, middlename, firstnamephonetic, lastnamephonetic, alternatename, email, phone1, phone2, address, ip, lastip, secret, password, moodlenetprofile, imagealt, picture).
  • Columns of type text are filtered out, since free text could contain identifying information.
  • Data from user-related tables (tables whose names contain the word user) is only stored if it consists of at least three distinct entities.
  • Data from tables that reference user-related tables (tables that have a column ending in userid) is only stored if at least three distinct entities are referenced.
  • The rows of the dataset and related data evidence are shuffled so that their order does not hint at the identities.
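The measures in the list above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not LaLA's actual implementation: it assumes each table arrives as a list of row dicts plus a hypothetical `column_types` map, and the function names `pseudonymize_ids` and `anonymize_table`, the rule that every column whose name ends in `id` is pseudonymized, and the shuffled 1..n pseudonym range are illustrative choices. Only the critical column names, the threshold of three entities, and the table and column naming rules come from this page.

```python
import random

# Columns this page lists as directly identifying or otherwise not shareable.
CRITICAL_COLUMNS = {
    "username", "firstname", "lastname", "middlename", "firstnamephonetic",
    "lastnamephonetic", "alternatename", "email", "phone1", "phone2",
    "address", "ip", "lastip", "secret", "password", "moodlenetprofile",
    "imagealt", "picture",
}
MIN_ENTITIES = 3  # threshold of distinct entities stated on this page


def pseudonymize_ids(values, rng):
    """Map each distinct original ID to a pseudonym from a shuffled 1..n range.

    Pseudonyms are assigned in random order, so neither their range nor the
    ID-to-pseudonym order reveals the original IDs. The mapping lives only in
    memory and is meant to be discarded afterwards (never persisted).
    """
    originals = sorted(set(values))
    pseudonyms = list(range(1, len(originals) + 1))
    rng.shuffle(pseudonyms)
    return dict(zip(originals, pseudonyms))


def anonymize_table(table_name, rows, column_types, rng):
    """Return anonymized rows for one table, or None if it must not be stored.

    rows:         list of dicts mapping column name -> value
    column_types: hypothetical dict mapping column name -> type ("int", "text", ...)
    """
    # Entity threshold: user-related tables count their own ids; tables with
    # *userid columns count the distinct users they reference.
    if "user" in table_name:
        entities = {row["id"] for row in rows}
    else:
        ref_cols = [c for c in column_types if c.endswith("userid")]
        entities = {row[c] for row in rows for c in ref_cols} if ref_cols else None
    if entities is not None and len(entities) < MIN_ENTITIES:
        return None

    # Drop critical columns and all text-type columns.
    kept = [c for c in column_types
            if c not in CRITICAL_COLUMNS and column_types[c] != "text"]

    # Hypothetical rule: every kept column whose name ends in "id" holds IDs.
    id_cols = [c for c in kept if c.endswith("id")]
    mappings = {c: pseudonymize_ids([row[c] for row in rows], rng) for c in id_cols}

    result = [
        {c: mappings[c][row[c]] if c in mappings else row[c] for c in kept}
        for row in rows
    ]

    # Shuffle rows so that their order does not hint at identities.
    rng.shuffle(result)
    return result


if __name__ == "__main__":
    rng = random.Random()
    rows = [
        {"id": 10, "userid": 5, "grade": 80, "email": "a@example.org", "feedback": "Good"},
        {"id": 11, "userid": 6, "grade": 65, "email": "b@example.org", "feedback": "Ok"},
        {"id": 12, "userid": 7, "grade": 90, "email": "c@example.org", "feedback": "Great"},
    ]
    types = {"id": "int", "userid": "int", "grade": "int",
             "email": "char", "feedback": "text"}
    print(anonymize_table("grade_grades", rows, types, rng))
```

For brevity, the sketch builds one pseudonym mapping per column of a single table. To keep references between evidence files joinable, a real implementation would have to reuse one mapping per referenced table across all of them.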

🛠️ A future version of LaLA will feature more sophisticated anonymization algorithms such as ℓ-diversity. The aim is to determine more reliably which columns contain identifying information, while losing less information that is potentially valuable and actually harmless.
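To illustrate the ℓ-diversity idea, here is a minimal Python sketch of its simplest variant, distinct ℓ-diversity. Rows are grouped into equivalence classes by their quasi-identifier values, and the table passes if every class contains at least ℓ distinct values of the sensitive attribute. LaLA does not implement this yet. The function name `is_distinct_l_diverse` and the choice of quasi-identifiers and sensitive attribute in the usage example are hypothetical.

```python
from collections import defaultdict


def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, ell):
    """Check distinct l-diversity of a table given as a list of row dicts.

    Rows sharing the same values for all quasi-identifier columns form one
    equivalence class; each class must contain at least `ell` distinct values
    of the sensitive column.
    """
    classes = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].add(row[sensitive])
    return all(len(values) >= ell for values in classes.values())


if __name__ == "__main__":
    # Hypothetical example: "course" as quasi-identifier, "grade" as sensitive value.
    rows = [
        {"course": 1, "grade": 80},
        {"course": 1, "grade": 65},
        {"course": 2, "grade": 90},
        {"course": 2, "grade": 90},
    ]
    # Course 2 has only one distinct grade, so the table is not 2-diverse.
    print(is_distinct_l_diverse(rows, ["course"], "grade", 2))
```

Unlike the fixed column list and entity threshold used today, a check like this looks at the actual values. A column then only needs to be suppressed or generalized where its values would make a sensitive attribute inferable.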

The implementation of these measures is described in more detail on the Architecture page.
