GDPR and its technical challenges: Part one
It has been five months since the EU General Data Protection Regulation (GDPR) came into force, but many companies are still struggling to draw up action plans for data subject access requests and data erasure requests. Much has been written about the imminent need for enterprises to make their data processing clear to users and to prepare for data access requests. However, because the European Commission has not made clear how the GDPR will be enforced, questions about enforcement remain open and the risks involved in implementing the regulation are unresolved. Inspired by a data privacy workshop for computer scientists, this post focuses on the technical challenges and implications of complying with the GDPR from a company developer’s perspective, rather than that of a private citizen.
Data mapping
Data mapping is the process of identifying what data is collected, how it is collected, for what purposes and from whom. Article 30 of the GDPR requires each data controller to maintain a record of processing activities. For companies that do not know exactly what data they collect, and do not keep up-to-date documentation on how their products are built, a data mapping exercise is the first step towards a GDPR-compliant action plan and compliance with Article 30.
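To make the Article 30 requirement concrete, here is a minimal sketch of what an in-code record of processing activities could look like. All of the names here (`ProcessingActivity`, `REGISTER`, the example activity) are illustrative assumptions, not a standard API or a prescribed GDPR format:

```python
from dataclasses import dataclass, field

@dataclass
class ProcessingActivity:
    """One entry in a (hypothetical) Article 30-style register."""
    name: str                       # e.g. "newsletter signup"
    purpose: str                    # why the data is processed
    data_categories: list           # what personal data is involved
    recipients: list = field(default_factory=list)  # third parties, if any
    retention: str = "unspecified"  # how long the data is kept

# A central register of processing activities for the organisation
REGISTER: list = []

def record_activity(activity: ProcessingActivity) -> None:
    """Append a processing activity to the register."""
    REGISTER.append(activity)

# Example entry (invented for illustration)
record_activity(ProcessingActivity(
    name="newsletter signup",
    purpose="send product updates",
    data_categories=["email address", "name"],
    recipients=["mailing-list provider"],
    retention="until unsubscribe",
))
```

Even a simple structure like this forces teams to write down the purpose, categories and recipients for each flow, which is exactly the information a data mapping exercise has to surface.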
Mapping old data may sound simple for small organisations where the codebase is small or data collection is done on paper. Developers need only trace the getter functions in a few modules or packages to locate where data is collected, analysed and reused, and product managers can check data stores such as databases to get a high-level picture of what data is kept permanently. Challenges arise when data is also collected through a third-party application programming interface (API), or when the codebase is large. The same piece of data may be used by many functions of a web application. Take your current location, for example: Facebook search can use it to return results relevant to your area, the news feed can prioritise posts made by friends nearby, and the ads feature can slip in adverts for restaurants close to you. With so many services, how do you know whether the data has passed through only a certain set of services and not all of them? In a distributed system where processes run concurrently, how can data be erased or accessed within a short window of time? What happens if data is sent to third parties? What if the personal data sent for processing is selected at random? These are questions to answer before a user — in data protection terms, a data subject — submits a data access request or a ‘right to be forgotten’ request.
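One way to answer the "which services touched this data?" question is to log every read of a user's data centrally, so that an access or erasure request can enumerate all consumers. The sketch below assumes such a central audit log exists; the `DataAudit` class and the service names (`search`, `news_feed`, `ads`) are hypothetical, loosely mirroring the location example above:

```python
from collections import defaultdict

class DataAudit:
    """Hypothetical central log of which services have read a user's data."""

    def __init__(self):
        # user_id -> set of service names that consumed that user's data
        self._consumers = defaultdict(set)

    def record_read(self, user_id: str, service: str) -> None:
        """Called by each service whenever it reads the user's data."""
        self._consumers[user_id].add(service)

    def consumers_of(self, user_id: str) -> set:
        """Services that must be contacted for an access or erasure request."""
        return set(self._consumers[user_id])

audit = DataAudit()
# Three services read Alice's current location:
audit.record_read("alice", "search")
audit.record_read("alice", "news_feed")
audit.record_read("alice", "ads")
# audit.consumers_of("alice") now names every service an erasure
# request would need to reach
```

This only sketches the bookkeeping; in a real concurrent system the log itself becomes another data store to secure and keep consistent, which is part of the difficulty the questions above point at.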
Security for data groups
Data is usually categorised, tagged or grouped for faster processing. However, this creates a risk: exposing one person’s data may also expose another person’s. In the Cambridge Analytica scandal, not only did the direct victims have their data accessed; their friends, friends of friends and so on had their names exposed, because each person’s profile carries a ‘list of friends’ entity. With that entity in hand, one can propagate outwards and run manual searches to find friends of friends. Data classification therefore needs to be carefully designed, with appropriate access controls for certain types of data.
These are only some of the issues and questions raised by the GDPR. Companies are likely to write programs to automate the data extraction process. What happens when artificial intelligence meets personal data? How can developers find the set of data that falls under GDPR jurisdiction at a specific point in time? Can deductions made from personal data themselves be sensitive data? I will explore some of these issues in part two of this blog.