
Figure 4 – Identification of Governor Weld from four attributes
Anonymization involves the elimination or transformation of the directly and indirectly identifying data. While pseudonymization and de-identification involve procedures and technical, organizational and legal controls to prevent employees and third parties (such as researchers) from re-identifying individuals, anonymization, once achieved, does not require such further measures. However, anonymization reduces the utility of the data: the richer data is, the more useful it is.

Improving approaches to re-identification risk

Technologies and criteria are emerging that seek to preserve the richness of data while reducing the identifiability of individuals. For instance, "differential privacy" has grown in popularity since Apple announced that it uses it to anonymize user data.151

Differential privacy makes it possible to measure the quality of data anonymization. It quantifies how much information the anonymization method will leak about a given individual who is added to a dataset using that method. It manages the trade-off between utility and privacy, introducing random noise to eliminate the difference between what is revealed about an individual whose data is included in big data analysis and one who opts out.152
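The quantity being measured is conventionally written as epsilon. The report's description corresponds to the standard definition due to Dwork and colleagues, sketched below; here M stands for the randomized anonymization mechanism, and D and D' are any two datasets that differ in a single individual's record.

```latex
% Epsilon-differential privacy: for every set S of possible outputs
% and every pair of datasets D, D' differing in one person's record,
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon}\,\Pr[\,M(D') \in S\,]
% A smaller epsilon forces the two output distributions closer
% together, so the result reveals less about any one individual.
```

Epsilon is thus the "leak" referred to above: it bounds how much any single person's inclusion can shift what the analysis reveals.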
Where the number of individuals involved is high enough, the slightly biased statistical noise masks each individual's data while averaging out across the dataset as a whole, allowing patterns to be detected and meaningful information to emerge. This enables better discussion and decisions about trade-offs between privacy and statistical utility by providing a means of evaluating cumulative harm over multiple uses.

"[D]ifferentially private database mechanisms can make confidential data widely available for accurate data analysis, without resorting to data clean rooms, data usage agreements, data protection plans, or restricted views." Thus, it "addresses the paradox of learning nothing about an individual while learning useful information about a population."153
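To make the mechanism concrete, here is a minimal sketch of one standard differential privacy technique, the Laplace mechanism, applied to a simple count query. It is an illustration only, not Apple's or any vendor's implementation; the dataset, the query and the epsilon value of 0.1 are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def private_count(values, predicate, epsilon):
    """Differentially private count via the Laplace mechanism.

    A counting query has sensitivity 1 (one person's presence changes
    the count by at most 1), so adding noise drawn from
    Laplace(scale = 1/epsilon) yields epsilon-differential privacy.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical dataset: ages of 100,000 individuals.
ages = rng.integers(18, 90, size=100_000)

true = int((ages >= 65).sum())
noisy = private_count(ages, lambda a: a >= 65, epsilon=0.1)

# Noise on the order of tens of units swamps any one person's
# contribution of 1, yet is negligible against a population-scale
# count, so the aggregate pattern survives.
print(f"true={true}  noisy={noisy:.0f}  error={abs(noisy - true) / true:.4%}")
```

Lowering epsilon adds more noise and strengthens privacy at the cost of accuracy, which is exactly the utility trade-off described above.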
Statistical disclosure control, inference control, privacy-preserving data mining, and private data analysis are other algorithmic techniques that may be applied to large databases using statistical methods with a view to managing privacy.

A market is growing in services for de-identification, pseudonymization and anonymization. For instance, German company KIProtect154 enables firms working with large datasets to secure the data, integrating over APIs with the client firm's data processing to detect and protect private or sensitive data by transforming it using pseudonymization, anonymization and encryption techniques. The ability to support many data types and storage technologies (e.g., Apache Kafka and Google Firebase) allows use in a wide range of settings. The increasing availability of such service providers means that firms processing data can outsource key parts of their privacy needs, reducing the burden of building an in-house privacy capability that is not their core business.
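As a generic illustration of the kind of transformation such services apply (this is not KIProtect's actual API; the field names and key handling are hypothetical), the sketch below replaces direct identifiers with keyed-hash pseudonyms. Because reversal or linkage requires the key, this remains pseudonymization rather than anonymization, matching the distinction drawn at the start of this section.

```python
import hmac
import hashlib

# Hypothetical secret key; in practice it would live in a secrets
# manager, since anyone holding it can link pseudonyms across records.
PSEUDONYMIZATION_KEY = b"replace-with-a-securely-stored-key"

def pseudonymize(value: str, key: bytes = PSEUDONYMIZATION_KEY) -> str:
    """Replace an identifier with a stable, keyed pseudonym.

    HMAC-SHA256 is deterministic for a given key, so the same customer
    always maps to the same pseudonym (preserving joins and analytics),
    but without the key the mapping cannot feasibly be reversed.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"customer_id": "C-102934", "email": "jane@example.com", "balance": 412.50}

# Transform the directly identifying fields; keep analytic fields intact.
safe_record = {
    **record,
    "customer_id": pseudonymize(record["customer_id"]),
    "email": pseudonymize(record["email"]),
}
print(safe_record)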
De-identification, pseudonymization and anonymization methodologies may need to be built not merely into the coding of dataset management, but also into the administrative organization. Thus, Apple performs differential privacy on user data on


