Page 25 - FIGI - Big data, machine learning, consumer protection and privacy

4  THE ENGAGEMENT PHASE: CONSUMER PROTECTION AND PRIVACY IN THE OPERATION OF AI-DRIVEN SERVICES

This section discusses engagement: the consumer’s experience with big data and machine learning, and conversely the collection, use, storage and transfer of the consumer’s data by big data and machine learning firms. Sections 6.1 and 6.2 consider consumer concerns and legal issues that arise from the substantive results of the data processing, in particular responsibility for accuracy and biased decision-making. Section 6.3 considers protections for consumers against the risk of the release of their data through data breach and re-identification, focusing on the techniques of de-identification, pseudonymisation and anonymisation. Section 1.1 turns to the risks to consumers that arise through transfers of data in the vibrant data broker market, and increased regulation of this market segment.

4.1  Accuracy – protecting consumers from erroneous and outdated data

Accuracy of data inputs

The successful functioning of machine learning models and the accuracy of their outputs depend on the accuracy of the input data. Some of the vast volumes of data used to train the system may be “structured” (organized and readily searchable) and some may be “unstructured.” [103] The data may have been obtained in different ways over time from a variety of sources, some more and some less directly. [104] The wider the net of data that is collected, the greater the chances are that data will be out of date and that systematic updating processes are not applied. Historical data may have even been incorrect from the start.

These factors may result in questionable accuracy of data inputs to the algorithms. This may be true both for the personal data about the individual who is the subject of an automated decision (to which the machine learning model is applied), as well as for the wider pool of data used to train the machine. If the training data is inaccurate, the model will not function to produce the intended outputs when applied to the individual’s personal data. All of these problems may give rise to erroneous inferences about the consumer.

Data protection and privacy laws thus increasingly set some form of legal responsibility on firms to ensure the accuracy of the data they hold and process. Mexico’s data protection legislation applies a quality principle requiring data controllers to verify that personal data in their databases is correct and updated for the purposes for which it was gathered. [104] This raises the question about the accuracy of data in the wider data ecosystem, and the extent to which firms should be held responsible for inaccuracy or required to contribute to accurate information more broadly.

Responsibility for data accuracy in financial services

Sector-specific laws governing financial services often emphasize the importance of ensuring accuracy of data used for financial services. Data used for credit scoring is an example. [105] Credit reporting bureaus are typically subject to regulation and strong internal controls to ensure accuracy of the data they hold on individuals. Such credit reporting systems reduce the costs of lending by reducing risk (and thus loan default losses, provisioning for bad debt, and need for collateral) inherent in information asymmetries between lenders and borrowers. They provide lenders with information to evaluate borrowers, allowing greater access to financial services. [106] Because of the importance of their data in credit and other decision-making, credit reference bureaus provide individuals with a means of correcting inaccurate information.

However, this formal information system is now only part of a wider data-rich environment, most of which is not regulated. The advent of big data and machine learning poses a risk that existing legislation and policy guidance do not keep up with the data-rich environment. For instance, the first principle of the World Bank’s General Principles on Credit Reporting (GPCR), published in 2011 [107], is that “credit reporting systems should have relevant, accurate, timely and sufficient data – including positive – collected on a systematic basis from all reliable, appropriate and available sources, and should retain this information for a sufficient amount of time.”

Questions arise about how exactly this sort of policy guidance should apply today – just eight years later – to information about individuals supplied and collected for purposes that may not initially have related to making credit decisions. Big data and machine learning may collect and use data that varies greatly in its relevance, accuracy and timeliness.

These challenges apply also to laws that were written before the advent of big data and machine learning and even the internet itself. Firms that do not consider themselves to be credit reference


