sold at a discounted price to students and old age pensioners. It can, however, also result in perceived unfairness, where some population groups are targeted to pay higher prices based on their profile resulting from geographic location or other attributes.141

In financial services, the focus of differential pricing relates primarily to a consumer's risk profile. Pricing based on risk can improve economic efficiency by discouraging risky behaviour and rewarding individuals with no history of engaging in unlawful activities such as traffic accidents. It can improve access to insurance by reducing adverse selection, which arises when only individuals with a high-risk profile will enrol at a uniform price. However, differential pricing of insurance products can result in unfairness where risk factors lie beyond an individual's control, e.g., in health insurance.

Firms using big data may engage in differential pricing by drawing inferences from personal data about an individual's need for the service, his or her capacity to pay, and price sensitivity.143 The machine may estimate a price as near as possible to the maximum amount the profiled consumer may be willing to pay. Due to an asymmetry of information, the consumer does not know enough about the provider to negotiate the price down to the minimum amount the provider would be willing to accept (e.g., for it to achieve a reasonable return on investment).
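To make this mechanism concrete, the sketch below shows one way a provider might implement such pricing: a model estimates each profiled consumer's maximum willingness to pay, and the quote is set just beneath that estimate, floored at the provider's minimum acceptable price. The feature names, data and model are illustrative assumptions, not a description of any actual provider's system.

```python
# Hypothetical profile-based pricing sketch (all data and features invented).
import numpy as np
from sklearn.linear_model import LinearRegression

# Past customer profiles: [income, urgency_score, price_sensitivity],
# paired with the highest price each customer was observed to accept.
X_train = np.array([
    [30_000, 0.9, 0.2],
    [80_000, 0.3, 0.8],
    [45_000, 0.7, 0.5],
    [60_000, 0.5, 0.6],
])
accepted_price = np.array([120.0, 90.0, 110.0, 100.0])

wtp_model = LinearRegression().fit(X_train, accepted_price)

def quote_price(profile, cost=70.0, min_margin=0.10):
    """Quote just under the estimated maximum willingness to pay,
    but never below cost plus a minimum margin."""
    estimated_wtp = wtp_model.predict(np.array([profile]))[0]
    floor = cost * (1 + min_margin)           # provider's minimum acceptable price
    return round(max(floor, 0.99 * estimated_wtp), 2)

# A low-income but urgent consumer may be quoted more than a
# wealthier, price-sensitive one.
print(quote_price([30_000, 0.95, 0.2]))
print(quote_price([80_000, 0.20, 0.8]))
```

Note that nothing in this quote reflects the provider's cost or the consumer's risk; the model channels urgency and price sensitivity straight into the price, which is the policy concern taken up next.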
In a dynamic market, competition would be expected to impose downward pressure on the provider's price, driving it towards its costs. However, policy concerns arise where differential pricing disadvantages persons who are already disadvantaged. An individual may be more desperate for a financial service, and thus be willing to pay a higher price. A lender may be able to charge a higher price that reflects not so much the higher risk of default as the borrower's urgency.144 This may prejudice low-income individuals and families.
Differential pricing can also become discriminatory where prices are set according to criteria that, while seemingly objective, result in adverse treatment of protected groups. For instance, if an algorithm sets higher prices for consumers with a postcode from a neighbourhood that has historically had higher levels of default than other neighbourhoods, individuals who do not themselves have other attributes suggesting a higher risk may face higher prices.

Certain historically disadvantaged population groups share particular attributes (such as a postcode). Individuals with those attributes may thereby suffer discrimination even if the attributes have no bearing on their own creditworthiness. For example, a person with a healthy salary and little debt may be treated adversely as a result of living in a community (or having social media friends, or the same medical doctor, or shopping at discount stores) where people have historically had higher debt-to-income ratios. Machine learning models are thus among the trends in the automation of economic processes that may increase inequality over time.142

4.3 Protecting consumers in the event of data breach and re-identification

The vast amounts of data held by and transferred among big data players create risks of data security breaches, and thus risks to consumer privacy. Even when the amount of data held on an individual is kept to a minimum, their identity may be uncovered through reverse-engineering from even a small number of data points, risking violation of their privacy. This risk arises where the data may be obtained by third parties, whether through unauthorised access in a data breach or by transfer of the data to a third party by agreement with the firm controlling or processing the data. In both cases, measures to protect against the release of data about identifiable individuals include de-identification, pseudonymisation and anonymisation. Such measures, and the challenges they face in the context of big data, are discussed in this section. Section 1.1 discusses the role and regulation of third-party intermediaries who acquire data by agreement in the data market.
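The re-identification risk described in this section can be illustrated with a classic linkage attack, in which a handful of seemingly innocuous attributes shared between a "de-identified" dataset and a public register suffices to re-attach names to records; combinations as small as postcode, birth date and sex have been shown to identify much of a population uniquely. The records and field names below are invented for illustration.

```python
# Hypothetical linkage attack: re-identifying "de-identified" records by
# joining on quasi-identifiers that also appear in a public source.
deidentified_records = [
    {"zip": "02138", "birth_year": 1945, "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_year": 1962, "sex": "M", "diagnosis": "diabetes"},
]

public_register = [
    {"name": "Jane Doe", "zip": "02138", "birth_year": 1945, "sex": "F"},
    {"name": "John Roe", "zip": "02139", "birth_year": 1962, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def reidentify(records, register):
    """Yield (name, sensitive attribute) for every record whose
    quasi-identifier combination is unique in the public register."""
    for record in records:
        key = tuple(record[q] for q in QUASI_IDENTIFIERS)
        matches = [entry for entry in register
                   if tuple(entry[q] for q in QUASI_IDENTIFIERS) == key]
        if len(matches) == 1:  # a unique match uncovers the identity
            yield matches[0]["name"], record["diagnosis"]

for name, diagnosis in reidentify(deidentified_records, public_register):
    print(f"{name} -> {diagnosis}")
```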
The limits of de-identification, pseudonymisation and anonymisation

Personal privacy may be protected in varying degrees by using privacy enhancing technologies (PETs) such as de-identification, which involves suppressing, or adding noise to, directly identifying and indirectly identifying information in a dataset, or otherwise introducing barriers (making it statistically unlikely) to identifying a person:145

• Directly identifying data identifies a person without additional information or by linking to information in the public domain (e.g., a person's name, telephone number, email address, photograph, social security number, or biometric identifiers).
• Indirectly identifying data includes attributes that can be used to identify a person, such as age, location and unique personal characteristics.
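As a rough sketch of how these two categories are treated in practice, direct identifiers are typically suppressed or replaced with pseudonyms, while indirect identifiers are generalised so that they no longer single out one person. The field names and the keyed-hash scheme below are illustrative assumptions:

```python
# Hypothetical de-identification of a single customer record.
import hashlib

def pseudonymise(value, secret_salt="replace-with-a-real-secret"):
    """Replace a direct identifier with a keyed hash: a pseudonym that
    can be re-linked only by whoever holds the salt or a lookup table."""
    return hashlib.sha256((secret_salt + value).encode()).hexdigest()[:12]

def deidentify(record):
    return {
        "customer_ref": pseudonymise(record["name"]),   # direct identifier pseudonymised
        "email": None,                                   # direct identifier suppressed
        "age_band": f"{(record['age'] // 10) * 10}s",    # indirect identifier generalised
        "region": record["postcode"][:3],                # location coarsened
        "balance": record["balance"],                    # analytic payload retained
    }

print(deidentify({
    "name": "Jane Doe",
    "email": "jane@example.com",
    "age": 47,
    "postcode": "02138",
    "balance": 1250.00,
}))
```

Pseudonymisation of this kind remains reversible by anyone holding the salt or a mapping table, so pseudonymised data is generally still treated as personal data; genuine anonymisation requires that re-identification be statistically unlikely even when the dataset is combined with auxiliary sources, which the linkage example above shows is difficult to guarantee.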