[Fig. 1: two-panel diagram. Left panel, "Cloud ML": clients send train/test data over a communication channel to a server, which collects data, performs inference and performs training. Right panel, "Embedded ML": clients perform inference and training locally and exchange neural data (model parametrizations, predictions) with a server, which organizes and orchestrates.]

Fig. 1 – Comparison between the two paradigms for machine learning from distributed data. In Cloud ML, data from users is collected and processed by a centralized service provider. In Embedded ML, data never leaves the user device. To perform inference and collaborative training, neural network parametrizations are communicated and data is processed locally.


autonomously and are not allowed to depend on slow and unreliable connections to a cloud server. For instance, in a self-driving car, the intelligence responsible for making driving decisions needs to be available at all times and thus has to be present on the device.

As awareness of these issues increases and mobile and IoT devices are being equipped with ever more potent hardware, a new paradigm, which we term "Embedded ML", arises with the goal of keeping data on the device:

"Bring the model to the data."

Multi-party machine learning workflows that follow this paradigm all have one principle in common: in order to avoid the shortcomings of Cloud ML and achieve data locality, they communicate neural network parametrizations ("neural data") instead of raw data. This may include not only trained neural network models, but also model updates and model gradients.
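To make the notion of "neural data" concrete, consider the following minimal sketch (assuming PyTorch; the helper name and setup are ours for illustration, not part of any particular system), in which a client packages a model update, i.e. the difference between its locally trained parameters and the model it received, for transmission instead of any raw data:

```python
import io
import torch

def serialize_model_update(model_before, model_after):
    """Pack the parameter delta ("neural data") produced by local
    training for transmission, instead of the raw training data."""
    delta = {
        name: p_after.detach() - p_before.detach()
        for (name, p_before), (_, p_after) in zip(
            model_before.named_parameters(),
            model_after.named_parameters(),
        )
    }
    buf = io.BytesIO()
    torch.save(delta, buf)  # serialized bytes to send over the channel
    return buf.getvalue()
```

Note that the payload size here is determined entirely by the number of model parameters, which is what motivates the compression techniques surveyed below.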
Since neural networks are typically very large, containing millions to billions of parameters [60], and mobile connections are slow, unreliable and costly, the communication of neural data is typically one of the main bottlenecks in applications of Embedded ML. As a result, a vast amount of research has recently been conducted that aims to reduce the size of neural network representations, and a wide range of domain-specific compression methods have been proposed.
In this work, we provide an overview of machine learning workflows which follow the Embedded ML paradigm through the unified lens of communication efficiency. We describe properties of the "neural data" communicated in Embedded ML and systematically review the current state of research in neural data compression. Finally, we also enumerate important related challenges which need to be considered when designing efficient communication schemes for Embedded ML applications.

2. SURVEY ON NEURAL NETWORK COMMUNICATION

We currently witness the emergence of a variety of applications of Embedded ML where neural networks are being communicated. In this section we will review the three most important settings, namely on-device inference, federated learning and peer-to-peer learning. These settings differ with respect to their communication topology, frequency of communication and network constraints. We will also review distributed training in the data center, as many methods for neural data compression have been proposed in this domain. Figure 2 illustrates the flow of (neural) data in these different settings. Table 1 summarizes the communication characteristics of the different distributed ML pipelines in further detail and gives an overview of popular compression techniques in the respective applications.

2.1 On-device Inference

Inference is the act of using a statistical model (e.g., a trained neural network) to make predictions on new data. While cloud-based inference solutions can certainly offer a variety of benefits, there still exists a wide range of applications that require quick, autonomous and failure-proof decision making, which can only be offered by on-device intelligence solutions.

For instance, in a self-driving car, the intelligence responsible for making driving decisions needs to be available at all times and thus has to be present on-device. At the same time, the models used for inference might be continuously improving as new training data becomes available, and thus need to be frequently communicated from the compute node to a potentially very large number of user devices. Since typical modern DNNs consist of exorbitant numbers of parameters, this constant streaming of models can impose a high burden on the communication channel, potentially resulting in prohibitive delays and energy expenditure.
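As a rough, illustrative calculation (our numbers, chosen only for concreteness): a model with 10^8 parameters stored at 32 bit per parameter occupies 10^8 × 32 bit = 3.2 × 10^9 bit ≈ 400 MB. Streaming one such update to each of 10^6 devices amounts to roughly 400 TB of traffic, and a single download over a 10 Mbit/s mobile link alone takes 3.2 × 10^9 / 10^7 ≈ 320 seconds.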
Compression for On-Device Inference: The field of neural network compression has set out to mitigate this problem by reducing the size of trained neural network representations. The goal in this setting is typically to find a compressed neural network representation with minimal bit size which achieves the same or comparable performance as the uncompressed representation. To this end, a large variety of methods have been proposed.
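As one minimal illustration of such a method (our sketch under simple assumptions, not a specific technique from the literature surveyed here), uniform post-training quantization stores each weight as one of 2^b discrete levels, i.e. with b bits instead of 32:

```python
import numpy as np

def uniform_quantize(weights: np.ndarray, bits: int = 8):
    """Minimal post-training quantization sketch: map float32 weights
    to 2**bits uniform levels; store integer codes plus scale/offset."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / (2**bits - 1)
    codes = np.round((weights - w_min) / scale).astype(np.uint8)  # bits <= 8
    reconstructed = codes * scale + w_min  # dequantized weights for inference
    return codes, scale, w_min, reconstructed

w = np.random.randn(10_000).astype(np.float32)  # stand-in for a weight tensor
codes, scale, w_min, w_hat = uniform_quantize(w, bits=8)
print("compression factor:", w.nbytes / codes.nbytes)    # 4x in this setting
print("max abs error:", float(np.abs(w - w_hat).max()))  # about scale / 2
```

In practice, quantization of this kind is typically combined with pruning and entropy coding to reach substantially higher compression factors.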




