[Fig. 1: two-panel diagram. Left panel, "Cloud ML": clients send train/test data over a communication channel to a server, which collects data, performs inference and performs training. Right panel, "Embedded ML": clients perform inference and training locally and exchange neural data (model parametrizations, predictions) with a server, which organizes and orchestrates.]

Fig. 1 – Comparison between the two paradigms for machine learning from distributed data. In Cloud ML, data from users is collected and processed by a centralized service provider. In Embedded ML, data never leaves the user device. To perform inference and collaborative training, neural network parametrizations are communicated and data is processed locally.


autonomously and are not allowed to depend on slow and unreliable connections to a cloud server. For instance, in a self-driving car, the intelligence responsible for making driving decisions needs to be available at all times and thus has to be present on the device.

As awareness of these issues increases and mobile and IoT devices are being equipped with ever more potent hardware, a new paradigm, which we term "Embedded ML", arises with the goal of keeping data on the device:

"Bring the model to the data."

Multi-party machine learning workflows that follow this paradigm all have one principle in common: in order to avoid the shortcomings of Cloud ML and achieve data locality, they communicate neural network parametrizations ("neural data") instead of raw data. This may include not only trained neural network models, but also model updates and model gradients.
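To make the notion of "neural data" concrete, consider the following minimal sketch (assuming PyTorch; the helper name and setup are ours for illustration, not part of any particular system), in which a client packages a model update, i.e. the difference between its locally trained parameters and the model it received, for transmission instead of any raw data:

```python
import io
import torch

def serialize_model_update(model_before, model_after):
    """Pack the parameter delta ("neural data") produced by local
    training for transmission, instead of the raw training data."""
    delta = {
        name: p_after.detach() - p_before.detach()
        for (name, p_before), (_, p_after) in zip(
            model_before.named_parameters(),
            model_after.named_parameters(),
        )
    }
    buf = io.BytesIO()
    torch.save(delta, buf)  # serialized bytes to send over the channel
    return buf.getvalue()
```

Note that the payload size here is determined entirely by the number of model parameters, which is what motivates the compression techniques surveyed below.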
Since neural networks are typically very large, containing millions to billions of parameters [60], and mobile connections are slow, unreliable and costly, the communication of neural data is typically one of the main bottlenecks in applications of Embedded ML. As a result, a vast amount of research has recently been conducted that aims to reduce the size of neural network representations, and a wide range of domain-specific compression methods have been proposed.
In this work, we provide an overview of machine learning workflows which follow the Embedded ML paradigm through the unified lens of communication efficiency. We describe properties of the "neural data" communicated in Embedded ML and systematically review the current state of research in neural data compression. Finally, we also enumerate important related challenges which need to be considered when designing efficient communication schemes for Embedded ML applications.

2. SURVEY ON NEURAL NETWORK COMMUNICATION

We currently witness the emergence of a variety of applications of Embedded ML where neural networks are being communicated. In this section we will review the three most important settings, namely on-device inference, federated learning and peer-to-peer learning. These settings differ with respect to their communication topology, frequency of communication and network constraints. We will also review distributed training in the data center, as many methods for neural data compression have been proposed in this domain. Figure 2 illustrates the flow of (neural) data in these different settings. Table 1 summarizes the communication characteristics of the different distributed ML pipelines in further detail and gives an overview of popular compression techniques in the respective applications.

2.1 On-device Inference

Inference is the act of using a statistical model (e.g., a trained neural network) to make predictions on new data. While cloud-based inference solutions can certainly offer a variety of benefits, there still exists a wide range of applications that require quick, autonomous and failure-proof decision making, which can only be offered by on-device intelligence solutions.

For instance, in a self-driving car, the intelligence responsible for making driving decisions needs to be available at all times and thus has to be present on-device. At the same time, the models used for inference might be continuously improving as new training data becomes available, and thus need to be frequently communicated from the compute node to a potentially very large number of user devices. Since typical modern DNNs consist of exorbitant numbers of parameters, this constant streaming of models can impose a high burden on the communication channel, potentially resulting in prohibitive delays and energy expenditure.
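As a rough, illustrative calculation (our numbers, chosen only for concreteness): a model with 10^8 parameters stored at 32 bit per parameter occupies 10^8 × 32 bit = 3.2 × 10^9 bit ≈ 400 MB. Streaming one such update to each of 10^6 devices amounts to roughly 400 TB of traffic, and a single download over a 10 Mbit/s mobile link alone takes 3.2 × 10^9 / 10^7 ≈ 320 seconds.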
Compression for On-Device Inference: The field of neural network compression has set out to mitigate this problem by reducing the size of trained neural network representations. The goal in this setting is typically to find a compressed neural network representation with minimal bit size which achieves the same or comparable performance as the uncompressed representation. To this end, a large variety of methods have been proposed.
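As one minimal illustration of such a method (our sketch under simple assumptions, not a specific technique from the literature surveyed here), uniform post-training quantization stores each weight as one of 2^b discrete levels, i.e. with b bits instead of 32:

```python
import numpy as np

def uniform_quantize(weights: np.ndarray, bits: int = 8):
    """Minimal post-training quantization sketch: map float32 weights
    to 2**bits uniform levels; store integer codes plus scale/offset."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / (2**bits - 1)
    codes = np.round((weights - w_min) / scale).astype(np.uint8)  # bits <= 8
    reconstructed = codes * scale + w_min  # dequantized weights for inference
    return codes, scale, w_min, reconstructed

w = np.random.randn(10_000).astype(np.float32)  # stand-in for a weight tensor
codes, scale, w_min, w_hat = uniform_quantize(w, bits=8)
print("compression factor:", w.nbytes / codes.nbytes)    # 4x in this setting
print("max abs error:", float(np.abs(w - w_hat).max()))  # about scale / 2
```

In practice, quantization of this kind is typically combined with pruning and entropy coding to reach substantially higher compression factors.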




