needs to be more thoroughly investigated. For instance, it has been shown that it is possible for an adversary to introduce hidden functionality into the jointly trained model [5] or disturb the training process [16]. Detecting these adversarial behaviors becomes much more difficult under privacy constraints. Future methods for data-local training will have to jointly address the issues of efficiency, privacy and robustness.
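To make the robustness issue concrete, one frequently studied server-side countermeasure against manipulated contributions such as the backdoors of [5] is to clip the norm of each client update before aggregation. The following minimal Python sketch is illustrative only; the function name, the clipping threshold and the optional noise term are our own assumptions, not a method proposed in the works cited above.

    import numpy as np

    def clipped_aggregate(global_weights, client_updates, clip_norm=1.0, noise_std=0.0):
        # Bound the influence of any single (possibly malicious) client
        # by rescaling updates whose L2 norm exceeds clip_norm.
        clipped = []
        for update in client_updates:
            norm = np.linalg.norm(update)
            clipped.append(update * min(1.0, clip_norm / (norm + 1e-12)))
        aggregate = np.mean(clipped, axis=0)
        # Optional Gaussian noise, in the spirit of differentially
        # private aggregation (noise_std = 0 disables it).
        if noise_std > 0:
            aggregate += np.random.normal(0.0, noise_std, size=aggregate.shape)
        return global_weights + aggregate

Note that such clipping bounds each client's influence but cannot by itself detect a backdoor, and under secure aggregation [9] the server may not even be able to inspect individual update norms; this illustrates the tension between privacy and robustness discussed above.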
Synchrony: In most distributed learning schemes of Embedded ML, communication takes place at regular time intervals, such that the state of the system can always be uniquely determined [13]. This has the benefit that it greatly simplifies the theoretical analysis of the properties of the distributed learning system. However, synchronous schemes may suffer dramatically from delayed computation in the presence of slow workers (stragglers). While countermeasures against stragglers can usually be taken (e.g., by restricting the maximum computation time per worker), in some situations it might still be beneficial to adopt an asynchronous training strategy (e.g., [54]), in which parameter updates are applied to the central model directly after they arrive at the server. This approach avoids delays when the time required by workers to compute parameter updates varies heavily. However, the absence of a central state makes convergence analysis far more challenging (although convergence guarantees can still be given [21]) and may cause model updates to become “stale” [88]. Since the central model may be updated an arbitrary number of times while a client is computing a model update, this update will often be out of date when it arrives at the server. Staleness slows down convergence, especially during the final stages of training.
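As an illustration of how staleness can be handled, the following Python sketch shows a toy asynchronous parameter server that damps each incoming update according to how many global steps have elapsed since the worker pulled the model. The 1/(1 + staleness) damping rule is one common heuristic in the spirit of staleness-aware asynchronous SGD; like all names and constants here, it is an illustrative assumption rather than a scheme from the cited works.

    import numpy as np

    class AsyncParameterServer:
        def __init__(self, weights, base_lr=0.1):
            self.weights = weights      # current global model parameters
            self.base_lr = base_lr
            self.version = 0            # incremented on every applied update

        def pull(self):
            # A worker fetches the current weights together with the
            # model version they correspond to.
            return self.weights.copy(), self.version

        def push(self, gradient, pulled_version):
            # Staleness = number of global updates applied since the pull.
            staleness = self.version - pulled_version
            lr = self.base_lr / (1.0 + staleness)  # damp stale updates
            self.weights -= lr * gradient
            self.version += 1

An update arriving with staleness 0 is applied at the full learning rate, whereas an update computed against a model that is, say, ten versions old is scaled down by a factor of eleven; this mitigates, but does not remove, the slowdown in final-stage convergence noted above.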
Standards: To communicate neural data in an interoperable manner, standardized data formats and communication protocols are required. Currently, MPEG is working towards a new Part 17 of the ISO/IEC 15938 standard, defining tools for the compression of neural data for multimedia applications and for representing the resulting bitstreams for efficient transport. Further steps in this direction are needed for a large-scale implementation of embedded machine learning solutions.

4. CONCLUSION
We currently witness a convergence between the areas of machine learning and communication technology. Not only are today's algorithms used to enhance the design and management of networks and communication components [34], but ML models such as deep neural networks are themselves being communicated more and more in our highly connected world. The roll-out of data-intensive 5G networks and the rise of mobile and IoT applications will further accelerate this development, and it can be predicted that neural data will soon account for a sizable portion of the traffic through global communication networks.

This paper has described the four most important settings in which deep neural networks are communicated and has discussed the respective proposed compression methods and methodological challenges. Our holistic view has revealed that these four seemingly different and independently developing fields of research have a lot in common. We therefore believe that these settings should be considered in conjunction in the future.

REFERENCES

[1] M. S. H. Abad, E. Ozfatura, D. Gunduz, and O. Ercetin. Hierarchical federated learning across heterogeneous cellular networks. arXiv preprint arXiv:1909.02362, 2019.

[2] A. F. Aji and K. Heafield. Sparse communication for distributed gradient descent. arXiv preprint arXiv:1704.05021, 2017.

[3] D. Alistarh, D. Grubic, J. Li, R. Tomioka, and M. Vojnovic. QSGD: Communication-efficient SGD via gradient quantization and encoding. In Advances in Neural Information Processing Systems, pages 1707–1718, 2017.

[4] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, 10(7):e0130140, 2015.

[5] E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov. How to backdoor federated learning. arXiv preprint arXiv:1807.00459, 2018.

[6] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.

[7] A. Bellet, R. Guerraoui, M. Taziki, and M. Tommasi. Personalized and private peer-to-peer machine learning. arXiv preprint arXiv:1705.08435, 2017.

[8] J. Bernstein, Y.-X. Wang, K. Azizzadenesheli, and A. Anandkumar. signSGD: Compressed optimisation for non-convex problems. arXiv preprint arXiv:1802.04434, 2018.

[9] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth. Practical secure aggregation for federated learning on user-held data. arXiv preprint arXiv:1611.04482, 2016.

[10] L. Bottou. Online learning and stochastic approximations. On-line learning in neural networks, 17(9):142, 1998.

[11] S. Caldas, J. Konečný, H. B. McMahan, and A. Talwalkar. Expanding the reach of federated learning by reducing client resource requirements. arXiv preprint arXiv:1812.07210, 2018.