Active Learning Solution on Distributed Edge Computing

Jia Qian; Lars Kai Hansen; Sayantan Sengupta

arxiv: 1906.10718 · v1 · pith:XXNAZLBMnew · submitted 2019-06-25 · 💻 cs.DC · cs.LG


Pith reviewed 2026-05-25 15:46 UTC · model grok-4.3


keywords active learning · federated learning · edge computing · fog computing · distributed machine learning · image classification


The pith

Active learning on edge devices plus federated learning on fog nodes reduces the samples and communication needed to train image classifiers in distributed setups.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes splitting data handling in fog platforms so that edge devices run active learning to pick informative samples while the fog node runs federated learning to combine models. This division is presented as a way to cut the volume of data required for training and the amount of data moved between devices and nodes. The approach is evaluated on an image classification task under both massively distributed and non-massively distributed conditions.

Core claim

By decomposing data aggregation and processing between edge devices and fog nodes, active learning at the edges selects fewer samples and federated learning at the fog node aggregates models without centralizing raw data, thereby lowering both training sample count and communication cost for image classification in the two distribution regimes.

What carries the argument

Intelligent division of active learning (edge) and federated learning (fog) that performs sample selection locally and model aggregation centrally.
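The fog-side aggregation step can be illustrated with a minimal sketch. It assumes a FedAvg-style average weighted by each device's labelled-sample count, which is one common choice in the federated learning literature; the text above does not state the paper's exact aggregation rule. The function name `federated_average` and the toy weight vectors are hypothetical.

```python
from typing import List, Sequence


def federated_average(device_weights: Sequence[Sequence[float]],
                      sample_counts: Sequence[int]) -> List[float]:
    """Average flat weight vectors, one per edge device.

    Each device's contribution is proportional to the number of labelled
    samples it trained on (an assumed weighting, not taken from the paper).
    """
    if not device_weights or len(device_weights) != len(sample_counts):
        raise ValueError("need exactly one sample count per device")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(device_weights[0])
    if any(len(w) != dim for w in device_weights):
        raise ValueError("all weight vectors must have the same length")

    averaged = [0.0] * dim
    for weights, count in zip(device_weights, sample_counts):
        share = count / total
        for i, value in enumerate(weights):
            averaged[i] += share * value
    return averaged


# Toy example: four edge devices, each reporting a 2-parameter "model"
# after training on 10 locally acquired samples (illustrative values only).
edge_models = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0], [2.0, 5.0]]
labelled = [10, 10, 10, 10]
global_model = federated_average(edge_models, labelled)
print(global_model)
```

Only the weight vectors and sample counts reach the fog node in this sketch; the raw samples stay on the devices, which is the property the pith attributes to the federated half of the split.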

If this is right

Fewer raw data samples need to be stored or transmitted from edge devices.
Communication volume between edges and fog decreases because only model updates or selected samples move.
Local processing at edges supports privacy by limiting data sharing.
Separate solutions are offered for massively distributed versus non-massively distributed device populations.
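Whether moving model updates instead of raw samples actually lowers traffic depends on model size relative to image volume. The back-of-envelope sketch below assumes 28×28 single-byte grayscale images (MNIST-sized, in line with reference [8]) and a hypothetical 60,000 float32 parameters, roughly the scale of a LeNet-style network; neither number is reported in the text above, and all function names are illustrative.

```python
def raw_upload_bytes(num_images: int, height: int = 28, width: int = 28,
                     bytes_per_pixel: int = 1) -> int:
    """Bytes needed to ship raw images from an edge device to the fog node."""
    return num_images * height * width * bytes_per_pixel


def model_upload_bytes(num_parameters: int, bytes_per_parameter: int = 4,
                       rounds: int = 1) -> int:
    """Bytes needed to ship a full model update once per round."""
    return num_parameters * bytes_per_parameter * rounds


def break_even_images(num_parameters: int, bytes_per_parameter: int = 4,
                      rounds: int = 1, height: int = 28, width: int = 28,
                      bytes_per_pixel: int = 1) -> int:
    """Smallest image count whose raw upload costs at least as much as
    the model uploads over the given number of rounds."""
    model_cost = model_upload_bytes(num_parameters, bytes_per_parameter, rounds)
    per_image = height * width * bytes_per_pixel
    return -(-model_cost // per_image)  # ceiling division on integers


ASSUMED_PARAMETERS = 60_000  # hypothetical, not a figure from the paper
print(model_upload_bytes(ASSUMED_PARAMETERS))   # bytes per model upload
print(break_even_images(ASSUMED_PARAMETERS))    # images per round to break even
```

Under these assumptions a single model upload costs about as much as a few hundred small images, so the communication saving would materialise only when a device would otherwise ship more raw data than that per round. This is the kind of overhead the load-bearing premise has to rule out.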

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method may extend to other supervised tasks if active learning query strategies remain effective on edge hardware.
Energy use on battery-powered edges could drop if fewer samples are processed locally.
Deployment would still require verifying that the fog node can handle the federated aggregation load without becoming a bottleneck.

Load-bearing premise

The split of tasks between edges and fog nodes can be arranged so that accuracy stays acceptable and no new overheads erase the claimed reductions in samples and communication.

What would settle it

A direct comparison on the same image classification task showing that the active-plus-federated method requires at least as many samples or as much communication as a baseline centralized or non-active approach while matching accuracy.
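The sample-count half of this test can be phrased as a small check over two learning curves. The sketch below is only an illustration of the comparison: the helper names are hypothetical, the curves are made-up placeholders rather than results from the paper, and the communication half of the test is not covered.

```python
from typing import Optional, Sequence, Tuple

# A learning curve as (labelled samples used, test accuracy) pairs.
Curve = Sequence[Tuple[int, float]]


def samples_to_reach(curve: Curve, target: float) -> Optional[int]:
    """Smallest sample count at which accuracy meets the target, or None."""
    for num_samples, accuracy in sorted(curve):
        if accuracy >= target:
            return num_samples
    return None


def reduction_claim_fails(active: Curve, baseline: Curve,
                          target: float) -> Optional[bool]:
    """True if the active method needs at least as many samples as the
    baseline to reach the target; None if either curve never reaches it."""
    needed_active = samples_to_reach(active, target)
    needed_baseline = samples_to_reach(baseline, target)
    if needed_active is None or needed_baseline is None:
        return None
    return needed_active >= needed_baseline


# Placeholder curves, NOT measurements from the paper.
toy_active = [(10, 0.50), (20, 0.70), (30, 0.90)]
toy_random = [(10, 0.40), (20, 0.60), (30, 0.70), (40, 0.90)]
print(reduction_claim_fails(toy_active, toy_random, target=0.90))
```

Matching accuracy first and comparing sample counts second keeps the test aligned with the wording above: a method that uses fewer samples but never reaches the target accuracy yields `None` rather than counting as a win.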

Figures

Figures reproduced from arXiv: 1906.10718 by Jia Qian, Lars Kai Hansen, Sayantan Sengupta.

**Figure 1.** Pool-based Active Learning Framework. Text captured with the caption: … maximizing the likelihood. Uncertainty-based methods aim to use uncertain information to enhance the model during the training process. It plays the role of the exploitation while acts as the exploration part. We will introduce three different ways to estimate the uncertainty. – Maximal Entropy: H[y|x, D_train] is the predictive entropy expectation as defined in [9]. H…

**Figure 2.** Overview of the scheme. Text captured with the caption: … non-massive case, a small number of distributed devices, let's say four edge devices and one centralized node. Initially, we trained the LeNet model on 20 images at the centralized node (Fog Node), and then dispatched the model to the edge devices. On the device side, we further trained the model on additional data points that are generated locally. They are acquired by entropy, BALD or …

**Figure 4.** Learning curve: Well-Trained Model. Text captured with the caption: B. Experiment II: AL acquisition number. In this series of experiments, we study how the acquisition number influences the performance. Recall that during every data acquisition, we include 10 additional images for further training

**Figure 5.** Learning Curve of Edge Devices for 10, 20, 30 and 40 acquisitions

**Figure 6.** Active Learning Vs Random Sample (10 Acquisitions).

**Figure 7.** Active Learning Vs Random Sample (20 Acquisitions).

**Figure 8.** Learning curves: 20 devices, trained by 60 images.

**Figure 9.** Accuracy from the centralized fog node where we have 20 devices

**Figure 10.** Accuracy from the centralized fog node where we have 20 devices

**Figure 11.** Architecture of massively distributed setting. Diagram A indicates
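The entropy and BALD acquisition rules named in the captions for Figures 1 and 2 can be sketched as follows. The sketch assumes that each pool point comes with several stochastic forward passes (for example from dropout kept active at prediction time, in the spirit of [5] and [13]) and uses the standard definitions: predictive entropy of the averaged prediction, and BALD as that entropy minus the average per-pass entropy [11]. The third acquisition rule is elided in the captured text and is not reconstructed here. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Probs = Sequence[float]  # one class-probability vector


def entropy(probs: Probs) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_prediction(mc_probs: Sequence[Probs]) -> List[float]:
    """Average the class probabilities over stochastic forward passes."""
    passes = len(mc_probs)
    return [sum(column) / passes for column in zip(*mc_probs)]


def max_entropy_score(mc_probs: Sequence[Probs]) -> float:
    """Predictive entropy of the averaged prediction."""
    return entropy(mean_prediction(mc_probs))


def bald_score(mc_probs: Sequence[Probs]) -> float:
    """Mutual information between prediction and model parameters:
    entropy of the mean prediction minus the mean per-pass entropy."""
    expected_entropy = sum(entropy(p) for p in mc_probs) / len(mc_probs)
    return entropy(mean_prediction(mc_probs)) - expected_entropy


def acquire(pool_scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring pool points."""
    ranked = sorted(range(len(pool_scores)),
                    key=lambda i: pool_scores[i], reverse=True)
    return ranked[:k]


# Toy pool: three points, two stochastic passes each, two classes.
pool = [
    [[1.0, 0.0], [1.0, 0.0]],   # confident and consistent
    [[0.5, 0.5], [0.5, 0.5]],   # ambiguous, but every pass agrees
    [[1.0, 0.0], [0.0, 1.0]],   # passes disagree with each other
]
entropy_scores = [max_entropy_score(p) for p in pool]
bald_scores = [bald_score(p) for p in pool]
print(acquire(bald_scores, k=1))
```

In the toy pool, maximal entropy rates the second and third points equally, while BALD singles out the third, where the passes disagree. That difference is why the two rules can pick different samples on the same device.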

Original abstract

Industry 4.0 becomes possible through the convergence of Operational and Information Technologies. All the requirements for realizing this convergence are integrated on the Fog Platform. The Fog Platform is introduced between the cloud server and edge devices when the unprecedented generation of data overburdens the cloud server, leading to non-negligible latency. In this new paradigm, we divide the computation tasks and push them down to edge devices. Furthermore, local computing (on the edge side) may improve privacy and trust. To address these problems, we present a new method in which we decompose data aggregation and processing by dividing them intelligently between edge devices and fog nodes. We apply active learning on edge devices and federated learning on the fog node, which significantly reduces the number of data samples needed to train the model as well as the communication cost. To show the effectiveness of the proposed method, we implemented it and evaluated its performance on an image classification task. In addition, we consider two settings, massively distributed and non-massively distributed, and offer corresponding solutions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes a hybrid active learning and federated learning approach for distributed edge computing in Industry 4.0 settings. Active learning is performed on edge devices to select informative samples, while federated learning aggregates models at fog nodes. This is claimed to reduce the number of training samples and communication costs. The method is evaluated on an image classification task under both massively distributed and non-massively distributed settings.

Significance. If the reported experimental reductions in samples and communication hold with maintained accuracy, the work could provide a practical technique for lowering overhead in fog-edge deployments while preserving privacy. The explicit handling of two distribution regimes is a useful contribution, and the presence of concrete accuracy and communication metrics in the experimental section strengthens the central claim.

minor comments (2)

The abstract asserts significant reductions in data samples and communication cost but provides no quantitative results, baselines, or error bars; adding a sentence with key metrics would better support the claim.
The description of how the two settings (massively vs. non-massively distributed) are implemented could be expanded with more detail on data partitioning and model update frequency to aid reproducibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The acknowledgment of the practical value for fog-edge deployments and the explicit treatment of the two distribution regimes is appreciated. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation stands alone

full rationale

The manuscript describes an empirical architecture that applies active learning at edge devices and federated learning at the fog node, then reports concrete accuracy and communication metrics on an image-classification task under massively and non-massively distributed regimes. No equations, parameter-fitting steps, uniqueness theorems, or self-citations appear in the provided text that would allow any claimed result to reduce to its own inputs by construction. The central claims are therefore supported by external experimental outcomes rather than definitional or self-referential loops.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; it contains no mathematical derivations, fitted constants, or postulated entities, so the ledger is empty.



Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 5 internal anchors

[1] Bonomi, Flavio, et al. "Fog computing and its role in the internet of things." Proceedings of the first edition of the MCC workshop on Mobile cloud computing. ACM, 2012.
[2] Bonomi, Flavio, et al. "Fog computing: A platform for internet of things and analytics." Big data and internet of things: A roadmap for smart environments. Springer, Cham, 2014. 169-186.
[3] Dastjerdi, Amir Vahid, and Rajkumar Buyya. "Fog computing: Helping the Internet of Things realize its potential." Computer 49.8 (2016): 112-116.
[4] Settles, Burr. "Active learning literature survey. 2010." Computer Sciences Technical Report 1648 (2014).
[5] Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." International conference on machine learning. 2016.
[6] Konečný, Jakub, et al. "Federated learning: Strategies for improving communication efficiency." arXiv preprint arXiv:1610.05492 (2016).
[7] Rasmussen, Carl Edward. "Gaussian processes in machine learning." Advanced lectures on machine learning. Springer, Berlin, Heidelberg,
[8] LeCun, Yann, Corinna Cortes, and C. J. Burges. "MNIST handwritten digit database." AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist 2 (2010).
[9] Shannon, Claude Elwood. "A mathematical theory of communication." Bell system technical journal 27.3 (1948): 379-423.
[10] Tong, Simon, and Daphne Koller. "Support vector machine active learning with applications to text classification." Journal of machine learning research 2.Nov (2001): 45-66.
[11] Houlsby, Neil, et al. "Bayesian active learning for classification and preference learning." arXiv preprint arXiv:1112.5745 (2011).
[12] Freeman, Linton C. Elementary applied statistics: for students in behavioral science. John Wiley and Sons, 1965.
[13] Gal, Yarin, Riashat Islam, and Zoubin Ghahramani. "Deep Bayesian active learning with image data." arXiv preprint arXiv:1703.02910 (2017).
[14] Tong, S. and Chang, E., 2001, October. Support vector machine active learning for image retrieval. In Proceedings of the ninth ACM international conference on Multimedia (pp. 107-118). ACM.
[15] Diro, A.A. and Chilamkurti, N., 2018. Distributed attack detection scheme using deep learning approach for Internet of Things. Future Generation Computer Systems, 82, pp. 761-768.
[16] Thompson, Cynthia A., Mary Elaine Califf, and Raymond J. Mooney. "Active learning for natural language parsing and information extraction." ICML. 1999.
[17] Osugi, Thomas, Deng Kim, and Stephen Scott. "Balancing exploration and exploitation: A new algorithm for active machine learning." Data Mining, Fifth IEEE International Conference on. IEEE, 2005.
[18] Lakshminarayanan, B., Pritzel, A. and Blundell, C., 2017. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems (pp. 6402-6413).
[19] Osband, I., Blundell, C., Pritzel, A. and Van Roy, B., 2016. Deep exploration via bootstrapped DQN. In Advances in neural information processing systems (pp. 4026-4034).
[20] Gal, Y. and Ghahramani, Z., 2016, June. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In international conference on machine learning (pp. 1050-1059).
[21] LeCun, Y., Haffner, P., Bottou, L. and Bengio, Y., 1999. Object recognition with gradient-based learning. In Shape, contour and grouping in computer vision (pp. 319-345). Springer, Berlin, Heidelberg.
[22] Dwork, C., 2008, April. Differential privacy: A survey of results. In International Conference on Theory and Applications of Models of Computation (pp. 1-19). Springer, Berlin, Heidelberg.
[23] Tang, B., Chen, Z., Hefferman, G., Wei, T., He, H. and Yang, Q., 2015, October. A hierarchical distributed fog computing architecture for big data analysis in smart cities. In Proceedings of the ASE BigData and SocialInformatics 2015 (p. 28). ACM. [Entry fused in extraction with: Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference. arXiv preprint arXiv:1506.02158.]
[24] Konečný, J., McMahan, H.B., Ramage, D. and Richtárik, P., 2016. Federated optimization: Distributed machine learning for on-device intelligence. arXiv preprint arXiv:1610.02527.
[25] Hong, D. and Si, L., 2012, August. Mixture model with multiple centralized retrieval algorithms for result merging in federated search. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval (pp. 821-830). ACM.
[26] Blei, D.M., Kucukelbir, A. and McAuliffe, J.D., 2017. Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518), pp. 859-877.
[27] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L. and Lerer, A., 2017. Automatic differentiation in PyTorch.
[28] LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P., 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), pp. 2278-2324.
[29] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. and Salakhutdinov, R., 2014. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), pp. 1929-1958.
