
arxiv: 1906.09248 · v2 · pith:DQV7WU4Ynew · submitted 2019-06-21 · 💻 cs.LG · cs.MA · stat.ML

Privacy Preserving QoE Modeling using Collaborative Learning

Pith reviewed 2026-05-25 18:48 UTC · model grok-4.3

classification 💻 cs.LG · cs.MA · stat.ML
keywords privacy preserving · collaborative learning · QoE modeling · round-robin training · machine learning · federated learning · quality of experience

The pith

Round-robin sequential updates across nodes let QoE models generalize better without sharing sensitive user data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to establish that training a Quality of Experience model by passing it sequentially among partner nodes produces a more general model than training on any single node's data alone. Each node updates the model on its local data and forwards the updated parameters, so no raw user records ever leave their origin. The approach is benchmarked directly against isolated training on one node, pooled centralized training, and a customized federated learning baseline. If the sequential updates converge usefully, researchers could combine small protected datasets into more robust models while satisfying consent and privacy rules that currently block data sharing.

Core claim

The authors introduce round-robin based collaborative machine learning model training, in which the model is trained sequentially among the collaborating partner nodes, and benchmark this mechanism against their customized Federated Learning setup as well as conventional Centralized and Isolated Learning methods for privacy-preserving QoE modeling.

What carries the argument

Round-robin collaborative training: a sequential hand-off of model parameters across nodes, each performing local updates before passing the model onward.
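
The hand-off described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the model is a plain logistic regression, the data are synthetic, and the fixed visiting order, learning rate, epoch count, and every function name (`local_update`, `round_robin_train`, `make_node`) are hypothetical choices made for the example.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def local_update(weights, data, lr=0.1, epochs=5):
    """A few epochs of full-batch gradient descent on one node's private data."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / len(data)
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w

def round_robin_train(node_datasets, n_features, cycles=10):
    """Hand one weight vector from node to node; raw records never move."""
    w = [0.0] * n_features
    for _ in range(cycles):
        for data in node_datasets:      # fixed visiting order (assumed)
            w = local_update(w, data)   # only `w` crosses node boundaries
    return w

def make_node(rng, n, true_w):
    """Synthetic stand-in for one node's data (not the paper's QoE dataset)."""
    rows = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in true_w]
        y = 1.0 if sum(t * xi for t, xi in zip(true_w, x)) > 0 else 0.0
        rows.append((x, y))
    return rows

def accuracy(w, data):
    hits = sum((sum(wi * xi for wi, xi in zip(w, x)) > 0) == (y > 0.5)
               for x, y in data)
    return hits / len(data)

rng = random.Random(0)
true_w = [2.0, -1.0, 0.5]                       # hypothetical ground truth
nodes = [make_node(rng, 200, true_w) for _ in range(3)]
w_rr = round_robin_train(nodes, n_features=3)
held_out = make_node(rng, 500, true_w)
print(f"held-out accuracy: {accuracy(w_rr, held_out):.2f}")
```

Because the three synthetic nodes share one underlying rule, the sketch sidesteps the heterogeneous case that the load-bearing premise below is concerned with.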

If this is right

  • QoE models can reach higher accuracy on new populations without any node revealing its raw user records.
  • Sequential updates preserve the privacy guarantees of isolated training while incorporating information from multiple sources.
  • Performance lies between isolated training and fully centralized training that would require data pooling.
  • The same sequential protocol can be applied to any domain limited by small, consent-restricted datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Research groups could pool modeling effort across institutions without negotiating data-transfer agreements.
  • Convergence behavior under larger differences in user demographics remains open for direct measurement.
  • The method may lower the cost of QoE studies by letting each new small experiment contribute to an accumulating shared model.

Load-bearing premise

The data distributions at the different nodes are compatible enough that sequential updates converge to a single useful model.

What would settle it

A test set drawn from a user population outside all participating nodes where the round-robin model shows no accuracy gain over a model trained only on one node's data.
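
A comparison of this kind could be scored as follows. The sketch assumes a binary QoE label and uses AUC (the metric mentioned in the figure text) computed by pairwise ranking; the scores and labels are made-up placeholders, not results from the paper, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Placeholder predictions on a held-out population (illustrative values only).
labels = [1, 0, 1, 0]
scores_round_robin = [0.9, 0.8, 0.3, 0.1]
scores_single_node = [0.6, 0.7, 0.4, 0.5]

auc_rr = roc_auc(scores_round_robin, labels)
auc_single = roc_auc(scores_single_node, labels)
gain = auc_rr - auc_single   # a gain near zero or below would count against the claim
```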

Figures

Figures reproduced from arXiv: 1906.09248 by Konstantinos Vandikas, Markus Fiedler, Selim Ickin.

Figure 1
Figure 1: Comparison of different learning techniques.
Figure 2
Figure 2: Basic architecture of our prototype for Federated Learning.
The model training time for FL and RRL is compared. RRL is a sequential training, i.e., training on one node starts after training is completed on another node, hence the total training time is the sum of the individual training times on all workers. On the other hand, in FL the training takes place in parallel; all workers train on their indi…
Figure 3
Figure 3: Kernel Density Estimate (KDE) plots for the MOS ratings and the age of the users in each group. Reasonable bandwidths are chosen in the visualisation to avoid under- or over-fitting.
… listed as follows: i) maximum downlink bandwidth (dl. bw.) set in the network emulator; ii) browsing time until the rating prompt, which is the surfing duration (dur. surfing); and iii) time consumed during the rating process,…
Figure 4
Figure 4: Mean (with 95% CI) training performance (left) and time (right) over rounds in FL.
Figure 4
Amongst the collaborative learning approaches, there are pros and cons to both FL and RRL. Within one training cycle (approx. 2.5 s), the RRL-based model reaches a minimum AUC of 0.77 on all nodes, which is higher than when training on the nodes takes place independently and in an isolated manner. In this scenario, each node sends the weights directly to the next node, hence the next node can rev…
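
The timing contrast noted in the Figure 2 text (RRL adds up per-worker training times, FL trains workers in parallel) can be made concrete with a small sketch. The per-worker times are invented for illustration, and treating an FL round as bounded by the slowest worker plus an optional aggregation overhead is an assumption, since the source text is cut off before it describes the FL case in full.

```python
def rrl_cycle_time(worker_times):
    """Sequential hand-off: each worker starts after the previous one, so times add up."""
    return sum(worker_times)

def fl_round_time(worker_times, aggregation_overhead=0.0):
    """Parallel local training: assumed to be bounded by the slowest worker,
    plus a hypothetical aggregation overhead at the master."""
    return max(worker_times) + aggregation_overhead

# Illustrative per-worker training times in seconds (not measurements from the paper).
times = [1.0, 2.0, 1.5]
rrl_time = rrl_cycle_time(times)   # 4.5
fl_time = fl_round_time(times)     # 2.0
```
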
Original abstract

Machine Learning based Quality of Experience (QoE) models potentially suffer from over-fitting due to limitations including low data volume, and limited participant profiles. This prevents models from becoming generic. Consequently, these trained models may under-perform when tested outside the experimented population. One reason for the limited datasets, which we refer in this paper as small QoE data lakes, is due to the fact that often these datasets potentially contain user sensitive information and are only collected throughout expensive user studies with special user consent. Thus, sharing of datasets amongst researchers is often not allowed. In recent years, privacy preserving machine learning models have become important and so have techniques that enable model training without sharing datasets but instead relying on secure communication protocols. Following this trend, in this paper, we present Round-Robin based Collaborative Machine Learning model training, where the model is trained in a sequential manner amongst the collaborated partner nodes. We benchmark this work using our customized Federated Learning mechanism as well as conventional Centralized and Isolated Learning methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes Round-Robin based Collaborative Machine Learning for privacy-preserving QoE modeling, in which a shared model is trained sequentially across partner nodes without exchanging raw datasets. It benchmarks the approach against a customized Federated Learning mechanism as well as standard Centralized and Isolated Learning baselines, motivated by overfitting risks arising from small, consent-restricted QoE data lakes.

Significance. A validated sequential collaborative scheme that demonstrably improves generalization over isolated training while preserving privacy would be relevant to QoE modeling and other domains with non-i.i.d. private data. The manuscript supplies no equations, convergence analysis, performance numbers, or experimental validation, so the practical significance cannot yet be assessed.

major comments (1)
  1. [Abstract] Abstract: the central claim that sequential round-robin updates produce a model with better generalization than isolated training is load-bearing, yet the text supplies neither a mathematical description of the update rule, nor any safeguard against parameter overwriting on heterogeneous QoE distributions, nor convergence analysis or empirical results.
minor comments (1)
  1. [Abstract] The abstract refers to 'our customized Federated Learning mechanism' without indicating what customizations were made relative to standard FedAvg.
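
For reference, the aggregation step of standard FedAvg is a sample-size-weighted mean of the workers' weights. The sketch below assumes each worker reports a flat list of weights together with its local sample count; the function and argument names are hypothetical, and nothing here reflects the paper's customized variant, whose details the text does not give.

```python
def fedavg(worker_weights, worker_sizes):
    """Average weight vectors, weighting each worker by its number of local samples."""
    if len(worker_weights) != len(worker_sizes) or not worker_weights:
        raise ValueError("need one sample count per worker")
    total = sum(worker_sizes)
    n_params = len(worker_weights[0])
    return [
        sum(w[j] * size for w, size in zip(worker_weights, worker_sizes)) / total
        for j in range(n_params)
    ]

# Two hypothetical workers holding 1 and 3 samples respectively.
global_weights = fedavg([[1.0, 0.0], [3.0, 2.0]], [1, 3])   # [2.5, 1.5]
```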

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review. We agree that the current manuscript version is missing key technical elements required to substantiate the central claims, and we will substantially revise the paper to include them.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that sequential round-robin updates produce a model with better generalization than isolated training is load-bearing, yet the text supplies neither a mathematical description of the update rule, nor any safeguard against parameter overwriting on heterogeneous QoE distributions, nor convergence analysis or empirical results.

    Authors: We agree that the abstract and main text currently lack a mathematical formulation of the round-robin update rule, any analysis of overwriting risks under non-i.i.d. QoE data, convergence guarantees, and experimental results. In the revised manuscript we will add: (1) explicit equations describing the sequential parameter update across nodes, (2) a discussion of mechanisms (e.g., learning-rate scheduling or local regularization) to mitigate overwriting on heterogeneous distributions, (3) a brief convergence sketch, and (4) the benchmark results comparing round-robin, federated, centralized, and isolated training on the QoE datasets. revision: yes
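
One way the promised sequential update could be written is sketched below. This is an assumed formulation for illustration, not the authors' equations: it supposes $K$ nodes visited in a fixed order, a local dataset $D_k$ and loss $\mathcal{L}_k$ at node $k$, a single gradient step per visit, and a learning rate $\eta$; all symbols are introduced here.

```latex
\begin{aligned}
w_{t,0} &= w_{t-1,K}
  && \text{(cycle } t \text{ starts from the weights that ended cycle } t-1\text{)} \\
w_{t,k} &= w_{t,k-1} - \eta \, \nabla_w \mathcal{L}_k\!\left(w_{t,k-1}\right),
  && k = 1, \dots, K \\
\mathcal{L}_k(w) &= \frac{1}{|D_k|} \sum_{(x,y) \in D_k} \ell\!\left(f_w(x),\, y\right)
\end{aligned}
```

Written this way, the overwriting concern is visible in the second line: node $k$ moves the weights using only its own loss, so a late node in the cycle can undo progress made on earlier nodes when the $D_k$ differ strongly.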

Circularity Check

0 steps flagged

No significant circularity; empirical benchmarking only

Full rationale

The paper presents a sequential round-robin collaborative training procedure for QoE models and benchmarks it empirically against federated, centralized, and isolated baselines. No equations, fitted parameters, predictions derived from prior fits, or load-bearing self-citations appear in the provided text. The method is described as a practical protocol with no mathematical derivation chain that reduces to its own inputs by construction. This is a standard non-circular empirical methods paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The proposal rests on the domain assumption that QoE datasets contain sensitive user information that legally prevents sharing, creating the need for collaborative training; no free parameters or invented entities are mentioned.

axioms (1)
  • domain assumption: QoE datasets contain user-sensitive information and are only collected with special consent, preventing sharing among researchers
    Explicitly stated in the abstract as the reason for small QoE data lakes and the motivation for privacy-preserving methods.



Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 1 internal anchor

  1. [1]

    As such, data privacy is preserved but the models might have the risk of being not sufficiently representative

    INTRODUCTION Today, QoE models are developed based on isolated data lakes within the premises of researchers, as sharing of data is often not preferred or allowed. As such, data privacy is preserved but the models might have the risk of being not sufficiently representative. There is an increasing trend that the data sets collected via QoE experiments are b...

  2. [2]

    Privacy Preserving QoE Modeling using Collaborative Learning

    (arXiv:1906.09248v2 [cs.LG], 26 Jun 2019) RELATED WORK ML algorithms such as Decision Trees, Random Forests are a few of the most commonly used techniques in the QoE literature [6]. Support Vector Machines (SVM) have been used earlier in QoE Modeling as they often perform well in small datasets [10]. These models are hard to use for Collaborative Learning...

  3. [3]

    ML MODEL TRAINING MECHANISMS

    Table 1: Scenario comparison summary.

                          Centralized   Isolated   Collaborative
    Data transfer         High          None       None
    Weight transfer       None          None       Low
    Privacy preserving    No            Yes        Yes
    Training              Master        Workers    Workers

    The experiments are performed in four scenarios: Centralized (CL), Isolated (IL), Round-Robin Learning (RRL), and Federated Learning...

  4. [4]

    low” and “high

    DATASET AND MODELING 4.1 Dataset The public web QoE dataset which is available at [4] is used in the experiments. The dataset is artificially and arbitrarily divided in three different groups, where the assumption is that these three isolated groups are located at different data centers and are not allowed to share raw data amongst each other. The users i...

  5. [5]

    The experiments are performed to understand and find out the best hyper parameters

    RESULTS 5.1 Isolated Learning (IL) Two machine learning algorithms, one simple DT and one rather more complex NN, are studied with different hyper parameters to model QoE. The experiments are performed to understand and find out the best hyper parameters. We let the isolated models to train at best effort, i.e., tuned the hyper parameters until the AUC did n...

  6. [6]

    NN model accuracy outperforms the isolated decision tree models when trained either as an isolated, or in a collaborative manner

    CONCLUSION In this paper, we present that collaborative machine learning as potential tool that can be suggested in QoE modeling. NN model accuracy outperforms the isolated decision tree models when trained either as an isolated, or in a collaborative manner. We study Federated Learning (FL) and Round Robin Learning (RRL) to show that on par accuracy ...

  7. [7]

    https://www.pytorch.org, Accessed: 2019-06-04

    Pytorch. https://www.pytorch.org, Accessed: 2019-06-04

  8. [8]

    https://www.tensorflow.org/federated/, Accessed: 2019-06-04

    Tensorflow Federated. https://www.tensorflow.org/federated/, Accessed: 2019-06-04

  9. [9]

    https://github.com/baidu-research/baidu-allreduce/, Accessed: 2019-06-14

    baidu-allreduce. https://github.com/baidu-research/baidu-allreduce/, Accessed: 2019-06-14

  10. [10]

    https://www.schatz.cc/downloads/web-dataset/, Accessed: 2019-06-14

    Web browsing QoE subjective test dataset V 1.0. https://www.schatz.cc/downloads/web-dataset/, Accessed: 2019-06-14

  11. [11]

    http://dbq.multimediatech.cz, Accessed: 2019-06-18

    Qualinet database. http://dbq.multimediatech.cz, Accessed: 2019-06-18

  12. [12]

    Aroussi et al

    S. Aroussi et al. Survey on machine learning-based QoE-QoS correlation models. International Conference on Computing, Management and Telecommunications (ComManTel), pages 200–204, 2014

  13. [13]

    Bonawitz et al

    K. Bonawitz et al. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS ’17, pages 1175–1191, New York, USA. ACM, 2017

  14. [14]

    Braams and C

    J. Braams and C. Dwork. Differential Privacy. Springer US, pages 338–340, 2011

  15. [15]

    Brendan McMahan et al

    H. Brendan McMahan et al. Federated learning of deep networks using model averaging. CoRR, 2018

  16. [16]

    Hoßfeld et al

    T. Hoßfeld et al. Quantification of Youtube QoE via crowdsourcing. In 2011 IEEE International Symposium on Multimedia, pages 494–499, 2011

  17. [17]

    Moritz et al

    P. Moritz et al. Ray: A distributed framework for emerging AI applications. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 561–577, 2018

  18. [18]

    Orsolic et al

    I. Orsolic et al. Youtube QoE estimation from encrypted traffic: Comparison of test methodologies and machine learning based models. QoMEX’18, 2018