Privacy Preserving QoE Modeling using Collaborative Learning

Konstantinos Vandikas; Markus Fiedler; Selim Ickin

arXiv: 1906.09248 · v2 · pith:DQV7WU4Ynew · submitted 2019-06-21 · cs.LG · cs.MA · stat.ML


Pith reviewed 2026-05-25 18:48 UTC · model grok-4.3

classification: cs.LG · cs.MA · stat.ML

keywords: privacy preserving · collaborative learning · QoE modeling · round-robin training · machine learning · federated learning · quality of experience


The pith

Round-robin sequential updates across nodes let QoE models generalize better without sharing sensitive user data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to establish that training a Quality of Experience model by passing it sequentially among partner nodes produces a more general model than training on any single node's data alone. Each node updates the model on its local data and forwards the updated parameters, so no raw user records ever leave their origin. The approach is benchmarked directly against isolated training on one node, pooled centralized training, and a customized federated learning baseline. If the sequential updates converge usefully, researchers could combine small protected datasets into more robust models while satisfying consent and privacy rules that currently block data sharing.

Core claim

The authors introduce round-robin based collaborative machine learning model training, where the model is trained in a sequential manner amongst the collaborated partner nodes, and benchmark this mechanism using their customized Federated Learning setup as well as conventional Centralized and Isolated Learning methods for privacy-preserving QoE modeling.

What carries the argument

Round-robin collaborative training: a sequential hand-off of model parameters across nodes, each performing local updates before passing the model onward.
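
The hand-off described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a logistic-regression model trained with plain SGD, and the node data, the helper names (`local_update`, `round_robin_train`, `make_node`), the learning rate, and the number of cycles are all hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few SGD epochs on one node's private data and return new weights."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            # Gradient of the log loss with respect to w is (pred - y) * x.
            w = [wi - lr * (pred - y) * xi for wi, xi in zip(w, x)]
    return w


def round_robin_train(node_datasets, n_features, cycles=3):
    """Pass a single weight vector from node to node.

    Only the weights move between nodes; each dataset is read
    exclusively inside local_update, i.e. at its own node.
    """
    w = [0.0] * n_features
    for _ in range(cycles):
        for data in node_datasets:
            w = local_update(w, data)
    return w


def make_node(seed, n=40):
    # Hypothetical synthetic data: the label is 1 when the first feature
    # exceeds 0.5; the second feature is a constant bias input.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x0 = rng.random()
        rows.append(([x0, 1.0], 1 if x0 > 0.5 else 0))
    return rows


nodes = [make_node(seed) for seed in (1, 2, 3)]
weights = round_robin_train(nodes, n_features=2)
print(weights)
```

Because every node starts from the weights it received, the order of the nodes and the number of local epochs both influence the final model; the sketch fixes them arbitrarily.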

If this is right

QoE models can reach higher accuracy on new populations without any node revealing its raw user records.
Sequential updates preserve the privacy guarantees of isolated training while incorporating information from multiple sources.
Performance lies between isolated training and fully centralized training that would require data pooling.
The same sequential protocol can be applied to any domain limited by small, consent-restricted datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Research groups could pool modeling effort across institutions without negotiating data-transfer agreements.
Convergence behavior under larger differences in user demographics remains open for direct measurement.
The method may lower the cost of QoE studies by letting each new small experiment contribute to an accumulating shared model.

Load-bearing premise

The data distributions at the different nodes are compatible enough that sequential updates converge to a single useful model.

What would settle it

A test set drawn from a user population outside all participating nodes where the round-robin model shows no accuracy gain over a model trained only on one node's data.

Figures

Figures reproduced from arXiv: 1906.09248 by Konstantinos Vandikas, Markus Fiedler, Selim Ickin.

**Figure 1.** Comparison of different learning techniques.

**Figure 2.** Basic architecture of our prototype for Federated Learning. The model training time for FL and RRL is compared. RRL is a sequential training, i.e., training on one node starts after training is completed on another node, hence the total training time is the sum of individual training time on all workers. On the other hand, in FL training, the training takes place in parallel; all workers train on their indi…
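
The timing comparison in the caption text above can be illustrated with a small sketch. The RRL formula (sum of per-worker times) follows the text directly. The FL formula is an assumption, since the source sentence is truncated: it models one parallel round as bounded by the slowest worker plus a hypothetical aggregation cost. The per-worker times are made up for illustration.

```python
def rrl_cycle_time(worker_times):
    """Sequential training: one cycle costs the sum of all per-worker times."""
    return sum(worker_times)


def fl_round_time(worker_times, aggregation_time=0.0):
    """Parallel training (assumed model): one round costs as much as the
    slowest worker, plus a hypothetical aggregation step on the master."""
    return max(worker_times) + aggregation_time


# Hypothetical per-worker training times in seconds.
times = [1.0, 2.0, 1.5]
print("RRL cycle:", rrl_cycle_time(times))
print("FL round:", fl_round_time(times, aggregation_time=0.5))
```

A single FL round being cheaper does not by itself make FL faster overall, because the number of rounds needed to reach a given accuracy may differ between the two schemes.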

**Figure 3.** Kernel Density Estimate (KDE) plots for the MOS ratings and the age of the users on each group. Reasonable bandwidths are chosen in the visualisation to avoid under- or over-fitting. … listed as follows: i) maximum downlink bandwidth (dl. bw.) set in the network emulator; ii) browsing time until the rating prompt, which is the surfing duration (dur. surfing); and iii) time consumed during the rating process,…

**Figure 4.** Mean (with 95% CI) training performance (left) and time (right) over rounds in FL.

**Figure 4.** Amongst the collaborative learning approaches, there are pros and cons between FL and RRL. Within one cycle training time (approx. 2.5 s), the RRL based model reaches a minimum AUC of 0.77 on all nodes, which is higher than when the training on the nodes takes place independently and in an isolated manner. In this scenario, each node sends the weights directly to the next node, hence the next node can rev…
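
The "minimum AUC on all nodes" figure quoted above is a worst-case summary across nodes. A minimal sketch of how such a number could be computed is given below, assuming binary labels and using the pairwise-ranking definition of AUC. The function names and the example scores are hypothetical and are not taken from the paper.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as one half).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def min_auc_across_nodes(per_node):
    """Worst-case AUC when the same model is evaluated on every node's test set."""
    return min(auc(labels, scores) for labels, scores in per_node)


# Hypothetical (labels, scores) pairs, one per node.
per_node = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.7]),
]
print(min_auc_across_nodes(per_node))
```

Reporting the minimum rather than the mean makes the summary sensitive to the node on which the shared model generalizes worst.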

Original abstract

Machine Learning based Quality of Experience (QoE) models potentially suffer from over-fitting due to limitations including low data volume, and limited participant profiles. This prevents models from becoming generic. Consequently, these trained models may under-perform when tested outside the experimented population. One reason for the limited datasets, which we refer in this paper as small QoE data lakes, is due to the fact that often these datasets potentially contain user sensitive information and are only collected throughout expensive user studies with special user consent. Thus, sharing of datasets amongst researchers is often not allowed. In recent years, privacy preserving machine learning models have become important and so have techniques that enable model training without sharing datasets but instead relying on secure communication protocols. Following this trend, in this paper, we present Round-Robin based Collaborative Machine Learning model training, where the model is trained in a sequential manner amongst the collaborated partner nodes. We benchmark this work using our customized Federated Learning mechanism as well as conventional Centralized and Isolated Learning methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This paper describes round-robin sequential training for privacy-preserving QoE models but supplies no results, equations, or analysis to show whether it works.


The one thing to know is that this paper describes a round-robin way to train QoE models across sites without sharing the raw data. They plan to test it against centralized training, training on each site alone, and a federated learning baseline they customized. What is new is the application of sequential collaborative training specifically to QoE modeling. The privacy constraint they highlight is genuine, since these datasets come from user studies and often cannot be shared.

The paper does well at framing the problem. It points out that small data lakes lead to overfitting and that privacy preserving methods are needed. That part is accurate and relevant to anyone who has tried to build general QoE models.

The soft spots start right after the description of the method. The abstract mentions benchmarking but provides no results, no equations for how the model is updated in each round, and no discussion of convergence or handling different data distributions at each node. The concern about sequential updates overwriting useful parameters on non-i.i.d. QoE data is valid here. Without any mention of techniques to prevent that, the approach as described could easily produce a model that performs worse than isolated training on some nodes.

This kind of paper is for the QoE community that deals with data collection limits. A reader outside that area or looking for a complete method will not find enough to replicate or assess the contribution. I would not bring this to the next reading group. I would not cite it in my own work. It does not deserve to go to peer review because the evidence for the main claim is not present in the text.

Referee Report

1 major / 1 minor

Summary. The paper proposes Round-Robin based Collaborative Machine Learning for privacy-preserving QoE modeling, in which a shared model is trained sequentially across partner nodes without exchanging raw datasets. It benchmarks the approach against a customized Federated Learning mechanism as well as standard Centralized and Isolated Learning baselines, motivated by overfitting risks arising from small, consent-restricted QoE data lakes.

Significance. A validated sequential collaborative scheme that demonstrably improves generalization over isolated training while preserving privacy would be relevant to QoE modeling and other domains with non-i.i.d. private data. The manuscript supplies no equations, convergence analysis, performance numbers, or experimental validation, so the practical significance cannot yet be assessed.

major comments (1)

[Abstract] The central claim that sequential round-robin updates produce a model with better generalization than isolated training is load-bearing, yet the text supplies neither a mathematical description of the update rule, nor any safeguard against parameter overwriting on heterogeneous QoE distributions, nor convergence analysis or empirical results.

minor comments (1)

[Abstract] The abstract refers to 'our customized Federated Learning mechanism' without indicating what customizations were made relative to standard FedAvg.
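
For context on the baseline this comment refers to, the aggregation step commonly associated with federated averaging can be sketched as a sample-count-weighted mean of the workers' weight vectors. This shows only the standard idea; it is not the authors' customized mechanism, whose details are not given on this page, and the function name and example values are hypothetical.

```python
def federated_average(worker_weights, sample_counts):
    """Average weight vectors, weighting each worker by its number of samples."""
    if len(worker_weights) != len(sample_counts):
        raise ValueError("one sample count is required per worker")
    total = sum(sample_counts)
    dim = len(worker_weights[0])
    return [
        sum(w[i] * n for w, n in zip(worker_weights, sample_counts)) / total
        for i in range(dim)
    ]


# Hypothetical weights from two workers holding 1 and 3 samples respectively.
averaged = federated_average([[1.0, 0.0], [3.0, 2.0]], [1, 3])
print(averaged)
```

Any customization (for example unweighted averaging or a different aggregation schedule) would change this step, which is why the referee asks for it to be stated.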

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review. We agree that the current manuscript version is missing key technical elements required to substantiate the central claims, and we will substantially revise the paper to include them.


Referee: [Abstract] The central claim that sequential round-robin updates produce a model with better generalization than isolated training is load-bearing, yet the text supplies neither a mathematical description of the update rule, nor any safeguard against parameter overwriting on heterogeneous QoE distributions, nor convergence analysis or empirical results.

Authors: We agree that the abstract and main text currently lack a mathematical formulation of the round-robin update rule, any analysis of overwriting risks under non-i.i.d. QoE data, convergence guarantees, and experimental results. In the revised manuscript we will add: (1) explicit equations describing the sequential parameter update across nodes, (2) a discussion of mechanisms (e.g., learning-rate scheduling or local regularization) to mitigate overwriting on heterogeneous distributions, (3) a brief convergence sketch, and (4) the benchmark results comparing round-robin, federated, centralized, and isolated training on the QoE datasets.

Revision: yes
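
One way the "local regularization" mentioned in the response could look is a proximal penalty that pulls each local step back toward the weights received from the previous node. The sketch below is an illustration of that general idea under stated assumptions, not a mechanism taken from the paper: the function name `proximal_step` and the parameters `lr` and `mu` are hypothetical.

```python
def proximal_step(w, grad, w_received, lr=0.1, mu=0.5):
    """One gradient step on: local_loss(w) + (mu / 2) * ||w - w_received||^2.

    grad is the gradient of the local loss at w. The extra term
    mu * (w - w_received) penalizes drifting away from the weights
    handed over by the previous node; mu = 0 recovers a plain step.
    """
    return [
        wi - lr * (gi + mu * (wi - ri))
        for wi, gi, ri in zip(w, grad, w_received)
    ]


# Hypothetical values: with a zero local gradient, the step moves w toward w_received.
w_new = proximal_step([1.0, -2.0], [0.0, 0.0], [0.0, 0.0], lr=0.1, mu=0.5)
print(w_new)
```

A larger `mu` keeps more of what earlier nodes contributed at the price of adapting less to the current node's data, so its value would need to be tuned per deployment.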

Circularity Check

0 steps flagged

No significant circularity; empirical benchmarking only


The paper presents a sequential round-robin collaborative training procedure for QoE models and benchmarks it empirically against federated, centralized, and isolated baselines. No equations, fitted parameters, predictions derived from prior fits, or load-bearing self-citations appear in the provided text. The method is described as a practical protocol with no mathematical derivation chain that reduces to its own inputs by construction. This is a standard non-circular empirical methods paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The proposal rests on the domain assumption that QoE datasets contain sensitive user information that legally prevents sharing, creating the need for collaborative training; no free parameters or invented entities are mentioned.

axioms (1)

domain assumption: QoE datasets contain user-sensitive information and are only collected with special consent, preventing sharing among researchers.
Explicitly stated in the abstract as the reason for small QoE data lakes and the motivation for privacy-preserving methods.



Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 1 internal anchor

[1] INTRODUCTION Today, QoE models are developed based on isolated data lakes within the premises of researchers, as sharing of data is often not preferred or allowed. As such, data privacy is preserved but the models might have the risk of being not sufficiently representative. There is an increasing trend that the data sets collected via QoE experiments are b...

[2] Privacy Preserving QoE Modeling using Collaborative Learning. arXiv:1906.09248v2 [cs.LG], 26 Jun 2019.

RELATED WORK ML algorithms such as Decision Trees, Random Forests are a few of the most commonly used techniques in the QoE literature [6]. Support Vector Machines (SVM) have been used earlier in QoE Modeling as they often perform well in small datasets [10]. These models are hard to use for Collaborative Learning...

[3] ML MODEL TRAINING MECHANISMS

Table 1: Scenario comparison summary.

|                    | Centralized | Isolated | Collaborative |
|--------------------|-------------|----------|---------------|
| Data transfer      | High        | None     | None          |
| Weight transfer    | None        | None     | Low           |
| Privacy preserving | No          | Yes      | Yes           |
| Training           | Master      | Workers  | Workers       |

The experiments are performed in four scenarios: Centralized (CL), Isolated (IL), Round-Robin Learning (RRL), and Federated Learning...

[4] “low” and “high”

DATASET AND MODELING 4.1 Dataset The public web QoE dataset which is available at [4] is used in the experiments. The dataset is artificially and arbitrarily divided in three different groups, where the assumption is that these three isolated groups are located at different data centers and are not allowed to share raw data amongst each other. The users i...

[5] RESULTS 5.1 Isolated Learning (IL) Two machine learning algorithms, one simple DT and one rather more complex NN, are studied with different hyper parameters to model QoE. The experiments are performed to understand and find out the best hyper parameters. We let the isolated models to train at best effort, i.e., tuned the hyper parameters until the AUC did n...

[6] CONCLUSION In this paper, we present that collaborative machine learning as potential tool that can be suggested in QoE modeling. NN model accuracy outperforms the isolated decision tree models when trained either as an isolated, or in a collaborative manner. We study Federated Learning (FL) and Round Robin Learning (RRL) to show that on par accuracy ...

[7] PyTorch. https://www.pytorch.org, Accessed: 2019-06-04

[8] TensorFlow Federated. https://www.tensorflow.org/federated/, Accessed: 2019-06-04

[9] baidu-allreduce. https://github.com/baidu-research/baidu-allreduce/, Accessed: 2019-06-14

[10] Web browsing QoE subjective test dataset V 1.0. https://www.schatz.cc/downloads/web-dataset/, Accessed: 2019-06-14

[11] Qualinet database. http://dbq.multimediatech.cz, Accessed: 2019-06-18

[12] S. Aroussi et al. Survey on machine learning-based QoE-QoS correlation models. International Conference on Computing, Management and Telecommunications (ComManTel), pages 200–204, 2014

[13] K. Bonawitz et al. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS ’17, New York, USA. ACM, pages 1175–1191, 2017

[14] J. Braams and C. Dwork. Differential Privacy. Springer US, pages 338–340, 2011

[15] H. Brendan McMahan et al. Federated learning of deep networks using model averaging. CoRR, 2018

[16] T. Hoßfeld et al. Quantification of Youtube QoE via crowdsourcing. In 2011 IEEE International Symposium on Multimedia, pages 494–499, 2011

[17] P. Moritz et al. Ray: A distributed framework for emerging AI applications. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 561–577, 2018

[18] I. Orsolic et al. Youtube QoE estimation from encrypted traffic: Comparison of test methodologies and machine learning based models. QoMEX’18, 2018
