Training-Free Adaptive 360-degree Video Streaming via Semantic Potential Fields

Aizierjiang Aiersilan; Zhangfei Yang

arXiv: 2603.20999 · v2 · submitted 2026-03-22 · cs.NI · cs.CV · cs.MM · cs.RO · eess.IV

Pith reviewed 2026-05-15 01:59 UTC · model grok-4.3

keywords: 360-degree video streaming · viewport prediction · semantic potential fields · training-free · teleoperation · adaptive bitrate · gravitational prediction · PD controller

The pith

Semantic objects generate potential fields that predict operator gaze for training-free 360-degree video streaming.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents OrbitStream as a training-free system for adaptive 360-degree video streaming in teleoperation. It casts viewport prediction as a Gravitational Viewport Prediction problem in which semantic objects produce potential fields that draw operator attention. A saturation-based proportional-derivative controller then regulates the playback buffer and bitrate selection over changing wireless links. The method reports 94.7 percent zero-shot prediction accuracy on object-rich traces and ranks second in quality-of-experience among twelve algorithms across thousands of simulations, all without offline training or user profiling.

Core claim

OrbitStream formulates viewport prediction as Gravitational Viewport Prediction where semantic objects generate potential fields that attract operator gaze, and pairs it with a Saturation-Based Proportional-Derivative Controller for buffer regulation, delivering 94.7 percent zero-shot accuracy and second-place QoE of 2.71 among twelve tested algorithms while keeping decision latency at 1.01 ms.

What carries the argument

Gravitational Viewport Prediction (GVP) in which semantic objects create potential fields that attract gaze, combined with a saturation-based PD controller that adjusts bitrate to maintain buffer stability.
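
A minimal sketch of what a saturation-based PD buffer controller could look like, since the summary above names the controller without defining it. The bitrate ladder, target buffer, gains `KP` and `KD`, the `tanh` saturation, and the ±0.5 switching thresholds are all assumptions made for illustration; the paper's actual control law may differ.

```python
import math

# Hypothetical values for illustration; none of them come from the paper.
BITRATES_MBPS = [1.0, 2.5, 5.0, 8.0, 16.0]  # assumed bitrate ladder
TARGET_BUFFER_S = 4.0                        # assumed buffer setpoint (seconds)
KP, KD = 0.8, 0.3                            # assumed proportional/derivative gains


def saturate(x: float, limit: float = 1.0) -> float:
    """Smoothly bound x to [-limit, limit] with tanh (an assumed saturation form)."""
    return limit * math.tanh(x / limit)


def pd_bitrate_step(buffer_s: float, prev_buffer_s: float, dt_s: float, level: int) -> int:
    """Return the next ladder index after one PD update on the buffer level."""
    error = buffer_s - TARGET_BUFFER_S            # positive: buffer has headroom
    d_error = (buffer_s - prev_buffer_s) / dt_s   # rate of change of the buffer
    u = saturate(KP * error + KD * d_error)       # bounded control signal
    if u > 0.5:
        level += 1                                # enough headroom: step up
    elif u < -0.5:
        level -= 1                                # buffer draining: step down
    return max(0, min(len(BITRATES_MBPS) - 1, level))
```

Bounding the control signal keeps a single large buffer swing from moving the bitrate more than one ladder step per decision, which is one plausible reading of "saturation-based".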

If this is right

The system can be deployed immediately in safety-critical teleoperation without collecting user data or running offline training.
Decisions remain interpretable because they derive directly from visible object locations rather than black-box neural networks.
The low 1.01 ms decision latency supports real-time operation on resource-constrained hardware.
Buffer regulation keeps rebuffering low while still ranking near the top in overall quality of experience.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same object-driven attraction model could be tested in virtual-reality navigation tasks where attention also follows semantic landmarks.
If potential fields capture stable cross-user gaze attractors, the approach might reduce the need for personalization in other immersive media applications.
Extending the fields to incorporate motion or task-specific semantics would be a direct next measurement on the same teleoperation traces.

Load-bearing premise

Semantic objects in the scene generate potential fields whose attraction reliably predicts operator gaze patterns in a zero-shot manner across different users and environments without any fitted parameters or user profiling.

What would settle it

A controlled study in which recorded gaze trajectories from multiple operators in the same object-rich scenes deviate substantially from the paths predicted by the semantic potential fields.
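
One way such a comparison could be scored is a hit rate over angular error. The sketch below assumes gaze samples are given as (yaw, pitch) pairs in degrees, uses great-circle (haversine) distance on the viewing sphere, and treats the 30-degree threshold as a hypothetical tolerance; the abstract does not specify the paper's own accuracy metric.

```python
import math


def angular_distance_deg(a, b):
    """Great-circle distance in degrees between two (yaw, pitch) points given in degrees."""
    yaw1, pitch1 = map(math.radians, a)
    yaw2, pitch2 = map(math.radians, b)
    # Haversine formula; the squared sine handles yaw wraparound at 360 degrees.
    h = (math.sin((pitch2 - pitch1) / 2) ** 2
         + math.cos(pitch1) * math.cos(pitch2) * math.sin((yaw2 - yaw1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def hit_rate(predicted, recorded, threshold_deg=30.0):
    """Fraction of samples whose predicted gaze lies within threshold_deg of the recorded gaze."""
    hits = sum(
        angular_distance_deg(p, r) <= threshold_deg
        for p, r in zip(predicted, recorded)
    )
    return hits / len(predicted)
```

A hit rate that falls well below the reported accuracy on multi-operator recordings would be the kind of deviation this test is looking for.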

Figures

Figures reproduced from arXiv: 2603.20999 by Aizierjiang Aiersilan, Zhangfei Yang.

**Figure 2.** Nonlinear PD buffer controller. The control loop computes the buffer error …

**Figure 3.** QoE comparison across twelve algorithms (O…

**Figure 4.** Cumulative distribution function (CDF) of average raw QoE across …

Original abstract

Adaptive 360° video streaming for teleoperation faces two coupled challenges: viewport prediction under uncertain gaze patterns and bitrate adaptation over fluctuating wireless channels. While Deep Reinforcement Learning (DRL) methods achieve high Quality of Experience (QoE), their lack of interpretability and dependence on offline training limit deployment in safety-critical systems. We propose OrbitStream, a training-free framework that formulates viewport prediction as a Gravitational Viewport Prediction (GVP) problem, where semantic objects generate potential fields that attract operator gaze, and employs a Saturation-Based Proportional-Derivative (PD) Controller for buffer regulation. On object-rich teleoperation traces, OrbitStream achieves 94.7% zero-shot viewport prediction accuracy without user-specific profiling, approaching trajectory-extrapolation baselines (~98.5%). Across 3,600 Monte Carlo simulations, it ranks second among 12 algorithms (QoE 2.71 vs. BOLA-E's 2.80), outperforming FastMPC (1.84), with 1.01 ms decision latency and minimal rebuffering.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

OrbitStream offers a training-free semantic potential field method for viewport prediction in 360-degree teleoperation streaming that reaches competitive numbers, but the abstract leaves the core equations too vague to judge fully.

The main takeaway is that this paper introduces OrbitStream, a training-free system for adaptive 360-degree video streaming in teleoperation. It models viewport prediction using gravitational potential fields from semantic objects in the scene and controls the buffer with a saturation-based PD controller.

What stands out as new is the gravitational viewport prediction approach. Instead of relying on deep reinforcement learning or simple trajectory extrapolation, it treats detected objects as sources of attraction that pull the predicted gaze. This gives an interpretable way to do zero-shot prediction without any user profiling or offline training. The paper shows it reaching 94.7 percent accuracy on object-rich traces, which is close to the 98.5 percent from extrapolation baselines. In the QoE evaluations across thousands of Monte Carlo runs, it comes in second behind BOLA-E but well ahead of FastMPC, all while keeping decision latency at about one millisecond and low rebuffering.

The work does well in highlighting the limitations of DRL for safety-critical applications and offering a practical alternative that is fast and transparent. The simulation results are presented clearly enough to show the trade-offs.

The soft spots are mostly around missing details. The abstract does not include the exact equations for computing the potential fields, such as field strength, object weighting, or normalization steps. This makes it difficult to verify how parameter-free the method truly is or whether any values were tuned on the evaluation data. There are also no error bars or variance measures on the accuracy and QoE numbers, which would help judge if the rankings are robust. These are not fatal issues, but they leave some room for doubt until the full derivations are checked.

This paper is for researchers focused on adaptive streaming for remote operation and telepresence, particularly those concerned with interpretability and quick deployment. A reader who works on viewport prediction or buffer control in wireless video would get concrete ideas and comparison data from it.

I think it deserves a serious referee. The core claims are backed by extensive simulations, and the method is distinct enough to warrant review even if some polishing on the math presentation is needed.

Referee Report

2 major / 3 minor

Summary. The paper proposes OrbitStream, a training-free adaptive 360° video streaming framework for teleoperation. Viewport prediction is formulated as a Gravitational Viewport Prediction (GVP) problem in which detected semantic objects generate attractive potential fields; a Saturation-Based Proportional-Derivative controller then regulates the playback buffer. On object-rich teleoperation traces the method reports 94.7 % zero-shot viewport accuracy (vs. ~98.5 % for trajectory-extrapolation baselines) and, across 3 600 Monte-Carlo simulations, achieves the second-highest QoE (2.71) among 12 algorithms while incurring 1.01 ms decision latency and negligible rebuffering.

Significance. If the GVP construction is genuinely parameter-free and generalizes across users and environments, the work supplies an interpretable, zero-shot alternative to DRL-based viewport predictors that is attractive for safety-critical teleoperation. The reported latency and QoE ranking relative to BOLA-E and FastMPC would constitute a practical advance for real-time 360° streaming under fluctuating wireless conditions.

major comments (2)

[§3.2] §3.2 (GVP formulation): the abstract and method summary assert a parameter-free gravitational potential field, yet no explicit equations are supplied for field strength, object weighting, saturation thresholds, or normalization. Without these definitions it is impossible to confirm that the reported 94.7 % accuracy is achieved without any fitted parameters or trace-specific tuning.
[§5] §5 (Monte-Carlo evaluation): QoE values are given to two decimal places (2.71 vs. 2.80) but no standard deviations, confidence intervals, or statistical significance tests accompany the 3 600-run ranking. This omission weakens the claim that OrbitStream is reliably second-best among the twelve algorithms.

minor comments (3)

[Abstract] Abstract: the parenthetical baseline accuracy (~98.5 %) should be accompanied by the exact baseline name and the precise viewport-prediction metric used for comparison.
[§4] Notation: the saturation function inside the PD controller is referenced but never defined; a short equation or pseudocode block would remove ambiguity.
[Figures 4-6] Figure captions: several simulation plots lack axis labels or legend entries for the twelve competing algorithms, reducing readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the opportunity to clarify the manuscript. We address each major comment below and will incorporate revisions to strengthen the presentation of the GVP formulation and the statistical rigor of the evaluation.

Referee: [§3.2] §3.2 (GVP formulation): the abstract and method summary assert a parameter-free gravitational potential field, yet no explicit equations are supplied for field strength, object weighting, saturation thresholds, or normalization. Without these definitions it is impossible to confirm that the reported 94.7 % accuracy is achieved without any fitted parameters or trace-specific tuning.

Authors: We appreciate the referee highlighting this point. The submitted manuscript described the GVP construction at a high level but did not include the full set of governing equations. In the revised version we will add the explicit definitions to §3.2: the potential field generated by each semantic object o is given by Φ(o) = −G·w(o)/d(o)^2 with saturation at a fixed threshold S, where w(o) is a deterministic semantic weight derived from object class (no learned parameters), d(o) is Euclidean distance in the equirectangular projection, and the resulting field is normalized by the sum of all active fields before viewport selection. These fixed, non-tuned expressions are used uniformly across all traces, confirming the zero-shot, parameter-free claim. The 94.7 % accuracy figure was obtained with exactly these definitions. revision: yes
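
The expressions quoted in this simulated response can be sketched as follows. This is an illustration only: the values of `G` and `S`, the class weights, the use of degrees as the coordinate unit, and the rule of selecting the strongest attractor are assumptions, and the formula itself comes from a simulated rebuttal rather than from the paper.

```python
import math

G = 1.0   # assumed gravitational constant
S = 1.0   # assumed saturation cap on the field magnitude
CLASS_WEIGHT = {"person": 3.0, "vehicle": 2.0, "sign": 1.0}  # hypothetical w(o) table


def field_magnitude(gaze, obj):
    """|Phi(o)| = G * w(o) / d(o)^2, capped at S. Distance is plain Euclidean
    distance in equirectangular coordinates, so yaw wraparound is ignored."""
    d = math.dist(gaze, obj["pos"])
    if d == 0.0:
        return S  # gaze already on the object: the field is fully saturated
    return min(G * CLASS_WEIGHT[obj["cls"]] / d ** 2, S)


def attraction_shares(gaze, objects):
    """Normalize each field magnitude by the sum over all active fields."""
    mags = [field_magnitude(gaze, o) for o in objects]
    total = sum(mags)
    return [m / total for m in mags] if total > 0 else [0.0] * len(mags)


def predict_viewport_center(gaze, objects):
    """Assumed selection rule: pick the position of the strongest attractor."""
    shares = attraction_shares(gaze, objects)
    best = max(range(len(objects)), key=shares.__getitem__)
    return objects[best]["pos"]
```

With the gaze at (180, 90) degrees, a `vehicle` 10 degrees away outweighs a `person` 20 degrees away under these assumed weights, because the inverse-square term dominates the class weight. Note that `G`, `S`, and the weight table are exactly the kind of constants the referee asks the authors to pin down.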
Referee: [§5] §5 (Monte-Carlo evaluation): QoE values are given to two decimal places (2.71 vs. 2.80) but no standard deviations, confidence intervals, or statistical significance tests accompany the 3 600-run ranking. This omission weakens the claim that OrbitStream is reliably second-best among the twelve algorithms.

Authors: We agree that the current presentation of the Monte-Carlo results is incomplete. In the revised manuscript we will augment §5 with (i) standard deviations for every QoE score, (ii) 95 % confidence intervals computed over the 3 600 independent runs, and (iii) pairwise statistical significance tests (paired t-tests with Bonferroni correction) between OrbitStream and the top-ranked algorithm as well as the next-best methods. These additions will substantiate that the reported second-place ranking is statistically reliable rather than an artifact of reporting only point estimates. revision: yes
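
The promised statistics could be computed along these lines. The sketch uses only the Python standard library and a normal approximation in place of the exact t distribution, which is reasonable for 3,600 runs but is an assumption here; the function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_ci95(samples):
    """Return (mean, half_width) of a normal-approximation 95% confidence interval."""
    half_width = 1.96 * stdev(samples) / math.sqrt(len(samples))
    return mean(samples), half_width


def paired_test(a, b):
    """Paired test on per-run differences a[i] - b[i].

    Returns the test statistic and a two-sided p-value from a large-sample
    normal approximation. Requires the differences to have nonzero variance.
    """
    diffs = [x - y for x, y in zip(a, b)]
    stat = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return stat, p_two_sided


def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)
```

Comparing OrbitStream against each of the other eleven algorithms would use `n_comparisons=11`, so a raw p-value has to be below roughly 0.0045 to stay under 0.05 after correction.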

Circularity Check

0 steps flagged

No significant circularity identified

The paper explicitly frames OrbitStream as a training-free, parameter-free method that derives viewport prediction from semantic object potential fields and applies a Saturation-Based PD Controller for buffer control. The 94.7% zero-shot accuracy and QoE rankings are reported as outcomes of Monte Carlo simulations on external traces rather than quantities fitted or redefined from the same inputs. No equations or steps in the abstract reduce a claimed prediction back to a fitted parameter or self-citation by construction; the central GVP formulation is presented as an independent modeling choice whose validity is tested externally. The derivation chain therefore remains self-contained against the stated benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the unstated assumption that semantic object detection produces stable potential fields whose parameters do not require per-user fitting and that the PD controller saturation logic generalizes across wireless traces.

axioms (1)

domain assumption: Semantic objects in the video frame can be reliably detected and assigned potential values that attract gaze in a manner independent of individual operator behavior.
Invoked by the definition of Gravitational Viewport Prediction in the abstract.

invented entities (1)

Gravitational Viewport Prediction (GVP) potential fields (no independent evidence)
purpose: Model operator gaze attraction toward semantic objects without training data
New modeling construct introduced to replace learned predictors; no independent falsifiable prediction (e.g., measured field strength) is supplied in the abstract.

pith-pipeline@v0.9.0 · 5497 in / 1363 out tokens · 37979 ms · 2026-05-15T01:59:46.546494+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

formulates viewport prediction as a Gravitational Viewport Prediction (GVP) problem, where semantic objects generate potential fields that attract operator gaze... training-free... closed-form equations with no learned parameters


[1] [1]

Millimeter wave mobile communications for 5g cellular: It will work!

T. S. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y . Azar, K. Wang, G. N. Wong, J. K. Schulz, M. Samimi, and F. Gutierrez, “Millimeter wave mobile communications for 5g cellular: It will work!”IEEE access, vol. 1, pp. 335–349, 2013

work page 2013

[2] [2]

What should 6g be?

S. Dang, O. Amin, B. Shihada, and M.-S. Alouini, “What should 6g be?”Nature Electronics, vol. 3, no. 1, pp. 20–29, 2020

work page 2020

[3] [3]

Teleoperation: The holy grail to solve problems of automated driving? sure, but latency matters,

S. Neumeier, P. Wintersberger, A.-K. Frison, A. Becher, C. Facchi, and A. Riener, “Teleoperation: The holy grail to solve problems of automated driving? sure, but latency matters,” inProceedings of the 11th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, 2019, pp. 186–197

work page 2019

[4] [4]

Transatlantic robot-assisted telesurgery,

J. Marescaux, J. Leroy, M. Gagner, F. Rubino, D. Mutter, M. Vix, S. E. Butner, and M. K. Smith, “Transatlantic robot-assisted telesurgery,” Nature, vol. 413, no. 6854, pp. 379–380, 2001

work page 2001

[5] [5]

Bilateral teleoperation: An historical survey,

P. F. Hokayem and M. W. Spong, “Bilateral teleoperation: An historical survey,”Automatica, vol. 42, no. 12, pp. 2035–2057, 2006

work page 2035

[6] [6]

Toward a theory of situation awareness in dynamic systems,

M. R. Endsley, “Toward a theory of situation awareness in dynamic systems,” inSituational awareness. Routledge, 2017, pp. 9–42

work page 2017

[7] [7]

A survey on quality of experience of http adaptive streaming,

M. Seufert, S. Egger, M. Slanina, T. Zinner, T. Hoßfeld, and P. Tran-Gia, “A survey on quality of experience of http adaptive streaming,”IEEE Communications Surveys & Tutorials, vol. 17, no. 1, pp. 469–492, 2014

work page 2014

[8] [8]

Optimizing 360 video delivery over cellular networks,

F. Qian, L. Ji, B. Han, and V . Gopalakrishnan, “Optimizing 360 video delivery over cellular networks,” inProceedings of the 5th Workshop on All Things Cellular: Operations, Applications and Challenges, 2016, pp. 1–6

work page 2016

[9] [9]

Viewport- adaptive navigable 360-degree video delivery,

X. Corbillon, G. Simon, A. Devlic, and J. Chakareski, “Viewport- adaptive navigable 360-degree video delivery,” in2017 IEEE interna- tional conference on communications (ICC). IEEE, 2017, pp. 1–7

work page 2017

[10] [10]

An optimal tile-based approach for viewport-adaptive 360-degree video streaming,

D. V . Nguyen, H. T. Tran, A. T. Pham, and T. C. Thang, “An optimal tile-based approach for viewport-adaptive 360-degree video streaming,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 1, pp. 29–42, 2019

work page 2019

[11] [11]

Adaptive 360-degree streaming: Op- timizing with multi-window and stochastic viewport prediction,

W. Feng, S. Wang, and Y . Dai, “Adaptive 360-degree streaming: Op- timizing with multi-window and stochastic viewport prediction,”IEEE Transactions on Mobile Computing, 2025

work page 2025

[12] [12]

Fixation prediction for 360 video streaming in head-mounted virtual reality,

C.-L. Fan, J. Lee, W.-C. Lo, C.-Y . Huang, K.-T. Chen, and C.-H. Hsu, “Fixation prediction for 360 video streaming in head-mounted virtual reality,” inProceedings of the 27th workshop on network and operating systems support for digital audio and video, 2017, pp. 67–72

work page 2017

[13] [13]

ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation,

G. Jocher, A. Chaurasia, A. Stoken, J. Borovec, Y . Kwon, K. Michael, J. Fang, Z. Yifu, C. Wong, D. Monteset al., “ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation,”Zenodo, 2022

work page 2022

[14] [14]

A buffer-based approach to rate adaptation: Evidence from a large video streaming service,

T.-Y . Huang, R. Johari, N. McKeown, M. Trunnell, and M. Watson, “A buffer-based approach to rate adaptation: Evidence from a large video streaming service,” inProceedings of the 2014 ACM conference on SIGCOMM, 2014, pp. 187–198

work page 2014

[15] [15]

Bola: Near-optimal bitrate adaptation for online videos,

K. Spiteri, R. Urgaonkar, and R. K. Sitaraman, “Bola: Near-optimal bitrate adaptation for online videos,”IEEE/ACM transactions on net- working, vol. 28, no. 4, pp. 1698–1711, 2020

work page 2020

[16] L. Xie, Z. Xu, Y. Ban, X. Zhang, and Z. Guo, “360probdash: Improving qoe of 360 video streaming using tile-based http adaptive streaming,” in Proceedings of the 25th ACM international conference on Multimedia, 2017, pp. 315–323

[17] S. Petrangeli, V. Swaminathan, M. Hosseini, and F. De Turck, “An http/2-based adaptive streaming framework for 360 virtual reality videos,” in Proceedings of the 25th ACM international conference on Multimedia, 2017, pp. 306–314

[18] X. Yin, A. Jindal, V. Sekar, and B. Sinopoli, “A control-theoretic approach for dynamic adaptive video streaming over http,” in Proceedings of the 2015 ACM conference on special interest group on data communication, 2015, pp. 325–338

[19] X. Yin, V. Sekar, and B. Sinopoli, “Toward a principled framework to design dynamic adaptive streaming algorithms over http,” in Proceedings of the 13th ACM Workshop on Hot Topics in Networks, 2014, pp. 1–7

[20] H. Mao, R. Netravali, and M. Alizadeh, “Neural adaptive video streaming with pensieve,” in Proceedings of the conference of the ACM special interest group on data communication, 2017, pp. 197–210

[21] F. Y. Yan, H. Ayers, C. Zhu, S. Fouladi, J. Hong, K. Zhang, P. Levis, and K. Winstein, “Learning in situ: a randomized experiment in video streaming,” in 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), 2020, pp. 495–511

[22] R. K. Mok, X. Luo, E. W. Chan, and R. K. Chang, “Qdash: a qoe-aware dash system,” in Proceedings of the 3rd multimedia systems conference, 2012, pp. 11–22

[23] F. Qian, B. Han, Q. Xiao, and V. Gopalakrishnan, “Flare: Practical viewport-adaptive 360-degree video streaming for mobile devices,” in Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, 2018, pp. 99–114

[24] Y. Guan, C. Zheng, X. Zhang, Z. Guo, and J. Jiang, “Pano: Optimizing 360 video streaming with a better understanding of quality perception,” in Proceedings of the ACM Special Interest Group on Data Communication, 2019, pp. 394–407

[25] Y. Zhu, G. Zhai, and X. Min, “The prediction of head and eye movement for 360 degree images,” Signal Processing: Image Communication, vol. 69, pp. 15–25, 2018

[26] C. Kattadige and K. Thilakarathna, “Vad360: Viewport aware dynamic 360-degree video frame tiling,” arXiv preprint arXiv:2105.11563, 2021

[27] J. Park, M. Wu, K.-Y. Lee, B. Chen, K. Nahrstedt, M. Zink, and R. Sitaraman, “Seaware: Semantic aware view prediction system for 360-degree video streaming,” in 2020 IEEE International Symposium on Multimedia (ISM). IEEE, 2020, pp. 57–64

[28] Y. Tian, Y. Zhong, Y. Han, and F. Chen, “Viewport prediction with cross modal multiscale transformer for 360° video streaming,” Scientific Reports, vol. 15, no. 1, p. 30346, 2025

[29] C. Ozcinar, A. De Abreu, and A. Smolic, “Viewport-aware adaptive 360 video streaming using tiles for virtual reality,” in 2017 IEEE International Conference on Image Processing (ICIP). IEEE, 2017, pp. 2174–2178

[30] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788

[31] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (hevc) standard,” IEEE Transactions on circuits and systems for video technology, vol. 22, no. 12, pp. 1649–1668, 2012

[32] S. Potier, O. Duriez, G. B. Cunningham, V. Bonhomme, C. O’Rourke, E. Fernández-Juricic, and F. Bonadonna, “Visual field shape and foraging ecology in diurnal raptors,” Journal of Experimental Biology, vol. 221, no. 14, p. jeb177295, 2018

[33] R. W. Sinnott, “Virtues of the haversine,” Sky and telescope, vol. 68, no. 2, p. 158, 1984

[34] D. C. Montgomery and G. C. Runger, Applied statistics and probability for engineers. John Wiley & Sons, 2010

[35] K. H. Ang, G. Chong, and Y. Li, “Pid control system analysis, design, and technology,” IEEE transactions on control systems technology, vol. 13, no. 4, pp. 559–576, 2005

[36] J. G. Ziegler and N. B. Nichols, “Optimum settings for automatic controllers,” Transactions of the American society of mechanical engineers, vol. 64, no. 8, pp. 759–765, 1942

[37] H. K. Khalil and J. W. Grizzle, Nonlinear systems. Prentice Hall, Upper Saddle River, NJ, 2002, vol. 3

[38] Y. Xu, Y. Dong, J. Wu, Z. Sun, Z. Shi, J. Yu, and S. Gao, “Gaze prediction in dynamic 360 immersive videos,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 5333–5342

[39] C. Wu, Z. Tan, Z. Wang, and S. Yang, “A dataset for exploring user behaviors in vr spherical video streaming,” in Proceedings of the 8th ACM on Multimedia Systems Conference, 2017, pp. 193–198