
arxiv: 1906.08575 · v1 · pith:4LECAE5Unew · submitted 2019-06-20 · 💻 cs.MM · eess.IV

Probabilistic Tile Visibility-Based Server-Side Rate Adaptation for Adaptive 360-Degree Video Streaming

Pith reviewed 2026-05-25 19:17 UTC · model grok-4.3

classification 💻 cs.MM eess.IV
keywords 360-degree video streaming · tile-based adaptation · rate adaptation · viewpoint prediction · visibility probability · server-side optimization · CNN prediction

The pith

A server-side optimization using CNN viewpoint predictions and Laplace-modeled tile visibility probabilities minimizes distortion for multiple users in 360-degree video streaming.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method for server-side rate adaptation in tile-based adaptive 360-degree video streaming when multiple users share limited transmission resources. It trains a CNN to predict future viewpoints, fits a Laplace distribution to the prediction errors, projects the viewport to a 2-D plane, and computes per-tile visibility probabilities to label tiles as viewport, marginal, or invisible. These probabilities feed into a nonlinear discrete optimization that minimizes total video distortion across users plus intra-user quality gaps between viewport and marginal tiles, subject to bandwidth limits and viewport needs. A steepest-descent procedure, started from the solution of the continuous relaxation, produces a near-optimal allocation that the experiments show beats prior rate-adaptation schemes.
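The visibility step described above can be illustrated with a minimal sketch. It is not the paper's derivation: it assumes independent zero-mean Laplace errors on yaw and pitch, approximates the viewport as an axis-aligned rectangle in angle space instead of using the paper's planar projection, and tests only the tile centre. The function names, the scale values, and the two classification thresholds (0.5 and 0.05) are hypothetical.

```python
import math

def laplace_cdf(x, mu=0.0, b=1.0):
    """CDF of a Laplace distribution with location mu and scale b."""
    z = (x - mu) / b
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)

def interval_prob(lo, hi, b):
    """P(lo <= e <= hi) for a zero-mean Laplace error e with scale b."""
    return laplace_cdf(hi, 0.0, b) - laplace_cdf(lo, 0.0, b)

def tile_visibility(tile_yaw, tile_pitch, pred_yaw, pred_pitch,
                    fov_h, fov_v, b_yaw, b_pitch):
    """Probability that the tile centre falls inside the true viewport.

    Assumption: true viewpoint = predicted viewpoint + error, and the tile
    centre is visible when it lies within half the FoV of the true viewpoint
    on both axes. All angles are in degrees.
    """
    d_yaw = (tile_yaw - pred_yaw + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    d_pitch = tile_pitch - pred_pitch
    p_yaw = interval_prob(d_yaw - fov_h / 2, d_yaw + fov_h / 2, b_yaw)
    p_pitch = interval_prob(d_pitch - fov_v / 2, d_pitch + fov_v / 2, b_pitch)
    return p_yaw * p_pitch  # independence of the two axes is an assumption

def classify(p, hi=0.5, lo=0.05):
    """Map a visibility probability to a tile class (thresholds are hypothetical)."""
    if p >= hi:
        return "viewport"
    if p >= lo:
        return "marginal"
    return "invisible"

# Predicted viewpoint at (0, 0), 100 x 90 degree FoV, error scale 10 degrees.
for yaw in (0.0, 55.0, 180.0):
    p = tile_visibility(yaw, 0.0, 0.0, 0.0, 100.0, 90.0, 10.0, 10.0)
    print(yaw, round(p, 4), classify(p))
```

A tile just outside the predicted viewport edge still receives a non-negligible probability under this model, which is what motivates a separate marginal class.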

Core claim

By mapping CNN-predicted viewpoints through planar projection and Laplace error probabilities to obtain tile visibility values, classifying tiles, and solving the resulting multi-user nonlinear discrete optimization with a steepest-descent method initialized at the continuous-relaxation optimum, the algorithm reaches a near-optimal point that reduces overall received distortion and viewport-to-marginal quality differences while respecting transmission capacities.

What carries the argument

The steepest-descent solver for the nonlinear discrete optimization of tile rates, initialized from the continuous relaxation and driven by per-tile visibility probabilities derived from the planar projection and Laplace prediction-error model.
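To make the solver idea concrete, here is a simplified stand-in, not the paper's algorithm or objective. It assumes a single user, a hypothetical convex distortion model `w / r` (visibility weight over rate), a shared rate ladder `LADDER`, and strictly positive weights. The continuous relaxation without box constraints then has the closed form `r_i` proportional to `sqrt(w_i)`; the sketch rounds that point down onto the ladder, repairs any capacity overshoot, and then repeatedly takes the single upgrade with the largest distortion drop per added Mbps.

```python
import math

LADDER = [0.5, 1.0, 2.0, 4.0]  # hypothetical per-tile representation rates (Mbps)

def distortion(w, r):
    """Assumed convex model: visibility-weighted distortion w / r."""
    return w / r

def used(idx):
    """Total rate consumed by an allocation of ladder indices."""
    return sum(LADDER[k] for k in idx)

def relaxed_init(weights, capacity):
    """Round the relaxed optimum (r_i proportional to sqrt(w_i)) down onto the ladder."""
    s = sum(math.sqrt(w) for w in weights)  # requires positive weights
    idx = []
    for w in weights:
        r = capacity * math.sqrt(w) / s
        # Highest ladder level not exceeding r; fall back to the lowest level.
        idx.append(max((j for j, lvl in enumerate(LADDER) if lvl <= r), default=0))
    return idx

def repair(weights, idx, capacity):
    """Downgrade the cheapest tiles until the allocation fits the capacity."""
    idx = list(idx)
    while used(idx) > capacity:
        cands = [i for i, k in enumerate(idx) if k > 0]
        if not cands:
            raise ValueError("capacity is below the minimum-rate allocation")
        def cost(i):
            k = idx[i]
            rise = distortion(weights[i], LADDER[k - 1]) - distortion(weights[i], LADDER[k])
            return rise / (LADDER[k] - LADDER[k - 1])
        idx[min(cands, key=cost)] -= 1
    return idx

def steepest_descent(weights, capacity):
    """Greedy discrete descent started from the repaired relaxed solution."""
    idx = repair(weights, relaxed_init(weights, capacity), capacity)
    while True:
        spare = capacity - used(idx)
        best, best_gain = None, 0.0
        for i, k in enumerate(idx):
            if k + 1 >= len(LADDER):
                continue
            extra = LADDER[k + 1] - LADDER[k]
            if extra > spare + 1e-12:
                continue  # upgrade would violate the capacity constraint
            gain = (distortion(weights[i], LADDER[k])
                    - distortion(weights[i], LADDER[k + 1])) / extra
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            return idx  # no feasible neighbour lowers the objective
        idx[best] += 1

print(steepest_descent([0.9, 0.9, 0.3, 0.05], capacity=6.0))
```

Starting from the relaxed solution means the descent usually needs only a few upgrade steps, which is the practical appeal of this kind of initialization; the sketch omits the multi-user coupling and the viewport-to-marginal quality-gap term.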

If this is right

  • The algorithm achieves a near-optimal solution to the multi-user tile rate allocation problem.
  • It reduces overall received video distortion across all users.
  • It decreases the quality difference between viewport and marginal tiles for each user.
  • It respects transmission capacity constraints and individual viewport requirements.
  • It outperforms existing rate adaptation schemes for tile-based adaptive 360-video streaming.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same visibility-probability pipeline could be tested on client-side prefetching decisions when the server is not the sole allocator.
  • Replacing the Laplace model with empirical error histograms from other prediction networks would test how sensitive the near-optimality result is to the distributional assumption.
  • The classification into viewport, marginal, and invisible tiles offers a natural way to prioritize tiles in other viewport-aware streaming formats such as volumetric video.

Load-bearing premise

The Laplace distribution accurately characterizes the probability distribution of the CNN viewpoint prediction error, allowing reliable derivation of per-tile visibility probabilities from the planar projection.
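For reference, the standard Laplace facts that this premise relies on are written out below. The symbols are assumptions introduced here, not the paper's notation: $e$ is the prediction error on one angle, $\mu$ the location, $b$ the scale, and $[a_1, a_2]$ an error interval for which a tile remains visible.

```latex
f(e) = \frac{1}{2b}\exp\!\left(-\frac{|e-\mu|}{b}\right),
\qquad
F(e) =
\begin{cases}
\tfrac{1}{2}\exp\!\left(\dfrac{e-\mu}{b}\right), & e < \mu,\\[6pt]
1-\tfrac{1}{2}\exp\!\left(-\dfrac{e-\mu}{b}\right), & e \ge \mu,
\end{cases}
\qquad
\Pr(a_1 \le e \le a_2) = F(a_2) - F(a_1).

% Maximum-likelihood estimates from N observed errors e_1, ..., e_N:
\hat{\mu} = \operatorname{median}(e_1,\dots,e_N),
\qquad
\hat{b} = \frac{1}{N}\sum_{i=1}^{N} |e_i - \hat{\mu}|.
```

Because every per-tile probability is a difference of two values of $F$, an error in the fitted scale $\hat{b}$ shifts all tile probabilities at once, which is why the premise is load-bearing.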

What would settle it

Collect real user viewport traces from 360-video sessions, feed the actual error distribution into the visibility calculation, and check whether the algorithm still produces lower total distortion than existing schemes or deviates markedly from the continuous-relaxation bound.

Figures

Figures reproduced from arXiv: 1906.08575 by Chenglin Li, Chengming Liu, Eckehard Steinbach, Hongkai Xiong, Junni Zou, Qin Yang.

Figure 1: (a) Head rotation angles; and (b) the position of viewpoint is represented by latitude …
Figure 2: CNN-based viewing angle prediction model.
Figure 3: Viewport and viewport tile region in the 2-D projection plane.
Figure 4: (a) The FoVs of the HMD are α horizontally and β vertically; (b) mapping the viewport plane ABCD onto the surface EFGH on the sphere. The image plane of the HMD can be considered as a 2-D plane with the region of the plane constrained by the FoV of the HMD.
Figure 5: Calculation of the boundary of the user’s viewport.
Figure 6: Three-view drawing of the viewport, where red lines represent the viewport boundary, yellow lines represent the …
Figure 7: Prediction error distribution of pitch and yaw angles.
Figure 8: Tile classification of a 2-D planar frame.
Figure 9: An example of the lower convex hull of $Q_j^{R_u}(B)$ with the j-th element fixed to different values of $R_u$. … we have $Q_j^{R_7}(B) < Q_j^{R_6}(B) < \cdots < Q_j^{R_1}(B)$ for $B(R) > 1.6$ Mbps, which indicates that the cross-over condition holds. Also, the cross-over ordering condition is verified since $B_c(R_1, R_2, j) < B_c(R_2, R_3, j) < \cdots < B_c(R_6, R_7, j)$. The reachability condition may not be guaranteed as there are too …
Figure 10: (a) Tiling in a frame of Driving in Country, where tiles inside the red rectangle represent predicted viewport tiles for a user and tiles inside the blue rectangles represent marginal tiles. Illustration of the achievable subjective quality for two tiles with (b) baseline algorithm, (c) greedy algorithm, and (d) proposed algorithm.
Figure 11: Representation rate indices of tiles received by two users when using the proposed algorithm, where …
Figure 12: Representation rate indices of tiles received by two users when using the proposed algorithm, where …
Figure 13: Average WS-PSNR of all users vs. server’s transmission capacity for (a) …
Figure 14: Average instability index of all users vs. server’s transmission capacity for (a) …
Figure 15: Comparison with the steepest descent solution without initialization, and the globally optimal solution achieved by …
Original abstract

In this paper, we study the server-side rate adaptation problem for streaming tile-based adaptive 360-degree videos to multiple users who are competing for transmission resources at the network bottleneck. Specifically, we develop a convolutional neural network (CNN)-based viewpoint prediction model to capture the nonlinear relationship between the future and historical viewpoints. A Laplace distribution model is utilized to characterize the probability distribution of the prediction error. Given the predicted viewpoint, we then map the viewport in the spherical space into its corresponding planar projection in the 2-D plane, and further derive the visibility probability of each tile based on the planar projection and the prediction error probability. According to the visibility probability, tiles are classified as viewport, marginal and invisible tiles. The server-side tile rate allocation problem for multiple users is then formulated as a non-linear discrete optimization problem to minimize the overall received video distortion of all users and the quality difference between the viewport and marginal tiles of each user, subject to the transmission capacity constraints and users' specific viewport requirements. We develop a steepest descent algorithm to solve this non-linear discrete optimization problem, by initializing the feasible starting point in accordance with the optimal solution of its continuous relaxation. Extensive experimental results show that the proposed algorithm can achieve a near-optimal solution, and outperforms the existing rate adaptation schemes for tile-based adaptive 360-video streaming.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper studies server-side rate adaptation for multi-user tile-based 360-degree video streaming. It introduces a CNN viewpoint predictor, models prediction error via a Laplace distribution, derives per-tile visibility probabilities via planar projection of the spherical viewport, classifies tiles as viewport/marginal/invisible, and formulates a non-linear discrete optimization that minimizes aggregate distortion plus per-user viewport-marginal quality gaps subject to capacity and viewport constraints. The problem is solved by steepest descent initialized from the continuous relaxation; the abstract claims the method reaches a near-optimal solution and outperforms prior rate-adaptation schemes.

Significance. If the visibility probabilities are verifiably accurate and the optimization produces the claimed gains, the work supplies a concrete mechanism for uncertainty-aware tile allocation that could improve resource efficiency in contended 360-video sessions. The initialization technique is standard and non-circular.

major comments (1)
  1. [Abstract paragraph 3 / prediction-error model] Abstract paragraph 3 and the subsequent model section: the per-tile visibility probabilities (and therefore the entire optimization objective) rest on the assumption that CNN prediction error follows a Laplace distribution. No fitting procedure, Kolmogorov-Smirnov or likelihood-ratio test against alternatives, or empirical histogram comparison is supplied. Because mis-calibration of these probabilities directly alters which tiles are labeled viewport/marginal/invisible and changes the objective that the steepest-descent solver minimizes, the reported outperformance cannot be attributed to the proposed algorithm until this modeling choice is validated on the same datasets used for the rate-allocation experiments.
minor comments (1)
  1. [Abstract] The abstract asserts 'near-optimal solution' and 'outperforms the existing rate adaptation schemes' yet supplies no numerical values, baselines, or confidence intervals. A one-sentence quantitative summary would allow immediate assessment of the strength of the claims.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract paragraph 3 / prediction-error model] Abstract paragraph 3 and the subsequent model section: the per-tile visibility probabilities (and therefore the entire optimization objective) rest on the assumption that CNN prediction error follows a Laplace distribution. No fitting procedure, Kolmogorov-Smirnov or likelihood-ratio test against alternatives, or empirical histogram comparison is supplied. Because mis-calibration of these probabilities directly alters which tiles are labeled viewport/marginal/invisible and changes the objective that the steepest-descent solver minimizes, the reported outperformance cannot be attributed to the proposed algorithm until this modeling choice is validated on the same datasets used for the rate-allocation experiments.

    Authors: We agree that the manuscript as submitted lacks explicit statistical validation of the Laplace assumption on the prediction errors. The Laplace model was selected because its heavier tails better accommodate occasional large viewpoint prediction deviations than a Gaussian; however, this rationale alone does not substitute for empirical verification. In the revised manuscript we will add a dedicated subsection that (i) plots normalized histograms of the observed prediction errors on the exact datasets used for the rate-allocation experiments, (ii) reports Kolmogorov-Smirnov goodness-of-fit statistics for the Laplace distribution together with comparisons against Gaussian and Student-t alternatives, and (iii) includes likelihood-ratio tests. These additions will directly address the concern that mis-calibration could affect tile classification and the optimization objective, thereby allowing the performance gains to be more confidently attributed to the proposed framework. revision: yes
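The kind of check promised in the response can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' procedure: the error sample is synthetic (drawn from a Laplace distribution with a hypothetical scale of 8 degrees, so the outcome is known in advance), only Laplace and Gaussian are compared, and the Kolmogorov-Smirnov statistic is reported without a p-value because standard critical values do not apply when the parameters are fitted on the same sample.

```python
import math
import random
import statistics

def fit_laplace(xs):
    """Maximum-likelihood Laplace fit: median and mean absolute deviation."""
    mu = statistics.median(xs)
    return mu, sum(abs(x - mu) for x in xs) / len(xs)

def fit_gauss(xs):
    """Maximum-likelihood Gaussian fit: mean and (biased) standard deviation."""
    mu = statistics.fmean(xs)
    return mu, math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

def laplace_cdf(x, mu, b):
    z = (x - mu) / b
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)

def gauss_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def loglik_laplace(xs, mu, b):
    return -len(xs) * math.log(2.0 * b) - sum(abs(x - mu) for x in xs) / b

def loglik_gauss(xs, mu, sigma):
    n = len(xs)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - sum((x - mu) ** 2 for x in xs) / (2.0 * sigma ** 2))

def ks_statistic(xs, cdf):
    """Largest gap between the empirical CDF and a candidate CDF."""
    xs = sorted(xs)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))

# Synthetic stand-in for measured prediction errors (degrees): the difference
# of two exponentials with mean b is Laplace(0, b). Real traces would go here.
rng = random.Random(0)
b_true = 8.0
errors = [rng.expovariate(1.0 / b_true) - rng.expovariate(1.0 / b_true)
          for _ in range(5000)]

mu_l, b_l = fit_laplace(errors)
mu_g, s_g = fit_gauss(errors)
ll_lap = loglik_laplace(errors, mu_l, b_l)
ll_gau = loglik_gauss(errors, mu_g, s_g)
ks_lap = ks_statistic(errors, lambda x: laplace_cdf(x, mu_l, b_l))
ks_gau = ks_statistic(errors, lambda x: gauss_cdf(x, mu_g, s_g))
print("log-likelihood  Laplace:", ll_lap, " Gaussian:", ll_gau)
print("KS statistic    Laplace:", ks_lap, " Gaussian:", ks_gau)
```

On real viewport traces the same comparison would show whether the Laplace model actually fits better than the alternatives; a Student-t comparison, as the response proposes, would need a numerical fit that is omitted here.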

Circularity Check

0 steps flagged

No circularity: derivation uses fitted models and standard optimization without reduction to inputs by construction

full rationale

The paper fits a CNN viewpoint predictor and adopts a Laplace model for prediction error (abstract), derives per-tile visibility probabilities from the planar projection, classifies tiles, and solves the resulting non-linear discrete optimization via steepest descent initialized from the continuous relaxation. None of these steps are self-definitional or reduce the objective to the fitted values by construction; the optimization objective (minimize distortion and quality difference subject to capacity) is independent of the specific Laplace or CNN parameters. No load-bearing self-citations appear in the derivation chain, and the initialization technique is a standard non-circular method. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on a trained CNN whose weights are free parameters, an assumed Laplace error distribution whose scale is fitted, and the unproven claim that the resulting visibility probabilities correctly classify tiles for rate allocation.

free parameters (2)
  • CNN weights
    Trained on historical viewpoint data to capture nonlinear relationship; values not stated in abstract.
  • Laplace scale parameter
    Fitted to characterize prediction-error distribution; used directly to compute visibility probabilities.
axioms (2)
  • domain assumption: The CNN captures the nonlinear relationship between future and historical viewpoints
    Invoked in abstract paragraph 2 as the foundation for the prediction model.
  • domain assumption: The planar projection of the viewport combined with the Laplace error model yields correct per-tile visibility probabilities
    Stated without proof in abstract paragraph 3.

pith-pipeline@v0.9.0 · 5783 in / 1502 out tokens · 25680 ms · 2026-05-25T19:17:20.117056+00:00 · methodology


