Lightweight Real-Time Rendering Parameter Optimization via XGBoost-Driven Lookup Tables
Pith reviewed 2026-05-07 17:15 UTC · model grok-4.3
The pith
LUT-Opt turns offline XGBoost predictions into fast lookup tables for per-frame rendering parameter choices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a two-stage process, training XGBoost regressors on rendering time and image quality and then distilling them into discretized lookup tables via constrained linear search, enables adaptive per-frame parameter selection with sub-millisecond latency. Across the tested scenes and GPUs this delivers roughly 40 percent faster subsurface scattering and 70 percent faster ambient occlusion at the cost of only about 2 percent additional image-quality error.
What carries the argument
The LUT-Opt pipeline carries it: XGBoost regressors are trained to predict rendering time and SSIM-based quality from parameters, hardware state, and scene descriptors, and the models are then distilled into queryable lookup tables through systematic discretization and a two-phase search that first enforces the time bound and then maximizes quality.
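To make the distillation step concrete, here is a minimal sketch under stated assumptions: time_model and quality_model are pre-trained regressors with a scikit-learn-style predict (e.g. xgboost.XGBRegressor), param_grid and context_grid are lists of discretized value axes, and time_budget_ms is the per-frame bound. All names are illustrative; the paper's exact feature layout and search details are not specified here.

```python
import itertools
import numpy as np

def build_lut(time_model, quality_model, param_grid, context_grid, time_budget_ms):
    """Distill two trained regressors into a lookup table.

    For each discretized (hardware-state, scene-descriptor) context, run a
    two-phase linear search: phase 1 keeps parameter combinations whose
    predicted rendering time fits the budget; phase 2 picks the survivor
    with the highest predicted SSIM.
    """
    lut = {}
    param_combos = list(itertools.product(*param_grid))
    for context in itertools.product(*context_grid):
        # One feature row per candidate: rendering parameters + frozen context.
        rows = np.array([list(p) + list(context) for p in param_combos])
        pred_time = time_model.predict(rows)
        pred_ssim = quality_model.predict(rows)
        feasible = pred_time <= time_budget_ms            # phase 1: time constraint
        if not feasible.any():
            feasible = pred_time == pred_time.min()       # no fit: fall back to fastest
        best = int(np.argmax(np.where(feasible, pred_ssim, -np.inf)))  # phase 2: max quality
        lut[context] = param_combos[best]
    return lut
```

The search cost is paid entirely offline, which is what makes the runtime side a pure table lookup.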
If this is right
- Real-time engines can adjust parameters every frame without per-scene pre-computation lasting days.
- Mobile and laptop hardware can sustain higher frame rates for effects like subsurface scattering and ambient occlusion while keeping visual error small.
- The same offline training plus table lookup approach can be applied to other rendering effects that depend on tunable parameters.
- Per-frame adaptation becomes practical because query cost stays below 0.1 milliseconds (see the sketch after this list).
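The sub-0.1 ms figure is plausible because the runtime path is just feature quantization plus one table lookup. A minimal sketch, assuming grid_edges holds one sorted NumPy array of discretization values per context feature and lut is the table built offline (both names hypothetical):

```python
import numpy as np

def query_lut(lut, grid_edges, live_features):
    """Per-frame query: snap each live feature (e.g. GPU load, triangle count)
    to its nearest discretized value, then do a single dictionary lookup."""
    key = tuple(
        edges[int(np.abs(edges - value).argmin())]   # nearest bin per feature
        for edges, value in zip(grid_edges, live_features)
    )
    return lut[key]

# Called once per frame, e.g.: params = query_lut(lut, grid_edges, [gpu_util, tri_count])
```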
Where Pith is reading between the lines
- The discretization step may limit precision in high-dimensional parameter spaces, suggesting future work on adaptive table sizing.
- If the models prove robust, similar table-distillation techniques could speed optimization loops in other graphics or simulation domains.
- Testing on a broader range of hardware would reveal whether the current training set already covers enough variation for practical deployment.
Load-bearing premise
Models trained on particular scenes and hardware will still give accurate enough predictions after discretization so that the resulting tables pick near-optimal parameters on new scenes and different devices.
What would settle it
Measure actual rendering time and SSIM on a new scene or unseen GPU configuration using the table-selected parameters versus the true optimal parameters found by exhaustive search; a large gap in either time savings or quality would falsify the generalization claim.
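A sketch of that falsification test, assuming a hypothetical render(params) helper that executes the effect on the held-out scene or GPU and returns measured (time in ms, SSIM); exhaustive search plays the oracle:

```python
import itertools

def generalization_gap(lut_params, param_grid, render, time_budget_ms):
    """Compare LUT-selected parameters against the exhaustive-search optimum
    on a held-out scene or GPU configuration. Assumes at least one parameter
    combination satisfies the time budget."""
    lut_time, lut_ssim = render(lut_params)
    best_time, best_ssim = None, -1.0
    for params in itertools.product(*param_grid):     # oracle: brute-force sweep
        t, s = render(params)
        if t <= time_budget_ms and s > best_ssim:
            best_time, best_ssim = t, s
    # Large gaps in either quantity would falsify the generalization claim.
    return {"time_gap_ms": lut_time - best_time, "ssim_gap": best_ssim - lut_ssim}
```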
Original abstract
Achieving a desirable balance between rendering quality and real-time performance is a long-standing challenge in modern game and rendering engines, particularly on resource-constrained mobile devices such as laptops, tablets, and smartphones. Existing approaches to automatic rendering parameter optimization either depend on exhaustive per-scene pre-computation that spans several days, suffer from the prohibitive inference overhead of neural networks that prevents per-frame adaptation, or lack generalizability across heterogeneous hardware and diverse scenes. In this paper, we propose LUT-Opt, a lightweight, general-purpose framework for adaptive per-frame rendering parameter optimization. Our method decomposes the joint optimization of rendering time and image quality into a tractable two-stage pipeline. In the offline stage, we train a pair of XGBoost regressors to predict rendering time and image quality from rendering parameters, hardware state, and scene complexity descriptors. The trained ensemble models are then distilled into compact lookup tables (LUTs) through systematic discretization and a two-phase linear search that first constrains rendering time and subsequently maximizes structural similarity (SSIM). During runtime, the pre-computed LUT is queried every frame in sub-millisecond time, enabling truly adaptive parameter selection with negligible computational overhead. We validate LUT-Opt on two representative rendering techniques, subsurface scattering (SSS) and hybrid-pipeline ambient occlusion (AO), implemented within Unreal Engine 5. Extensive experiments across multiple scenes and GPU configurations demonstrate that LUT-Opt reduces subsurface scattering rendering time by approximately 40% and ambient occlusion rendering time by roughly 70%, while incurring only about 2% increase in image quality error, with per-frame inference latency below 0.1 ms.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces LUT-Opt, a two-stage framework for adaptive rendering parameter optimization. Offline, XGBoost regressors are trained to predict rendering time and image quality (SSIM) from parameters, hardware state, and scene descriptors; these models are then distilled into compact lookup tables via discretization and a two-phase linear search that first enforces time constraints and then maximizes quality. At runtime, the LUT is queried per frame in <0.1 ms. Experiments in Unreal Engine 5 on subsurface scattering (SSS) and ambient occlusion (AO) across multiple scenes and GPUs report ~40% and ~70% time reductions respectively with only ~2% quality error increase.
Significance. If the generalization claims hold, the method supplies a practical, low-overhead alternative to exhaustive per-scene precomputation or neural-network inference for real-time parameter tuning on heterogeneous devices, directly addressing mobile and laptop rendering constraints.
Major comments (2)
- [Abstract and Experiments] The central performance claims (abstract: 40% SSS and 70% AO time reductions with ~2% SSIM error) depend on the XGBoost regressors producing accurate enough predictions that the resulting LUT entries remain near-optimal for scenes and hardware outside the training distribution. No quantitative evidence is supplied on training-set size, validation procedure (e.g., leave-one-scene-out or hardware-extrapolation curves), hyperparameter selection, or measured prediction error on held-out data, leaving the reported speedups and quality bounds unsupported.
- [Method] The two-phase linear search used to populate the LUTs (described after the XGBoost training step) assumes that the regressors' time and quality predictions remain reliable after discretization. Without reported sensitivity analysis to LUT granularity or to the accuracy of the scene-complexity descriptors, it is unclear whether modest prediction errors would cause the search to select systematically suboptimal or invalid parameter combinations.
Minor comments (2)
- [Abstract] The abstract states results 'across multiple scenes and GPU configurations' but supplies no concrete counts, scene characteristics, or GPU models, making it difficult to assess the breadth of the evaluation.
- [Method] Notation for the scene descriptors and hardware state features is introduced without an explicit table or equation listing their definitions, which would aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of model validation and robustness that we have addressed through targeted revisions to strengthen the manuscript.
Point-by-point responses
Referee: [Abstract and Experiments] The central performance claims (abstract: 40% SSS and 70% AO time reductions with ~2% SSIM error) depend on the XGBoost regressors producing accurate enough predictions that the resulting LUT entries remain near-optimal for scenes and hardware outside the training distribution. No quantitative evidence is supplied on training-set size, validation procedure (e.g., leave-one-scene-out or hardware-extrapolation curves), hyperparameter selection, or measured prediction error on held-out data, leaving the reported speedups and quality bounds unsupported.
Authors: We agree that explicit quantitative details on training and validation are required to substantiate the generalization of the performance claims. In the revised manuscript we have expanded the Experiments section with a new subsection that reports the training-set composition (number of scenes, GPU configurations, and total samples collected), the validation strategy (including leave-one-scene-out cross-validation and hardware-extrapolation tests), the hyperparameter selection procedure, and measured prediction errors on held-out data. These additions directly support the reported speedups and quality bounds. revision: yes
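The validation protocol named in this response is standard; a minimal leave-one-scene-out sketch, assuming the collected samples sit in a pandas DataFrame with a scene column, feature columns, and a measured target (all column names hypothetical):

```python
import numpy as np
from xgboost import XGBRegressor

def leave_one_scene_out(df, feature_cols, target_col="render_time_ms"):
    """Leave-one-scene-out cross-validation: train on all scenes but one,
    report mean absolute prediction error on the held-out scene."""
    errors = {}
    for scene in df["scene"].unique():
        train, test = df[df["scene"] != scene], df[df["scene"] == scene]
        model = XGBRegressor(n_estimators=300, max_depth=6)
        model.fit(train[feature_cols], train[target_col])
        pred = model.predict(test[feature_cols])
        errors[scene] = float(np.mean(np.abs(pred - test[target_col])))
    return errors
```

The same loop with GPU configuration in place of scene gives the hardware-extrapolation curve the referee asks for.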
Referee: [Method] The two-phase linear search used to populate the LUTs (described after the XGBoost training step) assumes that the regressors' time and quality predictions remain reliable after discretization. Without reported sensitivity analysis to LUT granularity or to the accuracy of the scene-complexity descriptors, it is unclear whether modest prediction errors would cause the search to select systematically suboptimal or invalid parameter combinations.
Authors: We concur that a sensitivity analysis is necessary to confirm the stability of the LUT construction. The revised manuscript now includes an additional analysis subsection that examines the effects of varying LUT discretization granularity and introduces controlled perturbations to scene-complexity descriptors. We report the resulting deviations in selected parameters, achieved rendering time, and SSIM, showing that the two-phase search remains robust within the observed prediction-error ranges of the regressors. revision: yes
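To picture the granularity study this response describes, one could rebuild the table at several resolutions and watch whether the parameters chosen at fixed probe contexts drift. A sketch reusing the hypothetical build_lut from the analysis above, with make_grids an assumed factory returning (param_grid, context_grid) at a given resolution:

```python
import numpy as np

def nearest_entry(lut, context):
    """Return the LUT parameters at the grid cell closest to `context`."""
    keys = list(lut)
    dists = [np.linalg.norm(np.subtract(k, context)) for k in keys]
    return lut[keys[int(np.argmin(dists))]]

def granularity_sweep(time_model, quality_model, make_grids, resolutions,
                      time_budget_ms, probe_contexts):
    """Rebuild the LUT at each discretization resolution and record the
    parameters selected at fixed probe contexts; disagreement across
    resolutions flags sensitivity to LUT granularity."""
    sweep = {}
    for res in resolutions:
        param_grid, context_grid = make_grids(res)    # hypothetical grid factory
        lut = build_lut(time_model, quality_model, param_grid,
                        context_grid, time_budget_ms)
        sweep[res] = [nearest_entry(lut, c) for c in probe_contexts]
    return sweep
```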
Circularity Check
No circularity: fully empirical offline training and LUT construction
Full rationale
The derivation consists of (1) collecting offline data on rendering time/quality for parameter sweeps, (2) fitting two XGBoost regressors, (3) discretizing the input space and using a two-phase search to populate LUT entries, and (4) runtime table lookup. None of these steps reduce a reported speedup or error bound to a quantity defined by the same fitted parameters; the 40%/70% time reductions and ~2% SSIM error are measured outcomes on held-out scenes and GPUs. No self-citations, uniqueness theorems, or ansatzes are invoked to justify the pipeline. The generalization risk noted by the skeptic is a standard empirical concern, not circularity.
Axiom & Free-Parameter Ledger
Free parameters (2)
- XGBoost hyperparameters
- LUT discretization granularity
Axioms (2)
- Domain assumption: XGBoost regressors can accurately predict rendering time and image quality from the chosen input features.
- Domain assumption: A two-phase linear search over the discretized grid yields near-optimal parameters under the time-then-quality objective.
Reference graph
Works this paper leans on
- [1] Haidong Chen, Junpeng Wang, Weifeng Chen, Huamin Qu, and Wei Chen. An image-space energy-saving visualization scheme for OLED displays. Computers and Graphics, 38:61–68. Elsevier, 2014.
- [2] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794. ACM, 2016.
- [3] Weifeng Chen, Wei Chen, Haidong Chen, Zhifang Zhang, and Huamin Qu. An energy-saving color scheme for direct volume rendering. Computers and Graphics, 54:57–64.
- [4] Wei-Chung Cheng and Massoud Pedram. Power minimization in a backlit TFT-LCD display by concurrent brightness and contrast scaling. IEEE Transactions on Consumer Electronics, 50(1):25–32, 2004.
- [5] Nastaran Farazmand and David R. Kaeli. Quality of service-aware dynamic voltage and frequency scaling for mobile 3D graphics applications. In Proceedings of 2017 IEEE International Conference on Computer Design (ICCD), pages 513–, 2017.
- [6] Xiaoyu Fei, Jinghui Lu, Qianyu Sun, Hao Feng, Yanjie Wang, Wei Shi, Ao Lin Wang, Jingqun Tang, and Can Huang. Advancing sequential numerical prediction in autoregressive models. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, 2025.
- [7] Hao Feng, Qi Liu, Hao Liu, Jingqun Tang, Wei Zhou, Hezhi Li, and Can Huang. DocPedia: Unleashing the power of large multimodal model in the frequency domain for versatile document understanding. Science China Information Sciences, 2024.
- [8] Hao Feng, Zijian Wang, Jingqun Tang, Jinghui Lu, Wei Zhou, Hezhi Li, and Can Huang. UniDoc: A universal large multimodal model for simultaneous text detection, recognition, spotting and understanding. arXiv preprint arXiv:2308.11592, 2023.
- [9] Hao Feng, Shu Wei, Xiaoyu Fei, Wei Shi, Yang Han, Lianghui Liao, Jinghui Lu, Binghong Wu, Qi Liu, Chunhui Lin, Jingqun Tang, et al. Dolphin: Document image parsing via heterogeneous anchor prompting. pages 21919–21936.
- [10] Hao Feng, Xuecheng Wu, Jingqun Tang, Yang Liu, Hong Chen, Wei Shi, Dingkang Yang, and Can Huang. Dolphin-v2: Universal document parsing via scalable anchor prompting. arXiv preprint arXiv:2602.05384, 2026.
- [11] Jerome H. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29:1189–1232, 2001.
- [12] Jon Hasselgren, Jacob Munkberg, Marco Salvi, Anjul Patney, and Aaron Lefohn. Neural temporal adaptive sampling and denoising. Computer Graphics Forum, 39:147–155, 2020.
- [13] Srihari Iyer, Lu Luo, Robert Mayo, and Parthasarathy Ranganathan. Energy-adaptive display system designs for future mobile environments. In Proceedings of the 1st International Conference on Mobile Systems, Applications and Services, pages 245–258. ACM, 2003.
- [14] Anton S. Kaplanyan, Anton Sochenov, Thomas Leimkühler, Mikhail Okunev, Todd Goodall, and Gizem Rufo. DeepFovea: Neural reconstruction for foveated rendering and video compression using learned statistics of natural videos. ACM Transactions on Graphics, 38(6):Article 212, 2019.
- [15] Brian Karis. Real shading in Unreal Engine 4. SIGGRAPH 2013 Course, 2013.
- [16] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. LightGBM: A highly efficient gradient boosting decision tree. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 3149–3157. Curran Associates Inc., 2017.
- [17] Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, et al. SPTS v2: Single-point scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(12).
- [18] Jinghui Lu, Haiyang Yu, Yanjie Wang, Yongjie Ye, Jingqun Tang, Ziwei Yang, Binghong Wu, Qi Liu, Hao Feng, Han Wang, et al. A bounding box is worth one token: interleaving layout and text in a large language model for document understanding. pages 7252–7273, 2025.
- [19] David Luebke, Martin Reddy, Jonathan C. Cohen, Amitabh Varshney, Benjamin Watson, and Robert Huebner. Level of Detail for 3D Graphics. Elsevier Science Inc., 2002.
- [20] Zander Majercik, Jean-Philippe Guertin, Derek Nowrouzezahrai, and Morgan McGuire. Scaling probe-based real-time dynamic global illumination for production. volume 8, 2019.
- [21] Michael Mara, Morgan McGuire, Derek Nowrouzezahrai, and David Luebke. Fast global illumination approximations on deep G-buffers. Computer Graphics Forum, 36(2):187–196, 2017.
- [22] Thomas Müller, Fabrice Rousselle, Jan Novak, and Alexander Keller. Real-time neural radiance caching for path tracing. ACM Transactions on Graphics, 40(4):Article 36, 2021.
- [23] Oliver Nalbach, Elena Arabadzhiyska, Dushyant Mehta, Hans-Peter Seidel, and Tobias Ritschel. Deep shading: convolutional neural networks for screen-space shading. Computer Graphics Forum, 36:65–78. Wiley.
- [24] Prathap Narra and Donald S. Zinger. An effective LED dimming approach. In Proceedings of 2004 Conference Record of IEEE Industry Applications Conference, pages 1671–1676. IEEE, 2004.
- [25] Anjul Patney, Marco Salvi, Joohwan Kim, Anton Kaplanyan, Chris Wyman, Nir Benty, David Luebke, and Aaron Lefohn. Towards foveated rendering for gaze-tracked virtual reality. ACM Transactions on Graphics, 35:Article 179, 2016.
- [26] Tobias Ritschel, Carsten Dachsbacher, Thorsten Grosch, and Jan Kautz. The state of the art in interactive global illumination. Computer Graphics Forum, 31:160–188, 2012.
- [27] David Robert. Trends and forecasts in computer graphics—power-efficient rendering. https://www.jonpeddie.com/news/trends-and-forecasts-in-computer-graphics-power-efficient-rendering/
- [28] Biluo Shan, Xiaoyu Fei, Wei Shi, Ao Lin Wang, Guozhi Tang, Lianghui Liao, Jingqun Tang, Xiang Bai, and Can Huang. MCTBench: Multimodal cognition towards text-rich visual scenes benchmark. arXiv preprint arXiv:2410.11538.
- [29] Konstantin Shkurko, Tim Grant, Daniel Kopta, Ian Mallett, Cem Yuksel, and Erik Brunvand. Dual streaming for hardware-accelerated ray tracing. In Proceedings of High Performance Graphics, page Article 12. ACM, 2017.
- [30] Michael Stengel, Steve Grogorick, Marcus Magnor, and Elmar Eisemann. Adaptive image-space sampling for gaze-contingent real-time rendering. Computer Graphics Forum, 35(4):129–139, 2016.
- [31] Wenhao Sun, Benlei Cui, Jingqun Tang, and Xiao-Ming Dong. Attentive eraser: Unleashing diffusion model's object removal potential via self-attention redirection guidance. In Proceedings of the AAAI Conference on Artificial Intelligence, 2025.
- [32] Jingqun Tang, Wei Du, Bo Wang, Wei Zhou, Song Mei, Tao Xue, Xin Xu, and Hao Zhang. Character recognition competition for street view shop signs. National Science Review, 10(6):nwad141, 2023.
- [33] Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Yue He, Kaifu Lu, Hao Feng, Yang Li, et al. TextSquare: Scaling up text-centric visual instruction tuning. arXiv preprint arXiv:2404.12803, 2024.
- [34] Jingqun Tang, Qi Liu, Yongjie Ye, Jinghui Lu, Shu Wei, Ao Lin Wang, Chunhui Lin, Hao Feng, Zhen Zhao, and Can Huang. MTVQA: Benchmarking multilingual text-centric visual question answering. Findings of the Association for Computational Linguistics: ACL 2025, pages 7748–7763, 2025.
- [35] Jingqun Tang, Wei Qian, Linhui Song, Xiaomin Dong, Limeng Li, and Xiang Bai. Optimal boxes: Boosting end-to-end scene text recognition by adjusting annotated bounding boxes via reinforcement learning. In European Conference on Computer Vision, pages 233–248. Springer, 2022.
- [36] Jingqun Tang, Shuyi Qiao, Benlei Cui, Yuhang Ma, Shuai Zhang, and Dimitrios Kanoulas. You can even annotate text with voice: Transcription-only-supervised text spotting. In Proceedings of the 30th ACM International Conference on Multimedia, pages 4154–4163, 2022.
- [37] Jingqun Tang, Wenqing Zhang, Hao Liu, Ming-Kun Yang, Bo Jiang, Guanglong Hu, and Xiang Bai. Few could be better than all: Feature sampling and grouping for scene text detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4563–4572, 2022.
- [38] Ayush Tewari, Ohad Fried, Justus Thies, Vincent Sitzmann, Stephen Lombardi, Kalyan Sunkavalli, Ricardo Martin-Brualla, Tomas Simon, Jason Saragih, Matthias Nießner, et al. State of the art on neural rendering. Computer Graphics Forum, 39(2):701–727, 2020.
- [39] Elena Vasiou, Konstantin Shkurko, Ian Mallett, Erik Brunvand, and Cem Yuksel. A detailed study of ray tracing performance: render time and energy cost. The Visual Computer, 34(6-8):875–885, 2018.
- [40] Ao Lin Wang, Biluo Shan, Wei Shi, Kai-Yuan Lin, Xiaoyu Fei, Guozhi Tang, Lianghui Liao, Jingqun Tang, Can Huang, et al. PARGO: Bridging vision-language with partial and global views. In Proceedings of the AAAI Conference on Artificial Intelligence, 2025.
- [41] Ao Lin Wang, Jingqun Tang, Lianghui Liao, Hao Feng, Qi Liu, Xiaoyu Fei, Jinghui Lu, Han Wang, Hao Liu, Yang Liu, et al. WildDoc: How far are we from achieving comprehensive and robust document understanding in the wild? 2025.
- [42] Rui Wang, Bowen Yu, Julio Marco, Tianlei Hu, Diego Gutierrez, and Hujun Bao. Real-time rendering on a power budget. ACM Transactions on Graphics, 35(4):Article 111.
- [43] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
- [44] Chao Wu, Bowen Yang, Wenwu Zhu, and Yongxiang Zhang. ButterFly: A rendering framework for cross-device collaborative rendering. In Proceedings of IEEE International Conference on Multimedia, 2018.
- [45] Chao Wu, Bowen Yang, Wenwu Zhu, and Yongxiang Zhang. Toward high mobile GPU performance through collaborative workload offloading. IEEE Transactions on Parallel and Distributed Systems, 29(2):435–449, 2018.
- [46] Yanjun Zhang, Miguel Ortin, Victor Arellano, Rui Wang, Diego Gutierrez, and Hujun Bao. On-the-fly power-aware rendering. Computer Graphics Forum, 37(4):155–166, 2018.
- [47] Yanjun Zhang, Rui Wang, Yuchi Huo, Wei Hua, and Hujun Bao. PowerNet: Learning-based real-time power-budget rendering. IEEE Transactions on Visualization and Computer Graphics, 28(10):3486–3498, 2022.
- [48] Jishen Zhao, Guangyu Sun, Gabriel H. Loh, and Yuan Xie. Energy-efficient GPU design with reconfigurable in-package graphics memory. In Proceedings of 2012 ACM/IEEE International Symposium on Low Power Electronics and Design, pages 403–408. ACM, 2012.
- [49] Weichao Zhao, Hao Feng, Qi Liu, Jingqun Tang, Shu Wei, Binghong Wu, Lianghui Liao, Yongjie Ye, Hao Liu, Wei Zhou, et al. TabPedia: Towards comprehensive visual table understanding with concept synergy. Advances in Neural Information Processing Systems, 37, 2024.
- [50] Zhen Zhao, Jingqun Tang, Binghong Wu, Chunhui Lin, Hao Liu, Zhizhong Zhang, Xin Tan, Can Huang, and Yuan Xie. Multi-modal in-context learning makes an ego-evolving scene text recognizer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
- [51] Zhen Zhao, Jingqun Tang, Binghong Wu, Chunhui Lin, Shu Wei, Hao Liu, Xin Tan, Zhizhong Zhang, Can Huang, and Yuan Xie. Harmonizing visual text comprehension and generation. Advances in Neural Information Processing Systems, 37, 2024.