Generative Refinement Networks for Visual Synthesis
Pith reviewed 2026-05-10 15:37 UTC · model grok-4.3
The pith
Generative Refinement Networks combine near-lossless quantization with global refinement to surpass diffusion and autoregressive models in visual synthesis.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GRN establishes a new paradigm that replaces uniform diffusion computation and lossy autoregressive tokenization with Hierarchical Binary Quantization for near-lossless discrete latents plus a global refinement mechanism that progressively corrects and perfects outputs. This combination, paired with entropy-guided adaptive sampling, yields record image reconstruction and class-conditional generation performance on ImageNet while extending to text-to-image and text-to-video tasks at comparable scale.
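The abstract does not spell out how HBQ is constructed, but the name suggests a residual, per-level binary code. The sketch below is one plausible reading; the level count, the halving scales, and the per-dimension sign rule are assumptions rather than details taken from the paper. It illustrates why such a scheme can approach losslessness as depth grows.

```python
# A minimal, hypothetical sketch of hierarchical binary quantization:
# each level quantizes the remaining residual to signs {-1, +1} at a
# geometrically decreasing scale, so the reconstruction error shrinks
# roughly by half per level. The paper's actual HBQ construction
# (scales, hierarchy depth, training losses) is not specified in the
# excerpts above; treat this as an illustration of the general idea.
import numpy as np

def hbq_encode(z: np.ndarray, levels: int = 8) -> np.ndarray:
    """Return binary codes of shape (levels, *z.shape) for a latent z in [-1, 1]."""
    codes = []
    residual = z.copy()
    scale = 0.5
    for _ in range(levels):
        bits = np.where(residual >= 0.0, 1.0, -1.0)   # one binary decision per dimension
        codes.append(bits)
        residual = residual - scale * bits            # the next level quantizes this residual
        scale *= 0.5
    return np.stack(codes)

def hbq_decode(codes: np.ndarray) -> np.ndarray:
    """Reconstruct the latent from the stacked binary codes."""
    scale = 0.5
    z_hat = np.zeros_like(codes[0])
    for bits in codes:
        z_hat = z_hat + scale * bits
        scale *= 0.5
    return z_hat

# With enough levels the residual is bounded by the last scale, so the
# error decays geometrically -- one way a discrete code can be
# "near-lossless" in practice.
z = np.random.uniform(-1, 1, size=(4, 16))
err = np.abs(z - hbq_decode(hbq_encode(z, levels=12))).max()
print(f"max reconstruction error: {err:.2e}")
```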
What carries the argument
Hierarchical Binary Quantization (HBQ) that supplies a near-lossless discrete space, together with a global refinement mechanism that performs progressive correction during autoregressive generation.
If this is right
- Image reconstruction reaches quality levels previously associated only with continuous latent methods.
- Generation becomes adaptive in the number of steps taken, using more computation only where image complexity demands it.
- The same architecture scales directly to text-conditioned image and video tasks while maintaining strong performance.
- Autoregressive visual models gain an explicit correction stage that reduces the impact of early token errors (a sketch of such a loop follows this list).
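For concreteness, here is a hypothetical version of that correction stage: an initial autoregressive draft is re-scored globally, and its least confident positions are resampled over a few passes. The `model.logits_for` call, the fixed resample fraction, and the step count are placeholders; GRN's actual refinement mechanism and schedule are not described in the excerpts above.

```python
# Hypothetical "draft then globally refine" loop. Each pass re-scores
# every position of the token grid and resamples the lowest-confidence
# fraction, in the spirit of iterative parallel refinement.
import torch

@torch.no_grad()
def refine(model, tokens: torch.Tensor, steps: int = 4, frac: float = 0.25,
           temperature: float = 1.0) -> torch.Tensor:
    """tokens: (B, N) discrete codes from an initial autoregressive draft."""
    for _ in range(steps):
        logits = model.logits_for(tokens)                            # (B, N, V), full-grid re-scoring
        probs = torch.softmax(logits / temperature, dim=-1)
        conf = probs.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)    # (B, N) confidence in current tokens
        k = max(1, int(frac * tokens.shape[1]))
        low_conf = conf.topk(k, dim=1, largest=False).indices        # positions to revisit
        resampled = torch.distributions.Categorical(probs=probs).sample()
        tokens = tokens.scatter(1, low_conf, resampled.gather(1, low_conf))
    return tokens
```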
Where Pith is reading between the lines
- The refinement loop could be inserted into existing autoregressive pipelines to improve their outputs without full retraining.
- Entropy-guided step allocation may shorten inference time in interactive applications by skipping unnecessary corrections on simple regions (see the sketch after this list).
- If the quantization proves stable across domains, it offers a drop-in replacement for continuous encoders in other sequence-based generators.
- Extending the approach to longer video sequences could test whether global refinement continues to control error buildup over many frames.
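A hedged sketch of what entropy-guided step allocation could look like in such an interactive setting: map the mean predictive entropy over a token grid to a refinement budget, so low-entropy (simple) content exits early. The entropy statistic and the linear mapping are assumptions, not details from the paper.

```python
# Hypothetical complexity-to-steps mapping: normalized mean entropy of
# the predictive distribution decides how many refinement passes to run.
import torch

def steps_from_entropy(logits: torch.Tensor, min_steps: int = 1,
                       max_steps: int = 8) -> int:
    """logits: (N, V) predictive logits over the token grid of one image."""
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-9))).sum(-1)   # (N,) in nats
    # Normalize the mean entropy by its maximum possible value, log(V).
    max_entropy = torch.log(torch.tensor(float(logits.shape[-1])))
    complexity = (entropy.mean() / max_entropy).item()
    return min_steps + round(complexity * (max_steps - min_steps))
```

An interactive system could call this once per image (or per region) and skip refinement entirely whenever the predicted step count stays at the minimum.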
Load-bearing premise
That Hierarchical Binary Quantization stays near-lossless in practice and that the global refinement step fixes errors without creating new accumulation problems or needing hidden tuning that affects the reported scores.
What would settle it
A controlled experiment in which a continuous-latent baseline still reconstructs with clearly lower rFID than the reported 0.56, or GRN samples on complex scenes showing visible artifacts traceable to the refinement steps.
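The reconstruction half of that test reduces to computing rFID twice against the same reference set, once for a continuous-latent baseline and once for HBQ. A minimal sketch of the Fréchet distance on precomputed Inception features follows; the feature extractor and the image sets are assumed to be prepared elsewhere.

```python
# Standard Fréchet distance between Gaussian fits of feature activations,
# the quantity behind rFID/gFID. Only the distance computation is shown;
# extracting InceptionV3 pool features is assumed to happen upstream.
import numpy as np
from scipy import linalg

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """feats_*: (num_images, feature_dim) activations, e.g. InceptionV3 pool features."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_a @ cov_b, disp=False)
    covmean = covmean.real  # numerical noise can add a tiny imaginary part
    return float(((mu_a - mu_b) ** 2).sum() + np.trace(cov_a + cov_b - 2.0 * covmean))

# fid_continuous = frechet_distance(ref_feats, continuous_recon_feats)
# fid_hbq        = frechet_distance(ref_feats, hbq_recon_feats)
# The claim is testable: fid_hbq should not sit clearly above fid_continuous.
```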
Original abstract
While diffusion models dominate the field of visual generation, they are computationally inefficient, applying a uniform computational effort regardless of different complexity. In contrast, autoregressive (AR) models are inherently complexity-aware, as evidenced by their variable likelihoods, but are often hindered by lossy discrete tokenization and error accumulation. In this work, we introduce Generative Refinement Networks (GRN), a next-generation visual synthesis paradigm to address these issues. At its core, GRN addresses the discrete tokenization bottleneck through a theoretically near-lossless Hierarchical Binary Quantization (HBQ), achieving a reconstruction quality comparable to continuous counterparts. Built upon HBQ's latent space, GRN fundamentally upgrades AR generation with a global refinement mechanism that progressively perfects and corrects artworks -- like a human artist painting. Besides, GRN integrates an entropy-guided sampling strategy, enabling complexity-aware, adaptive-step generation without compromising visual quality. On the ImageNet benchmark, GRN establishes new records in image reconstruction (0.56 rFID) and class-conditional image generation (1.81 gFID). We also scale GRN to more challenging text-to-image and text-to-video generation, delivering superior performance on an equivalent scale. We release all models and code to foster further research on GRN.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Generative Refinement Networks (GRN) to improve visual synthesis over diffusion and autoregressive models. It proposes Hierarchical Binary Quantization (HBQ) as a theoretically near-lossless discrete tokenization method, a global refinement mechanism that progressively corrects autoregressive generation errors, and entropy-guided sampling for complexity-aware adaptive inference. The central empirical claims are new ImageNet records of 0.56 rFID for reconstruction and 1.81 gFID for class-conditional generation, plus superior scaled results on text-to-image and text-to-video tasks, with all models and code released.
Significance. If the near-lossless property of HBQ and the error-correction efficacy of refinement hold under rigorous verification, the work offers a meaningful complexity-aware alternative to uniform-cost diffusion models while mitigating AR tokenization and accumulation issues. The public release of models and code is a clear strength that supports reproducibility and community follow-up.
major comments (3)
- [Abstract] The assertion that HBQ is 'theoretically near-lossless' and yields reconstruction quality 'comparable to continuous counterparts' is stated without a derivation, error bound, or quantitative analysis of quantization loss; this is load-bearing for interpreting the 0.56 rFID figure as a new record rather than an artifact of the discretization.
- [Abstract] No ablation studies, controls, or analysis are provided to demonstrate that the global refinement mechanism corrects AR token errors without introducing new accumulation, requiring post-hoc hyperparameter tuning, or inflating the reported 1.81 gFID; this directly affects the validity of the 'new records' and 'superior on equivalent scale' claims.
- [Experiments] The text-to-image and text-to-video scaling results claim 'superior performance on an equivalent scale' but supply no details on matched model size, training data volume, or compute budget for the baselines, preventing verification of the comparison.
minor comments (1)
- [Abstract] The phrasing 'like a human artist painting' in the abstract is informal and could be replaced with a more precise description of the refinement process.
Simulated Author's Rebuttal
We are grateful to the referee for their thorough review and valuable suggestions. We address each of the major comments point by point below. We have made revisions to the manuscript to incorporate additional analysis and details as requested.
Point-by-point responses
-
Referee: [Abstract] The assertion that HBQ is 'theoretically near-lossless' and yields reconstruction quality 'comparable to continuous counterparts' is stated without a derivation, error bound, or quantitative analysis of quantization loss; this is load-bearing for interpreting the 0.56 rFID figure as a new record rather than an artifact of the discretization.
Authors: We acknowledge the need for explicit support for the claim. We will add a derivation of the near-lossless property, including an error bound, to the revised manuscript, along with quantitative analysis of the quantization loss to support the reconstruction results. revision: yes
-
Referee: [Abstract] No ablation studies, controls, or analysis are provided to demonstrate that the global refinement mechanism corrects AR token errors without introducing new accumulation, requiring post-hoc hyperparameter tuning, or inflating the reported 1.81 gFID; this directly affects the validity of the 'new records' and 'superior on equivalent scale' claims.
Authors: We agree that demonstrating the efficacy of the global refinement mechanism through ablations is crucial. While the manuscript discusses the mechanism, we will include additional ablation studies in the revised version, such as comparisons of generation with and without refinement, analysis of error accumulation over steps, and sensitivity to hyperparameters (one possible protocol is sketched after these responses). This will provide evidence that the refinement corrects errors without introducing new issues or inflating the gFID score. revision: yes
-
Referee: [Experiments] The text-to-image and text-to-video scaling results claim 'superior performance on an equivalent scale' but supply no details on matched model size, training data volume, or compute budget for the baselines, preventing verification of the comparison.
Authors: We appreciate this feedback on the scaling experiments. To enable verification of the 'equivalent scale' comparisons, we will expand the experiments section with detailed specifications of the model sizes, training data volumes, and compute budgets for all baselines in the text-to-image and text-to-video tasks. This will be presented in a comparative table for clarity. revision: yes
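As a concrete illustration of the error-accumulation ablation promised in the second response, the sketch below tracks token disagreement with a reference encoding across refinement passes. `refine_step` and `reference_tokens` are hypothetical placeholders; the actual protocol is for the authors to specify.

```python
# Hypothetical error-accumulation probe: if refinement corrects rather
# than compounds errors, the disagreement rate should not increase
# across passes (with the no-refinement setting as the control).
import torch

@torch.no_grad()
def track_error(refine_step, tokens: torch.Tensor, reference_tokens: torch.Tensor,
                steps: int = 8) -> list[float]:
    errors = [(tokens != reference_tokens).float().mean().item()]
    for _ in range(steps):
        tokens = refine_step(tokens)
        errors.append((tokens != reference_tokens).float().mean().item())
    return errors  # a non-increasing trend supports the correction claim
```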
Circularity Check
No significant circularity detected in derivation chain
full rationale
The abstract and provided excerpts introduce HBQ as 'theoretically near-lossless' and describe a global refinement mechanism plus entropy-guided sampling, with empirical claims of 0.56 rFID and 1.81 gFID on ImageNet. No equations, fitted parameters, or self-referential definitions appear that would make any 'prediction' or record equivalent to its inputs by construction. The performance numbers are presented as benchmark outcomes rather than tautological renamings or fitted-input predictions. Absent any load-bearing self-citation chain or ansatz smuggled via prior work that reduces the core claims to unverified inputs, the derivation remains self-contained against external benchmarks. This is the expected honest non-finding for an empirical methods paper whose central results are falsifiable via reported metrics.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Hierarchical Binary Quantization achieves reconstruction quality comparable to continuous latent spaces
invented entities (1)
- Generative Refinement Networks (GRN): no independent evidence