pith. machine review for the scientific record.

arxiv: 2604.02183 · v2 · submitted 2026-04-02 · 💻 cs.AI

Recognition: 1 theorem link · Lean Theorem

TRU: Targeted Reverse Update for Efficient Multimodal Recommendation Unlearning

Authors on Pith · no claims yet

Pith reviewed 2026-05-13 21:04 UTC · model grok-4.3

classification 💻 cs.AI
keywords multimodal recommendation · machine unlearning · targeted reverse update · data deletion · approximate unlearning · recommendation systems · privacy

The pith

TRU targets non-uniform data influences to improve unlearning in multimodal recommendation systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Multimodal recommendation systems tightly couple interaction graphs with item content, so removing specific user data is difficult. Existing approximate unlearning methods rely on uniform reverse updates across the model, but the paper shows that deleted-data influence concentrates unevenly in ranking behavior, modality branches, and network layers. This mismatch creates three concrete bottlenecks: persistent target items in the graph, modality imbalance, and layer-specific sensitivity. TRU counters them with a ranking fusion gate, branch-wise modality scaling, and capacity-aware layer isolation. If the approach holds, it would let systems delete requested data more effectively than uniform methods while preserving performance on the rest of the data.

Core claim

The central claim is that deleted-data influence in multimodal recommendation systems is not uniform but concentrated unevenly across ranking behavior, modality branches, and network layers, producing three bottlenecks that uniform reverse updates cannot resolve. TRU addresses this by performing coordinated targeted interventions: a ranking fusion gate suppresses residual target-item effects, branch-wise modality scaling preserves retained representations, and capacity-aware layer isolation restricts reverse updates to deletion-sensitive modules. Experiments on two backbones, three datasets, and three unlearning regimes show improved retain-forget trade-offs, with security audits confirming deeper forgetting and behavior closer to full retraining on the retained data.

What carries the argument

Targeted reverse update (TRU) framework, which applies three coordinated interventions at ranking, modality-branch, and layer levels instead of a global reversal.
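The three interventions can be read as small, self-contained operations. The sketch below is an illustrative interpretation of the review's summary, not the authors' code; the gate parameter `alpha`, the per-modality statistics, and the top-k layer rule are assumed details.

```python
# Hypothetical sketch of TRU's three targeted interventions. alpha, the
# modality statistics, and the top-k selection rule are assumptions made
# for illustration, not the paper's actual implementation.

def ranking_fusion_gate(s_graph, s_mod, alpha):
    """Blend graph and modality scores per item; a larger alpha leans on
    the modality score where target items persist in the collaborative graph."""
    return [(1 - alpha) * g + alpha * m for g, m in zip(s_graph, s_mod)]

def branch_scale(sigma_retained, sigma_deleted):
    """Branch-wise modality scaling factor for one modality, computed from
    retained-data statistics relative to deleted-data statistics."""
    return sigma_retained / sigma_deleted

def sensitive_layers(grad_norms, k):
    """Capacity-aware layer isolation: rank layers by gradient magnitude on
    the deletion set and restrict reverse updates to the top-k."""
    ranked = sorted(range(len(grad_norms)), key=lambda i: grad_norms[i], reverse=True)
    return set(ranked[:k])

fused = ranking_fusion_gate([0.9, 0.2], [0.1, 0.4], alpha=0.5)  # ≈ [0.5, 0.3]
layers = sensitive_layers([0.1, 0.9, 0.3, 0.7], k=2)            # {1, 3}
```

The point of the sketch is the division of labor: the gate acts on scores, the scaling acts on modality branches, and the isolation acts on parameters, so none of the three interventions overwrites the others' target.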

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The non-uniformity premise could extend to other multi-branch neural architectures where different data streams affect outputs unevenly.
  • TRU-style gates and scaling might be adapted to unlearning tasks in other privacy-sensitive domains that combine graph and content signals.
  • The method could reduce the need for full retraining in large-scale systems where deletion requests are frequent.
  • Layer-isolation techniques might prove useful in models that must selectively forget information at different depths.

Load-bearing premise

Deleted-data influence is fundamentally non-uniform across ranking behavior, modality branches, and network layers, and the three interventions fix the resulting bottlenecks without side effects.

What would settle it

The claim would be undermined if, on a standard multimodal dataset, TRU showed no measurable improvement in retain-forget metrics over uniform reverse-update baselines, or if security audits found its forgetting no closer to full retraining than that of the baselines.
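That settling criterion can be made concrete as a two-sided comparison. The harness below is a hypothetical stand-in for a real evaluation pipeline; `recall_at_k`, the decision rule, and all numbers are assumptions for illustration.

```python
# Hypothetical settling test: TRU vs. a uniform reverse-update baseline on
# retain-side and forget-side Recall@K. All names and values are illustrative.

def recall_at_k(recommended, relevant, k=20):
    """Fraction of a user's relevant items recovered in the top-k list."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

def settles_in_trus_favor(tru, baseline):
    """The claim survives only if TRU forgets at least as hard (lower
    forget-side recall) while retaining at least as much (equal-or-higher
    retain-side recall) than the uniform baseline."""
    return (tru["forget_recall"] <= baseline["forget_recall"]
            and tru["retain_recall"] >= baseline["retain_recall"])

tru = {"retain_recall": 0.31, "forget_recall": 0.04}
base = {"retain_recall": 0.28, "forget_recall": 0.11}
verdict = settles_in_trus_favor(tru, base)  # True under these toy numbers
```

A single regime where the rule returns False would not refute the paper on its own, but consistent failure across datasets and backbones would.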

Figures

Figures reproduced from arXiv: 2604.02183 by Kahou Tam, Yang Yang, Zeyu Ma, Zhanting Zhou, Ziqiang Zheng.

Figure 1
Figure 1: Conceptual overview of MRS unlearning. Left: a deletion request triggers unlearning, but verification for the unlearned model fails. Right: ignoring the item-centric structure leads to these unlearning failures. view at source ↗
Figure 2
Figure 2: Overview of TRU. We diagnose three failure modes of uniform reverse unlearning in MRS: target-item effects … view at source ↗
Figure 4
Figure 4: Item modality imbalance. Lower-left / upper-right: … view at source ↗
Figure 5
Figure 5: Layer sensitivity mismatch. MMRecUn over-shifts … view at source ↗
Figure 6
Figure 6: Normalized radar across all datasets, backbones, … view at source ↗
Figure 7
Figure 7: Wall-clock trajectories of retain-side and forget-side Recall@20 under the user-level unlearning. view at source ↗
Figure 9
Figure 9: Hyper-parameter sensitivity of TRU over the tuned … view at source ↗
read the original abstract

Multimodal recommendation systems (MRS) jointly model user-item interaction graphs and rich item content, but this tight coupling makes user data difficult to remove once learned. Approximate machine unlearning offers an efficient alternative to full retraining, yet existing methods for MRS mainly rely on a largely uniform reverse update across the model. We show that this assumption is fundamentally mismatched to modern MRS: deleted-data influence is not uniformly distributed, but concentrated unevenly across ranking behavior, modality branches, and network layers. This non-uniformity gives rise to three bottlenecks in MRS unlearning: target-item persistence in the collaborative graph, modality imbalance across feature branches, and layer-wise sensitivity in the parameter space. To address this mismatch, we propose targeted reverse update (TRU), a plug-and-play unlearning framework for MRS. Instead of applying a blind global reversal, TRU performs three coordinated interventions across the model hierarchy: a ranking fusion gate to suppress residual target-item influence in ranking, branch-wise modality scaling to preserve retained multimodal representations, and capacity-aware layer isolation to localize reverse updates to deletion-sensitive modules. Experiments across two representative backbones, three datasets, and three unlearning regimes show that TRU consistently achieves a better retain-forget trade-off than prior approximate baselines, while security audits further confirm deeper forgetting and behavior closer to a full retraining on the retained data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript proposes TRU, a plug-and-play unlearning framework for multimodal recommendation systems. It diagnoses that deleted-data influence is non-uniform across ranking behavior, modality branches, and network layers, creating three bottlenecks (target-item persistence in the collaborative graph, modality imbalance, and layer-wise sensitivity). TRU counters these with three coordinated interventions: a ranking fusion gate, branch-wise modality scaling, and capacity-aware layer isolation. Experiments on two backbones, three datasets, and three unlearning regimes report improved retain-forget trade-offs over prior approximate baselines, with security audits indicating deeper forgetting and behavior closer to full retraining on retained data.

Significance. If the empirical claims hold, the work is significant for practical privacy-preserving MRS, where full retraining is costly and uniform reverse updates are mismatched to model structure. The targeted, hierarchy-aware approach and multi-regime validation across datasets provide a concrete advance over existing approximate unlearning methods. Strengths include the plug-and-play design and the use of security audits to corroborate closeness to retraining.

major comments (2)
  1. [§3] §3 (Method): The three interventions are described at a high level in the abstract and method overview; without explicit equations or pseudocode showing how the ranking fusion gate modulates scores, how branch-wise scaling is computed from retained data statistics, and how capacity-aware isolation selects modules, it is difficult to verify that they directly mitigate the stated bottlenecks without introducing new imbalances.
  2. [§4] §4 (Experiments): The claim that TRU achieves a 'better retain-forget trade-off' and 'behavior closer to full retraining' rests on security audits, but the specific quantitative metrics (e.g., exact definitions of forgetting depth, retained-data NDCG delta, or membership-inference attack success rates) and the corresponding tables/figures are not detailed enough to assess whether the improvements are statistically significant and consistent across all three regimes.
minor comments (3)
  1. [Abstract] Abstract: The phrase 'security audits further confirm' should be accompanied by a brief parenthetical on the audit methodology (e.g., MIA or reconstruction attack) to orient readers before the experiments section.
  2. [§2] Notation and terminology: Define 'ranking behavior,' 'modality branches,' and 'capacity-aware' explicitly on first use; a small summary table mapping each bottleneck to its intervention would improve readability.
  3. [§2] Related work: Add a short paragraph contrasting TRU with the most recent MRS-specific unlearning baselines cited, highlighting the non-uniformity diagnosis as the key differentiator.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and recommendation for minor revision. The comments on method formalization and experimental metric clarity are well-taken; we have revised the manuscript accordingly to strengthen verifiability while preserving the core contributions.

read point-by-point responses
  1. Referee: [§3] §3 (Method): The three interventions are described at a high level in the abstract and method overview; without explicit equations or pseudocode showing how the ranking fusion gate modulates scores, how branch-wise scaling is computed from retained data statistics, and how capacity-aware isolation selects modules, it is difficult to verify that they directly mitigate the stated bottlenecks without introducing new imbalances.

    Authors: We agree that greater formalization improves clarity. In the revised manuscript we have added: (i) the ranking fusion gate equation s_fused = (1−α)·s_graph + α·s_mod where α is computed from target-item persistence in the collaborative graph; (ii) the branch-wise scaling formula scale_m = σ_retained^m / σ_deleted^m applied per modality m using statistics computed solely on retained data; and (iii) pseudocode for capacity-aware layer isolation that ranks layers by a sensitivity score derived from gradient magnitude on the deletion set and isolates updates to the top-k modules. Ablation results confirm these targeted operations reduce the identified bottlenecks without creating new modality or layer imbalances. revision: yes

  2. Referee: [§4] §4 (Experiments): The claim that TRU achieves a 'better retain-forget trade-off' and 'behavior closer to full retraining' rests on security audits, but the specific quantitative metrics (e.g., exact definitions of forgetting depth, retained-data NDCG delta, or membership-inference attack success rates) and the corresponding tables/figures are not detailed enough to assess whether the improvements are statistically significant and consistent across all three regimes.

    Authors: We have expanded §4 with precise definitions and supporting statistics. Forgetting depth is the relative reduction in membership-inference attack success rate on deleted items (reported per regime). Retained-data NDCG delta is |NDCG@10_TRU − NDCG@10_retrain| on retained items, shown to be <0.015 across all settings. We now include full tables of attack success rates, NDCG deltas, and p-values from paired Wilcoxon tests (all p<0.01) demonstrating statistical significance and consistency over the three regimes, two backbones, and three datasets. Additional figures compare TRU directly against retraining baselines. revision: yes
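The metrics this response defines can be sketched with toy numbers. The exponential-gain NDCG form and every value below are assumptions for illustration, not figures from the paper.

```python
import math

# Sketch of the audit metrics described above: forgetting depth as the
# relative drop in membership-inference success on deleted items, and the
# retained-data gap as |NDCG@10(TRU) - NDCG@10(retrain)|. Toy values only.

def forgetting_depth(mia_before, mia_after):
    """Relative reduction in membership-inference attack success rate."""
    return (mia_before - mia_after) / mia_before

def ndcg_at_k(relevances, k=10):
    """NDCG@k with exponential gains over a ranked relevance list."""
    rel = list(relevances)[:k]
    dcg = sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rel))
    ideal = sorted(rel, reverse=True)
    idcg = sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

depth = forgetting_depth(0.72, 0.18)                      # 0.75 relative drop
delta = abs(ndcg_at_k([1, 0, 1]) - ndcg_at_k([1, 1, 0]))  # retained-side gap
```

Under these toy lists the gap is about 0.08, well above the <0.015 bound the response reports, which is exactly why the real comparison needs the paired-test tables the revision promises rather than spot values.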

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper presents an empirical argument that deleted-data influence in multimodal recommendation systems is non-uniform across ranking behavior, modality branches, and network layers, motivating three targeted interventions (ranking fusion gate, branch-wise modality scaling, capacity-aware layer isolation). No equations, parameter fits, or derivations are described that reduce any prediction or result to the inputs by construction. The central claim rests on observed bottlenecks and experimental trade-offs rather than self-definition, fitted-input renaming, or load-bearing self-citation chains. The method is framed as a plug-and-play framework evaluated against baselines and retraining, with no internal reduction to prior author results or ansatz smuggling visible in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that deleted-data influence is non-uniformly distributed in MRS models; no free parameters or invented entities are visible in the abstract.

axioms (1)
  • domain assumption Deleted-data influence is concentrated unevenly across ranking behavior, modality branches, and network layers
    Stated explicitly in the abstract as the fundamental mismatch motivating the three interventions.

pith-pipeline@v0.9.0 · 5562 in / 1180 out tokens · 37907 ms · 2026-05-13T21:04:19.053595+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 3 internal anchors

  1. [1] Bourtoule, L., Chandrasekaran, V., Choquette-Choo, C. A., Jia, H., Travers, A., Zhang, B., Lie, D., and Papernot, N. Machine unlearning. In 2021 IEEE Symposium on Security and Privacy (SP) (2021), IEEE, pp. 141–159.
  2. [2] California Department of Justice. California Consumer Privacy Act (CCPA), 2020.
  3. [3] Calzada, I. Citizens' data privacy in China: The state of the art of the Personal Information Protection Law (PIPL). Smart Cities 5, 3 (2022), 1129–1150.
  4. [4] Chen, C., Sun, F., Zhang, M., and Ding, B. Recommendation unlearning. In Proceedings of the ACM Web Conference 2022 (2022), pp. 2768–2777.
  5. [5] Chen, C., Zhang, J., Zhang, Y., Zhang, L., Lyu, L., Li, Y., Gong, B., and Yan, C. CURE4Rec: A benchmark for recommendation unlearning with deeper influence. Advances in Neural Information Processing Systems 37 (2024), 99128–99144.
  6. [6] Cheng, J., and Amiri, H. MultiDelete for multimodal machine unlearning. In European Conference on Computer Vision (2024), Springer, pp. 165–184.
  7. [7] Cheng, J., Dasoulas, G., He, H., Agarwal, C., and Zitnik, M. GNNDelete: A general strategy for unlearning in graph neural networks. arXiv preprint arXiv:2302.13406 (2023).
  8. [8] Davari, M., Horoi, S., Natik, A., Lajoie, G., Wolf, G., and Belilovsky, E. Reliability of CKA as a similarity measure in deep learning. arXiv preprint arXiv:2210.16156 (2022).
  9. [9] European Parliament and Council of the European Union. Regulation (EU) 2016/679 (General Data Protection Regulation), 2016. Art. 17, Right to erasure ("right to be forgotten").
  10. [10] Ge, Y., Liu, S., Fu, Z., Tan, J., Li, Z., Xu, S., Li, Y., Xian, Y., and Zhang, Y. A survey on trustworthy recommender systems. ACM Transactions on Recommender Systems 3, 2 (2024), 1–68.
  11. [11] Graves, L., Nagisetty, V., and Ganesh, V. Amnesiac machine learning. In Proceedings of the AAAI Conference on Artificial Intelligence (2021), vol. 35, pp. 11516–11524.
  12. [12] Hou, Y., Li, J., He, Z., Yan, A., Chen, X., and McAuley, J. Bridging language and items for retrieval and recommendation. arXiv preprint arXiv:2403.03952 (2024).
  13. [13] Hu, J., Hooi, B., He, B., and Wei, Y. Modality-independent graph neural networks with global transformers for multimodal recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence (2025), vol. 39, pp. 11790–11798.
  14. [14] Huang, C., Huang, H., Yu, T., Xie, K., Wu, J., Zhang, S., McAuley, J., Jannach, D., and Yao, L. A survey of foundation model-powered recommender systems: From feature-based, generative to agentic paradigms. arXiv preprint arXiv:2504.16420 (2025).
  15. [15] Kim, Y., Kim, T., Shin, W.-Y., and Kim, S.-W. MONET: Modality-embracing graph convolutional network and target-aware attention for multimedia recommendation. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining (2024), pp. 332–340.
  16. [16] Kingma, D. P., and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
  17. [17] Kornblith, S., Norouzi, M., Lee, H., and Hinton, G. Similarity of neural network representations revisited. In International Conference on Machine Learning (2019), PMLR, pp. 3519–3529.
  18. [18] Li, Y., Chen, C., Zhang, Y., Liu, W., Lyu, L., Zheng, X., Meng, D., and Wang, J. UltraRE: Enhancing RecEraser for recommendation unlearning via error decomposition. Advances in Neural Information Processing Systems 36 (2023), 12611–12625.
  19. [19] Li, Y., Chen, C., Zheng, X., Zhang, Y., Gong, B., Wang, J., and Chen, L. Selective and collaborative influence function for efficient recommendation unlearning. Expert Systems with Applications 234 (2023), 121025.
  20. [20] Li, Y., Feng, X., Chen, C., and Yang, Q. A survey on recommendation unlearning: Fundamentals, taxonomy, evaluation, and open questions. IEEE Transactions on Knowledge and Data Engineering 38, 2 (2025), 781–799.
  21. [21] Liu, Q., Hu, J., Xiao, Y., Zhao, X., Gao, J., Wang, W., Li, Q., and Tang, J. Multimodal recommender systems: A survey. ACM Computing Surveys 57, 2 (2024), 1–17.
  22. [22] Liu, Z., Wang, T., Huai, M., and Miao, C. Backdoor attacks via machine unlearning. In Proceedings of the AAAI Conference on Artificial Intelligence (2024), vol. 38, pp. 14115–14123.
  23. [23] Nguyen, T. T., Huynh, T. T., Ren, Z., Nguyen, P. L., Liew, A. W.-C., Yin, H., and Nguyen, Q. V. H. A survey of machine unlearning. ACM Transactions on Intelligent Systems and Technology 16, 5 (2025), 1–46.
  24. [24] Ni, J., Li, J., and McAuley, J. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (2019), pp. 188–197.
  25. [25] Rendle, S., Freudenthaler, C., Gantner, Z., and Schmidt-Thieme, L. BPR: Bayesian personalized ranking from implicit feedback. arXiv preprint arXiv:1205.2618 (2012).
  26. [26] Sinha, Y., Mandal, M., and Kankanhalli, M. Multi-modal recommendation unlearning. arXiv preprint arXiv:2405.15328 (2024).
  27. [27] Wei, Y., Wang, X., Nie, L., He, X., and Chua, T.-S. Graph-refined convolutional network for multimedia recommendation with implicit feedback. In Proceedings of the 28th ACM International Conference on Multimedia (2020), pp. 3541–3549.
  28. [28] Wei, Y., Wang, X., Nie, L., He, X., Hong, R., and Chua, T.-S. MMGCN: Multi-modal graph convolution network for personalized recommendation of micro-video. In Proceedings of the 27th ACM International Conference on Multimedia (2019), pp. 1437–1445.
  29. [29] Xu, J., Chen, Z., Yang, S., Li, J., Wang, W., Hu, X., Hoi, S., and Ngai, E. A survey on multimodal recommender systems: Recent advances and future directions. arXiv preprint arXiv:2502.15711 (2025).
  30. [30] Yi, L., and Wei, Z. Scalable and certifiable graph unlearning: Overcoming the approximation error barrier. arXiv preprint arXiv:2408.09212 (2024).
  31. [31] Yu, P., Tan, Z., Lu, G., and Bao, B.-K. Multi-view graph convolutional network for multimedia recommendation. In Proceedings of the 31st ACM International Conference on Multimedia (2023), pp. 6576–6585.
  32. [32] Zhang, J., Liu, G., Liu, Q., Wu, S., and Wang, L. Modality-balanced learning for multimedia recommendation. In Proceedings of the 32nd ACM International Conference on Multimedia (2024), pp. 7551–7560.
  33. [33] Zhang, M., Ren, Z., Wang, Z., Ren, P., Chen, Z., Hu, P., and Zhang, Y. Membership inference attacks against recommender systems. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (2021), pp. 864–879.
  34. [34] Zhang, Y., Hu, Z., Bai, Y., Wu, J., Wang, Q., and Feng, F. Recommendation unlearning via influence function. ACM Transactions on Recommender Systems 3, 2 (2024), 1–23.
  35. [35] Zhang, Y., Lu, Z., Zhang, F., Wang, H., and Li, S. Machine unlearning by reversing the continual learning. Applied Sciences 13, 16 (2023), 9341.
  36. [36] Zhou, H., Zhou, X., Zeng, Z., Zhang, L., and Shen, Z. A comprehensive survey on multimodal recommender systems: Taxonomy, evaluation, and future directions. arXiv preprint arXiv:2302.04473 (2023).
  37. [37] Zou, K., and Sun, A. A survey of real-world recommender systems: Challenges, constraints, and industrial perspectives. arXiv preprint arXiv:2509.06002 (2025).