SAFE-SVD: Sensitivity-Aware Fidelity-Enforcing SVD for Physics Foundation Models

Chengjie Hong; Feixiang He; He Wang; Lulu Kang; Yiheng Zeng

arxiv: 2605.17985 · v1 · pith:EADKWU6Enew · submitted 2026-05-18 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-20 12:57 UTC · model grok-4.3

keywords: model compression, physics foundation models, singular value decomposition, sensitivity analysis, AI for science, fidelity enforcement

The pith

A sensitivity-aware SVD compresses physics foundation models at much higher ratios while preserving accuracy and physical fidelity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a compression technique for physics foundation models that explicitly tracks how much each layer affects the model's output functions, particularly the partial derivatives that encode physical dynamics. Standard compression ignores this sensitivity and often ruins the model's ability to represent spatiotemporal behavior, even when overall loss looks acceptable. By guiding the SVD process with loss-aware sensitivity measured in function space, the method keeps the compressed model accurate on physics tasks. This matters because physics foundation models are too large to deploy without compression, yet losing fidelity makes them unusable for real scientific work.

Core claim

The central claim is that modeling loss-aware layer sensitivity in the output function space during SVD compression provides a new route to compressing physics foundation models while preserving accuracy and physical fidelity, yielding substantially higher compression ratios than existing methods across multiple models and datasets, in some cases by orders of magnitude.

What carries the argument

The sensitivity-aware fidelity-enforcing SVD, which measures and incorporates loss-aware layer sensitivity within the output function space to direct the compression.

If this is right

Physics foundation models achieve significantly higher compression ratios than with conventional SVD.
Model accuracy on physics tasks remains high or improves after compression.
Physical fidelity, including dynamics captured by derivatives, degrades far less than under standard compression.
Reduced memory footprint and faster inference make large scientific models more practical to deploy.
The approach supports development of efficient, sustainable physics foundation models for AI for Science.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same sensitivity modeling in function space could extend to compressing models for other functional data domains such as fluid dynamics or molecular simulations.
Combining this SVD variant with quantization or pruning might produce even larger efficiency improvements for scientific models.
Domain-specific compression that respects output-function sensitivity may become necessary for reliable AI in any field involving differential equations.

Load-bearing premise

Physics data's partial derivatives that encode spatiotemporal dynamics are highly sensitive to compression in ways standard methods do not account for.

What would settle it

Measure prediction error on partial derivatives or simulation outputs for a physics foundation model compressed with and without the sensitivity term; if the performance gap vanishes, the sensitivity modeling is not essential to the gains.
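
One way to run this check is to score each compressed model on both the predicted field and its finite-difference derivatives. The sketch below is a minimal illustration, assuming predictions live on a uniform 2D grid. The function names `relative_l2` and `derivative_errors`, the axis layout, and the use of central differences are assumptions made for illustration, not the paper's evaluation code.

```python
import numpy as np


def relative_l2(pred: np.ndarray, ref: np.ndarray) -> float:
    """Relative L2 error ||pred - ref|| / ||ref||."""
    return float(np.linalg.norm(pred - ref) / np.linalg.norm(ref))


def derivative_errors(pred, ref, dx=1.0, dy=1.0):
    """Relative L2 error of a 2D field and of its first spatial derivatives.

    Derivatives use np.gradient (central differences in the interior).
    Axis 0 is taken as y and axis 1 as x (an assumed layout).
    """
    pred_dy, pred_dx = np.gradient(pred, dy, dx)
    ref_dy, ref_dx = np.gradient(ref, dy, dx)
    return {
        "field": relative_l2(pred, ref),
        "d/dx": relative_l2(pred_dx, ref_dx),
        "d/dy": relative_l2(pred_dy, ref_dy),
    }


# Toy illustration (synthetic data, not from the paper): a small
# high-frequency perturbation barely changes the field error but
# inflates the error of the x-derivative.
n = 128
x = np.linspace(0.0, 2.0 * np.pi, n)
y = np.linspace(0.0, 2.0 * np.pi, n)
X, Y = np.meshgrid(x, y)             # shape (n, n), axis 0 = y, axis 1 = x
ref = np.sin(X) * np.cos(Y)
pred = ref + 0.01 * np.sin(20.0 * X)
h = x[1] - x[0]
errors = derivative_errors(pred, ref, dx=h, dy=h)
print(errors)
```

Running the same function on outputs from a model compressed with and without the sensitivity term would give the comparison described above. If the derivative errors of the two variants coincide, the sensitivity term is not what drives the gains.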

Figures

Figures reproduced from arXiv: 2605.17985 by Chengjie Hong, Feixiang He, He Wang, Lulu Kang, Yiheng Zeng.

**Figure 1.** Predicted velocity fields (u, v) on the NS-PwC dataset at compression ratio 0.2 over three rollout horizons (t = 5, 10, 15).

Text captured with the caption (Section 4.3, Comparisons on VICON): Unlike Poseidon, which is fine-tuned separately on each dataset prior to compression, VICON is a unified model trained across multiple physical domains. To account for this multi-domain setting, we evaluate two variants of our method: Ours and Ours*. Both use…

**Figure 2.** Predicted wave field u on the Wave-Gauss dataset at compression ratio 0.2 over three rollout horizons (t = 5, 10, 15).

**Figure 3.** Predicted physical fields (ρ, u, v, p) on the CE-RM dataset at compression ratio 0.2 over three rollout horizons (t = 5, 10, 15).

Original abstract

We propose a new method for compressing physics foundation models (PFMs), a new trend in AI for Science. While model compression is essential for reducing memory use and accelerating inference in large foundation models, it remains under-explored for PFMs, where preserving physical fidelity is crucial. The challenge lies in the functional nature of physics data, where partial derivatives encode spatiotemporal dynamics and exhibit high sensitivity to compression. Conventional compression methods ignore this structure, often causing severe performance degradation or failure. To address this, we introduce a sensitivity-aware fidelity-enforcing compression framework that explicitly models loss-aware layer sensitivity in the output function space during compression. This provides a new route to compressing scientific foundation models while preserving accuracy and physical fidelity. Experiments show substantial gains over existing methods across multiple models and datasets, achieving significantly higher compression ratios while maintaining accuracy, in some cases by orders of magnitude. More broadly, the work potentially leads to a new subfield of efficient, deployable, and sustainable scientific foundation models in AI for Science.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The paper introduces a sensitivity-aware SVD for compressing physics foundation models to preserve fidelity, with reported compression gains that look promising but rest on experiments whose details are still thin.

The main point is that they have built a compression method around SVD that factors in how sensitive different layers are to the output function, specifically to handle the partial derivatives that carry the dynamics in physics data. Standard low-rank methods tend to ignore that structure and break the model, so the idea of making the compression loss-aware in function space is a reasonable response to the problem. They position it as filling a gap for physics foundation models where fidelity matters more than in generic language or vision tasks. The experiments apparently show better compression ratios than baselines while holding accuracy, sometimes by large margins across a few models and datasets. That kind of practical improvement would matter for running these models on limited hardware in climate or materials work. The framing is straightforward and the motivation tracks with known issues in scientific machine learning.

On the weaker side, the abstract gives no equations for the sensitivity term or ablations that isolate whether the sensitivity modeling itself drives the gains versus a plain fidelity penalty. Without seeing the actual tables, error bars, or dataset specifics it is hard to judge how much the claimed orders-of-magnitude edge depends on careful tuning or particular choices of physics problems. The broader suggestion that this opens a new subfield also feels early.

Readers working on model efficiency for AI-for-science would get the most out of it, especially if they already use SVD-style compression and want to adapt it to physics constraints. It is worth sending to referees so they can check the implementation and controls directly.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes SAFE-SVD, a sensitivity-aware fidelity-enforcing SVD framework for compressing physics foundation models (PFMs). It explicitly models loss-aware layer sensitivity in the output function space to address the high sensitivity of partial derivatives encoding spatiotemporal dynamics in physics data, which conventional low-rank methods ignore. Experiments across multiple PFMs and datasets are reported to yield substantially higher compression ratios while preserving accuracy and physical fidelity, with gains reaching orders of magnitude in some cases. The work positions this as a route toward efficient, deployable scientific foundation models in AI for Science.

Significance. If the reported empirical gains hold under scrutiny, the contribution could be significant for AI-for-Science by providing a compression technique attuned to the functional structure of physics data. It offers a concrete path to reducing memory and inference costs for large PFMs without the severe degradation often seen in generic SVD or pruning approaches. The framing as a potential new subfield is forward-looking but rests on the reproducibility and generality of the sensitivity modeling.

minor comments (3)

[Abstract] The abstract states that experiments achieve 'orders of magnitude' improvements in compression ratios; the main text should clarify in which specific metrics (e.g., parameter count vs. FLOPs vs. wall-clock) and for which model-dataset pairs this holds, with exact numbers and error bars.
[Section 4] Section 4 (Experiments) would benefit from an explicit ablation isolating the contribution of the sensitivity term versus the fidelity-enforcing regularizer alone, to confirm that the claimed gains are not attributable to generic low-rank approximation.
[Section 3.2] Notation for the layer-sensitivity matrix S_l in Eq. (7) should be cross-referenced to the loss definition in Eq. (3) to avoid ambiguity about whether sensitivity is computed in parameter space or output-function space.
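
For the parameter-count reading raised in the first comment, the arithmetic for a single low-rank layer is short enough to state. The sketch below assumes a dense m × n weight replaced by two factors of shapes m × k and k × n, and defines the ratio as compressed over original parameters. Whether the paper's "compression ratio" counts parameters kept or parameters removed is not stated in the text reproduced here, so that definition and all function names are assumptions.

```python
def low_rank_params(m: int, n: int, k: int) -> int:
    """Parameters in a rank-k factorization W ~ A @ B, A: m x k, B: k x n."""
    return k * (m + n)


def kept_fraction(m: int, n: int, k: int) -> float:
    """Compressed / original parameter count for one dense m x n layer."""
    return low_rank_params(m, n, k) / (m * n)


def max_rank_for_fraction(m: int, n: int, fraction: float) -> int:
    """Largest rank k whose factorization keeps at most `fraction` of m * n."""
    return int(fraction * m * n // (m + n))


# Example: a 1024 x 1024 layer at rank 128 keeps 25% of its parameters.
print(kept_fraction(1024, 1024, 128))          # 0.25
print(max_rank_for_fraction(1024, 1024, 0.2))  # 102
```

For a dense layer applied to one input vector, the multiply-add count scales the same way, k(m + n) against mn. Wall-clock speedup does not follow automatically from either count, which is why the comment asks for the metric to be named.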

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment of SAFE-SVD and the recommendation for minor revision. We agree that reproducibility and generality of the sensitivity modeling are important and have strengthened the manuscript accordingly.

Referee: If the reported empirical gains hold under scrutiny, the contribution could be significant... rests on the reproducibility and generality of the sensitivity modeling.

Authors: We appreciate the emphasis on scrutiny. In the revised manuscript we have expanded Section 4 with additional ablation studies on the sensitivity estimation procedure, included pseudocode for the full SAFE-SVD pipeline, and released a public code repository containing all training and evaluation scripts. We have also added results on two further PFMs (one from fluid dynamics and one from climate modeling) to demonstrate broader applicability beyond the original set of models.

revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The abstract and available description present SAFE-SVD as an independent sensitivity-aware SVD framework that models loss-aware layer sensitivity in output function space. No equations, derivations, or self-citations are shown that reduce any claimed result to a fitted input or prior self-referential definition. The central claims rest on empirical gains across models and datasets rather than on any construction that equates outputs to inputs by definition. This is the expected honest non-finding for a compression method paper whose technical details are not reducible from the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no explicit free parameters, axioms, or invented entities; the method is described at a conceptual level, with no mathematical details or assumptions listed.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: $\min_{W'} \; (1-\alpha)\,\mathbb{E}\big[\|L(WX - W'X)\|_2^2\big] + \alpha\,\mathbb{E}\big[\|L(WX - W'X')\|_2^2\big]$ ... $F_Z = L^\top L$ ... $\mathrm{SVD}_k(M^*)$, $W' = L^{-1} M R^{-1}$

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Sobolev loss ... orders of partial derivatives ... physics-informed layer importance
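
The first quoted passage has the shape of a weighted low-rank problem solved by a truncated SVD in whitened coordinates. The sketch below covers only the α = 0 case, under assumptions that are not confirmed by the text reproduced here: the factorization FZ = L⊤L is read as a symmetric positive-definite output-space weighting F_Z, R is taken to be a Cholesky factor of the input second-moment matrix E[XX⊤] = RR⊤, and M is taken to be L W R. How the paper builds F_Z, R, and the perturbed inputs X′ is elided in the passage, so `whitened_truncated_svd`, `weighted_error`, and the random test matrices are purely illustrative.

```python
import numpy as np


def whitened_truncated_svd(W, F_Z, C, k):
    """Rank-k W' minimising E||L (W - W') x||^2 with F_Z = L^T L, E[x x^T] = C.

    Assumes F_Z and C are symmetric positive definite, so L and R are
    invertible and the Eckart-Young theorem applies to M = L W R.
    """
    L = np.linalg.cholesky(F_Z).T      # upper factor, F_Z = L^T L
    R = np.linalg.cholesky(C)          # lower factor, C = R R^T
    M = L @ W @ R
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    M_k = (U[:, :k] * s[:k]) @ Vt[:k]  # best rank-k approximation of M
    # Map back: W' = L^{-1} M_k R^{-1}, using solves instead of inverses.
    return np.linalg.solve(L, np.linalg.solve(R.T, M_k.T).T)


def weighted_error(W, W_prime, F_Z, C):
    """E||L (W - W') x||^2 = trace(D^T F_Z D C) with D = W - W'."""
    D = W - W_prime
    return float(np.trace(D.T @ F_Z @ D @ C))


# Synthetic stand-ins (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
m, n, k = 8, 6, 2
W = rng.standard_normal((m, n))
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
F_Z = A @ A.T + 1e-3 * np.eye(m)   # SPD output-space weighting
C = B @ B.T + 1e-3 * np.eye(n)     # SPD input second moment

W_whitened = whitened_truncated_svd(W, F_Z, C, k)

U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_plain = (U[:, :k] * s[:k]) @ Vt[:k]   # unweighted rank-k truncation

err_whitened = weighted_error(W, W_whitened, F_Z, C)
err_plain = weighted_error(W, W_plain, F_Z, C)
print(err_whitened, err_plain)
```

In this toy setup the whitened solution cannot score worse than plain truncation on the weighted objective, since plain truncation is just another rank-k candidate. The α > 0 term, which involves the perturbed inputs X′, is not modeled here.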

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 3 internal anchors

[1] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
[2] Saleh Ashkboos, Maximilian L Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, and James Hensman. SliceGPT: Compress large language models by deleting rows and columns. arXiv preprint arXiv:2401.15024, 2024.
[3] Kamyar Azizzadenesheli, Nikola Kovachki, Zongyi Li, Miguel Liu-Schiaffini, Jean Kossaifi, and Anima Anandkumar. Neural operators for accelerating scientific simulations and design. Nature Reviews Physics, 6(5):320–328, 2024.
[4] Yadi Cao, Yuxuan Liu, Liu Yang, Rose Yu, Hayden Schaeffer, and Stanley Osher. VICON: Vision in-context operator networks for multi-physics fluid dynamics prediction. arXiv preprint arXiv:2411.16063, 2024.
[5] Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. QLoRA: Efficient finetuning of quantized LLMs. Advances in Neural Information Processing Systems, 36:10088–10115, 2023.
[6] Xuan Ding, Rui Sun, Yunjian Zhang, Xiu Yan, Yueqi Zhou, Kaihao Huang, Suzhong Fu, Chuanlong Xie, and Yao Zhu. DipSVD: Dual-importance protected SVD for efficient LLM compression. arXiv preprint arXiv:2506.20353, 2025.
[7] Elias Frantar and Dan Alistarh. SparseGPT: Massive language models can be accurately pruned in one-shot. In International Conference on Machine Learning, pages 10323–10337. PMLR, 2023.
[8] Maximilian Herde, Bogdan Raonić, Tobias Rohner, Roger Käppeli, Roberto Molinaro, Emmanuel De Bezenac, and Siddhartha Mishra. Poseidon: Efficient foundation models for PDEs. Advances in Neural Information Processing Systems, 37:72525–72624, 2024.
[9] Yen-Chang Hsu, Ting Hua, Sungen Chang, Qian Lou, Yilin Shen, and Hongxia Jin. Language model compression with weighted low-rank factorization. arXiv preprint arXiv:2207.00112, 2022.
[10] Xing Hu, Dawei Yang, Yuan Cheng, Zhixuan Chen, and Zukang Xu. SAES-SVD: Self-adaptive suppression of accumulated and local errors for SVD-based LLM compression. arXiv preprint arXiv:2602.03051, 2026.
[11] Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, et al. OpenVLA: An open-source vision-language-action model, 2024. URL https://arxiv.org/abs/2406.09246, 1(2):4, 2024.
[12] Richard Lam et al. GraphCast: Learning skillful medium-range global weather forecasting. Science, 2023.
[13] Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, Fengwei Yu, Wei Wang, and Shi Gu. BRECQ: Pushing the limit of post-training quantization by block reconstruction. arXiv preprint arXiv:2102.05426, 2021.
[14] Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han. AWQ: Activation-aware weight quantization for on-device LLM compression and acceleration. Proceedings of Machine Learning and Systems, 6:87–100, 2024.
[15] Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. Improved baselines with visual instruction tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26296–26306, 2024.
[16] Xinyin Ma, Gongfan Fang, and Xinchao Wang. LLM-Pruner: On the structural pruning of large language models. Advances in Neural Information Processing Systems, 36:21702–21720, 2023.
[17] Michael McCabe, Bruno Régaldo-Saint Blancard, Liam Parker, Ruben Ohana, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Siavash Golkar, Geraud Krawezik, Francois Lanusse, et al. Multiple physics pretraining for spatiotemporal surrogate models. Advances in Neural Information Processing Systems, 37:119301–119335, 2024.
[18] Tung Nguyen, Johannes Brandstetter, Ashish Kapoor, Jayesh K Gupta, and Aditya Grover. ClimaX: A foundation model for weather and climate. arXiv preprint arXiv:2301.10343, 2023.
[19] Md Ashiqur Rahman, Robert Joseph George, Mogab Elleithy, Daniel Leibovici, Zongyi Li, Boris Bonev, Colin White, Julius Berner, Raymond A Yeh, Jean Kossaifi, et al. Pretraining codomain attention neural operators for solving multiphysics PDEs. Advances in Neural Information Processing Systems, 37:104035–104064, 2024.
[20] Yuzhang Shang, Zhihang Yuan, Qiang Wu, and Zhen Dong. PB-LLM: Partially binarized large language models. arXiv preprint arXiv:2310.00034, 2023.
[21] Eduardo Soares, Emilio Vital Brazil, Victor Shirasuna, Breno WSR de Carvalho, and Cristiano Malossi. Towards a foundation model for partial differential equations across physics domains. arXiv preprint arXiv:2511.21861, 2025.
[22] Shashank Subramanian, Peter Harrington, Kurt Keutzer, Wahid Bhimji, Dmitriy Morozov, Michael W Mahoney, and Amir Gholami. Towards foundation models for scientific machine learning: Characterizing scaling and transfer behavior. Advances in Neural Information Processing Systems, 36:71242–71262, 2023.
[23] Makoto Takamoto, Timothy Praditia, Raphael Leiteritz, Daniel MacKinlay, Francesco Alesiani, Dirk Pflüger, and Mathias Niepert. PDEBench: An extensive benchmark for scientific machine learning. Advances in Neural Information Processing Systems, 35:1596–1611, 2022.
[24] Lin Wang and Kuk-Jin Yoon. Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(6):3048–3068, 2021.
[25] Qinsi Wang, Jinghan Ke, Masayoshi Tomizuka, Yiran Chen, Kurt Keutzer, and Chenfeng Xu. Dobi-SVD: Differentiable SVD for LLM compression and some new perspectives. arXiv preprint arXiv:2502.02723, 2025.
[26] Xin Wang, Yu Zheng, Zhongwei Wan, and Mi Zhang. SVD-LLM: Truncation-aware singular value decomposition for large language model compression. arXiv preprint arXiv:2403.07378, 2024.
[27] Xin Wang, Samiul Alam, Zhongwei Wan, Hui Shen, and Mi Zhang. SVD-LLM v2: Optimizing singular value truncation for large language model compression. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4287–4296, 2025.
[28] Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng, and Danqi Chen. Sheared LLaMA: Accelerating language model pre-training via structured pruning. arXiv preprint arXiv:2310.06694, 2023.
[29] Canwen Xu and Julian McAuley. A survey on model compression and acceleration for pretrained language models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 10566–10575, 2023.
[30] Zhihang Yuan, Yuzhang Shang, Yue Song, Dawei Yang, Qiang Wu, Yan Yan, and Guangyu Sun. ASVD: Activation-aware singular value decomposition for compressing large language models. arXiv preprint arXiv:2312.05821, 2023.
[31] Xiangyu Zhang, Jianhua Zou, Kaiming He, and Jian Sun. Accelerating very deep convolutional networks for classification and detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(10):1943–1955, 2015.
[32] Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, and Jiajun Liang. Decoupled knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11953–11962, 2022.

[1] [1]

GPT-4 Technical Report

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 9 technical report.arXiv preprint arXiv:2303.08774, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[2] [2]

Slicegpt: Compress large language models by deleting rows and columns

Saleh Ashkboos, Maximilian L Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, and James Hensman. Slicegpt: Compress large language models by deleting rows and columns. arXiv preprint arXiv:2401.15024, 2024

work page arXiv 2024

[3] [3]

Neural operators for accelerating scientific simulations and design

Kamyar Azizzadenesheli, Nikola Kovachki, Zongyi Li, Miguel Liu-Schiaffini, Jean Kossaifi, and Anima Anandkumar. Neural operators for accelerating scientific simulations and design. Nature Reviews Physics, 6(5):320–328, 2024

work page 2024

[4] [4]

Vicon: Vision in-context operator networks for multi-physics fluid dynamics prediction.arXiv preprint arXiv:2411.16063, 2024

Yadi Cao, Yuxuan Liu, Liu Yang, Rose Yu, Hayden Schaeffer, and Stanley Osher. Vicon: Vision in-context operator networks for multi-physics fluid dynamics prediction.arXiv preprint arXiv:2411.16063, 2024

work page arXiv 2024

[5] [5]

Qlora: Efficient finetuning of quantized llms.Advances in neural information processing systems, 36:10088– 10115, 2023

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. Qlora: Efficient finetuning of quantized llms.Advances in neural information processing systems, 36:10088– 10115, 2023

work page 2023

[6] [6]

Dipsvd: Dual-importance protected svd for efficient llm compression

Xuan Ding, Rui Sun, Yunjian Zhang, Xiu Yan, Yueqi Zhou, Kaihao Huang, Suzhong Fu, Chuan- long Xie, and Yao Zhu. Dipsvd: Dual-importance protected svd for efficient llm compression. arXiv preprint arXiv:2506.20353, 2025

work page arXiv 2025

[7] [7]

Sparsegpt: Massive language models can be accurately pruned in one-shot

Elias Frantar and Dan Alistarh. Sparsegpt: Massive language models can be accurately pruned in one-shot. InInternational conference on machine learning, pages 10323–10337. PMLR, 2023

work page 2023

[8] [8]

Poseidon: Efficient foundation models for pdes

Maximilian Herde, Bogdan Raoni ´c, Tobias Rohner, Roger Käppeli, Roberto Molinaro, Em- manuel De Bezenac, and Siddhartha Mishra. Poseidon: Efficient foundation models for pdes. Advances in Neural Information Processing Systems, 37:72525–72624, 2024

work page 2024

[9] [9]

Language model compression with weighted low-rank factorization.arXiv preprint arXiv:2207.00112, 2022

Yen-Chang Hsu, Ting Hua, Sungen Chang, Qian Lou, Yilin Shen, and Hongxia Jin. Language model compression with weighted low-rank factorization.arXiv preprint arXiv:2207.00112, 2022

work page arXiv 2022

[10] [10]

Saes-svd: Self-adaptive suppression of accumulated and local errors for svd-based llm compression.arXiv preprint arXiv:2602.03051, 2026

Xing Hu, Dawei Yang, Yuan Cheng, Zhixuan Chen, and Zukang Xu. Saes-svd: Self-adaptive suppression of accumulated and local errors for svd-based llm compression.arXiv preprint arXiv:2602.03051, 2026

work page arXiv 2026

[11] [11]

OpenVLA: An Open-Source Vision-Language-Action Model

Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, et al. Openvla: An open-source vision-language-action model, 2024.URL https://arxiv. org/abs/2406.09246, 1(2):4, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[12] [12]

Graphcast: Learning skillful medium-range global weather forecasting

Richard Lam et al. Graphcast: Learning skillful medium-range global weather forecasting. Science, 2023

work page 2023

[13] [13]

Brecq: Pushing the limit of post-training quantization by block reconstruction

Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, Fengwei Yu, Wei Wang, and Shi Gu. Brecq: Pushing the limit of post-training quantization by block reconstruction. arXiv preprint arXiv:2102.05426, 2021

work page arXiv 2021

[14] [14]

Awq: Activation-aware weight quantization for on-device llm compression and acceleration.Proceedings of machine learning and systems, 6:87–100, 2024

Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han. Awq: Activation-aware weight quantization for on-device llm compression and acceleration.Proceedings of machine learning and systems, 6:87–100, 2024

work page 2024

[15] [15]

Improved baselines with visual instruction tuning

Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. Improved baselines with visual instruction tuning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 26296–26306, 2024

work page 2024

[16] [16]

Llm-pruner: On the structural pruning of large language models.Advances in neural information processing systems, 36:21702–21720, 2023

Xinyin Ma, Gongfan Fang, and Xinchao Wang. Llm-pruner: On the structural pruning of large language models.Advances in neural information processing systems, 36:21702–21720, 2023

work page 2023

[17] [17]

Multiple physics pretraining for spatiotemporal surrogate models.Advances in Neural Information Processing Systems, 37:119301–119335, 2024

Michael McCabe, Bruno Régaldo-Saint Blancard, Liam Parker, Ruben Ohana, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Siavash Golkar, Geraud Krawezik, Francois Lanusse, et al. Multiple physics pretraining for spatiotemporal surrogate models.Advances in Neural Information Processing Systems, 37:119301–119335, 2024. 10

work page 2024

[18] [18]

Gupta, and Aditya Grover

Tung Nguyen, Johannes Brandstetter, Ashish Kapoor, Jayesh K Gupta, and Aditya Grover. Climax: A foundation model for weather and climate.arXiv preprint arXiv:2301.10343, 2023

work page arXiv 2023

[19] [19]

Pretraining codomain attention neural operators for solving multiphysics pdes.Advances in Neural Information Processing Systems, 37:104035–104064, 2024

Md Ashiqur Rahman, Robert Joseph George, Mogab Elleithy, Daniel Leibovici, Zongyi Li, Boris Bonev, Colin White, Julius Berner, Raymond A Yeh, Jean Kossaifi, et al. Pretraining codomain attention neural operators for solving multiphysics pdes.Advances in Neural Information Processing Systems, 37:104035–104064, 2024

work page 2024

[20] [20]

Pb-llm: Partially binarized large language models.arXiv preprint arXiv:2310.00034, 2023

Yuzhang Shang, Zhihang Yuan, Qiang Wu, and Zhen Dong. Pb-llm: Partially binarized large language models.arXiv preprint arXiv:2310.00034, 2023

work page arXiv 2023

[21] [21]

Towards a foundation model for partial differential equations across physics domains

Eduardo Soares, Emilio Vital Brazil, Victor Shirasuna, Breno WSR de Carvalho, and Cristiano Malossi. Towards a foundation model for partial differential equations across physics domains. arXiv preprint arXiv:2511.21861, 2025

work page arXiv 2025

[22] Shashank Subramanian, Peter Harrington, Kurt Keutzer, Wahid Bhimji, Dmitriy Morozov, Michael W Mahoney, and Amir Gholami. Towards foundation models for scientific machine learning: Characterizing scaling and transfer behavior. Advances in Neural Information Processing Systems, 36:71242–71262, 2023.

[23] Makoto Takamoto, Timothy Praditia, Raphael Leiteritz, Daniel MacKinlay, Francesco Alesiani, Dirk Pflüger, and Mathias Niepert. PDEBench: An extensive benchmark for scientific machine learning. Advances in Neural Information Processing Systems, 35:1596–1611, 2022.

[24] Lin Wang and Kuk-Jin Yoon. Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(6):3048–3068, 2021.

[25] Qinsi Wang, Jinghan Ke, Masayoshi Tomizuka, Yiran Chen, Kurt Keutzer, and Chenfeng Xu. Dobi-SVD: Differentiable SVD for LLM compression and some new perspectives. arXiv preprint arXiv:2502.02723, 2025.

[26] Xin Wang, Yu Zheng, Zhongwei Wan, and Mi Zhang. SVD-LLM: Truncation-aware singular value decomposition for large language model compression. arXiv preprint arXiv:2403.07378, 2024.

[27] Xin Wang, Samiul Alam, Zhongwei Wan, Hui Shen, and Mi Zhang. SVD-LLM V2: Optimizing singular value truncation for large language model compression. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4287–4296, 2025.

[28] Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng, and Danqi Chen. Sheared LLaMA: Accelerating language model pre-training via structured pruning. arXiv preprint arXiv:2310.06694, 2023.

[29] Canwen Xu and Julian McAuley. A survey on model compression and acceleration for pretrained language models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 10566–10575, 2023.

[30] Zhihang Yuan, Yuzhang Shang, Yue Song, Dawei Yang, Qiang Wu, Yan Yan, and Guangyu Sun. ASVD: Activation-aware singular value decomposition for compressing large language models. arXiv preprint arXiv:2312.05821, 2023.

[31] Xiangyu Zhang, Jianhua Zou, Kaiming He, and Jian Sun. Accelerating very deep convolutional networks for classification and detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(10):1943–1955, 2015.

[32] Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, and Jiajun Liang. Decoupled knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11953–11962, 2022.

Appendix A Detailed Derivation of Physics-Aware Compression

This appendix provides the preliminaries, detailed derivations and rank selection re...