{"total":64,"items":[{"citing_arxiv_id":"2606.26538","ref_index":10,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"CascadeFormer: Depth-Tapered Transformers Motivated by Gradient Fan-in Asymmetry","primary_cat":"cs.LG","submitted_at":"2026-06-25T02:25:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CascadeFormer tapers Transformer width with depth based on gradient fan-in asymmetry to match uniform baselines in perplexity while cutting latency 8.6%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.25174","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Growing a Neural Network in Breadth, Depth, and Time","primary_cat":"q-bio.NC","submitted_at":"2026-05-24T17:11:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Recurrent CNNs are trained with joint task and resource costs on breadth, depth, and time, yielding organic growth in all three dimensions that trades off for accuracy and matches human reaction times on object recognition.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23226","ref_index":8,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"MASQ: Accelerating Masked Diffusion via Stage-Wise Multi-Precision Quantization","primary_cat":"cs.AR","submitted_at":"2026-05-22T04:37:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"MASQ claims up to 16.06x speedup and 4.18x energy gain over A100 for masked diffusion via stage-wise multi-precision quantization and specialized hardware units while preserving 
quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21964","ref_index":22,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Dual-Integrated Low-Latency Single-Lens Infrared Computational Imaging for Object Detection","primary_cat":"cs.CV","submitted_at":"2026-05-21T03:50:52+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21421","ref_index":82,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AIGaitor: Privacy-preserving and cloud-free motion analysis for everyone, using edge computing","primary_cat":"cs.CV","submitted_at":"2026-05-20T17:14:57+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21560","ref_index":7,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AutoMCU: Feasibility-First MCU Neural Network Customization via LLM-based Multi-Agent Systems","primary_cat":"cs.LG","submitted_at":"2026-05-20T14:59:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AutoMCU uses feasibility-first LLM multi-agent coordination to automate MCU-constrained neural network design, delivering competitive accuracy on CIFAR-10/100 in 1-2 hours versus hundreds of GPU hours for prior HW-NAS methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20669","ref_index":24,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"GSA-YOLO: A High-Efficiency Framework via Structured Sparsity and Adaptive Knowledge 
Distillation for Real-Time X-ray Security Inspection","primary_cat":"cs.CV","submitted_at":"2026-05-20T03:36:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"GSA-YOLO modifies YOLOv8n with structured sparsity via Group Lasso and Sparse Structure Selection plus Adaptive Knowledge Distillation, reporting 189.62 FPS and mAP50:95 gains of 2.4% and 1.8% on HiXray and PIDray datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19568","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"m3BERT: A Modern, Multi-lingual, Matryoshka Bidirectional Encoder","primary_cat":"cs.CL","submitted_at":"2026-05-19T09:13:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"m3BERT uses a three-stage Matryoshka pretraining approach on a bidirectional encoder to support variable embedding sizes while outperforming prior models on large-scale retrieval tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17160","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"When Bits Break Recourse: Counterfactual-Faithful Quantization","primary_cat":"cs.LG","submitted_at":"2026-05-16T21:19:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CFQ trains quantizer parameters and mixed-precision allocation to preserve counterfactual recourse validity, cost, and direction on Adult, German Credit, and COMPAS while matching accuracy of standard 
quantizers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18878","ref_index":260,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Prognostic Value of Lung Ultrasound Biomarkers for Readmission Risk in Congestive Heart Failure: A Pilot Data-Driven Analysis","primary_cat":"eess.SP","submitted_at":"2026-05-16T02:49:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Pilot study uses pretrained video encoder features from lung ultrasound to predict 30-day CHF readmission, finding lower-lung views and temporal differences most informative with top MLP F1 of 0.80.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15551","ref_index":57,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis","primary_cat":"cs.LG","submitted_at":"2026-05-15T02:44:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"QuBD extends algorithmic complexity estimation to quantized DNN weights, revealing that complexity decreases during learning, increases with overfitting, follows grokking patterns, and correlates with generalization.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"3 (Finite-table Saturation) Fix q, r, π, and S^{(q)}_{π,r}. We omit the fixed q for readability and write S_{π,r} with H_{π,r} := |S_{π,r}|. Let Π_π(z_{q,r}) = (w_1, ..., w_m) be the exposed blocks of size π. For each u ∈ S_{π,r}, define I_u := 1{c_π(u; z_{q,r}) > 0}. Then a_π(z_{q,r}) = Σ_{u∈S_{π,r}} I_u, and by linearity E[a_π(z_{q,r})] = E[Σ_{u∈S_{π,r}} I_u] = Σ_{u∈S_{π,r}} E[I_u] = Σ_{u∈S_{π,r}} P(c_π(u; z_{q,r}) > 0). (57) Let ρ_{π,r}(u) be the probability that one exposed block equals u. 
Since the exposed blocks are sampled independently, the probability that none of the m exposed blocks equals u is (1 − ρ_{π,r}(u))^m. Hence E[a_π(z_{q,r})] = Σ_{u∈S_{π,r}} [1 − (1 − ρ_{π,r}(u))^m]. (58) In the uniform setting, which serves as a worst-case upper bound on the number of blocks required for saturation, we assume"},{"citing_arxiv_id":"2605.15435","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"On the Stability of Growth in Structural Plasticity","primary_cat":"cs.LG","submitted_at":"2026-05-14T21:27:38+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13997","ref_index":22,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts","primary_cat":"cs.LG","submitted_at":"2026-05-13T18:07:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"HodgeCover isolates the harmonic kernel of a simplicial Laplacian on an expert 2-complex to identify irreducible merge cycles and selects experts for aggressive compression, matching or exceeding baselines on open-weight MoE models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13979","ref_index":1,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Winning Lottery Tickets in Neural Networks via a Quantum-Inspired Classical Algorithm","primary_cat":"quant-ph","submitted_at":"2026-05-13T18:00:56+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A classical polynomial-time algorithm for optimized sampling of lottery tickets in neural networks removes the exponential dependence on 
data dimension from prior classical approaches.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16397","ref_index":10,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Trajectory-Aware Adaptive Inference in Object Detection Models","primary_cat":"cs.CV","submitted_at":"2026-05-12T16:04:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Introduces an early-exit mechanism in YOLOv8 that uses inter-vessel distance and closing speed from trajectories to adapt computation depth per frame in maritime scenes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11800","ref_index":51,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ROMER: Expert Replacement and Router Calibration for Robust MoE LLMs on Analog Compute-in-Memory Systems","primary_cat":"cs.LG","submitted_at":"2026-05-12T08:57:59+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ROMER cuts perplexity by up to 59% in noisy analog CIM environments for MoE LLMs via expert replacement and router recalibration calibrated on real-chip measurements.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11222","ref_index":10,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ADMM-Q: An Improved Hessian-based Weight Quantizer for Post-Training Quantization of Large Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-11T20:33:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ADMM-Q is a new post-training quantization method using ADMM operator splitting that reduces WikiText-2 perplexity compared to GPTQ on Qwen3-8B 
across W3A16, W4A8, and W2A4KV4 settings.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10933","ref_index":57,"ref_count":3,"confidence":0.98,"is_internal_anchor":true,"paper_title":"DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices","primary_cat":"cs.LG","submitted_at":"2026-05-11T17:58:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DECO is a sparse MoE architecture with ReLU-based routing, learnable expert scaling, and NormSiLU activation that matches dense Transformer performance at 20% expert activation and delivers 2.93x speedup on Jetson AGX Orin.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18800","ref_index":5,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Theory-optimal Quantization Based on Flatness","primary_cat":"cs.LG","submitted_at":"2026-05-11T10:51:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The paper introduces the Flatness metric, derives a theory-optimal quantization solution, and presents BDQ that uses bidirectional diagonal transformations to reduce outlier impact, achieving under 1% drop at W4A4 on LLaMA-3-8B.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08885","ref_index":48,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Compact SO(3) Equivariant Atomistic Foundation Models via Structural Pruning","primary_cat":"cs.LG","submitted_at":"2026-05-09T11:07:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Structural pruning of SO(3) equivariant atomistic models from large checkpoints yields 1.5-4x fewer 
parameters and 2.5-4x less pre-training compute than small models trained from scratch, while outperforming them on most Matbench Discovery metrics and downstream tasks.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"MLIPs learn a parameterized energy E = Φ_θ(R, Z) over atomic positions R and species Z, with forces derived as F_i = −∇_{r_i} Φ_θ. State-of-the-art MLIPs encode physical symmetries via SO(3) equivariant message passing [8, 9, 23], where node features are higher-order tensors h^{(t)}_{j,klm} indexed by channel k and irreducible representation (l, m), coupled via Clebsch-Gordan products with O(L_max^6) complexity. Structural pruning [48, 47] removes coherent parameter groups to yield dense, hardware-friendly compressed models. However, directly applying it to SO(3) equivariant architectures is non-trivial, as naive removal of feature dimensions breaks the symmetry of the tensor product path (see Appendix A and B for details). Designing a structural pruning framework for SO(3) equivariant atomistic foundation models requires"},{"citing_arxiv_id":"2605.07378","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Zero-Shot Neural Network Evaluation with Sample-Wise Activation Patterns","primary_cat":"cs.LG","submitted_at":"2026-05-08T07:33:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SWAP-Score evaluates neural networks without training by quantifying sample-wise activation patterns, achieving high correlation with true performance on CIFAR-10 for CNNs and GLUE for Transformers while enabling fast NAS.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Index Terms-Zero-shot, Training-free, Performance Evaluation, Convolutional Neural Network, Transformer, Neural Architecture Search. I. 
INTRODUCTION PERFORMANCE evaluation of neural networks is essential in the deep learning field, spanning areas such as knowledge distillation, reinforcement learning, neural architecture search, and model pruning [1]-[6]. Conventional approaches evaluate neural network performance via back-propagation training, which often leads to prohibitively high computational costs in certain research areas. In knowledge distillation, training a student model to replicate the teacher model's behaviour through back-propagation imposes considerable computational demands, especially with large teacher"},{"citing_arxiv_id":"2605.07160","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"TENNOR: Trustworthy Execution for Neural Networks through Obliviousness and Retrievals","primary_cat":"cs.CR","submitted_at":"2026-05-08T02:46:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TENNOR enables efficient private training of wide neural networks in TEEs by recasting sparsification as doubly oblivious LSH retrievals and introducing MP-WTA to cut hash table memory by 50x while preserving accuracy.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Efficient and Adaptive Access via Sparsification. Wide layers are computationally demanding even in standard, non-privacy-preserving settings. When a single layer contains tens of thousands of neurons, the cost of organizing, storing, and accessing its parameters becomes a bottleneck. This state of affairs has motivated a substantial body of work on sparsification [45]. That is, rather than activating all neurons in the wide layer for every input, only a small subset participates, dramatically reducing computation per access. 
Sparsification comes in two flavors: non-adaptive [35, 41, 42], in which the network is pruned to permanently remove neurons that do not contribute to model performance; and adaptive [10, 63, 92, 97], in"},{"citing_arxiv_id":"2605.06714","ref_index":109,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey","primary_cat":"cs.CV","submitted_at":"2026-05-07T00:19:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A comprehensive survey of edge deep learning in computer vision and medical diagnostics that presents a novel categorization of hardware platforms by performance and usage scenarios.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.02196","ref_index":37,"ref_count":2,"confidence":0.9,"is_internal_anchor":true,"paper_title":"DurableUn: Quantization-Induced Recovery Attacks in Machine Unlearning","primary_cat":"cs.LG","submitted_at":"2026-05-04T03:54:14+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"INT4 quantization recovers up to 22 times more forgotten training data in unlearned LLMs, and the proposed DURABLEUN-SAF method is the first to maintain forgetting across BF16, INT8, and INT4 precisions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2704-2713, 2018. [36] Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Quantized neural networks: Training neural networks with low precision weights and activations. Journal of Machine Learning Research, 18:1-30, 2018. [37] Song Han, Huizi Mao, and William J. Dally. 
Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding, 2016. URL https://arxiv.org/abs/1510.00149. [38] Steven K Esser, Jeffrey L McKinstry, Deepika Bablani, Rathinakumar Appuswamy, and Dharmendra S Modha. Learned step size quantization. In International Conference on Learning Representations, 2020."},{"citing_arxiv_id":"2604.26587","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Sparse-on-Dense: Area and Energy-Efficient Computing of Sparse Neural Networks on Dense Matrix Multiplication Accelerators","primary_cat":"cs.AR","submitted_at":"2026-04-29T12:10:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Sparse neural networks achieve better area and energy efficiency when executed on dense matrix multiplication accelerators using a Sparse-on-Dense approach than on dedicated sparse accelerators.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.26979","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Multibit neural inference in a N-ary crossbar architecture","primary_cat":"cs.AR","submitted_at":"2026-04-28T13:29:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Simulation of 4-state MTJ crossbars achieves 94.48% MNIST accuracy for neural inference, close to 97.56% software baseline, with analysis showing quantization as primary error and an optimal number of states per cell.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.25421","ref_index":57,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"FED-FSTQ: Fisher-Guided Token Quantization for Communication-Efficient Federated Fine-Tuning of LLMs on Edge 
Devices","primary_cat":"cs.LG","submitted_at":"2026-04-28T09:29:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Fed-FSTQ reduces uplink traffic by 46x and improves time-to-accuracy by 52% in federated LLM fine-tuning using Fisher-guided token quantization and selection.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.24940","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"ADE: Adaptive Dictionary Embeddings -- Scaling Multi-Anchor Representations to Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-04-27T19:29:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ADE scales multi-anchor word representations to transformers via Vocabulary Projection, Grouped Positional Encoding, and context-aware reweighting, achieving 98.7% fewer trainable parameters than DeBERTa-v3-base while matching or exceeding it on two text-classification benchmarks and compressing the","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.24805","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"minAction.net: Energy-First Neural Architecture Design -- From Biological Principles to Systematic Validation","primary_cat":"cs.LG","submitted_at":"2026-04-27T06:26:36+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Large-scale experiments show architecture performance depends on task type, not universality, and a single-parameter energy penalty reduces computational energy by ~1000x with negligible accuracy 
cost.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20079","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"On the Quantization Robustness of Diffusion Language Models in Coding Benchmarks","primary_cat":"cs.LG","submitted_at":"2026-04-22T00:53:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Diffusion coding model CoDA shows smaller accuracy drops than Qwen3-1.7B under 2-4 bit quantization on HumanEval and MBPP.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.18496","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Homodyne Photonic Tensor Processor exceeds 1,000-TOPS","primary_cat":"cs.ET","submitted_at":"2026-04-20T16:44:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A homodyne photonic tensor processor using TFLN transmitters and Si/SiN circuits demonstrates 1,000-6,000 TOPS throughput with 6-7 bit accuracy at up to 120 Gbaud/s clock rates.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17172","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"UCCL-Zip: Lossless Compression Supercharged GPU Communication","primary_cat":"cs.DC","submitted_at":"2026-04-19T00:05:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"UCCL-Zip adds lossless compression to GPU communication to reduce LLM bottlenecks while preserving exact numerical correctness.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"tensively explored quantization and other lossy compression techniques 
to reduce data volume [1, 22, 30, 44, 48, 50]. While effective in improving bandwidth utilization, these approaches inevitably introduce numerical errors. These errors can slow convergence during training or degrade model accuracy at inference time, and may also cause training-inference mismatch in RL [18]. More recently, lossless compression techniques on GPUs have begun to emerge [3, 17, 19, 34, 37, 52], demonstrating the potential to reduce data size without sacrificing numerical fidelity. This observation opens up a promising opportunity: applying lossless compression to GPU communication. However, despite its potential, this direction remains largely under-"},{"citing_arxiv_id":"2604.16113","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Co-Design of CNN Accelerators for TinyML using Approximate Matrix Decomposition","primary_cat":"cs.AR","submitted_at":"2026-04-17T14:49:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A co-design framework using approximate matrix decomposition and genetic algorithms delivers 33% average latency reduction in TinyML CNN FPGA accelerators with 1.3% average accuracy loss versus standard systolic arrays.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"devices [12], where inference must be performed under strict latency constraints. To enable deployment and reduce inference latency under such constraints, CNNs are designed to be shallower [13] and architecturally simpler [14], with a reduced number of parameters [15, 16]. Techniques such as pruning and quantization have been widely adopted to reduce the hardware footprint of ML models [17]. Similarly, TinyML acceleration has focused on developing hardware support for such techniques, including native computation at lower bitwidths [18] or sparse processing [19]. 
However, these approaches still heavily rely on multiplication operations, which naturally introduce performance and resource overheads. Multiplier-less inference has emerged as a promising"},{"citing_arxiv_id":"2604.09544","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism","primary_cat":"cs.CL","submitted_at":"2026-04-10T17:58:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Harmful generation in LLMs relies on a compact, unified set of weights that alignment compresses and that are distinct from benign capabilities, explaining emergent misalignment.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08847","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"DeFakeQ: Enabling Real-Time Deepfake Detection on Edge Devices via Adaptive Bidirectional Quantization","primary_cat":"cs.CV","submitted_at":"2026-04-10T01:09:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DeFakeQ introduces an adaptive bidirectional quantization method tailored for deepfake detectors that maintains detection accuracy while enabling real-time performance on resource-constrained edge devices.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.07868","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"On the Decompositionality of Neural Networks","primary_cat":"cs.LO","submitted_at":"2026-04-09T06:32:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Neural decompositionality is defined via decision-boundary semantic preservation, and 
language transformers largely satisfy it under SAVED while vision models often do not.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"Concretely, for each component 𝑚𝑘, we associate a mask 𝑀𝑘 ∈ {0, 1}^𝑁 over the units of the original model, where 𝑀𝑘(𝑖) = 1 indicates that unit 𝑖 is utilized by component 𝑚𝑘. The structural support 𝑆𝑘 is then defined as 𝑆𝑘 := {𝑖 | 𝑀𝑘(𝑖) = 1}. The masks {𝑀𝑘} can be obtained through standard mask-learning or sparsification techniques, such as magnitude-based pruning [13], gradient-based gating, or learned binary masking with straight-through estimators [2]. In practice, we jointly optimize the masks with respect to (i) task performance and (ii) sparsity or separation regularizers that encourage disjointness across components. 3.4 Global Decompositionality We combine the two aforementioned concepts into a unified definition of decompositionality."},{"citing_arxiv_id":"2604.04493","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models","primary_cat":"cs.LG","submitted_at":"2026-04-06T07:36:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SLaB compresses LLM weights via sparse-lowrank-binary decomposition guided by activation-aware scores, achieving up to 36% lower perplexity than prior methods at 50% compression on Llama models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.21651","ref_index":4,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Rethinking Output Alignment For 1-bit Post-Training Quantization of Large Language 
Models","primary_cat":"cs.LG","submitted_at":"2025-12-25T12:39:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A post-training quantization technique for 1-bit LLMs that corrects layer-wise error accumulation and anisotropic representation distortion to preserve output behavior more effectively than existing methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.02010","ref_index":35,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling","primary_cat":"cs.CL","submitted_at":"2025-12-01T18:59:45+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Four Over Six adaptively scales blocks in NVFP4 quantization to smaller FP4 values, making representable value distributions more uniform and reducing quantization error especially for near-maximal values.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.12340","ref_index":8,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"LILogic Net: Compact Logic Gate Networks with Learnable Connectivity for Efficient Hardware Deployment","primary_cat":"cs.LG","submitted_at":"2025-11-15T19:44:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LILogicNet trains compact logic-gate networks with learnable sparse connectivity via Top-K selection, reaching 98.45% MNIST accuracy with 8k gates and 60.98% CIFAR-10 accuracy with 256k gates while using far fewer gates than prior logic 
models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.04776","ref_index":15,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Quantifying the Climate Risk of Generative AI: Region-Aware Carbon Accounting with G-TRACE and the AI Sustainability Pyramid","primary_cat":"cs.CY","submitted_at":"2025-11-06T19:52:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"G-TRACE provides region-aware estimates of GenAI carbon emissions including 4309 MWh and 2068 tCO2 for a 2024-2025 image generation trend, paired with a seven-level AI Sustainability Pyramid for policy guidance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.09696","ref_index":5,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Vanishing Contributions: A Unified Framework for Smooth and Iterative Model Compression","primary_cat":"cs.LG","submitted_at":"2025-10-09T15:17:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"UNKNOWN","novelty_score":5.0,"formal_verification":"none","one_line_summary":"VCON is a unified framework for smooth iterative DNN compression that uses parallel execution and an affine combination to progressively replace the original model with its compressed form during fine-tuning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.12876","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on LLMs","primary_cat":"cs.LG","submitted_at":"2025-06-15T15:02:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MaskPro learns categorical distributions over groups of M weights to generate 
exact (N:M) sparsity via N-way sampling without replacement and stabilizes training with a moving average tracker of loss residuals.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2504.06307","ref_index":11,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Optimizing Large Language Models: Metrics, Energy Efficiency, and Case Study Insights","primary_cat":"cs.LG","submitted_at":"2025-04-07T21:56:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"Quantization and local inference reduce LLM energy consumption and emissions by up to 45% in a presented case study.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2412.18091","ref_index":11,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AutoSculpt: A Pattern-based Model Auto-pruning Framework Using Reinforcement Learning and Graph Learning","primary_cat":"cs.AI","submitted_at":"2024-12-24T02:05:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"MODERATE","novelty_score":5.0,"formal_verification":"none","one_line_summary":"AutoSculpt models DNNs as graphs, embeds pruning patterns, and uses deep reinforcement learning to reach up to 90% pruning and 18% better FLOPs reduction than baselines on ResNet, MobileNet, VGG, and Vision Transformers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2408.00923","ref_index":2,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization","primary_cat":"cs.CV","submitted_at":"2024-08-01T21:27:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CoRa reclaims quantization residuals in pre-trained ConvNets by 
searching low-rank adapter architectures instead of weights, matching SOTA accuracy on ImageNet in 3-4 bit settings with under 250 iterations on 1600 images.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2407.15389","ref_index":42,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Poisoning with A Pill: Circumventing Detection in Federated Learning","primary_cat":"cs.LG","submitted_at":"2024-07-22T05:34:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A three-stage pill-based augmentation makes existing FL poisoning attacks evade popular defenses while raising error rates up to 7x on both IID and non-IID data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2310.12508","ref_index":195,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation","primary_cat":"cs.LG","submitted_at":"2023-10-19T06:17:17+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2306.14048","ref_index":55,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models","primary_cat":"cs.LG","submitted_at":"2023-06-24T20:11:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"H2O evicts 
non-heavy-hitter tokens from the KV cache using a dynamic submodular policy, retaining recent and frequent-co-occurrence tokens to reduce memory while preserving accuracy.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[53] Chi Han, Qifan Wang, Wenhan Xiong, Yu Chen, Heng Ji, and Sinong Wang. Lm-infinite: Simple on-the-fly length generalization for large language models. arXiv preprint arXiv:2308.16137, 2023. [54] Jack W Rae, Anna Potapenko, Siddhant M Jayakumar, and Timothy P Lillicrap. Compressive transformers for long-range sequence modelling. In The International Conference on Learning Representations (ICLR), 2020. [55] Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149, 2015. [56] Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. Quantization and training of neural networks for"},{"citing_arxiv_id":"2305.05176","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance","primary_cat":"cs.LG","submitted_at":"2023-05-09T05:11:02+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FrugalGPT learns query-specific cascades across heterogeneous LLM APIs to match or exceed top-model accuracy at far lower cost.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[XLS+22] Guangxuan Xiao, Ji Lin, Mickael Seznec, Julien Demouth, and Song Han. Smoothquant: Accurate and efficient post-training quantization for large language models. arXiv preprint arXiv:2211.10438, 2022. [YLLL14] Fan Yang, Xuan Li, Qianmu Li, and Tao Li. 
Exploring the diversity in cluster ensemble generation: Random sampling and random projection. Expert Systems with Applications, 41(10):4844-4866, 2014. [YLW+23] Zhewei Yao, Cheng Li, Xiaoxia Wu, Stephen Youn, and Yuxiong He. A comprehensive study on post-training quantization for large language models. arXiv preprint arXiv:2303.08302, 2023. [ZGA+21] Lucia Zheng, Neel Guha, Brandon R Anderson, Peter Henderson, and Daniel E Ho. When does pretraining help? assessing self-supervised learning for law and the casehold dataset"}],"limit":50,"offset":0}