Mixed Precision Training

Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia · 2017 · cs.AI · arXiv 1710.03740

Mixed citation behavior. Most common role is background (67%).

52 Pith papers citing it

Background 67% of classified citations

abstract

Deep neural networks have enabled progress in a wide variety of applications. Growing the size of the neural network typically results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models also increases. We introduce a technique to train deep neural networks using half precision floating point numbers. In our technique, weights, activations and gradients are stored in IEEE half-precision format. Half-precision floating numbers have limited numerical range compared to single-precision numbers. We propose two techniques to handle this loss of information. Firstly, we recommend maintaining a single-precision copy of the weights that accumulates the gradients after each optimizer step. This single-precision copy is rounded to half-precision format during training. Secondly, we propose scaling the loss appropriately to handle the loss of information with half-precision gradients. We demonstrate that this approach works for a wide variety of models including convolution neural networks, recurrent neural networks and generative adversarial networks. This technique works for large scale models with more than 100 million parameters trained on large datasets. Using this approach, we can reduce the memory consumption of deep learning models by nearly 2x. In future processors, we can also expect a significant computation speedup using half-precision hardware units.
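The two techniques named in the abstract can be illustrated with a small numeric sketch. This is not the authors' implementation: it only emulates IEEE half and single precision rounding with Python's standard `struct` module. The helper names (`to_fp16`, `train_fp16_only`, `train_with_master_copy`, `gradient_seen_in_fp16`), the update size of 1e-4, and the loss scale of 1024 are all illustrative assumptions, not values taken from the paper.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def train_fp16_only(w0: float, update: float, steps: int) -> float:
    """Apply the same small update repeatedly, keeping the weight in FP16 only."""
    w = to_fp16(w0)
    for _ in range(steps):
        # Each sum is rounded back to FP16, so an update smaller than half
        # the FP16 spacing around w is lost entirely.
        w = to_fp16(w + to_fp16(update))
    return w


def train_with_master_copy(w0: float, update: float, steps: int):
    """Accumulate updates in an FP32 master weight, then round it to FP16."""
    master = to_fp32(w0)
    for _ in range(steps):
        master = to_fp32(master + to_fp32(update))
    # The FP16 value is what the forward and backward passes would use.
    return master, to_fp16(master)


def gradient_seen_in_fp16(true_grad: float, loss_scale: float = 1.0) -> float:
    """Store a (scaled) gradient in FP16, then unscale it in FP32.

    Multiplying the gradient here stands in for scaling the loss before
    backpropagation (hypothetical simplification for a single value).
    """
    stored = to_fp16(true_grad * loss_scale)
    return to_fp32(stored / loss_scale)


if __name__ == "__main__":
    print(train_fp16_only(1.0, 1e-4, 100))          # weight never moves
    print(train_with_master_copy(1.0, 1e-4, 100))   # master weight drifts upward
    print(gradient_seen_in_fp16(1e-8))              # underflows to zero
    print(gradient_seen_in_fp16(1e-8, 1024.0))      # survives after scaling
```

In this sketch the FP16-only weight stays at 1.0 because each update is smaller than half the FP16 spacing near 1.0, while the FP32 master copy accumulates the updates and only then is rounded. Likewise, a gradient of 1e-8 rounds to zero in FP16 unless it is scaled up first and unscaled afterwards in FP32.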

citation-role summary

background: 10 · method: 4 · dataset: 1

citation-polarity summary

background: 10 · use method: 4 · use dataset: 1

representative citing papers

Efficient Training on Multiple Consumer GPUs with RoundPipe

cs.DC · 2026-04-29 · conditional · novelty 8.0

RoundPipe achieves near-zero-bubble pipeline parallelism for LLM training on consumer GPUs by dynamically dispatching computation stages round-robin, yielding 1.48-2.16x speedups and enabling 235B model fine-tuning on 8x RTX 4090.

ImplicitTerrainV2: Wavelet-Guided Spatially Adaptive Neural Terrain Representation

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

A wavelet-guided adaptive INR for DEMs achieves 66.25 dB PSNR on Swiss tiles with 3.2x fewer parameters than prior work, plus post-training compression to 1.23 bpp.

TransDot: An Area-efficient Reconfigurable Floating-Point Unit for Trans-Precision Dot-Product Accumulation for FPGA AI Engines

cs.AR · 2026-05-08 · unverdicted · novelty 7.0

TransDot unifies SIMD FMA and trans-precision DPA in one reconfigurable FPU, achieving 2x FP16, 4x FP8, and 8x FP4 throughput with FP32 accumulation plus 1.46x to 2.92x area efficiency gains over the FPnew baseline.

Mass Matrix Assembly on Tensor Cores for Implicit Particle-In-Cell Methods

cs.CE · 2026-04-21 · unverdicted · novelty 7.0

Mass matrix assembly for implicit PIC methods can be exactly reformulated cell-by-cell as tensor-core matrix products, delivering up to 3x kernel speedup and 15% end-to-end runtime reduction in ECSIM simulations.

From Characterization to Microarchitecture: Designing an Elegant and Reliable BFP-Based NPU

cs.AR · 2026-04-12 · unverdicted · novelty 7.0

A BFP NPU microarchitecture using row/column blocking and per-path protections achieves near-DMR reliability at 3.55% geometric mean performance overhead and under 2% hardware cost.

Dual Triangle Attention: Effective Bidirectional Attention Without Positional Embeddings

q-bio.QM · 2026-04-09 · unverdicted · novelty 7.0

Dual Triangle Attention achieves effective bidirectional attention with built-in positional inductive bias via dual triangular masks, outperforming standard bidirectional attention on position-sensitive tasks and showing strong masked language modeling results with or without positional embeddings.

Multimodal Reasoning with LLM for Encrypted Traffic Interpretation: A Benchmark

cs.CR · 2026-04-09 · unverdicted · novelty 7.0

Creates the BGTD benchmark and mmTraffic architecture to enable explainable multimodal interpretation of encrypted network traffic using LLMs.

Hidden Reliability Risks in Large Language Models: Systematic Identification of Precision-Induced Output Disagreements

cs.AI · 2026-04-02 · unverdicted · novelty 7.0

PrecisionDiff is a differential testing framework that uncovers widespread precision-induced behavioral disagreements in aligned LLMs, including safety-critical jailbreak divergences across precision formats.

When RL Meets Adaptive Speculative Training: A Unified Training-Serving System

cs.LG · 2026-02-06 · conditional · novelty 7.0

Aurora unifies speculative decoder training and serving via asynchronous RL on inference traces, delivering 1.5x day-0 speedup on frontier models and 1.25x adaptation gains on distribution shifts.

Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling

cs.CL · 2025-12-01 · conditional · novelty 7.0

Four Over Six adaptively scales blocks in NVFP4 quantization to smaller FP4 values, making representable value distributions more uniform and reducing quantization error especially for near-maximal values.

Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention

cs.LG · 2025-10-05 · unverdicted · novelty 7.0

Low-precision Flash Attention fails due to similar low-rank attention representations combined with biased rounding errors that accumulate and corrupt weight updates; a minimal fix to reduce rounding bias stabilizes training.

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

cs.LG · 2022-08-15 · conditional · novelty 7.0

LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.

OPT: Open Pre-trained Transformer Language Models

cs.CL · 2022-05-02 · unverdicted · novelty 7.0

OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

cs.CV · 2021-12-20 · accept · novelty 7.0

A 3.5-billion-parameter diffusion model with classifier-free guidance generates images preferred over DALL-E by human raters and can be fine-tuned for text-guided inpainting.

Diffusion Models Beat GANs on Image Synthesis

cs.LG · 2021-05-11 · accept · novelty 7.0

Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

cs.LG · 2021-01-11 · accept · novelty 7.0

Switch Transformers use top-1 expert routing in a Mixture of Experts setup to scale to trillion-parameter language models with constant compute and up to 4x speedup over T5-XXL.

Generating Long Sequences with Sparse Transformers

cs.LG · 2019-04-23 · unverdicted · novelty 7.0

Sparse Transformers factorize attention to handle sequences tens of thousands long, achieving new SOTA density modeling on Enwik8, CIFAR-10, and ImageNet-64.

LoKA: Low-precision Kernel Applications for Recommendation Models At Scale

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

LoKA enables practical FP8 use in numerically sensitive large recommendation models via online profiling of activations, reusable model modifications for stability, and dynamic kernel dispatching.

CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification

cs.CL · 2026-05-05 · unverdicted · novelty 6.0

CuraView detects sentence-level faithfulness hallucinations in medical discharge summaries via GraphRAG knowledge graphs and multi-agent evidence grading, achieving 0.831 F1 on critical contradictions with a fine-tuned Qwen3-14B model and 50% relative improvement over baselines.

Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study

cs.SE · 2026-04-27 · unverdicted · novelty 6.0

Fine-tuning 7B code LLMs on a custom multi-file DSL dataset achieves structural fidelity of 1.00, high exact-match accuracy, and practical utility validated by expert survey and execution checks.

Training Time Prediction for Mixed Precision-based Distributed Training

cs.LG · 2026-04-17 · unverdicted · novelty 6.0

A precision-aware predictor for distributed training time achieves 9.8% MAPE across precision settings, compared to errors up to 147.85% when precision is ignored.

The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference

cs.LG · 2026-04-16 · unverdicted · novelty 6.0

FP16 KV caching in transformers causes deterministic token divergence versus cache-free inference due to non-associative floating-point accumulation orderings.

SHARE: Social-Humanities AI for Research and Education

cs.CL · 2026-04-13 · unverdicted · novelty 6.0

SHARE models are the first causal LMs pretrained exclusively for SSH and match general models like Phi-4 on SSH texts despite using 100 times fewer tokens, paired with a non-generative MIRROR interface to support scholarly review.

LLM-PRISM: Characterizing Silent Data Corruption from Permanent GPU Faults in LLM Training

cs.AR · 2026-04-12 · unverdicted · novelty 6.0

LLMs resist low-frequency permanent GPU faults but certain datapaths and precision formats trigger catastrophic training divergence even at moderate fault rates.

citing papers explorer

Showing 50 of 52 citing papers.

Efficient Training on Multiple Consumer GPUs with RoundPipe · cs.DC · 2026-04-29 · conditional · none · ref 32 · internal anchor
RoundPipe achieves near-zero-bubble pipeline parallelism for LLM training on consumer GPUs by dynamically dispatching computation stages round-robin, yielding 1.48-2.16x speedups and enabling 235B model fine-tuning on 8x RTX 4090.
ImplicitTerrainV2: Wavelet-Guided Spatially Adaptive Neural Terrain Representation · cs.LG · 2026-05-21 · unverdicted · none · ref 56 · internal anchor
A wavelet-guided adaptive INR for DEMs achieves 66.25 dB PSNR on Swiss tiles with 3.2x fewer parameters than prior work, plus post-training compression to 1.23 bpp.
TransDot: An Area-efficient Reconfigurable Floating-Point Unit for Trans-Precision Dot-Product Accumulation for FPGA AI Engines · cs.AR · 2026-05-08 · unverdicted · none · ref 7 · internal anchor
TransDot unifies SIMD FMA and trans-precision DPA in one reconfigurable FPU, achieving 2x FP16, 4x FP8, and 8x FP4 throughput with FP32 accumulation plus 1.46x to 2.92x area efficiency gains over the FPnew baseline.
Mass Matrix Assembly on Tensor Cores for Implicit Particle-In-Cell Methods · cs.CE · 2026-04-21 · unverdicted · none · ref 2 · internal anchor
Mass matrix assembly for implicit PIC methods can be exactly reformulated cell-by-cell as tensor-core matrix products, delivering up to 3x kernel speedup and 15% end-to-end runtime reduction in ECSIM simulations.
From Characterization to Microarchitecture: Designing an Elegant and Reliable BFP-Based NPU · cs.AR · 2026-04-12 · unverdicted · none · ref 43 · internal anchor
A BFP NPU microarchitecture using row/column blocking and per-path protections achieves near-DMR reliability at 3.55% geometric mean performance overhead and under 2% hardware cost.
Dual Triangle Attention: Effective Bidirectional Attention Without Positional Embeddings · q-bio.QM · 2026-04-09 · unverdicted · none · ref 61 · internal anchor
Dual Triangle Attention achieves effective bidirectional attention with built-in positional inductive bias via dual triangular masks, outperforming standard bidirectional attention on position-sensitive tasks and showing strong masked language modeling results with or without positional embeddings.
Multimodal Reasoning with LLM for Encrypted Traffic Interpretation: A Benchmark · cs.CR · 2026-04-09 · unverdicted · none · ref 26 · internal anchor
Creates the BGTD benchmark and mmTraffic architecture to enable explainable multimodal interpretation of encrypted network traffic using LLMs.
Hidden Reliability Risks in Large Language Models: Systematic Identification of Precision-Induced Output Disagreements · cs.AI · 2026-04-02 · unverdicted · none · ref 30 · internal anchor
PrecisionDiff is a differential testing framework that uncovers widespread precision-induced behavioral disagreements in aligned LLMs, including safety-critical jailbreak divergences across precision formats.
When RL Meets Adaptive Speculative Training: A Unified Training-Serving System · cs.LG · 2026-02-06 · conditional · none · ref 20 · internal anchor
Aurora unifies speculative decoder training and serving via asynchronous RL on inference traces, delivering 1.5x day-0 speedup on frontier models and 1.25x adaptation gains on distribution shifts.
Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling · cs.CL · 2025-12-01 · conditional · none · ref 1 · internal anchor
Four Over Six adaptively scales blocks in NVFP4 quantization to smaller FP4 values, making representable value distributions more uniform and reducing quantization error especially for near-maximal values.
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention · cs.LG · 2025-10-05 · unverdicted · none · ref 15 · internal anchor
Low-precision Flash Attention fails due to similar low-rank attention representations combined with biased rounding errors that accumulate and corrupt weight updates; a minimal fix to reduce rounding bias stabilizes training.
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale · cs.LG · 2022-08-15 · conditional · none · ref 61 · internal anchor
LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.
OPT: Open Pre-trained Transformer Language Models · cs.CL · 2022-05-02 · unverdicted · none · ref 262 · internal anchor
OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models · cs.CV · 2021-12-20 · accept · none · ref 16 · internal anchor
A 3.5-billion-parameter diffusion model with classifier-free guidance generates images preferred over DALL-E by human raters and can be fine-tuned for text-guided inpainting.
Diffusion Models Beat GANs on Image Synthesis · cs.LG · 2021-05-11 · accept · none · ref 38 · internal anchor
Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity · cs.LG · 2021-01-11 · accept · none · ref 21 · internal anchor
Switch Transformers use top-1 expert routing in a Mixture of Experts setup to scale to trillion-parameter language models with constant compute and up to 4x speedup over T5-XXL.
Generating Long Sequences with Sparse Transformers · cs.LG · 2019-04-23 · unverdicted · none · ref 16 · internal anchor
Sparse Transformers factorize attention to handle sequences tens of thousands long, achieving new SOTA density modeling on Enwik8, CIFAR-10, and ImageNet-64.
LoKA: Low-precision Kernel Applications for Recommendation Models At Scale · cs.LG · 2026-05-11 · unverdicted · none · ref 52 · 2 links · internal anchor
LoKA enables practical FP8 use in numerically sensitive large recommendation models via online profiling of activations, reusable model modifications for stability, and dynamic kernel dispatching.
CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification · cs.CL · 2026-05-05 · unverdicted · none · ref 51 · internal anchor
CuraView detects sentence-level faithfulness hallucinations in medical discharge summaries via GraphRAG knowledge graphs and multi-agent evidence grading, achieving 0.831 F1 on critical contradictions with a fine-tuned Qwen3-14B model and 50% relative improvement over baselines.
Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study · cs.SE · 2026-04-27 · unverdicted · none · ref 28 · internal anchor
Fine-tuning 7B code LLMs on a custom multi-file DSL dataset achieves structural fidelity of 1.00, high exact-match accuracy, and practical utility validated by expert survey and execution checks.
Training Time Prediction for Mixed Precision-based Distributed Training · cs.LG · 2026-04-17 · unverdicted · none · ref 11 · internal anchor
A precision-aware predictor for distributed training time achieves 9.8% MAPE across precision settings, compared to errors up to 147.85% when precision is ignored.
The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference · cs.LG · 2026-04-16 · unverdicted · none · ref 16 · internal anchor
FP16 KV caching in transformers causes deterministic token divergence versus cache-free inference due to non-associative floating-point accumulation orderings.
SHARE: Social-Humanities AI for Research and Education · cs.CL · 2026-04-13 · unverdicted · none · ref 3 · internal anchor
SHARE models are the first causal LMs pretrained exclusively for SSH and match general models like Phi-4 on SSH texts despite using 100 times fewer tokens, paired with a non-generative MIRROR interface to support scholarly review.
LLM-PRISM: Characterizing Silent Data Corruption from Permanent GPU Faults in LLM Training · cs.AR · 2026-04-12 · unverdicted · none · ref 27 · internal anchor
LLMs resist low-frequency permanent GPU faults but certain datapaths and precision formats trigger catastrophic training divergence even at moderate fault rates.
MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning · cs.LG · 2026-04-10 · unverdicted · none · ref 141 · internal anchor
MP-ISMoE uses Gaussian noise perturbed iterative quantization and interactive side mixture-of-experts to deliver higher accuracy than prior memory-efficient transfer learning methods while keeping similar parameter and memory usage.
FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control · cs.LG · 2026-04-06 · unverdicted · none · ref 52 · 2 links · internal anchor
FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.
Neural Harmonic Textures for High-Quality Primitive Based Neural Reconstruction · cs.CV · 2026-04-01 · unverdicted · none · ref 40 · internal anchor
Neural Harmonic Textures add periodic feature interpolation and deferred neural decoding to primitive representations, achieving state-of-the-art real-time novel-view synthesis and bridging primitive and neural-field methods.
M²RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling · cs.LG · 2026-03-15 · unverdicted · none · ref 22 · internal anchor
M²RNN achieves perfect state tracking at unseen lengths and outperforms Gated DeltaNet hybrids by 0.4-0.5 perplexity on 7B models with 3x smaller recurrent states.
Recovering Sub-threshold S-wave Arrivals in Deep Learning Phase Pickers via Shape-Aware Loss · physics.geo-ph · 2025-11-10 · unverdicted · none · ref 18 · internal anchor
A shape-aware loss strategy recovers sub-threshold S-wave arrivals in deep learning seismic phase pickers by treating labels as coherent shapes, achieving a 64% increase in effective detections.
Faster and Memory-Efficient Training of Sequential Recommendation Models for Large Catalogs · cs.IR · 2025-08-13 · accept · none · ref 23 · internal anchor
CCE- is a Triton kernel implementation of cross-entropy loss with negative sampling that reduces memory by more than 10x and accelerates training by up to 2x for large-catalog sequential recommenders.
Shap-E: Generating Conditional 3D Implicit Functions · cs.CV · 2023-05-03 · accept · none · ref 39 · internal anchor
Shap-E encodes 3D assets into implicit function parameters then uses a conditional diffusion model to generate new ones from text, enabling fast multi-representation 3D asset creation.
PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel · cs.DC · 2023-04-21 · unverdicted · none · ref 18 · internal anchor
PyTorch Fully Sharded Data Parallel enables training of significantly larger models than Distributed Data Parallel with comparable speed and near-linear TFLOPS scaling.
FP8 Formats for Deep Learning · cs.LG · 2022-09-12 · unverdicted · none · ref 14 · internal anchor
FP8 formats E4M3 and E5M2 match 16-bit training accuracy on CNNs, RNNs, and Transformers up to 175B parameters without hyperparameter changes.
ST-MoE: Designing Stable and Transferable Sparse Expert Models · cs.CL · 2022-02-17 · unverdicted · none · ref 176 · internal anchor
ST-MoE introduces stability techniques for sparse expert models, allowing a 269B-parameter model to achieve state-of-the-art transfer learning results across reasoning, summarization, and QA tasks at the compute cost of a 32B dense model.
Linformer: Self-Attention with Linear Complexity · cs.LG · 2020-06-08 · conditional · none · ref 10 · internal anchor
Linformer approximates self-attention with a low-rank projection to achieve O(n) time and space complexity while matching Transformer accuracy on standard NLP tasks.
Probing Routing-Conditional Calibration in Attention-Residual Transformers · cs.CV · 2026-05-11 · unverdicted · none · ref 6 · internal anchor
Routing summaries and auxiliary features do not provide stable evidence of conditional miscalibration in AR transformers once confidence-matched baselines, capacity controls, and permutation nulls are applied.
Colinearity Decay: Training Quantization-Friendly ViTs with Outlier Decay · cs.CV · 2026-05-02 · unverdicted · none · ref 34 · internal anchor
Colinearity-Decay regularizer trains ViTs that maintain or improve full-precision accuracy while delivering higher accuracy after low-bit quantization on ImageNet and COCO tasks.
TACO: Efficient Communication Compression of Intermediate Tensors for Scalable Tensor-Parallel LLM Training · cs.DC · 2026-04-27 · unverdicted · none · ref 36 · internal anchor
TACO compresses tensor-parallel intermediate tensors with an adaptive FP8 scheme and fused kernels, yielding up to 1.87X throughput gains on GPT and Qwen models with near-lossless accuracy.
PINNACLE: An Open-Source Computational Framework for Classical and Quantum PINNs · cs.LG · 2026-04-17 · accept · none · ref 46 · internal anchor
PINNACLE is an open-source framework for classical and quantum PINNs that supplies modular training methods and benchmarks showing high sensitivity to architecture choices plus parameter-efficiency gains in some hybrid quantum regimes.
BAAI Cardiac Agent: An intelligent multimodal agent for automated reasoning and diagnosis of cardiovascular diseases from cardiac magnetic resonance imaging · eess.IV · 2026-04-05 · unverdicted · none · ref 45 · internal anchor
BAAI Cardiac Agent automates end-to-end cardiac MRI analysis for seven cardiovascular diseases, achieving AUC >0.93 internally and >0.81 externally with high correlation to expert measurements.
Energy Scaling Laws for Diffusion Models: Quantifying Compute in Image Generation · cs.LG · 2025-11-21 · unverdicted · none · ref 21 · internal anchor
An adapted scaling law predicts GPU energy consumption for diffusion model inference with R² > 0.9 within architectures and strong cross-architecture generalization.
Beyond Sunk Costs: Boosting LLM Pre-training Efficiency via Orthogonal Growth of Mixture-of-Experts · cs.LG · 2025-10-09 · unverdicted · none · ref 14 · internal anchor
Orthogonal growth recycles pre-trained MoE checkpoints via layer copying and noisy expert duplication, delivering 10.6% higher accuracy than training from scratch with equivalent extra compute.
SGEMM-cube: Precision-Recovery FP32 GEMM Approximation on Ascend NPUs with FP16 Matrix Engines · cs.DC · 2025-07-31 · unverdicted · none · ref 7 · internal anchor
SGEMM-cube approximates FP32 GEMM on Ascend NPUs via a two-component FP16 splitting strategy, recovering near-FP32 accuracy for moderate-range inputs at up to 65.3 TFLOP/s (77% of the three-GEMM peak).
ProTrain: Efficient LLM Training via Memory-Aware Techniques · cs.DC · 2024-06-12 · unverdicted · none · ref 29 · internal anchor
ProTrain automates memory management for LLM training via cost models from profiling to deliver 1.43x-2.71x throughput gains over state-of-the-art systems without accuracy loss.
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model · cs.CL · 2022-01-28 · unverdicted · none · ref 41 · internal anchor
Trained the largest monolithic 530B-parameter transformer language model to date and reported new state-of-the-art zero- and few-shot results on multiple NLP benchmarks.
Stellar Density Classification and Regression for CSST Multi-color Imaging Using Deep Learning · astro-ph.IM · 2026-05-17 · unverdicted · none · ref 30 · internal anchor
A ResNet-34 classifier achieves 98.83% accuracy on six stellar density categories while a ResNet-50 regressor predicts bright-star counts with 0.0824 dex MAE for CSST image processing.
Assessing Performance and Porting Strategies for Gravitational N-Body Simulations on the RISC-V-Based Tenstorrent Wormhole™ · cs.DC · 2026-05-04 · unverdicted · none · ref 10 · internal anchor
Three scaling strategies for an N-body code on Tenstorrent Wormhole accelerators are compared via execution time and energy measurements, identifying the configuration with the best efficiency-performance balance.
CurEvo: Curriculum-Guided Self-Evolution for Video Understanding · cs.CV · 2026-04-29 · unverdicted · none · ref 56 · internal anchor
CurEvo integrates curriculum guidance into self-evolution to structure autonomous improvement of video understanding models, yielding gains on VideoQA benchmarks.
AdvSynGNN: Structure-Adaptive Graph Neural Nets via Adversarial Synthesis and Self-Corrective Propagation · cs.LG · 2026-02-19 · unverdicted · none · ref 40 · internal anchor
AdvSynGNN uses multi-resolution structural synthesis, contrastive objectives, an adaptive transformer, and an adversarial propagation engine with residual label correction to improve node-level predictions on challenging graph topologies.
Mixed-Precision in adaptive Runge-Kutta method for large ODE systems · math.NA · 2026-05-22 · unverdicted · none · ref 33 · internal anchor
Empirical tests show mixed-precision Bogacki-Shampine 3(2) Runge-Kutta preserves most high-precision accuracy on large ODE systems like Kuramoto and circadian models, with accuracy improving at larger scales.
