{"total":14,"items":[{"citing_arxiv_id":"2606.29100","ref_index":39,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Toward Exascale AI for Science: A Scalable AI Skill for Autonomous Microkinetics Discovery","primary_cat":"cs.CE","submitted_at":"2026-06-27T22:10:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Introduces a scalable AI skill framework for autonomous microkinetics discovery that automates workflows and evaluates surrogate reliability.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.28911","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MALOQ: Massively Accelerated Learning of Operators for Quantum Transport","primary_cat":"cs.LG","submitted_at":"2026-06-27T13:34:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MALOQ introduces a scalable SO(2)-equivariant ML framework with custom kernels and edge-wise graph distribution for predicting large-scale quantum transport operators.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.25693","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Dynamic Load Balancing for Uncertainty Quantification with Applications in Bayesian Inversion","primary_cat":"cs.DC","submitted_at":"2026-06-24T11:02:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A dynamic load balancer for UM-Bridge achieves near-millisecond average node idle time on heterogeneous tsunami simulation workloads in Bayesian inversion without prior workload 
assumptions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.25595","ref_index":6,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Optimizing Semiconductor Device Simulations through Low-Precision Arithmetic","primary_cat":"cs.CE","submitted_at":"2026-06-24T09:02:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The quatrex quantum transport solver achieves up to 51% higher throughput using low-precision formats while maintaining accuracy on realistic semiconductor systems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.25453","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"EmuGEMM: Fused Tensor Core Kernels for Precision Emulation in Matrix Multiplication","primary_cat":"cs.DC","submitted_at":"2026-06-24T06:27:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Fused Tensor Core kernels for Ozaki Schemes I and II achieve up to 83% of INT8 peak throughput and outperform cuBLAS TF32 and ZGEMM on large matrices at comparable accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.12850","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"High-Order Spectral Element Methods for Wave Propagation on ARM Multicore CPU with SME: Optimizations and Implications","primary_cat":"cs.DC","submitted_at":"2026-06-11T03:30:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SME-aware kernel and hybrid execution optimizations for SPECFEM3D on LX2 ARM processors deliver 4-6x speedup and shift the favorable (h,p) operating point to higher orders 
along the dispersion-based iso-accuracy frontier.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.17064","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Tensor network compression using fluid dynamics as a testbed: Analytical foundations in one dimension","primary_cat":"physics.comp-ph","submitted_at":"2026-06-04T11:12:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Tensor networks enable tunable, objective compression of 1D fluid data with lossless reconstruction at high bond dimension and efficient in-compressed-space operations like periodic convolution.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.24682","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Scalable High-Dimensional Bayesian Field Reconstruction with Finite Elements: Application to 3D Porous Media Flow","primary_cat":"cs.CE","submitted_at":"2026-05-23T17:38:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A finite-element variational inference method delivers full-covariance Bayesian field reconstruction at dimensions exceeding 400,000 for 3D porous media flow using sparse precision parameterization from SPDE priors.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.24091","ref_index":66,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Unfolding an Atomistic World: Atomistic Simulation of Reactor Pressure Vessel Steel Across Year-and-Meter 
Scales","primary_cat":"cs.DC","submitted_at":"2026-04-27T06:30:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"UNKNOWN","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AtomWorld enables the first direct atomistic simulation of RPV steel at year-and-meter scales, handling ten-quintillion-atom systems and simulating one service year in 1.71 days with 92-97% scaling efficiency on leadership supercomputers.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"axial position z_v in the Chinese third-generation CAP1400 RPV [65]. The RPV base material is ASME SA508 Grade 3 Class 1, and we adopt a representative composition reported for China domestic A508-3 steel: Fe (bal.), C 0.167 wt.%, Si 0.193 wt.%, Mn 1.35 wt.%, S 0.002 wt.%, P 0.005 wt.%, Cr 0.086 wt.%, Ni 0.738 wt.%, Cu 0.027 wt.%, Mo 0.481 wt.%, and V 0.007 wt.% [66]. The local irradiation condition is prescribed as: ϕ_v = ϕ_inner exp(−µ x_v) f_ϕ(z_v), (11) where ϕ_inner is the reference neutron flux at the inner wall, µ is the through-wall attenuation coefficient, and f_ϕ(z_v) describes the axial flux distribution, which peaks in the core belt region, as illustrated in Fig. 1(b). 
Accordingly, the initial vacancy concentration in voxel v is treated as a function of its local"},{"citing_arxiv_id":"2604.18801","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Preserving Clusters in Error-Bounded Lossy Compression of Particle Data","primary_cat":"cs.LG","submitted_at":"2026-04-20T20:10:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A clustering-aware correction algorithm using spatial partitioning and projected gradient descent preserves single-linkage clusters in lossy-compressed particle data while keeping competitive compression ratios.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"In cosmology, the Hardware/Hybrid Accelerated Cosmology Code (HACC) framework [1] is currently scaling toward tens of trillions of particles to model the universe's evolution. At the Exascale, these simulations generate individual snapshots exceeding 500 TB, with aggregate I/O throughput peaking at over 30 TB/s and cumulative data products reaching the exabyte scale [2]. Similarly, molecular dynamics simulations in biology and materials science now produce trillion-particle snapshots [3] to study complex phenomena such as polymer clustering [4] and shock-induced plasticity [5]. 
Unlike lossless compression, which preserves the data bit-for-bit but yields limited compression ratios (around 2:1 for floating-point data [6]), error-bounded lossy compres-"},{"citing_arxiv_id":"2604.17981","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Efficiently emulating distribution functions in gigaparsec volumes for varying cosmological parameters","primary_cat":"astro-ph.CO","submitted_at":"2026-04-20T09:02:04+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A new overdensity-conditioned emulator trained on small subvolumes from Quijote recovers the global halo mass function via integration over the overdensity distribution at 0.026% of the simulation cost.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"matter power spectrum and transfer functions obtained by rescaling the 𝑧 = 0 outputs from CAMB (Lewis et al. 2000). The gravitational softening length is set to 50 ℎ⁻¹ kpc. The BSQ suite consists of 32,768 simulations varying five standard cosmological parameters (Ω_m, Ω_b, ℎ, 𝑛_𝑠, 𝜎_8), where the cosmological parameters are arranged in a Sobol low-discrepancy sequence with bounds Ω_m ∈ [0.10, 0.50], Ω_b ∈ [0.02, 0.08], ℎ ∈ [0.50, 0.90], 𝑛_𝑠 ∈ [0.80, 1.20], 𝜎_8 ∈ [0.60, 1.00]. The remaining cosmological parameters are fixed to 𝑀_𝜈 = 0 eV, 𝑤 = −1, and Ω_𝑘 = 0. Haloes are identified using the Friends-of-Friends (FoF) algorithm (Davis et al. 1985) with a linking length 𝑏 = 0.2, and no additional unbinding step. 
We train using all haloes containing greater than 20 particles, and use the total mass of the halo as our target variable."},{"citing_arxiv_id":"2604.08812","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Sensor Placement for Tsunami Early Warning via Large-Scale Bayesian Optimal Experimental Design","primary_cat":"cs.DC","submitted_at":"2026-04-09T23:04:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A reformulation of Bayesian OED as dense matrix subset selection plus a pipelined Schur-complement greedy algorithm on hundreds of GPUs enables optimization of 175-sensor networks for billion-degree-of-freedom tsunami models with near-perfect scaling.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"the tsunami source, independent of assumptions about the fault geometry [5], [6]. The 2025 Gordon Bell Prize-winning work [7] showed that tsunami forecasting via inference of extreme-scale spatiotemporal seafloor motion from pressure sensor data, based on high-fidelity physics models, can be achieved in real time. In that work, the authors created a digital twin [8] for tsunami early warning on the Cascadia Subduction Zone (CSZ). The CSZ is a 1000 km long region spanning from Northern California to British Columbia (see Figure 1), where paleoseismic evidence suggests a magnitude 8.0-9.0 megathrust earthquake is overdue [9], [10]. 
Community-driven initiatives and feasibility studies are currently laying the groundwork for the installation of offshore"},{"citing_arxiv_id":"2604.06035","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"cuRAMSES: Scalable AMR Optimizations for Large-Scale Cosmological Simulations","primary_cat":"astro-ph.GA","submitted_at":"2026-04-07T16:30:54+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"M(i, j, k) = Σ_{b=0}^{B−1} [bit_b(i)·2^{3b} + bit_b(j)·2^{3b+1} + bit_b(k)·2^{3b+2}], (9) where B is the number of bits per coordinate and bit_b(n) extracts bit b of integer n. Two key widths are supported via a compile-time flag: a 64-bit key with B = 21 (default, compatible with the Intel ifx compiler), and a 128-bit key with B = 42. At AMR level ℓ, the integer coordinate range is [0, 2^{ℓ−1} n_x), where n_x is the number of root-level cells per dimension. With B = 21 the maximum allowed level is 22 (n_x = 1) or 20 (n_x = 4) while B = 42 extends these to 43 and 41, respectively. The integer coordinates of a grid at level ℓ are computed from its floating-point centre position r_g as i_d = iFloor(2^{ℓ−1} r_{g,d}), d ∈ {x, y, z}, (10) where iFloor is the integer floor function and coordinates are in units of the coarse grid spacing. The integer coordinates use zero-based (C-style) indexing, starting from i_d = 0 even though Fortran arrays are conventionally one-based. Note that AMR does not populate all possible grid positions at a given level since only regions that satisfy the refinement criteria contain grids. The Morton key therefore serves as a unique spatial address for each existing grid. The hash table (Appendix B) stores only the grids that are actually allocated making the look-up cost independent of the total number of potential grid positions at that level. 
The neighbour key in direction j is obtained by decoding, shifting the appropriate coordinate with periodic wrapping, and re-encoding (Appendix A). Parent and child keys follow from 3-bit shifts as M_parent = M >> 3, M_child = (M << 3) | i_child. The Morton keys are stored in a per-level open-addressing hash table that maps keys to grid indices, providing O(1) expected neighbour look-up (Appendix B). Each hash table entry stores a 16-byte Morton key and a 4-byte grid index (20 bytes total). With a maximum filling factor of 1.4 (i.e. 70 per cent occupancy), the aggregate memory footprint across all levels is approximately 1.4 × 20 × N_grids = 28 N_grids byte"},{"citing_arxiv_id":"2603.09038","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Accelerating High-Order Finite Element Simulations at Extreme Scale with FP64 Tensor Cores","primary_cat":"cs.DC","submitted_at":"2026-03-10T00:12:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"FP64 tensor cores accelerate high-order finite-element kernels in MFEM by up to 2x with 83% energy gains and near-perfect weak scaling on exascale hardware.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}