JURECA: Data Centric and Booster modules implementing the modular supercomputing architecture at Jülich Supercomputing Centre
4 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 4
representative citing papers
SBI matches MCMC posterior accuracy on a SECIR model but runs 15-120 times faster on GPU for 31-day and 201-day inference windows.
The work shows a gate-tunable interference effect in a voltage-biased triple quantum dot system that produces a minimum in charge current coinciding with a maximum in energy transfer to a capacitively coupled damped resonator representing a vibrational mode.
Engineering report detailing HPC infrastructure, software choices, and performance measurements for training a 7B LLM using 3D parallelism on JUWELS Booster.
citing papers explorer
- GRINQH: Graded Input-based Quantization Hierarchy for Efficient LLM Generation
GRINQH introduces a graded input-based quantization hierarchy that dynamically assigns multi-precision weights, using activation magnitudes as an importance proxy. It unifies quantization with sparsification to improve the trade-off between decoding speed and quality for LLMs on Llama3 and Qwen3 models.
- Simulation-based inference for rapid Bayesian parameter estimation in epidemiological models: a comparison with MCMC
SBI matches MCMC posterior accuracy on a SECIR model but runs 15-120 times faster on GPU for 31-day and 201-day inference windows.