MediQAl is a new French medical QA benchmark with 32k exam-sourced questions in three formats and cognitive labels, evaluated on 14 LLMs to reveal gaps between factual recall and reasoning performance.
arXiv preprint arXiv:2211.02001 , year=
7 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
Full development of 7B and 32B Olmo 3 models used 12.3 GWh datacenter energy and emitted 4,251 tCO2eq, with development overheads accounting for 82% of compute and reasoning models costing 17x more to post-train than instruction-tuned ones.
PixArt-α matches commercial text-to-image quality with a diffusion transformer trained in 675 A100 GPU days through decomposed training stages, cross-attention text injection, and vision-language model dense captions.
BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.
An adapted scaling law predicts GPU energy consumption for diffusion model inference with R² > 0.9 within architectures and strong cross-architecture generalization.
A review of AI sustainability studies finds inconsistent life cycle definitions and predominant reliance on coarse CO2e proxies, with limited coverage of water, materials, and multi-impact assessments.
Empirical comparison shows smaller open-weight LLMs achieve strong performance on everyday work tasks, supporting task-aware selection over always using the largest models for sustainability and cost reasons.
citing papers explorer
-
MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation
MediQAl is a new French medical QA benchmark with 32k exam-sourced questions in three formats and cognitive labels, evaluated on 14 LLMs to reveal gaps between factual recall and reasoning performance.
-
The Hidden Cost of Thinking: Energy Use and Environmental Impact of LMs Beyond Pretraining
Full development of 7B and 32B Olmo 3 models used 12.3 GWh datacenter energy and emitted 4,251 tCO2eq, with development overheads accounting for 82% of compute and reasoning models costing 17x more to post-train than instruction-tuned ones.
-
PixArt-$\alpha$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
PixArt-α matches commercial text-to-image quality with a diffusion transformer trained in 675 A100 GPU days through decomposed training stages, cross-attention text injection, and vision-language model dense captions.
-
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.
-
Energy Scaling Laws for Diffusion Models: Quantifying Compute in Image Generation
An adapted scaling law predicts GPU energy consumption for diffusion model inference with R² > 0.9 within architectures and strong cross-architecture generalization.
-
From Cradle to Cloud: A Life Cycle Review of AI's Environmental Footprint
A review of AI sustainability studies finds inconsistent life cycle definitions and predominant reliance on coarse CO2e proxies, with limited coverage of water, materials, and multi-impact assessments.
-
Sustainability via LLM Right-sizing
Empirical comparison shows smaller open-weight LLMs achieve strong performance on everyday work tasks, supporting task-aware selection over always using the largest models for sustainability and cost reasons.