The paper defines accidental meltdowns as unsafe agent behavior triggered by benign errors and reports that such meltdowns occur in 64.7% of evaluated rollouts across GPT, Grok, and Gemini agents.
Hilbert’s sixth problem: derivation of fluid equations via Boltzmann’s kinetic theory
Canonical reference. 75% of citing Pith papers cite this work as background.
representative citing papers
SeePhys Pro benchmark reveals multimodal models degrade on physics reasoning as information transfers from text to images, with blind training improvements often stemming from textual cues rather than visual evidence.
A new large-scale triplet dataset and diffusion transformer model using coarse human masks deliver improved video virtual try-on quality and generalization in challenging real-world conditions.
A custom LLM agent achieves 94% manually verified success on a new benchmark of 35 software analysis setups, outperforming baselines at 77%, but struggles with stage mixing, error localization, and overestimating its own success.
A simulation-based inference framework that jointly models type Ia supernovae brightness dependences, host galaxy evolution, and cosmology from photometric observations.
Uses VLMs to detect instance concepts and LLMs to infer abstract relationships, assembling them into 3D scene graph forests that are evaluated on uHumans2 and ScanNet and tested in open-vocabulary retrieval on a Spot robot.
Noise from quantum hardware simulators significantly alters mutant detection distances, making equivalent mutants harder to separate from faults, with output-distribution metrics reaching 73.03% accuracy and 74.89% F1-score under device-specific thresholds.
Adaptive elastic net SAEs (AEN-SAEs) mitigate feature starvation in SAEs by combining ℓ2 structural stability with adaptive ℓ1 reweighting, producing a Lipschitz-continuous sparse coding map that recovers global feature support under mild assumptions.
SSL representation disentangles skill scheduling, structure, and logic using an LLM normalizer, improving skill discovery MRR@50 from 0.649 to 0.729 and risk assessment macro F1 from 0.409 to 0.509 over text baselines.
Fine-tuned multimodal LLMs predict mouse social dominance from raw tube test videos with high agreement to traditional rankings.
A qualitative study with 22 creative writers finds that the reflective value of AI refusals depends on alignment with users' situational thinking phases, cognitive beliefs, and views of AI roles.
SPT-3G delivers the most precise CMB EE and TE spectra at high multipoles to date, giving LCDM parameters with H0 = 66.66 ± 0.60 km/s/Mpc from ground-based data alone and reaching Planck-level constraints when combined with ACT.
Systematic ablation of TrOCR fine-tuning for medieval HTR finds that freezing up to three encoder or six decoder layers does not significantly harm accuracy and that removing CLAHE contrast normalization yields comparable 7.84% CER on the Cortonese manuscript.
This survey categorizes agentic environments for LLMs by eight attributes and domains, introduces symbolic and neural synthesis paradigms with evaluation, and outlines four agent evolution pathways plus three environment evolution paradigms.
VLA-GSE uses spectral decomposition of the VLA backbone to create generalized and specialized experts, enabling effective robot task adaptation while updating only 2.51% of parameters and achieving 81.2% zero-shot success on LIBERO-Plus.
Eywa enables language-based agentic AI systems to collaborate with specialized scientific foundation models for improved performance on structured data tasks.
A hybrid tensor network framework interpolates between classical and quantum models via controllable post-selection, with a trainable hyperparameter that complements bond dimension to enhance quantum machine learning.
MeerKLASS reports successful single-dish HI intensity mapping detections that validate the technique for SKAO cosmology surveys.
Proposes SKA observations of El Gordo to study high-redshift ICM magnetic fields via continuum and polarization measurements and test amplification models.
The paper reviews conceptual foundations, methodological innovations, effective designs, critical challenges, and future directions for LLM-based Agentic Reinforcement Learning.
A synthesis of van der Waals Josephson junction research showing how 2D material diversity and symmetry control open routes to novel quantum devices and sensors.
citing papers explorer
Beyond Compliance: How AI Could Help Creative Writers by Refusing Them