Hilbert’s sixth problem: derivation of fluid equations via Boltzmann’s kinetic theory
Canonical reference. 75% of citing Pith papers cite this work as background.
citing papers explorer
- Agent Meltdowns: The Road to Hell Is Paved with Helpful Agents
The paper defines accidental meltdowns as unsafe agent behavior triggered by benign errors and reports that such meltdowns occur in 64.7% of evaluated rollouts across GPT, Grok, and Gemini agents.
- SeePhys Pro: Diagnosing Modality Transfer and Blind-Training Effects in Multimodal RLVR for Physics Reasoning
SeePhys Pro benchmark reveals multimodal models degrade on physics reasoning as information transfers from text to images, with blind training improvements often stemming from textual cues rather than visual evidence.
- TripVVT: A Large-Scale Triplet Dataset and a Coarse-Mask Baseline for In-the-Wild Video Virtual Try-On
A new large-scale triplet dataset and diffusion transformer model using coarse human masks deliver improved video virtual try-on quality and generalization in challenging real-world conditions.
- Evaluating LLM Agents on Automated Software Analysis Tasks
A custom LLM agent achieves 94% manually verified success on a new benchmark of 35 software analysis setups, outperforming baselines at 77%, but struggles with stage mixing, error localization, and overestimating its own success.
- CIGaRS I: Combined simulation-based inference from type Ia supernovae and host photometry
A simulation-based inference framework that jointly models type Ia supernovae brightness dependences, host galaxy evolution, and cosmology from photometric observations.
- From Pixels to Concepts: Growing Rich 3D Semantic Scene Graph Forests utilizing Foundation Models
Uses VLMs to detect instance concepts and LLMs to infer abstract relationships, assembling them into 3D scene graph forests that are evaluated on uHumans2 and ScanNet and tested in open-vocabulary retrieval on a Spot robot.
- Robust Mutation Analysis of Quantum Programs Under Noise
Noise from quantum hardware simulators significantly alters mutant detection distances, making equivalent mutants harder to separate from faults, with output-distribution metrics reaching 73.03% accuracy and 74.89% F1-score under device-specific thresholds.
- Feature Starvation as Geometric Instability in Sparse Autoencoders
Adaptive elastic net SAEs (AEN-SAEs) mitigate feature starvation in SAEs by combining ℓ2 structural stability with adaptive ℓ1 reweighting, producing a Lipschitz-continuous sparse coding map that recovers global feature support under mild assumptions.
- From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills
SSL representation disentangles skill scheduling, structure, and logic using an LLM normalizer, improving skill discovery MRR@50 from 0.649 to 0.729 and risk assessment macro F1 from 0.409 to 0.509 over text baselines.
- MTT-Bench: Predicting Social Dominance in Mice via Multimodal Large Language Models
Fine-tuned multimodal LLMs predict mouse social dominance from raw tube test videos with high agreement to traditional rankings.
- Beyond Compliance: How AI Could Help Creative Writers by Refusing Them
A qualitative study with 22 creative writers finds that the reflective value of AI refusals depends on alignment with users' situational thinking phases, cognitive beliefs, and views of AI roles.
- SPT-3G D1: CMB temperature and polarization power spectra and cosmology from 2019 and 2020 observations of the SPT-3G Main field
SPT-3G delivers the most precise CMB EE and TE spectra at high multipoles to date, giving LCDM parameters with H0 = 66.66 ± 0.60 km/s/Mpc from ground-based data alone and reaching Planck-level constraints when combined with ACT.
- TrOCR for Medieval HTR: A Systematic Ablation Study with Cross-Dataset Validation
Systematic ablation of TrOCR fine-tuning for medieval HTR finds that freezing up to three encoder or six decoder layers does not significantly harm accuracy and that removing CLAHE contrast normalization yields comparable 7.84% CER on the Cortonese manuscript.
- Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application
This survey categorizes agentic environments for LLMs by eight attributes and domains, introduces symbolic and neural synthesis paradigms with evaluation, and outlines four agent evolution pathways plus three environment evolution paradigms.
- VLA-GSE: Boosting Parameter-Efficient Fine-Tuning in VLA with Generalized and Specialized Experts
VLA-GSE uses spectral decomposition of the VLA backbone to create generalized and specialized experts, enabling effective robot task adaptation while updating only 2.51% of parameters and achieving 81.2% zero-shot success on LIBERO-Plus.
- Heterogeneous Scientific Foundation Model Collaboration
Eywa enables language-based agentic AI systems to collaborate with specialized scientific foundation models for improved performance on structured data tasks.
- Entanglement is Half the Story: Post-Selection vs. Partial Traces
A hybrid tensor network framework interpolates between classical and quantum models via controllable post-selection, with a trainable hyperparameter that complements bond dimension to enhance quantum machine learning.
- Single-dish HI Intensity Mapping with the SKAO: Precursor Progress with MeerKAT's Large Area Synoptic Survey (MeerKLASS)
MeerKLASS reports successful single-dish HI intensity mapping detections that validate the technique for SKAO cosmology surveys.
- Probing High-redshift Intracluster Medium Using SKA
Proposes SKA observations of El Gordo to study high-redshift ICM magnetic fields via continuum and polarization measurements and test amplification models.
- Rethinking Agentic Reinforcement Learning In Large Language Models
The paper reviews conceptual foundations, methodological innovations, effective designs, critical challenges, and future directions for LLM-based Agentic Reinforcement Learning.
- New frontiers in quantum science and technology using van der Waals Josephson junctions
A synthesis of van der Waals Josephson junction research showing how 2D material diversity and symmetry control open routes to novel quantum devices and sensors.
- Demographics of planet-forming disks with the SKAO
SKAO will enable the first large-scale high-resolution surveys of cm-wavelength disk emission to constrain dust growth, pebble demographics, and planet formation processes.
- FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning
- Hilbert's Sixth Problem and Soft Logic