hub Mixed citations

arXiv preprint arXiv:2504.06196 (2025)

Wang, E · 2025 · arXiv 2504.06196

Mixed citation behavior. The most common role is background (57% of classified citations).

10 Pith papers citing it

Background 57% of classified citations

citation-role summary

background: 4 · baseline: 3

citation-polarity summary

background: 4 · baseline: 3

representative citing papers

MedPRMBench: A Fine-grained Benchmark for Process Reward Models in Medical Reasoning

cs.CL · 2026-04-19 · unverdicted · novelty 8.0

MedPRMBench is the first fine-grained benchmark for process reward models in medical reasoning, featuring 6500 questions, 13000 chains, 113910 step labels, and a baseline that improves downstream QA accuracy by 3.2-6.7 points.

VibeProteinBench: An Evaluation Benchmark for Language-interfaced Vibe Protein Design

q-bio.QM · 2026-05-09 · unverdicted · novelty 7.0 · 2 refs

VibeProteinBench is a new benchmark evaluating LLMs on open-ended language-interfaced protein design across recognition, engineering, and generation, with no model showing strong performance in all areas.

OmicsLM: A Multimodal Large Language Model for Multi-Sample Omics Reasoning

q-bio.GN · 2026-05-07 · unverdicted · novelty 7.0

OmicsLM integrates continuous omics embeddings into LLMs for multi-sample biological reasoning, matching specialized models on profile tasks while outperforming them and general LLMs on language-guided QA over real expression data.

The limits of bio-molecular modeling with large language models : a cross-scale evaluation

cs.LG · 2026-04-03 · unverdicted · novelty 7.0

LLMs perform adequately on bio-molecular classification tasks but remain weak on regression, with hybrid architectures outperforming others on long sequences and fine-tuning hurting generalization.

MolDeTox: Evaluating Language Model's Stepwise Fragment Editing for Molecular Detoxification

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

MolDeTox is a new benchmark that shows fragment-level stepwise editing by LLMs and VLMs improves structural validity and detoxification quality over prior toxicity-focused evaluations.

An explainable hypothesis-driven approach to Drug-Induced Liver Injury with HADES

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

HADES is an agentic AI system that generates mechanistic hypotheses for drug-induced liver injury using molecular, metabolite, and pathway evidence, outperforming prior binary classifiers on the new DILER benchmark while establishing a baseline for hypothesis alignment.

Benchmarking open-source tools for in silico antiviral drug discovery

q-bio.BM · 2026-05-05 · conditional · novelty 5.0

Boltz-2 and fine-tuned DrugFormDTA lead ML-based binding prediction while GNINA leads docking tools on a cleaned antiviral dataset, with performance varying by viral protein.

Bolek: A Multimodal Language Model for Molecular Reasoning

cs.LG · 2026-05-04 · unverdicted · novelty 5.0

Bolek injects Morgan fingerprint embeddings into an instruction-tuned text model, then fine-tunes on molecular alignment and synthetic chain-of-thought tasks to improve performance and grounding on 15 TDC binary classification endpoints while generalizing to unseen tasks.

ToxiEval-ZKP: A Structure-Private Verification Framework for Molecular Toxicity Repair Tasks

cs.CR · 2025-08-16 · unverdicted · novelty 5.0

ToxiEval-ZKP applies zero-knowledge proofs to enable private verification that generative AI molecules meet multidimensional toxicity repair criteria.

From Knowledge to Action: Outcomes of the 2025 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry

cond-mat.mtrl-sci · 2026-05-04 · unverdicted · novelty 2.0

Hackathon submissions indicate LLMs are moving from general assistants toward composable multi-agent systems for structuring scientific knowledge and automating tasks in materials science and chemistry.

citing papers explorer

Showing 10 of 10 citing papers.

MedPRMBench: A Fine-grained Benchmark for Process Reward Models in Medical Reasoning

cs.CL · 2026-04-19 · unverdicted · none · ref 31

VibeProteinBench: An Evaluation Benchmark for Language-interfaced Vibe Protein Design

q-bio.QM · 2026-05-09 · unverdicted · none · ref 84 · 2 links

OmicsLM: A Multimodal Large Language Model for Multi-Sample Omics Reasoning

q-bio.GN · 2026-05-07 · unverdicted · none · ref 34

The limits of bio-molecular modeling with large language models : a cross-scale evaluation

cs.LG · 2026-04-03 · unverdicted · none · ref 16

MolDeTox: Evaluating Language Model's Stepwise Fragment Editing for Molecular Detoxification

cs.AI · 2026-05-12 · unverdicted · none · ref 22

An explainable hypothesis-driven approach to Drug-Induced Liver Injury with HADES

cs.AI · 2026-05-04 · unverdicted · none · ref 10

Benchmarking open-source tools for in silico antiviral drug discovery

q-bio.BM · 2026-05-05 · conditional · none · ref 43

Bolek: A Multimodal Language Model for Molecular Reasoning

cs.LG · 2026-05-04 · unverdicted · none · ref 35

ToxiEval-ZKP: A Structure-Private Verification Framework for Molecular Toxicity Repair Tasks

cs.CR · 2025-08-16 · unverdicted · none · ref 11

From Knowledge to Action: Outcomes of the 2025 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry

cond-mat.mtrl-sci · 2026-05-04 · unverdicted · none · ref 81
