RefusalBench shows strict refusal rates fail to rank frontier LLMs correctly on biological safety, with provider effects and partial-compliance patterns that binary metrics miss.
Christina V
7 Pith papers cite this work.
citing papers explorer
- RefusalBench: Why Refusal Rate Misranks Frontier LLMs on Biological Research Prompts
  RefusalBench shows strict refusal rates fail to rank frontier LLMs correctly on biological safety, with provider effects and partial-compliance patterns that binary metrics miss.
- TACK: A statistical evaluation of degradation activity on a novel TArgeting Chimeras Knowledge dataset
  TACK dataset enables scaffold-based evaluation showing classical ML methods outperform a domain-specific GNN for PROTAC activity prediction, with potency far more predictable than maximum degradation.
- OmicsLM: A Multimodal Large Language Model for Multi-Sample Omics Reasoning
  OmicsLM integrates continuous omics embeddings into LLMs for multi-sample biological reasoning, matching specialized models on profile tasks while outperforming them and general LLMs on language-guided QA over real expression data.
- Protein Fold Classification at Scale: Benchmarking and Pretraining
  Introduces TEDBench benchmark and MiAE self-supervised framework that outperforms baselines for large-scale protein fold classification.
- Unlocking Biological Workflows for Robust Protein-Text Question Answering: A Dual-Dimensional RAG Framework
  2D-ProteinRAG is a dual-dimensional RAG framework that incorporates BLAST workflows plus horizontal attribute alignment and vertical homology denoising to improve protein-text QA on both in-distribution and out-of-distribution cases.
- Structure-Aware Masking for Protein Representation Learning
  Bucket Masking improves protein fitness prediction by up to 14% over random masking by preferentially masking structurally coupled residue groups on four downstream tasks.
- Synergistic Benefits of Joint Molecule Generation and Property Prediction
  Hyformer jointly models molecule generation and property prediction via alternating attention and joint pre-training, showing synergistic gains in conditional sampling, OOD prediction, and a drug design case for antimicrobial peptides.