OmicsLM integrates continuous omics embeddings into LLMs for multi-sample biological reasoning, matching specialized models on profile tasks while outperforming them and general LLMs on language-guided QA over real expression data.
Buenrostro, Nir Yosef, Carolina Caldas, Rui Sun, and Bing He
13 Pith papers cite this work.
citing papers explorer
- OmicsLM: A Multimodal Large Language Model for Multi-Sample Omics Reasoning
OmicsLM integrates continuous omics embeddings into LLMs for multi-sample biological reasoning, matching specialized models on profile tasks while outperforming them and general LLMs on language-guided QA over real expression data.
- Residual Feature Integration is Sufficient to Prevent Negative Transfer
Residual feature integration with a trainable target-side encoder provably prevents negative transfer, achieving convergence rates no worse than training from scratch under informative target distributions.
- OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation
OPD-Evolver uses on-policy self-distillation in fast interaction and slow attribution loops to build agents with holistic memory competence, outperforming prior systems by up to 11.5% and allowing a 9B model to compete with much larger ones.
- OCOO-T: A Simple and Scalable Virtual Cell Model for Transcriptional Perturbation Response Prediction
OCOO-T is a flow-matching Transformer model that directly denoises continuous gene expression profiles to predict transcriptional responses to perturbations and reports state-of-the-art results on Tahoe100M, Replogle, and PBMC benchmarks.
- Prior-Guided Multi-Omic Transformers for Single-Cell Gene Regulatory Network Inference
EpiAwareNet is a prior-guided multi-omic Transformer that uses gene-peak cross-attention for adaptive accessibility aggregation and bulk GRN priors for weak supervision to improve single-cell GRN reconstruction over baselines.
- MEDAL: Manifold Embedding Distillation via Autoencoder Learning
MEDAL distills manifold embeddings into autoencoders to enable out-of-sample extension and held-out validation of dimension reduction methods.
- ORBIT: Learning Gene Program Co-Activation Structure for Cell-Type-Stratified Pathway Rewiring Analysis in Single-Cell Transcriptomics
ORBIT uses an intervention-consistent self-supervised objective in a transformer to infer asymmetric gene program influences from observational scRNA-seq data, recovering Alzheimer's vulnerability patterns and achieving 0.984 macro F1 cell-type classification from 220 pathway scores.
- From Syntax to Semantics: Geometric Stability as the Missing Axis of Perturbation Biology
Geometric stability, defined as the directional coherence of cellular responses to perturbation, provides a framework for assessing whether resulting cellular states are stable beyond conventional metrics of intervention success.
- RAG-GNN: Integrating Retrieved Knowledge with Graph Neural Networks for Precision Medicine
RAG-GNN augments GNNs with retrieved literature knowledge via gated fusion to improve functional clustering of 379 proteins in cancer signaling networks, raising the silhouette score by 0.093.
- Knowledge Graphs and Reasoning LLMs for Finding Simple Yet Effective Transcriptomic Perturbation Predictors
K-nearest neighbor from a knowledge graph beats most methods on out-of-distribution transcriptomic perturbation prediction, and an RL-trained reasoning LLM matches SOTA on Replogle et al. (2022) cell lines while improving downstream differential expression prediction.
- scHelix: Asymmetric Dual-Stream Integration via Explicit Gene-Level Disentanglement
scHelix uses explicit gene-level partitioning into Anchors and Variants plus an asymmetric Align-Refine-Fuse dual-stream architecture to improve batch correction in scRNA-seq without over-correcting biological signals.
- Towards Universal Gene Regulatory Network Inference: Unlocking Generalizable Regulatory Knowledge in Single-cell Foundation Models
Two new methods distill implicit regulatory knowledge from single-cell foundation models to enable generalizable gene regulatory network inference on unseen data.
- Geometric coherence of single-cell CRISPR perturbations reveals regulatory architecture and predicts cellular stress