arxiv: 2312.00718 · v1 · pith:PQYF6EQYnew · submitted 2023-12-01 · 💻 cs.LG · cs.AI · q-bio.BM

Removing Biases from Molecular Representations via Information Maximization

classification 💻 cs.LG · cs.AI · q-bio.BM
keywords infocore · batch · drug · data · information · molecular · representations · distribution

High-throughput drug screening -- using cell imaging or gene expression measurements as readouts of drug effect -- is a critical tool in biotechnology to assess and understand the relationship between the chemical structure and biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations in the data. We propose InfoCORE, an Information maximization approach for COnfounder REmoval, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier. It adaptively reweighs samples to equalize their implied batch distribution. Extensive experiments on drug screening data reveal InfoCORE's superior performance in a multitude of tasks including molecular property prediction and molecule-phenotype retrieval. Additionally, we show results for how InfoCORE offers a versatile framework and resolves general distribution shifts and issues of data fairness by minimizing correlation with spurious features or removing sensitive attributes. The code is available at https://github.com/uhlerlab/InfoCORE.
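
To make the reweighting idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the InfoCORE estimator: the abstract does not specify the weighting scheme, and the paper's weights are adaptive, whereas this sketch assumes fixed inverse-batch-frequency weights and a generic InfoNCE-style contrastive loss. The function names `equalizing_weights` and `weighted_info_nce` are hypothetical and chosen for illustration only; the repository linked above holds the authors' actual implementation.

```python
import math
from collections import Counter


def equalizing_weights(batch_ids):
    """Assumed scheme: weight each sample by 1 / (num_batches * batch size).

    The weights sum to 1 and every batch carries the same total weight,
    so the weighted batch distribution is uniform.
    """
    counts = Counter(batch_ids)
    num_batches = len(counts)
    return [1.0 / (num_batches * counts[b]) for b in batch_ids]


def weighted_info_nce(pos_score, neg_scores, neg_weights):
    """InfoNCE-style loss for one anchor with reweighted negatives.

    loss = -log( exp(s_pos) / (exp(s_pos) + N * sum_j w_j * exp(s_j)) )

    With uniform weights w_j = 1/N this reduces to the standard InfoNCE loss.
    All weights must be strictly positive.
    """
    n = len(neg_scores)
    # Fold each weight into the score in log space: N * w_j * exp(s_j).
    terms = [pos_score] + [
        s + math.log(n * w) for s, w in zip(neg_scores, neg_weights)
    ]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(terms)
    log_denominator = m + math.log(sum(math.exp(t - m) for t in terms))
    return log_denominator - pos_score


if __name__ == "__main__":
    # Three negatives come from batch "A" and one from batch "B".
    batch_ids = ["A", "A", "A", "B"]
    weights = equalizing_weights(batch_ids)
    scores = [0.2, 0.1, 0.3, 0.9]  # made-up similarity scores
    print(weights)
    print(weighted_info_nce(1.0, scores, weights))
```

In this toy setup the single negative from batch "B" receives three times the weight of each negative from batch "A", so an over-represented batch no longer dominates the denominator of the loss.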

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Intervention-Aware Multiscale Representation Learning from Imaging Phenomics and Perturbation Transcriptomics

    cs.CV 2026-04 unverdicted novelty 6.0

    Intervention-aware distillation transfers mechanistic knowledge from perturbational transcriptomics to imaging phenomics for improved one-shot transfer to unseen drugs and target gene discovery.