Plex: Towards Reliability using Pretrained Large Model Extensions

Andreas Kirsch; Balaji Lakshminarayanan; D. Sculley; Du Phan; Dustin Tran; Honglin Yuan; Huiyi Hu; Jasper Snoek; Jeremiah Liu; Jie Ren

arxiv: 2207.07411 · v1 · pith:XCTUDM7Anew · submitted 2022-07-15 · 💻 cs.LG · stat.ML

Dustin Tran, Jeremiah Liu, Michael W. Dusenberry, Du Phan, Mark Collier, Jie Ren, Kehang Han, Zi Wang, Zelda Mariet, Huiyi Hu, Neil Band, Tim G. J. Rudner, Karan Singhal, Zachary Nado, Joost van Amersfoort, Andreas Kirsch, Rodolphe Jenatton, Nithum Thain, Honglin Yuan, Kelly Buchanan, Kevin Murphy, D. Sculley, Yarin Gal, Zoubin Ghahramani, Jasper Snoek, Balaji Lakshminarayanan

classification 💻 cs.LG stat.ML

keywords model reliability tasks language models performance plex pretrained

A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models' abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks involving uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and proper scoring rules such as log-likelihood on in- and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot uncertainty). We devise 10 types of tasks over 40 datasets in order to evaluate different aspects of reliability on both vision and language domains. To improve reliability, we developed ViT-Plex and T5-Plex, pretrained large model extensions for vision and language modalities, respectively. Plex greatly improves the state-of-the-art across reliability tasks, and simplifies the traditional protocol as it improves the out-of-the-box performance and does not require designing scores or tuning the model for each task. We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples. We also demonstrate Plex's capabilities on challenging tasks including zero-shot open set recognition, active learning, and uncertainty in conversational language understanding.

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Remember with Confidence: Uncertainty Quantification for Spatio-temporal Memory with Probabilistic Guarantees
cs.CV 2026-06 unverdicted novelty 7.0

Introduces object-level semantic uncertainty for VLM memory, the UQ-DAAAM refinement system, and probabilistic guarantees that selected high-quality views reduce uncertainty more effectively.

Towards an AI co-scientist
cs.AI 2025-02 unverdicted novelty 6.0

A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.

Towards Expert-Level Medical Question Answering with Large Language Models
cs.CL 2023-05 unverdicted novelty 6.0

Med-PaLM 2 achieves 86.5% accuracy on MedQA and approaches or exceeds prior state-of-the-art on other medical QA benchmarks while receiving higher physician preference ratings than human answers on consumer questions.

Medical Model Synthesis Architectures: A Case Study
cs.AI 2026-05 unverdicted novelty 5.0

MedMSA framework retrieves knowledge via language models then builds formal probabilistic models to produce uncertainty-weighted differential diagnoses from symptoms.

From pre-training to downstream performance: Does domain-specific pre-training make sense?
cs.CV 2026-05 unverdicted novelty 4.0

Pre-training on modality-matched data significantly improves downstream performance in medical imaging models while self-supervised learning benefits depend on context.