Measuring massive multitask language understanding

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · dataset: 1

representative citing papers

LAB-Bench: Measuring Capabilities of Language Models for Biology Research

cs.AI · 2024-07-14 · accept · novelty 8.0

LAB-Bench provides over 2,400 multiple-choice questions to measure LLM performance on real biology research tasks like literature recall, figure reading, database access, and sequence manipulation, with initial results compared against human expert biologists.

STS: Efficient Sparse Attention with Speculative Token Sparsity

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

STS repurposes draft-model attention scores from speculative decoding to build token-and-head-wise sparsity masks, delivering 2.67x speedup at ~90% sparsity on NarrativeQA with negligible accuracy loss.

Zephyr: Direct Distillation of LM Alignment

cs.LG · 2023-10-25 · accept · novelty 6.0

Zephyr-7B achieves state-of-the-art chat benchmark results among 7B models by distilling alignment via dDPO on AI feedback preferences, surpassing the 70B Llama-2-Chat model on MT-Bench with no human data required.
