DPrivBench is a new benchmark for evaluating LLMs on differential privacy reasoning, with results showing good performance on textbook mechanisms but substantial failures on advanced algorithms.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
Evaluations of 53 LLMs on 14 basic math tasks show reasoning models use ~18x more tokens with sometimes lower accuracy, non-monotonic gains from extended budgets, and sharp performance drops under token constraints.
citing papers explorer
-
DPrivBench: Benchmarking LLMs' Reasoning for Differential Privacy
DPrivBench is a new benchmark for evaluating LLMs on differential privacy reasoning, with results showing good performance on textbook mechanisms but substantial failures on advanced algorithms.
-
Do LLMs Overthink Basic Math Reasoning? Benchmarking the Accuracy-Efficiency Tradeoff in Language Models
Evaluations of 53 LLMs on 14 basic math tasks show reasoning models use ~18x more tokens with sometimes lower accuracy, non-monotonic gains from extended budgets, and sharp performance drops under token constraints.