Won't get fooled again: Answering questions with false premises
2 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
C3RL is a new RL algorithm combining correctness, calibration, and reference accuracy rewards to improve LLM confidence calibration, enabling CAS to outperform majority voting with up to 12.33x lower inference cost.
citing papers explorer
- Gemini: A Family of Highly Capable Multimodal Models
Gemini Ultra reaches human-expert performance on MMLU for the first time and sets new state-of-the-art results on 30 of 32 benchmarks, including all 20 multimodal ones tested.