Gemini: A Family of Highly Capable Multimodal Models
Abstract
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks; notably, it is the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and it improves the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Forward citations
Cited by 60 Pith papers
-
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence
CiteVQA requires models to cite specific document regions with bounding boxes alongside answers and finds that even the strongest MLLMs frequently cite the wrong region, with top SAA scores of only 76.0 for closed models.
-
Cross-Modal Backdoors in Multimodal Large Language Models
Poisoning a single connector in MLLMs establishes a reusable latent backdoor pathway that transfers across modalities with over 95% attack success rate under bounded perturbations.
-
Approximation Error Upper and Lower Bounds for Hölder Class with Transformers
A standard Transformer with O(ε^{-d0/α}) blocks can approximate any bounded d0-dimensional Hölder function of smoothness α to accuracy ε, but at least Ω(ε^{-d0/(4α)}) blocks are required.
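Restated as a display equation (our notation; depth(ε) is shorthand for the number of Transformer blocks needed to reach accuracy ε on a bounded d₀-dimensional α-Hölder target):

\[
\Omega\!\left(\varepsilon^{-d_0/(4\alpha)}\right) \;\le\; \operatorname{depth}(\varepsilon) \;\le\; O\!\left(\varepsilon^{-d_0/\alpha}\right)
\]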
-
When and Why SignSGD Outperforms SGD: A Theoretical Study Based on ℓ1-norm Lower Bounds
SignSGD provably beats SGD by a factor of d under sparse noise via matched ℓ1-norm upper and lower bounds, with an equivalent result for Muon on matrices, and this predicts faster GPT-2 pretraining.
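For reference, a minimal sketch of the two update rules being compared (standard textbook definitions, not code from the cited paper):

import numpy as np

def sgd_step(theta, grad, lr=1e-3):
    # Plain SGD: the step length scales with the gradient magnitude,
    # so a few heavy-tailed coordinates can dominate the update.
    return theta - lr * grad

def signsgd_step(theta, grad, lr=1e-3):
    # SignSGD: only the sign of each coordinate is kept, which is the
    # property the paper's l1-norm analysis exploits under sparse noise.
    return theta - lr * np.sign(grad)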
-
Efficient Preference Poisoning Attack on Offline RLHF
Label-flip attacks on log-linear DPO reduce to binary sparse approximation problems that can be solved efficiently by lattice-based and binary matching pursuit methods with recovery guarantees.
-
From Mirage to Grounding: Towards Reliable Multimodal Circuit-to-Verilog Code Generation
MLLMs exhibit a Mirage effect by bypassing circuit diagrams in favor of header semantics for Verilog generation; VeriGround with identifier anonymization and D-ORPO training reaches 46% Functional Pass@1 while refusin...
-
S1-VL: Scientific Multimodal Reasoning Model with Thinking-with-Images
S1-VL combines structured scientific reasoning with iterative image manipulation via code execution to reach state-of-the-art results on visual and scientific reasoning benchmarks.
-
When Text Hijacks Vision: Benchmarking and Mitigating Text Overlay-Induced Hallucination in Vision Language Models
VLMs hallucinate by prioritizing contradictory on-screen text over visual content, addressed via the VisualTextTrap benchmark with 6,057 human-validated samples and the VTHM-MoE dual-encoder framework using dimension-...
-
Diffusion-CAM: Faithful Visual Explanations for dMLLMs
Diffusion-CAM is the first method for visual explanations in dMLLMs, using differentiable probing of intermediates plus four refinement modules to produce activation maps that outperform prior CAM approaches in localization.
-
PinpointQA: A Dataset and Benchmark for Small Object-Centric Spatial Understanding in Indoor Videos
PinpointQA is the first benchmark dataset for small object-centric spatial understanding in indoor videos, with four tasks showing MLLM capability gaps that improve via supervised fine-tuning.
-
HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing
HM-Bench is the first benchmark for MLLMs on hyperspectral images, showing models struggle with complex spatial-spectral reasoning and perform better with visual PCA images than textual reports.
-
TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation
TRUSTDESC prevents tool poisoning in LLM applications by automatically generating accurate tool descriptions from code via a three-stage pipeline of reachability analysis, description synthesis, and dynamic verification.
-
Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems
DDIPE poisons LLM agent skills by embedding malicious logic in documentation examples, achieving 11.6-33.5% bypass rates across frameworks while explicit attacks are blocked, with 2.5% evading detection.
-
AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks
AgentSocialBench demonstrates that privacy preservation is fundamentally harder in human-centered agentic social networks than in single-agent cases due to cross-domain coordination pressures and an abstraction paradox.
-
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
MMMU-Pro is a stricter multimodal benchmark that removes text-only solvable questions, augments options, and requires reading text from images, yielding substantially lower model scores of 16.8-26.9%.
-
AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.
-
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
OSWorld provides the first unified real-computer benchmark for open-ended multimodal agent tasks, exposing large performance gaps between humans and state-of-the-art LLM/VLM agents.
-
SceneFunRI: Reasoning the Invisible for Task-Driven Functional Object Localization
SceneFunRI benchmark shows current VLMs struggle severely with inferring locations of invisible functional objects, with the strongest model (Gemini 3 Flash) reaching only 15.20 CAcc@75.
-
GeoVista: Visually Grounded Active Perception for Ultra-High-Resolution Remote Sensing Understanding
GeoVista introduces a planning-driven active perception framework with global exploration plans, branch-wise local inspection, and explicit evidence tracking to achieve state-of-the-art results on ultra-high-resolution remote sensing benchmarks.
-
Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation
New metrics KSS and KPS are introduced to evaluate multilingual machine unlearning quality and cross-language consistency in LLMs, addressing limitations of single-language evaluation protocols.
-
Beyond Binary: Reframing GUI Critique as Continuous Semantic Alignment
BBCritic uses contrastive learning to align GUI actions in a continuous affordance space, outperforming larger binary critic models on a new four-level hierarchical benchmark while enabling zero-shot transfer.
-
CreFlow: Corrective Reflow for Sparse-Reward Embodied Video Diffusion RL
CreFlow combines LTL compositional rewards with credit-aware NFT and corrective reflow losses in online RL to improve embodied video diffusion models, raising downstream task success by 23.8 percentage points on eight...
-
Sampling from Flow Language Models via Marginal-Conditioned Bridges
Marginal-conditioned bridges enable training-free sampling from Flow Language Models by drawing clean one-hot endpoints from factorized posteriors and using Ornstein-Uhlenbeck bridges, preserving token marginals and r...
-
CLIP Tricks You: Training-free Token Pruning for Efficient Pixel Grounding in Large Vision-Language Models
LiteLVLM prunes visual tokens for pixel grounding by reversing CLIP visual-text similarity to retain referent region tokens, outperforming prior methods by over 5% with 22% speedup and 2.3x memory reduction without any training.
-
MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving
MindVLA-U1 introduces a unified streaming VLA with shared backbone, framewise memory, and language-guided action diffusion that surpasses human drivers on WOD-E2E planning metrics.
-
CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives
CausalCine enables real-time causal autoregressive multi-shot video generation via multi-shot training, content-aware memory routing for coherence, and distillation to few-step inference.
-
G²TR: Generation-Guided Visual Token Reduction for Separate-Encoder Unified Multimodal Models
G²TR reduces visual tokens and prefill computation by 1.94x in separate-encoder UMMs via generation-guided importance from VAE latent consistency while preserving reasoning accuracy and editing quality.
-
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching
TBPO posits a token-level Bradley-Terry model and derives a Bregman-divergence density-ratio matching loss that generalizes DPO while preserving token-level optimality.
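For context, the standard sequence-level DPO loss that TBPO reportedly generalizes (this is the widely used formulation, not notation from the cited paper):

\[
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
\]

where y_w is the preferred and y_l the rejected response, and β sets the strength of the reference-policy regularization.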
-
Reconstruction of Personally Identifiable Information from Supervised Finetuned Models
PII can be reconstructed from SFT models via prefix attacks, with the new COVA algorithm improving success rates and leakage varying by attacker knowledge and PII type.
-
UniPath: Adaptive Coordination of Understanding and Generation for Unified Multimodal Reasoning
UniPath adaptively models coordination-path diversity in unified multimodal models by training a path-conditioned executor and using a lightweight planner for input-dependent selection, improving performance over fixed-path baselines.
-
Kairos: A Scalable Serving System for Physical AI
Kairos is the first multi-robot serving system that treats the generate-execute loop as a first-class citizen and reduces average task latency by 31.8-66.5% versus digital AI serving systems.
-
HEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Model
Hebatron is the first open-weight Hebrew MoE LLM adapted from Nemotron-3, reaching 73.8% on Hebrew reasoning benchmarks while activating only 3B parameters per pass and supporting 65k-token context.
-
ALAM: Algebraically Consistent Latent Action Model for Vision-Language-Action Models
ALAM creates algebraically consistent latent action transitions from videos to act as auxiliary generative targets, raising robot policy success rates from 47.9% to 85.0% on MetaWorld MT50 and 94.1% to 98.1% on LIBERO.
-
PhyGround: Benchmarking Physical Reasoning in Generative World Models
PhyGround is a new benchmark with curated prompts, a 13-law taxonomy, large-scale human annotations, and an open physics-specialized VLM judge for evaluating physical reasoning in generative video models.
-
StereoTales: A Multilingual Framework for Open-Ended Stereotype Discovery in LLMs
StereoTales shows that LLMs produce harmful, culturally adapted stereotypes in open-ended multilingual stories, with patterns consistent across providers and aligned human-LLM harm judgments.
-
StereoTales: A Multilingual Framework for Open-Ended Stereotype Discovery in LLMs
StereoTales shows that all tested LLMs emit harmful stereotypes in open-ended stories, with associations adapting to prompt language and targeting locally salient groups rather than transferring uniformly across languages.
-
PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents
PaperFit uses rendered page images in a closed loop to diagnose and repair typesetting defects in LaTeX documents, outperforming baselines on a new benchmark of 200 papers.
-
SciVQR: A Multidisciplinary Multimodal Benchmark for Advanced Scientific Reasoning Evaluation
SciVQR is a new benchmark dataset for evaluating multimodal AI models on complex scientific reasoning tasks across six disciplines, including expert solutions for nearly half the items.
-
ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models
ViSRA boosts MLLM 3D spatial reasoning performance by up to 28.9% on unseen tasks via a plug-and-play video-based agent that extracts explicit spatial cues from expert models without any post-training.
-
When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning
State-conditioned commitment depth in a vision-language policy Pareto-dominates fixed-depth baselines on Sliding Puzzle and Sokoban, raising solve rates by up to 12.5 points while using 25% fewer actions and beating l...
-
MOTOR-Bench: A Real-world Dataset and Multi-agent Framework for Zero-shot Human Mental State Understanding
MOTOR-Bench supplies a real-world video dataset for structured mental state understanding in learning settings, while MOTOR-MAS improves zero-shot prediction of behavior, cognition, and emotion labels over single models.
-
Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models
Scratchpad Patching decouples compute from patch size in byte-level language models by inserting entropy-triggered scratchpads to update patch context dynamically.
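A minimal sketch of the generic entropy-trigger idea the summary refers to (the scratchpad mechanism itself is not reproduced; the threshold and the small byte-level model are assumptions):

import numpy as np

def entropy_triggered_boundaries(byte_probs, threshold=2.0):
    # byte_probs: (seq_len, 256) next-byte distributions from a small
    # byte-level model. A trigger fires where the model is uncertain,
    # i.e. where next-byte entropy exceeds the threshold; per the summary,
    # this is where a scratchpad update would be inserted.
    eps = 1e-12
    entropy = -np.sum(byte_probs * np.log2(byte_probs + eps), axis=-1)
    return entropy > threshold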
-
SeePhys Pro: Diagnosing Modality Transfer and Blind-Training Effects in Multimodal RLVR for Physics Reasoning
SeePhys Pro benchmark reveals multimodal models degrade on physics reasoning as information transfers from text to images, with blind training improvements often stemming from textual cues rather than visual evidence.
-
SeePhys Pro: Diagnosing Modality Transfer and Blind-Training Effects in Multimodal RLVR for Physics Reasoning
Multimodal AI models for physics reasoning lose performance when information shifts from text to images, and RLVR training gains often come from non-visual textual or distributional cues rather than actual visual evidence.
-
VORT: Adaptive Power-Law Memory for NLP Transformers
VORT assigns learnable fractional orders to tokens and approximates their power-law retention kernels via sum-of-exponentials for efficient long-range dependency modeling in transformers.
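A sketch of the generic sum-of-exponentials construction the summary mentions (geometric rate grid plus least-squares weights; the paper's exact fitting procedure may differ):

import numpy as np

def soe_powerlaw_kernel(t, alpha, n_terms=8, lam_min=1e-3, lam_max=10.0):
    # Approximate the power-law retention kernel t^(-alpha) by
    # sum_i w_i * exp(-lam_i * t). Each exponential term can be updated
    # recurrently, so long-range memory costs O(n_terms) state per token.
    lams = np.geomspace(lam_min, lam_max, n_terms)
    basis = np.exp(-np.outer(t, lams))            # (len(t), n_terms)
    target = np.asarray(t, dtype=float) ** (-alpha)
    w, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return basis @ w                              # fitted kernel values

# Example: fit the kernel on lags 1..512 for alpha = 0.5.
t = np.arange(1, 513)
approx = soe_powerlaw_kernel(t, alpha=0.5)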
-
PPI2Text: Captioning Protein-Protein Interactions with Coordinate-Aligned Pair-Map Decoding
PPI2Text generates natural-language captions for protein-protein interactions from sequences by encoding each protein with ESM3, building a residue-pair map, and decoding with Qwen3 using coordinate-aligned positional encoding.
-
SYNCR: A Cross-Video Reasoning Benchmark with Synthetic Grounding
SYNCR benchmark shows leading MLLMs reach only 52.5% average accuracy on cross-video reasoning tasks against an 89.5% human baseline, with major weaknesses in physical and spatial reasoning.
-
MemCompiler: Compile, Don't Inject -- State-Conditioned Memory for Embodied Agents
MemCompiler introduces state-conditioned memory compilation that dynamically selects and compiles relevant memory into text and latent guidance, yielding up to 129% gains over no-memory baselines and 60% lower latency versus static injection.
-
MemCompiler: Compile, Don't Inject -- State-Conditioned Memory for Embodied Agents
MemCompiler reframes memory use as state-conditioned compilation, delivering relevant guidance via text and latent channels to improve embodied agent performance up to 129% and cut latency 60% versus static injection.
-
Beyond GSD-as-Token: Continuous Scale Conditioning for Remote Sensing VLMs
ScaleEarth conditions remote sensing VLMs on continuous GSD via CS-HLoRA and a visual GSD predictor, creating a closed training loop with GeoScale-VQA to achieve SOTA on Earth observation benchmarks.
-
Can Agents Price a Reaction? Evaluating LLMs on Chemical Cost Reasoning
LLM agents reach only 50.6% accuracy on chemical cost estimation within 25% error even with tools, dropping with noise due to parsing, pack selection, and tool-use failures.
-
Qwen3-VL-Seg: Unlocking Open-World Referring Segmentation with Vision-Language Grounding
Qwen3-VL-Seg decodes MLLM bounding boxes into pixel-level referring segmentation via a lightweight box-guided mask decoder, new SA1B-ORS training data, and ORS-Bench evaluation, showing strong open-world performance.
-
f-Divergence Regularized RLHF: Two Tales of Sampling and Unified Analyses
The paper establishes the first O(log T) regret and O(1/T) sub-optimality bounds for online RLHF under general f-divergence regularization via two sampling algorithms.
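For context, the f-divergence regularized RLHF objective being analyzed, in its standard form (our notation, not the paper's):

\[
\max_{\pi} \; \mathbb{E}_{x,\, y \sim \pi(\cdot \mid x)}\!\left[r(x, y)\right] - \beta\, D_f\!\left(\pi(\cdot \mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot \mid x)\right), \qquad D_f(p \,\|\, q) = \mathbb{E}_{q}\!\left[f\!\left(\tfrac{p}{q}\right)\right]
\]

with f convex and f(1) = 0; KL regularization is the special case f(u) = u log u.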
-
Rollback-Free Stable Brick Structures Generation
Reinforcement learning internalizes physical stability rules for brick structures, enabling the first rollback-free generation with orders-of-magnitude faster inference.
-
MultiSoc-4D: A Benchmark for Diagnosing Instruction-Induced Label Collapse in Closed-Set LLM Annotation of Bengali Social Media
MultiSoc-4D benchmark shows LLMs annotating Bengali social media exhibit instruction-induced label collapse, preferring fallback labels and missing 79% of hate speech and 75% of sarcasm instances despite high agreement.
-
IntentGrasp: A Comprehensive Benchmark for Intent Understanding
IntentGrasp benchmark demonstrates that LLMs have low intent understanding capabilities, with most models underperforming random guessing on a challenging subset, but Intentional Fine-Tuning provides large improvements.
-
PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts
PragLocker protects agent prompts as IP by building non-portable obfuscated versions that function only on the intended LLM through code-symbol semantic anchoring followed by target-model feedback noise injection.
-
Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM
TextPro-SLM minimizes the speech-text modality gap from the input side via a prosody-aware unified encoder, delivering the lowest gap and strong performance at 3B/7B scales with only ~1000 hours of audio.
-
Training-Free Dense Hand Contact Estimation with Multi-Modal Large Language Models
ContactPrompt uses part-wise vertex grids and multi-stage part-conditioned reasoning in MLLMs to achieve training-free dense hand contact estimation that outperforms prior supervised methods.
-
Retain-Neutral Surrogates for Min-Max Unlearning
ROSU derives a closed-form retain-neutral perturbation for min-max unlearning that bounds damage on the retain set via curvature and improves performance when gradients are aligned.