Noam Shazeer — Pith Author Registry

Identifiers

name variant Noam Shazeer · confidence 0.60 · backfill

Papers (32)

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities · cs.CL · 2025 · author #2478
Gemma 3 Technical Report · cs.CL · 2025 · author #198
PaLM: Scaling Language Modeling with Pathways · cs.CL · 2022 · author #18
ST-MoE: Designing Stable and Transferable Sparse Expert Models · cs.CL · 2022 · author #7
LaMDA: Language Models for Dialog Applications · cs.CL · 2022 · author #4
GSPMD: General and Scalable Parallelization for ML Computation Graphs · cs.DC · 2021 · author #12
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity · cs.LG · 2021 · author #3
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding · cs.CL · 2020 · author #8
GLU Variants Improve Transformer · cs.LG · 2020 · author #1
How Much Knowledge Can You Pack Into the Parameters of a Language Model? · cs.CL · 2020 · author #3
Fast Transformer Decoding: One Write-Head is All You Need · cs.NE · 2019 · author #1
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer · cs.LG · 2019 · author #2
Corpora Generation for Grammatical Error Correction · cs.CL · 2019 · author #4
Blockwise Parallel Decoding for Deep Autoregressive Models · cs.LG · 2018 · author #2
Mesh-TensorFlow: Deep Learning for Supercomputers · cs.LG · 2018 · author #1
Weakly Supervised Grammatical Error Correction using Iterative Decoding · cs.CL · 2018 · author #4
Music Transformer · cs.LG · 2018 · author #4
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost · cs.LG · 2018 · author #1
Tensor2Tensor for Neural Machine Translation · cs.LG · 2018 · author #12
Fast Decoding in Sequence Models using Discrete Latent Variables · cs.LG · 2018 · author #7
Image Transformer · cs.CV · 2018 · author #5
Generating Wikipedia by Summarizing Long Sequences · cs.CL · 2018 · author #7
One Model To Learn Them All · cs.LG · 2017 · author #3
Attention Is All You Need · cs.CL · 2017 · author #2
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer · cs.LG · 2017 · author #1
NN-grams: Unifying neural network and n-gram language models for Speech Recognition · cs.CL · 2016 · author #3
Exploring the Limits of Language Modeling · cs.CL · 2016 · author #4
Swivel: Improving Embeddings by Noticing What's Missing · cs.CL · 2016 · author #1
End-to-End Text-Dependent Speaker Verification · cs.LG · 2015 · author #4
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks · cs.LG · 2015 · author #4
Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation · cs.LG · 2014 · author #1
Variational Program Inference · cs.AI · 2010 · author #2

Mentions

1509.08062 #4 · backfill · confidence 0.70 Noam Shazeer
1506.03099 #4 · backfill · confidence 0.70 Noam Shazeer
1412.1454 #1 · backfill · confidence 0.70 Noam Shazeer
2105.04663 #12 · arxiv_oai · confidence 0.70 Noam Shazeer
1006.0991 #2 · backfill · confidence 0.70 Noam Shazeer

Frequent Coauthors

Niki Parmar 8 shared papers
Ashish Vaswani 7 shared papers
Jakob Uszkoreit 7 shared papers
Jeff Dean 5 shared papers
Yanping Huang 5 shared papers
Adam Roberts 4 shared papers
Dmitry Lepikhin 4 shared papers
Dustin Tran 4 shared papers
Maxim Krikun 4 shared papers
Oriol Vinyals 4 shared papers
Ryan Sepassi 4 shared papers
Samy Bengio 4 shared papers
Yuanzhong Xu 4 shared papers
Aidan N. Gomez 3 shared papers
Barret Zoph 3 shared papers
Blake Hechtman 3 shared papers
Dehao Chen 3 shared papers
Douglas Eck 3 shared papers
Erica Moreira 3 shared papers
Etienne Pot 3 shared papers