Noam Shazeer
Identifiers
No identifiers captured yet.
Papers (31)
- Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities cs.CL · 2025 · author #2478
- Gemma 3 Technical Report cs.CL · 2025 · author #198
- PaLM: Scaling Language Modeling with Pathways cs.CL · 2022 · author #18
- ST-MoE: Designing Stable and Transferable Sparse Expert Models cs.CL · 2022 · author #7
- LaMDA: Language Models for Dialog Applications cs.CL · 2022 · author #4
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity cs.LG · 2021 · author #3
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding cs.CL · 2020 · author #8
- GLU Variants Improve Transformer cs.LG · 2020 · author #1
- How Much Knowledge Can You Pack Into the Parameters of a Language Model? cs.CL · 2020 · author #3
- Fast Transformer Decoding: One Write-Head is All You Need cs.NE · 2019 · author #1
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer cs.LG · 2019 · author #2
- Corpora Generation for Grammatical Error Correction cs.CL · 2019 · author #4
- Blockwise Parallel Decoding for Deep Autoregressive Models cs.LG · 2018 · author #2
- Mesh-TensorFlow: Deep Learning for Supercomputers cs.LG · 2018 · author #1
- Weakly Supervised Grammatical Error Correction using Iterative Decoding cs.CL · 2018 · author #4
- Music Transformer cs.LG · 2018 · author #4
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost cs.LG · 2018 · author #1
- Tensor2Tensor for Neural Machine Translation cs.LG · 2018 · author #12
- Fast Decoding in Sequence Models using Discrete Latent Variables cs.LG · 2018 · author #7
- Image Transformer cs.CV · 2018 · author #5
- Generating Wikipedia by Summarizing Long Sequences cs.CL · 2018 · author #7
- One Model To Learn Them All cs.LG · 2017 · author #3
- Attention Is All You Need cs.CL · 2017 · author #2
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer cs.LG · 2017 · author #1
- NN-grams: Unifying neural network and n-gram language models for Speech Recognition cs.CL · 2016 · author #3
- Exploring the Limits of Language Modeling cs.CL · 2016 · author #4
- Swivel: Improving Embeddings by Noticing What's Missing cs.CL · 2016 · author #1
- End-to-End Text-Dependent Speaker Verification cs.LG · 2015 · author #4
- Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks cs.LG · 2015 · author #4
- Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation cs.LG · 2014 · author #1
- Variational Program Inference cs.AI · 2010 · author #2
Mentions
No mention provenance yet.
Frequent Coauthors
- Niki Parmar 8 shared papers
- Ashish Vaswani 7 shared papers
- Jakob Uszkoreit 7 shared papers
- Jeff Dean 5 shared papers
- Adam Roberts 4 shared papers
- Dustin Tran 4 shared papers
- Oriol Vinyals 4 shared papers
- Ryan Sepassi 4 shared papers
- Samy Bengio 4 shared papers
- Yanping Huang 4 shared papers
- Aidan N. Gomez 3 shared papers
- Barret Zoph 3 shared papers
- Dmitry Lepikhin 3 shared papers
- Douglas Eck 3 shared papers
- Erica Moreira 3 shared papers
- Etienne Pot 3 shared papers
- Jared Lichtarge 3 shared papers
- Katherine Lee 3 shared papers
- Llion Jones 3 shared papers
- Łukasz Kaiser 3 shared papers