Noam Shazeer

Identifiers

  • name variant Noam Shazeer 0.60 · backfill

Papers (32)

  1. Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities cs.CL · 2025 · author #2478
  2. Gemma 3 Technical Report cs.CL · 2025 · author #198
  3. PaLM: Scaling Language Modeling with Pathways cs.CL · 2022 · author #18
  4. ST-MoE: Designing Stable and Transferable Sparse Expert Models cs.CL · 2022 · author #7
  5. LaMDA: Language Models for Dialog Applications cs.CL · 2022 · author #4
  6. GSPMD: General and Scalable Parallelization for ML Computation Graphs cs.DC · 2021 · author #12
  7. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity cs.LG · 2021 · author #3
  8. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding cs.CL · 2020 · author #8
  9. GLU Variants Improve Transformer cs.LG · 2020 · author #1
  10. How Much Knowledge Can You Pack Into the Parameters of a Language Model? cs.CL · 2020 · author #3
  11. Fast Transformer Decoding: One Write-Head is All You Need cs.NE · 2019 · author #1
  12. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer cs.LG · 2019 · author #2
  13. Corpora Generation for Grammatical Error Correction cs.CL · 2019 · author #4
  14. Blockwise Parallel Decoding for Deep Autoregressive Models cs.LG · 2018 · author #2
  15. Mesh-TensorFlow: Deep Learning for Supercomputers cs.LG · 2018 · author #1
  16. Weakly Supervised Grammatical Error Correction using Iterative Decoding cs.CL · 2018 · author #4
  17. Music Transformer cs.LG · 2018 · author #4
  18. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost cs.LG · 2018 · author #1
  19. Tensor2Tensor for Neural Machine Translation cs.LG · 2018 · author #12
  20. Fast Decoding in Sequence Models using Discrete Latent Variables cs.LG · 2018 · author #7
  21. Image Transformer cs.CV · 2018 · author #5
  22. Generating Wikipedia by Summarizing Long Sequences cs.CL · 2018 · author #7
  23. One Model To Learn Them All cs.LG · 2017 · author #3
  24. Attention Is All You Need cs.CL · 2017 · author #2
  25. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer cs.LG · 2017 · author #1
  26. NN-grams: Unifying neural network and n-gram language models for Speech Recognition cs.CL · 2016 · author #3
  27. Exploring the Limits of Language Modeling cs.CL · 2016 · author #4
  28. Swivel: Improving Embeddings by Noticing What's Missing cs.CL · 2016 · author #1
  29. End-to-End Text-Dependent Speaker Verification cs.LG · 2015 · author #4
  30. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks cs.LG · 2015 · author #4
  31. Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation cs.LG · 2014 · author #1
  32. Variational Program Inference cs.AI · 2010 · author #2

Mentions

  • 1509.08062 #4 · backfill · confidence 0.70 Noam Shazeer
  • 1506.03099 #4 · backfill · confidence 0.70 Noam Shazeer
  • 1412.1454 #1 · backfill · confidence 0.70 Noam Shazeer
  • 2105.04663 #12 · arxiv_oai · confidence 0.70 Noam Shazeer
  • 1006.0991 #2 · backfill · confidence 0.70 Noam Shazeer

Frequent Coauthors