pith. machine review for the scientific record.

Noam Shazeer

Identifiers

No identifiers captured yet.

Papers (31)

  1. Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities cs.CL · 2025 · author #2478
  2. Gemma 3 Technical Report cs.CL · 2025 · author #198
  3. PaLM: Scaling Language Modeling with Pathways cs.CL · 2022 · author #18
  4. ST-MoE: Designing Stable and Transferable Sparse Expert Models cs.CL · 2022 · author #7
  5. LaMDA: Language Models for Dialog Applications cs.CL · 2022 · author #4
  6. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity cs.LG · 2021 · author #3
  7. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding cs.CL · 2020 · author #8
  8. GLU Variants Improve Transformer cs.LG · 2020 · author #1
  9. How Much Knowledge Can You Pack Into the Parameters of a Language Model? cs.CL · 2020 · author #3
  10. Fast Transformer Decoding: One Write-Head is All You Need cs.NE · 2019 · author #1
  11. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer cs.LG · 2019 · author #2
  12. Corpora Generation for Grammatical Error Correction cs.CL · 2019 · author #4
  13. Blockwise Parallel Decoding for Deep Autoregressive Models cs.LG · 2018 · author #2
  14. Mesh-TensorFlow: Deep Learning for Supercomputers cs.LG · 2018 · author #1
  15. Weakly Supervised Grammatical Error Correction using Iterative Decoding cs.CL · 2018 · author #4
  16. Music Transformer cs.LG · 2018 · author #4
  17. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost cs.LG · 2018 · author #1
  18. Tensor2Tensor for Neural Machine Translation cs.LG · 2018 · author #12
  19. Fast Decoding in Sequence Models using Discrete Latent Variables cs.LG · 2018 · author #7
  20. Image Transformer cs.CV · 2018 · author #5
  21. Generating Wikipedia by Summarizing Long Sequences cs.CL · 2018 · author #7
  22. One Model To Learn Them All cs.LG · 2017 · author #3
  23. Attention Is All You Need cs.CL · 2017 · author #2
  24. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer cs.LG · 2017 · author #1
  25. NN-grams: Unifying neural network and n-gram language models for Speech Recognition cs.CL · 2016 · author #3
  26. Exploring the Limits of Language Modeling cs.CL · 2016 · author #4
  27. Swivel: Improving Embeddings by Noticing What's Missing cs.CL · 2016 · author #1
  28. End-to-End Text-Dependent Speaker Verification cs.LG · 2015 · author #4
  29. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks cs.LG · 2015 · author #4
  30. Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation cs.LG · 2014 · author #1
  31. Variational Program Inference cs.AI · 2010 · author #2

Mentions

No mention provenance yet.

Frequent Coauthors