Attention is all you need.Advancesin neural information processing systems, 30

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, Illia Polosukhin · 2017

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

browse 6 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Scaling Latent Reasoning via Looped Language Models

cs.CL · 2025-10-29 · unverdicted · novelty 7.0

Looped language models with latent iterative computation and entropy-regularized depth allocation achieve performance matching up to 12B standard LLMs through superior knowledge manipulation.

Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling

cs.AI · 2026-03-05 · unverdicted · novelty 6.0

Timer-S1 is a released 8.3B-parameter MoE time series model that achieves state-of-the-art MASE and CRPS scores on GIFT-Eval using serial scaling and Serial-Token Prediction.

Protein Autoregressive Modeling via Multiscale Structure Generation

cs.LG · 2026-02-04 · unverdicted · novelty 6.0

PAR is a multi-scale autoregressive transformer framework for protein backbone generation that uses coarse-to-fine prediction, noisy context learning, and flow-based decoding to achieve high-quality unconditional and zero-shot conditional outputs.

Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation

cs.CV · 2025-05-08 · unverdicted · novelty 6.0

Mogao presents a causal unified model with deep fusion, dual encoders, and interleaved position embeddings that achieves strong performance on multi-modal understanding, text-to-image generation, and coherent interleaved outputs including zero-shot editing.

Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training

cs.CL · 2026-05-06 · unverdicted · novelty 5.0 · 2 refs

LoPT achieves competitive task performance in LLM post-training by limiting task gradients to the upper model half and training the lower half with local feature reconstruction.

Seedream 3.0 Technical Report

cs.CV · 2025-04-15 · unverdicted · novelty 4.0

Seedream 3.0 improves bilingual image generation through doubled defect-aware data, mixed-resolution training, cross-modality RoPE, representation alignment, aesthetic SFT, VLM reward modeling, and importance-aware timestep sampling for 4-8x faster inference at up to 2K resolution.

citing papers explorer

Showing 6 of 6 citing papers.

Scaling Latent Reasoning via Looped Language Models cs.CL · 2025-10-29 · unverdicted · none · ref 33
Looped language models with latent iterative computation and entropy-regularized depth allocation achieve performance matching up to 12B standard LLMs through superior knowledge manipulation.
Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling cs.AI · 2026-03-05 · unverdicted · none · ref 52
Timer-S1 is a released 8.3B-parameter MoE time series model that achieves state-of-the-art MASE and CRPS scores on GIFT-Eval using serial scaling and Serial-Token Prediction.
Protein Autoregressive Modeling via Multiscale Structure Generation cs.LG · 2026-02-04 · unverdicted · none · ref 42
PAR is a multi-scale autoregressive transformer framework for protein backbone generation that uses coarse-to-fine prediction, noisy context learning, and flow-based decoding to achieve high-quality unconditional and zero-shot conditional outputs.
Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation cs.CV · 2025-05-08 · unverdicted · none · ref 78
Mogao presents a causal unified model with deep fusion, dual encoders, and interleaved position embeddings that achieves strong performance on multi-modal understanding, text-to-image generation, and coherent interleaved outputs including zero-shot editing.
Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training cs.CL · 2026-05-06 · unverdicted · none · ref 26 · 2 links
LoPT achieves competitive task performance in LLM post-training by limiting task gradients to the upper model half and training the lower half with local feature reconstruction.
Seedream 3.0 Technical Report cs.CV · 2025-04-15 · unverdicted · none · ref 23
Seedream 3.0 improves bilingual image generation through doubled defect-aware data, mixed-resolution training, cross-modality RoPE, representation alignment, aesthetic SFT, VLM reward modeling, and importance-aware timestep sampling for 4-8x faster inference at up to 2K resolution.

Attention is all you need.Advancesin neural information processing systems, 30

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer