pith. sign in

Attention is all you need.Advancesin neural information processing systems, 30

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

citation-role summary

background 2

citation-polarity summary

years

2026 3 2025 3

verdicts

UNVERDICTED 6

roles

background 2

polarities

background 2

representative citing papers

Scaling Latent Reasoning via Looped Language Models

cs.CL · 2025-10-29 · unverdicted · novelty 7.0

Looped language models with latent iterative computation and entropy-regularized depth allocation achieve performance matching up to 12B standard LLMs through superior knowledge manipulation.

Protein Autoregressive Modeling via Multiscale Structure Generation

cs.LG · 2026-02-04 · unverdicted · novelty 6.0

PAR is a multi-scale autoregressive transformer framework for protein backbone generation that uses coarse-to-fine prediction, noisy context learning, and flow-based decoding to achieve high-quality unconditional and zero-shot conditional outputs.

Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation

cs.CV · 2025-05-08 · unverdicted · novelty 6.0

Mogao presents a causal unified model with deep fusion, dual encoders, and interleaved position embeddings that achieves strong performance on multi-modal understanding, text-to-image generation, and coherent interleaved outputs including zero-shot editing.

Seedream 3.0 Technical Report

cs.CV · 2025-04-15 · unverdicted · novelty 4.0

Seedream 3.0 improves bilingual image generation through doubled defect-aware data, mixed-resolution training, cross-modality RoPE, representation alignment, aesthetic SFT, VLM reward modeling, and importance-aware timestep sampling for 4-8x faster inference at up to 2K resolution.

citing papers explorer

Showing 6 of 6 citing papers.

  • Scaling Latent Reasoning via Looped Language Models cs.CL · 2025-10-29 · unverdicted · none · ref 33

    Looped language models with latent iterative computation and entropy-regularized depth allocation achieve performance matching up to 12B standard LLMs through superior knowledge manipulation.

  • Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling cs.AI · 2026-03-05 · unverdicted · none · ref 52

    Timer-S1 is a released 8.3B-parameter MoE time series model that achieves state-of-the-art MASE and CRPS scores on GIFT-Eval using serial scaling and Serial-Token Prediction.

  • Protein Autoregressive Modeling via Multiscale Structure Generation cs.LG · 2026-02-04 · unverdicted · none · ref 42

    PAR is a multi-scale autoregressive transformer framework for protein backbone generation that uses coarse-to-fine prediction, noisy context learning, and flow-based decoding to achieve high-quality unconditional and zero-shot conditional outputs.

  • Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation cs.CV · 2025-05-08 · unverdicted · none · ref 78

    Mogao presents a causal unified model with deep fusion, dual encoders, and interleaved position embeddings that achieves strong performance on multi-modal understanding, text-to-image generation, and coherent interleaved outputs including zero-shot editing.

  • Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training cs.CL · 2026-05-06 · unverdicted · none · ref 26 · 2 links

    LoPT achieves competitive task performance in LLM post-training by limiting task gradients to the upper model half and training the lower half with local feature reconstruction.

  • Seedream 3.0 Technical Report cs.CV · 2025-04-15 · unverdicted · none · ref 23

    Seedream 3.0 improves bilingual image generation through doubled defect-aware data, mixed-resolution training, cross-modality RoPE, representation alignment, aesthetic SFT, VLM reward modeling, and importance-aware timestep sampling for 4-8x faster inference at up to 2K resolution.