Improving language understanding by generative pre-training

Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever · 2018

15 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

NEST: Nested Event Stream Transformer for Sequences of Multisets

cs.LG · 2026-01-31 · unverdicted · novelty 7.0

NEST is a nested transformer for sequences of multisets that uses masked set modeling to learn improved set-level representations from hierarchical event streams like EHRs.

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

cs.CL · 2024-06-12 · unverdicted · novelty 7.0

Magpie synthesizes 300K high-quality alignment instructions from Llama-3-Instruct via auto-regressive prompting on partial templates, enabling fine-tuned models to match official instruct performance on AlpacaEval, ArenaHard, and WildBench.

RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems

cs.CL · 2023-06-05 · unverdicted · novelty 7.0

RepoBench is a new benchmark with retrieval, completion, and pipeline tasks to evaluate code auto-completion systems on entire repositories instead of single files.

Beyond Autoregressive RTG: Conditioning via Injection Outside Sequential Modeling in Decision Transformer

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Injecting RTG into states outside the autoregressive sequence yields shorter, more efficient Decision Transformers that outperform the original on offline RL tasks.

Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists

cs.AI · 2026-04-30 · unverdicted · novelty 6.0

Intern-Atlas constructs a methodological evolution graph with 9.4 million edges from 1.03 million AI papers to capture how methods emerge, adapt, and transition, enabling better idea evaluation and generation for AI-driven research.

UniBCI: Towards a Unified Pretrained Model for Invasive Brain-Computer Interfaces

cs.NE · 2026-04-30 · unverdicted · novelty 6.0

UniBCI is a unified pretrained model for invasive neural spike data that uses CST tokenization, IAA attention, and self-supervised masked reconstruction to achieve SOTA downstream performance with better generalization and efficiency.

Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM

cs.CV · 2025-05-23 · unverdicted · novelty 6.0

Slot-MLLM introduces a slot-attention-based object-centric visual tokenizer with Q-Former encoder, diffusion decoder, and residual vector quantization for improved local visual comprehension and generation in multimodal LLMs.

MMaDA: Multimodal Large Diffusion Language Models

cs.CV · 2025-05-21 · unverdicted · novelty 6.0

MMaDA is a unified multimodal diffusion model using mixed chain-of-thought fine-tuning and a new UniGRPO reinforcement learning algorithm that outperforms specialized models in reasoning, understanding, and text-to-image tasks.

TrainMover: An Interruption-Resilient Runtime for ML Training

cs.DC · 2024-12-17 · unverdicted · novelty 6.0

TrainMover achieves ~20s downtime for interruptions in 1024-GPU LLM training via two-phase delta-based communication setup, communication-free sandboxed warmup, and general standby design, projecting 55% reduction in wasted GPU hours.

NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation

cs.CV · 2024-02-24 · unverdicted · novelty 6.0

NaVid, a video-based VLM trained on 510k navigation and 763k web samples, achieves SOTA VLN performance using only monocular RGB video for next-step action planning in sim and real environments.

Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation

cs.RO · 2023-12-20 · conditional · novelty 6.0

A GPT-style model pre-trained on large video datasets achieves 94.9% success on CALVIN multi-task manipulation and 85.4% zero-shot generalization, outperforming prior baselines.

Efficient Streaming Language Models with Attention Sinks

cs.CL · 2023-09-29 · accept · novelty 6.0

StreamingLLM lets finite-window LLMs generalize to infinite-length sequences by retaining initial-token KV states as attention sinks, enabling stable streaming inference up to 4M tokens.

Lossless Anti-Distillation Sampling

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

LADS is a sampling method that keeps benign user generations statistically identical to the original model while forcing correlated samples across a distiller's multiple accounts, provably worsening their generalization via uniform convergence bounds.

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

cs.AI · 2023-08-10 · accept · novelty 5.0

Survey organizes LLM trustworthiness into seven categories and 29 sub-categories, measures eight sub-categories on popular models, and finds that more aligned models generally score higher but with varying effectiveness.

Comparative Analysis of Large Language Models in Healthcare

cs.CL · 2026-04-11 · unverdicted · novelty 3.0

Domain-specific models like ChatDoctor excel at medically accurate and contextually reliable text while general-purpose models like Grok and LLaMA perform better on structured medical question-answering tasks.

citing papers explorer

Showing 15 of 15 citing papers.

NEST: Nested Event Stream Transformer for Sequences of Multisets
cs.LG · 2026-01-31 · unverdicted · none · ref 27
NEST is a nested transformer for sequences of multisets that uses masked set modeling to learn improved set-level representations from hierarchical event streams like EHRs.

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
cs.CL · 2024-06-12 · unverdicted · none · ref 134
Magpie synthesizes 300K high-quality alignment instructions from Llama-3-Instruct via auto-regressive prompting on partial templates, enabling fine-tuned models to match official instruct performance on AlpacaEval, ArenaHard, and WildBench.

RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems
cs.CL · 2023-06-05 · unverdicted · none · ref 33
RepoBench is a new benchmark with retrieval, completion, and pipeline tasks to evaluate code auto-completion systems on entire repositories instead of single files.

Beyond Autoregressive RTG: Conditioning via Injection Outside Sequential Modeling in Decision Transformer
cs.LG · 2026-05-07 · unverdicted · none · ref 34
Injecting RTG into states outside the autoregressive sequence yields shorter, more efficient Decision Transformers that outperform the original on offline RL tasks.

Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists
cs.AI · 2026-04-30 · unverdicted · none · ref 12
Intern-Atlas constructs a methodological evolution graph with 9.4 million edges from 1.03 million AI papers to capture how methods emerge, adapt, and transition, enabling better idea evaluation and generation for AI-driven research.

UniBCI: Towards a Unified Pretrained Model for Invasive Brain-Computer Interfaces
cs.NE · 2026-04-30 · unverdicted · none · ref 40
UniBCI is a unified pretrained model for invasive neural spike data that uses CST tokenization, IAA attention, and self-supervised masked reconstruction to achieve SOTA downstream performance with better generalization and efficiency.

Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM
cs.CV · 2025-05-23 · unverdicted · none · ref 54
Slot-MLLM introduces a slot-attention-based object-centric visual tokenizer with Q-Former encoder, diffusion decoder, and residual vector quantization for improved local visual comprehension and generation in multimodal LLMs.

MMaDA: Multimodal Large Diffusion Language Models
cs.CV · 2025-05-21 · unverdicted · none · ref 1
MMaDA is a unified multimodal diffusion model using mixed chain-of-thought fine-tuning and a new UniGRPO reinforcement learning algorithm that outperforms specialized models in reasoning, understanding, and text-to-image tasks.

TrainMover: An Interruption-Resilient Runtime for ML Training
cs.DC · 2024-12-17 · unverdicted · none · ref 29
TrainMover achieves ~20s downtime for interruptions in 1024-GPU LLM training via two-phase delta-based communication setup, communication-free sandboxed warmup, and general standby design, projecting 55% reduction in wasted GPU hours.

NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation
cs.CV · 2024-02-24 · unverdicted · none · ref 75
NaVid, a video-based VLM trained on 510k navigation and 763k web samples, achieves SOTA VLN performance using only monocular RGB video for next-step action planning in sim and real environments.

Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation
cs.RO · 2023-12-20 · conditional · none · ref 43
A GPT-style model pre-trained on large video datasets achieves 94.9% success on CALVIN multi-task manipulation and 85.4% zero-shot generalization, outperforming prior baselines.

Efficient Streaming Language Models with Attention Sinks
cs.CL · 2023-09-29 · accept · none · ref 41
StreamingLLM lets finite-window LLMs generalize to infinite-length sequences by retaining initial-token KV states as attention sinks, enabling stable streaming inference up to 4M tokens.

Lossless Anti-Distillation Sampling
cs.LG · 2026-05-12 · unverdicted · none · ref 113
LADS is a sampling method that keeps benign user generations statistically identical to the original model while forcing correlated samples across a distiller's multiple accounts, provably worsening their generalization via uniform convergence bounds.

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
cs.AI · 2023-08-10 · accept · none · ref 34
Survey organizes LLM trustworthiness into seven categories and 29 sub-categories, measures eight sub-categories on popular models, and finds that more aligned models generally score higher but with varying effectiveness.

Comparative Analysis of Large Language Models in Healthcare
cs.CL · 2026-04-11 · unverdicted · none · ref 22
Domain-specific models like ChatDoctor excel at medically accurate and contextually reliable text while general-purpose models like Grok and LLaMA perform better on structured medical question-answering tasks.
