Min-K% Prob detects pretraining data in LLMs by flagging outlier low-probability words in text, achieving 7.4% better performance than prior methods on the new WIKIMIA benchmark.
Pythia: A suite for analyzing large language models across training and scaling
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
REINFORCE-style variants outperform PPO, DPO, and RAFT in RLHF for LLMs by removing unnecessary PPO components and adapting the simpler method to LLM alignment characteristics.
Sentiment is represented as a single linear direction in LLM activation space that is causally relevant across tasks and is summarized at punctuation and names in addition to charged words.
StreamingLLM lets finite-window LLMs generalize to infinite-length sequences by retaining initial-token KV states as attention sinks, enabling stable streaming inference up to 4M tokens.
citing papers explorer
-
Detecting Pretraining Data from Large Language Models
Min-K% Prob detects pretraining data in LLMs by flagging outlier low-probability words in text, achieving 7.4% better performance than prior methods on the new WIKIMIA benchmark.
-
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
REINFORCE-style variants outperform PPO, DPO, and RAFT in RLHF for LLMs by removing unnecessary PPO components and adapting the simpler method to LLM alignment characteristics.
-
Linear Representations of Sentiment in Large Language Models
Sentiment is represented as a single linear direction in LLM activation space that is causally relevant across tasks and is summarized at punctuation and names in addition to charged words.
-
Efficient Streaming Language Models with Attention Sinks
StreamingLLM lets finite-window LLMs generalize to infinite-length sequences by retaining initial-token KV states as attention sinks, enabling stable streaming inference up to 4M tokens.