Fork-think with confidence identifies forking points via model confidence in a single path before sampling continuations, cutting tokens up to 30% and runtime up to 57% on reasoning benchmarks while matching or exceeding parallel thinking performance.
hub
Contrastive decoding: Open-ended text generation as optimization
17 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 17roles
background 2polarities
background 2representative citing papers
10.3-22.9% of pass@k=0 math examples across GSM8K and MATH are recovered by a deterministic six-chain regime using activation grafting, showing a sampling blind spot in difficulty estimation.
Logit composition of autoregressive models is projective under factorized conditionals, preserved under smooth reparameterizations, and maintains length generalization when assumptions hold uniformly.
BOOKMARKS introduces searchable bookmarks as reusable answers to storyline questions, enabling active initialization and passive synchronization for more consistent role-playing agent memory than recurrent summarization.
ESamp trains a test-time distiller to model LLM depth-wise representation transitions and biases decoding toward high prediction-error paths to increase semantic diversity.
TriMix dynamically fuses logits from three model sources to outperform baselines and Proxy Tuning on eight low-resource languages across four model families.
An inference-time technique turns BPE-based LMs into byte- or character-level models, solving the prompt boundary problem while unifying vocabularies across different tokenizers.
IHDec applies JSD-steered contrastive decoding to enforce multi-turn instruction hierarchies in LLMs without fine-tuning.
VCM is a training-free decoding intervention that applies PMI-driven token elevation and variance-adaptive penalization to reduce repetitive degeneration in LLM open-ended generation.
ALMs encode audio evidence but override it with text in conflicts; GACL interpolates joint and same-audio scores to repair reversals, gaining 17.8 nAUC points under a 5pp faithfulness budget.
Grounded Decoding fuses full-RAG and retrieval-only next-token distributions via normalized geometric mean from a KL-barycenter to improve factual consistency and citation quality in RAG.
MAGS learns low-dimensional subspaces from correct versus incorrect reasoning traces and applies targeted projection corrections to attention heads when they deviate from the correctness manifold during inference.
QD-LLM applies neuroevolution to prompt embeddings within a quality-diversity framework, producing 46% higher coverage and 41% higher QD-score than QDAIF on HumanEval, MBPP, and creative writing benchmarks.
Probabilistic circuits detect LLM hallucinations as residual-stream anomalies with up to 99% AUROC and enable dynamic correction that raises truthfulness scores while cutting unnecessary output corruption.
GRAB improves multi-table QA performance by encoding relational data as graphs and bridging structural signals to frozen LLMs through latent tokens.
DCO is an inference-time intervention that decomposes attention head outputs orthogonally to a dynamic context anchor and suppresses outlier components via Z-score to improve contextual faithfulness in Llama models.
LightEdit enables scalable lifelong knowledge editing in LLMs via selective knowledge retrieval and probability suppression during decoding, outperforming prior methods on ZSRE, Counterfact, and RIPE while reducing training costs.
citing papers explorer
-
Sampling from Your Language Model One Byte at a Time
An inference-time technique turns BPE-based LMs into byte- or character-level models, solving the prompt boundary problem while unifying vocabularies across different tokenizers.