MultiHashFormer enables hash-based autoregression in LMs by encoding tokens as multi-hash signatures, outperforming standard Transformers at scales from 100M to 3B parameters while keeping the parameter count constant for multilingual expansion.
A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories
13 Pith papers cite this work.
citation-role summary: background (1)
citation-polarity summary: background (1)
representative citing papers
Continuous language diffusion works by entering high-margin decoder basins where frozen T5 embeddings recover 93-96% of native decisions and linear readouts reach 97.9% agreement, implying models should be evaluated as representation-decoder systems.
Develops an information-theoretic framework showing that surprise and coherence trade off in single-reader models but coexist via pre- and post-revelation modes, operationalized as reference-less LLM metrics for fair play and validated on generated stories plus classic detective fiction.
The GAIA benchmark shows that humans, at 92% accuracy on simple real-world questions, far outperform current AI systems, at 15%, and proposes closing this gap as a key milestone for general AI.
CoRP consolidates reward-weighted perturbations into a single model via low-rank structure, improving base LLMs by 8.1 points on average while using one-tenth the budget of prior ensembles and requiring only one forward pass.
The MixRea benchmark reveals that LLMs achieve at most 42.8% consistency on explicit-implicit reasoning tasks; PRCP prompting is proposed to recover overlooked relations.
QD-LLM applies neuroevolution to prompt embeddings within a quality-diversity framework, producing 46% higher coverage and 41% higher QD-score than QDAIF on HumanEval, MBPP, and creative writing benchmarks.
Adapting autoregressive models via continual pre-training yields diffusion language models from 127M to 7B parameters that outperform prior diffusion models and compete with their autoregressive counterparts on language, reasoning, and commonsense benchmarks.
BloombergGPT is a 50B-parameter LLM trained on a 708B-token dataset mixing financial and general data; it outperforms prior models on financial benchmarks while preserving general LLM performance.
Autoregressive language models trained on data with middle spans relocated to the end learn infilling without degrading left-to-right perplexity or sampling quality.
Broad empirical evaluation finds that fine-tuning heuristics for source-language choice in cross-lingual transfer do not hold reliably under in-context learning.
LANG combines language-adaptive hint guidance, progressive decay, and difficulty-tailored learning horizons in RL to boost non-English reasoning performance while preserving language consistency.
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
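The infilling result summarized above rests on a simple data transformation: cut a document into prefix, middle, and suffix, then move the middle span to the end so an ordinary left-to-right model learns to generate it conditioned on both sides. The sketch below is a minimal illustration under stated assumptions, not the citing paper's implementation: the sentinel strings `<PRE>`, `<SUF>`, `<MID>`, the helper names `to_psm` and `maybe_fim`, the uniform character-level cut points, and the default `fim_rate` of 0.5 are all hypothetical choices.

```python
import random

# Hypothetical sentinel strings for illustration only; real setups reserve
# dedicated special tokens in the tokenizer vocabulary.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def to_psm(doc: str, i: int, j: int) -> str:
    """Cut doc at positions i <= j and relocate the middle span to the end."""
    if not 0 <= i <= j <= len(doc):
        raise ValueError("need 0 <= i <= j <= len(doc)")
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # Prefix-suffix-middle order: the model reads both contexts first,
    # then produces the middle span left to right.
    return PRE + prefix + SUF + suffix + MID + middle


def maybe_fim(doc: str, rng: random.Random, fim_rate: float = 0.5) -> str:
    """With probability fim_rate, apply the transform at two uniformly random
    cut points (an assumed sampling scheme); otherwise return the document
    unchanged as ordinary left-to-right training data."""
    if rng.random() >= fim_rate:
        return doc
    i, j = sorted(rng.randint(0, len(doc)) for _ in range(2))
    return to_psm(doc, i, j)


if __name__ == "__main__":
    print(to_psm("The cat sat on the mat.", 4, 7))
    # -> <PRE>The <SUF> sat on the mat.<MID>cat
```

Because only the ordering of the training text changes, the same next-token objective covers both modes; at inference, a prompt of the form prefix-sentinel, prefix, suffix-sentinel, suffix, middle-sentinel asks the model to continue with the missing middle.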
citing papers explorer
The Challenge and Reward of Fair Play in Narrative: A Computational Approach
Develops an information-theoretic framework showing that surprise and coherence trade off in single-reader models but coexist via pre- and post-revelation modes, operationalized as reference-less LLM metrics for fair play and validated on generated stories plus classic detective fiction.