hub

ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models

Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel · 2022 · DOI 10.1162/tacl_a_00461

17 Pith papers cite this work. Polarity classification is still indexing.

17 Pith papers citing it

open at publisher browse 17 citing papers

hub tools

JSON dossier citing papers JSON publisher DOI

representative citing papers

MultiHashFormer: Hash-based Generative Language Models

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

MultiHashFormer enables hash-based autoregression in LMs by encoding tokens as multi-hash signatures, outperforming standard Transformers at 100M-3B scales while keeping parameter count constant for multilingual expansion.

LangMAP: A Language-Adaptive Approach to Tokenization

cs.CL · 2026-06-22 · unverdicted · novelty 7.0

LangMAP adapts UnigramLM for multilingual use to deliver language-specific tokenization from a shared vocabulary, boosting boundary alignment metrics across natural and programming languages with mixed downstream fine-tuning gains.

Continuous Language Diffusion as a Decoder-Interface Problem

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

Continuous language diffusion works by entering high-margin decoder basins where frozen T5 embeddings recover 93-96% of native decisions and linear readouts reach 97.9% agreement, implying models should be evaluated as representation-decoder systems.

From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning

cs.CL · 2026-06-05 · unverdicted · novelty 7.0

Prefix gain measured via student-model solve-rate improvement is used to train a Prefix Utility Model (PUM) that supplies stronger supervision than correctness-based process rewards for mathematical reasoning.

FLEXITOKENS: Flexible Tokenization for Evolving Language Models

cs.CL · 2025-07-17 · unverdicted · novelty 7.0

FLEXITOKENS replaces rigid subword tokenizers and fixed-compression auxiliary losses with a simplified boundary-prediction objective in byte-level models, yielding lower over-fragmentation and up to 10-point gains on multilingual and domain-adaptation tasks.

Sampling from Your Language Model One Byte at a Time

cs.CL · 2025-06-17 · unverdicted · novelty 7.0

An inference-time technique turns BPE-based LMs into byte- or character-level models, solving the prompt boundary problem while unifying vocabularies across different tokenizers.

Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

cs.CL · 2024-04-29 · conditional · novelty 7.0

A panel of smaller diverse LLMs outperforms a single large model as an evaluator of generations, showing less intra-model bias and over 7x lower cost.

MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries

cs.CL · 2024-01-27 · accept · novelty 7.0

MultiHop-RAG is a new benchmark dataset demonstrating that existing retrieval-augmented generation systems perform poorly on multi-hop queries requiring retrieval and reasoning over multiple evidence pieces.

Compute Optimal Tokenization

cs.CL · 2026-05-02 · unverdicted · novelty 6.0

In compute-optimal regimes, language model parameter count scales proportionally with data bytes rather than tokens, and the optimal compression rate decreases with increasing compute.

Decisive: Guiding User Decisions with Optimal Preference Elicitation from Unstructured Documents

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

Decisive combines document-grounded option scoring with adaptive Bayesian preference elicitation to achieve up to 20% higher decision accuracy than LLMs and existing frameworks across domains.

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

cs.CL · 2022-04-14 · accept · novelty 6.0

GPT-NeoX-20B is a publicly released 20B parameter autoregressive language model trained on the Pile that shows strong gains in five-shot reasoning over similarly sized prior models.

ProbeScale: Probing Analysis to Optimize Neural Scaling Laws for Efficient Small Language Model Inference

cs.CL · 2026-06-01 · unverdicted · novelty 5.0

ProbScale finds layer subsets in SLMs like RoBERTa-Large and T5-Base that cut parameters 5-10x while retaining 95-98% of original task performance by maximizing aggregated probe scores under a budget.

The Efficiency Gap in Byte Modeling

cs.LG · 2026-05-13 · unverdicted · novelty 5.0

Byte modeling incurs greater scaling overhead for masked diffusion than autoregressive models because the diffusion objective destroys local byte contiguity needed to resolve semantics.

LoRA-FA: Efficient and Effective Low Rank Representation Fine-tuning

cs.CL · 2023-08-07 · unverdicted · novelty 5.0

LoRA-FA freezes LoRA's A matrix and trains only B with gradient corrections to approximate full fine-tuning gradients more closely.

Beyond Logical Forms: LLM-Extracted Patterns for Fallacy Classification

cs.CL · 2026-06-25 · unverdicted · novelty 4.0

LLM-extracted patterns merging logical structures and linguistic cues yield statistically significant gains in fallacy classification over zero-shot baselines with cross-dataset generalization.

MMoA: An AI-Agent framework with recurrence for Memoried Mixure-of-Agent

cs.CL · 2026-05-18 · unverdicted · novelty 3.0

MMoA adds LSTM recurrence to Mixture-of-Agents routing, reaching 58.0% win rate on AlpacaEval 2.0 versus 59.8% for baseline MoA while cutting runtime by up to 4.6%.

Optimizing Abstractive Summarization With Fine-Tuned PEGASUS

cs.CL · 2026-06-24 · unverdicted · novelty 2.0

Fine-tuned PEGASUS achieves state-of-the-art ROUGE scores on XL-Sum English corpus with 4.04% ROUGE-1, 15.25% ROUGE-2, and 3.39% ROUGE-L gains over mT5 baseline.

citing papers explorer

Showing 1 of 1 citing paper after filters.

LoRA-FA: Efficient and Effective Low Rank Representation Fine-tuning cs.CL · 2023-08-07 · unverdicted · none · ref 63
LoRA-FA freezes LoRA's A matrix and trains only B with gradient corrections to approximate full fine-tuning gradients more closely.

ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models

hub tools

fields

years

verdicts

representative citing papers

citing papers explorer