hub

B y T 5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models

Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel · 2022 · DOI 10.1162/tacl_a_00461

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

open at publisher browse 10 citing papers

hub tools

JSON dossier citing papers JSON publisher DOI

representative citing papers

FLEXITOKENS: Flexible Tokenization for Evolving Language Models

cs.CL · 2025-07-17 · unverdicted · novelty 7.0

FLEXITOKENS replaces rigid subword tokenizers and fixed-compression auxiliary losses with a simplified boundary-prediction objective in byte-level models, yielding lower over-fragmentation and up to 10-point gains on multilingual and domain-adaptation tasks.

Sampling from Your Language Model One Byte at a Time

cs.CL · 2025-06-17 · unverdicted · novelty 7.0

An inference-time technique turns BPE-based LMs into byte- or character-level models, solving the prompt boundary problem while unifying vocabularies across different tokenizers.

Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

cs.CL · 2024-04-29 · conditional · novelty 7.0

A panel of smaller diverse LLMs outperforms a single large model as an evaluator of generations, showing less intra-model bias and over 7x lower cost.

MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries

cs.CL · 2024-01-27 · accept · novelty 7.0

MultiHop-RAG is a new benchmark dataset demonstrating that existing retrieval-augmented generation systems perform poorly on multi-hop queries requiring retrieval and reasoning over multiple evidence pieces.

Decisive: Guiding User Decisions with Optimal Preference Elicitation from Unstructured Documents

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

Decisive combines document-grounded option scoring with adaptive Bayesian preference elicitation to achieve up to 20% higher decision accuracy than LLMs and existing frameworks across domains.

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

cs.CL · 2022-04-14 · accept · novelty 6.0

GPT-NeoX-20B is a publicly released 20B parameter autoregressive language model trained on the Pile that shows strong gains in five-shot reasoning over similarly sized prior models.

The Efficiency Gap in Byte Modeling

cs.LG · 2026-05-13 · unverdicted · novelty 5.0

Byte modeling incurs greater scaling overhead for masked diffusion than autoregressive models because the diffusion objective destroys local byte contiguity needed to resolve semantics.

LoRA-FA: Efficient and Effective Low Rank Representation Fine-tuning

cs.CL · 2023-08-07 · unverdicted · novelty 5.0

LoRA-FA freezes LoRA's A matrix and trains only B with gradient corrections to approximate full fine-tuning gradients more closely.

MMoA: An AI-Agent framework with recurrence for Memoried Mixure-of-Agent

cs.CL · 2026-05-18 · unverdicted · novelty 3.0

MMoA adds LSTM recurrence to Mixture-of-Agents routing, reaching 58.0% win rate on AlpacaEval 2.0 versus 59.8% for baseline MoA while cutting runtime by up to 4.6%.

Compute Optimal Tokenization

cs.CL · 2026-05-02

citing papers explorer

Showing 10 of 10 citing papers.

FLEXITOKENS: Flexible Tokenization for Evolving Language Models cs.CL · 2025-07-17 · unverdicted · none · ref 12
FLEXITOKENS replaces rigid subword tokenizers and fixed-compression auxiliary losses with a simplified boundary-prediction objective in byte-level models, yielding lower over-fragmentation and up to 10-point gains on multilingual and domain-adaptation tasks.
Sampling from Your Language Model One Byte at a Time cs.CL · 2025-06-17 · unverdicted · none · ref 81
An inference-time technique turns BPE-based LMs into byte- or character-level models, solving the prompt boundary problem while unifying vocabularies across different tokenizers.
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models cs.CL · 2024-04-29 · conditional · none · ref 298
A panel of smaller diverse LLMs outperforms a single large model as an evaluator of generations, showing less intra-model bias and over 7x lower cost.
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries cs.CL · 2024-01-27 · accept · none · ref 297
MultiHop-RAG is a new benchmark dataset demonstrating that existing retrieval-augmented generation systems perform poorly on multi-hop queries requiring retrieval and reasoning over multiple evidence pieces.
Decisive: Guiding User Decisions with Optimal Preference Elicitation from Unstructured Documents cs.CL · 2026-04-20 · unverdicted · none · ref 291
Decisive combines document-grounded option scoring with adaptive Bayesian preference elicitation to achieve up to 20% higher decision accuracy than LLMs and existing frameworks across domains.
GPT-NeoX-20B: An Open-Source Autoregressive Language Model cs.CL · 2022-04-14 · accept · none · ref 104
GPT-NeoX-20B is a publicly released 20B parameter autoregressive language model trained on the Pile that shows strong gains in five-shot reasoning over similarly sized prior models.
The Efficiency Gap in Byte Modeling cs.LG · 2026-05-13 · unverdicted · none · ref 25
Byte modeling incurs greater scaling overhead for masked diffusion than autoregressive models because the diffusion objective destroys local byte contiguity needed to resolve semantics.
LoRA-FA: Efficient and Effective Low Rank Representation Fine-tuning cs.CL · 2023-08-07 · unverdicted · none · ref 63
LoRA-FA freezes LoRA's A matrix and trains only B with gradient corrections to approximate full fine-tuning gradients more closely.
MMoA: An AI-Agent framework with recurrence for Memoried Mixure-of-Agent cs.CL · 2026-05-18 · unverdicted · none · ref 252
MMoA adds LSTM recurrence to Mixture-of-Agents routing, reaching 58.0% win rate on AlpacaEval 2.0 versus 59.8% for baseline MoA while cutting runtime by up to 4.6%.
Compute Optimal Tokenization cs.CL · 2026-05-02 · unreviewed · ref 47

B y T 5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models

hub tools

fields

years

verdicts

representative citing papers

citing papers explorer