Scaling laws for generative mixed-modal language models.arXiv preprint arXiv:2301.03728

· 2023 · arXiv 2301.03728

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Finite Scalar Quantization: VQ-VAE Made Simple

cs.CV · 2023-09-27 · conditional · novelty 7.0

Finite scalar quantization simplifies VQ-VAE latents by independently rounding a few dimensions to fixed levels, producing an equivalent-sized implicit codebook with competitive performance and no collapse.

Chameleon: Mixed-Modal Early-Fusion Foundation Models

cs.CL · 2024-05-16 · unverdicted · novelty 6.0

Chameleon is an early-fusion token model that handles mixed image-text sequences for understanding and generation, achieving competitive or superior performance to larger models like Llama-2, Mixtral, and Gemini-Pro on captioning, VQA, text, and image tasks.

The Falcon Series of Open Language Models

cs.CL · 2023-11-28 · conditional · novelty 6.0

Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.

Scaling Data-Constrained Language Models

cs.CL · 2023-05-25 · conditional · novelty 6.0

Repeating training data up to 4 epochs yields negligible loss increase versus unique data for fixed compute, and a new scaling law accounts for the decaying value of repeated tokens and excess parameters.

SemDeDup: Data-efficient learning at web-scale through semantic deduplication

cs.LG · 2023-03-16 · unverdicted · novelty 6.0

SemDeDup removes semantic duplicates from datasets like LAION using pre-trained embeddings, cutting data by 50% with minimal performance loss and efficiency gains on C4.

citing papers explorer

Showing 5 of 5 citing papers.

Finite Scalar Quantization: VQ-VAE Made Simple cs.CV · 2023-09-27 · conditional · none · ref 2
Finite scalar quantization simplifies VQ-VAE latents by independently rounding a few dimensions to fixed levels, producing an equivalent-sized implicit codebook with competitive performance and no collapse.
Chameleon: Mixed-Modal Early-Fusion Foundation Models cs.CL · 2024-05-16 · unverdicted · none · ref 2
Chameleon is an early-fusion token model that handles mixed image-text sequences for understanding and generation, achieving competitive or superior performance to larger models like Llama-2, Mixtral, and Gemini-Pro on captioning, VQA, text, and image tasks.
The Falcon Series of Open Language Models cs.CL · 2023-11-28 · conditional · none · ref 237
Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.
Scaling Data-Constrained Language Models cs.CL · 2023-05-25 · conditional · none · ref 1
Repeating training data up to 4 epochs yields negligible loss increase versus unique data for fixed compute, and a new scaling law accounts for the decaying value of repeated tokens and excess parameters.
SemDeDup: Data-efficient learning at web-scale through semantic deduplication cs.LG · 2023-03-16 · unverdicted · none · ref 12
SemDeDup removes semantic duplicates from datasets like LAION using pre-trained embeddings, cutting data by 50% with minimal performance loss and efficiency gains on C4.

Scaling laws for generative mixed-modal language models.arXiv preprint arXiv:2301.03728

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer