Scaling laws for generative mixed-modal language models. arXiv preprint arXiv:2301.03728

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 2

citation-polarity summary: background 2

years: 2023 (4), 2024 (1)

representative citing papers

Finite Scalar Quantization: VQ-VAE Made Simple

cs.CV · 2023-09-27 · conditional · novelty 7.0

Finite scalar quantization simplifies VQ-VAE latents by independently rounding a few dimensions to fixed levels, producing an equivalent-sized implicit codebook with competitive performance and no collapse.

Chameleon: Mixed-Modal Early-Fusion Foundation Models

cs.CL · 2024-05-16 · unverdicted · novelty 6.0

Chameleon is an early-fusion token model that handles mixed image-text sequences for understanding and generation, achieving competitive or superior performance to larger models like Llama-2, Mixtral, and Gemini-Pro on captioning, VQA, text, and image tasks.

The Falcon Series of Open Language Models

cs.CL · 2023-11-28 · conditional · novelty 6.0

Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.

Scaling Data-Constrained Language Models

cs.CL · 2023-05-25 · conditional · novelty 6.0

Repeating training data up to 4 epochs yields negligible loss increase versus unique data for fixed compute, and a new scaling law accounts for the decaying value of repeated tokens and excess parameters.

citing papers explorer

Showing 5 of 5 citing papers.

  • Finite Scalar Quantization: VQ-VAE Made Simple cs.CV · 2023-09-27 · conditional · none · ref 2

    Finite scalar quantization simplifies VQ-VAE latents by independently rounding a few dimensions to fixed levels, producing an equivalent-sized implicit codebook with competitive performance and no collapse.

  • Chameleon: Mixed-Modal Early-Fusion Foundation Models cs.CL · 2024-05-16 · unverdicted · none · ref 2

    Chameleon is an early-fusion token model that handles mixed image-text sequences for understanding and generation, achieving competitive or superior performance to larger models like Llama-2, Mixtral, and Gemini-Pro on captioning, VQA, text, and image tasks.

  • The Falcon Series of Open Language Models cs.CL · 2023-11-28 · conditional · none · ref 237

    Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.

  • Scaling Data-Constrained Language Models cs.CL · 2023-05-25 · conditional · none · ref 1

    Repeating training data up to 4 epochs yields negligible loss increase versus unique data for fixed compute, and a new scaling law accounts for the decaying value of repeated tokens and excess parameters.

  • SemDeDup: Data-efficient learning at web-scale through semantic deduplication cs.LG · 2023-03-16 · unverdicted · none · ref 12

    SemDeDup removes semantic duplicates from datasets like LAION using pre-trained embeddings, cutting data by 50% with minimal performance loss, and yields efficiency gains on C4.