pith. sign in

hub

Unifying language learning paradigms.arXiv preprint arXiv:2205.05131, 2022a

19 Pith papers cite this work. Polarity classification is still indexing.

19 Pith papers citing it

hub tools

citation-role summary

background 3 method 1

citation-polarity summary

clear filters

representative citing papers

Large Language Model Selection with Limited Annotations

cs.CL · 2026-05-24 · unverdicted · novelty 7.0

SELECT-LLM is the first active model selection framework for LLMs that uses expected information gain from pairwise output similarities to minimize required annotations, reporting up to 84.78% cost reduction across 23 datasets and 156 models.

UniMaia: Steering Chess Policies with Language for Human-like Play

cs.CL · 2026-05-26 · unverdicted · novelty 6.0

UniMaia demonstrates feasible prompt-conditioned control of a frozen Lc0 chess policy network via text encoder and conditioning, achieving SOTA expected accuracy on prompt benchmarks while remaining competitive on human move prediction.

S^2tory: Story Spine Distillation for Movie Script Summarization

cs.CL · 2026-05-05 · unverdicted · novelty 6.0

S^2tory uses narratological theory and a Narrative Expert Agent to identify plot nuclei in movie scripts for high-fidelity summarization at 3.5x compression, with strong zero-shot generalization to books.

Scaling Data-Constrained Language Models

cs.CL · 2023-05-25 · conditional · novelty 6.0

Repeating training data up to 4 epochs yields negligible loss increase versus unique data for fixed compute, and a new scaling law accounts for the decaying value of repeated tokens and excess parameters.

CodeT5+: Open Code Large Language Models for Code Understanding and Generation

cs.CL · 2023-05-13 · conditional · novelty 6.0

CodeT5+ is a flexible encoder-decoder LLM family for code pretrained with diverse objectives on multilingual corpora and initialized from existing LLMs, achieving state-of-the-art results on code generation, completion, math programming, and retrieval tasks including new SoTA on HumanEval with the 1

BloombergGPT: A Large Language Model for Finance

cs.LG · 2023-03-30 · conditional · novelty 6.0

BloombergGPT is a 50B parameter LLM trained on a 708B token mixed financial and general dataset that outperforms prior models on financial benchmarks while preserving general LLM performance.

Emergent Abilities of Large Language Models

cs.CL · 2022-06-15 · unverdicted · novelty 6.0

Emergent abilities are capabilities present in large language models but absent in smaller ones and cannot be predicted by extrapolating smaller model performance.

Open-Sora: Democratizing Efficient Video Production for All

cs.CV · 2024-12-29 · unverdicted · novelty 5.0

Open-Sora releases an open-source video generation model based on a Spatial-Temporal Diffusion Transformer that decouples spatial and temporal attention, supporting text-to-video, image-to-video, and text-to-image tasks with claimed high fidelity.

Movie Gen: A Cast of Media Foundation Models

cs.CV · 2024-10-17 · unverdicted · novelty 5.0

A 30B-parameter transformer and related models generate high-quality videos and audio, claiming state-of-the-art results on text-to-video, video editing, personalization, and audio generation tasks.

StarCoder: may the source be with you!

cs.CL · 2023-05-09 · accept · novelty 5.0

StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.

Large Language Models: A Survey

cs.CL · 2024-02-09 · accept · novelty 3.0

The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.

citing papers explorer

Showing 5 of 5 citing papers after filters.

  • Better & Faster Large Language Models via Multi-token Prediction cs.CL · 2024-04-30 · conditional · none · ref 14

    Multi-token prediction training yields higher sample efficiency, better benchmark scores on code generation, and up to 3x faster inference than standard next-token prediction for LLMs.

  • Scaling Data-Constrained Language Models cs.CL · 2023-05-25 · conditional · none · ref 114

    Repeating training data up to 4 epochs yields negligible loss increase versus unique data for fixed compute, and a new scaling law accounts for the decaying value of repeated tokens and excess parameters.

  • CodeT5+: Open Code Large Language Models for Code Understanding and Generation cs.CL · 2023-05-13 · conditional · none · ref 28

    CodeT5+ is a flexible encoder-decoder LLM family for code pretrained with diverse objectives on multilingual corpora and initialized from existing LLMs, achieving state-of-the-art results on code generation, completion, math programming, and retrieval tasks including new SoTA on HumanEval with the 1

  • BloombergGPT: A Large Language Model for Finance cs.LG · 2023-03-30 · conditional · none · ref 116

    BloombergGPT is a 50B parameter LLM trained on a 708B token mixed financial and general dataset that outperforms prior models on financial benchmarks while preserving general LLM performance.

  • The Flan Collection: Designing Data and Methods for Effective Instruction Tuning cs.AI · 2023-01-31 · conditional · none · ref 57

    The Flan Collection demonstrates that task balancing, data enrichment, and mixed prompt training are critical to effective instruction tuning, yielding stronger Flan-T5 models released publicly.