CoRR abs/2303.17568 (2023)

14 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · dataset: 2

representative citing papers

A Systematic Study of Behavioral Cloning for Scientific Data Annotation

cs.HC · 2026-05-26 · unverdicted · novelty 6.0

Introduces 9 synthetic annotation tasks and benchmarks for behavioral cloning, finding hierarchical skill learning, scaling benefits, effective multi-task pretraining, and shared internal representations of task phases and mistakes.

InternLM2 Technical Report

cs.CL · 2024-03-26 · unverdicted · novelty 5.0

InternLM2 is a new open-source LLM that outperforms prior versions on 30 benchmarks and long-context tasks through scaled pre-training to 32k tokens and a conditional online RLHF alignment strategy.

StarCoder: may the source be with you!

cs.CL · 2023-05-09 · accept · novelty 5.0 · 2 refs

StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.

A Survey of Large Language Models

cs.CL · 2023-03-31 · accept · novelty 3.0

This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.

citing papers explorer

Showing 5 of 5 citing papers after filters.

  • AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation · cs.CL · 2023-12-20 · accept · none · ref 43

    A three-agent loop of code generation, test creation, and execution feedback lifts pass@1 to 96.3% on HumanEval and 91.8% on MBPP for GPT-4 while using roughly half the tokens of prior state-of-the-art.

  • Mitigating Position Bias in Transformers via Layer-Specific Positional Embedding Scaling · cs.CL · 2026-06-26 · unverdicted · none · ref 12

    LPES uses per-layer scaling factors optimized by a genetic algorithm with Bézier curves to balance attention and improve long-context LLM performance by up to 11.2% on key-value retrieval.

  • InternLM2 Technical Report · cs.CL · 2024-03-26 · unverdicted · none · ref 205

    InternLM2 is a new open-source LLM that outperforms prior versions on 30 benchmarks and long-context tasks through scaled pre-training to 32k tokens and a conditional online RLHF alignment strategy.

  • StarCoder: may the source be with you! · cs.CL · 2023-05-09 · accept · none · ref 117 · 2 links

    StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.

  • A Survey of Large Language Models · cs.CL · 2023-03-31 · accept · none · ref 94

    This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.