pith. sign in

From Context to Skills: Can Language Models Learn from Context Skillfully?

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it
abstract

Many real-world tasks require language models (LMs) to reason over complex contexts that exceed their parametric knowledge. This calls for context learning, where LMs directly learn relevant knowledge from the given context. An intuitive solution is inference-time skill augmentation: extracting the rules and procedures from context into natural-language skills. However, constructing such skills for context learning scenarios faces two challenges: the prohibitive cost of manual skill annotation for long, technically dense contexts, and the lack of external feedback for automated skill construction. In this paper, we propose Ctx2Skill, a self-evolving framework that autonomously discovers, refines, and selects context-specific skills without human supervision or external feedback. At its core, a multi-agent self-play loop has a Challenger that generates probing tasks and rubrics, a Reasoner that attempts to solve them guided by an evolving skill set, and a neutral Judge that provides binary feedback. Crucially, both the Challenger and the Reasoner evolve through accumulated skills: dedicated Proposer and Generator agents analyze failure cases and synthesize them into targeted skill updates for both sides, enabling automated skill discovery and refinement. To prevent adversarial collapse caused by increasingly extreme task generation and over-specialized skill accumulation, we further introduce a Cross-time Replay mechanism that identifies the skill set achieving the best balance across representative cases for the Reasoner side, ensuring robust and generalizable skill evolution. The resulting skills can be plugged into any language model to obtain better context learning capability. Evaluated on four context learning tasks from CL-bench, Ctx2Skill consistently improves solving rates across backbone models.

citation-role summary

background 1

citation-polarity summary

fields

cs.CL 1 cs.LG 1

years

2026 2

roles

background 1

polarities

background 1

representative citing papers

Step-wise Rubric Rewards for LLM Reasoning

cs.LG · 2026-05-17 · conditional · novelty 6.0

SRaR attributes rubric items to specific steps via an LLM judge, normalizes per-step scores across rollouts, and combines them with outcome rewards via a decoupled advantage estimator, yielding 3.57-point accuracy gains on Qwen3-8B across math benchmarks.

citing papers explorer

Showing 2 of 2 citing papers.

  • Step-wise Rubric Rewards for LLM Reasoning cs.LG · 2026-05-17 · conditional · none · ref 15 · internal anchor

    SRaR attributes rubric items to specific steps via an LLM judge, normalizes per-step scores across rollouts, and combines them with outcome rewards via a decoupled advantage estimator, yielding 3.57-point accuracy gains on Qwen3-8B across math benchmarks.

  • SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution cs.CL · 2026-05-18 · unverdicted · none · ref 52 · internal anchor

    SkillsVote is a governance system for agent skills that profiles corpora, recommends via search, and gates updates on successful reusable outcomes, yielding benchmark gains without model changes.