
" * write output.state after.block = add.period write newline

78 Pith papers cite this work. Polarity classification is still indexing.

hub tools

citation-role summary

method 1

citation-polarity summary

claims ledger

  • method It compares the model's prediction for the entire trajectory against the ground-truth trajectory label, $L_{\text{traj}}(s, a)$: $L_{\text{traj}} = L_{\text{BCE}}\big(R_\phi(s, a \mid x),\, R_{\text{traj}}\big)$ (12), where $\sigma(\cdot)$ denotes the sigmoid function, which converts the model's raw logit outputs into probabilities. $L_{\text{BCE}}(\cdot, \cdot)$ denotes the BCE loss function. For a ground-truth label $L \in \{0, 1\}$ and a model logit output $R_\phi$, it is defined as $L_{\text{BCE}}(R_\phi, L) = -[L \log \sigma(R_\phi) + (1 - L) \log(1 - \sigma(R_\phi))]$. By jointly optimizing this objective, Fin-PRM is train
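
The BCE definition quoted in the ledger entry above can be checked numerically. The following is a minimal sketch, not the citing paper's implementation: the function names `sigmoid` and `bce_with_logits` are hypothetical, and the loss is computed in an algebraically equivalent, numerically stable form rather than by calling `log(sigmoid(...))` directly.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function sigma(z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_with_logits(logit: float, label: int) -> float:
    """Binary cross-entropy for a raw logit and a label in {0, 1}.

    Equals -[L*log(sigma(R)) + (1-L)*log(1 - sigma(R))], rewritten as
    max(R, 0) - R*L + log(1 + exp(-|R|)) so that extreme logits do not
    produce log(0).
    """
    if label not in (0, 1):
        raise ValueError("label must be 0 or 1")
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


# Illustrative values only (not taken from the paper).
loss_pos = bce_with_logits(2.0, 1)  # confident and correct: small loss
loss_neg = bce_with_logits(2.0, 0)  # confident and wrong: large loss
```

At a logit of 0 the model assigns probability 0.5, so the loss is log 2 for either label; the loss grows roughly linearly in the logit when a confident prediction disagrees with the label.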

co-cited works

roles

method 1

polarities

use method 1

representative citing papers

Dynamic Tool Dependency Retrieval for Lightweight Function Calling

cs.LG · 2025-12-18 · unverdicted · novelty 7.0

DTDR dynamically retrieves relevant tools by modeling dependencies from demonstrations and conditioning on the evolving agent plan, improving function calling success rates by 23-104% over static retrievers across benchmarks.

Incremental Data-Driven Policy Synthesis via Game Abstractions

cs.GT · 2025-11-14 · unverdicted · novelty 7.0

An incremental rank-lifting algorithm updates winning regions and policies in data-driven stochastic game abstractions by exploiting monotonic growth of under-approximations and shrinkage of over-approximations.

TRAM: Test-Time Risk Adaptation with Mixture of Agents

cs.LG · 2024-08-16 · unverdicted · novelty 7.0

TRAM is a test-time mixture method that scores and composes risk-neutral source policies using reward and occupancy-based risk to achieve new reward-risk tradeoffs without parameter updates.

Beyond I'm Sorry, I Can't: Dissecting Large Language Model Refusal

cs.CL · 2025-09-07 · unverdicted · novelty 6.0

Sparse autoencoders plus greedy filtering and factorization-machine interaction modeling identify minimal sets of features in Gemma-2-2B-IT and LLaMA-3.1-8B-IT whose ablation produces jailbreaks by flipping refusal to compliance.

Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs

cs.LG · 2025-08-27 · conditional · novelty 6.0

GSR jointly trains LLMs to generate candidate solutions and refine a superior final answer from them, achieving state-of-the-art performance on five mathematical benchmarks while transferring across model scales.

citing papers explorer

Showing 50 of 78 citing papers.