Can Large Language Models Be an Alternative to Human Evaluations?

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 4

years

2026: 9 · 2025: 2

representative citing papers

NARRA-Gym for Evaluating Interactive Narrative Agents

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

NARRA-Gym is an executable benchmark that generates complete interactive narrative episodes from emotional seeds and logs full model trajectories to expose gaps in coherence, adaptation, and personalization that static story tests miss.

LLM Advertisement based on Neuron Auctions

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Neuron Auctions auction continuous neuron intervention budgets on brand-specific orthogonal subspaces in LLMs to achieve strategy-proof revenue optimization while penalizing user utility loss.
