Meta-Harness discovers improved harness code for LLMs via agentic search over prior execution traces, yielding 7.7-point gains on text classification with 4x fewer tokens and 4.7-point gains on math reasoning across held-out models.
Norman K Denzin
9 Pith papers cite this work.
representative citing papers
HoldUp uses LLM-guided clustering to provide holistic dataset context for semantic operators, yielding up to 33% higher classification accuracy and 30% higher scoring accuracy than row-by-row LLM processing across 15 datasets.
PrefixMemory-Tuning decouples the prefix from attention to overcome performance limits of traditional prefix-tuning and reaches competitive results with modern PEFT methods on LLM adaptation benchmarks.
TwistedHumor dataset shows dark humor in YouTube Shorts clusters around critique, coping, awkwardness and identity with more mixed and toxic audience reactions than regular humor.
AIPsy-Affect supplies 480 keyword-free clinical vignettes and matched neutral controls for mechanistic interpretability studies of emotion in language models.
SUMMIR is a multimetric ranking model that orders LLM-generated sports insights by importance while incorporating hallucination detection to improve factual reliability across cricket, soccer, basketball, and baseball articles.
T-FIX operationalizes expert alignment for LLM explanations as an automatic, generalizable evaluation using domain-specific criteria across seven tasks in three domains.
Introduces a new lyrics dataset and hybrid human-LLM framework for emotion annotation that predicts misalignment to improve efficiency.
LLMs detect social signals in clinical transcripts across model families, with an agreement-weighted ensemble using group-level agreement patterns improving accuracy and stability over individual models.
citing papers explorer
- PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention
- T-FIX: Text-Based Explanations with Features Interpretable to eXperts
- SocialLM: Social Signal Processing of Patient-Provider Communication using LLMs and Contextual Aggregation