Meta-Harness discovers improved harness code for LLMs via agentic search over prior execution traces, yielding 7.7-point gains on text classification with 4x fewer tokens and 4.7-point gains on math reasoning across held-out models.
Norman K Denzin
7 Pith papers cite this work.
representative citing papers
HoldUp uses LLM-guided clustering to provide holistic dataset context for semantic operators, yielding up to 33% higher classification accuracy and 30% higher scoring accuracy than row-by-row LLM processing across 15 datasets.
PrefixMemory-Tuning decouples the prefix from attention to overcome performance limits of traditional prefix-tuning and reaches competitive results with modern PEFT methods on LLM adaptation benchmarks.
AIPsy-Affect supplies 480 keyword-free clinical vignettes and matched neutral controls for mechanistic interpretability studies of emotion in language models.
SUMMIR is a multimetric ranking model that orders LLM-generated sports insights by importance while incorporating hallucination detection to improve factual reliability across cricket, soccer, basketball, and baseball articles.
T-FIX operationalizes expert alignment for LLM explanations as an automatic, generalizable evaluation using domain-specific criteria across seven tasks in three domains.
LLMs detect social signals in clinical transcripts across model families, with an agreement-weighted ensemble using group-level agreement patterns improving accuracy and stability over individual models.
citing papers explorer
- PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention
- AIPsy-Affect: A Keyword-Free Clinical Stimulus Battery for Mechanistic Interpretability of Emotion in Language Models
- T-FIX: Text-Based Explanations with Features Interpretable to eXperts
- SocialLM: Social Signal Processing of Patient-Provider Communication using LLMs and Contextual Aggregation