Automatic

Pryzant, Reid, Iter, Dan, Li, Jerry, Lee, Yin, Zhu, Chenguang, Zeng, Michael · 2023 · DOI 10.18653/v1/2023.emnlp-main.494

13 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments

cs.SE · 2026-05-04 · unverdicted · novelty 7.0

TSCG compiles JSON tool schemas into token-efficient structured text, raising tool-use accuracy for small LLMs from 0% to 84.4% on benchmarks while cutting tokens by 52-57%.

CRAFT: Cost-aware Refinement And Front-aware Tuning of Prompts

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

CRAFT is a Pareto-front prompt optimizer that allocates scarce LLM validation calls to candidates near the current front using accuracy- and cost-oriented generators plus NSGA-II retention.

GrowLoop: Self-Evolving Conversation Evaluation Seeded by Human

cs.CL · 2026-05-26 · unverdicted · novelty 6.0

GrowLoop proposes a human-seeded self-evolving framework that co-evolves rubrics and cases to evaluate conversational human-likeness with differentiated agreement rules.

iPOE: Interpretable Prompt Optimization via Explanations

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

iPOE generates and optimizes annotation guidelines from explanations to produce interpretable prompts, reporting up to 39% gains over baselines on four datasets with LLM explanations substituting for human ones.

PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures

cs.CL · 2026-05-15 · unverdicted · novelty 6.0

The PQR framework generates diverse, realistic queries to elicit QA agent failures, uncovering 23-78% more unhelpful responses than prior methods in e-commerce agent tests.

Unified Multimodal Brain Decoding via Cross-Subject Soft-ROI Fusion

cs.LG · 2025-12-23 · unverdicted · novelty 6.0

BrainROI achieves leading cross-subject brain-captioning results on NSD by combining multi-atlas soft-ROI fusion with interpretable prompt optimization.

A Single Rewrite Suffices: Empirical Lessons from Production Skill Description Optimization

cs.CL · 2026-06-29 · unverdicted · novelty 5.0

A single LLM rewrite of skill descriptions using false positive and negative cases matches manual optimization performance in production, with most other pipeline components adding little value.

GBC: Gradient-Based Connections for Optimizing Multi-Agent Systems

cs.MA · 2026-06-26 · unverdicted · novelty 5.0

GBC treats multi-agent LLM workflows as differentiable graphs to enable token-level attribution and targeted optimization, with reported gains on MultiWOZ and τ-bench.

NOVA: A Verification-Aware Agent Harness for Architecture Evolution in Industrial Recommender Systems

cs.IR · 2026-06-25 · unverdicted · novelty 5.0 · 2 refs

NOVA introduces a level-aware agent harness with architecture gradient and verification cascade to automate recommender architecture evolution while reducing silent failures and human effort.

What Should a Skill Remember? Quality–Cost Trade-offs in Cost-Aware Skill Rewriting for Language Model Agents

cs.CL · 2026-06-08 · unverdicted · novelty 5.0

An empirical study demonstrates that cost-aware skill rewriting for LLM agents can achieve a 7% total cost reduction and a 6% agent-token cost reduction with preserved quality on SkillsBench.

JTPRO: A Joint Tool-Prompt Reflective Optimization Framework for Language Agents

cs.AI · 2026-04-20 · unverdicted · novelty 5.0

JTPRO co-optimizes prompts and tool descriptions via reflection to raise overall success rate by 5-20% over baselines on multi-tool benchmarks.

Improving Medical Communication using Rubric-Guided Counterfactual Recommendations

cs.CL · 2026-06-17 · unverdicted · novelty 4.0

An LM-guided counterfactual pipeline recommends minimal ordinal changes to communication features like tone and actionability, yielding a mean +6.41% gain in predicted positive feedback under independent auditor models.

citing papers explorer

Showing 11 of 11 citing papers after filters.

TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments

cs.SE · 2026-05-04 · unverdicted · none · ref 19

TSCG compiles JSON tool schemas into token-efficient structured text, raising tool-use accuracy for small LLMs from 0% to 84.4% on benchmarks while cutting tokens by 52-57%.

CRAFT: Cost-aware Refinement And Front-aware Tuning of Prompts

cs.CL · 2026-06-03 · unverdicted · none · ref 15

CRAFT is a Pareto-front prompt optimizer that allocates scarce LLM validation calls to candidates near the current front using accuracy- and cost-oriented generators plus NSGA-II retention.

GrowLoop: Self-Evolving Conversation Evaluation Seeded by Human

cs.CL · 2026-05-26 · unverdicted · none · ref 30

GrowLoop proposes a human-seeded self-evolving framework that co-evolves rubrics and cases to evaluate conversational human-likeness with differentiated agreement rules.

iPOE: Interpretable Prompt Optimization via Explanations

cs.CL · 2026-05-18 · unverdicted · none · ref 34

iPOE generates and optimizes annotation guidelines from explanations to produce interpretable prompts, reporting up to 39% gains over baselines on four datasets with LLM explanations substituting for human ones.

PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures

cs.CL · 2026-05-15 · unverdicted · none · ref 2

The PQR framework generates diverse, realistic queries to elicit QA agent failures, uncovering 23-78% more unhelpful responses than prior methods in e-commerce agent tests.

A Single Rewrite Suffices: Empirical Lessons from Production Skill Description Optimization

cs.CL · 2026-06-29 · unverdicted · none · ref 180

A single LLM rewrite of skill descriptions using false positive and negative cases matches manual optimization performance in production, with most other pipeline components adding little value.

GBC: Gradient-Based Connections for Optimizing Multi-Agent Systems

cs.MA · 2026-06-26 · unverdicted · none · ref 5

GBC treats multi-agent LLM workflows as differentiable graphs to enable token-level attribution and targeted optimization, with reported gains on MultiWOZ and τ-bench.

NOVA: A Verification-Aware Agent Harness for Architecture Evolution in Industrial Recommender Systems

cs.IR · 2026-06-25 · unverdicted · none · ref 18 · 2 links

NOVA introduces a level-aware agent harness with architecture gradient and verification cascade to automate recommender architecture evolution while reducing silent failures and human effort.

What Should a Skill Remember? Quality–Cost Trade-offs in Cost-Aware Skill Rewriting for Language Model Agents

cs.CL · 2026-06-08 · unverdicted · none · ref 11

An empirical study demonstrates that cost-aware skill rewriting for LLM agents can achieve a 7% total cost reduction and a 6% agent-token cost reduction with preserved quality on SkillsBench.

JTPRO: A Joint Tool-Prompt Reflective Optimization Framework for Language Agents

cs.AI · 2026-04-20 · unverdicted · none · ref 13

JTPRO co-optimizes prompts and tool descriptions via reflection to raise overall success rate by 5-20% over baselines on multi-tool benchmarks.

Improving Medical Communication using Rubric-Guided Counterfactual Recommendations

cs.CL · 2026-06-17 · unverdicted · none · ref 46

An LM-guided counterfactual pipeline recommends minimal ordinal changes to communication features like tone and actionability, yielding a mean +6.41% gain in predicted positive feedback under independent auditor models.
