Prewrite search: A reinforcement learning approach to query rewriting
4 Pith papers cite this work. Polarity classification is still indexing.
citation summary
verdicts: UNVERDICTED 4
roles: method 1
polarities: use method 1
citing papers explorer
- PEEM: Prompt Engineering Evaluation Metrics for Interpretable Joint Evaluation of Prompts and Responses
  PEEM is a multi-criteria LLM-based evaluator for prompts and responses that aligns with standard accuracy while enabling zero-shot prompt optimization via feedback.
- $S^3$-R1: Learning to Retrieve and Answer Step-by-Step with Synthetic Data
  $S^3$-R1 generates synthetic intermediate-difficulty multi-hop questions and applies dense rewards for search quality plus answer correctness, yielding up to 10% better out-of-domain generalization than baselines.
- Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task
  LLM use for essay writing correlates with reduced brain network connectivity, lower self-reported ownership, and poorer recall of one's own content compared to unaided or search-based writing.
- Automated Instruction Revision (AIR): A Structured Comparison of Task Adaptation Strategies for LLM
  AIR excels on label-remapping classification tasks, KNN retrieval leads on closed-book QA, and fine-tuning leads on structured extraction and event-order reasoning, showing that adaptation performance is task-dependent.