Title resolution pending

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al · 2022

9 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Towards Agentic Runtime Healing

cs.SE · 2024-08-02 · unverdicted · novelty 7.0

Healer uses LLMs to dynamically generate and execute runtime error-handling code, with GPT-4 recovering from 72.8% of errors across four datasets.

When LLMs Lag Behind: Knowledge Conflicts from Evolving APIs in Code Generation

cs.SE · 2026-04-10 · unverdicted · novelty 6.0

LLMs produce executable code only 42.55% of the time under API evolution without full documentation; this rises to 66.36% with structured docs and by a further 11% with reasoning strategies, yet outdated patterns persist.

GRACE: A Dynamic Coreset Selection Framework for Large Language Model Optimization

cs.DB · 2026-04-09 · unverdicted · novelty 6.0

GRACE dynamically constructs and updates coresets for LLM training using representation diversity, gradient-based importance, and k-NN graph propagation to improve efficiency and performance.

Hedging and Non-Affirmation: Quantifying LLM Alignment on Questions of Human Rights

cs.CY · 2025-02-26 · unverdicted · novelty 6.0

LLMs exhibit identity-dependent hedging on human rights questions, with group identity as the strongest predictor among tested factors, and group steering mitigates the disparity.

Empirical Evaluation of PDF Parsing and Chunking for Financial Question Answering with RAG

cs.CL · 2026-04-13 · unverdicted · novelty 5.0

Systematic tests show that specific PDF parsers combined with overlapping chunking strategies better preserve structure and improve RAG answer correctness on financial QA benchmarks including the new TableQuest dataset.

SPRINT: Scalable and Predictive Intent Refinement for LLM-Enhanced Session-based Recommendation

cs.IR · 2025-08-01 · unverdicted · novelty 5.0

SPRINT refines LLM-generated intents for session-based recommendation via a global intent pool, performance validation, selective LLM invocation during training, and a lightweight intent predictor for scalable inference without LLM calls.

RECOVER: Designing a Large Language Model-based Remote Patient Monitoring System for Postoperative Gastrointestinal Cancer Care

cs.HC · 2025-02-09 · unverdicted · novelty 5.0

RECOVER is an LLM-powered RPM system for postoperative GI cancer care, built from 7 participatory design sessions and 5 patient interviews, then piloted with 4 staff and 5 patients to derive design strategies and responsible AI insights.

LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

cs.CL · 2024-12-07 · accept · novelty 3.0

A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.

A Survey on Large Language Models for Code Generation

cs.CL · 2024-06-01 · unverdicted · novelty 3.0

A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark comparisons.

citing papers explorer

Showing 9 of 9 citing papers.

Towards Agentic Runtime Healing cs.SE · 2024-08-02 · unverdicted · none · ref 57
When LLMs Lag Behind: Knowledge Conflicts from Evolving APIs in Code Generation cs.SE · 2026-04-10 · unverdicted · none · ref 47
GRACE: A Dynamic Coreset Selection Framework for Large Language Model Optimization cs.DB · 2026-04-09 · unverdicted · none · ref 73
Hedging and Non-Affirmation: Quantifying LLM Alignment on Questions of Human Rights cs.CY · 2025-02-26 · unverdicted · none · ref 55
Empirical Evaluation of PDF Parsing and Chunking for Financial Question Answering with RAG cs.CL · 2026-04-13 · unverdicted · none · ref 45
SPRINT: Scalable and Predictive Intent Refinement for LLM-Enhanced Session-based Recommendation cs.IR · 2025-08-01 · unverdicted · none · ref 46
RECOVER: Designing a Large Language Model-based Remote Patient Monitoring System for Postoperative Gastrointestinal Cancer Care cs.HC · 2025-02-09 · unverdicted · none · ref 99
LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods cs.CL · 2024-12-07 · accept · none · ref 251
A Survey on Large Language Models for Code Generation cs.CL · 2024-06-01 · unverdicted · none · ref 283
