6 Pith papers cite this work.
Citation roles: background (3)
Citation polarities: background (3)
citing papers explorer
- Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Recommendation
  PURE reduces preference-inconsistent explanations in LLM recommenders by selecting user-aligned evidence paths and injecting them into generation, while preserving accuracy.
- B4DL: A Benchmark for 4D LiDAR LLM in Spatio-Temporal Understanding
  B4DL provides a new benchmark, scalable data generation pipeline, and MLLM architecture for direct spatio-temporal reasoning on raw 4D LiDAR data.
- AI Failures in the Eyes of the Downstream Developer: A First Look at Concerns, Practices, and Challenges
  Mixed-methods study maps downstream developers' concerns, practices, and challenges with AI failures in PTM-based software.
- A Survey on LLM-as-a-Judge
  A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.
- LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
  A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.
- A Survey on Large Language Models for Code Generation
  A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark comparisons.