LLM-PeerReview ensembles LLMs by scoring responses with LLM-as-Judge and selecting the best via averaging or truth inference, beating Smoothie-Global by 6.9-7.3 points on four datasets.
arXiv preprint arXiv:2409.13884 (2024)
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
State-of-the-art LLMs respond inconsistently to queries from protected-group personas, with some responses omitting key information that should be provided.
Survey evidence shows the software engineering community believes academia must shift from bystander to active partner with industry to boost research impact and relevance.
A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.
A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.
citing papers explorer
-
Discriminatory Compliance: How LLMs Answer Queries from Protected Groups
State-of-the-art LLMs respond inconsistently to queries from protected-group personas, with some responses omitting key information that should be provided.