arXiv preprint arXiv:2409.13884 (2024)

Owens, Deonna M · 2024 · arXiv 2409.13884

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Scoring, Reasoning, and Selecting the Best! Ensembling Large Language Models via a Peer-Review Process

cs.CL · 2025-12-29 · unverdicted · novelty 5.0

LLM-PeerReview ensembles LLMs by scoring responses with LLM-as-Judge and selecting the best via averaging or truth inference, beating Smoothie-Global by 6.9-7.3 points on four datasets.

Discriminatory Compliance: How LLMs Answer Queries from Protected Groups

cs.CY · 2026-06-19 · unverdicted · novelty 4.0

State-of-the-art LLMs respond inconsistently to queries from protected-group personas, with some responses omitting key information that should be provided.

Be a Partner, not a Bystander in Software Engineering Practice: Bridging the Gaps between Academia and Industry

cs.SE · 2026-02-08 · unverdicted · novelty 3.0

Survey evidence shows the software engineering community believes academia must shift from bystander to active partner with industry to boost research impact and relevance.

A Survey of Scaling in Large Language Model Reasoning

cs.AI · 2025-04-02 · unverdicted · novelty 3.0

A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.

LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

cs.CL · 2024-12-07 · accept · novelty 3.0

A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.

citing papers explorer

Showing 5 of 5 citing papers.

Scoring, Reasoning, and Selecting the Best! Ensembling Large Language Models via a Peer-Review Process cs.CL · 2025-12-29 · unverdicted · none · ref 39
LLM-PeerReview ensembles LLMs by scoring responses with LLM-as-Judge and selecting the best via averaging or truth inference, beating Smoothie-Global by 6.9-7.3 points on four datasets.
Discriminatory Compliance: How LLMs Answer Queries from Protected Groups cs.CY · 2026-06-19 · unverdicted · none · ref 36
State-of-the-art LLMs respond inconsistently to queries from protected-group personas, with some responses omitting key information that should be provided.
Be a Partner, not a Bystander in Software Engineering Practice: Bridging the Gaps between Academia and Industry cs.SE · 2026-02-08 · unverdicted · none · ref 14
Survey evidence shows the software engineering community believes academia must shift from bystander to active partner with industry to boost research impact and relevance.
A Survey of Scaling in Large Language Model Reasoning cs.AI · 2025-04-02 · unverdicted · none · ref 149
A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.
LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods cs.CL · 2024-12-07 · accept · none · ref 173
A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.

arXiv preprint arXiv:2409.13884 (2024)

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer