Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, Anthony Bau, James R. Glass · 2019 · DOI 10.1609/aaai.v33i01.33016309

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Behavioral and Representational Evidence of Binomial Ordering Preferences in Large Language Models

cs.CL · 2026-06-19 · unverdicted · novelty 6.0

LLMs recover dominant binomial orders from corpora but align less closely with exact preference distributions, with preference strength partially encoded in middle-to-late layers and manipulable via steering.

AI Safety Landscape for Large Language Models: Taxonomy, State-of-the-art, and Future Directions

cs.AI · 2024-08-23 · unverdicted · novelty 4.0

The paper introduces a taxonomy of AI safety for LLMs organized into Trustworthy AI, Responsible AI, and Safe AI perspectives, accompanied by a review of state-of-the-art methods, challenges, and future directions.

citing papers explorer

Behavioral and Representational Evidence of Binomial Ordering Preferences in Large Language Models

cs.CL · 2026-06-19 · unverdicted · none · ref 60

LLMs recover dominant binomial orders from corpora but align less closely with exact preference distributions, with preference strength partially encoded in middle-to-late layers and manipulable via steering.
