A small set of sparse autoencoder features in LLMs drives shifts between generous and selfish allocations in dictator games, with causal patching and steering confirming their role and generalization to other social games.
Medec: A benchmark for medical error detection and correction in clinical notes
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
dataset 1polarities
use dataset 1representative citing papers
Moral alignment in LLMs improves with model size according to the power law D ∝ S^{-0.10} (R²=0.50).
LAC2R uses MCTS to systematically explore multiple LLM refinement trajectories for C-to-Rust translation and reports superior safety and correctness on small-scale benchmarks.
The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.
citing papers explorer
-
Understanding the Mechanism of Altruism in Large Language Models
A small set of sparse autoencoder features in LLMs drives shifts between generous and selfish allocations in dictator games, with causal patching and steering confirming their role and generalization to other social games.
-
Scaling Laws for Moral Machine Judgment in Large Language Models
Moral alignment in LLMs improves with model size according to the power law D ∝ S^{-0.10} (R²=0.50).
-
Search-Based Multi-Trajectory Refinement for Safe C-to-Rust Translation with Large Language Models
LAC2R uses MCTS to systematically explore multiple LLM refinement trajectories for C-to-Rust translation and reports superior safety and correctness on small-scale benchmarks.
-
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.