arXiv preprint arXiv:2406.11050

URL https://arxiv · 2024 · arXiv 2406.11050

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees

cs.LG · 2026-05-01 · unverdicted · novelty 7.0 · 2 refs

InvEvolve evolves white-box inventory policies from LLMs with statistical safety guarantees and outperforms classical and deep learning methods on synthetic and real retail data.

To See the Unseen: on the Generalization Ability of Transformers in Symbolic Reasoning

cs.AI · 2026-04-23 · conditional · novelty 7.0

Unembedding collapse in transformers prevents distinguishing unseen tokens in symbolic reasoning, but targeted interventions restore generalization.

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

cs.LG · 2024-10-07 · accept · novelty 7.0

LLMs display high variance and major accuracy drops on GSM-Symbolic variants of grade-school math problems, indicating they replicate training patterns rather than execute logical reasoning.

Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?

cs.AI · 2025-03-23 · conditional · novelty 5.0

LLMs show accuracy drops of 0.3% to 5.9% on GSM8K math problems when culturally adapted to six countries while keeping math operations identical, with statistical significance confirmed by McNemar tests.

LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

cs.CL · 2024-12-07 · accept · novelty 3.0

A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.

AMEL: Accumulated Message Effects on LLM Judgments

cs.AI · 2026-05-21

citing papers explorer

Showing 6 of 6 citing papers.

InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees · cs.LG · 2026-05-01 · unverdicted · none · ref 156 · 2 links
To See the Unseen: on the Generalization Ability of Transformers in Symbolic Reasoning · cs.AI · 2026-04-23 · conditional · none · ref 7
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models · cs.LG · 2024-10-07 · accept · none · ref 75
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? · cs.AI · 2025-03-23 · conditional · none · ref 14
LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods · cs.CL · 2024-12-07 · accept · none · ref 102
AMEL: Accumulated Message Effects on LLM Judgments · cs.AI · 2026-05-21 · unreviewed · ref 14
