Title resolution pending

URL https://arxiv · 2025 · arXiv 2510.02663

4 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Evaluating Answer Leakage Robustness of LLM Tutors against Adversarial Student Attacks

cs.CR · 2026-04-20 · unverdicted · novelty 7.0

LLM tutors leak answers under adversarial student attacks, but a fine-tuned jailbreak agent and simple defenses can benchmark and improve robustness.

Are Agents Ready to Teach? A Multi-Stage Benchmark for Real-World Teaching Workflows

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

EduAgentBench is a new source-grounded benchmark that evaluates tutor agents across pedagogical judgment, situated multi-turn tutoring, and Canvas-style workflow completion, finding frontier models capable of basic judgment but inadequate for professional teaching standards.

The Missing Evaluation Axis: What 10,000 Student Submissions Reveal About AI Tutor Effectiveness

cs.CY · 2026-05-07 · conditional · novelty 6.0

Behavioral signals from how students use AI tutor feedback in 10k code submissions reveal differences between tutors and correlate more strongly with perceived helpfulness than pedagogical quality alone.

Application-Driven Pedagogical Knowledge Optimization of Open-Source LLMs via Reinforcement Learning and Supervised Fine-Tuning

cs.CL · 2026-04-07 · unverdicted · novelty 4.0

EduQwen 32B models, optimized via RL and then SFT, set a new state of the art on the Cross-Domain Pedagogical Knowledge Benchmark and surpass Gemini-3 Pro.

citing papers explorer

Showing 4 of 4 citing papers.

Evaluating Answer Leakage Robustness of LLM Tutors against Adversarial Student Attacks · cs.CR · 2026-04-20 · unverdicted · none · ref 86
Are Agents Ready to Teach? A Multi-Stage Benchmark for Real-World Teaching Workflows · cs.AI · 2026-05-14 · unverdicted · none · ref 11
The Missing Evaluation Axis: What 10,000 Student Submissions Reveal About AI Tutor Effectiveness · cs.CY · 2026-05-07 · conditional · none · ref 25
Application-Driven Pedagogical Knowledge Optimization of Open-Source LLMs via Reinforcement Learning and Supervised Fine-Tuning · cs.CL · 2026-04-07 · unverdicted · none · ref 10
