Bibo Cai

Identifiers

  • name variant Bibo Cai 0.60 · backfill

Papers (7)

  1. DeepTool: Scaling Interleaved Deliberation in Tool-Integrated Reasoning via Process-Supervised Reinforcement Learning · cs.AI · 2026 · author #3
  2. GR-Ben: A General Reasoning Benchmark for Evaluating Process Reward Models · cs.AI · 2026 · author #4
  3. TinyJudge: Unverifiable Constraint Alignment via Lightweight Specialist Ensembles · cs.CL · 2026 · author #11
  4. The Tool-Overuse Illusion: Why Does LLM Prefer External Tools over Internal Knowledge? · cs.AI · 2026 · author #11
  5. Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration · cs.AI · 2026 · author #5
  6. MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization · cs.LG · 2026 · author #5
  7. Large Language Models Are Still Misled by Simple Bias Ensembles · cs.CL · 2025 · author #5

Mentions

  • 2606.07520 #11 · arxiv_oai · confidence 0.70 Bibo Cai
  • 2605.29568 #3 · arxiv_oai · confidence 0.70 Bibo Cai

Frequent Coauthors