arxiv: 2510.17385 · v5 · pith:BXRAWZ5Lnew · submitted 2025-10-20 · 💻 cs.LG · cs.AI

Strengthening LLMs for Tabular Prediction with Structural Priors

classification 💻 cs.LG cs.AI
keywords tabular, llms, competitive, prediction, strong, baselines, models, optimization

Tabular prediction has long been dominated by gradient-boosted decision trees and specialized deep tabular models, while large language models (LLMs) remain difficult to make competitive despite their cross-task adaptability and transparent reasoning traces. We address this gap by incorporating tabular structural priors into LLM post-training. Specifically, we propose Permutation Relative Policy Optimization (PRPO), which operationalizes column-permutation invariance through label-preserving column permutations and two-level advantage estimation. This design converts sparse outcome rewards into denser and more stable optimization signals. Extensive experiments on 139 OpenML datasets show that our 8B model reaches a genuinely competitive regime against strong specialized tabular baselines. It achieves strong fully supervised performance, dominates zero-shot settings, and performs on par with 32-shot strong baselines. Moreover, it substantially outperforms much larger general-purpose and reasoning LLMs, including up to a 53.17% improvement over DeepSeek-R1 (685B). These results show that structural-prior RL post-training is an effective route for making LLMs competitive in tabular prediction.
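
The abstract names two ingredients, label-preserving column permutations and two-level advantage estimation, without giving their exact form. The following is a minimal Python sketch of one plausible reading, not the authors' implementation. It assumes a row is serialized as text, that several responses are sampled per permuted prompt, and that the two levels are (1) normalization within each permutation group and (2) normalization over all permutations pooled. The function names, the mixing weight `alpha`, and the mean/std normalization are all hypothetical.

```python
import random
import statistics


def permute_columns(row, seed):
    """Shuffle column order only; every feature/value pair (and the label) is unchanged."""
    items = list(row.items())
    random.Random(seed).shuffle(items)
    return dict(items)


def serialize(row):
    """Turn a row into a text prompt fragment (hypothetical format)."""
    return "; ".join(f"{name} = {value}" for name, value in row.items())


def _normalize(values, eps=1e-8):
    """Center and scale rewards; eps avoids division by zero for constant groups."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / (std + eps) for v in values]


def two_level_advantages(rewards, alpha=0.5):
    """rewards[p][i] is the reward of the i-th response to the p-th permutation.

    Assumed combination: alpha * within-permutation advantage
    + (1 - alpha) * advantage computed over all permutations pooled.
    """
    pooled = _normalize([r for group in rewards for r in group])
    advantages, offset = [], 0
    for group in rewards:
        within = _normalize(group)
        advantages.append([
            alpha * w + (1 - alpha) * pooled[offset + i]
            for i, w in enumerate(within)
        ])
        offset += len(group)
    return advantages


row = {"age": 41, "income": 52000, "city": "Oslo"}
prompts = [serialize(permute_columns(row, seed)) for seed in range(2)]

# Permutation 0 has one correct response; permutation 1 has none.
adv = two_level_advantages([[1.0, 0.0], [0.0, 0.0]])
```

In this toy case the second permutation group has identical rewards, so its within-group advantage is zero, yet the pooled term still gives it a nonzero (negative) signal. That is one way sparse outcome rewards could become denser, under the assumptions stated above.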

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ReSS: Learning Reasoning Models for Tabular Data Prediction via Symbolic Scaffold

    cs.AI 2026-04 unverdicted novelty 6.0

    ReSS uses decision-tree scaffolds to fine-tune LLMs for faithful tabular reasoning, reporting up to 10% gains over baselines on medical and financial data.

  2. ReSS: Learning Reasoning Models for Tabular Data Prediction via Symbolic Scaffold

    cs.AI 2026-04 unverdicted novelty 6.0

    ReSS extracts decision paths from trees as scaffolds to guide LLM reasoning generation, fine-tunes the LLM on the resulting dataset with scaffold-invariant augmentation, and reports up to 10% gains on medical and fina...