arXiv preprint arXiv:2405.06836
2 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 2
representative citing papers
SmileyLlama is an LLM transformed via SFT and DPO to generate valid novel drug-like molecules with user-specified properties and optimized 3D conformations for high binding affinity.
citing papers explorer
- Self-Improving Tabular Language Models via Iterative Reward-Guided Post-Training
  TabGRAA applies group-relative advantage alignment in an iterative reward-guided post-training loop to improve tabular language model generators on fidelity, utility, and privacy trade-offs across five benchmarks.
- SmileyLlama: Modifying Large Language Models for Directed Chemical Space Exploration
  SmileyLlama is an LLM transformed via SFT and DPO to generate valid novel drug-like molecules with user-specified properties and optimized 3D conformations for high binding affinity.