Crafting papers on machine learning
5 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (5)
Verdicts: UNVERDICTED (5)
Representative citing papers
Citing papers explorer
- TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching
  TBPO posits a token-level Bradley-Terry model and derives a Bregman-divergence density-ratio matching loss that generalizes DPO while preserving token-level optimality.
- DeEscalWild: A Real-World Benchmark for Automated De-Escalation Training with SLMs
  DeEscalWild supplies 1,500 high-fidelity de-escalation scenarios that let fine-tuned 3B SLMs outperform general-purpose larger models on realism and dialogue metrics.
- ReSS: Learning Reasoning Models for Tabular Data Prediction via Symbolic Scaffold
  ReSS extracts decision paths from trees as scaffolds to guide LLM reasoning generation, fine-tunes the LLM on the resulting dataset with scaffold-invariant augmentation, and reports up to 10% gains on medical and financial tabular benchmarks with new faithfulness metrics.
- Semi-LAR: Semi-supervised Contrastive Learning with Linear Attention for Removal of Nighttime Flares
  Semi-LAR is a semi-supervised contrastive learning framework with linear attention for nighttime flare removal that refines pseudo-labels via quality assessment and uses flare-aware patch-level contrastive losses.
- Caracal: Causal Architecture via Spectral Mixing
  Caracal is a Fourier-based sequence mixing architecture that achieves causal autoregressive modeling with standard operators and competitive performance on long sequences.