GCPO uses team-level credit assignment via determinant volume over reward-weighted semantic embeddings to promote non-redundant correct reasoning paths, improving both accuracy and diversity in LLM training.
Sentence-bert: Sentence embeddings using siamese bert- networks
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 6verdicts
UNVERDICTED 6roles
method 1polarities
use method 1representative citing papers
S2Aligner decouples semantic and structural components in LLM-as-Aligner pre-training for sparse TAGs and uses structure-oriented reconstruction plus domain risk balancing to improve transferability and reduce generalization gaps.
LRD framework with Frenet, NRS, and GFMI metrics shows layer-wise structure in 31 models provides usable signal for model selection and pruning on MTEB tasks.
PREFER is an online preference learning system that generates personalized review summaries and improves alignment with user interests in simulations on Amazon review data.
LMM-Searcher uses file-based visual UIDs and a fetch tool plus 12K synthesized trajectories to fine-tune a multimodal agent that scales to 100-turn horizons and reaches SOTA among open-source models on MM-BrowseComp and MMSearch-Plus.
GCSL reframes LLM fine-tuning as supervised pursuit of quality thresholds using natural-language goals, outperforming SFT and DPO on toxicity, code, and recommendation tasks.
citing papers explorer
-
Breaking $\textit{Winner-Takes-All}$: Cooperative Policy Optimization Improves Diverse LLM Reasoning
GCPO uses team-level credit assignment via determinant volume over reward-weighted semantic embeddings to promote non-redundant correct reasoning paths, improving both accuracy and diversity in LLM training.
-
S2Aligner: Pair-Efficient and Transferable Pre-Training for Sparse Text-Attributed Graphs
S2Aligner decouples semantic and structural components in LLM-as-Aligner pre-training for sparse TAGs and uses structure-oriented reconstruction plus domain risk balancing to improve transferability and reduce generalization gaps.
-
Layer-wise Representation Dynamics: An Empirical Investigation Across Embedders and Base LLMs
LRD framework with Frenet, NRS, and GFMI metrics shows layer-wise structure in 31 models provides usable signal for model selection and pruning on MTEB tasks.
-
PREFER: Personalized Review Summarization with Online Preference Learning
PREFER is an online preference learning system that generates personalized review summaries and improves alignment with user interests in simulations on Amazon review data.
-
Towards Long-horizon Agentic Multimodal Search
LMM-Searcher uses file-based visual UIDs and a fetch tool plus 12K synthesized trajectories to fine-tune a multimodal agent that scales to 100-turn horizons and reaches SOTA among open-source models on MM-BrowseComp and MMSearch-Plus.
-
Goal-Conditioned Supervised Learning for LLM Fine-Tuning
GCSL reframes LLM fine-tuning as supervised pursuit of quality thresholds using natural-language goals, outperforming SFT and DPO on toxicity, code, and recommendation tasks.