Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
5 Pith papers cite this work.
citing papers explorer
- Where and What: Reasoning Dynamic and Implicit Preferences in Situated Conversational Recommendation
  SiPeR improves recommendation accuracy and response quality in situated conversations by estimating scene transitions and performing Bayesian inverse inference with multimodal LLMs.
- MFMDQwen: Multilingual Financial Misinformation Detection Based on Large Language Model
  MFMDQwen is the first open-source LLM for multilingual financial misinformation detection, backed by a new instruction dataset and benchmark on which it outperforms other open-source models.
- Process Reinforcement through Implicit Rewards
  PRIME enables online process reward model updates in LLM RL using implicit rewards from rollouts and outcome labels, yielding 15.1% average gains on reasoning benchmarks and surpassing a stronger instruct model with 10% of the data.
- InternLM2 Technical Report
  InternLM2 is a new open-source LLM that outperforms prior versions on 30 benchmarks and long-context tasks through scaled pre-training to 32k tokens and a conditional online RLHF alignment strategy.
- Lessons from the Trenches on Reproducible Evaluation of Language Models