arxiv: 2604.20749 · v1 · submitted 2026-04-22 · 💻 cs.AI

Recognition: unknown

Where and What: Reasoning Dynamic and Implicit Preferences in Situated Conversational Recommendation

Dongding Lin , Jian Wang , Yongqi Li , Wenjie Li


Pith reviewed 2026-05-09 23:49 UTC · model grok-4.3

classification 💻 cs.AI

keywords: situated conversational recommendation, dynamic preferences, implicit preferences, scene transition estimation, Bayesian inverse inference, multimodal large language models, recommendation accuracy


The pith

SiPeR improves situated conversational recommendations by estimating scene suitability and inferring hidden preferences through Bayesian updates on multimodal model likelihoods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SiPeR as a framework for situated conversational recommendation that must handle preferences that shift with the visual environment and are rarely stated outright. It proposes two linked mechanisms: scene transition estimation that checks whether the current surroundings meet the user's needs and prompts a change when they do not, plus Bayesian inverse inference that reads the likelihood scores from multimodal large language models as evidence for which items the user would actually prefer. Together these allow the system to decide both the timing and the content of recommendations within ongoing dialogues. A reader would care because most everyday recommendation happens inside changing physical or visual contexts where explicit feedback is scarce.
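The page describes scene transition estimation only in terms of its decision: stay, or steer the user elsewhere. The following is a minimal sketch under stated assumptions, not the authors' implementation. Suppose some model (for instance, an MLLM asked whether a scene can satisfy the stated need) produces a suitability score in [0, 1] for every reachable scene. The decision then reduces to a threshold test on the current scene followed by an argmax over the alternatives. The names `decide_scene` and `SceneDecision`, the scene labels, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneDecision:
    """Hypothetical output of a scene-transition check."""
    stay: bool          # True if the user should remain in the current scene
    target_scene: str   # scene to recommend in (equals the current one when staying)


def decide_scene(suitability: dict[str, float], current: str,
                 threshold: float = 0.5) -> SceneDecision:
    """Threshold rule over per-scene suitability scores in [0, 1].

    The scores are assumed to come from some external estimator (e.g. an
    MLLM prompt); both the scores and the threshold are hypothetical.
    """
    if current not in suitability:
        raise KeyError(f"no suitability score for current scene {current!r}")
    # The current scene is good enough: recommend in place.
    if suitability[current] >= threshold:
        return SceneDecision(stay=True, target_scene=current)
    # Otherwise steer toward the best-scoring scene. If that is still the
    # current one, no better alternative exists and the user stays.
    best = max(suitability, key=suitability.__getitem__)
    return SceneDecision(stay=(best == current), target_scene=best)


print(decide_scene({"shoe_aisle": 0.2, "jacket_rack": 0.8}, "shoe_aisle"))
# SceneDecision(stay=False, target_scene='jacket_rack')
```

A fixed threshold is the simplest possible rule. A learned policy could take its place without changing the interface.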

Core claim

SiPeR integrates scene transition estimation, which determines whether the current scene satisfies user needs and guides the user toward a more suitable scene when necessary, with Bayesian inverse inference that treats the likelihood outputs of multimodal large language models as direct signals of user preferences over candidate items inside that scene. This combination enables reasoning about dynamic and implicit preferences that evolve across conversation turns, yielding higher recommendation accuracy and better response quality than prior methods on two standard benchmarks.

What carries the argument

Scene transition estimation paired with Bayesian inverse inference that converts multimodal large language model likelihoods into preference probabilities.
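The following is a minimal sketch of Bayesian inverse inference in its generic form. It assumes an MLLM can score log P(u_t | item, scene, u_<t): the log-likelihood of the user's turn u_t if the user's latent target were a given candidate item. Bayes' rule inverts these scores into a posterior over items. Chaining the turns gives P(item | u_1..u_T) ∝ P(item) ∏_t P(u_t | item, scene, u_<t), which follows exactly from the chain rule when each term conditions on the dialogue history. The function name, the item labels, the example scores, and the assumption that scores arrive as plain dictionaries are all illustrative; the paper's own formulation may differ.

```python
import math


def posterior_over_items(log_prior: dict[str, float],
                         turn_logliks: list[dict[str, float]]) -> dict[str, float]:
    """Posterior P(item | u_1..u_T) from a log-prior and per-turn log-likelihoods.

    turn_logliks[t][item] is assumed to be log P(u_t | item, scene, u_<t) as
    scored by an MLLM. Summing these terms is the chain rule, not an extra
    independence assumption, because each term conditions on the history.
    """
    log_post = dict(log_prior)
    for logliks in turn_logliks:
        for item in log_post:
            log_post[item] += logliks[item]  # Bayes: posterior ~ prior * likelihood
    # Normalise with log-sum-exp so very negative log-likelihoods do not underflow.
    peak = max(log_post.values())
    log_z = peak + math.log(sum(math.exp(v - peak) for v in log_post.values()))
    return {item: math.exp(v - log_z) for item, v in log_post.items()}


items = ["red_jacket", "blue_jacket", "scarf"]
uniform = {item: math.log(1 / len(items)) for item in items}
# Hypothetical MLLM scores for a single user turn.
turn1 = {"red_jacket": -2.0, "blue_jacket": -3.0, "scarf": -5.0}
post = posterior_over_items(uniform, [turn1])
print({k: round(v, 3) for k, v in post.items()})
# {'red_jacket': 0.705, 'blue_jacket': 0.259, 'scarf': 0.035}
```

With a uniform prior, the posterior is simply a softmax over the MLLM log-likelihoods. Each new turn shifts it, which is one concrete sense in which the inferred preferences are "dynamic".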

Load-bearing premise

That the likelihood scores produced by multimodal large language models accurately reflect the user's actual implicit preferences for items in the current scene.
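If logged user choices were available, calibration offers one direct probe of this premise. Treat each candidate's inferred probability as a forecast that the user picks it, then measure the expected calibration error (ECE) against the actual picks. A proxy that tracks real preferences should give an ECE near 0. The sketch below uses standard equal-width binning. The function name and the binary-outcome framing are assumptions for illustration, not the paper's protocol.

```python
def expected_calibration_error(probs: list[float], outcomes: list[int],
                               n_bins: int = 10) -> float:
    """Binary ECE: bin forecasts by confidence, then average |hit rate - confidence|,
    weighted by bin size. probs[i] is the inferred probability that the user picks
    candidate i; outcomes[i] is 1 if they actually did and 0 otherwise."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equally long")
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Equal-width bins on [0, 1]; p == 1.0 falls into the last bin.
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    ece = 0.0
    for members in bins:
        if members:
            confidence = sum(p for p, _ in members) / len(members)
            hit_rate = sum(y for _, y in members) / len(members)
            ece += len(members) / len(probs) * abs(hit_rate - confidence)
    return ece
```

For example, four candidates each assigned 0.25, of which exactly one is chosen, are perfectly calibrated and give an ECE of 0.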

What would settle it

A controlled user study in which participants engage in situated dialogues and their real item selections or stated preferences systematically differ from the items ranked highest by SiPeR's Bayesian inference on the same scenes and conversation history.
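The most natural statistic for such a study is the agreement between SiPeR's top-ranked items and what participants actually chose. A minimal sketch, assuming one ranked candidate list and one observed choice per dialogue, is hit rate at k. A systematic mismatch would show up as a hit rate near the chance level of k divided by the number of candidates. The function name and inputs are hypothetical.

```python
def hit_rate_at_k(rankings: list[list[str]], chosen: list[str], k: int = 1) -> float:
    """Fraction of dialogues whose actually chosen item is in the model's top-k.

    rankings[i] is the model's candidate list for dialogue i, best first;
    chosen[i] is the item the participant really selected.
    """
    if not rankings or len(rankings) != len(chosen):
        raise ValueError("need exactly one ranking per observed choice")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = sum(item in ranking[:k] for ranking, item in zip(rankings, chosen))
    return hits / len(rankings)


rankings = [["red_jacket", "scarf", "boots"], ["boots", "red_jacket", "scarf"]]
print(hit_rate_at_k(rankings, ["red_jacket", "scarf"], k=1))  # 0.5
print(hit_rate_at_k(rankings, ["red_jacket", "scarf"], k=3))  # 1.0
```

With three candidates and k = 1, chance level is 1/3. A model whose hit rate stays near that level on real selections would fail the test described above.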

Figures

Figures reproduced from arXiv: 2604.20749 by Dongding Lin, Jian Wang, Wenjie Li, Yongqi Li.

**Figure 2.** Overview of the Situated Preference Reasoning (SiPeR), which has two critical mechanisms: (a) scene…

**Figure 3.** Performance of (a) ROC curve and (b) cali…

**Figure 5.** Human evaluation results comparing SiPeR…

**Figure 6.** Representative instances of the predefined…

**Figure 7.** Input-Output format for fine-tuning the policy…

**Figure 8.** The prompt template used for GPT-Score evaluation. The placeholders SCENE_PROFILE, DIA…

**Figure 9.** Cases of generated responses for different models, where no scene transition is needed.

**Figure 10.** Cases of generated responses for different models when the scene transition is required.
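Figure 3(a) reports a ROC curve, but its truncated caption does not say which classifier the curve evaluates. Assuming it scores a binary decision such as whether a scene transition is needed, the area under that curve can be computed without plotting. It equals the probability that a randomly chosen positive case outscores a randomly chosen negative one, with ties counting one half. This quadratic-time sketch is for illustration only; the function name and inputs are hypothetical.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via its rank interpretation (Mann-Whitney U):
    P(score of a random positive > score of a random negative), ties = 1/2."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical transition scores; label 1 means a transition was actually needed.
print(roc_auc([0.9, 0.7, 0.4, 0.2], [1, 0, 1, 0]))  # 0.75
```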

Original abstract

Situated conversational recommendation (SCR), which utilizes visual scenes grounded in specific environments and natural language dialogue to deliver contextually appropriate recommendations, has emerged as a promising research direction due to its close alignment with real-world scenarios. Compared to traditional recommendations, SCR requires a deeper understanding of dynamic and implicit user preferences, as the surrounding scene often influences users' underlying interests, while both may evolve across conversations. This complexity significantly impacts the timing and relevance of recommendations. To address this, we propose situated preference reasoning (SiPeR), a novel framework that integrates two core mechanisms: (1) Scene transition estimation, which estimates whether the current scene satisfies user needs, and guides the user toward a more suitable scene when necessary; and (2) Bayesian inverse inference, which leverages the likelihood of multimodal large language models (MLLMs) to predict user preferences about candidate items within the scene. Extensive experiments on two representative benchmarks demonstrate SiPeR's superiority in both recommendation accuracy and response generation quality. The code and data are available at https://github.com/DongdingLin/SiPeR.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SiPeR adds scene transition estimation and Bayesian inverse inference over MLLM likelihoods to model dynamic preferences in situated conversational recommendation, but the reported gains rest on an unvalidated proxy assumption.


The main point is that this paper introduces SiPeR to handle how visual scenes shift user preferences during conversations. It estimates whether the current scene meets the user's needs and applies Bayesian inversion to multimodal LLM output probabilities to infer item preferences. That combination targets a gap in standard conversational recommender systems, which treat preferences as largely static or text-only.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces SiPeR, a framework for situated conversational recommendation (SCR) that combines (1) scene transition estimation to determine whether the current visual scene satisfies user needs and (2) Bayesian inverse inference that treats likelihoods from multimodal large language models (MLLMs) as proxies for users' dynamic and implicit preferences over candidate items. It claims that experiments on two representative benchmarks demonstrate superiority in both recommendation accuracy and response generation quality, with code and data released.

Significance. If the empirical gains are robust and the modeling assumptions hold, the work could advance context-aware conversational systems by explicitly reasoning about scene dynamics and preference inference. The public release of code and data is a positive contribution to reproducibility. The significance is reduced, however, by the absence of direct validation for the key assumption that MLLM likelihoods reliably encode scene-conditioned user intent.

major comments (2)

[Method (Bayesian inverse inference)] Method section on Bayesian inverse inference: the claim that MLLM likelihoods serve as accurate proxies for implicit, scene-influenced user preferences is load-bearing for attributing benchmark improvements to the proposed mechanisms, yet no human preference correlation, ablation that removes the inverse-inference step, or error analysis of how scene-transition estimation interacts with preference inference is supplied.
[Experiments] Experiments section: while superiority in accuracy and quality is asserted, the manuscript supplies insufficient detail on exact metrics, baseline implementations, statistical tests, and effect sizes, preventing assessment of whether the reported gains are attributable to the novel components rather than implementation choices.

minor comments (2)

[Abstract] Abstract: the two representative benchmarks are not named; early identification would aid readers.
[Method] Notation: the formalization of the Bayesian update and scene-transition probability could be introduced with a single running example to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We respond to each major comment below and outline the revisions planned for the manuscript.


Referee: [Method (Bayesian inverse inference)] Method section on Bayesian inverse inference: the claim that MLLM likelihoods serve as accurate proxies for implicit, scene-influenced user preferences is load-bearing for attributing benchmark improvements to the proposed mechanisms, yet no human preference correlation, ablation that removes the inverse-inference step, or error analysis of how scene-transition estimation interacts with preference inference is supplied.

Authors: We agree that additional evidence would strengthen attribution of gains to the Bayesian inverse inference component. In the revised manuscript we will add an ablation that removes the inverse-inference step and reports the resulting drop in accuracy and response quality on both benchmarks. We will also add an error analysis section with concrete examples illustrating interactions between scene-transition estimation and preference inference. Direct human preference correlation studies were outside the scope of the original work; we will expand the discussion to cite prior literature on MLLM likelihood alignment with human judgments and explicitly note the lack of direct validation as a limitation. This constitutes a partial revision. revision: partial
Referee: [Experiments] Experiments section: while superiority in accuracy and quality is asserted, the manuscript supplies insufficient detail on exact metrics, baseline implementations, statistical tests, and effect sizes, preventing assessment of whether the reported gains are attributable to the novel components rather than implementation choices.

Authors: We agree that greater detail is required for proper evaluation. The revised experiments section will define all metrics with formulas, provide full implementation details and hyperparameters for every baseline, report statistical significance tests (including p-values), and include effect sizes for the observed improvements. These additions will clarify the contribution of the novel components and will be incorporated as a full revision. revision: yes
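The promised significance tests are not specified. One common choice for comparing two systems on the same dialogues is a paired randomization (sign-flip) test. Under the null hypothesis that neither system is better, each per-dialogue score difference is equally likely to carry either sign. The sketch below pairs that test with Cohen's d_z (the mean difference divided by the standard deviation of the differences) as the effect size. The function name, the per-dialogue scores, and the 10,000-resample default are assumptions, not the authors' stated procedure.

```python
import random
import statistics


def paired_sign_flip_test(a: list[float], b: list[float], n_resamples: int = 10_000,
                          seed: int = 0) -> tuple[float, float, float]:
    """Two-sided paired randomization test plus Cohen's d_z.

    a[i] and b[i] are hypothetical per-dialogue scores of two systems on the
    same dialogue i. Returns (mean difference, p-value, d_z).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.fmean(diffs)
    observed = abs(mean_diff)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        # Under H0, each difference is equally likely to carry either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= observed - 1e-12:
            extreme += 1
    # The add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    sd = statistics.stdev(diffs)
    d_z = mean_diff / sd if sd > 0 else float("nan")
    return mean_diff, p_value, d_z


# Hypothetical per-dialogue scores, not results from the paper.
siper = [0.62, 0.70, 0.55, 0.81, 0.66, 0.73, 0.59, 0.77, 0.68, 0.64]
baseline = [0.58, 0.61, 0.54, 0.72, 0.60, 0.70, 0.52, 0.71, 0.63, 0.60]
mean_diff, p, d_z = paired_sign_flip_test(siper, baseline)
print(round(mean_diff, 3))  # 0.054
```

Here every difference is positive, so only the two all-same-sign flips out of 2^10 match the observed magnitude. The p-value should therefore come out around 0.002. Reporting d_z alongside it separates "detectable" from "large".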

Circularity Check

0 steps flagged

No circularity: framework assembles existing MLLM components without self-referential reductions or fitted inputs renamed as predictions


The paper describes SiPeR as integrating scene transition estimation and Bayesian inverse inference that consumes MLLM likelihoods as proxies for implicit preferences. No equations, derivations, or parameter-fitting steps are exhibited that would reduce any claimed output (e.g., recommendation accuracy) to the inputs by construction. The superiority claim rests on benchmark experiments rather than tautological re-labeling of fitted quantities or self-citation chains that bear the central load. The architecture therefore remains self-contained against external benchmarks and does not trigger any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the approach relies on pre-existing multimodal LLMs and standard Bayesian concepts without detailing any new fitted values or postulates.

pith-pipeline@v0.9.0 · 5491 in / 904 out tokens · 37957 ms · 2026-05-09T23:49:12.592946+00:00 · methodology

