{"work":{"id":"5b264119-de53-49d1-b7be-d172bf51c834","openalex_id":null,"doi":null,"arxiv_id":"2305.14233","raw_key":null,"title":"Enhancing Chat Language Models by Scaling High-quality Instructional Conversations","authors":null,"authors_text":"Ning Ding, Yulin Chen, Bokai Xu, Yujia Qin, Zhi Zheng, Shengding Hu","year":2023,"venue":"cs.CL","abstract":"Fine-tuning on instruction data has been widely validated as an effective practice for implementing chat language models like ChatGPT. Scaling the diversity and quality of such data, although straightforward, stands a great chance of leading to improved performance. This paper aims to improve the upper bound of open-source models further. We first provide a systematically designed, diverse, informative, large-scale dataset of instructional conversations, UltraChat, which does not involve human queries. Our objective is to capture the breadth of interactions that a human might have with an AI assistant and employs a comprehensive framework to generate multi-turn conversation iteratively. UltraChat contains 1.5 million high-quality multi-turn dialogues and covers a wide range of topics and instructions. Our statistical analysis of UltraChat reveals its superiority in various key metrics, including scale, average length, diversity, coherence, etc., solidifying its position as a leading open-source dataset. Building upon UltraChat, we fine-tune a LLaMA model to create a powerful conversational model, UltraLLaMA. Our evaluations indicate that UltraLLaMA consistently outperforms other open-source models, including Vicuna, the previously recognized state-of-the-art open-source model. 
The dataset and the model will be publicly released (https://github.com/thunlp/UltraChat).","external_url":"https://arxiv.org/abs/2305.14233","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-06-29T17:53:46.840156+00:00","pith_arxiv_id":"2305.14233","created_at":"2026-05-09T00:14:28.066364+00:00","updated_at":"2026-06-29T17:53:46.840156+00:00","title_quality_ok":true,"display_title":"Enhancing Chat Language Models by Scaling High-quality Instructional Conversations","render_title":"Enhancing Chat Language Models by Scaling High-quality Instructional Conversations"},"hub":{"state":{"work_id":"5b264119-de53-49d1-b7be-d172bf51c834","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":32,"external_cited_by_count":null,"distinct_field_count":6,"first_pith_cited_at":"2023-10-25T19:25:16+00:00","last_pith_cited_at":"2026-05-26T09:12:14+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-30T03:39:24.145935+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":6},{"context_role":"dataset","n":5}],"polarity_counts":[{"context_polarity":"background","n":5},{"context_polarity":"use_dataset","n":5},{"context_polarity":"unclear","n":1}],"runs":{},"summary":{},"graph":{},"authors":[]}}