{"work":{"id":"c249e63c-cf40-408f-a4ff-fdf68e8cbeb8","openalex_id":null,"doi":null,"arxiv_id":"2407.10759","raw_key":null,"title":"Qwen2-Audio Technical Report","authors":null,"authors_text":"Yunfei Chu, Jin Xu, Qian Yang, Haojie Wei, Xipin Wei, Zhifang Guo","year":2024,"venue":"eess.AS","abstract":"We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. In contrast to complex hierarchical tags, we have simplified the pre-training process by utilizing natural language prompts for different data and tasks, and have further expanded the data volume. We have boosted the instruction-following capability of Qwen2-Audio and implemented two distinct audio interaction modes for voice chat and audio analysis. In the voice chat mode, users can freely engage in voice interactions with Qwen2-Audio without text input. In the audio analysis mode, users could provide audio and text instructions for analysis during the interaction. Note that we do not use any system prompts to switch between voice chat and audio analysis modes. Qwen2-Audio is capable of intelligently comprehending the content within audio and following voice commands to respond appropriately. For instance, in an audio segment that simultaneously contains sounds, multi-speaker conversations, and a voice command, Qwen2-Audio can directly understand the command and provide an interpretation and response to the audio. Additionally, DPO has optimized the model's performance in terms of factuality and adherence to desired behavior. According to the evaluation results from AIR-Bench, Qwen2-Audio outperformed previous SOTAs, such as Gemini-1.5-pro, in tests focused on audio-centric instruction-following capabilities. Qwen2-Audio is open-sourced with the aim of fostering the advancement of the multi-modal language community.","external_url":"https://arxiv.org/abs/2407.10759","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-06-29T18:43:50.720480+00:00","pith_arxiv_id":"2407.10759","created_at":"2026-05-09T06:25:46.002743+00:00","updated_at":"2026-06-29T18:43:50.720480+00:00","title_quality_ok":false,"display_title":"Qwen2-Audio Technical Report","render_title":"Qwen2-Audio Technical Report"},"hub":{"state":{"work_id":"c249e63c-cf40-408f-a4ff-fdf68e8cbeb8","tier":"super_hub","tier_reason":"100+ Pith inbound or 10,000+ external citations","pith_inbound_count":114,"external_cited_by_count":null,"distinct_field_count":11,"first_pith_cited_at":"2024-06-11T17:22:23+00:00","last_pith_cited_at":"2026-06-01T18:05:18+00:00","author_build_status":"needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-29T19:59:08.776053+00:00","tier_text":"super_hub"},"tier":"super_hub","role_counts":[{"context_role":"background","n":18},{"context_role":"baseline","n":5},{"context_role":"method","n":1}],"polarity_counts":[{"context_polarity":"background","n":18},{"context_polarity":"baseline","n":5},{"context_polarity":"use_method","n":1}],"runs":{"ask_index":{"job_type":"ask_index","status":"succeeded","result":{"title":"Qwen2-Audio Technical Report","claims":[{"claim_text":"We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. In contrast to complex hierarchical tags, we have simplified the pre-training process by utilizing natural language prompts for different data and tasks, and have further expanded the data volume. We have boosted the instruction-following capability of Qwen2-Audio and implemented two distinct audio interaction modes for voice chat and audio an","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"environments, and adaptive evaluation protocols tailored to different task paradigms. 1) Model Selection and Deployment Setup:We comprehensively evaluate our proposed FM-Speech against 11 advanced speech LLMs, comprising eight mainstream open-source models (Audio Flamingo 3 [8], Qwen3-Omni [7], Kimi-Audio [25], Step-Audio 2 [26], Omni- Captioner [27], Mimo-Audio [28], Qwen2.5-Omni [29], and Qwen2- Audio [30]) and three representative proprietary models (Gemini 2.5 Flash, Gemini 3 Flash, and Gemi","claim_type":"baseline","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"underpinned by their architectural design and the transition from task-specific cascaded systems toward unified, end-to- end multimodal frameworks [17], [42]. Unlike traditional systems characterized by modular decoupling, contempo- rary architectures employ a sophisticated pipeline designed to map continuous, non-stationary auditory signals into structured semantic latent spaces [16], [18]. 2.1 Architectural Foundations The structural integrity of LALMs is established upon a composite informati","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"cial expressions and gestures, in either 2D or 3D form. Speech dialog research splits into two paradigms: mod- ular and end-to-end. Modular systems pair an LLM with a TTS/vocoder-LLM emits text or semantic tokens, TTS renders waveforms [19, 22-24, 74, 88, 128]. End-to-end voice agents integrate understanding and speech generation in a single model, tightly coupling semantics, prosody, and emotion [15, 40, 129]. In visual generation [11, 47, 56, 59, 75, 123], particularly 2D portrait animation, a","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"categories of baselines: 1) Non-LLM metrics:We utilize AES model [ 13] with its CE, CU, PC, and PQ metrics, UTMOS [ 11], and NISQA [ 10] as our baselines. 2) General-purpose LLMs:We choose several MLLM models as baselines, including Gemini series ( Gemini-3- Pro, Gemini-2.5-Pro, and Gemini-2.5-Flash) [ 15], Qwen series (Qwen3-omni [28], Qwen2-audio [ 52]), and Nvidia's Audio Flamingo3 [53]. 7https://github.com/vivian556123/Jastin JOURNAL OF LATEX CLASS FILES, VOL. 18, NO. 9, SEPTEMBER 2020 6 TAB","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"preprint arXiv:2411.18138, 2024. [4] A. Zeng, Z. Du, M. Liu, K. Wang, S. Jiang, L. Zhao, Y . Dong, and J. Tang, \"Glm-4-voice: Towards intelligent and human-like end- to-end spoken chatbot,\"arXiv preprint arXiv:2412.02612, 2024. [5] Z. Xie and C. Wu, \"Mini-omni: Language models can hear, talk while thinking in streaming,\"arXiv preprint arXiv:2408.16725, 2024. [6] X. Geng, Q. Shao, H. Xue, S. Wang, H. Xie, Z. Guo, Y . Zhao, G. Li, W. Tian, C. Wanget al., \"Osum-echat: Enhancing end-to- end empathet","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"We compare our model with a diverse set of SLM baselines, including leading industrial systems and methods explicitly designed to reduce the modality gap. We calculate the modality gap as the performance difference between a speech model given spoken input and its backbone TLM given textual input. The compared model pairs are GLM-4-V oice [12] and GLM-4-9B [10], Qwen2-Audio [62] and Qwen-7B-Chat [63], DiV A [64] and Llama-3-8B [65], Qwen2.5-Omni [23] and Qwen2.5-7B-Instruct [66], Kimi-Audio [29]","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Qwen2-Audio Technical Report because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (17 contexts).","role_counts":[{"n":17,"context_role":"background"},{"n":5,"context_role":"baseline"},{"n":1,"context_role":"method"}]},"error":null,"updated_at":"2026-05-22T00:13:18.596675+00:00"},"author_expand":{"job_type":"author_expand","status":"succeeded","result":{"authors_linked":[{"id":"21a86318-9e09-4d98-8b6f-55f70c35ecf5","orcid":null,"display_name":"Yunfei Chu"},{"id":"c42757d0-ace5-480f-b243-890854f9054c","orcid":null,"display_name":"Jin Xu"},{"id":"de26b4d9-7dee-43dc-a7b6-60bc87084a61","orcid":null,"display_name":"Qian Yang"},{"id":"0d7e22da-8a9e-4048-9707-e14dd07fe509","orcid":null,"display_name":"Haojie Wei"},{"id":"4e5e51bd-7ac1-4936-aa0e-40e8d011b27a","orcid":null,"display_name":"Xipin Wei"},{"id":"6fe3c884-415c-45f2-b8fb-6629cd6c24df","orcid":null,"display_name":"Zhifang Guo"}]},"error":null,"updated_at":"2026-05-22T00:13:18.590889+00:00"},"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T12:40:36.076506+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"Qwen2.5-Omni Technical Report","work_id":"438f105c-fa9b-44aa-ad52-43acb8045cda","shared_citers":21},{"title":"Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities","work_id":"008df105-2fdd-45d8-857a-8e35868aecb6","shared_citers":18},{"title":"Kimi-Audio Technical Report","work_id":"9c2ba56b-5585-4f28-b751-703f31dca2d5","shared_citers":16},{"title":"Qwen3-Omni Technical Report","work_id":"ae43e594-8bab-4471-b6af-92a300f6a048","shared_citers":16},{"title":"Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models","work_id":"d3f033ac-bfa8-4143-9d0d-51f3f5bd3f0e","shared_citers":14},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":14},{"title":"GPT-4o System Card","work_id":"f37bf1c7-4964-4e56-9762-d20da8d9009f","shared_citers":12},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":12},{"title":"Step-audio 2 technical report","work_id":"2317bc9f-2d58-4e53-b7ea-804337e9fdef","shared_citers":12},{"title":"arXiv preprint arXiv:2412.02612 , year=","work_id":"1d250ff4-6ca5-4eb5-b561-48106b630d8b","shared_citers":11},{"title":"Audio flamingo 3: Advancing audio intelligence with fully open large audio language models","work_id":"67c2892d-8e27-4da1-8198-71c48e673e96","shared_citers":11},{"title":"MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark","work_id":"e60f85db-636c-4830-af85-5d31ebc74a1b","shared_citers":11},{"title":"Moshi: a speech-text foundation model for real-time dialogue","work_id":"3104332b-d279-44c8-aaa7-3d5a13c01832","shared_citers":11},{"title":"Qwen Technical Report","work_id":"bb1fd52f-6b2f-437c-9516-37bdf6eb9be8","shared_citers":10},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":8},{"title":"DeepSeek-V3 Technical Report","work_id":"57d2791d-2219-4c31-a077-afc04b12a75c","shared_citers":7},{"title":"Mmsu: A massive multi-task spoken language understanding and reasoning benchmark","work_id":"1a24d3b1-2d00-407c-90f0-b0aeec13bfe6","shared_citers":7},{"title":"Qwen3-ASR Technical Report","work_id":"db50e258-4a3d-4141-ba30-f76f8f953880","shared_citers":7},{"title":"CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models","work_id":"3af84775-3b81-4078-b553-52739aae03ba","shared_citers":6},{"title":"Decoupled Weight Decay Regularization","work_id":"07ef7360-d385-4033-83f7-8384a6325204","shared_citers":6},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":6},{"title":"Mimo-audio: Audio language models are few-shot learners","work_id":"8c438dfc-e0fb-45e5-983b-3708cd7afa91","shared_citers":6},{"title":"Mini-omni: Language models can hear, talk while thinking in streaming","work_id":"5192d640-6f69-48bb-b5e7-c2eb57e52726","shared_citers":6},{"title":"Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs","work_id":"83956045-536a-41ff-af02-b80e2a614eab","shared_citers":6}],"time_series":[{"n":1,"year":2024},{"n":3,"year":2025},{"n":50,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T12:40:39.780427+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T12:40:47.797731+00:00"},"role_polarity":{"job_type":"role_polarity","status":"succeeded","result":{"title":"Qwen2-Audio Technical Report","claims":[{"claim_text":"We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. In contrast to complex hierarchical tags, we have simplified the pre-training process by utilizing natural language prompts for different data and tasks, and have further expanded the data volume. We have boosted the instruction-following capability of Qwen2-Audio and implemented two distinct audio interaction modes for voice chat and audio an","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"environments, and adaptive evaluation protocols tailored to different task paradigms. 1) Model Selection and Deployment Setup:We comprehensively evaluate our proposed FM-Speech against 11 advanced speech LLMs, comprising eight mainstream open-source models (Audio Flamingo 3 [8], Qwen3-Omni [7], Kimi-Audio [25], Step-Audio 2 [26], Omni- Captioner [27], Mimo-Audio [28], Qwen2.5-Omni [29], and Qwen2- Audio [30]) and three representative proprietary models (Gemini 2.5 Flash, Gemini 3 Flash, and Gemi","claim_type":"baseline","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"underpinned by their architectural design and the transition from task-specific cascaded systems toward unified, end-to- end multimodal frameworks [17], [42]. Unlike traditional systems characterized by modular decoupling, contempo- rary architectures employ a sophisticated pipeline designed to map continuous, non-stationary auditory signals into structured semantic latent spaces [16], [18]. 2.1 Architectural Foundations The structural integrity of LALMs is established upon a composite informati","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"cial expressions and gestures, in either 2D or 3D form. Speech dialog research splits into two paradigms: mod- ular and end-to-end. Modular systems pair an LLM with a TTS/vocoder-LLM emits text or semantic tokens, TTS renders waveforms [19, 22-24, 74, 88, 128]. End-to-end voice agents integrate understanding and speech generation in a single model, tightly coupling semantics, prosody, and emotion [15, 40, 129]. In visual generation [11, 47, 56, 59, 75, 123], particularly 2D portrait animation, a","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"categories of baselines: 1) Non-LLM metrics:We utilize AES model [ 13] with its CE, CU, PC, and PQ metrics, UTMOS [ 11], and NISQA [ 10] as our baselines. 2) General-purpose LLMs:We choose several MLLM models as baselines, including Gemini series ( Gemini-3- Pro, Gemini-2.5-Pro, and Gemini-2.5-Flash) [ 15], Qwen series (Qwen3-omni [28], Qwen2-audio [ 52]), and Nvidia's Audio Flamingo3 [53]. 7https://github.com/vivian556123/Jastin JOURNAL OF LATEX CLASS FILES, VOL. 18, NO. 9, SEPTEMBER 2020 6 TAB","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"preprint arXiv:2411.18138, 2024. [4] A. Zeng, Z. Du, M. Liu, K. Wang, S. Jiang, L. Zhao, Y . Dong, and J. Tang, \"Glm-4-voice: Towards intelligent and human-like end- to-end spoken chatbot,\"arXiv preprint arXiv:2412.02612, 2024. [5] Z. Xie and C. Wu, \"Mini-omni: Language models can hear, talk while thinking in streaming,\"arXiv preprint arXiv:2408.16725, 2024. [6] X. Geng, Q. Shao, H. Xue, S. Wang, H. Xie, Z. Guo, Y . Zhao, G. Li, W. Tian, C. Wanget al., \"Osum-echat: Enhancing end-to- end empathet","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"We compare our model with a diverse set of SLM baselines, including leading industrial systems and methods explicitly designed to reduce the modality gap. We calculate the modality gap as the performance difference between a speech model given spoken input and its backbone TLM given textual input. The compared model pairs are GLM-4-V oice [12] and GLM-4-9B [10], Qwen2-Audio [62] and Qwen-7B-Chat [63], DiV A [64] and Llama-3-8B [65], Qwen2.5-Omni [23] and Qwen2.5-7B-Instruct [66], Kimi-Audio [29]","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Qwen2-Audio Technical Report because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (17 contexts).","role_counts":[{"n":17,"context_role":"background"},{"n":5,"context_role":"baseline"},{"n":1,"context_role":"method"}]},"error":null,"updated_at":"2026-05-22T00:13:18.602370+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"Qwen2-Audio Technical Report","claims":[{"claim_text":"We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. In contrast to complex hierarchical tags, we have simplified the pre-training process by utilizing natural language prompts for different data and tasks, and have further expanded the data volume. We have boosted the instruction-following capability of Qwen2-Audio and implemented two distinct audio interaction modes for voice chat and audio an","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Qwen2-Audio Technical Report because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T12:40:51.439794+00:00"}},"summary":{"title":"Qwen2-Audio Technical Report","claims":[{"claim_text":"We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. In contrast to complex hierarchical tags, we have simplified the pre-training process by utilizing natural language prompts for different data and tasks, and have further expanded the data volume. We have boosted the instruction-following capability of Qwen2-Audio and implemented two distinct audio interaction modes for voice chat and audio an","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Qwen2-Audio Technical Report because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"Qwen2.5-Omni Technical Report","work_id":"438f105c-fa9b-44aa-ad52-43acb8045cda","shared_citers":21},{"title":"Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities","work_id":"008df105-2fdd-45d8-857a-8e35868aecb6","shared_citers":18},{"title":"Kimi-Audio Technical Report","work_id":"9c2ba56b-5585-4f28-b751-703f31dca2d5","shared_citers":16},{"title":"Qwen3-Omni Technical Report","work_id":"ae43e594-8bab-4471-b6af-92a300f6a048","shared_citers":16},{"title":"Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models","work_id":"d3f033ac-bfa8-4143-9d0d-51f3f5bd3f0e","shared_citers":14},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":14},{"title":"GPT-4o System Card","work_id":"f37bf1c7-4964-4e56-9762-d20da8d9009f","shared_citers":12},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":12},{"title":"Step-audio 2 technical report","work_id":"2317bc9f-2d58-4e53-b7ea-804337e9fdef","shared_citers":12},{"title":"arXiv preprint arXiv:2412.02612 , year=","work_id":"1d250ff4-6ca5-4eb5-b561-48106b630d8b","shared_citers":11},{"title":"Audio flamingo 3: Advancing audio intelligence with fully open large audio language models","work_id":"67c2892d-8e27-4da1-8198-71c48e673e96","shared_citers":11},{"title":"MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark","work_id":"e60f85db-636c-4830-af85-5d31ebc74a1b","shared_citers":11},{"title":"Moshi: a speech-text foundation model for real-time dialogue","work_id":"3104332b-d279-44c8-aaa7-3d5a13c01832","shared_citers":11},{"title":"Qwen Technical Report","work_id":"bb1fd52f-6b2f-437c-9516-37bdf6eb9be8","shared_citers":10},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":8},{"title":"DeepSeek-V3 Technical Report","work_id":"57d2791d-2219-4c31-a077-afc04b12a75c","shared_citers":7},{"title":"Mmsu: A massive multi-task spoken language understanding and reasoning benchmark","work_id":"1a24d3b1-2d00-407c-90f0-b0aeec13bfe6","shared_citers":7},{"title":"Qwen3-ASR Technical Report","work_id":"db50e258-4a3d-4141-ba30-f76f8f953880","shared_citers":7},{"title":"CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models","work_id":"3af84775-3b81-4078-b553-52739aae03ba","shared_citers":6},{"title":"Decoupled Weight Decay Regularization","work_id":"07ef7360-d385-4033-83f7-8384a6325204","shared_citers":6},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":6},{"title":"Mimo-audio: Audio language models are few-shot learners","work_id":"8c438dfc-e0fb-45e5-983b-3708cd7afa91","shared_citers":6},{"title":"Mini-omni: Language models can hear, talk while thinking in streaming","work_id":"5192d640-6f69-48bb-b5e7-c2eb57e52726","shared_citers":6},{"title":"Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs","work_id":"83956045-536a-41ff-af02-b80e2a614eab","shared_citers":6}],"time_series":[{"n":1,"year":2024},{"n":3,"year":2025},{"n":50,"year":2026}],"dependency_candidates":[]},"authors":[{"id":"0d7e22da-8a9e-4048-9707-e14dd07fe509","orcid":null,"display_name":"Haojie Wei","source":"manual","import_confidence":0.72},{"id":"c42757d0-ace5-480f-b243-890854f9054c","orcid":null,"display_name":"Jin Xu","source":"manual","import_confidence":0.72},{"id":"de26b4d9-7dee-43dc-a7b6-60bc87084a61","orcid":null,"display_name":"Qian Yang","source":"manual","import_confidence":0.72},{"id":"4e5e51bd-7ac1-4936-aa0e-40e8d011b27a","orcid":null,"display_name":"Xipin Wei","source":"manual","import_confidence":0.72},{"id":"21a86318-9e09-4d98-8b6f-55f70c35ecf5","orcid":null,"display_name":"Yunfei Chu","source":"manual","import_confidence":0.72},{"id":"6fe3c884-415c-45f2-b8fb-6629cd6c24df","orcid":null,"display_name":"Zhifang Guo","source":"manual","import_confidence":0.72}]}}