arxiv: 2606.21933 · v1 · pith:DIMANKXSnew · submitted 2026-06-20 · 💻 cs.SD

ISCSLP 2026 CoT-TTS Challenge: Chain-of-Thought Reasoning for Context-Aware Text-to-Speech

Pith reviewed 2026-06-26 11:32 UTC · model grok-4.3

classification 💻 cs.SD
keywords text-to-speech, chain-of-thought reasoning, context-aware TTS, conversational speech synthesis, expressive speech generation, TTS challenge, multimodal evaluation

The pith

The ISCSLP 2026 CoT-TTS Challenge requires text-to-speech systems to produce explicit chain-of-thought reasoning about context before generating speech.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a challenge to test whether TTS systems can automatically determine appropriate speaking styles from surrounding conversational context instead of relying on explicit style prompts. It defines two tracks—one using text context and one using audio context—where each submitted system must output both a step-by-step reasoning trace and the final speech waveform. A large bilingual training set drawn from media sources is supplied along with filtered evaluation data. Evaluation combines objective scores, multimodal LLM judgments, and human listening tests. A baseline fine-tuned 0.6B model and training recipe are released to lower the entry barrier for participants working on film dubbing, audiobooks, and dialogue agents.

Core claim

The paper establishes the CoT-TTS Challenge as a benchmark in which systems must infer intended speaking manner from either textual or audio context, output an explicit chain-of-thought analysis of that inference, and then synthesize speech that remains consistent with both the analysis and the surrounding scene; two separate tracks, a bilingual training corpus, and a three-way evaluation protocol (objective metrics, multimodal LLM scoring, and human assessment) are defined to measure progress toward context-aware expressive speech generation.

What carries the argument

The CoT-TTS requirement that each system output both an explicit chain-of-thought reasoning trace and the generated waveform, applied across text-context and audio-context tracks.
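The dual-output requirement can be pictured as a single record that is only valid when both parts are present. The following is a minimal sketch, not the challenge's official submission format: the class name `CoTTTSOutput`, its field names, and the `sample_rate` field are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Track(Enum):
    # The two tracks named in the paper; the string values are illustrative.
    TEXT_CONTEXT = "text-context"
    AUDIO_CONTEXT = "audio-context"


@dataclass
class CoTTTSOutput:
    """Hypothetical container for one system output (not an official schema)."""
    track: Track
    target_text: str
    reasoning: str          # explicit chain-of-thought trace
    waveform: List[float]   # generated speech samples
    sample_rate: int        # assumed field; the paper does not specify one here

    def is_complete(self) -> bool:
        # Both a non-empty reasoning trace and a non-empty waveform are required.
        return bool(self.reasoning.strip()) and len(self.waveform) > 0 and self.sample_rate > 0


example = CoTTTSOutput(
    track=Track.TEXT_CONTEXT,
    target_text="I never said that.",
    reasoning="The speaker was just accused, so a defensive, clipped delivery fits.",
    waveform=[0.0, 0.01, -0.02],
    sample_rate=16000,
)
missing_trace = CoTTTSOutput(
    track=Track.AUDIO_CONTEXT,
    target_text="I never said that.",
    reasoning="   ",
    waveform=[0.0, 0.01, -0.02],
    sample_rate=16000,
)
```

Under this sketch, `example.is_complete()` is true while `missing_trace.is_complete()` is false, which mirrors the idea that a waveform without a reasoning trace does not satisfy the task.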

If this is right

  • Successful systems must combine context understanding with waveform generation rather than treating style as an independent input.
  • The released bilingual training data and three-stage fine-tuning recipe will allow rapid iteration on models that jointly reason and synthesize speech.
  • Leaderboard results will establish quantitative baselines for how well current architectures handle implicit style inference in long dialogues.
  • Improved performance should directly benefit downstream tasks such as film dubbing and spoken dialogue agents that currently require manual style specification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The challenge format could be extended to test whether reasoning traces transfer across languages or to new acoustic environments not seen in training.
  • If the evaluation protocol proves stable, similar CoT requirements might be applied to other generative audio tasks such as music or sound-effect synthesis from scene descriptions.
  • The separation of text-context and audio-context tracks offers a controlled comparison of how much additional information audio cues provide for style inference.

Load-bearing premise

That the filtered evaluation set together with the combination of objective metrics, multimodal LLM scoring, and human judgments will reliably isolate and measure genuine context-aware reasoning ability rather than dataset-specific patterns.

What would settle it

Top-ranked systems on the challenge leaderboard produce speech that listeners rate as inconsistent with the intended manner when tested on new conversational scenes drawn from the same media domains but excluded from the official evaluation set.
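One way to make this test concrete is to compare listener ratings on the official evaluation set with ratings on held-out scenes. The sketch below assumes a numeric rating scale and uses invented numbers purely as placeholders; the function name `consistency_gap` is hypothetical, and the paper does not define such a statistic.

```python
from statistics import mean
from typing import Sequence


def consistency_gap(official: Sequence[float], heldout: Sequence[float]) -> float:
    """Mean rating on the official set minus mean rating on held-out scenes.

    A large positive gap would suggest that a system fits the official set
    better than it generalizes to new scenes from the same domains.
    """
    return mean(official) - mean(heldout)


# Placeholder ratings for illustration only (not results from the challenge).
official_ratings = [4.5, 4.0, 4.5]
heldout_ratings = [3.0, 3.5, 2.5]
gap = consistency_gap(official_ratings, heldout_ratings)
```

How large a gap would count as a failure is not specified in the paper and would need to be decided alongside the listening-test design.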

Figures

Figures reproduced from arXiv: 2606.21933 by Bei Liu, Bin Long, Boyi Kang, Jiahao Pan, Junlan Feng, Liumeng Xue, Ruosong Yang, Shilei Zhang, Sitong Cheng, Wei Xue, Weizhen Bian, Yue Wang.

Figure 1: Overview of the initial training data construction pipeline. Raw media are converted and normalized, processed into utterance- and scene-level segments, annotated with emotion and reasoning information, and finally organized into CoT-TTS training samples. After the initial construction, we refine the dataset through sample-level filtering. We extract audio-level features, including segment duration, effe…
Figure 2: Overview of the evaluation data filtering pipeline. The initial candidate pool is progressively filtered through basic audio and context checks, target-audio quality assessment, similarity-based deduplication, LLM-based text quality scoring, audio–CoT consistency checking, and final human assessment. … with multiple speakers in the target audio are removed. After automatic filtering, we further conduct huma…
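The progressive filtering described in the Figure 2 caption can be sketched as an ordered list of keep/drop stages applied to a candidate pool. This is an illustrative simplification under stated assumptions: the field names, thresholds, and stage names are hypothetical, only per-sample checks are shown, and the similarity-based deduplication, LLM-based scoring, and human assessment stages are left out because they need models or annotators.

```python
from typing import Any, Callable, Dict, List, Tuple

Sample = Dict[str, Any]
Stage = Tuple[str, Callable[[Sample], bool]]


def run_filter_pipeline(
    samples: List[Sample], stages: List[Stage]
) -> Tuple[List[Sample], List[Tuple[str, int]]]:
    """Apply stages in order; return survivors and the pool size after each stage."""
    remaining = list(samples)
    counts = []
    for name, keep in stages:
        remaining = [s for s in remaining if keep(s)]
        counts.append((name, len(remaining)))
    return remaining, counts


# Toy candidate pool; all values are made up for illustration.
samples = [
    {"id": "a", "duration": 3.2, "context": "A: hi", "quality": 4.1, "num_speakers": 1},
    {"id": "b", "duration": 0.0, "context": "A: hi", "quality": 4.5, "num_speakers": 1},
    {"id": "c", "duration": 2.0, "context": "B: ok", "quality": 2.0, "num_speakers": 1},
    {"id": "d", "duration": 5.0, "context": "C: no", "quality": 4.0, "num_speakers": 2},
]

# Hypothetical stages and thresholds (the paper's actual criteria are not given here).
stages = [
    ("basic_checks", lambda s: s["duration"] > 0 and bool(s["context"])),
    ("target_quality", lambda s: s["quality"] >= 3.0),
    ("single_speaker", lambda s: s["num_speakers"] == 1),
]

kept, counts = run_filter_pipeline(samples, stages)
```

Recording the pool size after each stage makes it easy to see which check removes the most candidates, which is useful when tuning thresholds.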
Abstract

Recent advances in text-to-speech (TTS) have greatly improved speech naturalness, speaker similarity, and controllability. However, most existing controllable TTS systems still rely on explicit user-provided style prompts, making it difficult to automatically determine how a sentence should be spoken in long and complex conversational scenarios. This proposal introduces the ISCSLP 2026 CoT-TTS Challenge, which aims to evaluate whether a system can infer the intended speaking manner from contextual information and generate speech consistent with both the reasoning output and the surrounding scene. The challenge contains two tracks: text-context-aware CoT-TTS and audio-context-aware CoT-TTS. We construct a large-scale bilingual training set from speech-rich media and provide carefully filtered evaluation data for leaderboard comparison. Each system is required to output both a chain-of-thought reasoning analysis and the generated speech waveform. The official evaluation combines objective metrics, multimodal LLM-based evaluation, and human subjective assessment. To facilitate reproducibility, we provide inference code together with a fine-tuning recipe for a 0.6B Qwen3-based model trained via a three-stage strategy. This challenge is expected to support research on context understanding, chain-of-thought reasoning, and expressive speech generation for applications such as film dubbing, audiobook production, virtual characters, and spoken dialogue agents. Further information about the associated challenge is available at:https://iscslp2026-cot-tts.github.io/challenge-website/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The manuscript proposes the ISCSLP 2026 CoT-TTS Challenge consisting of two tracks (text-context-aware CoT-TTS and audio-context-aware CoT-TTS). It supplies a large-scale bilingual training set derived from speech-rich media, filtered evaluation data, and an evaluation protocol that combines objective metrics, multimodal LLM-based assessment, and human subjective listening tests. Systems must produce both chain-of-thought reasoning and the corresponding speech waveform. A reproducibility baseline is supplied in the form of inference code and a three-stage fine-tuning recipe for a 0.6B Qwen3 model.

Significance. If executed as described, the challenge could stimulate research on inferring expressive speaking style from conversational context rather than explicit prompts, with downstream relevance to dubbing, audiobooks, and dialogue agents. The explicit provision of training data, evaluation code, and a documented baseline constitutes a concrete contribution to reproducibility that strengthens the proposal.

minor comments (1)
  1. [Abstract] Abstract, final sentence: the URL is written without a space after 'at:' ('at:https://...').

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the thorough review and positive recommendation to accept the manuscript. The comments confirm that the challenge proposal, data provision, evaluation protocol, and reproducibility baseline are viewed as a concrete contribution.

Circularity Check

0 steps flagged

No significant circularity: challenge proposal with no derivations

full rationale

This document is a challenge proposal that defines tracks, evaluation protocols, and a training recipe without any quantitative claims, derivations, equations, or predictions. No load-bearing steps exist that could reduce to fitted inputs, self-definitions, or self-citations. The central statements are definitional descriptions of the intended evaluation setup, which are self-contained by construction and contain no testable assertions that could exhibit circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The document introduces no mathematical models, free parameters, axioms, or invented entities; it is an organizational proposal for a competition.

pith-pipeline@v0.9.1-grok · 5837 in / 1028 out tokens · 25998 ms · 2026-06-26T11:32:09.921895+00:00 · methodology

