When Misinformation Speaks and Converses: Rethinking Fact-Checking in Audio Platforms

Chaewan Chun; Delvin Ce Zhang; Dongwon Lee

arXiv: 2604.16767 · v1 · submitted 2026-04-18 · cs.CL · cs.CY

Pith reviewed 2026-05-10 07:38 UTC · model grok-4.3

keywords: audio misinformation, fact-checking, spoken media, conversational structure, prosody, podcasts, voice notes, verification pipelines

The pith

Audio misinformation carries persuasive force through speech patterns and conversation turns that text-based fact-checking misses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper contends that audio platforms have become major vectors for misinformation through podcasts, radio, voice notes, and streams, yet current verification systems treat spoken content as if it were written text. It establishes that spoken misinformation gains impact from prosody, pacing, and emotion while conversational formats spread across speakers and episodes, creating verification hurdles absent in static text. A sympathetic reader would care because millions of listeners encounter claims in these formats daily, and overlooking the spoken-conversational structure leaves large portions of public discourse unchecked. The position paper reviews evidence from multiple platforms and modalities to show why existing pipelines fall short and calls for redesigning them around audio realities.

Core claim

Audio misinformation is structurally different from textual claims because it is both spoken, conveying persuasive force through prosody, pacing, and emotion, and conversational, unfolding across turns, speakers, and episodes; these properties introduce verification difficulties that traditional text-focused methods rarely encounter, requiring fact-checking pipelines to be rethought around the spoken and conversational nature of audio.

What carries the argument

The dual properties of spoken delivery (prosody, pacing, emotion) and conversational unfolding (turns, speakers, episodes) that distinguish audio misinformation from text and create unique verification challenges.

If this is right

Traditional pipelines miss the persuasive elements carried by voice and dialogue structure in audio content.
Verification becomes harder because claims evolve across conversational turns rather than appearing as fixed statements.
Audio platforms require new methods that account for both spoken delivery and multi-speaker dynamics.
Synthesizing evidence across modalities reveals consistent gaps in current approaches to audio misinformation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Audio-specific detection tools would need to process intonation and speaker shifts directly rather than relying solely on transcripts.
Platforms hosting live streams or voice notes may need real-time conversational analysis to flag evolving claims before they spread.
Future datasets for training fact-checkers should include paired audio and full dialogue context instead of isolated text excerpts.
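One way to picture these extensions is a record format that keeps each utterance tied to its speaker, its timing, and a few coarse prosodic measurements, so that a claim can be pulled out together with the turns around it. The sketch below is illustrative only: the `Utterance` fields, the `context_window` and `speaker_shifts` helpers, and the sample dialogue are assumptions made for this example, not a schema or data taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    """One speaker turn with timing and coarse prosody (hypothetical schema)."""
    speaker: str
    start_s: float                           # turn start, seconds from episode start
    end_s: float                             # turn end, seconds
    text: str                                # transcript of the turn
    mean_pitch_hz: Optional[float] = None    # assumed prosody field
    speech_rate_wps: Optional[float] = None  # assumed prosody field (words per second)


def context_window(turns: List[Utterance], index: int,
                   before: int = 2, after: int = 1) -> List[Utterance]:
    """Return the turn at `index` plus neighbouring turns, clipped to the dialogue bounds."""
    lo = max(0, index - before)
    hi = min(len(turns), index + after + 1)
    return turns[lo:hi]


def speaker_shifts(turns: List[Utterance]) -> int:
    """Count how often the speaker changes between consecutive turns."""
    return sum(1 for a, b in zip(turns, turns[1:]) if a.speaker != b.speaker)


# Invented toy dialogue: the guest's claim is softened two turns later.
dialogue = [
    Utterance("host", 0.0, 4.2, "So what did the study actually find?"),
    Utterance("guest", 4.2, 9.8, "It found the treatment works every time.", 210.0, 3.1),
    Utterance("host", 9.8, 11.0, "Every time?"),
    Utterance("guest", 11.0, 15.5, "Well, in most of the cases they looked at."),
]

# Retrieve the claim at index 1 together with its neighbouring turns.
window = context_window(dialogue, index=1)
print([u.speaker for u in window])
print(speaker_shifts(dialogue))
```

Keeping the claim and its neighbours in one structure is a design choice: an isolated excerpt of the second turn would lose the later hedge, which is the kind of context the extension above asks datasets to preserve.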

Load-bearing premise

Existing fact-checking pipelines are mostly designed for written claims and overlook the unique properties of spoken media.

What would settle it

A controlled comparison showing that transcript-only fact-checking achieves the same accuracy and coverage on audio misinformation as specialized audio-aware methods would undermine the claim of structural difference.
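A controlled comparison of this kind could be scored with a paired design, where both pipelines label the same audio items and the accuracy gap is resampled over items. The following is a minimal sketch under stated assumptions: the function names, the percentile bootstrap, and the toy labels are all invented here for illustration and do not come from the paper, which reports no such experiment.

```python
import random
from typing import List, Sequence, Tuple


def accuracy(preds: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of items whose predicted verdict matches the gold verdict."""
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


def paired_bootstrap_diff(preds_a: Sequence[str], preds_b: Sequence[str],
                          gold: Sequence[str], n_resamples: int = 2000,
                          seed: int = 0) -> Tuple[float, float, float]:
    """Return (observed, lower, upper) for accuracy(A) - accuracy(B).

    Items are resampled with replacement, and the same resampled items are
    scored for both systems, which is what makes the comparison paired.
    """
    rng = random.Random(seed)
    n = len(gold)
    observed = accuracy(preds_a, gold) - accuracy(preds_b, gold)
    diffs: List[float] = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        acc_a = sum(preds_a[i] == gold[i] for i in idx) / n
        acc_b = sum(preds_b[i] == gold[i] for i in idx) / n
        diffs.append(acc_a - acc_b)
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]      # approximate 2.5th percentile
    upper = diffs[int(0.975 * n_resamples) - 1]  # approximate 97.5th percentile
    return observed, lower, upper


# Toy verdicts, invented purely to exercise the functions.
gold = ["false", "true", "false", "false", "true", "false", "true", "false"]
transcript_only = ["false", "true", "true", "false", "true", "true", "true", "false"]
audio_aware = ["false", "true", "false", "false", "true", "true", "true", "false"]

observed, lower, upper = paired_bootstrap_diff(audio_aware, transcript_only, gold)
print(observed, lower, upper)
```

If the resulting interval sat tightly around zero on a realistic benchmark, that would be the outcome described above as undermining the structural-difference claim; an eight-item toy set like this one says nothing either way.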

Figures

Figures reproduced from arXiv: 2604.16767 by Chaewan Chun, Delvin Ce Zhang, Dongwon Lee.

**Figure 2.** Illustration of claim detection and verification.

Original abstract

Audio platforms have evolved beyond entertainment. They have become central to public discourse, from podcasts and radio to WhatsApp voice notes and live streams. With millions of shows and hundreds of millions of listeners, audio platforms are now a major channel for misinformation. Yet existing fact-checking pipelines are mostly designed for written claims, overlooking the unique properties of spoken media. We argue that audio misinformation is not merely textual content with transcripts: it is structurally different because it is both spoken - carrying persuasive force through prosody, pacing, and emotion - and conversational - unfolding across turns, speakers, and episodes. These dual properties introduce verification difficulties that traditional methods rarely face. This position paper synthesizes evidence across modalities and platforms, examines datasets and methods, and highlights why existing pipelines fail on audio. We argue that advancing fact-checking requires rethinking verification pipelines around the spoken and conversational realities of audio.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Audio misinformation needs dedicated handling for spoken and conversational traits, but the paper does not demonstrate that existing pipelines actually fail on them.

This paper's main message is that audio platforms are becoming major vectors for misinformation and that current fact-checking, built around text, misses the spoken and interactive elements that make audio distinct. That's the punchline worth knowing up front.

They do well in laying out the landscape: audio, from podcasts to voice notes, reaches hundreds of millions of listeners, and they pull evidence from various studies on how prosody and pacing add persuasive power while conversations across episodes can layer claims in complex ways. The synthesis of existing work on multimodal misinformation and the call to rethink pipelines around these realities give a clear direction for future work. Credit to them for focusing on a practical gap as audio grows.

Where it gets soft is on the evidence that these properties create verification problems traditional methods can't handle. The stress-test concern holds up here: the paper assumes prosody and conversation structure alter factual assessment in ways transcripts miss, but they mostly affect how claims spread or are believed, not whether they can be checked against evidence. No specific cases are given where audio features cause pipeline failures that can't be mitigated by better transcription or additional context. Being a position paper means no new datasets, experiments, or derivations, so the argument rests on prior literature without testing the size of the gap.

This work is for people studying misinformation in emerging media or designing tools for platforms. Readers interested in high-level framing and open problems will find value, while those seeking technical advances or quantitative results might not. It deserves a serious referee because it identifies a timely issue with potential impact on how we handle public information. The authors engage honestly with the literature even if the conclusions need more grounding. Send it to peer review.

Referee Report

3 major / 2 minor

Summary. This position paper argues that audio misinformation differs structurally from textual forms because it is spoken (carrying persuasive force via prosody, pacing, and emotion) and conversational (unfolding across turns, speakers, and episodes). It claims that these properties create verification difficulties that existing fact-checking pipelines, designed primarily for written claims, fail to address. The work synthesizes evidence across modalities and platforms, reviews datasets and methods, and calls for rethinking verification pipelines around audio-specific realities.

Significance. If the argument is substantiated, the paper identifies an important gap in misinformation research and fact-checking, potentially spurring development of multimodal tools tailored to podcasts, voice notes, and live streams. As a synthesis rather than an empirical study, its value lies in framing the spoken and conversational dimensions as load-bearing for verification, which could guide future work if supported by clearer evidence of unique failures.

major comments (3)

[Abstract / Introduction] Abstract and introduction: The central claim that spoken properties (prosody, pacing, emotion) introduce verification difficulties beyond text assumes these features alter factual accuracy assessment rather than primarily affecting persuasion or spread. The manuscript should explicitly distinguish these and provide cases where audio-only cues change the underlying propositional truth value or block extraction in ways transcript methods cannot mitigate.
[Datasets and methods review] Section examining datasets and methods: The synthesis does not isolate concrete instances where conversational structure (multi-turn speaker dynamics) causes transcript-based pipelines to fail verification in a manner not already addressable by existing dialogue or thread-based text methods. Without such examples, the claim that audio is 'structurally different' risks overgeneralization.
[Synthesis of evidence] The argument that existing pipelines 'fail on audio' would be strengthened by citing specific current systems or benchmarks and demonstrating their breakdown on audio features, rather than relying on the general observation that most are text-designed.

minor comments (2)

[Abstract] The abstract could more explicitly frame the paper as a position piece and outline its contributions (synthesis, gap identification, call to action) to help readers set expectations.
[Introduction] Terminology such as 'verification difficulties' and 'persuasive force' could be defined more precisely early on to avoid conflation between detection, verification, and impact.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for these insightful comments, which help us refine the distinctions and evidence in our position paper. We address each major comment point by point below, with plans to revise the manuscript for greater clarity and specificity while preserving its synthetic nature.

Referee: [Abstract / Introduction] Abstract and introduction: The central claim that spoken properties (prosody, pacing, emotion) introduce verification difficulties beyond text assumes these features alter factual accuracy assessment rather than primarily affecting persuasion or spread. The manuscript should explicitly distinguish these and provide cases where audio-only cues change the underlying propositional truth value or block extraction in ways transcript methods cannot mitigate.

Authors: We agree that an explicit distinction between effects on persuasion/spread and on verification is necessary. In revision, we will update the abstract and introduction to clarify that prosody, pacing, and emotion primarily amplify persuasion and virality but can also impede factual verification by introducing ambiguity or altering the effective claim (e.g., vocal sarcasm or emphasis that reverses literal meaning). We will add concrete cases, such as podcast statements where tone indicates irony or hedging not captured in transcripts, thereby blocking accurate propositional extraction even when text-based methods are applied to the words alone. These examples illustrate how audio cues affect what counts as the verifiable claim without always changing an abstract truth value. revision: yes
Referee: [Datasets and methods review] Section examining datasets and methods: The synthesis does not isolate concrete instances where conversational structure (multi-turn speaker dynamics) causes transcript-based pipelines to fail verification in a manner not already addressable by existing dialogue or thread-based text methods. Without such examples, the claim that audio is 'structurally different' risks overgeneralization.

Authors: We acknowledge the value of isolating concrete instances to avoid overgeneralization. Although this is a position paper, we will revise the datasets and methods section to draw on existing literature for specific examples. These include multi-turn podcast exchanges where implicit cross-references (e.g., endorsements or refutations signaled by prosody across speakers and episodes) create verification failures; standard dialogue or thread-based text methods often miss the audio layer of intent or emphasis, leading to incorrect claim isolation. We will cite relevant work on conversational fact-checking to show where audio-specific dynamics exceed what text pipelines currently mitigate. revision: partial
Referee: [Synthesis of evidence] The argument that existing pipelines 'fail on audio' would be strengthened by citing specific current systems or benchmarks and demonstrating their breakdown on audio features, rather than relying on the general observation that most are text-designed.

Authors: We agree that naming specific systems strengthens the synthesis. In revision, we will cite concrete text-centric benchmarks and pipelines (such as those built on FEVER-style claim verification or social media thread checkers) and discuss their documented limitations when applied to transcribed audio, including loss of overlapping speech, emotional valence affecting claim boundaries, and long-range conversational context. While we cannot run new empirical tests in this position paper, we will reference studies showing performance degradation on audio-derived data and use these to illustrate structural breakdowns rather than generic text-design observations. revision: yes
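The last response names loss of overlapping speech as one limitation of transcribed audio. As a rough illustration of what that loss looks like, the sketch below takes diarized segments (speaker, start, end) and lists the stretches where two different speakers talk at once, which a single linear transcript has to flatten into sequential lines. The `Segment` type, the `overlapping_pairs` function, and the timings are assumptions made for this example; they are not taken from the paper or from any particular diarization tool.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """A diarized stretch of speech (hypothetical format)."""
    speaker: str
    start_s: float
    end_s: float


def overlapping_pairs(segments: List[Segment]) -> List[Tuple[Segment, Segment, float]]:
    """Return pairs of segments from different speakers that overlap in time.

    Each result is (earlier segment, later segment, overlap duration in seconds).
    """
    ordered = sorted(segments, key=lambda s: s.start_s)
    found = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.start_s >= first.end_s:
                break  # sorted by start time, so nothing later can overlap `first`
            if second.speaker != first.speaker:
                overlap = min(first.end_s, second.end_s) - second.start_s
                found.append((first, second, overlap))
    return found


# Invented timings: the guest cuts in twice before the host has finished.
segments = [
    Segment("host", 0.0, 5.0),
    Segment("guest", 4.0, 9.0),
    Segment("host", 9.0, 12.0),
    Segment("guest", 11.5, 14.0),
]

for first, second, overlap in overlapping_pairs(segments):
    print(f"{second.speaker} overlaps {first.speaker} for {overlap:.1f}s "
          f"starting at {second.start_s:.1f}s")
```

Flagging these spans does not verify anything by itself; it only marks where a transcript-based checker might be working from a reordered or merged version of what was said.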

Circularity Check

0 steps flagged

No circularity: argumentative position paper with no derivations or reductions

This position paper is an argumentative synthesis of evidence across modalities and platforms, with no equations, fitted parameters, mathematical derivations, or self-referential loops. The central claims rest on stated observations about spoken and conversational properties of audio misinformation rather than any reduction to prior results by the same authors. No load-bearing steps exist that could be circular by construction, self-definition, or imported uniqueness. The paper is self-contained as a call to rethink pipelines and does not rely on unverified self-citations for its premises.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper rests on the domain assumption that audio carries unique persuasive and structural elements beyond text transcripts, but introduces no new free parameters, axioms, or invented entities.

