
arxiv: 2406.15809 · v5 · submitted 2024-06-22 · 💻 cs.CL · cs.LG

LaMSUM: Amplifying Voices Against Harassment through LLM Guided Extractive Summarization of User Incident Reports

Pith reviewed 2026-05-24 00:09 UTC · model grok-4.3

keywords extractive summarization · large language models · sexual harassment · incident reports · code-mixed languages · multi-level framework · voting methods · citizen reporting platforms

The pith

LaMSUM uses multi-level voting to make LLMs output extractive summaries from large collections of code-mixed harassment reports.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LaMSUM, a framework that processes high volumes of user reports on sexual harassment incidents posted to citizen platforms. Standard LLMs produce paraphrased abstractive summaries and cannot fit thousands of reports into one context window, so the method breaks the task into multiple summarization stages and applies voting across outputs to select direct excerpts instead. This produces extractive summaries that preserve original wording while covering the full collection. A reader would care because manual review of every report is impractical, and such summaries could reveal patterns that help authorities shape prevention policies. Evaluations on four LLMs show the approach beats prior extractive methods.

Core claim

LaMSUM is a novel multi-level framework combining summarization with different voting methods to generate extractive summaries for large collections of incident reports using LLMs. It addresses LLMs' tendency to produce abstractive outputs and their limited context windows when processing code-mixed languages. Extensive evaluation using four popular LLMs (Llama, Mistral, Claude and GPT-4o) demonstrates that LaMSUM outperforms state-of-the-art extractive summarization methods. The work is presented as one of the first attempts to achieve extractive summarization through LLMs.

What carries the argument

A multi-level framework that pairs staged summarization with voting methods to steer LLMs toward selecting verbatim excerpts.
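
To make the mechanism concrete, here is a minimal Python sketch of staged chunking with shuffle-and-vote selection. It follows the Figure 2 description (input set T split into ⌈|T|/s⌉ chunks of size s, each reduced to q units), but everything else is an assumption: the names `vote_on_chunk`, `multilevel_summary` and `longest_first`, the `rounds` parameter, the plain vote count, and the stopping rule are illustrative and not taken from the paper. The selector `longest_first` is a toy stand-in for an LLM call.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical selector type: given a chunk and a budget q, return the chosen posts.
# In a real pipeline this would wrap an LLM prompt; here it is a plain function.
Selector = Callable[[Sequence[str], int], List[str]]


def vote_on_chunk(chunk: Sequence[str], q: int, select: Selector,
                  rounds: int = 3, seed: int = 0) -> List[str]:
    """Shuffle the chunk several times, collect picks, keep the q most-voted posts."""
    rng = random.Random(seed)
    allowed = set(chunk)
    votes: Counter = Counter()
    for _ in range(rounds):
        shuffled = list(chunk)
        rng.shuffle(shuffled)
        # Only picks that are verbatim members of the chunk receive a vote.
        votes.update(p for p in set(select(shuffled, q)) if p in allowed)
    # Rank by vote count, breaking ties by original position (deterministic).
    order = sorted(range(len(chunk)), key=lambda i: (-votes[chunk[i]], i))
    return [chunk[i] for i in order[:q]]


def multilevel_summary(posts: Sequence[str], s: int, q: int, k: int,
                       select: Selector, rounds: int = 3) -> List[str]:
    """Reduce the input level by level until it fits in one chunk, then pick k posts."""
    if not 0 < q < s:
        raise ValueError("need 0 < q < s so that every level shrinks the input")
    level = list(posts)
    while len(level) > s:  # assumed stopping rule: one chunk is enough
        n_chunks = math.ceil(len(level) / s)
        chunks = [level[i * s:(i + 1) * s] for i in range(n_chunks)]
        level = [p for c in chunks
                 for p in vote_on_chunk(c, min(q, len(c)), select, rounds)]
    return vote_on_chunk(level, min(k, len(level)), select, rounds)


def longest_first(chunk: Sequence[str], q: int) -> List[str]:
    """Toy selector (assumption): prefer longer posts, ties broken alphabetically."""
    return sorted(chunk, key=lambda p: (-len(p), p))[:q]


posts = [f"post {i} " + "x" * i for i in range(20)]
summary = multilevel_summary(posts, s=5, q=2, k=3, select=longest_first)
print(summary)
```

Because only picks that already appear in the chunk are counted, a selector that paraphrases simply loses its vote in this sketch; whether the paper treats paraphrased picks the same way is not stated on this page.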

If this is right

  • Stakeholders receive a single overview covering thousands of reports without reading each one.
  • Policy makers can identify recurring patterns in harassment incidents more efficiently.
  • The same LLM back-ends (Llama, Mistral, Claude, GPT-4o) produce higher-quality extractive summaries than prior dedicated extractive systems.
  • Code-mixed text common in real user reports is handled without language-specific preprocessing.
  • The framework offers an early route to extractive rather than abstractive output from LLMs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The voting mechanism might be reusable in other LLM tasks where strict fidelity to source text is required, such as legal document extraction.
  • Adding more hierarchy levels could allow processing of even larger report sets that current context limits still block.
  • Testing the framework on complaint data from unrelated domains would reveal whether the multi-level voting pattern generalizes beyond harassment reports.

Load-bearing premise

Voting across staged LLM outputs can reliably force selection of original text excerpts rather than new paraphrased sentences when reports mix languages and exceed single context windows.

What would settle it

Generate summaries on a test collection of 50 reports and verify whether every sentence in each output appears as a contiguous verbatim substring in the input set, with zero added or reworded content.
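
The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`normalize_ws`, `split_sentences`, `non_verbatim_sentences`) are hypothetical, the sentence splitter is a naive regular expression rather than anything the paper uses, and whitespace is normalized before matching so that line breaks alone do not count as rewording.

```python
import re
from typing import List, Sequence


def normalize_ws(text: str) -> str:
    """Collapse runs of whitespace so layout differences do not break matching."""
    return " ".join(text.split())


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def non_verbatim_sentences(summary: str, reports: Sequence[str]) -> List[str]:
    """Return summary sentences that are not a contiguous substring of any report."""
    sources = [normalize_ws(r) for r in reports]
    return [
        sent for sent in split_sentences(summary)
        if not any(normalize_ws(sent) in src for src in sources)
    ]


# Hypothetical toy data, including one code-mixed report.
reports = [
    "The street light was broken. I walked home alone.",
    "Bus stop par koi nahi tha, it was very late.",
]
extractive = "I walked home alone. Bus stop par koi nahi tha, it was very late."
reworded = "I walked home alone. Nobody was at the bus stop."

print(non_verbatim_sentences(extractive, reports))
print(non_verbatim_sentences(reworded, reports))
```

An empty list means every summary sentence was found verbatim; any returned sentence is a candidate for added or reworded content and would need manual inspection.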

Figures

Figures reproduced from arXiv: 2406.15809 by Abhijnan Chakraborty, Anurag Sharma, Garima Chhikara, Kripabandhu Ghosh, V. Gurucharan.

Figure 1. Current LLMs, by default, produce abstrac…
Figure 2. LaMSUM: Multi-level framework for extractive summarization of large user-generated text. Input set T (level 0) is divided into ⌈|T|/s⌉ chunks, each of size s. From each chunk a summary is produced of size q (refer…
Figure 3. Textual units (e.g., posts) in the input chunk are…
Figure 4. Metric scores obtained through four different LLM setups. (i) Vanilla LLM without shuffling and voting method (ii)…
Figure 5. Posts chosen by LaMSUM tend to be detailed and descriptive, offering a deeper level of information. Number of words in LaMSUM selected posts is often highest across various datasets, ensuring extensive and comprehensive summarization.

    Models     City A  City B  City C  City D  City E
    LexRank    7.486   5.819   7.637   7.548   7.481
    SummBasic  8.198   5.734   8.050   8.020   8.196
    LSA        8.251   7.061   8.481   8.194   8.387
    BERT       8.068   7.762   8…

Figure 7. Venn diagram showing the overlap in the gold standard summaries obtained from three annotators (HS1, HS2 and…

Original abstract

Citizen reporting platforms help the public and authorities stay informed about sexual harassment incidents. However, the high volume of data shared on these platforms makes reviewing each individual case challenging. Therefore, a summarization algorithm capable of processing and understanding various code-mixed languages is essential. In recent years, Large Language Models (LLMs) have shown exceptional performance in NLP tasks, including summarization. LLMs inherently produce abstractive summaries by paraphrasing the original text, while the generation of extractive summaries - selecting specific subsets from the original text - through LLMs remains largely unexplored. Moreover, LLMs have a limited context window size, restricting the amount of data that can be processed at once. We tackle these challenges by introducing LaMSUM, a novel multi-level framework combining summarization with different voting methods to generate extractive summaries for large collections of incident reports using LLMs. Extensive evaluation using four popular LLMs (Llama, Mistral, Claude and GPT-4o) demonstrates that LaMSUM outperforms state-of-the-art extractive summarization methods. Overall, this work represents one of the first attempts to achieve extractive summarization through LLMs, and is likely to support stakeholders by offering a comprehensive overview and enabling them to develop effective policies to minimize incidents of unwarranted harassment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents LaMSUM, a novel multi-level framework that combines LLM summarization with voting methods to generate extractive summaries from large collections of code-mixed incident reports on sexual harassment. The central claim is that LaMSUM outperforms state-of-the-art extractive summarization methods, as demonstrated through extensive evaluation using four popular LLMs: Llama, Mistral, Claude, and GPT-4o. The work aims to address challenges of context window limits and the tendency of LLMs to produce abstractive rather than extractive summaries.

Significance. If the results hold and the outputs are verifiably extractive, this could be a significant contribution to applying LLMs for extractive summarization in challenging settings involving code-mixed languages and large document collections, potentially aiding stakeholders in understanding harassment patterns and developing policies. The use of multiple LLMs and voting methods is a promising direction for steering LLMs towards extractive outputs.

major comments (2)
  1. [Abstract] The claim that LaMSUM 'outperforms state-of-the-art extractive summarization methods' is asserted without any quantitative metrics, baselines, dataset sizes, or evaluation protocol described, leaving the central empirical claim without visible supporting evidence.
  2. [LaMSUM framework description] For the multi-level chunking and voting design, no post-hoc verification is described that the generated summaries remain strictly extractive (e.g., sentence-level overlap ratio, ROUGE-L computed only against source sentences, or an ablation disabling voting to measure extractiveness drop). LLMs default to abstractive output, so without such a check the reported gains could reflect improved abstractive content rather than the claimed extractive property, undermining direct comparison to classical extractive baselines that are extractive by construction.
minor comments (1)
  1. [Abstract] The statement that this is 'one of the first attempts' to achieve extractive summarization through LLMs would benefit from explicit citations to any prior LLM-based extractive work for context.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below and indicate planned revisions to the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The claim that LaMSUM 'outperforms state-of-the-art extractive summarization methods' is asserted without any quantitative metrics, baselines, dataset sizes, or evaluation protocol described, leaving the central empirical claim without visible supporting evidence.

    Authors: We acknowledge that the abstract presents the performance claim at a high level. While the full quantitative results, baselines, dataset details, and evaluation protocol are provided in the Experiments and Results sections, we agree that including key supporting metrics in the abstract would make the central claim more self-contained. In the revised manuscript we will add a concise statement of the main ROUGE improvements, number of baselines, and dataset size to the abstract.

    Revision: yes

  2. Referee: [LaMSUM framework description] For the multi-level chunking and voting design, no post-hoc verification is described that the generated summaries remain strictly extractive (e.g., sentence-level overlap ratio, ROUGE-L computed only against source sentences, or an ablation disabling voting to measure extractiveness drop). LLMs default to abstractive output, so without such a check the reported gains could reflect improved abstractive content rather than the claimed extractive property, undermining direct comparison to classical extractive baselines that are extractive by construction.

    Authors: This concern is valid. Although the LaMSUM design (multi-level chunking followed by sentence-level voting) is intended to enforce extractiveness by selecting verbatim sentences from the source, the submitted manuscript does not include explicit post-hoc verification. We will add a dedicated subsection that reports (1) sentence-level overlap ratios, (2) ROUGE-L computed exclusively against source sentences, and (3) an ablation that disables the voting stage to quantify any drop in extractiveness. These additions will directly address the possibility of abstractive leakage and strengthen the comparison to classical extractive baselines.

    Revision: yes
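
As a rough illustration of the first two checks named in this exchange, the following Python sketch computes a unit-level overlap ratio and a ROUGE-L-style coverage score based on the longest common subsequence (LCS). It is a simplification under stated assumptions, not the official ROUGE package and not the authors' planned analysis: the function names (`lcs_length`, `best_source_lcs`, `extractiveness_report`) are hypothetical, tokens are lowercased whitespace splits, and each summary unit is scored against its best-matching source unit.

```python
from typing import Dict, List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def best_source_lcs(unit: str, source_units: Sequence[str]) -> float:
    """Fraction of the unit's tokens covered by an LCS with its best source match."""
    tokens = unit.lower().split()
    if not tokens:
        return 0.0
    return max(
        (lcs_length(tokens, src.lower().split()) / len(tokens) for src in source_units),
        default=0.0,
    )


def extractiveness_report(summary_units: List[str],
                          source_units: List[str]) -> Dict[str, float]:
    """Overlap ratio = share of units copied exactly; coverage = mean LCS score."""
    n = len(summary_units)
    if n == 0:
        return {"overlap_ratio": 0.0, "mean_lcs_coverage": 0.0}
    source_set = set(source_units)
    exact = sum(unit in source_set for unit in summary_units)
    coverage = sum(best_source_lcs(u, source_units) for u in summary_units)
    return {"overlap_ratio": exact / n, "mean_lcs_coverage": coverage / n}


# Hypothetical toy data: one copied unit, one reworded unit.
source = ["the bus was crowded", "nobody helped me that day"]
summary = ["the bus was crowded", "nobody came to help me"]
report = extractiveness_report(summary, source)
print(report)
```

Running the same report on summaries produced with and without the voting stage would give one simple form of the proposed ablation: a drop in either number when voting is disabled would indicate abstractive leakage.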

Circularity Check

0 steps flagged

No circularity; empirical comparison is independent of inputs

Full rationale

The paper presents an empirical framework (LaMSUM) for LLM-based extractive summarization and reports performance gains against external baselines using four named LLMs. No equations, fitted parameters, or self-citations are invoked as load-bearing premises that reduce the central claim to a tautology or prior author result. The evaluation is described as direct comparison on held-out incident reports; the extractiveness property is asserted via the multi-level voting design rather than being defined in terms of the measured outcomes. This is a standard non-circular empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Based solely on the abstract, the central claim rests on the untested assumption that the new framework can force extractive behavior from LLMs. No free parameters are mentioned. One domain assumption and one invented entity are introduced.

axioms (1)
  • Domain assumption: LLMs can be guided via multi-level processing and voting to output extractive rather than abstractive summaries despite limited context windows.
    Invoked to justify the need for and design of the LaMSUM framework.
invented entities (1)
  • LaMSUM framework (no independent evidence)
    Purpose: enable extractive summarization of large code-mixed report collections using LLMs.
    Newly proposed combination of multi-level summarization and voting methods.

pith-pipeline@v0.9.0 · 5784 in / 1338 out tokens · 30386 ms · 2026-05-24T00:09:10.239138+00:00 · methodology

