LaMSUM: Amplifying Voices Against Harassment through LLM Guided Extractive Summarization of User Incident Reports

Abhijnan Chakraborty; Anurag Sharma; Garima Chhikara; Kripabandhu Ghosh; V. Gurucharan

arxiv: 2406.15809 · v5 · submitted 2024-06-22 · 💻 cs.CL · cs.LG

Pith reviewed 2026-05-24 00:09 UTC · model grok-4.3

keywords: extractive summarization, large language models, sexual harassment, incident reports, code-mixed languages, multi-level framework, voting methods, citizen reporting platforms

The pith

LaMSUM uses multi-level voting to make LLMs output extractive summaries from large collections of code-mixed harassment reports.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LaMSUM, a framework that processes high volumes of user reports on sexual harassment incidents posted to citizen platforms. Standard LLMs produce paraphrased abstractive summaries and cannot fit thousands of reports into one context window, so the method breaks the task into multiple summarization stages and applies voting across outputs to select direct excerpts instead. This produces extractive summaries that preserve original wording while covering the full collection. A reader would care because manual review of every report is impractical, and such summaries could reveal patterns that help authorities shape prevention policies. Evaluations on four LLMs show the approach beats prior extractive methods.

Core claim

LaMSUM is a novel multi-level framework combining summarization with different voting methods to generate extractive summaries for large collections of incident reports using LLMs. It addresses LLMs' tendency to produce abstractive outputs and their limited context windows when processing code-mixed languages. Extensive evaluation using four popular LLMs (Llama, Mistral, Claude and GPT-4o) demonstrates that LaMSUM outperforms state-of-the-art extractive summarization methods. The work is presented as one of the first attempts to achieve extractive summarization through LLMs.

What carries the argument

A multi-level framework that pairs staged summarization with voting methods to steer LLMs toward selecting verbatim excerpts.
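To make the staged structure concrete, here is a minimal Python sketch of level-by-level selection. It is not the authors' implementation: the names `multilevel_extract`, `split_into_chunks`, and `longest_first` are hypothetical, the parameters `s` (chunk size) and `q` (per-chunk summary size) follow the Figure 2 caption, `k` (final summary size) is an assumed name, and the LLM call is replaced by a trivial stand-in selector.

```python
from typing import Callable, List

# Hypothetical selector signature: given a chunk and a budget, return up to
# `budget` items picked from that chunk. A real system would wrap an LLM call.
Selector = Callable[[List[str], int], List[str]]


def split_into_chunks(units: List[str], s: int) -> List[List[str]]:
    """Split `units` into ceil(len(units) / s) chunks of at most s items each."""
    return [units[i:i + s] for i in range(0, len(units), s)]


def multilevel_extract(units: List[str], s: int, q: int, k: int,
                       select: Selector) -> List[str]:
    """Shrink the collection level by level until it fits one chunk, then pick k."""
    if not 0 < q < s:
        raise ValueError("need 0 < q < s so every level gets smaller")
    if k <= 0:
        raise ValueError("k must be positive")
    level = list(units)
    while len(level) > s:
        next_level: List[str] = []
        for chunk in split_into_chunks(level, s):
            picked = select(chunk, min(q, len(chunk)))
            # Keep only items that really come from the chunk, capped at q.
            next_level.extend([p for p in picked if p in chunk][:q])
        level = next_level
    final = select(level, min(k, len(level)))
    return [p for p in final if p in level][:k]


def longest_first(chunk: List[str], budget: int) -> List[str]:
    """Stand-in for an LLM: prefer longer posts (illustrative assumption only)."""
    return sorted(chunk, key=len, reverse=True)[:budget]


# Invented toy data: 25 posts of increasing length.
posts = [f"post {i} " + "x" * i for i in range(25)]
summary = multilevel_extract(posts, s=10, q=3, k=5, select=longest_first)
```

Filtering each selector output against its own chunk is one simple way to keep every level extractive by construction; whether the paper enforces this in the same way is not stated on this page.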

If this is right

Stakeholders receive a single overview covering thousands of reports without reading each one.
Policy makers can identify recurring patterns in harassment incidents more efficiently.
The same LLM back-ends (Llama, Mistral, Claude, GPT-4o) produce higher-quality extractive summaries than prior dedicated extractive systems.
Code-mixed text common in real user reports is handled without language-specific preprocessing.
The framework offers an early route to extractive rather than abstractive output from LLMs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The voting mechanism might be reusable in other LLM tasks where strict fidelity to source text is required, such as legal document extraction.
Adding more hierarchy levels could allow processing of even larger report sets that current context limits still block.
Testing the framework on complaint data from unrelated domains would reveal whether the multi-level voting pattern generalizes beyond harassment reports.

Load-bearing premise

Voting across staged LLM outputs can reliably force selection of original text excerpts rather than new paraphrased sentences when reports mix languages and exceed single context windows.
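The premise can be illustrated with a small sketch of voting over repeated, shuffled runs on one chunk. This page does not say which voting rules the paper uses, so the two rules below (an approval-style count and a Borda-style positional score) are assumptions chosen for illustration, as are the function names and the `rounds` and `seed` parameters. The selector again stands in for an LLM call.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Ballot = Sequence[str]


def approval_vote(ballots: Sequence[Ballot], q: int) -> List[str]:
    """Keep the q posts that appear on the most ballots (ties broken by text)."""
    counts = Counter(p for ballot in ballots for p in set(ballot))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:q]]


def borda_vote(ballots: Sequence[Ballot], q: int) -> List[str]:
    """Borda-style scoring: earlier positions on a ballot earn more points."""
    scores: Counter = Counter()
    for ballot in ballots:
        n = len(ballot)
        for rank, p in enumerate(ballot):
            scores[p] += n - rank
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:q]]


def vote_over_shuffles(chunk: List[str], q: int,
                       select: Callable[[List[str], int], List[str]],
                       rounds: int = 5, seed: int = 0,
                       rule: Callable[[Sequence[Ballot], int], List[str]] = approval_vote
                       ) -> List[str]:
    """Run the selector on several shuffles of the chunk and aggregate by voting."""
    rng = random.Random(seed)
    ballots = []
    for _ in range(rounds):
        shuffled = list(chunk)
        rng.shuffle(shuffled)
        # Discard anything the selector returns that is not a verbatim chunk item.
        ballots.append([p for p in select(shuffled, q) if p in chunk])
    return rule(ballots, q)
```

Only verbatim chunk items can appear on a ballot, so the vote can never promote a paraphrase. Shuffling is meant to dilute any preference the selector has for items at particular positions in its input.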

What would settle it

Generate summaries on a test collection of 50 reports and verify whether every sentence in each output appears as a contiguous verbatim substring in the input set, with zero added or reworded content.
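A check of this kind is easy to sketch. The helper below is a hypothetical illustration, not something the paper provides: it flags every summary sentence that is not a contiguous, character-for-character substring of at least one input report. The example strings are invented.

```python
from typing import List


def non_verbatim_sentences(summary_sentences: List[str], reports: List[str]) -> List[str]:
    """Return the summary sentences that do not occur verbatim in any report."""
    flagged = []
    for sentence in summary_sentences:
        text = sentence.strip()
        # Empty sentences are flagged too, since "" is a substring of everything.
        if not text or not any(text in report for report in reports):
            flagged.append(sentence)
    return flagged


# Invented toy data, including one code-mixed sentence.
reports = [
    "A man followed me from the bus stop. Kisi ne madad nahi ki.",
    "Two men passed comments near the market.",
]
summary = ["Kisi ne madad nahi ki.", "Two men made remarks near the market."]
flagged = non_verbatim_sentences(summary, reports)
```

An empty `flagged` list on every test summary would support the extractiveness claim; any flagged sentence points to paraphrased or added content.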

Figures

Figures reproduced from arXiv: 2406.15809 by Abhijnan Chakraborty, Anurag Sharma, Garima Chhikara, Kripabandhu Ghosh, V. Gurucharan.

**Figure 2.** LaMSUM: Multi-level framework for extractive summarization of large user-generated text. Input set T (level 0) is divided into ⌈|T|/s⌉ chunks each of size s. From each chunk a summary is produced of size q (refer …)

**Figure 3.** Textual units (e.g., posts) in the input chunk are …

**Figure 4.** Metric scores obtained through four different LLM setups. (i) Vanilla LLM without shuffling and voting method (ii) …

**Figure 5.** Posts chosen by LaMSUM tend to be detailed and descriptive, offering a deeper level of information. Number of words in LaMSUM selected posts is often highest across various datasets, ensuring extensive and comprehensive summarization.

| Models | City A | City B | City C | City D | City E |
| --- | --- | --- | --- | --- | --- |
| LexRank | 7.486 | 5.819 | 7.637 | 7.548 | 7.481 |
| SummBasic | 8.198 | 5.734 | 8.050 | 8.020 | 8.196 |
| LSA | 8.251 | 7.061 | 8.481 | 8.194 | 8.387 |
| BERT | 8.068 | 7.762 | 8… | | |

**Figure 7.** Venn diagram showing the overlap in the gold standard summaries obtained from three annotators (HS1, HS2 and …

Original abstract

Citizen reporting platforms help the public and authorities stay informed about sexual harassment incidents. However, the high volume of data shared on these platforms makes reviewing each individual case challenging. Therefore, a summarization algorithm capable of processing and understanding various code-mixed languages is essential. In recent years, Large Language Models (LLMs) have shown exceptional performance in NLP tasks, including summarization. LLMs inherently produce abstractive summaries by paraphrasing the original text, while the generation of extractive summaries - selecting specific subsets from the original text - through LLMs remains largely unexplored. Moreover, LLMs have a limited context window size, restricting the amount of data that can be processed at once. We tackle these challenges by introducing LaMSUM, a novel multi-level framework combining summarization with different voting methods to generate extractive summaries for large collections of incident reports using LLMs. Extensive evaluation using four popular LLMs (Llama, Mistral, Claude and GPT-4o) demonstrates that LaMSUM outperforms state-of-the-art extractive summarization methods. Overall, this work represents one of the first attempts to achieve extractive summarization through LLMs, and is likely to support stakeholders by offering a comprehensive overview and enabling them to develop effective policies to minimize incidents of unwarranted harassment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LaMSUM proposes a multi-level LLM framework with voting to force extractive summaries from harassment reports, but the abstract supplies no metrics, baselines, or extractiveness checks, leaving the outperformance claim unsupported.

The paper's main contribution is LaMSUM, a multi-level framework that uses LLMs to create extractive summaries of many user reports on sexual harassment incidents, especially in code-mixed languages. It combines chunking at different levels with voting methods to work around the limited context windows of models like Llama, Mistral, Claude, and GPT-4o. This is presented as one of the first efforts to push LLMs toward extractive output rather than their natural abstractive output.

What stands out as useful is the focus on a real-world application where high volumes of reports make individual review impractical. Extractive summaries could preserve original phrasing and help policymakers spot patterns without introducing new wording that might alter intent. The approach of breaking down large collections and aggregating via voting is a sensible engineering response to the constraints.

On the downside, the abstract asserts that LaMSUM outperforms state-of-the-art extractive methods but supplies none of the supporting details. There are no reported metrics, no information on dataset sizes or composition, no description of the evaluation protocol, and no baselines listed. The stress-test concern about verifying extractiveness is relevant here. LLMs default to paraphrasing, so without something like a post-generation check for sentence overlap with the source or an ablation showing that voting improves extractiveness, any reported improvements might come from better overall summarization quality instead of staying true to the extractive goal. That would weaken the direct comparison to classical extractive baselines.

This work is aimed at the intersection of NLP and social applications, particularly for handling citizen-generated data in multilingual or low-resource contexts. Readers working on summarization techniques or tools for public safety reporting might pick up ideas from the framework even if the results section needs expansion.

I would recommend sending it for peer review. The idea is grounded in a clear problem and offers a practical structure, but the lack of evidence in the current version means referees would need to see the full experiments and any checks for extractiveness before it could be accepted. The authors seem to engage honestly with the challenges of LLMs in this setting.

Referee Report

2 major / 1 minor

Summary. The manuscript presents LaMSUM, a novel multi-level framework that combines LLM summarization with voting methods to generate extractive summaries from large collections of code-mixed incident reports on sexual harassment. The central claim is that LaMSUM outperforms state-of-the-art extractive summarization methods, as demonstrated through extensive evaluation using four popular LLMs: Llama, Mistral, Claude, and GPT-4o. The work aims to address challenges of context window limits and the tendency of LLMs to produce abstractive rather than extractive summaries.

Significance. If the results hold and the outputs are verifiably extractive, this could be a significant contribution to applying LLMs for extractive summarization in challenging settings involving code-mixed languages and large document collections, potentially aiding stakeholders in understanding harassment patterns and developing policies. The use of multiple LLMs and voting methods is a promising direction for steering LLMs towards extractive outputs.

major comments (2)

[Abstract] The claim that LaMSUM 'outperforms state-of-the-art extractive summarization methods' is asserted without any quantitative metrics, baselines, dataset sizes, or evaluation protocol described, leaving the central empirical claim without visible supporting evidence.
[LaMSUM framework description (multi-level chunking + voting)] No post-hoc verification is described that the generated summaries remain strictly extractive (e.g., sentence-level overlap ratio, ROUGE-L computed only against source sentences, or an ablation disabling voting to measure extractiveness drop). LLMs default to abstractive output, so without such a check the reported gains could reflect improved abstractive content rather than the claimed extractive property, undermining direct comparison to classical extractive baselines that are extractive by construction.
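
The first check named in the second comment, a sentence-level overlap ratio, could look like the sketch below. The function names and the normalization (collapsing whitespace and ignoring case) are assumptions for illustration; the referee's comment does not fix a definition, and a stricter variant would compare raw strings.

```python
import re
from typing import List


def normalize(text: str) -> str:
    """Collapse runs of whitespace and ignore case before comparing units."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def overlap_ratio(summary_units: List[str], source_units: List[str]) -> float:
    """Fraction of summary units that match some source unit after normalization."""
    if not summary_units:
        return 0.0
    source = {normalize(u) for u in source_units}
    hits = sum(1 for u in summary_units if normalize(u) in source)
    return hits / len(summary_units)
```

A ratio of 1.0 means every summary unit is copied from the source up to whitespace and case; running the same measurement with voting disabled would give the ablation the comment asks for.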

minor comments (1)

[Abstract] The statement that this is 'one of the first attempts' to achieve extractive summarization through LLMs would benefit from explicit citations to any prior LLM-based extractive work for context.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below and indicate planned revisions to the manuscript.

Referee: [Abstract] The claim that LaMSUM 'outperforms state-of-the-art extractive summarization methods' is asserted without any quantitative metrics, baselines, dataset sizes, or evaluation protocol described, leaving the central empirical claim without visible supporting evidence.

Authors: We acknowledge that the abstract presents the performance claim at a high level. While the full quantitative results, baselines, dataset details, and evaluation protocol are provided in the Experiments and Results sections, we agree that including key supporting metrics in the abstract would make the central claim more self-contained. In the revised manuscript we will add a concise statement of the main ROUGE improvements, number of baselines, and dataset size to the abstract.

Revision: yes

Referee: [LaMSUM framework description (multi-level chunking + voting)] No post-hoc verification is described that the generated summaries remain strictly extractive (e.g., sentence-level overlap ratio, ROUGE-L computed only against source sentences, or an ablation disabling voting to measure extractiveness drop). LLMs default to abstractive output, so without such a check the reported gains could reflect improved abstractive content rather than the claimed extractive property, undermining direct comparison to classical extractive baselines that are extractive by construction.

Authors: This concern is valid. Although the LaMSUM design (multi-level chunking followed by sentence-level voting) is intended to enforce extractiveness by selecting verbatim sentences from the source, the submitted manuscript does not include explicit post-hoc verification. We will add a dedicated subsection that reports (1) sentence-level overlap ratios, (2) ROUGE-L computed exclusively against source sentences, and (3) an ablation that disables the voting stage to quantify any drop in extractiveness. These additions will directly address the possibility of abstractive leakage and strengthen the comparison to classical extractive baselines.

Revision: yes

Circularity Check

0 steps flagged

No circularity; empirical comparison is independent of inputs

The paper presents an empirical framework (LaMSUM) for LLM-based extractive summarization and reports performance gains against external baselines using four named LLMs. No equations, fitted parameters, or self-citations are invoked as load-bearing premises that reduce the central claim to a tautology or prior author result. The evaluation is described as direct comparison on held-out incident reports; the extractiveness property is asserted via the multi-level voting design rather than being defined in terms of the measured outcomes. This is a standard non-circular empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Based solely on the abstract, the central claim rests on the untested assumption that the new framework can force extractive behavior from LLMs. No free parameters are mentioned. One domain assumption and one invented entity are introduced.

axioms (1)

domain assumption: LLMs can be guided via multi-level processing and voting to output extractive rather than abstractive summaries despite limited context windows
Invoked to justify the need for and design of the LaMSUM framework.

invented entities (1)

LaMSUM framework (no independent evidence)
purpose: Enable extractive summarization of large code-mixed report collections using LLMs
Newly proposed combination of multi-level summarization and voting methods.

Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · 2 internal anchors

[3] Bhattacharya, P.; Poddar, S.; Rudra, K.; Ghosh, K.; and Ghosh, S. 2021. Incorporating domain knowledge for extractive summarization of legal case documents. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law.

[4] Brandt, F.; Conitzer, V.; Endriss, U.; Lang, J.; and Procaccia, A. D. 2016. Handbook of computational social choice. Cambridge University Press.

[5] Bražinskas, A.; Lapata, M.; and Titov, I. 2020. Few-Shot Learning for Opinion Summarization. In EMNLP.

[6] Brown, H.; and Shokri, R. 2023. How (Un)Fair is Text Summarization?

[7] Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J. D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; Agarwal, S.; Herbert-Voss, A.; Krueger, G.; Henighan, T.; Child, R.; Ramesh, A.; Ziegler, D.; Wu, J.; Winter, C.; Hesse, C.; Chen, M.; Sigler, E.; Litwin, M.; Gray, S.; Chess, B.; Clark, J.; Berner, C.; McCandlish, S.; Radford, A.;...

[8] Buchholz, K. 2024. The Countries That Are Safe & Unsafe for Women.

[9] Chang, Y.; Lo, K.; Goyal, T.; and Iyyer, M. 2024. BooookScore: A systematic exploration of book-length summarization in the era of LLMs. In ICLR.

[10] Chang, Y.; Wang, X.; Wang, J.; Wu, Y.; Yang, L.; Zhu, K.; Chen, H.; Yi, X.; Wang, C.; Wang, Y.; Ye, W.; Zhang, Y.; Chang, Y.; Yu, P. S.; Yang, Q.; and Xie, X. 2023. A Survey on Evaluation of Large Language Models.

[11] Davidson, T.; Warmsley, D.; Macy, M.; and Weber, I. 2017. Automated Hate Speech Detection and the Problem of Offensive Language. ICWSM.

[12] Dublish, N. 2020. All about the Hathras Case.

[13] ElSherief, M.; Belding, E.; and Nguyen, D. 2017. #NotOkay: Understanding Gender-Based Violence in Social Media. ICWSM.

[14] Emerson, P. 2013. The original Borda count and partial voting. Social Choice and Welfare.

[15] Erkan, G.; and Radev, D. R. 2004. LexRank: Graph-based Lexical Centrality As Salience in Text Summarization. Journal of Artificial Intelligence Research.

[16] Ghosh Chowdhury, A.; Sawhney, R.; Mathur, P.; Mahata, D.; and Ratn Shah, R. 2019. Speak up, Fight Back! Detection of Social Media Disclosures of Sexual Harassment. In NAACL.

[17] Gong, Y.; and Liu, X. 2001. Generic Text Summarization Using Relevance Measure and Latent Semantic Analysis. In ACM SIGIR.

[18] Goyal, T.; Li, J. J.; and Durrett, G. 2023. News Summarization and Evaluation in the Era of GPT-3. arXiv:2209.12356.

[19] Hassan, N.; Poudel, A.; Hale, J.; Hubacek, C.; Huq, K. T.; Karmaker Santu, S. K.; and Ahmed, S. I. 2020. Towards Automated Sexual Violence Report Tracking. ICWSM.

[20] Jia, R.; Cao, Y.; Tang, H.; Fang, F.; Cao, C.; and Wang, S. 2020. Neural Extractive Summarization with Hierarchical Attentive Heterogeneous Graph Network. In EMNLP.

[21] Jiang, A. Q.; et al. 2024. Mixtral of Experts. arXiv:2401.04088.

[22] Jin, H.; Han, X.; Yang, J.; Jiang, Z.; Liu, Z.; Chang, C.-Y.; Chen, H.; and Hu, X. 2024a. LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning. In ICML.

[23] Jin, H.; Zhang, Y.; Meng, D.; Wang, J.; and Tan, J. 2024b. A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods. arXiv:2403.02901.

[24] Jost, L. 2006. Entropy and diversity. Oikos.

[25] Jung, T.; Kang, D.; Mentch, L.; and Hovy, E. 2019. Earlier Isn't Always Better: Sub-aspect Analysis on Corpus and System Biases in Summarization. In EMNLP.

[26] Kanwal, N.; and Rizzo, G. 2022. Attention-based clinical note summarization. In Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing.

[27] Kim, S.; Razi, A.; Alsoubai, A.; Wisniewski, P. J.; and De Choudhury, M. 2024. Assessing the Impact of Online Harassment on Youth Mental Health in Private Networked Spaces. ICWSM.

[28] Kopackova, H.; and Libalova, P. 2019. Citizen reporting as the form of e-participation in smart cities. In Iberian Conference on Information Systems and Technologies (CISTI). IEEE.

[29] Laban, P.; Kryscinski, W.; Agarwal, D.; Fabbri, A.; Xiong, C.; Joty, S.; and Wu, C.-S. 2023. SummEdits: Measuring LLM Ability at Factual Reasoning Through The Lens of Summarization. In EMNLP.

[30] Lackner, M.; Regner, P.; and Krenn, B. 2023. abcvoting: A Python package for approval-based multi-winner voting rules. Journal of Open Source Software.

[31] Laskar, M. T. R.; Bari, M. S.; Rahman, M.; Bhuiyan, M. A. H.; Joty, S.; and Huang, J. 2023. A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets. In Findings of the Association for Computational Linguistics: ACL 2023.

[32] Lin, C.-Y. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out.

[33] Liu, Y.; and Lapata, M. 2019. Text Summarization with Pretrained Encoders. In EMNLP.

[34] Liu, Y.; Shi, K.; He, K.; Ye, L.; Fabbri, A.; Liu, P.; Radev, D.; and Cohan, A. 2024. On Learning to Summarize with Large Language Models as References. In NAACL.

[35] Luo, Z.; Xie, Q.; and Ananiadou, S. 2023. ChatGPT as a Factual Inconsistency Evaluator for Text Summarization. arXiv:2303.15621.

[36] Mathew, B.; Saha, P.; Tharad, H.; Rajgaria, S.; Singhania, P.; Maity, S. K.; Goyal, P.; and Mukherjee, A. 2019. Thou Shalt Not Hate: Countering Online Hate Speech. ICWSM.

[37] Miller, D. 2019. Leveraging BERT for Extractive Text Summarization on Lectures. arXiv:1906.04165.

[38] Mudambi, R.; Navarra, P.; and Nicosia, C. 1996. Plurality versus Proportional Representation: An Analysis of Sicilian Elections. Public Choice.

[39] Mukherjee, R.; Peruri, H. C.; Vishnu, U.; Goyal, P.; Bhattacharya, S.; and Ganguly, N. 2020. Read what you need: Controllable Aspect-based Opinion Summarization of Tourist Reviews. In SIGIR.

[40] Nenkova, A.; and Vanderwende, L. 2005. The impact of frequency on summarization. Technical report, Microsoft Research.

[41] Olteanu, A.; Castillo, C.; Boy, J.; and Varshney, K. 2018. The Effect of Extremist Violence on Hateful Speech Online. ICWSM.

[42] OpenAI. 2024. GPT-4o mini: advancing cost-efficient intelligence.

[43] Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A.; Schulman, J.; Hilton, J.; Kelton, F.; Miller, L.; Simens, M.; Askell, A.; Welinder, P.; Christiano, P. F.; Leike, J.; and Lowe, R. 2022. Training language models to follow instructions with human feedback. In NIPS.

[44] Park, H.; and Lee, J. 2021. Designing a Conversational Agent for Sexual Assault Survivors: Defining Burden of Self-Disclosure and Envisioning Survivor-Centered Solutions. In CHI.

[45] Pu, X.; Gao, M.; and Wan, X. 2023. Summarization is (Almost) Dead. arXiv:2309.09558.

[46] Ristad, E.; and Yianilos, P. 1998. Learning string-edit distance. IEEE Transactions on PAML.

[47] Sambasivan, N.; Batool, A.; Ahmed, N.; Matthews, T.; Thomas, K.; Gaytán-Lugo, L. S.; Nemer, D.; Bursztein, E.; Churchill, E.; and Consolvo, S. 2019. "They Don't Leave Us Alone Anywhere We Go": Gender and Digital Abuse in South Asia. In CHI.

[48] Sawhney, R.; Mathur, P.; Jain, T.; Gautam, A. K.; and Shah, R. R. 2021. Multitask Learning for Emotionally Analyzing Sexual Abuse Disclosures. In NAACL.

[49] Shin, B.; Floch, J.; Rask, M.; Bæck, P.; Edgar, C.; Berditchevskaia, A.; Mesure, P.; and Branlat, M. 2024. A systematic analysis of digital tools for citizen participation. Government Information Quarterly.

[50] Stoop, W.; Kunneman, F.; van den Bosch, A.; and Miller, B. 2019. Detecting harassment in real-time as conversations develop. In Workshop on Abusive Language Online.

[51] Sultana, S.; Deb, M.; Bhattacharjee, A.; Hasan, S.; Alam, S.; Chakraborty, T.; Roy, P.; Ahmed, S. F.; Moitra, A.; Amin, M. A.; Islam, A. N.; and Ahmed, S. I. 2021. ‘Unmochon’: A Tool to Combat Online Sexual Harassment over Facebook Messenger. In CHI.

[52] Tam, D.; Mascarenhas, A.; Zhang, S.; Kwan, S.; Bansal, M.; and Raffel, C. 2023. Evaluating the Factual Consistency of Large Language Models Through News Summarization. In ACL.

[53] Tang, L.; Shalyminov, I.; mei Wong, A. W.; Burnsky, J.; Vincent, J. W.; Yang, Y.; Singh, S.; Feng, S.; Song, H.; Su, H.; Sun, L.; Zhang, Y.; Mansour, S.; and McKeown, K. 2024. TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization. arXiv:2402.13249.

[54] Tang, L.; Sun, Z.; Idnay, B.; Nestor, J. G.; Soroush, A.; Elias, P. A.; Xu, Z.; Ding, Y.; Durrett, G.; Rousseau, J.; Weng, C.; and Peng, Y. 2023a. Evaluating large language models on medical evidence summarization. medRxiv.

[55] Tang, Y.; Puduppully, R.; Liu, Z.; and Chen, N. 2023b. In-context Learning of Large Language Models for Controlled Dialogue Summarization: A Holistic Benchmark and Empirical Analysis. In NewSumm Workshop.

[56] Thomas, K.; Kelley, P. G.; Consolvo, S.; Samermit, P.; and Bursztein, E. 2022. “It’s common and a part of being a content creator”: Understanding How Creators Experience and Cope with Hate and Harassment Online. In CHI.

[57] Times, T. E. 2024. Kolkata doctor rape-murder case: RG Kar, the campus was victim's `second home'.

[58] Today, I. 2020. Nirbhaya case: From December 16, 2012 to March 20, 2020 | A timeline.

[59] Touvron, H.; et al. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models.

[60] Upadhayay, B.; Lodhia, Z.; and Behzadan, V. 2021. Combating Human Trafficking via Automatic OSINT Collection, Validation and Fusion. In ICWSM Workshop.

[61] Venkatasubramanian, K.; Skorinko, J. L. M.; Kobeissi, M.; Lewis, B.; Jutras, N.; Bosma, P.; Mullaly, J.; Kelly, B.; Lloyd, D.; Freark, M.; and Alterio, N. A. 2021. Exploring A Reporting Tool to Empower Individuals with Intellectual and Developmental Disabilities to Self-Report Abuse. In CHI.

[62] Worledge, T.; Hashimoto, T.; and Guestrin, C. 2024. The Extractive-Abstractive Spectrum: Uncovering Verifiability Trade-offs in LLM Generations. arXiv:2411.17375.

[63] Wu, Y.; Iso, H.; Pezeshkpour, P.; Bhutani, N.; and Hruschka, E. 2024. Less is More for Long Document Summary Evaluation by LLMs. In EACL.

[64] Xu, J.; Gan, Z.; Cheng, Y.; and Liu, J. 2020. Discourse-Aware Neural Extractive Text Summarization. In ACL.

[65] Yang, X.; Li, Y.; Zhang, X.; Chen, H.; and Cheng, W. 2023. Exploring the Limits of ChatGPT for Query or Aspect-based Text Summarization. arXiv:2302.08081.

[66] Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; and Le, Q. V. 2019. XLNet: generalized autoregressive pretraining for language understanding. In NeurIPS.

[67] Zhang, H.; Liu, X.; and Zhang, J. 2022. HEGEL: Hypergraph Transformer for Long Document Summarization. In EMNLP.

[68] Zhang, H.; Liu, X.; and Zhang, J. 2023a. DiffuSum: Generation Enhanced Extractive Summarization with Diffusion. In ACL.

[69] Zhang, H.; Liu, X.; and Zhang, J. 2023b. Extractive Summarization via ChatGPT for Faithful Summary Generation. In EMNLP.

[70] Zhang, H.; Liu, X.; and Zhang, J. 2023c. SummIt: Iterative Text Summarization via ChatGPT. In EMNLP.

[71] Zhang, T.; Ladhak, F.; Durmus, E.; Liang, P.; McKeown, K.; and Hashimoto, T. B. 2024. Benchmarking Large Language Models for News Summarization. ACL Transactions.

[72] Zhao, W. X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; Du, Y.; Yang, C.; Chen, Y.; Chen, Z.; Jiang, J.; Ren, R.; Li, Y.; Tang, X.; Liu, Z.; Liu, P.; Nie, J.-Y.; and Wen, J.-R. 2023. A Survey of Large Language Models.

[73] Zhong, M.; Liu, P.; Chen, Y.; Wang, D.; Qiu, X.; and Huang, X. 2020. Extractive Summarization as Text Matching. In ACL.

[74] Ziems, C.; Vigfusson, Y.; and Morstatter, F. 2020. Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations for Cyberbullying Classification. ICWSM.

[1] [1]

, " * write output.state after.block = add.period write newline

ENTRY address archivePrefix author booktitle chapter edition editor eid eprint howpublished institution isbn journal key month note number organization pages publisher school series title type volume year label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.a...

work page

[2] [2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page

[3] [3]

Bhattacharya, P.; Poddar, S.; Rudra, K.; Ghosh, K.; and Ghosh, S. 2021. Incorporating domain knowledge for extractive summarization of legal case documents. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law

work page 2021

[4] [4]

Brandt, F.; Conitzer, V.; Endriss, U.; Lang, J.; and Procaccia, A. D. 2016. Handbook of computational social choice. Cambridge University Press

work page 2016

[5] [5]

Bra z inskas, A.; Lapata, M.; and Titov, I. 2020. Few-Shot Learning for Opinion Summarization. In EMNLP

work page 2020

[6] [6]

Brown, H.; and Shokri, R. 2023. How (Un)Fair is Text Summarization?

work page 2023

[7] Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J. D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; Agarwal, S.; Herbert-Voss, A.; Krueger, G.; Henighan, T.; Child, R.; Ramesh, A.; Ziegler, D.; Wu, J.; Winter, C.; Hesse, C.; Chen, M.; Sigler, E.; Litwin, M.; Gray, S.; Chess, B.; Clark, J.; Berner, C.; McCandlish, S.; Radford, A.;...

[8] Buchholz, K. 2024. The Countries That Are Safe & Unsafe for Women

[9] Chang, Y.; Lo, K.; Goyal, T.; and Iyyer, M. 2024. BooookScore: A systematic exploration of book-length summarization in the era of LLMs. In ICLR

[10] Chang, Y.; Wang, X.; Wang, J.; Wu, Y.; Yang, L.; Zhu, K.; Chen, H.; Yi, X.; Wang, C.; Wang, Y.; Ye, W.; Zhang, Y.; Chang, Y.; Yu, P. S.; Yang, Q.; and Xie, X. 2023. A Survey on Evaluation of Large Language Models

[11] Davidson, T.; Warmsley, D.; Macy, M.; and Weber, I. 2017. Automated Hate Speech Detection and the Problem of Offensive Language. ICWSM

[12] Dublish, N. 2020. All about the Hathras Case

[13] ElSherief, M.; Belding, E.; and Nguyen, D. 2017. #NotOkay: Understanding Gender-Based Violence in Social Media. ICWSM

[14] Emerson, P. 2013. The original Borda count and partial voting. Social Choice and Welfare

[15] Erkan, G.; and Radev, D. R. 2004. LexRank: Graph-based Lexical Centrality As Salience in Text Summarization. Journal of Artificial Intelligence Research

[16] Ghosh Chowdhury, A.; Sawhney, R.; Mathur, P.; Mahata, D.; and Ratn Shah, R. 2019. Speak up, Fight Back! Detection of Social Media Disclosures of Sexual Harassment. In NAACL

[17] Gong, Y.; and Liu, X. 2001. Generic Text Summarization Using Relevance Measure and Latent Semantic Analysis. In ACM SIGIR

[18] Goyal, T.; Li, J. J.; and Durrett, G. 2023. News Summarization and Evaluation in the Era of GPT-3. arXiv:2209.12356

[19] Hassan, N.; Poudel, A.; Hale, J.; Hubacek, C.; Huq, K. T.; Karmaker Santu, S. K.; and Ahmed, S. I. 2020. Towards Automated Sexual Violence Report Tracking. ICWSM

[20] Jia, R.; Cao, Y.; Tang, H.; Fang, F.; Cao, C.; and Wang, S. 2020. Neural Extractive Summarization with Hierarchical Attentive Heterogeneous Graph Network. In EMNLP

[21] Jiang, A. Q.; et al. 2024. Mixtral of Experts. arXiv:2401.04088

[22] Jin, H.; Han, X.; Yang, J.; Jiang, Z.; Liu, Z.; Chang, C.-Y.; Chen, H.; and Hu, X. 2024a. LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning. In ICML

[23] Jin, H.; Zhang, Y.; Meng, D.; Wang, J.; and Tan, J. 2024b. A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods. arXiv:2403.02901

[24] Jost, L. 2006. Entropy and diversity. Oikos

[25] Jung, T.; Kang, D.; Mentch, L.; and Hovy, E. 2019. Earlier Isn't Always Better: Sub-aspect Analysis on Corpus and System Biases in Summarization. In EMNLP

[26] Kanwal, N.; and Rizzo, G. 2022. Attention-based clinical note summarization. In Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing

[27] Kim, S.; Razi, A.; Alsoubai, A.; Wisniewski, P. J.; and De Choudhury, M. 2024. Assessing the Impact of Online Harassment on Youth Mental Health in Private Networked Spaces. ICWSM

[28] Kopackova, H.; and Libalova, P. 2019. Citizen reporting as the form of e-participation in smart cities. In Iberian Conference on Information Systems and Technologies (CISTI). IEEE

[29] Laban, P.; Kryscinski, W.; Agarwal, D.; Fabbri, A.; Xiong, C.; Joty, S.; and Wu, C.-S. 2023. SummEdits: Measuring LLM Ability at Factual Reasoning Through The Lens of Summarization. In EMNLP

[30] Lackner, M.; Regner, P.; and Krenn, B. 2023. abcvoting: A Python package for approval-based multi-winner voting rules. Journal of Open Source Software

[31] Laskar, M. T. R.; Bari, M. S.; Rahman, M.; Bhuiyan, M. A. H.; Joty, S.; and Huang, J. 2023. A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets. In Findings of the Association for Computational Linguistics: ACL 2023

[32] Lin, C.-Y. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out

[33] Liu, Y.; and Lapata, M. 2019. Text Summarization with Pretrained Encoders. In EMNLP

[34] Liu, Y.; Shi, K.; He, K.; Ye, L.; Fabbri, A.; Liu, P.; Radev, D.; and Cohan, A. 2024. On Learning to Summarize with Large Language Models as References. In NAACL

[35] Luo, Z.; Xie, Q.; and Ananiadou, S. 2023. ChatGPT as a Factual Inconsistency Evaluator for Text Summarization. arXiv:2303.15621

[36] Mathew, B.; Saha, P.; Tharad, H.; Rajgaria, S.; Singhania, P.; Maity, S. K.; Goyal, P.; and Mukherjee, A. 2019. Thou Shalt Not Hate: Countering Online Hate Speech. ICWSM

[37] Miller, D. 2019. Leveraging BERT for Extractive Text Summarization on Lectures. arXiv:1906.04165

[38] Mudambi, R.; Navarra, P.; and Nicosia, C. 1996. Plurality versus Proportional Representation: An Analysis of Sicilian Elections. Public Choice

[39] Mukherjee, R.; Peruri, H. C.; Vishnu, U.; Goyal, P.; Bhattacharya, S.; and Ganguly, N. 2020. Read what you need: Controllable Aspect-based Opinion Summarization of Tourist Reviews. In SIGIR

[40] Nenkova, A.; and Vanderwende, L. 2005. The impact of frequency on summarization. Technical report, Microsoft Research

[41] Olteanu, A.; Castillo, C.; Boy, J.; and Varshney, K. 2018. The Effect of Extremist Violence on Hateful Speech Online. ICWSM

[42] OpenAI. 2024. GPT-4o mini: advancing cost-efficient intelligence

[43] Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A.; Schulman, J.; Hilton, J.; Kelton, F.; Miller, L.; Simens, M.; Askell, A.; Welinder, P.; Christiano, P. F.; Leike, J.; and Lowe, R. 2022. Training language models to follow instructions with human feedback. In NeurIPS

[44] Park, H.; and Lee, J. 2021. Designing a Conversational Agent for Sexual Assault Survivors: Defining Burden of Self-Disclosure and Envisioning Survivor-Centered Solutions. In CHI

[45] Pu, X.; Gao, M.; and Wan, X. 2023. Summarization is (Almost) Dead. arXiv:2309.09558

[46] Ristad, E.; and Yianilos, P. 1998. Learning string-edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence

[47] Sambasivan, N.; Batool, A.; Ahmed, N.; Matthews, T.; Thomas, K.; Gaytán-Lugo, L. S.; Nemer, D.; Bursztein, E.; Churchill, E.; and Consolvo, S. 2019. "They Don't Leave Us Alone Anywhere We Go": Gender and Digital Abuse in South Asia. In CHI

[48] Sawhney, R.; Mathur, P.; Jain, T.; Gautam, A. K.; and Shah, R. R. 2021. Multitask Learning for Emotionally Analyzing Sexual Abuse Disclosures. In NAACL

[49] Shin, B.; Floch, J.; Rask, M.; Bæck, P.; Edgar, C.; Berditchevskaia, A.; Mesure, P.; and Branlat, M. 2024. A systematic analysis of digital tools for citizen participation. Government Information Quarterly

[50] Stoop, W.; Kunneman, F.; van den Bosch, A.; and Miller, B. 2019. Detecting harassment in real-time as conversations develop. In Workshop on Abusive Language Online

[51] Sultana, S.; Deb, M.; Bhattacharjee, A.; Hasan, S.; Alam, S.; Chakraborty, T.; Roy, P.; Ahmed, S. F.; Moitra, A.; Amin, M. A.; Islam, A. N.; and Ahmed, S. I. 2021. ‘Unmochon’: A Tool to Combat Online Sexual Harassment over Facebook Messenger. In CHI

[52] Tam, D.; Mascarenhas, A.; Zhang, S.; Kwan, S.; Bansal, M.; and Raffel, C. 2023. Evaluating the Factual Consistency of Large Language Models Through News Summarization. In ACL

[53] Tang, L.; Shalyminov, I.; mei Wong, A. W.; Burnsky, J.; Vincent, J. W.; Yang, Y.; Singh, S.; Feng, S.; Song, H.; Su, H.; Sun, L.; Zhang, Y.; Mansour, S.; and McKeown, K. 2024. TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization. arXiv:2402.13249

[54] Tang, L.; Sun, Z.; Idnay, B.; Nestor, J. G.; Soroush, A.; Elias, P. A.; Xu, Z.; Ding, Y.; Durrett, G.; Rousseau, J.; Weng, C.; and Peng, Y. 2023a. Evaluating large language models on medical evidence summarization. medRxiv

[55] Tang, Y.; Puduppully, R.; Liu, Z.; and Chen, N. 2023b. In-context Learning of Large Language Models for Controlled Dialogue Summarization: A Holistic Benchmark and Empirical Analysis. In NewSumm Workshop

[56] Thomas, K.; Kelley, P. G.; Consolvo, S.; Samermit, P.; and Bursztein, E. 2022. “It’s common and a part of being a content creator”: Understanding How Creators Experience and Cope with Hate and Harassment Online. In CHI

[57] The Economic Times. 2024. Kolkata doctor rape-murder case: RG Kar, the campus was victim's 'second home'

[58] India Today. 2020. Nirbhaya case: From December 16, 2012 to March 20, 2020 | A timeline

[59] Touvron, H.; et al. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models

[60] Upadhayay, B.; Lodhia, Z.; and Behzadan, V. 2021. Combating Human Trafficking via Automatic OSINT Collection, Validation and Fusion. In ICWSM Workshop

[61] Venkatasubramanian, K.; Skorinko, J. L. M.; Kobeissi, M.; Lewis, B.; Jutras, N.; Bosma, P.; Mullaly, J.; Kelly, B.; Lloyd, D.; Freark, M.; and Alterio, N. A. 2021. Exploring A Reporting Tool to Empower Individuals with Intellectual and Developmental Disabilities to Self-Report Abuse. In CHI

[62] Worledge, T.; Hashimoto, T.; and Guestrin, C. 2024. The Extractive-Abstractive Spectrum: Uncovering Verifiability Trade-offs in LLM Generations. arXiv:2411.17375

[63] Wu, Y.; Iso, H.; Pezeshkpour, P.; Bhutani, N.; and Hruschka, E. 2024. Less is More for Long Document Summary Evaluation by LLMs. In EACL

[64] Xu, J.; Gan, Z.; Cheng, Y.; and Liu, J. 2020. Discourse-Aware Neural Extractive Text Summarization. In ACL

[65] Yang, X.; Li, Y.; Zhang, X.; Chen, H.; and Cheng, W. 2023. Exploring the Limits of ChatGPT for Query or Aspect-based Text Summarization. arXiv:2302.08081

[66] Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; and Le, Q. V. 2019. XLNet: generalized autoregressive pretraining for language understanding. In NeurIPS

[67] Zhang, H.; Liu, X.; and Zhang, J. 2022. HEGEL: Hypergraph Transformer for Long Document Summarization. In EMNLP

[68] Zhang, H.; Liu, X.; and Zhang, J. 2023a. DiffuSum: Generation Enhanced Extractive Summarization with Diffusion. In ACL

[69] Zhang, H.; Liu, X.; and Zhang, J. 2023b. Extractive Summarization via ChatGPT for Faithful Summary Generation. In EMNLP

[70] Zhang, H.; Liu, X.; and Zhang, J. 2023c. SummIt: Iterative Text Summarization via ChatGPT. In EMNLP

[71] Zhang, T.; Ladhak, F.; Durmus, E.; Liang, P.; McKeown, K.; and Hashimoto, T. B. 2024. Benchmarking Large Language Models for News Summarization. ACL Transactions

[72] Zhao, W. X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; Du, Y.; Yang, C.; Chen, Y.; Chen, Z.; Jiang, J.; Ren, R.; Li, Y.; Tang, X.; Liu, Z.; Liu, P.; Nie, J.-Y.; and Wen, J.-R. 2023. A Survey of Large Language Models

[73] Zhong, M.; Liu, P.; Chen, Y.; Wang, D.; Qiu, X.; and Huang, X. 2020. Extractive Summarization as Text Matching. In ACL

[74] Ziems, C.; Vigfusson, Y.; and Morstatter, F. 2020. Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations for Cyberbullying Classification. ICWSM