LLM-MemCluster: Empowering Large Language Models with Dynamic Memory for Text Clustering

Jindong Wang; Ke Xu; Liangwei Yang; Philip S. Yu; Weizhi Zhang; Yuanjie Zhu; Zihe Song

arxiv: 2511.15424 · v2 · submitted 2025-11-19 · 💻 cs.CL

Yuanjie Zhu, Liangwei Yang, Ke Xu, Weizhi Zhang, Zihe Song, Jindong Wang, Philip S. Yu

Pith reviewed 2026-05-17 20:41 UTC · model grok-4.3

classification 💻 cs.CL

keywords text clustering · large language models · dynamic memory · dual-prompt strategy · end-to-end clustering · unsupervised learning · tuning-free framework

The pith

Dynamic memory and dual prompts let LLMs handle text clustering end-to-end without tuning or external modules.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models understand text semantics but lack built-in ways to track state during iterative clustering or decide how many clusters to form. The paper presents LLM-MemCluster as a framework that adds dynamic memory to retain awareness across steps and a dual-prompt strategy so the model itself reasons about cluster count. This turns clustering into a single, native LLM process with no fine-tuning, no post-processing, and no outside components. On benchmark datasets the method beats strong existing approaches. Readers would care because it removes the usual patchwork of extra tools and lets the model's semantic strengths drive the whole task directly.

Core claim

The paper claims that text clustering can be recast as a fully LLM-native task by combining a Dynamic Memory, which instills state awareness, with a Dual-Prompt Strategy, which lets the model reason about and determine the number of clusters. The resulting tuning-free framework is reported to significantly and consistently outperform strong baselines on several benchmark datasets.

What carries the argument

A Dynamic Memory that maintains state across iterative clustering steps, together with a Dual-Prompt Strategy that lets the LLM decide the number of clusters itself.
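A minimal sketch of how these two pieces might fit together, under stated assumptions: the memory is a plain dict from cluster label to member texts, and the prompt switches from a relaxed to a strict wording once the number of labels reaches a threshold. The names `choose_mode`, `build_prompt`, `cluster`, `fake_llm`, the prompt wording, and the threshold value are all hypothetical; the paper's actual templates and switching rule are not reproduced in this summary.

```python
from typing import Callable, Dict, List

Memory = Dict[str, List[str]]  # cluster label -> texts assigned so far


def choose_mode(memory: Memory, threshold: int) -> str:
    """Hypothetical rule: allow new labels freely until `threshold` labels exist."""
    return "relaxed" if len(memory) < threshold else "strict"


def build_prompt(text: str, memory: Memory, mode: str) -> str:
    """Serialize the memory into the prompt so the model sees the current state."""
    labels = ", ".join(sorted(memory)) or "(none yet)"
    rule = (
        "You may create a new label if none fits."
        if mode == "relaxed"
        else "Reuse an existing label unless the text clearly fits none."
    )
    return f"Existing labels: {labels}\n{rule}\nText: {text}\nLabel:"


def cluster(texts: List[str], llm: Callable[[str], str], threshold: int = 3) -> Memory:
    """Assign texts one at a time, updating the memory after every step."""
    memory: Memory = {}
    for text in texts:
        prompt = build_prompt(text, memory, choose_mode(memory, threshold))
        label = llm(prompt).strip()
        memory.setdefault(label, []).append(text)
    return memory


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption): labels by a keyword."""
    text = prompt.split("Text: ")[1].split("\n")[0].lower()
    return "sports" if "match" in text else "finance"


if __name__ == "__main__":
    docs = ["The match ended 2-1", "Stocks fell today", "A tense match"]
    print(cluster(docs, fake_llm))
```

The point of the sketch is only that the state lives in the prompt: the model never has to remember earlier steps on its own, because each call is shown the labels accumulated so far.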

If this is right

Text clustering no longer requires complex pipelines that combine LLMs with separate external modules.
The entire process becomes interpretable through the LLM's own step-by-step reasoning.
No task-specific fine-tuning or post-processing steps are needed to reach competitive results.
The same mechanisms support consistent gains across multiple standard text clustering benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

State-maintenance techniques like the dynamic memory could transfer to other iterative LLM tasks such as long-horizon planning or sequential decision problems.
The dual-prompt method for self-setting parameters might generalize to other unsupervised settings where LLMs must choose their own hyperparameters.
Combining this memory with existing LLM tool-use patterns could produce more autonomous systems for data exploration tasks.

Load-bearing premise

That an LLM can maintain useful state across clustering steps and accurately determine the number of clusters using only the dual-prompt strategy without external modules or post-processing.

What would settle it

On the same benchmark datasets used in the paper, if the framework shows no consistent outperformance over strong baselines or if the cluster counts it selects deviate markedly from known ground truth, the central claim would be falsified.
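The check described above could be run roughly as follows. This is a sketch under assumptions: this summary does not say which metrics the paper reports, so normalized mutual information (a common clustering metric, here with arithmetic-mean normalization) and a relative cluster-count deviation are used as stand-ins. The function names are hypothetical.

```python
from collections import Counter
from math import log
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Mutual information between two labelings of the same items."""
    n = len(a)
    ca, cb, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (nij / n) * log(nij * n / (ca[i] * cb[j])) for (i, j), nij in joint.items()
    )


def nmi(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """NMI = 2 * I(a; b) / (H(a) + H(b)); defined as 1.0 if both are single-cluster."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 and hb == 0.0:
        return 1.0
    return 2.0 * mutual_information(a, b) / (ha + hb)


def k_deviation(pred: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """Relative gap between the selected and the ground-truth cluster count."""
    k_pred, k_true = len(set(pred)), len(set(truth))
    return abs(k_pred - k_true) / k_true


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    pred = ["a", "a", "b", "b", "b", "c"]  # toy labels, not real model output
    print(nmi(pred, truth), k_deviation(pred, truth))
```

Both quantities are invariant to how clusters are named, which matters when the labels are free text produced by an LLM.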

Figures

Figures reproduced from arXiv: 2511.15424 by Jindong Wang, Ke Xu, Liangwei Yang, Philip S. Yu, Weizhi Zhang, Yuanjie Zhu, Zihe Song.

**Figure 2.** Comprehensive ablation study and adaptive clustering strategy comparison.

**Figure 3.** Hyperparameter sensitivity analysis of the prompt transition threshold, demonstrating robust and near …

**Figure 4.** The unified prompt template (system prompt).

**Figure 5.** The unified prompt template (user prompt, with …

**Figure 6.** Content for placeholder [SYSTEM_GUIDELINE]. Injected into Figure …

**Figure 7.** Content for placeholder [USER_CONSTRAINT]. Injected into Figure …

Original abstract

Large Language Models (LLMs) are reshaping unsupervised learning by offering an unprecedented ability to perform text clustering based on their deep semantic understanding. However, their direct application is fundamentally limited by a lack of stateful memory for iterative refinement and the difficulty of managing cluster granularity. As a result, existing methods often rely on complex pipelines with external modules, sacrificing a truly end-to-end approach. We introduce LLM-MemCluster, a novel framework that reconceptualizes clustering as a fully LLM-native task. It leverages a Dynamic Memory to instill state awareness and a Dual-Prompt Strategy to enable the model to reason about and determine the number of clusters. Evaluated on several benchmark datasets, our tuning-free framework significantly and consistently outperforms strong baselines. LLM-MemCluster presents an effective, interpretable, and truly end-to-end paradigm for LLM-based text clustering.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces LLM-MemCluster, a tuning-free framework that reconceptualizes text clustering as an LLM-native task. It uses a Dynamic Memory mechanism to provide state awareness across iterative steps and a Dual-Prompt Strategy to let the model determine the number of clusters, claiming to be fully end-to-end without external modules and to significantly and consistently outperform strong baselines on benchmark datasets.

Significance. If the central experimental claims hold under rigorous verification, the work would offer a meaningful advance in LLM-based unsupervised learning by demonstrating a native, interpretable alternative to pipeline-heavy approaches. The emphasis on in-context state maintenance for iterative refinement is a timely direction, though the current evidence base leaves the practical reliability open.

major comments (3)

§4 (Experiments) and Table 1: the reported outperformance lacks error bars, standard deviations, dataset sizes, statistical significance tests, or implementation details for the baselines. Without these, the claim of 'significantly and consistently outperforms' cannot be assessed as load-bearing evidence.
§3.1 (Dynamic Memory): the mechanism for maintaining cluster assignments across steps is described at a high level but provides no explicit safeguard, verification step, or analysis against cumulative drift or loss of earlier assignments. This directly underpins the state-awareness premise required for the end-to-end claim.
§3.2 (Dual-Prompt Strategy): the second prompt for inferring k is presented without ablation studies, failure-case analysis, or comparison against ground-truth cluster counts on ambiguous datasets. The assumption that the LLM will reliably produce a stable, correct k therefore remains unverified and central to the framework's autonomy.
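The kind of reporting the first major comment asks for can be sketched as below: mean and sample standard deviation over repeated runs, plus a paired sign-flip permutation test between two methods evaluated on the same seeds. The choice of test is an assumption (the report does not name one), the function names are hypothetical, and the scores are placeholder numbers for illustration only, not results from the paper.

```python
import random
from statistics import mean, stdev
from typing import List, Tuple


def summarize(scores: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across runs."""
    return mean(scores), stdev(scores)


def paired_permutation_test(
    a: List[float], b: List[float], n_resamples: int = 10000, seed: int = 0
) -> float:
    """Two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis the sign of each paired difference is arbitrary,
    so signs are flipped at random and the observed mean difference is compared
    with the resampled ones.
    """
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Placeholder numbers for illustration only; not results from the paper.
    method_a_runs = [0.71, 0.69, 0.73, 0.70, 0.72]
    method_b_runs = [0.66, 0.67, 0.65, 0.68, 0.66]
    print("A: mean=%.3f sd=%.3f" % summarize(method_a_runs))
    print("B: mean=%.3f sd=%.3f" % summarize(method_b_runs))
    print("p =", paired_permutation_test(method_a_runs, method_b_runs))
```

With only a handful of runs, the smallest p-value a sign-flip test can produce is bounded by the number of possible sign patterns, so more runs per dataset are needed before a claim of significance carries weight.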

minor comments (2)

[§3.1] Notation for memory updates could be formalized with a short pseudocode snippet to improve reproducibility.
[§1] A few sentences in the introduction repeat the motivation for end-to-end clustering; tightening would improve clarity.
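As an illustration of the pseudocode the first minor comment requests, and of the safeguard the second major comment finds missing, a memory update could be guarded like this. It is a sketch, assuming the memory maps document ids to labels and that each step yields a proposed new memory (for example, parsed from model output). The names `lost_ids` and `apply_update` are hypothetical and do not come from the paper.

```python
from typing import Dict, List

Assignments = Dict[int, str]  # document id -> cluster label


def lost_ids(old: Assignments, new: Assignments) -> List[int]:
    """Ids that were assigned before the step but are missing after it."""
    return sorted(set(old) - set(new))


def apply_update(old: Assignments, proposed: Assignments) -> Assignments:
    """Accept a proposed memory only if no earlier assignment was dropped.

    Relabeling a document is allowed; silently forgetting one is not.
    """
    missing = lost_ids(old, proposed)
    if missing:
        raise ValueError(f"update dropped previously assigned documents: {missing}")
    return dict(proposed)


if __name__ == "__main__":
    memory: Assignments = {0: "sports", 1: "finance"}
    # A valid step: document 2 is added and document 1 is relabeled.
    memory = apply_update(memory, {0: "sports", 1: "markets", 2: "sports"})
    print(memory)
    # An invalid step: document 0 has vanished from the proposed memory.
    try:
        apply_update(memory, {1: "markets", 2: "sports"})
    except ValueError as err:
        print("rejected:", err)
```

A check of this kind runs outside the model, so whether it is compatible with the paper's "no external modules" framing is a judgment call the source does not settle.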

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on our paper. We address each of the major comments in detail below and have made revisions to the manuscript to strengthen the presentation of our results and methods.

Referee: §4 (Experiments) and Table 1: the reported outperformance lacks error bars, standard deviations, dataset sizes, statistical significance tests, or implementation details for the baselines. Without these, the claim of 'significantly and consistently outperforms' cannot be assessed as load-bearing evidence.

Authors: We agree that including error bars, standard deviations, dataset sizes, and statistical significance tests would provide stronger evidence for our claims. In the revised manuscript, we have updated Table 1 to include mean performance with standard deviations across multiple runs, specified the dataset sizes, added p-values from statistical tests comparing to baselines, and expanded the implementation details for all baselines in the experimental setup section.

revision: yes

Referee: §3.1 (Dynamic Memory): the mechanism for maintaining cluster assignments across steps is described at a high level but provides no explicit safeguard, verification step, or analysis against cumulative drift or loss of earlier assignments. This directly underpins the state-awareness premise required for the end-to-end claim.

Authors: The Dynamic Memory is designed to store and update cluster assignments in a persistent structure that is referenced in subsequent prompts to maintain continuity. While the original description was high-level, we have added explicit details on how assignments are preserved and refreshed at each iteration to prevent drift. Additionally, we include an analysis of assignment consistency across steps in the revised version to verify the state-awareness.

revision: yes

Referee: §3.2 (Dual-Prompt Strategy): the second prompt for inferring k is presented without ablation studies, failure-case analysis, or comparison against ground-truth cluster counts on ambiguous datasets. The assumption that the LLM will reliably produce a stable, correct k therefore remains unverified and central to the framework's autonomy.

Authors: We acknowledge the need for more verification on the Dual-Prompt Strategy for determining k. In the revision, we have incorporated ablation studies that compare the model's inferred number of clusters against ground-truth values across the benchmark datasets, including analysis on ambiguous cases. We also discuss failure cases where the inferred k deviates and how the framework handles them, providing empirical support for the reliability of this component.

revision: yes
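The "analysis of assignment consistency across steps" mentioned in the second response could take a form like the following. This is a sketch, assuming a full labeling of the same documents is snapshotted after each iteration; consecutive snapshots are compared with the Rand index, which ignores how clusters are named. The rebuttal does not say which measure is used, so the metric and the function names are assumptions.

```python
from itertools import combinations
from typing import Hashable, List, Sequence


def rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of document pairs on which two labelings agree.

    A pair agrees if both labelings put the two documents together,
    or both keep them apart.
    """
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def consistency_trace(snapshots: List[Sequence[Hashable]]) -> List[float]:
    """Rand index between each snapshot and the next one."""
    return [rand_index(p, q) for p, q in zip(snapshots, snapshots[1:])]


if __name__ == "__main__":
    # Toy snapshots of four documents over three iterations (not real output).
    history = [
        ["a", "a", "b", "b"],
        ["x", "x", "y", "y"],  # same grouping, renamed labels
        ["x", "y", "y", "y"],  # document 1 moved to the other cluster
    ]
    print(consistency_trace(history))
```

A trace that stays close to 1.0 in late iterations would suggest the memory has stabilized; sharp drops would point to the kind of drift the referee is concerned about.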

Circularity Check

0 steps flagged

No circularity: framework defined by explicit components with no derivation or reduction to inputs

The paper presents LLM-MemCluster as a tuning-free, end-to-end framework that uses a Dynamic Memory mechanism and Dual-Prompt Strategy to enable LLMs to perform text clustering with state awareness and automatic cluster count determination. No equations, mathematical derivations, fitted parameters, or self-citation chains appear in the provided claims or abstract. The method is introduced by direct definition of its two core components rather than by any reduction that equates outputs to inputs by construction. Empirical performance claims rest on benchmark evaluations, which remain independent of the definitional structure itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that current LLMs possess sufficient semantic understanding and reasoning to maintain clustering state and choose cluster count without external scaffolding; no free parameters or invented physical entities are introduced.

axioms (1)

Domain assumption: LLMs can perform iterative clustering refinement when given stateful memory and prompts that ask them to reason about cluster count.
Invoked in the abstract's description of the Dual-Prompt Strategy and Dynamic Memory.
