arxiv: 2605.10097 · v1 · submitted 2026-05-11 · 💻 cs.IR

H-MAPS: Hierarchical Memory-Augmented Proactive Search Assistant for Scientific Literature

Koji Nishikawa, Makoto P. Kato

Pith reviewed 2026-05-12 03:01 UTC · model grok-4.3

classification 💻 cs.IR

keywords: proactive information retrieval, hierarchical memory, implicit user modeling, on-device neural retrieval, scientific literature search, personalized reading assistant, context-aware search

The pith

H-MAPS turns implicit reading behaviors into on-device personalized literature questions via three-layered memory.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Scientific readers often need external papers but stop to type searches that break concentration. H-MAPS watches how a person scrolls, pauses, and spends time on sections of a paper, then uses a three-layered memory structure to infer that reader's background and specific gaps. From those inferences the system writes explicit natural-language questions and runs neural retrieval entirely on the local device so no reading data leaves the machine. A demonstration shows the same paper producing NLP-focused suggestions for one specialist and HCI-focused suggestions for another.

Core claim

H-MAPS resolves context ambiguity in proactive information retrieval by maintaining a three-layered hierarchical memory that converts implicit reading signals into explicit natural-language questions and performs entirely on-device neural retrieval to preserve privacy. In the presented scenario the system produces distinct, profile-matched literature lists for two researchers who read identical text.

What carries the argument

Three-layered hierarchical memory that stores user background, current reading context, and inferred latent needs, then maps observed behaviors onto generated questions for local retrieval.
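As a rough illustration of how three such layers might feed question generation, the sketch below keeps each layer as a plain list of strings and assembles a prompt from them. The class name `HierarchicalMemory`, its field names, and `build_question_prompt` are hypothetical; the paper's actual data structures and prompt are not given in this summary.

```python
from dataclasses import dataclass, field


@dataclass
class HierarchicalMemory:
    """Hypothetical three-layer memory; all field names are illustrative."""
    background: list[str] = field(default_factory=list)       # stable reader profile
    reading_context: list[str] = field(default_factory=list)  # passages seen this session
    latent_needs: list[str] = field(default_factory=list)     # gaps inferred from behavior


def build_question_prompt(memory: HierarchicalMemory, focus_passage: str) -> str:
    """Assemble a text prompt that a local language model could turn into a question."""
    lines = [
        "Reader background: " + "; ".join(memory.background),
        "Recently read: " + " | ".join(memory.reading_context[-3:]),
        "Suspected gaps: " + "; ".join(memory.latent_needs),
        "Passage in focus: " + focus_passage,
        "Write one literature-search question this reader would ask.",
    ]
    return "\n".join(lines)


# Two readers with different (assumed) profiles look at the same passage.
passage = "We evaluate the assistant with a think-aloud protocol."
nlp_reader = HierarchicalMemory(background=["NLP"], latent_needs=["user study methods"])
hci_reader = HierarchicalMemory(background=["HCI"], latent_needs=["dense retrieval models"])
print(build_question_prompt(nlp_reader, passage))
print(build_question_prompt(hci_reader, passage))
```

The two prompts share the focus passage but differ in the profile and gap lines, which is the mechanism by which identical text could lead to different questions.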

If this is right

Readers finish a paper without ever leaving the document to type a search.
The same source text yields different follow-up literature depending on the reader's domain focus.
All question generation and retrieval runs locally so no reading traces are transmitted.
The approach can be triggered automatically by natural pauses rather than explicit user commands.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same memory layers could be adapted to non-scientific long-form reading such as textbooks or reports.
On-device models would need to be small enough to run without perceptible lag during normal scrolling.
Future versions might combine the memory with explicit user corrections to refine the inferred profile over multiple papers.

Load-bearing premise

Implicit signals such as time on sections and scrolling patterns can be mapped reliably onto a reader's specific background and information needs.

What would settle it

A controlled study in which two readers with identical scrolling and dwell patterns but different expertise receive the same generated questions and the same retrieved papers.

Figures

Figures reproduced from arXiv: 2605.10097 by Koji Nishikawa, Makoto P. Kato.

**Figure 1.** System architecture of H-MAPS, which comprises ...

**Figure 2.** H-MAPS overlay UI. The assistant operates as a peripheral overlay on the desktop, generating multiple literature ...

Original abstract

Scientific reading is an active process that frequently requires consulting external resources, but manual keyword searching interrupts the reading flow and imposes a high cognitive load. Existing proactive information retrieval systems often suffer from context ambiguity, as they rely solely on on-screen text and ignore the reader's specific background and intent. In this demonstration, we present H-MAPS (Hierarchical Memory-Augmented Proactive Search Assistant), a proactive literature exploration assistant that resolves this ambiguity by leveraging a three-layered hierarchical memory. Triggered by implicit reading behaviors, H-MAPS articulates the user's latent information needs into explicit natural language questions and performs neural retrieval entirely on the local device to ensure privacy. We demonstrate H-MAPS using a scenario where two researchers, specializing in NLP and HCI, read the same paper. In response, the system generates profile-specific questions and retrieves distinct literature tailored to each user.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

H-MAPS is a clean system demo that turns implicit reading signals into profile-specific questions via hierarchical memory and keeps retrieval local for privacy, but it rests on a single untested scenario with no metrics at all.

This paper describes H-MAPS, a proactive assistant that watches how someone reads a paper (dwell time, scrolling) and uses a three-layer memory to turn those signals into explicit questions, then pulls related work from a local neural index. The two-researcher scenario (NLP vs HCI) is straightforward and shows the personalization angle clearly. The on-device execution is a practical choice that avoids sending reading data anywhere.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces H-MAPS, a Hierarchical Memory-Augmented Proactive Search Assistant for scientific literature. It claims to resolve context ambiguity in proactive IR by using a three-layered hierarchical memory that infers latent user needs from implicit reading behaviors (e.g., time on sections, scrolling), generates explicit natural language questions, and performs on-device neural retrieval to maintain privacy. The system is demonstrated through a qualitative scenario involving two researchers with different specializations (NLP and HCI) reading the same paper, leading to profile-specific questions and tailored literature retrieval.

Significance. If validated, the approach could meaningfully advance proactive IR by addressing user-specific intent and privacy in scientific reading assistants. The core idea of hierarchical memory triggered by implicit signals offers a plausible path beyond text-only context, but the single-scenario demonstration supplies no evidence that the mapping from behaviors to accurate questions or improved retrieval holds in practice.

major comments (2)

[Demonstration Scenario] Demonstration section: the central claim that the three-layered hierarchical memory produces accurate, profile-specific questions from implicit behaviors rests entirely on one illustrative scenario with two researchers. No metrics (question relevance, intent alignment, retrieval precision@K, or comparisons to non-hierarchical baselines) or user-study data are reported, leaving the effectiveness of the memory layers untested.
[System Architecture] System description: the manuscript provides no technical specification of the three memory layers, including how implicit signals are mapped to each layer, the exact question-generation process, or the on-device retrieval model. Without these details the architecture cannot be evaluated or reproduced.
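Of the metrics the first comment asks for, retrieval precision@K is the simplest to compute once relevance judgments exist. The sketch below uses the standard definition; the function name and the paper identifiers are made up for illustration and do not come from the manuscript.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved items that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    # Divide by k (not by len(top_k)) so that short result lists are penalized.
    return hits / k


# Hypothetical ranking and relevance judgments for one generated question.
ranking = ["p1", "p2", "p3", "p4"]
judged_relevant = {"p1", "p4", "p9"}
print(precision_at_k(ranking, judged_relevant, k=4))  # 2 of the top 4 are relevant -> 0.5
```

Comparing this score for questions generated with and without the memory layers would be one way to run the baseline comparison the referee requests.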

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our demonstration paper. This work presents H-MAPS as a conceptual system for proactive literature search, illustrated via a scenario rather than through quantitative evaluation. We address each major comment below and indicate planned revisions.

Point-by-point responses

Referee: [Demonstration Scenario] Demonstration section: the central claim that the three-layered hierarchical memory produces accurate, profile-specific questions from implicit behaviors rests entirely on one illustrative scenario with two researchers. No metrics (question relevance, intent alignment, retrieval precision@K, or comparisons to non-hierarchical baselines) or user-study data are reported, leaving the effectiveness of the memory layers untested.

Authors: We agree that the paper relies on a single illustrative scenario rather than empirical data. As this is explicitly a demonstration paper, the scenario with NLP and HCI researchers is intended only to show how the hierarchical memory could differentiate user intent from implicit signals and produce tailored questions and retrievals. We make no claims of measured accuracy, alignment, or superiority over baselines. To clarify this, we will revise the demonstration section to explicitly label the example as illustrative, remove any implication of validated effectiveness, and add a limitations paragraph outlining the need for future user studies with metrics such as question relevance ratings and retrieval precision.

Revision: partial

Referee: [System Architecture] System description: the manuscript provides no technical specification of the three memory layers, including how implicit signals are mapped to each layer, the exact question-generation process, or the on-device retrieval model. Without these details the architecture cannot be evaluated or reproduced.

Authors: We acknowledge that the architecture is currently described at a conceptual level without implementation specifics. We will revise the system description to add technical details: the three layers (short-term for on-screen context, mid-term for session-level behaviors such as dwell time and scroll patterns, long-term for inferred profile), the mapping of implicit signals via simple heuristics and embedding updates, question generation via an LLM prompted with the aggregated memory state, and the on-device retrieval using a quantized local embedding model with privacy guarantees. These additions will support evaluation and reproducibility while preserving the demonstration focus.

Revision: yes
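The retrieval step in this simulated response can be pictured as nearest-neighbor search over a local index. The sketch below is a minimal stand-in that assumes the generated question and each paper are already embedded; the hand-written three-dimensional vectors, the paper identifiers, and the functions `cosine` and `retrieve` are all hypothetical and replace whatever embedding model and index the system actually uses.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k papers whose vectors are most similar to the query."""
    scored = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [paper_id for paper_id, _ in scored[:k]]


# Toy in-memory index; real embeddings would come from a local model (assumption).
local_index = {
    "nlp_paper": [0.9, 0.1, 0.0],
    "hci_paper": [0.1, 0.9, 0.0],
    "db_paper": [0.0, 0.1, 0.9],
}
nlp_question_vec = [1.0, 0.2, 0.0]  # stands in for an embedded NLP-flavored question
hci_question_vec = [0.2, 1.0, 0.0]  # stands in for an embedded HCI-flavored question
print(retrieve(nlp_question_vec, local_index, k=1))
print(retrieve(hci_question_vec, local_index, k=1))
```

Because both the index and the query stay in process memory, nothing in this sketch touches the network, which matches the privacy argument in spirit even though it says nothing about the real model's size or latency.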

Circularity Check

0 steps flagged

No circularity: architectural system description with no derivations or load-bearing self-references

full rationale

The paper is a demonstration of the H-MAPS system architecture, which uses a three-layered hierarchical memory triggered by implicit reading behaviors to generate explicit questions and perform on-device retrieval. The full text (as referenced) and abstract contain no equations, parameter fittings, predictions, uniqueness theorems, or derivation chains. The central claim is illustrated solely via a single scenario with two researchers producing profile-specific outputs; this is a direct example rather than a reduction of any quantity to prior inputs. No self-citations are invoked to justify mathematical premises, and the design choices stand independently without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central claim rests on the effectiveness of an invented three-layered hierarchical memory structure whose ability to resolve intent from implicit signals is postulated without external benchmarks or evidence in the abstract.

invented entities (1)

three-layered hierarchical memory · no independent evidence
purpose: To capture reader background and intent from implicit behaviors and resolve context ambiguity
Introduced as the core technical component, but no independent validation or falsifiable handle is supplied in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: three-layered hierarchical memory M=<M_loc, M_ses, m_prof> ... triggered by implicit reading behaviors ... articulates latent information needs into explicit natural language questions

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat induction · unclear

Paper passage: Behavior-Driven Trigger ... Sustained Attention ... Content Revisit ... Jaccard similarity
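The excerpt above names Jaccard similarity alongside the Content Revisit trigger without saying how the two connect. One plausible reading, offered only as an assumption, is that a revisit is flagged when the words currently on screen overlap heavily with a previously viewed passage. The sketch below implements that reading; the whitespace tokenization, the 0.6 threshold, and the function names are illustrative choices, not the paper's.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_content_revisit(current_text: str, history: list[str], threshold: float = 0.6) -> bool:
    """Flag a revisit if the visible text overlaps enough with any earlier passage."""
    current_tokens = set(current_text.lower().split())
    return any(
        jaccard(current_tokens, set(past.lower().split())) >= threshold
        for past in history
    )


# Hypothetical viewing history for one reading session.
seen = ["Dense retrieval with transformers models"]
print(is_content_revisit("dense retrieval with transformers", seen))
print(is_content_revisit("user study design", seen))
```

A Sustained Attention trigger would presumably rest on dwell time rather than text overlap, but the excerpt gives no detail on it, so it is left out here.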

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
