Large-scale scan of 1.2 TB arXiv source data uncovers thousands of PII leaks, exposed credentials, private links, and GPS-tagged files via a new LLM-assisted detection framework.
Preprint
5 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background 1
Citation polarities: background 1
citing papers explorer
- You Have Been LaTeXpOsEd: A Systematic Analysis of Information Leakage in Preprint Archives Using Large Language Models
  Large-scale scan of 1.2 TB arXiv source data uncovers thousands of PII leaks, exposed credentials, private links, and GPS-tagged files via a new LLM-assisted detection framework.
- A Mechanism and Optimization Study on the Impact of Information Density on User-Generated Content Named Entity Recognition
  Low information density is identified as the root cause of NER failures on user-generated content, with the Window-Aware Optimization Module delivering up to 4.5% F1 gains and new SOTA on WNUT2017.
- Just Pass Twice: Efficient Token Classification with LLMs for Zero-Shot NER
  JPT enables bidirectional token classification in causal LLMs for zero-shot NER via input concatenation plus definition-guided embeddings, delivering +7.9 F1 gains and over 20x speedup on benchmarks.
- Knowledge-to-Verification: Exploring RLVR for LLMs in Knowledge-Intensive Domains
  K2V extends RLVR to knowledge-intensive domains by synthesizing verifiable data and verifying reasoning processes, yielding improved domain reasoning with preserved general capabilities.
- Retrieval-Augmented Generation with Graphs (GraphRAG)
  A survey proposing a holistic GraphRAG framework with components including query processor, retriever, organizer, generator, and data source, plus domain-tailored reviews, challenges, and future directions.