Feder Cooper, Sanmi Koyejo, and Percy Liang

Extracting books from production language models · 2026 · arXiv 2601.02671

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs

cs.CL · 2026-06-04 · unverdicted · novelty 7.0

LLMs show high memorization capability under prefix attacks but low propensity under generic or dataset-specific prompts, with continual pre-training further reducing both.

Overcoming Copyright Barriers in Corpus Distribution Through Non-Reversible Hashing

cs.CL · 2026-04-25 · conditional · novelty 7.0

A non-reversible hashing technique allows legal distribution of annotations for copyrighted texts by enabling alignment between user-owned copies and shared hashed data with high accuracy.

Both Ends Count! Just How Good are LLM Agents at "Text-to-Big SQL"?

cs.DB · 2026-02-25 · unverdicted · novelty 7.0

New Text-to-Big SQL metrics show that LLM agents must balance accuracy with cost and speed at scale, where GPT-4o trades some accuracy for up to 12x speedup and GPT-5.2 proves more cost-effective than Gemini 3 Pro on large inputs.

Cheap Expertise: Mapping and Challenging Industry Perspectives in the Expert Data Gig Economy

cs.CY · 2026-05-05 · unverdicted · novelty 5.0

AI data firms view human expertise as an extractable, low-cost resource to feed AI systems while treating institutional expertise as something needing liberation or reform to fit this model.

Merlin: Deterministic Byte-Exact Deduplication for Lossless Context Optimization in Large Language Model Inference

cs.CL · 2026-05-11 · unverdicted · novelty 4.0

Merlin achieves byte-exact deduplication of text at up to 8.7 GB/s using SIMD-optimized hashing, reducing LLM context sizes by 13.9-71% with no data loss.

citing papers explorer

Showing 4 of 4 citing papers after filters.

LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs
cs.CL · 2026-06-04 · unverdicted · none · ref 7
Both Ends Count! Just How Good are LLM Agents at "Text-to-Big SQL"?
cs.DB · 2026-02-25 · unverdicted · none · ref 1
Cheap Expertise: Mapping and Challenging Industry Perspectives in the Expert Data Gig Economy
cs.CY · 2026-05-05 · unverdicted · none · ref 4
Merlin: Deterministic Byte-Exact Deduplication for Lossless Context Optimization in Large Language Model Inference
cs.CL · 2026-05-11 · unverdicted · none · ref 4
