mega hub Canonical reference

LLaMA: Open and Efficient Foundation Language Models

· 2023 · cs.CL · arXiv 2302.13971

Canonical reference. 82% of citing Pith papers cite this work as background.

1103 Pith papers citing it

Background 82% of classified citations

open full Pith review browse 1103 citing papers arXiv PDF

abstract

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 206 method 19 baseline 8 other 6 dataset 1 extension 1

citation-polarity summary

background 198 use method 20 unclear 13 baseline 7 extend 1 support 1 use dataset 1

claims ledger

abstract We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

mega hub controls

export citing contexts JSON export graph JSON export full bundle JSON open full Pith review annotated reader queued

Recognition alignment

counterfactual ablation

If this work disappeared, these are the nearest dependency candidates in Pith, weighted toward method, dataset, baseline, and extension contexts where available. This is a structural signal, not a retraction verdict.

co-cited works

representative citing papers

Privacy Auditing with Zero (0) Training Run

cs.CR · 2026-05-14 · unverdicted · novelty 8.0

Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.

Effective Context in Transformers: An Analysis of Fragmentation and Tokenization

cs.LG · 2026-05-13 · unverdicted · novelty 8.0

Fragmentation strictly raises optimal finite-context log-loss on Markov sources while tokenization can make a short token window equivalent to a longer source window under reliability and compression conditions.

Grid Games: The Power of Multiple Grids for Quantizing Large Language Models

cs.LG · 2026-05-12 · accept · novelty 8.0

Allowing each quantization group to select among multiple 4-bit grids improves accuracy over single-grid FP4 for both post-training and pre-training of LLMs.

Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

Adaptive scheduling of interventions in discrete diffusion language models, timed to attribute-specific commitment schedules discovered with sparse autoencoders, delivers precise multi-attribute steering up to 93% strength while preserving generation quality.

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

SignSGD provably beats SGD by a factor of d under sparse noise via matched ℓ1-norm upper and lower bounds, with an equivalent result for Muon on matrices, and this predicts faster GPT-2 pretraining.

Backdoor Attacks on Decentralised Post-Training

cs.CR · 2026-03-31 · conditional · novelty 8.0 · 2 refs

An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.

Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers

cs.SE · 2025-06-16 · conditional · novelty 8.0

First study of 1,899 MCP servers finds eight distinct vulnerabilities (only three traditional), 7.2% with general issues, 5.5% with tool poisoning, and 66% with code smells, urging MCP-specific security practices.

BEAVER: An Enterprise Benchmark for Text-to-SQL

cs.CL · 2024-09-03 · unverdicted · novelty 8.0

BEAVER is the first text-to-SQL benchmark from private enterprise data warehouses, revealing SOTA agentic frameworks achieve only 10.8% accuracy on complex real-world queries.

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

cs.CV · 2024-08-23 · conditional · novelty 8.0

MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

cs.CR · 2024-06-19 · unverdicted · novelty 8.0

AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.

AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments

cs.HC · 2024-05-13 · conditional · novelty 8.0

AgentClinic is a multimodal agent benchmark demonstrating that LLM diagnostic accuracy on MedQA drops to below one-tenth in sequential clinical simulations, with Claude-3.5 leading and large tool-use differences across models.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

Bridging Language and Items for Retrieval and Recommendation: Benchmarking LLMs as Semantic Encoders

cs.IR · 2024-03-06 · unverdicted · novelty 8.0

BLaIR is a new benchmark and 570M-review dataset showing that LLM performance rankings on recommendation tasks have little correlation with rankings on general embedding benchmarks like MTEB.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

cs.LG · 2023-12-01 · unverdicted · novelty 8.0

Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

cs.CL · 2023-11-27 · unverdicted · novelty 8.0

MMMU provides 11.5K heterogeneous college-level multimodal questions that current models solve at 56-59% accuracy, establishing a new standard for expert multimodal evaluation.

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

cs.CL · 2023-05-17 · accept · novelty 8.0

Tree of Thoughts enables language models to solve complex planning tasks by generating, evaluating, and searching over coherent intermediate thoughts in a tree, raising Game of 24 success from 4% to 74% with GPT-4.

API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

cs.CL · 2023-04-14 · conditional · novelty 8.0

API-Bank is a new benchmark and training dataset for tool-augmented LLMs that shows fine-tuned models can approach GPT-3.5 tool-use effectiveness.

Instruction Tuning with GPT-4

cs.CL · 2023-04-06 · unverdicted · novelty 8.0

GPT-4-generated instruction data produces superior zero-shot performance in finetuned LLaMA models versus prior state-of-the-art data.

Language-Assisted Super-Resolution from Real-World Low-Resolution Patches

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

LA-SR redefines unpaired super-resolution in language space by projecting images into a semantically rich representation and applying vision-language model guided losses to handle real-world degradations extracted from depth variations.

Probing Memorization of Tabular In-Context Learning

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

A new probing framework detects moderate parametric memorization signals in tabular in-context learning models under single-task fine-tuning, strongest on low-cardinality tasks, but signals largely disappear under realistic training.

Search for Truth from Reasoning: A Dynamic Representation Editing Framework for Steering LLM Trajectories

cs.AI · 2026-06-26 · unverdicted · novelty 7.0

DynaSteer dynamically steers LLM reasoning trajectories toward truth via pattern clustering, Fisher-LDA projection, and entropy-triggered representation edits, improving performance on MATH and generalizing to coding.

A Sensitivity-Aware Test Collection for Search Among Personal Information

cs.IR · 2026-06-25 · accept · novelty 7.0

A new sensitivity-labeled test collection is released from Enron emails with crowdsourced queries, relevance judgments, and LLM extensions for evaluating sensitivity-aware search.

PatternGSL: A Structured Specification Language for Template-Free and Simulation-Ready 3D Garments

cs.CV · 2026-06-23 · unverdicted · novelty 7.0

PatternGSL is a new template-free specification language for complete sewing patterns that enables direct single-image prediction of simulation-ready garments via a vision-language model, supported by a new 300K paired dataset.

Moving Beyond Diversity: Visual Token Pruning as Subspace Reconstruction for Efficient VLMs

cs.CV · 2026-06-17 · unverdicted · novelty 7.0

SPARE reformulates visual token pruning as column subset selection to minimize reconstruction error and uses anti-relevance for context-aware selection in VLMs.

citing papers explorer

Showing 44 of 44 citing papers after filters.

Privacy Auditing with Zero (0) Training Run cs.CR · 2026-05-14 · unverdicted · none · ref 39 · internal anchor
Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.
Backdoor Attacks on Decentralised Post-Training cs.CR · 2026-03-31 · conditional · none · ref 12 · 2 links · internal anchor
An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.
AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents cs.CR · 2024-06-19 · unverdicted · none · ref 56 · internal anchor
AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.
CachePrune: Privacy-Aware and Fine-Grained KV Cache Sharing for Efficient LLM Inference cs.CR · 2026-05-22 · unverdicted · none · ref 55 · internal anchor
CachePrune enables fine-grained, token-level KV cache reuse across LLM requests by masking sensitive segments, eliminating direct side-channel leakage while cutting TTFT by 4.5x and raising hit rates by 44% versus prior coarse-grained methods.
TimeGuard: Channel-wise Pool Training for Backdoor Defense in Time Series Forecasting cs.CR · 2026-05-21 · unverdicted · none · ref 126 · internal anchor
TimeGuard defends time series forecasting against backdoors via channel-wise pool training initialized by time-aware criteria and expanded with distance-regularized loss selection, improving poisoned MAE by 1.96x while keeping clean MAE within 5%.
MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs cs.CR · 2026-05-14 · unverdicted · none · ref 23 · internal anchor
MetaBackdoor shows that LLMs can be backdoored using positional triggers like sequence length, enabling stealthy activation on clean inputs to leak system prompts or trigger malicious behavior.
PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks cs.CR · 2026-05-09 · unverdicted · none · ref 40 · 2 links · internal anchor
PASA is an embedding-space watermarking method for LLM text that uses semantic clusters and synchronized randomness to achieve robustness against paraphrasing while remaining distortion-free.
Privacy Without Losing Place: A Paradigm for Private Retrieval in Spatial RAGs cs.CR · 2026-05-06 · unverdicted · none · ref 55 · internal anchor
PAS encodes locations via relative anchors and bins to deliver roughly 370-400m adversarial error in spatial RAG while retaining over half the baseline retrieval performance and keeping generation quality robust.
Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization cs.CR · 2026-05-06 · conditional · none · ref 4 · internal anchor
TAGO performs sparse jailbreak optimization on audio LMs by retaining only high-gradient-energy tokens, preserving near-full ASR at 25% retention across three models.
Membership Inference Attacks Against Video Large Language Models cs.CR · 2026-04-29 · unverdicted · none · ref 36 · internal anchor
A temperature-perturbed black-box attack infers video training membership in VideoLLMs with 0.68 AUC by exploiting sharper generation behavior on member samples.
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework cs.CR · 2026-04-25 · unverdicted · none · ref 24 · internal anchor
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models cs.CR · 2026-04-22 · unverdicted · none · ref 19 · internal anchor
A novel function hijacking attack achieves 70-100% success rates in forcing specific function calls across five LLMs on the BFCL benchmark and is robust to context semantics.
PrefixWall: Mitigating Prefix Caching Side Channels in Shared LLM Systems cs.CR · 2026-03-11 · unverdicted · none · ref 63 · internal anchor
PrefixWall mitigates APC side channels in multi-tenant LLM systems via selective prefix isolation, delivering up to 70% higher cache reuse and 30% lower latency than full-isolation baselines.
SoK: Honeypots & LLMs, More Than the Sum of Their Parts? cs.CR · 2025-10-29 · unverdicted · none · ref 111 · internal anchor
A systematization of knowledge paper that taxonomizes honeypot detection vectors, synthesizes LLM-honeypot literature into canonical architecture and evaluation methods, and proposes a roadmap for autonomous deception systems.
SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From cs.CR · 2025-09-30 · unverdicted · none · ref 9 · internal anchor
SeedPrints fingerprints LLMs using persistent biases from initialization seeds for lineage verification across pretraining and adaptation stages.
PromptCOS: Towards Content-only System Prompt Copyright Auditing for LLMs cs.CR · 2025-09-03 · unverdicted · none · ref 52 · internal anchor
PromptCOS is a content-only watermarking method for LLM system prompts that embeds detectable cyclic signals via auxiliary tokens while preserving fidelity and resisting removal attacks.
Security--Fidelity Tradeoffs: The Hidden Cost of Prompt Injection Defense cs.CR · 2026-06-29 · unverdicted · none · ref 91 · internal anchor
Prompt injection defenses create a security-fidelity tradeoff with no model or defense achieving both high security and high fidelity on the SecFid benchmark across 1,168 examples.
Ellipsoid Control: A White-list Jailbreak Defense via Benign Latent Modeling cs.CR · 2026-05-23 · unverdicted · none · ref 1 · internal anchor
Ellipsoid Control is a white-list test-time jailbreak defense that fits an anisotropic ellipsoid from benign activations to constrain projected gradient descent updates, aiming to improve the safety-utility tradeoff over black-list RepE methods.
Acoustic Interference: A New Paradigm Weaponizing Acoustic Latent Semantic for Universal Jailbreak against Large Audio Language Models cs.CR · 2026-05-18 · unverdicted · none · ref 42 · internal anchor
AIA generates universal interference audio infused with Acoustic Latent Semantics to bypass LALM safety alignment, achieving SOTA attack success rates on 10 models across five datasets.
Inducing Overthink: Hierarchical Genetic Algorithm-based DoS Attack on Black-Box Large Language Reasoning Models cs.CR · 2026-05-13 · unverdicted · none · ref 17 · 2 links · internal anchor
A hierarchical genetic algorithm induces overthinking in black-box large reasoning models by perturbing logical structure, achieving up to 26.1x longer outputs on the MATH benchmark.
Context-Aware Spear Phishing: Generative AI-Enabled Attacks Against Individuals via Public Social Media Data cs.CR · 2026-05-11 · conditional · none · ref 85 · internal anchor
Generative AI enables scalable, context-aware spear phishing by extracting profiles from public social media, producing emails that outperform real-world phishing samples in personalization and lower recipient suspicion.
PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts cs.CR · 2026-05-07 · unverdicted · none · ref 4 · 2 links · internal anchor
PragLocker generates function-preserving but non-portable prompts for LLM agents via code-symbol semantic anchoring followed by target-model feedback noise injection.
You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation cs.CR · 2026-05-06 · unverdicted · none · ref 131 · internal anchor
NeWTral is a non-linear weight translation framework using MoE routing that reduces average attack success rate from 70% to 13% on unsafe domain adapters across Llama, Mistral, Qwen, and Gemma models up to 72B while retaining 90% knowledge fidelity.
On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference cs.CR · 2026-05-06 · conditional · none · ref 192 · internal anchor
An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.
Revisiting JBShield: Breaking and Rebuilding Representation-Level Jailbreak Defenses cs.CR · 2026-05-04 · accept · none · ref 46 · internal anchor
JBShield is vulnerable to adaptive JB-GCG attacks (up to 53% ASR) because jailbreak representations occupy a distinct region in refusal-direction space; the new RTV defense using Mahalanobis detection on multi-layer fingerprints reaches 0.99 AUROC and limits adaptive ASR to 7%.
Block-wise Codeword Embedding for Reliable Multi-bit Text Watermarking cs.CR · 2026-05-01 · unverdicted · none · ref 21 · 2 links · internal anchor
BREW uses block voting and window-shifting verification to reach TPR 0.965 and FPR 0.02 under 10% synonym substitution, addressing high false-positive issues in prior multi-bit LLM watermarking.
Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers cs.CR · 2026-04-23 · unverdicted · none · ref 2 · internal anchor
BadStyle creates stealthy backdoors in LLMs by poisoning samples with imperceptible style triggers and using an auxiliary loss to stabilize payload injection, achieving high attack success rates across multiple models while evading defenses.
AVISE: Framework for Evaluating the Security of AI Systems cs.CR · 2026-04-22 · unverdicted · none · ref 42 · internal anchor
AVISE provides a new framework and automated SET that identifies jailbreak vulnerabilities in language models with 92% accuracy, finding all nine tested models vulnerable to an augmented Red Queen attack.
Geometry-Aware Localized Watermarking for Copyright Protection in Embedding-as-a-Service cs.CR · 2026-04-13 · unverdicted · none · ref 33 · internal anchor
GeoMark decouples local watermark triggering from centralized ownership attribution using geometry-separated anchors and adaptive neighborhoods to improve robustness against paraphrasing, dimension changes, and clustering attacks while preserving utility.
CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks cs.CR · 2026-04-05 · unverdicted · none · ref 27 · internal anchor
CoopGuard deploys cooperative agents to track conversation history and counter evolving multi-round attacks on LLMs, achieving a 78.9% reduction in attack success rate on a new 5,200-sample benchmark.
MEASER: Malware embedding attacks on open-source LLMs cs.CR · 2025-10-12 · unverdicted · none · ref 30 · internal anchor
MEASER embeds malware into open-source LLMs via parameter targeting and MAR-QIM modulation, achieving 0 BER and high stealth even after quantization and PEFT.
Optimus: A Robust Defense Framework for Mitigating Toxicity while Fine-Tuning Conversational AI cs.CR · 2025-07-08 · unverdicted · none · ref 72 · internal anchor
Optimus mitigates toxicity during LLM fine-tuning by combining repurposed LLM safety alignments for detection with synthetic data and DPO alignment, remaining effective even with highly biased classifiers and against attacks.
Fast Homomorphic Linear Algebra with BLAS cs.CR · 2025-03-20 · unverdicted · none · ref 55 · internal anchor
Reduces CKKS homomorphic matrix-vector and matrix-matrix products to plaintext BLAS equivalents, achieving 4-12x overhead versus double-precision floating-point square matrix multiplication.
Peering Behind the Shield: Guardrail Identification in Large Language Models cs.CR · 2025-02-03 · unverdicted · none · ref 36 · internal anchor
AP-Test identifies deployed guardrails in LLMs via adversarial prompt testing and a match score metric, reporting perfect accuracy on four open-source guardrails.
Uncovering Logit Suppression Vulnerabilities in LLM Safety Alignment cs.CR · 2024-05-20 · unverdicted · none · ref 24 · internal anchor
SSAG bypasses logit suppression in five LLMs to produce harmful responses at 95% success rate and 86% lower latency; VulMine reaches 77% attack success against defenses.
"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models cs.CR · 2023-08-07 · unverdicted · none · ref 81 · internal anchor
Real-world jailbreak prompts collected from the wild achieve up to 0.95 attack success rates against major LLMs including GPT-4, with some persisting for over 240 days.
Adversarial Reframing: A Framework for Targeted Generation in Language Models cs.CR · 2026-05-20 · unverdicted · none · ref 48 · internal anchor
THREAT uses coordinated LLMs in an iterative optimization loop to generate jailbreak prompts that achieve higher success rates and lower detection rates than previous methods across tested models and datasets.
Ablating Safety: Mechanisms for Removing Alignment in Language Models for Security Applications cs.CR · 2026-05-17 · unverdicted · none · ref 37 · internal anchor
Empirical comparison of alignment ablation methods on a 60-prompt security evaluation suite shows task-only LoRA achieves 0.87 mean security score with 0.13 unsafe compliance.
LymphNode: A Plug-and-Play Access Control Method for Deep Neural Networks cs.CR · 2026-05-15 · unverdicted · none · ref 2 · internal anchor
LymphNode enforces default-deny access control on DNNs by injecting GSUAP into the feature space to neutralize utility for unauthorized queries and selectively restore it for authorized inputs carrying a stealthy credential, using under 100 samples from surrogate data.
Understanding Secret Leakage Risks in Code LLMs: A Tokenization Perspective cs.CR · 2026-04-20 · unverdicted · none · ref 14 · internal anchor
BPE tokenization creates gibberish bias in CLLMs, causing secrets with high character entropy but low token entropy to be preferentially memorized due to training data distribution shifts.
FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model cs.CR · 2025-06-06 · unverdicted · none · ref 37 · internal anchor
FedShield-LLM integrates pruning and FHE on LoRA parameters to support secure, scalable federated fine-tuning of LLMs such as Llama-2.
From Detection to Response: A Deep Learning and Retrieval-Augmented Generation Framework for Network Intrusion Mitigation cs.CR · 2026-05-18 · unverdicted · none · ref 57 · internal anchor
Ensemble of three binary DNNs classifies network flows as benign, DoS or DDoS at 99.84% and 95.30% accuracy on CICIDS2018 and UNSW-NB15, paired with RAG to generate mitigation reports that outperform vanilla LLM outputs.
Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey cs.CR · 2024-09-26 · unverdicted · none · ref 150 · internal anchor
Survey of harmful fine-tuning attacks on LLMs, their variants, defense strategies, mechanical analysis, and evaluation methodologies.
SelfGrader: LLM Jailbreak Detection via Anchored Token-Level Logits cs.CR · 2026-04-01 · unreviewed · ref 20 · internal anchor

LLaMA: Open and Efficient Foundation Language Models

hub tools

citation-role summary

citation-polarity summary

claims ledger

mega hub controls

Recognition alignment

counterfactual ablation

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer