super hub Mixed citations

Title resolution pending

Mistral 7B · 2023 · cs.CL · arXiv 2310.06825

Mixed citation behavior. Most common role is background (61%).

575 Pith papers citing it

Background 61% of classified citations

open full Pith review browse 575 citing papers more from Mistral 7B arXiv PDF

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

abstract

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B -- Instruct, that surpasses the Llama 2 13B -- Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 57 method 15 baseline 10 other 6 dataset 2

citation-polarity summary

background 55 use method 15 baseline 10 unclear 8 use dataset 2

claims ledger

abstract We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B -- Instruct, that surpasses the Llama 2 13B -- Chat model both on human and auto

authors

author = Mistral 7B

co-cited works

representative citing papers

CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges

cs.CL · 2026-06-18 · unverdicted · novelty 8.0

Presents a new expert-curated dataset of multi-turn counterspeech dialogues in five languages targeting hate against seven groups, with span annotations linking to verified external knowledge for RAG applications.

Entropy-Gated Latent Recursion

cs.LG · 2026-06-15 · unverdicted · novelty 8.0 · 2 refs

EGLR adds a deterministic layer-recursion axis gated by entropy that is complementary to temperature sampling, raising joint oracle accuracy on MATH-500 from 83.4% to 91.6% for a 3B model.

Do Models Share Safety Representations? Cross-Model Steering for Safe Visual Generation

cs.CV · 2026-06-03 · unverdicted · novelty 8.0

A safety direction estimated in a source LLM is transported to a target generator through lightweight alignment on benign data alone, matching native safety performance without any target-side unsafe data.

Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth

cs.CL · 2026-05-24 · unverdicted · novelty 8.0

Introduces BonaFide benchmark of 3,066 ground-truth labeled CoTs showing most faithfulness metrics perform near chance with biases and poor scaling to longer chains.

RTI-Bench: A Structured Dataset for Indian Right-to-Information Decision Analysis

cs.CL · 2026-05-16 · accept · novelty 8.0

RTI-Bench is the first publicly released structured dataset of CIC administrative decisions with outcome labels, exemption citations, IRAC reasoning, and timelines, built from 1,218 corpus cases and 298 PDFs, achieving 95.3% label precision on manual review and 57.3% accuracy on a Mistral 7B zero-Sh

Privacy Auditing with Zero (0) Training Run

cs.CR · 2026-05-14 · unverdicted · novelty 8.0

Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.

Crafting Reversible SFT Behaviors in Large Language Models

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

LCDD creates sparse carriers for SFT behaviors that SFT-Eraser can reverse, with ablations showing the sparse structure enables causal control.

DurableUn: Quantization-Induced Recovery Attacks in Machine Unlearning

cs.LG · 2026-05-04 · conditional · novelty 8.0 · 2 refs

INT4 quantization recovers up to 22 times more forgotten training data in unlearned LLMs, and the proposed DURABLEUN-SAF method is the first to maintain forgetting across BF16, INT8, and INT4 precisions.

Backdoor Attacks on Decentralised Post-Training

cs.CR · 2026-03-31 · conditional · novelty 8.0

An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.

CacheTrap: Unveiling a Stealthier Gray-Box Trojan against LLMs

cs.CR · 2025-11-27 · conditional · novelty 8.0

CacheTrap achieves 100% targeted attack success on five open-source LLMs by using an efficient search to locate and flip a single bit in the KV cache as a transient trigger, while preserving normal accuracy without the trigger.

MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation

cs.CL · 2025-07-28 · accept · novelty 8.0

MediQAl is a new French medical QA benchmark with 32k exam-sourced questions in three formats and cognitive labels, evaluated on 14 LLMs to reveal gaps between factual recall and reasoning performance.

Large Language Diffusion Models

cs.CL · 2025-02-14 · unverdicted · novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

cs.CV · 2024-08-23 · conditional · novelty 8.0

MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

cs.CL · 2024-06-27 · unverdicted · novelty 8.0

LiveBench is a contamination-limited LLM benchmark with auto-scored challenging tasks from recent sources across math, coding, reasoning and more, where top models score below 70%.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

Evaluating Very Long-Term Conversational Memory of LLM Agents

cs.CL · 2024-02-27 · unverdicted · novelty 8.0

Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.

Information Dynamics of Language Communication

cs.CL · 2026-06-29 · unverdicted · novelty 7.0

The paper defines STE and SPID, two information-theoretic measures of semantic flow and decomposition in language exchanges, and applies them to four dialogue datasets.

Anisotropy Decides Cosine vs. Rank Metrics for Text Embeddings

cs.CL · 2026-06-28 · conditional · novelty 7.0

Anisotropy, quantified by dominant-dimension variance fraction, determines the best parameter-free similarity metric for text embeddings, with rank-based metrics gaining ~20% relative where cosine is weakest.

MultiHashFormer: Hash-based Generative Language Models

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

MultiHashFormer enables hash-based autoregression in LMs by encoding tokens as multi-hash signatures, outperforming standard Transformers at 100M-3B scales while keeping parameter count constant for multilingual expansion.

Text-to-Image Models Need Less from Text Encoders Than You Think

cs.CV · 2026-06-02 · unverdicted · novelty 7.0

A bag-of-position-tagged-words embedding guides text-to-image diffusion models as effectively as full contextual text embeddings from standard encoders.

ClinicalMC: A Benchmark for Multi-Course Clinical Decision-Making with Large Language Models

cs.AI · 2026-06-02 · unverdicted · novelty 7.0

ClinicalMC is a benchmark of 1,275 Chinese and 5,804 English multi-course clinical samples across four stages, evaluated via a multi-agent framework on closed-source, open-source, and medical LLMs in static and dynamic settings.

EURO-5K: When Does Domain Pretraining Matter? Benchmarking Transformers for EU Reporting Obligation Extraction

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

Introduces EURO-5K dataset from 136 EU acts and benchmarks full fine-tuning vs QLoRA for BERT and LLM models on reporting obligation extraction, reporting 0.89 F1 with limited gains from legal pretraining except under parameter-efficient adaptation.

Representational Capacity: Geometric Limits on Feature Representation in Transformer Language Models

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

Defines representational capacity as the upper bound on distinguishable near-orthogonal directions in transformer latent spaces, derived from embedding similarity distributions and an adjusted Johnson-Lindenstrauss formula dependent on the k/d ratio.

Not All Errors Are Equal: A Systematic Study of Error Propagation in Large Language Model Inference

cs.DC · 2026-06-01 · unverdicted · novelty 7.0

A new fault-injection framework enables a systematic empirical study that produces 17 takeaways on error propagation in LLM inference and four software-only mitigation directions.

citing papers explorer

Showing 32 of 32 citing papers after filters.

Privacy Auditing with Zero (0) Training Run cs.CR · 2026-05-14 · unverdicted · none · ref 21 · internal anchor
Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.
Backdoor Attacks on Decentralised Post-Training cs.CR · 2026-03-31 · conditional · none · ref 9 · internal anchor
An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.
CacheTrap: Unveiling a Stealthier Gray-Box Trojan against LLMs cs.CR · 2025-11-27 · conditional · none · ref 43 · internal anchor
CacheTrap achieves 100% targeted attack success on five open-source LLMs by using an efficient search to locate and flip a single bit in the KV cache as a transient trigger, while preserving normal accuracy without the trigger.
Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback cs.CR · 2026-05-17 · unverdicted · none · ref 22 · internal anchor
Presents TRUST-Bench benchmark for hidden-trigger tool compromises in LLM agents and VISTA-Guard framework for trajectory-aware risk scoring of final actions under untrusted feedback.
Attention Is Where You Attack cs.CR · 2026-04-30 · unverdicted · none · ref 10 · internal anchor
ARA jailbreaks safety-aligned LLMs like LLaMA-3 and Mistral by redirecting attention in safety-heavy heads with as few as 5 tokens, achieving 30-36% attack success while ablating the same heads barely affects refusals.
Provably Secure Steganography Based on List Decoding cs.CR · 2026-04-23 · conditional · none · ref 9 · internal anchor
List decoding enables a provably secure steganography scheme with higher embedding capacity for LLMs via candidate sets and suffix matching.
Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models cs.CR · 2026-04-22 · unverdicted · none · ref 9 · internal anchor
A novel function hijacking attack achieves 70-100% success rates in forcing specific function calls across five LLMs on the BFCL benchmark and is robust to context semantics.
LLM Agents can Autonomously Exploit One-day Vulnerabilities cs.CR · 2024-04-11 · unverdicted · none · ref 7 · internal anchor
GPT-4 LLM agents autonomously exploit 87% of tested one-day vulnerabilities when given CVE descriptions, far outperforming other models and tools.
When the Aggregator Cheats: Data-Free Backdoors in Federated LLM-based QA Systems cs.CR · 2026-06-25 · unverdicted · none · ref 23 · internal anchor
Two-stage gradient-inversion attack recovers 5-20% of client samples to inject stealthy ad backdoors into federated QA LLMs, reaching ~100% ASR with negligible clean-task drop.
Echoes within the Reasoning: Stealthy and Effective Watermarking via Chain of Thought cs.CR · 2026-05-27 · unverdicted · none · ref 22 · internal anchor
BiCoT embeds watermarks into the internal geometry of Chain-of-Thought reasoning traces in LLMs via private signature subspace alignment and introduces Robust Subspace Registration for black-box verification under attacks.
Ellipsoid Control: A White-list Jailbreak Defense via Benign Latent Modeling cs.CR · 2026-05-23 · unverdicted · none · ref 26 · internal anchor
Ellipsoid Control is a white-list test-time jailbreak defense that fits an anisotropic ellipsoid from benign activations to constrain projected gradient descent updates, aiming to improve the safety-utility tradeoff over black-list RepE methods.
Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation cs.CR · 2026-05-18 · unverdicted · none · ref 6 · internal anchor
P2F generates low-rank parameter increments for LLM fingerprinting directly from textual descriptions in a single forward pass.
PrivScope: Task-scoped Disclosure Control for Hybrid Agentic Systems cs.CR · 2026-05-15 · unverdicted · none · ref 47 · internal anchor
PrivScope enforces task-scoped disclosure at the local-cloud boundary in hybrid agents, eliminating profile leakage and halving re-identification risk on medical workflows while preserving task success.
EVA: Editing for Versatile Alignment against Jailbreaks cs.CR · 2026-05-14 · unverdicted · none · ref 74 · internal anchor
EVA applies direct model editing to surgically neutralize jailbreak vulnerabilities in LLMs and VLMs by targeting specific neurons while preserving general capabilities.
You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation cs.CR · 2026-05-06 · unverdicted · none · ref 70 · internal anchor
NeWTral is a non-linear weight translation framework using MoE routing that reduces average attack success rate from 70% to 13% on unsafe domain adapters across Llama, Mistral, Qwen, and Gemma models up to 72B while retaining 90% knowledge fidelity.
Block-wise Codeword Embedding for Reliable Multi-bit Text Watermarking cs.CR · 2026-05-01 · unverdicted · none · ref 7 · 2 links · internal anchor
BREW uses block voting and window-shifting verification to reach TPR 0.965 and FPR 0.02 under 10% synonym substitution, addressing high false-positive issues in prior multi-bit LLM watermarking.
TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning cs.CR · 2026-04-30 · unverdicted · none · ref 14 · internal anchor
TwinGate deploys a stateful dual-encoder system with asymmetric contrastive learning to detect decompositional jailbreaks in untraceable LLM traffic at high recall and low false-positive rate with negligible latency.
AVISE: Framework for Evaluating the Security of AI Systems cs.CR · 2026-04-22 · unverdicted · none · ref 47 · internal anchor
AVISE provides a new framework and automated SET that identifies jailbreak vulnerabilities in language models with 92% accuracy, finding all nine tested models vulnerable to an augmented Red Queen attack.
SAGE: Signal-Amplified Guided Embeddings for LLM-based Vulnerability Detection cs.CR · 2026-04-21 · unverdicted · none · ref 25 · internal anchor
SAGE uses sparse autoencoders to boost vulnerability signals in LLMs, raising internal SNR 12.7x and delivering up to 318% MCC gains on vulnerability detection benchmarks.
BackFlush: Knowledge-Free Backdoor Detection and Elimination with Watermark Preservation in Large Language Models cs.CR · 2026-04-15 · unverdicted · none · ref 42 · internal anchor
BackFlush detects backdoors via susceptibility amplification and eliminates them with RoPE unlearning to reach 1% ASR and 99% clean accuracy while preserving watermarks.
ADAM: A Systematic Data Extraction Attack on Agent Memory via Adaptive Querying cs.CR · 2026-04-10 · unverdicted · none · ref 9 · internal anchor
ADAM extracts data from LLM agent memory with up to 100% attack success rate by estimating data distribution and selecting queries via entropy guidance.
The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training cs.CR · 2026-04-09 · unverdicted · none · ref 31 · internal anchor
ORPO is most effective at misaligning LLMs while DPO excels at realigning them, though it reduces utility, revealing an asymmetry between attack and defense methods.
TrajGuard: Streaming Hidden-state Trajectory Detection for Decoding-time Jailbreak Defense cs.CR · 2026-04-09 · unverdicted · none · ref 12 · internal anchor
TrajGuard detects jailbreaks by tracking how hidden-state trajectories move toward high-risk regions during decoding, achieving 95% defense rate with 5.2 ms/token latency across tested attacks.
Semantic Intent Fragmentation: A Single-Shot Compositional Attack on Multi-Agent AI Pipelines cs.CR · 2026-04-08 · unverdicted · none · ref 21 · internal anchor
A single legitimate request can cause LLM orchestrators to output plans that violate security policies through the composition of benign subtasks, bypassing subtask-level checks.
Uncovering and Aligning Anomalous Attention Heads to Defend Against NLP Backdoor Attacks cs.CR · 2025-11-16 · unverdicted · none · ref 1 · internal anchor
Backdoor defense for LLMs detects anomalous attention-head similarity on triggers and applies head-wise alignment via fine-tuning to reduce attack success.
ShadowCoT: Cognitive Hijacking for Stealthy Reasoning Backdoors in LLMs cs.CR · 2025-04-08 · unverdicted · none · ref 36 · internal anchor
ShadowCoT introduces a reasoning-level backdoor attack on LLMs achieving 94.4% attack success rate and 88.4% hijacking success rate with 0.15% parameter updates via internal state conditioning and reasoning chain pollution.
Uncovering Logit Suppression Vulnerabilities in LLM Safety Alignment cs.CR · 2024-05-20 · unverdicted · none · ref 10 · internal anchor
SSAG bypasses logit suppression in five LLMs to produce harmful responses at 95% success rate and 86% lower latency; VulMine reaches 77% attack success against defenses.
Ablating Safety: Mechanisms for Removing Alignment in Language Models for Security Applications cs.CR · 2026-05-17 · unverdicted · none · ref 20 · internal anchor
Empirical comparison of alignment ablation methods on a 60-prompt security evaluation suite shows task-only LoRA achieves 0.87 mean security score with 0.13 unsafe compliance.
A Sociotechnical, Practitioner-Centered Approach to Technology Adoption in Cybersecurity Operations: An LLM Case cs.CR · 2026-04-23 · unverdicted · none · ref 54 · internal anchor
A six-month ethnographic co-creation project in a real SOC demonstrates that practitioner involvement in LLM tool design can overcome typical adoption barriers in cybersecurity operations.
LanG -- A Governance-Aware Agentic AI Platform for Unified Security Operations cs.CR · 2026-04-07 · unverdicted · none · ref 40 · internal anchor
LanG presents a governance-aware agentic AI platform for unified security operations that reports strong performance on incident correlation, rule generation, attack reconstruction, and AI safety guardrails in an open-source package.
Mitigating Watermark Forgery in Generative Models via Randomized Key Selection cs.CR · 2025-07-10 · unverdicted · none · ref 19 · internal anchor
Randomized per-query key selection with single-key detection acceptance bounds forgery success rate independently of collected samples while preserving model utility.
Through the Stealth Lens: Attention-Aware Defenses Against Poisoning in RAG cs.CR · 2025-06-04 · unverdicted · none · ref 46 · internal anchor
Introduces NPAS and AV Filter using LLM attention weights to defend RAG against poisoning, reporting up to 20% accuracy gains while adaptive attacks reach 35% success.

Title resolution pending

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer