super hub Mixed citations

Title resolution pending

Mistral 7B · 2023 · cs.CL · arXiv 2310.06825

Mixed citation behavior. Most common role is background (61%).

623 Pith papers citing it

Background 61% of classified citations

open full Pith review browse 623 citing papers more from Mistral 7B arXiv PDF

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

abstract

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B -- Instruct, that surpasses the Llama 2 13B -- Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 57 method 15 baseline 10 other 6 dataset 2

citation-polarity summary

background 55 use method 15 baseline 10 unclear 8 use dataset 2

claims ledger

abstract We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B -- Instruct, that surpasses the Llama 2 13B -- Chat model both on human and auto

authors

author = Mistral 7B

co-cited works

representative citing papers

CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges

cs.CL · 2026-06-18 · unverdicted · novelty 8.0

Presents a new expert-curated dataset of multi-turn counterspeech dialogues in five languages targeting hate against seven groups, with span annotations linking to verified external knowledge for RAG applications.

Entropy-Gated Latent Recursion

cs.LG · 2026-06-15 · unverdicted · novelty 8.0 · 2 refs

EGLR adds a deterministic layer-recursion axis gated by entropy that is complementary to temperature sampling, raising joint oracle accuracy on MATH-500 from 83.4% to 91.6% for a 3B model.

Can LLMs Write Correct TLA+ Specifications? Evaluating Natural-Language-to-TLA+ Generation

cs.AI · 2026-06-04 · accept · novelty 8.0

Across 30 LLMs and 205 TLA+ tasks, syntactic correctness reaches at most 26.6% and semantic correctness 8.6%, with all successes limited to progressive prompting and no advantage from larger models.

Do Models Share Safety Representations? Cross-Model Steering for Safe Visual Generation

cs.CV · 2026-06-03 · unverdicted · novelty 8.0

A safety direction estimated in a source LLM is transported to a target generator through lightweight alignment on benign data alone, matching native safety performance without any target-side unsafe data.

Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth

cs.CL · 2026-05-24 · unverdicted · novelty 8.0

Introduces BonaFide benchmark of 3,066 ground-truth labeled CoTs showing most faithfulness metrics perform near chance with biases and poor scaling to longer chains.

RTI-Bench: A Structured Dataset for Indian Right-to-Information Decision Analysis

cs.CL · 2026-05-16 · accept · novelty 8.0

RTI-Bench is the first publicly released structured dataset of CIC administrative decisions with outcome labels, exemption citations, IRAC reasoning, and timelines, built from 1,218 corpus cases and 298 PDFs, achieving 95.3% label precision on manual review and 57.3% accuracy on a Mistral 7B zero-Sh

Privacy Auditing with Zero (0) Training Run

cs.CR · 2026-05-14 · unverdicted · novelty 8.0

Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.

Crafting Reversible SFT Behaviors in Large Language Models

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

LCDD creates sparse carriers for SFT behaviors that SFT-Eraser can reverse, with ablations showing the sparse structure enables causal control.

DurableUn: Quantization-Induced Recovery Attacks in Machine Unlearning

cs.LG · 2026-05-04 · conditional · novelty 8.0 · 2 refs

INT4 quantization recovers up to 22 times more forgotten training data in unlearned LLMs, and the proposed DURABLEUN-SAF method is the first to maintain forgetting across BF16, INT8, and INT4 precisions.

Backdoor Attacks on Decentralised Post-Training

cs.CR · 2026-03-31 · conditional · novelty 8.0

An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.

CacheTrap: Unveiling a Stealthier Gray-Box Trojan against LLMs

cs.CR · 2025-11-27 · conditional · novelty 8.0

CacheTrap achieves 100% targeted attack success on five open-source LLMs by using an efficient search to locate and flip a single bit in the KV cache as a transient trigger, while preserving normal accuracy without the trigger.

MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation

cs.CL · 2025-07-28 · accept · novelty 8.0

MediQAl is a new French medical QA benchmark with 32k exam-sourced questions in three formats and cognitive labels, evaluated on 14 LLMs to reveal gaps between factual recall and reasoning performance.

Large Language Diffusion Models

cs.CL · 2025-02-14 · unverdicted · novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

cs.CV · 2024-08-23 · conditional · novelty 8.0

MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

cs.CL · 2024-06-27 · unverdicted · novelty 8.0

LiveBench is a contamination-limited LLM benchmark with auto-scored challenging tasks from recent sources across math, coding, reasoning and more, where top models score below 70%.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

Evaluating Very Long-Term Conversational Memory of LLM Agents

cs.CL · 2024-02-27 · unverdicted · novelty 8.0

Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.

Information Dynamics of Language Communication

cs.CL · 2026-06-29 · unverdicted · novelty 7.0

The paper defines STE and SPID, two information-theoretic measures of semantic flow and decomposition in language exchanges, and applies them to four dialogue datasets.

Anisotropy Decides Cosine vs. Rank Metrics for Text Embeddings

cs.CL · 2026-06-28 · conditional · novelty 7.0

Anisotropy, quantified by dominant-dimension variance fraction, determines the best parameter-free similarity metric for text embeddings, with rank-based metrics gaining ~20% relative where cosine is weakest.

MultiHashFormer: Hash-based Generative Language Models

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

MultiHashFormer enables hash-based autoregression in LMs by encoding tokens as multi-hash signatures, outperforming standard Transformers at 100M-3B scales while keeping parameter count constant for multilingual expansion.

Polar: A Benchmark for Evaluating Political Bias in LLMs

cs.CL · 2026-06-11 · unverdicted · novelty 7.0

Polar is a new cross-context benchmark showing LLM political bias measurements are not fixed but vary with country, issue, model, and language.

Small LLMs for Biomedical Claim Verification: Cost-Effective Fine-Tuning, Structural Dataset Shortcuts, and Cross-Domain Generalization

cs.CL · 2026-06-11 · unverdicted · novelty 7.0

Fine-tuned Mistral-7B via QLoRA achieves up to 12% higher F1 than GPT-4o on biomedical claim verification with 1008 examples, identifies a structural shortcut in SciFact, and shows robust cross-domain transfer from sound data.

Categorical Prior Lock-in: Why In-Context Learning Fails for Structured Data

cs.LG · 2026-06-10 · unverdicted · novelty 7.0

ICL in LLMs shows a sharp ceiling on categorical distributions for high-cardinality tabular data, failing to reproduce rare classes despite examples, while numerical fidelity improves.

INFRAMIND: Infrastructure-Aware Multi-Agent Orchestration

cs.AI · 2026-06-09 · unverdicted · novelty 7.0

INFRAMIND is an infrastructure-aware multi-agent orchestration framework that uses RL on a hierarchical constrained MDP to jointly optimize topology, model selection, and scheduling under dynamic load.

citing papers explorer

Showing 40 of 40 citing papers after filters.

Do Models Share Safety Representations? Cross-Model Steering for Safe Visual Generation cs.CV · 2026-06-03 · unverdicted · none · ref 25 · internal anchor
A safety direction estimated in a source LLM is transported to a target generator through lightweight alignment on benign data alone, matching native safety performance without any target-side unsafe data.
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? cs.CV · 2024-08-23 · conditional · none · ref 28 · internal anchor
MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.
Text-to-Image Models Need Less from Text Encoders Than You Think cs.CV · 2026-06-02 · unverdicted · none · ref 13 · internal anchor
A bag-of-position-tagged-words embedding guides text-to-image diffusion models as effectively as full contextual text embeddings from standard encoders.
What Makes LVLMs Hallucinate Less? Unveiling the Architectural Factors Behind Hallucination Robustness cs.CV · 2026-05-29 · unverdicted · none · ref 50 · internal anchor
The study links three LVLM architectural dimensions to three hallucination types via a new benchmark, finding that language foundation quality reduces co-occurrence errors, visual encoder strength reduces similarity errors, alignment reduces uncertainty errors, and joint visual-alignment improvement
Toward Semantic-Agnostic and Shape-Aware Vision-Language Segmentation Models cs.CV · 2026-05-27 · unverdicted · none · ref 20 · internal anchor
Introduces SANSA paradigm for semantic-agnostic vision-language segmentation via dictionary or example-based prompts, with finetuning delivering up to 20% mIoU gains on the new task while retaining standard performance.
A Sanity Check on Composed Image Retrieval cs.CV · 2026-04-14 · unverdicted · none · ref 13 · internal anchor
The paper creates FISD, a controlled benchmark for composed image retrieval that removes query ambiguity via generative models, and proposes a multi-round agentic evaluation to assess models in interactive settings.
BoxTuning: Directly Injecting the Object Box for Multimodal Model Fine-Tuning cs.CV · 2026-04-13 · unverdicted · none · ref 9 · internal anchor
By drawing object boxes and motion trails visually on video frames instead of serializing coordinates as text, BoxTuning reduces token costs dramatically and improves accuracy on video question answering benchmarks.
Large-scale Codec Avatars: The Unreasonable Effectiveness of Large-scale Avatar Pretraining cs.CV · 2026-04-02 · unverdicted · none · ref 27 · internal anchor
Pretraining on 1M wild videos followed by post-training on curated data yields high-fidelity feedforward 3D avatars that generalize across identities, clothing, and lighting with emergent relightability and loose-garment support.
StochasT: Learning with Stochastic Turn Depth for Visual Instruction Tuning cs.CV · 2026-07-01 · unverdicted · none · ref 25 · internal anchor
StochasT uses stochastic clustering of language tasks into varying turn depths for the same image to improve LVLMs on both single-turn and multi-turn scenarios without discarding data.
MIRROR: Aligning Semantic Relations from Language to Image via Gromov--Wasserstein cs.CV · 2026-06-28 · unverdicted · none · ref 22 · internal anchor
MIRROR derives a closed-form Semi-Inverse Gromov-Wasserstein loss to align language-derived relational priors with visual representations inside decoder-only Transformers.
MedCTA: A Benchmark for Clinical Tool Agents cs.CV · 2026-06-10 · unverdicted · none · ref 20 · internal anchor
MedCTA is a new benchmark with 107 real-world tasks and process-aware metrics that shows frontier multimodal models remain brittle at autonomous tool selection, execution, and trajectory completion in clinical settings.
FlowNar: Scalable Streaming Narration for Long-Form Videos cs.CV · 2026-05-30 · unverdicted · none · ref 9 · internal anchor
FlowNar achieves bounded memory and 3x higher throughput for streaming narration on Ego4D, EgoExo4D, and EpicKitchens100 by combining dynamic historical context removal with a Cross Linear Attentive Memory module.
An Analysis Focused on Womens Safety: Can VAD Models Be Enhanced by a Multi-modal Dataset? cs.CV · 2026-05-25 · unverdicted · none · ref 8 · internal anchor
Introduces ExtrAnom, a new multi-modal VAD dataset with 1001 videos and four textual descriptions per video, focused on women-centric anomalies like stalking and chain snatching.
Motion-Compensated Weight Compression cs.CV · 2026-05-23 · unverdicted · none · ref 26 · internal anchor
MCWC aligns permutation-symmetric blocks across layers to enable sequential prediction and residual entropy coding, improving rate-accuracy tradeoffs versus quantization and prior codecs on language and vision models.
Translating Signals to Languages for sEMG-Based Activity Recognition cs.CV · 2026-05-21 · unverdicted · none · ref 36 · internal anchor
LLM-sEMG maps sEMG signals to language via a dedicated mechanism to enable LLMs to perform accurate activity recognition.
Learning to See What You Need: Gaze Attention for Multimodal Large Language Models cs.CV · 2026-05-13 · unverdicted · none · ref 149 · internal anchor
Gaze Attention groups visual embeddings into selectable regions and dynamically restricts attention to task-relevant ones, matching dense baselines with up to 90% fewer visual KV entries via added context tokens.
ViLL-E: Video LLM Embeddings for Retrieval cs.CV · 2026-04-13 · unverdicted · none · ref 14 · internal anchor
ViLL-E introduces a dynamic embedding mechanism and joint contrastive-generative training for VideoLLMs, delivering up to 7% gains in temporal localization and 4% in video retrieval while enabling new zero-shot capabilities.
RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling cs.CV · 2025-10-23 · unverdicted · none · ref 28 · internal anchor
RAPO++ is a three-stage prompt optimization framework combining retrieval-augmented refinement, closed-loop test-time scaling, and LLM fine-tuning to enhance text-to-video generation quality.
ReGATE: Learning Faster and Better with Fewer Tokens in MLLMs cs.CV · 2025-07-29 · unverdicted · none · ref 16 · internal anchor
ReGATE introduces a teacher-student adaptive token elision method that reduces training tokens to 38% while matching or exceeding baseline accuracy on multimodal benchmarks.
Are We on the Right Way for Evaluating Large Vision-Language Models? cs.CV · 2024-03-29 · conditional · none · ref 18 · internal anchor
Current LVLM benchmarks overestimate capabilities because many questions can be answered without images due to design flaws or data leakage; MMStar is a human-curated set of 1,500 vision-indispensable samples across 6 capabilities and 18 axes with new metrics for leakage and true multi-modal gain.
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark cs.CV · 2023-11-28 · accept · none · ref 29 · internal anchor
MVBench is a benchmark of 20 temporal video understanding tasks built by transforming static tasks into dynamic ones, with VideoChat2 outperforming prior MLLMs by over 15%.
EchoSonar-R: A Multi-View Reasoning-Enabled Model for Disease Classification and Report Generation in Echocardiography cs.CV · 2026-06-26 · unverdicted · none · ref 18 · internal anchor
EchoSonar-R is a multi-view VLM for echocardiography that jointly does disease classification and report generation via SFT followed by GRPO reinforcement learning, reporting accuracy gains on private and public data.
SingGuard: A Policy-Adaptive Multimodal LLM Guardrail with Dynamic Reasoning cs.CV · 2026-06-22 · unverdicted · none · ref 248 · internal anchor
SingGuard introduces a policy-adaptive multimodal LLM guardrail with dynamic reasoning regimes and SingGuard-Bench, reporting SOTA F1 scores across 35 datasets and improved policy-following accuracy under runtime shifts.
Task-Aware Structured Memory for Dynamic Multi-modal In-Context Learning cs.CV · 2026-06-10 · unverdicted · none · ref 38 · internal anchor
TASM proposes a task-aware structured memory framework using task-vector compression, bipartite token merging, and a Core Memory plus Latent Bank hierarchy to enable efficient dynamic multi-modal in-context learning.
Towards a Large Language-Vision Question Answering Model for MSTAR Automatic Target Recognition cs.CV · 2026-05-11 · unverdicted · none · ref 12 · internal anchor
A fine-tuned large language-vision model achieves 98% accuracy on visual question answering for military vehicle identification in SAR imagery from an extended MSTAR benchmark.
Closing the Loop: Unified 3D Scene Generation and Immersive Interaction via LLM-RL Coupling cs.CV · 2026-05-07 · unverdicted · none · ref 41 · internal anchor
A closed-loop system couples LLM-based 3D scene generation with RL optimization and VR user interactions to produce adaptive, immersive environments, claiming SOTA results on the ALFRED benchmark.
Make Your LVLM KV Cache More Lightweight cs.CV · 2026-05-01 · unverdicted · none · ref 44 · internal anchor
LightKV compresses vision-token KV cache in LVLMs to 55% size via prompt-guided cross-modality aggregation, halving cache memory, cutting compute 40%, and maintaining performance on benchmarks.
Medical Image Understanding Improves Survival Prediction via Visual Instruction Tuning cs.CV · 2026-04-20 · unverdicted · none · ref 15 · internal anchor
A vision-language model pre-trained via instruction tuning on CT-report pairs improves survival prediction accuracy over baselines, especially when clinical data alone is weak, while also producing text answers to clinical questions.
Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations cs.CV · 2025-06-08 · unverdicted · none · ref 12 · internal anchor
Synthetic clinical demonstrations at inference time improve safety of Med-VLMs against visual and textual jailbreaks while preserving general performance on medical tasks.
Beyond Words: Multimodal LLM Knows When to Speak cs.CV · 2025-05-20 · unverdicted · none · ref 11 · internal anchor
MM-When2Speak reformulates conversational timing as dense response-type prediction and achieves up to 3x better performance by integrating video, audio, and text cues on top of an LLM backbone using a new dyadic conversation dataset.
M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation cs.CV · 2024-08-29 · unverdicted · none · ref 23 · internal anchor
M4CXR is a multi-modal large language model that performs multiple tasks in chest X-ray analysis including report generation with claimed SOTA clinical accuracy using chain-of-thought prompting.
InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning cs.CV · 2026-06-10 · unverdicted · none · ref 134 · internal anchor
InternVideo3 introduces Multimodal Contextual Reasoning and M^2LA attention to enable closed-loop evidence accumulation in long-video understanding and agentic tool use, reporting strong benchmark results.
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model cs.CV · 2025-02-14 · unverdicted · none · ref 82 · internal anchor
Step-Video-T2V describes a 30B-parameter text-to-video model with custom Video-VAE, 3D DiT, flow matching, and Video-DPO that claims state-of-the-art results on a new internal benchmark.
TerraQ: Spatiotemporal Question-Answering on Satellite Image Archives cs.CV · 2025-02-06 · unverdicted · none · ref 3 · internal anchor
TerraQ is a spatiotemporal question-answering engine for satellite image archives that processes natural language requests involving image metadata and knowledge base entities.
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction cs.CV · 2025-01-03 · conditional · none · ref 22 · internal anchor
VITA-1.5 integrates vision and speech into a single LLM through multi-stage training, delivering competitive benchmark results on image, video, and speech tasks with near real-time response speed.
Open-Sora Plan: Open-Source Large Video Generation Model cs.CV · 2024-11-28 · unverdicted · none · ref 7 · internal anchor
Open-Sora Plan presents an open-source large video generation model that combines a Wavelet-Flow VAE, Joint Image-Video Skiparse Denoiser, and multi-dimensional data curation to achieve high-quality video outputs with public code and weights.
MinerU: An Open-Source Solution for Precise Document Content Extraction cs.CV · 2024-09-27 · conditional · none · ref 12 · internal anchor
MinerU delivers an open-source pipeline for high-precision document content extraction by integrating specialized models with tuned preprocessing and postprocessing rules.
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs cs.CV · 2024-06-11 · unverdicted · none · ref 20 · internal anchor
VideoLLaMA 2 improves video LLMs via a new STC connector for spatial-temporal dynamics and joint audio training, reaching competitive results on video QA and captioning benchmarks.
Cosmos World Foundation Model Platform for Physical AI cs.CV · 2025-01-07 · unverdicted · none · ref 87 · internal anchor
The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.
Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale cs.CV · 2026-04-20 · unreviewed · ref 43 · internal anchor

Title resolution pending

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer