Super hub · Mixed citations

Title resolution pending

Mistral 7B · 2023 · cs.CL · arXiv 2310.06825

Mixed citation behavior. The most common role is background (61% of classified citations).

546 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

abstract

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B – Instruct, that surpasses the Llama 2 13B – Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.
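
The abstract only names its two attention mechanisms. As a hedged illustration, not Mistral's reference code, the sketch below expresses a sliding-window causal mask and the query-head to key/value-head grouping used by grouped-query attention. The function names, the toy sizes, and the convention that the window counts the current token are all assumptions made for this example.

```python
# Illustrative sketch only; not Mistral's reference implementation.
# Assumptions: the window counts the current token, and query heads are
# grouped contiguously onto key/value heads. Sizes below are toy values.


def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Causal: j <= i (no attention to future tokens).
    Sliding window: i - j < window, so each query sees at most `window`
    keys (itself plus the window - 1 tokens before it), whatever seq_len is.
    """
    return [[j <= i and i - j < window for j in range(seq_len)] for i in range(seq_len)]


def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Grouped-query attention: each block of n_q_heads // n_kv_heads query
    heads shares one key/value head, shrinking the KV cache by that factor."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


if __name__ == "__main__":
    # Print the mask as a grid: "x" = may attend, "." = masked out.
    for row in sliding_window_causal_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
    # 8 query heads sharing 2 key/value heads (toy numbers).
    print([kv_head_for_query_head(h, n_q_heads=8, n_kv_heads=2) for h in range(8)])
```

Running it prints a banded lower-triangular grid (`x.....`, `xx....`, `xxx...`, `.xxx..`, `..xxx.`, `...xxx`) followed by `[0, 0, 0, 0, 1, 1, 1, 1]`. Because each row has at most `window` allowed keys, per-token attention work stays bounded by the window rather than growing with sequence length.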

citation-role summary

background 57 · method 15 · baseline 10 · other 6 · dataset 2

citation-polarity summary

background 55 · use method 15 · baseline 10 · unclear 8 · use dataset 2

authors

Albert Q. Jiang et al.

representative citing papers

CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges

cs.CL · 2026-06-18 · unverdicted · novelty 8.0

Presents a new expert-curated dataset of multi-turn counterspeech dialogues in five languages targeting hate against seven groups, with span annotations linking to verified external knowledge for RAG applications.

Entropy-Gated Latent Recursion

cs.LG · 2026-06-15 · unverdicted · novelty 8.0 · 2 refs

EGLR adds a deterministic layer-recursion axis gated by entropy that is complementary to temperature sampling, raising joint oracle accuracy on MATH-500 from 83.4% to 91.6% for a 3B model.

Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth

cs.CL · 2026-05-24 · unverdicted · novelty 8.0

Introduces BonaFide benchmark of 3,066 ground-truth labeled CoTs showing most faithfulness metrics perform near chance with biases and poor scaling to longer chains.

RTI-Bench: A Structured Dataset for Indian Right-to-Information Decision Analysis

cs.CL · 2026-05-16 · accept · novelty 8.0

RTI-Bench is the first publicly released structured dataset of CIC administrative decisions with outcome labels, exemption citations, IRAC reasoning, and timelines, built from 1,218 corpus cases and 298 PDFs, achieving 95.3% label precision on manual review and 57.3% accuracy on a Mistral 7B zero-Sh

Privacy Auditing with Zero (0) Training Run

cs.CR · 2026-05-14 · unverdicted · novelty 8.0

Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.

Crafting Reversible SFT Behaviors in Large Language Models

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

LCDD creates sparse carriers for SFT behaviors that SFT-Eraser can reverse, with ablations showing the sparse structure enables causal control.

DurableUn: Quantization-Induced Recovery Attacks in Machine Unlearning

cs.LG · 2026-05-04 · conditional · novelty 8.0 · 2 refs

INT4 quantization recovers up to 22 times more forgotten training data in unlearned LLMs, and the proposed DURABLEUN-SAF method is the first to maintain forgetting across BF16, INT8, and INT4 precisions.

Backdoor Attacks on Decentralised Post-Training

cs.CR · 2026-03-31 · conditional · novelty 8.0

An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.

CacheTrap: Unveiling a Stealthier Gray-Box Trojan against LLMs

cs.CR · 2025-11-27 · conditional · novelty 8.0

CacheTrap achieves 100% targeted attack success on five open-source LLMs by using an efficient search to locate and flip a single bit in the KV cache as a transient trigger, while preserving normal accuracy without the trigger.

MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation

cs.CL · 2025-07-28 · accept · novelty 8.0

MediQAl is a new French medical QA benchmark with 32k exam-sourced questions in three formats and cognitive labels, evaluated on 14 LLMs to reveal gaps between factual recall and reasoning performance.

Large Language Diffusion Models

cs.CL · 2025-02-14 · unverdicted · novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

cs.CV · 2024-08-23 · conditional · novelty 8.0

MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

cs.CL · 2024-06-27 · unverdicted · novelty 8.0

LiveBench is a contamination-limited LLM benchmark with auto-scored challenging tasks from recent sources across math, coding, reasoning and more, where top models score below 70%.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

Evaluating Very Long-Term Conversational Memory of LLM Agents

cs.CL · 2024-02-27 · unverdicted · novelty 8.0

Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.

Information Dynamics of Language Communication

cs.CL · 2026-06-29 · unverdicted · novelty 7.0

The paper defines STE and SPID, two information-theoretic measures of semantic flow and decomposition in language exchanges, and applies them to four dialogue datasets.

Anisotropy Decides Cosine vs. Rank Metrics for Text Embeddings

cs.CL · 2026-06-28 · conditional · novelty 7.0

Anisotropy, quantified by dominant-dimension variance fraction, determines the best parameter-free similarity metric for text embeddings, with rank-based metrics gaining ~20% relative improvement where cosine is weakest.

MultiHashFormer: Hash-based Generative Language Models

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

MultiHashFormer enables hash-based autoregression in LMs by encoding tokens as multi-hash signatures, outperforming standard Transformers at 100M-3B scales while keeping parameter count constant for multilingual expansion.

Next-Billion AI Index: The compass for AI utility and adoption in the global majority

cs.CY · 2026-05-29 · unverdicted · novelty 7.0

Introduces nexbax, a diagnostic framework with three themes and 10 dimensions for evaluating AI economic viability, operational practicality, and societal integrity in next-billion-user contexts.

Vector Linking via Cross-Model Local Isometric Consistency

cs.AI · 2026-05-29 · unverdicted · novelty 7.0

A reference-based geometric hashing method recovers cross-model vector correspondences by exploiting local isometric consistency in contrastive embeddings and iteratively bootstrapping from a seed of paired anchors.

What Makes LVLMs Hallucinate Less? Unveiling the Architectural Factors Behind Hallucination Robustness

cs.CV · 2026-05-29 · unverdicted · novelty 7.0

The study links three LVLM architectural dimensions to three hallucination types via a new benchmark, finding that language foundation quality reduces co-occurrence errors, visual encoder strength reduces similarity errors, alignment reduces uncertainty errors, and joint visual-alignment improvement

Every Act Has Its Price: Compressed Moral Composition in Frontier LLMs

cs.CL · 2026-05-29 · unverdicted · novelty 7.0

Moral Trolley Arena shows frontier LLMs produce composite moral preferences that are compressed rather than additive functions of calibrated component act strengths across Moral Foundations Theory.

Toward Semantic-Agnostic and Shape-Aware Vision-Language Segmentation Models

cs.CV · 2026-05-27 · unverdicted · novelty 7.0

Introduces SANSA paradigm for semantic-agnostic vision-language segmentation via dictionary or example-based prompts, with finetuning delivering up to 20% mIoU gains on the new task while retaining standard performance.

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

MentalMap benchmark identifies a universal L3 reasoning cliff in LLMs' text-based spatial reasoning that persists across languages, scales, and prompting, and is replicated in human evaluations.

citing papers explorer

Showing 46 of 546 citing papers.

Open-Sora Plan: Open-Source Large Video Generation Model cs.CV · 2024-11-28 · unverdicted · none · ref 7 · internal anchor
Open-Sora Plan presents an open-source large video generation model that combines a Wavelet-Flow VAE, Joint Image-Video Skiparse Denoiser, and multi-dimensional data curation to achieve high-quality video outputs with public code and weights.
Are Decoder-Only Large Language Models the Silver Bullet for Code Search? cs.SE · 2024-10-29 · unverdicted · none · ref 60 · internal anchor
Fine-tuned decoder-only LLMs achieve up to 40.4% higher MAP than UniXcoder on CoSQA+ for code search, with non-monotonic size scaling and data composition sensitivity.
MinerU: An Open-Source Solution for Precise Document Content Extraction cs.CV · 2024-09-27 · conditional · none · ref 12 · internal anchor
MinerU delivers an open-source pipeline for high-precision document content extraction by integrating specialized models with tuned preprocessing and postprocessing rules.
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation cs.RO · 2024-09-19 · unverdicted · none · ref 6 · internal anchor
TinyVLA achieves faster inference and higher data efficiency than OpenVLA on robotic manipulation tasks by initializing from high-speed multimodal models and adding a diffusion policy decoder, without any pre-training phase.
Qwen2.5-Coder Technical Report cs.CL · 2024-09-18 · unverdicted · none · ref 23 · internal anchor
Qwen2.5-Coder models claim state-of-the-art results on over 10 code benchmarks, outperforming models of similar size and even some larger ones.
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs cs.CV · 2024-06-11 · unverdicted · none · ref 20 · internal anchor
VideoLLaMA 2 improves video LLMs via a new STC connector for spatial-temporal dynamics and joint audio training, reaching competitive results on video QA and captioning benchmarks.
Automatic Combination of Sample Selection Strategies for Few-Shot Learning cs.LG · 2024-02-05 · unverdicted · none · ref 6 · internal anchor
ACSESS automatically combines 23 sample selection strategies to outperform individual strategies in few-shot learning on text and image datasets.
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism cs.CL · 2024-01-05 · unverdicted · none · ref 135 · internal anchor
DeepSeek LLM 67B exceeds LLaMA-2 70B on code, mathematics and reasoning benchmarks after pre-training on 2 trillion tokens and alignment via SFT and DPO.
Customized Generative AI Agent for Transportation Engineering Practice: A Development and Continued Pre-training Guideline cs.AI · 2026-06-27 · unverdicted · none · ref 16 · internal anchor
A framework is described for adapting six LLMs to transportation engineering via LoRA-based continued pretraining on domain documents, with two models showing strongest results on BLEU-4 and ROUGE metrics.
Hybrid Metadata Extraction from League of Nations Index Cards: From Feasibility Study to Archival System Integration cs.DL · 2026-06-04 · unverdicted · none · ref 8 · internal anchor
A hybrid AI workflow combining fine-tuned vision-language models with specialized OCR extracts metadata from League of Nations index cards for improved archival access in the LONTAD project.
Mellum2 Technical Report cs.CL · 2026-05-29 · unverdicted · none · ref 33 · internal anchor
Mellum 2 is a 12B MoE model with 2.5B active parameters, trained on 10.6T tokens with MoE, GQA, SWA, and MTP, then post-trained into Instruct and Thinking variants; it is claimed to be competitive with 4B-14B models at the compute cost of 2.5B active parameters.
K-Quantization and its Impact on Output Performance cs.CL · 2026-05-19 · unverdicted · none · ref 38 · internal anchor
Empirical evaluation of quantization effects on eight LLMs across bit widths, showing performance generally declines at lower precision but with model-size-dependent resilience and acceptable accuracy at 2 bits for many cases.
Neural Activation Patterns Across Language Model Architectures: A Comprehensive Analysis of Cognitive Task Performance cs.CL · 2026-05-14 · unverdicted · none · ref 2 · internal anchor
Analysis of 144 task-model pairs finds mathematical reasoning produces the highest attention entropy in all architectures while decoder models show significantly higher sparsity than encoders.
Transformer Scalability Crisis: The First Comprehensive Empirical Analysis of Performance Walls in Modern Language Models cs.LG · 2026-05-14 · unverdicted · none · ref 32 · internal anchor
Empirical tests on 118 transformers show success falling from 88.1% at 512 tokens to 0% at 2048 tokens, with compressed models achieving 649.2 tokens/sec/M parameters versus 12.5 for large generative ones.
Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers cs.LG · 2026-05-09 · unverdicted · none · ref 14 · internal anchor
This survey organizes LLM optimizer literature into categories and argues the field is shifting toward rigorous, multi-factor comparisons of convergence, memory, stability, and complexity.
Reinforcement Learning for Scalable and Trustworthy Intelligent Systems cs.LG · 2026-05-08 · unverdicted · none · ref 144 · internal anchor
Advances reinforcement learning for communication-efficient federated optimization and for preference-aligned, contextually safe policies in large language models.
Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities cs.DC · 2026-04-24 · unverdicted · none · ref 76 · internal anchor
A survey synthesizing challenges, system architectures, model optimizations, deployment methods, and resource management techniques for large language model inference at the network edge.
SLM Finetuning for Natural Language to Domain Specific Code Generation in Production cs.LG · 2026-04-10 · unverdicted · none · ref 13 · internal anchor
Fine-tuned small language models outperform larger models on natural language to domain-specific code generation in both quality and latency, and can adapt to customer-specific scenarios without losing general capabilities.
Developing an ESG-Oriented Large Language Model through ESG Practices cs.CE · 2026-03-20 · unverdicted · none · ref 14 · internal anchor
ESG-adapted versions of Qwen-3-4B using LoRA and IRM outperform the base model and Llama-3/Gemma-3 baselines on generative ESG question-answering tasks.
AgriIR: A Scalable Framework for Domain-Specific Knowledge Retrieval cs.IR · 2026-03-17 · unverdicted · none · ref 18 · internal anchor
AgriIR is a configurable RAG framework using modular stages and 1B-parameter models to deliver grounded, citable answers for Indian agricultural information access.
LLMOrbit: A Circular Taxonomy of Large Language Models -From Scaling Walls to Agentic AI Systems cs.LG · 2026-01-20 · unverdicted · none · ref 78 · internal anchor
A survey taxonomy of LLMs identifies three scaling crises and six efficiency paradigms while tracing the shift from generation to tool-using agents.
Multilingual Vision-Language Models, A Survey cs.CL · 2025-09-26 · accept · none · ref 70 · internal anchor
The survey identifies a key tension in multilingual vision-language models between language neutrality via contrastive learning and cultural awareness via diverse data, with most benchmarks relying on translation-based evaluation.
Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation cs.CL · 2025-04-02 · unverdicted · none · ref 115 · internal anchor
A literature survey that organizes prompting, fine-tuning, preference optimization, and context-aware techniques for LLM-based machine translation with emphasis on low-resource languages.
Enhancing LLMs for Identifying and Prioritizing Important Medical Jargons from Electronic Health Record Notes Utilizing Data Augmentation cs.CL · 2025-02-22 · unverdicted · none · ref 80 · internal anchor
Fine-tuning and data augmentation improve LLM performance on medical jargon extraction and prioritization from EHR notes, with augmented open-source models sometimes outperforming closed-source ones on 106 annotated notes.
Cosmos World Foundation Model Platform for Physical AI cs.CV · 2025-01-07 · unverdicted · none · ref 87 · internal anchor
The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.
Qwen2.5 Technical Report cs.CL · 2024-12-19 · unverdicted · none · ref 24 · internal anchor
Qwen2.5 LLMs scale pre-training data to 18 trillion tokens and apply multistage reinforcement learning, achieving benchmark performance competitive with models up to 5 times larger.
Precision or Peril: A PoC of Python Code Quality from Quantized Large Language Models cs.SE · 2024-11-16 · unverdicted · none · ref 29 · internal anchor
Smaller LLMs produce functional but limited Python code with variable quantization effects and quality/maintainability concerns that require validation before use.
Large Language Models: A Survey cs.CL · 2024-02-09 · accept · none · ref 65 · internal anchor
The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.
Data-Centric Foundation Models in Computational Healthcare: A Survey cs.LG · 2024-01-04 · unverdicted · none · ref 128 · internal anchor
The paper surveys data-centric strategies for foundation models in computational healthcare and supplies a curated list of related models and datasets.
Findings of the Counter Turing Test: AI-Generated Text Detection cs.CL · 2026-05-20 · unverdicted · none · ref 25 · internal anchor
Shared task findings show near-perfect binary detection of AI-generated text but greater difficulty in attributing outputs to particular language models.
High-Dimensional Statistics: Reflections on Progress and Open Problems math.ST · 2026-05-06 · unverdicted · none · ref 50 · 2 links · internal anchor
This review synthesizes representative advances in high-dimensional statistics, highlights common themes and open problems, and points to key entry works.
Applications of Large Language Models in Radiation Oncology: From Workflow Automation to Clinical Intelligence physics.med-ph · 2026-04-03 · unverdicted · none · ref 55 · internal anchor
This review summarizes how large language models are being used for workflow automation, clinical decision support, and patient engagement in radiation oncology.
Towards Efficient Large Vision-Language Models: A Comprehensive Survey on Inference Strategies cs.LG · 2026-03-30 · unverdicted · none · ref 27 · internal anchor
The paper surveys and taxonomizes inference optimization methods for large vision-language models across four categories while noting limitations and open problems.
Comparative Evaluation of Embedding Representations for Financial News Sentiment Analysis cs.LG · 2025-12-15 · unverdicted · none · ref 10 · internal anchor
On 349 financial headlines, embedding-based gradient boosting models overfit small validation sets and underperform trivial baselines, showing that embedding quality alone cannot solve low-data sentiment classification.
Small Language Models (SLMs) Can Still Pack a Punch: A survey (updated 2026) cs.CL · 2025-01-03 · unverdicted · none · ref 57 · internal anchor
A literature survey of Small Language Models (1-8B parameters) that can perform comparably to or better than larger models, covering general-purpose and task-specific approaches plus creation techniques.
Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale cs.CV · 2026-04-20 · unreviewed · ref 43 · internal anchor
TInR: Exploring Tool-Internalized Reasoning in Large Language Models cs.CL · 2026-04-12 · unreviewed · ref 1 · internal anchor
Learning to Predict Future-Aligned Research Proposals with Language Models cs.CL · 2026-03-28 · unreviewed · ref 1 · internal anchor
SciHorizon-GENE: Benchmarking LLM for Life Sciences Inference from Gene Knowledge to Functional Understanding q-bio.GN · 2026-01-19 · unreviewed · ref 24 · internal anchor
When Identity Overrides Incentives: Representational Choices as Governance Decisions in Multi-Agent LLM Systems cs.MA · 2026-01-15 · unreviewed · ref 29 · internal anchor
Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches cs.CL · 2025-12-14 · unreviewed · ref 11 · internal anchor
Selective Rotary Position Embedding cs.CL · 2025-11-21 · unreviewed · ref 28 · internal anchor
Teach a Reward Model to Correct Itself: Reward Guided Adversarial Failure Discovery for Robust Reward Modeling cs.CL · 2025-07-08 · unreviewed · ref 12 · internal anchor
From Fragments to Facts: A Curriculum-Driven DPO Approach for Generating Hindi News Veracity Explanations cs.CL · 2025-07-07 · unreviewed · ref 19 · internal anchor
How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective stat.ME · 2025-02-25 · unreviewed · ref 31 · internal anchor
Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees cs.CL · 2024-10-21 · unreviewed · ref 14 · internal anchor