super hub Mixed citations

Title resolution pending

Mistral 7B · 2023 · cs.CL · arXiv 2310.06825

Mixed citation behavior. Most common role is background (61%).

511 Pith papers citing it

Background 61% of classified citations

open full Pith review browse 511 citing papers more from Mistral 7B arXiv PDF

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

abstract

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B -- Instruct, that surpasses the Llama 2 13B -- Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 57 method 15 baseline 10 other 6 dataset 2

citation-polarity summary

background 55 use method 15 baseline 10 unclear 8 use dataset 2

claims ledger

abstract We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B -- Instruct, that surpasses the Llama 2 13B -- Chat model both on human and auto

authors

author = Mistral 7B

co-cited works

representative citing papers

CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges

cs.CL · 2026-06-18 · unverdicted · novelty 8.0

Presents a new expert-curated dataset of multi-turn counterspeech dialogues in five languages targeting hate against seven groups, with span annotations linking to verified external knowledge for RAG applications.

RTI-Bench: A Structured Dataset for Indian Right-to-Information Decision Analysis

cs.CL · 2026-05-16 · accept · novelty 8.0

RTI-Bench is the first publicly released structured dataset of CIC administrative decisions with outcome labels, exemption citations, IRAC reasoning, and timelines, built from 1,218 corpus cases and 298 PDFs, achieving 95.3% label precision on manual review and 57.3% accuracy on a Mistral 7B zero-Sh

Privacy Auditing with Zero (0) Training Run

cs.CR · 2026-05-14 · unverdicted · novelty 8.0

Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.

Crafting Reversible SFT Behaviors in Large Language Models

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

LCDD creates sparse carriers for SFT behaviors that SFT-Eraser can reverse, with ablations showing the sparse structure enables causal control.

DurableUn: Quantization-Induced Recovery Attacks in Machine Unlearning

cs.LG · 2026-05-04 · conditional · novelty 8.0 · 2 refs

INT4 quantization recovers up to 22 times more forgotten training data in unlearned LLMs, and the proposed DURABLEUN-SAF method is the first to maintain forgetting across BF16, INT8, and INT4 precisions.

Backdoor Attacks on Decentralised Post-Training

cs.CR · 2026-03-31 · conditional · novelty 8.0

An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.

CacheTrap: Unveiling a Stealthier Gray-Box Trojan against LLMs

cs.CR · 2025-11-27 · conditional · novelty 8.0

CacheTrap achieves 100% targeted attack success on five open-source LLMs by using an efficient search to locate and flip a single bit in the KV cache as a transient trigger, while preserving normal accuracy without the trigger.

MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation

cs.CL · 2025-07-28 · accept · novelty 8.0

MediQAl is a new French medical QA benchmark with 32k exam-sourced questions in three formats and cognitive labels, evaluated on 14 LLMs to reveal gaps between factual recall and reasoning performance.

Large Language Diffusion Models

cs.CL · 2025-02-14 · unverdicted · novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

cs.CV · 2024-08-23 · conditional · novelty 8.0

MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

cs.CL · 2024-06-27 · unverdicted · novelty 8.0

LiveBench is a contamination-limited LLM benchmark with auto-scored challenging tasks from recent sources across math, coding, reasoning and more, where top models score below 70%.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

Evaluating Very Long-Term Conversational Memory of LLM Agents

cs.CL · 2024-02-27 · unverdicted · novelty 8.0

Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.

MultiHashFormer: Hash-based Generative Language Models

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

MultiHashFormer enables hash-based autoregression in LMs by encoding tokens as multi-hash signatures, outperforming standard Transformers at 100M-3B scales while keeping parameter count constant for multilingual expansion.

Next-Billion AI Index: The compass for AI utility and adoption in the global majority

cs.CY · 2026-05-29 · unverdicted · novelty 7.0

Introduces nexbax, a diagnostic framework with three themes and 10 dimensions for evaluating AI economic viability, operational practicality, and societal integrity in next-billion-user contexts.

Vector Linking via Cross-Model Local Isometric Consistency

cs.AI · 2026-05-29 · unverdicted · novelty 7.0

A reference-based geometric hashing method recovers cross-model vector correspondences by exploiting local isometric consistency in contrastive embeddings and iteratively bootstrapping from a seed of paired anchors.

What Makes LVLMs Hallucinate Less? Unveiling the Architectural Factors Behind Hallucination Robustness

cs.CV · 2026-05-29 · unverdicted · novelty 7.0

The study links three LVLM architectural dimensions to three hallucination types via a new benchmark, finding that language foundation quality reduces co-occurrence errors, visual encoder strength reduces similarity errors, alignment reduces uncertainty errors, and joint visual-alignment improvement

Every Act Has Its Price: Compressed Moral Composition in Frontier LLMs

cs.CL · 2026-05-29 · unverdicted · novelty 7.0

Moral Trolley Arena shows frontier LLMs produce composite moral preferences that are compressed rather than additive functions of calibrated component act strengths across Moral Foundations Theory.

Toward Semantic-Agnostic and Shape-Aware Vision-Language Segmentation Models

cs.CV · 2026-05-27 · unverdicted · novelty 7.0

Introduces SANSA paradigm for semantic-agnostic vision-language segmentation via dictionary or example-based prompts, with finetuning delivering up to 20% mIoU gains on the new task while retaining standard performance.

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

MentalMap benchmark identifies a universal L3 reasoning cliff in LLMs' text-based spatial reasoning that persists across languages, scales, and prompting, and is replicated in human evaluations.

MATCHA: Matching Text via Contrastive Semantic Alignment

cs.CL · 2026-05-26 · unverdicted · novelty 7.0

MATCHA introduces a dual-view contrastive metric measuring proximity to gold text and distance from adversarial contradictions, outperforming ROUGE and BERTScore by up to 20% on TruthfulQA and other NLP benchmarks.

StakeBench: Evaluating Language Understanding Grounded in Market Commitment

cs.CL · 2026-05-25 · unverdicted · novelty 7.0

StakeBench is a new benchmark using market-derived supervision from resolved prediction markets to test LLMs on commitment detection, side identification, action anticipation, and odds projection, revealing partial success on sides but structural failures on higher tasks.

Convergence Without Understanding: When Language Models Agree on Representations but Disagree on Reasoning

cs.CL · 2026-05-22 · unverdicted · novelty 7.0

Representational convergence across 16 LLMs on 800 reasoning problems is stronger for failed tasks and pre-decision stages but shows minimal causal influence on predictions, pointing to shared processing constraints over shared reasoning.

Layer-wise Token Compression for Efficient Document Reranking

cs.IR · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

Layer-wise Token Compression applies adaptive token pooling at middle transformer layers for cross-encoder rerankers, preserving MS MARCO ranking quality while raising QPS up to 25% on passages and 116% on documents, with added gains on listwise LLM rerankers and a regularizer effect for long inputs

citing papers explorer

Showing 50 of 511 citing papers.

Sub-Token Routing in LoRA for Adaptation and Query-Aware KV Compression cs.LG · 2026-04-23 · unverdicted · none · ref 5 · internal anchor
Sub-token routing in LoRA-adapted transformers adds a finer compression axis for KV caches, with query-independent and query-aware designs that improve efficiency under reduced budgets when combined with token-level selection.
SparKV: Overhead-Aware KV Cache Loading for Efficient On-Device LLM Inference cs.NI · 2026-04-23 · unverdicted · none · ref 5 · internal anchor
SparKV reduces time-to-first-token by 1.3x-5.1x and energy use by 1.5x-3.3x for on-device LLM inference by adaptively choosing between cloud KV streaming and local computation while overlapping execution and adjusting for runtime conditions.
AVISE: Framework for Evaluating the Security of AI Systems cs.CR · 2026-04-22 · unverdicted · none · ref 47 · internal anchor
AVISE provides a new framework and automated SET that identifies jailbreak vulnerabilities in language models with 92% accuracy, finding all nine tested models vulnerable to an augmented Red Queen attack.
Rethinking Intrinsic Dimension Estimation in Neural Representations cs.LG · 2026-04-22 · unverdicted · none · ref 62 · internal anchor
Common ID estimators fail to track the true intrinsic dimension of neural representations and are instead driven by other factors.
From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization cs.CL · 2026-04-21 · unverdicted · none · ref 32 · internal anchor
LLM 2-bit quantization fails via either cumulative signal degradation or early computation collapse in key components.
Evaluation-driven Scaling for Scientific Discovery cs.LG · 2026-04-21 · unverdicted · none · ref 57 · internal anchor
SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdos constructions.
Mind the Unseen Mass: Unmasking LLM Hallucinations via Soft-Hybrid Alphabet Estimation cs.CL · 2026-04-21 · unverdicted · none · ref 15 · internal anchor
SHADE adaptively combines coverage and spectral signals to estimate semantic alphabet size from few LLM samples, yielding better performance than baselines in low-sample regimes for alphabet estimation and QA error detection.
SAGE: Signal-Amplified Guided Embeddings for LLM-based Vulnerability Detection cs.CR · 2026-04-21 · unverdicted · none · ref 25 · internal anchor
SAGE uses sparse autoencoders to boost vulnerability signals in LLMs, raising internal SNR 12.7x and delivering up to 318% MCC gains on vulnerability detection benchmarks.
Reasoning Structure Matters for Safety Alignment of Reasoning Models cs.AI · 2026-04-21 · unverdicted · none · ref 2 · internal anchor
Changing the internal reasoning structure of large reasoning models through simple supervised fine-tuning on 1K examples produces strong safety alignment that generalizes across tasks and languages.
MFMDQwen: Multilingual Financial Misinformation Detection Based on Large Language Model cs.CE · 2026-04-20 · unverdicted · none · ref 7 · internal anchor
MFMDQwen is the first open-source LLM for multilingual financial misinformation detection, backed by a new instruction dataset and benchmark on which it outperforms other open-source models.
AQPIM: Breaking the PIM Capacity Wall for LLMs with In-Memory Activation Quantization cs.AR · 2026-04-20 · unverdicted · none · ref 34 · internal anchor
AQPIM performs in-memory product quantization of activations for LLMs on PIM hardware, reducing GPU-CPU communication by 90-98.5% and delivering 3.4x speedup over prior PIM methods.
TLoRA: Task-aware Low Rank Adaptation of Large Language Models cs.CL · 2026-04-20 · unverdicted · none · ref 23 · internal anchor
TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.
Process Reward Models Meet Planning: Generating Precise and Scalable Datasets for Step-Level Rewards cs.CL · 2026-04-20 · unverdicted · none · ref 33 · internal anchor
PDDL planning problems are used to generate about one million precise reasoning steps for training Process Reward Models, and adding this data to existing datasets improves LLM performance on both mathematical and non-mathematical reasoning benchmarks.
Adversarial Arena: Crowdsourcing Data Generation through Interactive Competition cs.AI · 2026-04-20 · unverdicted · none · ref 51 · internal anchor
Adversarial competition between attacker and defender teams generates diverse multi-turn conversational data that improves LLM performance on secure code generation benchmarks by 18-29%.
Beyond Fine-Tuning: In-Context Learning and Chain-of-Thought for Reasoned Distractor Generation cs.CL · 2026-04-19 · unverdicted · none · ref 130 · internal anchor
LLMs prompted with few-shot examples and rationales generate better reasoned distractors for MCQs than fine-tuned contrastive models across six benchmarks.
Phase-Scheduled Multi-Agent Systems for Token-Efficient Coordination cs.AI · 2026-04-19 · unverdicted · none · ref 9 · internal anchor
PSMAS reduces token use in LLM multi-agent systems by 27.3% on average via phase-based temporal scheduling and context compression, with task performance staying within 2.1 points of full activation.
Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification cs.AI · 2026-04-18 · unverdicted · none · ref 14 · internal anchor
Cross-model semantic disagreement adds an epistemic uncertainty term that improves total uncertainty estimation over self-consistency alone, helping flag confident errors in LLMs.
Expressing Social Emotions: Misalignment Between LLMs and Human Cultural Emotion Norms cs.CL · 2026-04-18 · unverdicted · none · ref 60 · internal anchor
Frontier LLMs over-express engaging emotions relative to disengaging ones and generate deterministic responses that fail to match the cultural and individual diversity observed in human social emotion expression.
Aligning Backchannel and Dialogue Context Representations via Contrastive LLM Fine-Tuning cs.CL · 2026-04-17 · unverdicted · none · ref 4 · internal anchor
A contrastive LLM fine-tuning method creates joint embeddings for dialogue contexts and backchannel realizations, improving retrieval performance and alignment with human judgments over raw WavLM features.
How Hypocritical Is Your LLM judge? Listener-Speaker Asymmetries in the Pragmatic Competence of Large Language Models cs.CL · 2026-04-17 · unverdicted · none · ref 19 · internal anchor
LLMs perform substantially better as pragmatic listeners judging language than as speakers generating it, revealing weak alignment between the two roles.
LLMs Corrupt Your Documents When You Delegate cs.CL · 2026-04-17 · unverdicted · none · ref 37 · internal anchor
LLMs corrupt an average of 25% of document content during long delegated editing workflows across 52 domains, even frontier models, and agentic tools do not mitigate the issue.
StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models cs.LG · 2026-04-16 · unverdicted · none · ref 18 · internal anchor
StoSignSGD resolves SignSGD divergence on non-smooth objectives via structural stochasticity, matching optimal convex rates and improving non-convex bounds while delivering 1.44-2.14x speedups in FP8 LLM pretraining.
The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference cs.LG · 2026-04-16 · unverdicted · none · ref 12 · internal anchor
FP16 KV caching in transformers causes deterministic token divergence versus cache-free inference due to non-associative floating-point accumulation orderings.
BackFlush: Knowledge-Free Backdoor Detection and Elimination with Watermark Preservation in Large Language Models cs.CR · 2026-04-15 · unverdicted · none · ref 42 · internal anchor
BackFlush detects backdoors via susceptibility amplification and eliminates them with RoPE unlearning to reach 1% ASR and 99% clean accuracy while preserving watermarks.
TOPCELL: Topology Optimization of Standard Cell via LLMs cs.LG · 2026-04-15 · unverdicted · none · ref 19 · internal anchor
TOPCELL reformulates standard cell topology optimization as an LLM generative task with GRPO fine-tuning, outperforming base models and matching exhaustive solvers with 85.91x speedup in 2nm/7nm industrial flows.
Preventing Safety Drift in Large Language Models via Coupled Weight and Activation Constraints cs.AI · 2026-04-14 · unverdicted · none · ref 20 · internal anchor
Coupled constraints on weight updates in a safety subspace and regularization of SAE-identified safety features preserve LLM refusal behaviors during fine-tuning better than weight-only or activation-only methods.
ViLL-E: Video LLM Embeddings for Retrieval cs.CV · 2026-04-13 · unverdicted · none · ref 14 · internal anchor
ViLL-E introduces a dynamic embedding mechanism and joint contrastive-generative training for VideoLLMs, delivering up to 7% gains in temporal localization and 4% in video retrieval while enabling new zero-shot capabilities.
MathAgent: Adversarial Evolution of Constraint Graphs for Mathematical Reasoning Data Synthesis cs.CL · 2026-04-13 · unverdicted · none · ref 4 · internal anchor
Adversarial evolution of constraint graphs generates diverse mathematical reasoning datasets that enable 1K-sample fine-tuning to outperform standard datasets like LIMO and s1K on eight benchmarks with better out-of-distribution generalization.
ARHN: Answer-Centric Relabeling of Hard Negatives with Open-Source LLMs for Dense Retrieval cs.IR · 2026-04-13 · unverdicted · none · ref 3 · internal anchor
ARHN refines hard-negative training data for dense retrieval by using LLMs to convert answer-containing passages into additional positives and exclude answer-containing passages from the negative set.
CausalGaze: Unveiling Hallucinations via Counterfactual Graph Intervention in Large Language Models cs.LG · 2026-04-13 · unverdicted · none · ref 2 · 2 links · internal anchor
CausalGaze detects LLM hallucinations by modeling internal states as causal graphs and using counterfactual interventions, yielding a 3.3% AUROC gain on TruthfulQA over baselines.
Shared Emotion Geometry Across Small Language Models: A Cross-Architecture Study of Representation, Behavior, and Methodological Confounds cs.CL · 2026-04-13 · unverdicted · none · ref 5 · internal anchor
Mature small language models share nearly identical 21-emotion geometries across architectures with Spearman correlations 0.74-0.92 despite opposite behavioral profiles, while immature models restructure under RLHF and prior comprehension-generation differences decompose into four distinct layers.
When Meaning Isn't Literal: Exploring Idiomatic Meaning Across Languages and Modalities cs.CL · 2026-04-12 · unverdicted · none · ref 25 · internal anchor
Presents the Mediom multilingual multimodal idiom corpus and the HIDE hinting-based framework to benchmark and improve AI comprehension of figurative meanings across languages.
Strix: Re-thinking NPU Reliability from a System Perspective cs.AR · 2026-04-12 · unverdicted · none · ref 33 · internal anchor
Strix delivers sub-microsecond fault localisation, detection, and correction on NPUs with 1.04x slowdown and minimal hardware cost by system-level re-partitioning and targeted safeguards.
ADAM: A Systematic Data Extraction Attack on Agent Memory via Adaptive Querying cs.CR · 2026-04-10 · unverdicted · none · ref 9 · internal anchor
ADAM extracts data from LLM agent memory with up to 100% attack success rate by estimating data distribution and selecting queries via entropy guidance.
Filling the Gaps: Selective Knowledge Augmentation for LLM Recommenders cs.IR · 2026-04-09 · unverdicted · none · ref 22 · internal anchor
KnowSA_CKP uses comparative knowledge probing to selectively augment LLM prompts for items with knowledge gaps, improving recommendation accuracy and context efficiency.
The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training cs.CR · 2026-04-09 · unverdicted · none · ref 31 · internal anchor
ORPO is most effective at misaligning LLMs while DPO excels at realigning them, though it reduces utility, revealing an asymmetry between attack and defense methods.
TrajGuard: Streaming Hidden-state Trajectory Detection for Decoding-time Jailbreak Defense cs.CR · 2026-04-09 · unverdicted · none · ref 12 · internal anchor
TrajGuard detects jailbreaks by tracking how hidden-state trajectories move toward high-risk regions during decoding, achieving 95% defense rate with 5.2 ms/token latency across tested attacks.
Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs cs.CL · 2026-04-08 · unverdicted · none · ref 28 · internal anchor
LLM reasoning refines unsupervised text clusters via coherence checks, redundancy removal, and label grounding, yielding better coherence and human-aligned labels on social media data.
Semantic Intent Fragmentation: A Single-Shot Compositional Attack on Multi-Agent AI Pipelines cs.CR · 2026-04-08 · unverdicted · none · ref 21 · internal anchor
A single legitimate request can cause LLM orchestrators to output plans that violate security policies through the composition of benign subtasks, bypassing subtask-level checks.
AgentGate: A Lightweight Structured Routing Engine for the Internet of Agents cs.AI · 2026-04-08 · unverdicted · none · ref 24 · internal anchor
AgentGate decomposes routing into action decision and structural grounding stages, allowing small 3B-7B models to dispatch queries competitively on a curated benchmark after targeted fine-tuning.
In-Place Test-Time Training cs.LG · 2026-04-07 · conditional · none · ref 32 · internal anchor
In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.
The Model Agreed, But Didn't Learn: Diagnosing Surface Compliance in Large Language Models cs.CL · 2026-04-07 · unverdicted · none · ref 1 · internal anchor
Knowledge editing in LLMs frequently achieves only surface compliance by mimicking target outputs without overwriting internal beliefs, as diagnosed by a new ICL self-assessment probe that also reveals instability from recursive edits.
Weakly Supervised Distillation of Hallucination Signals into Transformer Representations cs.AI · 2026-04-07 · unverdicted · none · ref 9 · internal anchor
Weak supervision signals can be distilled into LLM hidden states so that simple probes on internal activations detect hallucinations at inference without external tools.
CPT: Controllable and Editable Design Variations with Language Models cs.LG · 2026-04-06 · unverdicted · none · ref 7 · internal anchor
CPT is a fine-tuned language model that uses Creative Markup Language representations of professional designs to generate controllable, stylistically coherent, and fully editable design variations.
Differences in Text Generated by Diffusion and Autoregressive Language Models cs.CL · 2026-04-04 · unverdicted · none · ref 14 · internal anchor
DLMs exhibit lower n-gram entropy, higher semantic coherence, and higher semantic diversity than ARMs, primarily due to bidirectional context and remasking decoding strategies.
Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing cs.LG · 2026-04-03 · unverdicted · none · ref 25 · internal anchor
Stochastic training with random cross-layer KV attention enables depth-wise cache sharing in transformers, cutting memory footprint while preserving or improving performance.
Prompt Compression in the Wild: Measuring Latency, Rate Adherence, and Quality for Faster LLM Inference cs.IR · 2026-04-03 · conditional · none · ref 9 · internal anchor
LLMLingua prompt compression yields up to 18% end-to-end LLM speedups with unchanged quality when prompt length, ratio, and hardware align, plus an open profiler to predict the break-even point.
FluxMoE: Decoupling Expert Residency for High-Performance MoE Serving cs.LG · 2026-04-03 · unverdicted · none · ref 27 · internal anchor
FluxMoE decouples MoE expert weights from persistent GPU residency via on-demand paging, achieving up to 3x throughput gains over vLLM in memory-constrained inference without accuracy loss.
Can Heterogeneous Language Models Be Fused? cs.AI · 2026-04-02 · unverdicted · none · ref 43 · internal anchor
HeteroFusion fuses heterogeneous LLMs via topology-based alignment and conflict-aware denoising, outperforming merging and ensemble baselines in cross-family and multi-source settings.
Do Hallucination Neurons Generalize? Evidence from Cross-Domain Transfer in LLMs cs.CL · 2026-03-27 · unverdicted · none · ref 18 · internal anchor
Hallucination neurons in LLMs are domain-specific, with cross-domain classifiers dropping from AUROC 0.783 within-domain to 0.563 across domains.

Title resolution pending

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer