super hub Mixed citations

Title resolution pending

Mistral 7B · 2023 · cs.CL · arXiv 2310.06825

Mixed citation behavior. Most common role is background (61%).

536 Pith papers citing it

Background 61% of classified citations

open full Pith review browse 536 citing papers more from Mistral 7B arXiv PDF

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

abstract

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B -- Instruct, that surpasses the Llama 2 13B -- Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 57 method 15 baseline 10 other 6 dataset 2

citation-polarity summary

background 55 use method 15 baseline 10 unclear 8 use dataset 2

claims ledger

abstract We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B -- Instruct, that surpasses the Llama 2 13B -- Chat model both on human and auto

authors

author = Mistral 7B

co-cited works

representative citing papers

CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges

cs.CL · 2026-06-18 · unverdicted · novelty 8.0

Presents a new expert-curated dataset of multi-turn counterspeech dialogues in five languages targeting hate against seven groups, with span annotations linking to verified external knowledge for RAG applications.

Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth

cs.CL · 2026-05-24 · unverdicted · novelty 8.0

Introduces BonaFide benchmark of 3,066 ground-truth labeled CoTs showing most faithfulness metrics perform near chance with biases and poor scaling to longer chains.

RTI-Bench: A Structured Dataset for Indian Right-to-Information Decision Analysis

cs.CL · 2026-05-16 · accept · novelty 8.0

RTI-Bench is the first publicly released structured dataset of CIC administrative decisions with outcome labels, exemption citations, IRAC reasoning, and timelines, built from 1,218 corpus cases and 298 PDFs, achieving 95.3% label precision on manual review and 57.3% accuracy on a Mistral 7B zero-Sh

Privacy Auditing with Zero (0) Training Run

cs.CR · 2026-05-14 · unverdicted · novelty 8.0

Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.

Crafting Reversible SFT Behaviors in Large Language Models

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

LCDD creates sparse carriers for SFT behaviors that SFT-Eraser can reverse, with ablations showing the sparse structure enables causal control.

DurableUn: Quantization-Induced Recovery Attacks in Machine Unlearning

cs.LG · 2026-05-04 · conditional · novelty 8.0 · 2 refs

INT4 quantization recovers up to 22 times more forgotten training data in unlearned LLMs, and the proposed DURABLEUN-SAF method is the first to maintain forgetting across BF16, INT8, and INT4 precisions.

Backdoor Attacks on Decentralised Post-Training

cs.CR · 2026-03-31 · conditional · novelty 8.0

An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.

CacheTrap: Unveiling a Stealthier Gray-Box Trojan against LLMs

cs.CR · 2025-11-27 · conditional · novelty 8.0

CacheTrap achieves 100% targeted attack success on five open-source LLMs by using an efficient search to locate and flip a single bit in the KV cache as a transient trigger, while preserving normal accuracy without the trigger.

MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation

cs.CL · 2025-07-28 · accept · novelty 8.0

MediQAl is a new French medical QA benchmark with 32k exam-sourced questions in three formats and cognitive labels, evaluated on 14 LLMs to reveal gaps between factual recall and reasoning performance.

Large Language Diffusion Models

cs.CL · 2025-02-14 · unverdicted · novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

cs.CV · 2024-08-23 · conditional · novelty 8.0

MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

cs.CL · 2024-06-27 · unverdicted · novelty 8.0

LiveBench is a contamination-limited LLM benchmark with auto-scored challenging tasks from recent sources across math, coding, reasoning and more, where top models score below 70%.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

Evaluating Very Long-Term Conversational Memory of LLM Agents

cs.CL · 2024-02-27 · unverdicted · novelty 8.0

Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.

Information Dynamics of Language Communication

cs.CL · 2026-06-29 · unverdicted · novelty 7.0

The paper defines STE and SPID, two information-theoretic measures of semantic flow and decomposition in language exchanges, and applies them to four dialogue datasets.

Anisotropy Decides Cosine vs. Rank Metrics for Text Embeddings

cs.CL · 2026-06-28 · conditional · novelty 7.0

Anisotropy, quantified by dominant-dimension variance fraction, determines the best parameter-free similarity metric for text embeddings, with rank-based metrics gaining ~20% relative where cosine is weakest.

MultiHashFormer: Hash-based Generative Language Models

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

MultiHashFormer enables hash-based autoregression in LMs by encoding tokens as multi-hash signatures, outperforming standard Transformers at 100M-3B scales while keeping parameter count constant for multilingual expansion.

Next-Billion AI Index: The compass for AI utility and adoption in the global majority

cs.CY · 2026-05-29 · unverdicted · novelty 7.0

Introduces nexbax, a diagnostic framework with three themes and 10 dimensions for evaluating AI economic viability, operational practicality, and societal integrity in next-billion-user contexts.

Vector Linking via Cross-Model Local Isometric Consistency

cs.AI · 2026-05-29 · unverdicted · novelty 7.0

A reference-based geometric hashing method recovers cross-model vector correspondences by exploiting local isometric consistency in contrastive embeddings and iteratively bootstrapping from a seed of paired anchors.

What Makes LVLMs Hallucinate Less? Unveiling the Architectural Factors Behind Hallucination Robustness

cs.CV · 2026-05-29 · unverdicted · novelty 7.0

The study links three LVLM architectural dimensions to three hallucination types via a new benchmark, finding that language foundation quality reduces co-occurrence errors, visual encoder strength reduces similarity errors, alignment reduces uncertainty errors, and joint visual-alignment improvement

Every Act Has Its Price: Compressed Moral Composition in Frontier LLMs

cs.CL · 2026-05-29 · unverdicted · novelty 7.0

Moral Trolley Arena shows frontier LLMs produce composite moral preferences that are compressed rather than additive functions of calibrated component act strengths across Moral Foundations Theory.

Toward Semantic-Agnostic and Shape-Aware Vision-Language Segmentation Models

cs.CV · 2026-05-27 · unverdicted · novelty 7.0

Introduces SANSA paradigm for semantic-agnostic vision-language segmentation via dictionary or example-based prompts, with finetuning delivering up to 20% mIoU gains on the new task while retaining standard performance.

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

MentalMap benchmark identifies a universal L3 reasoning cliff in LLMs' text-based spatial reasoning that persists across languages, scales, and prompting, and is replicated in human evaluations.

MATCHA: Matching Text via Contrastive Semantic Alignment

cs.CL · 2026-05-26 · unverdicted · novelty 7.0

MATCHA introduces a dual-view contrastive metric measuring proximity to gold text and distance from adversarial contradictions, outperforming ROUGE and BERTScore by up to 20% on TruthfulQA and other NLP benchmarks.

citing papers explorer

Showing 50 of 536 citing papers.

SpikingBrain2.0: Brain-Inspired Foundation Models for Efficient Long-Context and Cross-Platform Inference cs.LG · 2026-04-24 · unverdicted · none · ref 19 · internal anchor
SpikingBrain2.0 is a 5B hybrid spiking-Transformer that recovers most base model performance while delivering 10x TTFT speedup at 4M context and supporting over 10M tokens on limited GPUs via dual sparse attention and dual quantization paths.
A Sociotechnical, Practitioner-Centered Approach to Technology Adoption in Cybersecurity Operations: An LLM Case cs.CR · 2026-04-23 · unverdicted · none · ref 54 · internal anchor
A six-month ethnographic co-creation project in a real SOC demonstrates that practitioner involvement in LLM tool design can overcome typical adoption barriers in cybersecurity operations.
GS-Quant: Granular Semantic and Generative Structural Quantization for Knowledge Graph Completion cs.AI · 2026-04-23 · unverdicted · none · ref 16 · internal anchor
GS-Quant generates coarse-to-fine discrete codes for KG entities via semantic hierarchy injection and causal sequence reconstruction, enabling LLMs to perform knowledge graph completion by treating the codes as vocabulary tokens.
ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures cs.AI · 2026-04-23 · unverdicted · none · ref 13 · 2 links · internal anchor
ReCAPA adds predictive correction and multi-level semantic alignment to VLA models, plus two new metrics for tracking error spread and recovery, yielding competitive benchmark results over LLM baselines.
Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling cs.LG · 2026-04-21 · unverdicted · none · ref 9 · internal anchor
Nexusformer uses a three-stage nonlinear mapping in attention to enable stable, inheritable scaling of transformers, matching baseline perplexity with up to 41.5% less compute when growing from 240M to 440M parameters.
Towards Scalable Lifelong Knowledge Editing with Selective Knowledge Suppression cs.AI · 2026-04-21 · unverdicted · none · ref 113 · internal anchor
LightEdit enables scalable lifelong knowledge editing in LLMs via selective knowledge retrieval and probability suppression during decoding, outperforming prior methods on ZSRE, Counterfact, and RIPE while reducing training costs.
STK-Adapter: Incorporating Evolving Graph and Event Chain for Temporal Knowledge Graph Extrapolation cs.IR · 2026-04-21 · unverdicted · none · ref 78 · internal anchor
STK-Adapter adds Spatial-Temporal MoE, Event-Aware MoE, and Cross-Modality Alignment MoE to integrate evolving TKG graphs and event chains into LLMs, reducing information loss and improving extrapolation performance over prior methods.
TabEmb: Joint Semantic-Structure Embedding for Table Annotation cs.LG · 2026-04-21 · unverdicted · none · ref 28 · internal anchor
TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.
Handling and Interpreting Missing Modalities in Patient Clinical Trajectories via Autoregressive Sequence Modeling cs.LG · 2026-04-20 · unverdicted · none · ref 38 · internal anchor
Autoregressive transformer modeling with missingness-aware contrastive pre-training outperforms baselines on MIMIC-IV and eICU benchmarks and mitigates divergent behavior from removed modalities in clinical trajectories.
Medical Image Understanding Improves Survival Prediction via Visual Instruction Tuning cs.CV · 2026-04-20 · unverdicted · none · ref 15 · internal anchor
A vision-language model pre-trained via instruction tuning on CT-report pairs improves survival prediction accuracy over baselines, especially when clinical data alone is weak, while also producing text answers to clinical questions.
Linear-Time and Constant-Memory Text Embeddings Based on Recurrent Language Models cs.CL · 2026-04-20 · unverdicted · none · ref 14 · internal anchor
Fine-tuned recurrent models like Mamba2 produce competitive text embeddings with linear-time constant-memory inference via vertical chunking, outperforming transformers in memory use.
Align Documents to Questions: Question-Oriented Document Rewriting for Retrieval-Augmented Generation cs.CL · 2026-04-19 · unverdicted · none · ref 113 · internal anchor
QREAM rewrites documents to question-focused style using iterative ICL and distilled FT models, boosting RAG performance by up to 8% relative improvement.
Calibrating Model-Based Evaluation Metrics for Summarization cs.CL · 2026-04-19 · unverdicted · none · ref 150 · internal anchor
A reference-free proxy scoring framework combined with GIRB calibration produces better-aligned evaluation metrics for summarization and outperforms baselines across seven datasets.
TStore: Rethinking AI Model Hub with Tensor-Centric Compression cs.DC · 2026-04-18 · unverdicted · none · ref 48 · 2 links · internal anchor
TStore reduces AI model storage via tensor-level fingerprinting, clustering, and compression without annotations while claiming to preserve usability.
HieraSparse: Hierarchical Semi-Structured Sparse KV Attention cs.DC · 2026-04-18 · unverdicted · none · ref 5 · internal anchor
HieraSparse delivers a hierarchical semi-structured sparse KV attention system that achieves 1.2x KV compression and 4.57x decode attention speedup versus prior unstructured sparsity methods at equivalent sparsity, plus up to 1.85x prefill speedup and 1.37x/1.77x speedups with magnitude pruning and
Pruning Unsafe Tickets: A Resource-Efficient Framework for Safer and More Robust LLMs cs.LG · 2026-04-17 · unverdicted · none · ref 23 · internal anchor
Pruning removes 'unsafe tickets' from LLMs via gradient-free attribution, reducing harmful outputs and jailbreak vulnerability with minimal utility loss.
Rethinking the Necessity of Adaptive Retrieval-Augmented Generation through the Lens of Adaptive Listwise Ranking cs.IR · 2026-04-17 · unverdicted · none · ref 36 · internal anchor
AdaRankLLM shows adaptive listwise reranking outperforms fixed-depth retrieval for most LLMs by acting as a noise filter for weak models and an efficiency optimizer for strong ones, with lower context use.
IUQ: Interrogative Uncertainty Quantification for Long-Form Large Language Model Generation cs.CL · 2026-04-16 · unverdicted · none · ref 20 · internal anchor
IUQ quantifies claim-level uncertainty in long-form LLM generation by combining inter-sample consistency and intra-sample faithfulness through an interrogate-then-respond approach and outperforms baselines on two datasets.
Psychological Concept Neurons: Can Neural Control Bias Probing and Shift Generation in LLMs? cs.CL · 2026-04-13 · unverdicted · none · ref 3 · internal anchor
Big Five personality traits become decodable early in LLMs, are represented by mid-layer selective neurons, and can be shifted by targeted activation interventions, though effects on generated labels are weaker and spill across traits.
Please Make it Sound like Human: Encoder-Decoder vs. Decoder-Only Transformers for AI-to-Human Text Style Transfer cs.CL · 2026-04-13 · unverdicted · none · ref 12 · internal anchor
BART-large outperforms Mistral-7B in AI-to-human style transfer with higher reference similarity scores and far fewer parameters, while showing that marker shift can reflect overshoot rather than accurate transfer.
FAITH: Factuality Alignment through Integrating Trustworthiness and Honestness cs.CL · 2026-04-11 · unverdicted · none · ref 2 · internal anchor
FAITH improves LLM factual accuracy by mapping confidence and semantic entropy into natural-language knowledge-state quadrants for trustworthiness and honestness, then applying PPO with a combined reward and retrieval augmentation.
Adapt to Thrive! Adaptive Power-Mean Policy Optimization for Improved LLM Reasoning cs.CL · 2026-04-11 · unverdicted · none · ref 208 · internal anchor
APMPO boosts average Pass@1 scores on math reasoning benchmarks by 3 points over GRPO by using an adaptive power-mean policy objective and feedback-driven clipping bounds in RLVR training.
Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs cs.CL · 2026-04-11 · unverdicted · none · ref 223 · internal anchor
FREIA applies free energy principles and adaptive advantage shaping to unsupervised RL, outperforming baselines by 0.5-3.5 Pass@1 points on math reasoning with a 1.5B model.
Policy-Aware Edge LLM-RAG Framework for Internet of Battlefield Things Mission Orchestration cs.NI · 2026-04-10 · unverdicted · none · ref 19 · internal anchor
PA-LLM-RAG adds policy retrieval and dual-LLM verification to enable reliable low-latency mission orchestration in simulated IoBT environments, with Gemma-2B reaching 100% policy compliance at 4.17s latency.
LanG -- A Governance-Aware Agentic AI Platform for Unified Security Operations cs.CR · 2026-04-07 · unverdicted · none · ref 40 · internal anchor
LanG presents a governance-aware agentic AI platform for unified security operations that reports strong performance on incident correlation, rule generation, attack reconstruction, and AI safety guardrails in an open-source package.
PassiveQA: A Three-Action Framework for Epistemically Calibrated Question Answering via Supervised Finetuning cs.CL · 2026-04-06 · unverdicted · none · ref 33 · internal anchor
PassiveQA trains models via supervised finetuning to decide Answer, Ask, or Abstain using structured information-state representations and knowledge-graph context, yielding better abstention and lower hallucination on QA datasets.
SUMMIR: A Hallucination-Aware Framework for Ranking Sports Insights from LLMs cs.IR · 2026-03-30 · conditional · none · ref 22 · internal anchor
SUMMIR is a multimetric ranking model that orders LLM-generated sports insights by importance while incorporating hallucination detection to improve factual reliability across cricket, soccer, basketball, and baseball articles.
Tug-of-War within A Decade: Conflict Resolution in Vulnerability Analysis via Teacher-Guided Retrieval-Augmented Generations cs.CL · 2026-03-25 · unverdicted · none · ref 14 · internal anchor
CRVA-TGRAG combines parent-document segmentation, ensemble retrieval, and teacher-guided fine-tuning to mitigate knowledge conflicts and improve accuracy in LLM-based CVE vulnerability analysis.
Benchmarking Linguistic Adaptation in Comparable-Sized LLMs: A Study of Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B on Romanized Nepali cs.CL · 2026-03-25 · unverdicted · none · ref 7 · internal anchor
Fine-tuning Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B on Romanized Nepali data enables effective generation where zero-shot fails, with Qwen3-8B performing best overall and Llama-3.1-8B showing the largest gains.
Explicit Logic Channel for Validation and Enhancement of MLLMs on Zero-Shot Tasks cs.AI · 2026-03-12 · unverdicted · none · ref 27 · internal anchor
Introduces Explicit Logic Channel (ELC) with LLM, VFM and probabilistic inference for validating, selecting and enhancing MLLMs on zero-shot tasks using Consistency Rate and cross-channel integration.
On the Limits of Layer Pruning for Generative Reasoning in Large Language Models cs.LG · 2026-02-02 · unverdicted · none · ref 15 · internal anchor
Layer pruning preserves classification performance in LLMs but fundamentally limits recovery of generative reasoning capabilities even after extensive self-supervised finetuning.
Different types of syntactic agreement recruit the same units within large language models cs.CL · 2025-12-03 · unverdicted · none · ref 29 · internal anchor
Different types of syntactic agreement recruit overlapping units within LLMs, indicating that agreement forms a meaningful functional category across English, Russian, Chinese, and structurally similar languages.
Chinese Short-Form Creative Content Generation via Explanation-Oriented Multi-Objective Optimization cs.CL · 2025-11-19 · unverdicted · none · ref 86 · internal anchor
MAGIC-HMO is a multi-agent framework that treats Chinese short-form creative NLG as heterogeneous multi-objective optimization over personalized constraints plus explanation reliability and outperforms baselines on a baby-naming benchmark.
LLM4Delay: Flight Delay Prediction via Cross-Modality Adaptation of Large Language Models and Aircraft Trajectory Representation cs.LG · 2025-10-24 · unverdicted · none · ref 37 · internal anchor
LLM4Delay improves flight delay prediction accuracy by using instance-level projection to adapt LLMs for integrating textual aeronautical information with multiple aircraft trajectories.
Retrofitting Small Multilingual Models for Retrieval: Matching 7B Performance with 300M Parameters cs.CL · 2025-10-16 · conditional · none · ref 6 · internal anchor
A 300M multilingual embedding model matches or exceeds 7B retrieval performance via optimized data scale, hard negatives, and task diversity over language diversity.
SpikingMamba: Towards Energy-Efficient Large Language Models via Knowledge Distillation from Mamba cs.NE · 2025-10-06 · unverdicted · none · ref 11 · internal anchor
SpikingMamba distills Mamba into an SNN LLM achieving 4.76x energy savings with a 4.78% zero-shot accuracy gap that narrows to 2.23% after RL.
OntoLogX: Ontology-Guided Knowledge Graph Extraction from Cybersecurity Logs with Large Language Models cs.AI · 2025-10-01 · unverdicted · none · ref 33 · internal anchor
OntoLogX is a system that applies LLMs with ontology guidance, RAG, and iterative fixes to build valid knowledge graphs from cybersecurity logs and predict ATT&CK tactics from aggregated sessions.
BoHA: Blockwise Hadamard Product Adaptation for Parameter-Efficient Fine-Tuning cs.LG · 2025-09-25 · unverdicted · none · ref 14 · internal anchor
BoHA partitions frozen weights into a b by b grid and applies independent low-rank Hadamard factors per block, outperforming LoRA on matched-budget single-task averages while retaining 57.66% first-stage accuracy in a commonsense-to-arithmetic continual-learning test on Llama-3.2-3B.
ShadowNPU: System and Algorithm Co-design for NPU-Centric On-Device LLM Inference cs.PF · 2025-08-22 · unverdicted · none · ref 24 · internal anchor
ShadowNPU presents shadowAttn, a co-designed sparse attention system that uses NPU pilot compute and techniques like graph bucketing and per-head sparsity to minimize CPU/GPU fallback during on-device LLM inference while maintaining accuracy.
Mitigating Watermark Forgery in Generative Models via Randomized Key Selection cs.CR · 2025-07-10 · unverdicted · none · ref 19 · internal anchor
Randomized per-query key selection with single-key detection acceptance bounds forgery success rate independently of collected samples while preserving model utility.
From 2:4 to 8:16 sparsity patterns in LLMs for Outliers and Weights with Variance Correction cs.LG · 2025-07-03 · unverdicted · none · ref 16 · internal anchor
8:16 sparsity with variance correction and outlier handling lets compressed LLMs match or exceed dense-model accuracy under fixed memory limits, outperforming the common 2:4 pattern in flexibility.
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective cs.RO · 2025-07-02 · unverdicted · none · ref 63 · internal anchor
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
Task-conditioned probing of instruction-tuned multimodal LLMs: Region-specific brain alignment patterns under naturalistic stimuli q-bio.NC · 2025-06-09 · unverdicted · none · ref 7 · internal anchor
Instruction-tuned multimodal LLMs show higher brain alignment (~9-20% over baselines) under task-specific instructions during naturalistic video stimuli, with task-conditioned subspaces and weak semantic coupling.
Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations cs.CV · 2025-06-08 · unverdicted · none · ref 12 · internal anchor
Synthetic clinical demonstrations at inference time improve safety of Med-VLMs against visual and textual jailbreaks while preserving general performance on medical tasks.
Through the Stealth Lens: Attention-Aware Defenses Against Poisoning in RAG cs.CR · 2025-06-04 · unverdicted · none · ref 46 · internal anchor
Introduces NPAS and AV Filter using LLM attention weights to defend RAG against poisoning, reporting up to 20% accuracy gains while adaptive attacks reach 35% success.
Beyond Words: Multimodal LLM Knows When to Speak cs.CV · 2025-05-20 · unverdicted · none · ref 11 · internal anchor
MM-When2Speak reformulates conversational timing as dense response-type prediction and achieves up to 3x better performance by integrating video, audio, and text cues on top of an LLM backbone using a new dyadic conversation dataset.
Advancing AI Research Assistants with Expert-Involved Learning cs.AI · 2025-05-03 · unverdicted · none · ref 39 · internal anchor
ARIEL evaluates LLMs and LMMs on full-length biomedical summarization and figure interpretation with blinded expert review, identifies limitations, and demonstrates gains from prompt engineering, fine-tuning, and an integrated agent for hypothesis generation.
Low-Confidence Gold: Refining Low-Confidence Samples for Efficient Instruction Tuning cs.CL · 2025-02-26 · unverdicted · none · ref 3 · internal anchor
LCG applies centroid clustering and confidence-guided semi-supervised selection to curate 6K-sample subsets that yield superior MT-bench performance compared with prior filtering methods when used for instruction tuning.
MSMO-ABSA: Multi-Scale and Multi-Objective Optimization for Cross-Lingual Aspect-Based Sentiment Analysis cs.CL · 2025-02-19 · unverdicted · none · ref 17 · internal anchor
MSMO framework achieves claimed SOTA cross-lingual ABSA via sentence- and aspect-level alignment, code-switching, consistency training, and knowledge distillation.
Social media polarization during conflict: Insights from an ideological stance dataset on Israel-Palestine Reddit comments cs.CL · 2025-02-01 · unverdicted · none · ref 32 · internal anchor
A new labeled dataset of 9,969 Israel-Palestine Reddit comments is created and used to compare stance classification methods, with a specific Mixtral prompt achieving the highest performance.

Title resolution pending

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer