Title resolution pending

Mistral 7B · 2023 · cs.CL · arXiv 2310.06825

Mixed citation behavior. Most common role is background (61%).

534 Pith papers citing it

Background 61% of classified citations

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

abstract

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B – Instruct, that surpasses the Llama 2 13B – Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.

citation-role summary

background 57 · method 15 · baseline 10 · other 6 · dataset 2

citation-polarity summary

background 55 · use method 15 · baseline 10 · unclear 8 · use dataset 2
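
The shares implied by these counts can be checked with a short calculation. The sketch below is illustrative only: it assumes each percentage is a label's count divided by the total number of classified citations, rounded to the nearest whole number, which the page does not state. The names `polarity_counts` and `shares` are hypothetical. Under that assumption the background share of the polarity counts comes out at 61%, in line with the headline figure.

```python
# Polarity counts copied from the summary above.
polarity_counts = {
    "background": 55,
    "use method": 15,
    "baseline": 10,
    "unclear": 8,
    "use dataset": 2,
}


def shares(counts):
    """Return each label's share of the total as a whole-number percentage.

    Assumption: share = count / total classified citations, rounded.
    """
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


polarity_shares = shares(polarity_counts)
for label, pct in polarity_shares.items():
    print(f"{label}: {pct}%")
```

The same helper can be applied to the role counts; the two summaries use slightly different labels and counts, so their percentages need not agree.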

authors

author = Mistral 7B

representative citing papers

CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges

cs.CL · 2026-06-18 · unverdicted · novelty 8.0

Presents a new expert-curated dataset of multi-turn counterspeech dialogues in five languages targeting hate against seven groups, with span annotations linking to verified external knowledge for RAG applications.

Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth

cs.CL · 2026-05-24 · unverdicted · novelty 8.0

Introduces BonaFide benchmark of 3,066 ground-truth labeled CoTs showing most faithfulness metrics perform near chance with biases and poor scaling to longer chains.

RTI-Bench: A Structured Dataset for Indian Right-to-Information Decision Analysis

cs.CL · 2026-05-16 · accept · novelty 8.0

RTI-Bench is the first publicly released structured dataset of CIC administrative decisions with outcome labels, exemption citations, IRAC reasoning, and timelines, built from 1,218 corpus cases and 298 PDFs, achieving 95.3% label precision on manual review and 57.3% accuracy on a Mistral 7B zero-Sh

Privacy Auditing with Zero (0) Training Run

cs.CR · 2026-05-14 · unverdicted · novelty 8.0

Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.

Crafting Reversible SFT Behaviors in Large Language Models

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

LCDD creates sparse carriers for SFT behaviors that SFT-Eraser can reverse, with ablations showing the sparse structure enables causal control.

DurableUn: Quantization-Induced Recovery Attacks in Machine Unlearning

cs.LG · 2026-05-04 · conditional · novelty 8.0 · 2 refs

INT4 quantization recovers up to 22 times more forgotten training data in unlearned LLMs, and the proposed DURABLEUN-SAF method is the first to maintain forgetting across BF16, INT8, and INT4 precisions.

Backdoor Attacks on Decentralised Post-Training

cs.CR · 2026-03-31 · conditional · novelty 8.0

An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.

CacheTrap: Unveiling a Stealthier Gray-Box Trojan against LLMs

cs.CR · 2025-11-27 · conditional · novelty 8.0

CacheTrap achieves 100% targeted attack success on five open-source LLMs by using an efficient search to locate and flip a single bit in the KV cache as a transient trigger, while preserving normal accuracy without the trigger.

MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation

cs.CL · 2025-07-28 · accept · novelty 8.0

MediQAl is a new French medical QA benchmark with 32k exam-sourced questions in three formats and cognitive labels, evaluated on 14 LLMs to reveal gaps between factual recall and reasoning performance.

Large Language Diffusion Models

cs.CL · 2025-02-14 · unverdicted · novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

cs.CV · 2024-08-23 · conditional · novelty 8.0

MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

cs.CL · 2024-06-27 · unverdicted · novelty 8.0

LiveBench is a contamination-limited LLM benchmark with auto-scored challenging tasks from recent sources across math, coding, reasoning and more, where top models score below 70%.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

Evaluating Very Long-Term Conversational Memory of LLM Agents

cs.CL · 2024-02-27 · unverdicted · novelty 8.0

Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.

Information Dynamics of Language Communication

cs.CL · 2026-06-29 · unverdicted · novelty 7.0

The paper defines STE and SPID, two information-theoretic measures of semantic flow and decomposition in language exchanges, and applies them to four dialogue datasets.

Anisotropy Decides Cosine vs. Rank Metrics for Text Embeddings

cs.CL · 2026-06-28 · conditional · novelty 7.0

Anisotropy, quantified by dominant-dimension variance fraction, determines the best parameter-free similarity metric for text embeddings, with rank-based metrics gaining ~20% relative where cosine is weakest.

MultiHashFormer: Hash-based Generative Language Models

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

MultiHashFormer enables hash-based autoregression in LMs by encoding tokens as multi-hash signatures, outperforming standard Transformers at 100M-3B scales while keeping parameter count constant for multilingual expansion.

Next-Billion AI Index: The compass for AI utility and adoption in the global majority

cs.CY · 2026-05-29 · unverdicted · novelty 7.0

Introduces nexbax, a diagnostic framework with three themes and 10 dimensions for evaluating AI economic viability, operational practicality, and societal integrity in next-billion-user contexts.

Vector Linking via Cross-Model Local Isometric Consistency

cs.AI · 2026-05-29 · unverdicted · novelty 7.0

A reference-based geometric hashing method recovers cross-model vector correspondences by exploiting local isometric consistency in contrastive embeddings and iteratively bootstrapping from a seed of paired anchors.

What Makes LVLMs Hallucinate Less? Unveiling the Architectural Factors Behind Hallucination Robustness

cs.CV · 2026-05-29 · unverdicted · novelty 7.0

The study links three LVLM architectural dimensions to three hallucination types via a new benchmark, finding that language foundation quality reduces co-occurrence errors, visual encoder strength reduces similarity errors, alignment reduces uncertainty errors, and joint visual-alignment improvement

Every Act Has Its Price: Compressed Moral Composition in Frontier LLMs

cs.CL · 2026-05-29 · unverdicted · novelty 7.0

Moral Trolley Arena shows frontier LLMs produce composite moral preferences that are compressed rather than additive functions of calibrated component act strengths across Moral Foundations Theory.

Toward Semantic-Agnostic and Shape-Aware Vision-Language Segmentation Models

cs.CV · 2026-05-27 · unverdicted · novelty 7.0

Introduces SANSA paradigm for semantic-agnostic vision-language segmentation via dictionary or example-based prompts, with finetuning delivering up to 20% mIoU gains on the new task while retaining standard performance.

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

MentalMap benchmark identifies a universal L3 reasoning cliff in LLMs' text-based spatial reasoning that persists across languages, scales, and prompting, and is replicated in human evaluations.

MATCHA: Matching Text via Contrastive Semantic Alignment

cs.CL · 2026-05-26 · unverdicted · novelty 7.0

MATCHA introduces a dual-view contrastive metric measuring proximity to gold text and distance from adversarial contradictions, outperforming ROUGE and BERTScore by up to 20% on TruthfulQA and other NLP benchmarks.

citing papers explorer

Showing 50 of 534 citing papers.

Coverage-Driven KV Cache Eviction for Efficient and Improved Inference of LLM

cs.CL · 2026-06-28 · unverdicted · none · ref 39 · internal anchor

K-VEC is a coverage-aware KV-cache eviction strategy using cross-head and cross-layer modules that improves performance by up to 10.35 points over prior methods on LongBench subsets at fixed memory budget.

CoMIC: Collaborative Memory and Insights Circulation for Long-Horizon LLM Agents in Cloud-Edge Systems

cs.AI · 2026-05-30 · unverdicted · none · ref 4 · internal anchor

CoMIC is a parameter-free cloud-edge framework that circulates memory and insights between edge agents and a central critic to improve long-horizon LLM agent performance on symbolic and text tasks.

Dialectics of Alignment: Harnessing Unsafe Knowledge for Dynamic Safety Routing

cs.LG · 2026-05-30 · unverdicted · none · ref 40 · internal anchor

SafeMoE isolates unsafe knowledge in domain-specific LoRA experts and routes them via a lightweight gate trained on safe responses to produce safer and more informative LLM outputs with zero-shot generalization.

Hallucination Is Linearly Decodable from Mid-Layer Hidden States in Quantized LLMs

cs.LG · 2026-05-30 · unverdicted · none · ref 12 · internal anchor

Linear probes on mid-layer hidden states in quantized LLMs detect hallucinations at 0.904-1.000 AUROC, exceeding sampling baselines and showing consistent layer bands across model families.

COFT: Counterfactual-Conformal Decoding for Fair Chain-of-Thought Reasoning in Large Language Models

cs.CL · 2026-05-28 · unverdicted · none · ref 3 · internal anchor

COFT is a decoding technique that creates masked counterfactual prompts, fuses logits to attenuate bias, and applies dual-branch split-conformal calibration to certify fair token sets with marginal validity guarantees under exchangeability.

Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode

cs.AR · 2026-05-28 · conditional · none · ref 11 · internal anchor

Batch-1 autoregressive decode is memory-dominated yet launch overhead caps gains from higher-bandwidth GPUs, shown by measurements and CUDA Graphs ablation across four NVIDIA GPUs.

Representation Collapse in Sequential Post-Training of Large Language Models

cs.LG · 2026-05-28 · unverdicted · none · ref 51 · internal anchor

Sequential post-training of LLMs induces representation collapse that correlates with reduced plasticity, weaker generalization, and poorer calibration, with lightweight interventions tested to mitigate it.

From Prompts to Context: An Ontology-Driven Framework for Human-Generative AI Collaboration

cs.HC · 2026-05-28 · unverdicted · none · ref 31 · internal anchor

Presents the CCAI ontology and SPARQL retrieval method to convert ephemeral Human-Generative AI prompt interactions into explicit, machine-readable collaboration traces, illustrated in a competency-profile software case study.

Improving Collaborative Storytelling with a Multi-Agent Framework Based on Large Language Models

cs.AI · 2026-05-28 · unverdicted · none · ref 15 · internal anchor

An iterative writer-editor multi-agent LLM process improves perceived story quality in simulations of child collaborative storytelling.

Reverse Probing: Supervised Token-level Uncertainty Quantification for Large Language Models in Clinical Text

cs.CL · 2026-05-27 · unverdicted · none · ref 2 · internal anchor

Reverse Probing extracts token-level uncertainty from LLM internal activations on labeled clinical summaries, outperforming eight baselines with up to 4x higher AUPRC on two expert-annotated datasets while lowering compute costs.

Sampling Data with Chains of Forward-Backward Diffusion Steps

cs.LG · 2026-05-26 · unverdicted · none · ref 43 · internal anchor

U-turn chains are Markov chains formed by short forward-backward diffusion steps that remain on the learned manifold and, with Metropolis-Hastings, sample from energy-modified targets, exhibiting an ergodicity-breaking transition on fragmented manifolds.

Priors Persist Through Suppression: A Stroop Paradigm for Lexical Override

cs.CL · 2026-05-25 · unverdicted · none · ref 30 · internal anchor

Pretrained lexical priors in language models persist despite explicit remapping rules, as shown by a Stroop paradigm where prior strength predicts interference and activation patching localizes the repair mechanism.

SEP-Attack: A Simple and Effective Paradigm for Transfer-Based Textual Adversarial Attack

cs.CL · 2026-05-24 · unverdicted · none · ref 17 · internal anchor

SEP-Attack uses DPP-generated diverse surrogate ensemble weights to compute improved prediction confidence and word importance scores for selecting transferable adversarial text examples, outperforming baselines on four datasets and two APIs.

ReLoRA: Knowledge-Reusing Adaptation for Fast Rollout of Evolving LLM Services

cs.LG · 2026-05-23 · unverdicted · none · ref 27 · internal anchor

ReLoRA reduces time-to-readiness for LoRA adapters on updated LLMs by up to 8.9x through adaptive Bayesian initialization and scheduled regularization while improving accuracy by up to 4.6%.

LLMs Show No Signs Of Individuated Metacognition

cs.LG · 2026-05-22 · unverdicted · none · ref 16 · internal anchor

LLM confidence judgments are dominated by a shared difficulty factor across models, with the confidence-performance link collapsing after removing agreed items, yielding no evidence for individuated metacognition.

EVA: Accelerating LLM Decoding via an Efficient Vector Quantization Architecture

cs.AR · 2026-05-22 · unverdicted · none · ref 28 · 2 links · internal anchor

EVA is a vector-quantization hardware architecture that transforms LLM decoding from GEMV to GEMM via direct codebook dot products and conflict-free output buffering, claiming up to 11.17x speedup over prior lookup designs.

Do Factual Recall Mechanisms Carry over from Text to Speech in Multimodal Language Models?

cs.CL · 2026-05-21 · unverdicted · none · ref 4 · internal anchor

Causal mediation analysis on SpiritLM reveals discrepancies in factual recall between text-to-text and speech-to-text paths, indicating only partial carry-over of mechanisms from text to speech modality.

Do LLM Agents Mirror Socio-Cognitive Effects in Power-Asymmetric Conversations?

cs.CL · 2026-05-17 · unverdicted · none · ref 3 · 2 links · internal anchor

LLMs assigned high or low status personas in multi-turn dialogues exhibit socio-cognitive effects including language coordination, pronoun patterns, persuasion success, and compliance with unsafe requests.

Ablating Safety: Mechanisms for Removing Alignment in Language Models for Security Applications

cs.CR · 2026-05-17 · unverdicted · none · ref 20 · internal anchor

Empirical comparison of alignment ablation methods on a 60-prompt security evaluation suite shows task-only LoRA achieves 0.87 mean security score with 0.13 unsafe compliance.

NGM: A Plug-and-Play Training-Free Memory Module for LLMs

cs.AI · 2026-05-16 · unverdicted · none · ref 23 · internal anchor

NGM is a plug-and-play n-gram memory module that encodes n-grams from pretrained embeddings and gates their injection to improve LLM performance by 0.5-1.2 points on average across eight benchmarks.

Fairness-Aware Retrieval Optimization for Retrieval-Augmented Generation

cs.DB · 2026-05-15 · unverdicted · none · ref 28 · internal anchor

Introduces FARO, a scalable quadratic optimization approach for fairness-aware top-k retrieval in RAG that mitigates generation bias via controlled reranking and position-aware propagation modeling.

Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity

cs.LG · 2026-05-13 · unverdicted · none · ref 62 · internal anchor

Cosine similarity poorly predicts performance degradation from layer removal in LLMs, making direct accuracy-drop ablation a more reliable relevance metric.

EHR-RAGp: Retrieval-Augmented Prototype-Guided Foundation Model for Electronic Health Records

cs.IR · 2026-05-12 · unverdicted · none · ref 66 · internal anchor

EHR-RAGp is a retrieval-augmented EHR foundation model that employs prototype-guided retrieval to dynamically integrate relevant historical patient context, outperforming prior models on clinical prediction tasks.

Mitigating Context-Memory Conflicts in LLMs through Dynamic Cognitive Reconciliation Decoding

cs.CL · 2026-05-12 · unverdicted · none · ref 28 · internal anchor

DCRD uses attention-map analysis to detect context-memory conflicts in LLMs and conditionally applies either greedy or fidelity-based dynamic decoding, achieving SOTA results on QA tasks across four models and six datasets.

When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models

cs.CL · 2026-05-12 · unverdicted · none · ref 34 · internal anchor

Paraesthesia is an emotion-style dynamic backdoor attack achieving ~99% success rate on instruction and classification tasks across four LLMs while preserving clean performance.

Towards a Large Language-Vision Question Answering Model for MSTAR Automatic Target Recognition

cs.CV · 2026-05-11 · unverdicted · none · ref 12 · internal anchor

A fine-tuned large language-vision model achieves 98% accuracy on visual question answering for military vehicle identification in SAR imagery from an extended MSTAR benchmark.

Measuring Embedding Sensitivity to Authorial Style in French: Comparing Literary Texts with Language Model Rewritings

cs.CL · 2026-05-11 · unverdicted · none · ref 47 · internal anchor

Embeddings reliably capture authorial stylistic features in French literary texts, and these signals persist after LLM rewriting while showing model-specific patterns.

Personalizing LLMs with Binary Feedback: A Preference-Corrected Optimization Framework

cs.CL · 2026-05-11 · unverdicted · none · ref 23 · internal anchor

C-BPO personalizes LLMs via preference-calibrated binary signals and PU learning theory to isolate inter-user differences from shared task knowledge.

Quantifying the Utility of User Simulators for Building Collaborative LLM Assistants

cs.CL · 2026-05-10 · unverdicted · none · ref 62 · internal anchor

Fine-tuned simulators grounded in real human data produce LLM assistants that win more often against real users than those trained against role-playing simulators.

Learning Multi-Indicator Weights for Data Selection: A Joint Task-Model Adaptation Framework with Efficient Proxies

cs.LG · 2026-05-10 · unverdicted · none · ref 10 · internal anchor

A joint task-model adaptation method learns optimal weights for data selection indicators via ICL proxies on small validation sets, matching or exceeding full-dataset fine-tuning performance with only 30% of samples on GSM8K.

Do Linear Probes Generalize Better in Persona Coordinates?

cs.AI · 2026-05-10 · unverdicted · none · ref 3 · 2 links · internal anchor

Persona axes derived from contrastive prompts and PCA yield linear probes that generalize better than raw-activation probes across 10 datasets for deception and sycophancy.

Max-pooling Network Revisited: Analyzing the Role of Semantic Probability in Multiple Instance Learning for Hallucination Detection

cs.CL · 2026-05-09 · unverdicted · none · ref 14 · 2 links · internal anchor

A lightweight max-pooling network with MLP detects LLM hallucinations competitively without semantic consistency computations by adaptively aggregating internal token features.

Matrix-Decoupled Concentration for Autoregressive Sequences: Dimension-Free Guarantees for Sparse Long-Context Rewards

cs.LG · 2026-05-07 · unverdicted · none · ref 29 · 2 links · internal anchor

Develops a McDiarmid-type concentration inequality for causal autoregressive processes that preserves sparsity to achieve O(1) variance proxies instead of O(N).

Closing the Loop: Unified 3D Scene Generation and Immersive Interaction via LLM-RL Coupling

cs.CV · 2026-05-07 · unverdicted · none · ref 41 · internal anchor

A closed-loop system couples LLM-based 3D scene generation with RL optimization and VR user interactions to produce adaptive, immersive environments, claiming SOTA results on the ALFRED benchmark.

HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization

cs.LG · 2026-05-05 · unverdicted · none · ref 5 · 2 links · internal anchor

HeadQ applies score-space logit corrections for keys and attention-weighted surrogates for values to KV-cache quantization, removing 84-94% of excess perplexity in 2-bit key experiments across six models.

Mesh Based Simulations with Spatial and Temporal awareness

cs.LG · 2026-05-02 · unverdicted · none · ref 56 · internal anchor

A unified training framework for mesh-based ML surrogates in CFD improves accuracy and long-horizon stability by enforcing spatial derivative consistency via multi-node prediction, using temporal cross-attention correction, and adding 3D rotary positional embeddings.

Make Your LVLM KV Cache More Lightweight

cs.CV · 2026-05-01 · unverdicted · none · ref 44 · internal anchor

LightKV compresses vision-token KV cache in LVLMs to 55% size via prompt-guided cross-modality aggregation, halving cache memory, cutting compute 40%, and maintaining performance on benchmarks.

An Investigation of Linguistic Biases in LLM-Based Recommendations

cs.CL · 2026-04-28 · unverdicted · none · ref 1 · internal anchor

LLMs exhibit dialect-dependent biases when recommending restaurants and products, with Mistral-small-3.1 and Llama-3.1 models showing heightened sensitivity to Indian English and code-switched prompts in specific categories.

Compute Aligned Training: Optimizing for Test Time Inference

cs.LG · 2026-04-27 · unverdicted · none · ref 15 · 2 links · internal anchor

Derives new loss functions for SFT and RL that optimize directly for test-time inference operators like aggregation or filtering, with empirical gains in scaling.

Generating Place-Based Compromises Between Two Points of View

cs.CL · 2026-04-27 · unverdicted · none · ref 37 · internal anchor

Empathic similarity feedback in prompts generates more acceptable compromises than chain-of-thought, and margin-based training on the resulting data lets smaller models produce them without ongoing empathy estimation.

Hybrid JIT-CUDA Graph Optimization for Low-Latency Large Language Model Inference

cs.LG · 2026-04-25 · unverdicted · none · ref 27 · internal anchor

A hybrid JIT-CUDA Graph framework reduces TTFT by up to 66% and P99 latency versus TensorRT-LLM for single-GPU LLaMA-2 7B inference on short prompts.

SpikingBrain2.0: Brain-Inspired Foundation Models for Efficient Long-Context and Cross-Platform Inference

cs.LG · 2026-04-24 · unverdicted · none · ref 19 · internal anchor

SpikingBrain2.0 is a 5B hybrid spiking-Transformer that recovers most base model performance while delivering 10x TTFT speedup at 4M context and supporting over 10M tokens on limited GPUs via dual sparse attention and dual quantization paths.

A Sociotechnical, Practitioner-Centered Approach to Technology Adoption in Cybersecurity Operations: An LLM Case

cs.CR · 2026-04-23 · unverdicted · none · ref 54 · internal anchor

A six-month ethnographic co-creation project in a real SOC demonstrates that practitioner involvement in LLM tool design can overcome typical adoption barriers in cybersecurity operations.

GS-Quant: Granular Semantic and Generative Structural Quantization for Knowledge Graph Completion

cs.AI · 2026-04-23 · unverdicted · none · ref 16 · internal anchor

GS-Quant generates coarse-to-fine discrete codes for KG entities via semantic hierarchy injection and causal sequence reconstruction, enabling LLMs to perform knowledge graph completion by treating the codes as vocabulary tokens.

ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures

cs.AI · 2026-04-23 · unverdicted · none · ref 13 · 2 links · internal anchor

ReCAPA adds predictive correction and multi-level semantic alignment to VLA models, plus two new metrics for tracking error spread and recovery, yielding competitive benchmark results over LLM baselines.

Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling

cs.LG · 2026-04-21 · unverdicted · none · ref 9 · internal anchor

Nexusformer uses a three-stage nonlinear mapping in attention to enable stable, inheritable scaling of transformers, matching baseline perplexity with up to 41.5% less compute when growing from 240M to 440M parameters.

Towards Scalable Lifelong Knowledge Editing with Selective Knowledge Suppression

cs.AI · 2026-04-21 · unverdicted · none · ref 113 · internal anchor

LightEdit enables scalable lifelong knowledge editing in LLMs via selective knowledge retrieval and probability suppression during decoding, outperforming prior methods on ZSRE, Counterfact, and RIPE while reducing training costs.

STK-Adapter: Incorporating Evolving Graph and Event Chain for Temporal Knowledge Graph Extrapolation

cs.IR · 2026-04-21 · unverdicted · none · ref 78 · internal anchor

STK-Adapter adds Spatial-Temporal MoE, Event-Aware MoE, and Cross-Modality Alignment MoE to integrate evolving TKG graphs and event chains into LLMs, reducing information loss and improving extrapolation performance over prior methods.

TabEmb: Joint Semantic-Structure Embedding for Table Annotation

cs.LG · 2026-04-21 · unverdicted · none · ref 28 · internal anchor

TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.

Handling and Interpreting Missing Modalities in Patient Clinical Trajectories via Autoregressive Sequence Modeling

cs.LG · 2026-04-20 · unverdicted · none · ref 38 · internal anchor

Autoregressive transformer modeling with missingness-aware contrastive pre-training outperforms baselines on MIMIC-IV and eICU benchmarks and mitigates divergent behavior from removed modalities in clinical trajectories.
