LLaMA: Open and Efficient Foundation Language Models

2023 · cs.CL · arXiv 2302.13971

Canonical reference. 82% of citing Pith papers cite this work as background.

1061 Pith papers citing it

Background 82% of classified citations

abstract

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

citation-role summary

background 206 · method 19 · baseline 8 · other 6 · dataset 1 · extension 1

citation-polarity summary

background 198 · use method 20 · unclear 13 · baseline 7 · extend 1 · support 1 · use dataset 1
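
The 82% background figure quoted near the top of the page appears consistent with the polarity counts above. The short sketch below checks that arithmetic, assuming the share is simply background citations divided by all classified citations; the page does not state its formula, and the names `polarity_counts` and `share` are illustrative only.

```python
# Counts copied from the citation-polarity summary above.
polarity_counts = {
    "background": 198,
    "use method": 20,
    "unclear": 13,
    "baseline": 7,
    "extend": 1,
    "support": 1,
    "use dataset": 1,
}


def share(counts, label):
    """Return the fraction of classified citations that carry `label`."""
    total = sum(counts.values())
    return counts[label] / total


# Assumption: "classified citations" means the sum of all polarity counts.
classified_total = sum(polarity_counts.values())
background_share = share(polarity_counts, "background")

print(f"{classified_total} classified citations, "
      f"background share {background_share:.0%}")
# prints "241 classified citations, background share 82%"
```

Under this assumption, only 241 of the 1061 citing papers carry a classified polarity, so the 82% describes that classified subset rather than every citing paper.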

counterfactual ablation

If this work disappeared, these are the nearest dependency candidates in Pith, weighted toward method, dataset, baseline, and extension contexts where available. This is a structural signal, not a retraction verdict.

representative citing papers

Privacy Auditing with Zero (0) Training Run

cs.CR · 2026-05-14 · unverdicted · novelty 8.0

Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.

Effective Context in Transformers: An Analysis of Fragmentation and Tokenization

cs.LG · 2026-05-13 · unverdicted · novelty 8.0

Fragmentation strictly raises optimal finite-context log-loss on Markov sources while tokenization can make a short token window equivalent to a longer source window under reliability and compression conditions.

Grid Games: The Power of Multiple Grids for Quantizing Large Language Models

cs.LG · 2026-05-12 · accept · novelty 8.0

Allowing each quantization group to select among multiple 4-bit grids improves accuracy over single-grid FP4 for both post-training and pre-training of LLMs.

Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

Adaptive scheduling of interventions in discrete diffusion language models, timed to attribute-specific commitment schedules discovered with sparse autoencoders, delivers precise multi-attribute steering up to 93% strength while preserving generation quality.

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

SignSGD provably beats SGD by a factor of d under sparse noise via matched ℓ1-norm upper and lower bounds, with an equivalent result for Muon on matrices, and this predicts faster GPT-2 pretraining.

Backdoor Attacks on Decentralised Post-Training

cs.CR · 2026-03-31 · conditional · novelty 8.0 · 2 refs

An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.

Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers

cs.SE · 2025-06-16 · conditional · novelty 8.0

First study of 1,899 MCP servers finds eight distinct vulnerabilities (only three traditional), 7.2% with general issues, 5.5% with tool poisoning, and 66% with code smells, urging MCP-specific security practices.

BEAVER: An Enterprise Benchmark for Text-to-SQL

cs.CL · 2024-09-03 · unverdicted · novelty 8.0

BEAVER is the first text-to-SQL benchmark from private enterprise data warehouses, revealing SOTA agentic frameworks achieve only 10.8% accuracy on complex real-world queries.

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

cs.CV · 2024-08-23 · conditional · novelty 8.0

MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

cs.CR · 2024-06-19 · unverdicted · novelty 8.0

AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.

AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments

cs.HC · 2024-05-13 · conditional · novelty 8.0

AgentClinic is a multimodal agent benchmark demonstrating that LLM diagnostic accuracy on MedQA drops to below one-tenth in sequential clinical simulations, with Claude-3.5 leading and large tool-use differences across models.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

Bridging Language and Items for Retrieval and Recommendation: Benchmarking LLMs as Semantic Encoders

cs.IR · 2024-03-06 · unverdicted · novelty 8.0

BLaIR is a new benchmark and 570M-review dataset showing that LLM performance rankings on recommendation tasks have little correlation with rankings on general embedding benchmarks like MTEB.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

cs.LG · 2023-12-01 · unverdicted · novelty 8.0

Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

cs.CL · 2023-11-27 · unverdicted · novelty 8.0

MMMU provides 11.5K heterogeneous college-level multimodal questions that current models solve at 56-59% accuracy, establishing a new standard for expert multimodal evaluation.

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

cs.CL · 2023-05-17 · accept · novelty 8.0

Tree of Thoughts enables language models to solve complex planning tasks by generating, evaluating, and searching over coherent intermediate thoughts in a tree, raising Game of 24 success from 4% to 74% with GPT-4.

API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

cs.CL · 2023-04-14 · conditional · novelty 8.0

API-Bank is a new benchmark and training dataset for tool-augmented LLMs that shows fine-tuned models can approach GPT-3.5 tool-use effectiveness.

Instruction Tuning with GPT-4

cs.CL · 2023-04-06 · unverdicted · novelty 8.0

GPT-4-generated instruction data produces superior zero-shot performance in finetuned LLaMA models versus prior state-of-the-art data.

A Sensitivity-Aware Test Collection for Search Among Personal Information

cs.IR · 2026-06-25 · accept · novelty 7.0

A new sensitivity-labeled test collection is released from Enron emails with crowdsourced queries, relevance judgments, and LLM extensions for evaluating sensitivity-aware search.

Moving Beyond Diversity: Visual Token Pruning as Subspace Reconstruction for Efficient VLMs

cs.CV · 2026-06-17 · unverdicted · novelty 7.0

SPARE reformulates visual token pruning as column subset selection to minimize reconstruction error and uses anti-relevance for context-aware selection in VLMs.

End-to-End Text Line Detection and Ordering

cs.CV · 2026-06-02 · unverdicted · novelty 7.0

Orli is an autoregressive image-to-sequence model that jointly detects text lines and determines their reading order on historical documents via chord-frame baselines, trained on 196k pages across ten scripts.

When Knowledge Is Not Free: Cost-Aware Evidence Selection in Retrieval-Augmented Generation

cs.CL · 2026-06-01 · unverdicted · novelty 7.0

Defines cost-aware RAG with evidence cost tiers and shows static selectors are brittle while agentic LLM-based selection is promising but model-dependent.

RWGBench: Evaluating Scholarly Positioning in Related Work Generation

cs.DL · 2026-05-30 · unverdicted · novelty 7.0

RWGBench is a citation-centric benchmark for related work generation built from 40k CS papers and a 100-paper test set, with multi-dimensional metrics that better match human expert judgment than standard similarity scores.

Next-Billion AI Index: The compass for AI utility and adoption in the global majority

cs.CY · 2026-05-29 · unverdicted · novelty 7.0

Introduces nexbax, a diagnostic framework with three themes and 10 dimensions for evaluating AI economic viability, operational practicality, and societal integrity in next-billion-user contexts.

citing papers explorer

Showing 50 of 1061 citing papers.

DREAM: Dynamic Refinement of Early Assignment Mappings

cs.IR · 2026-06-05 · unverdicted · none · ref 46 · internal anchor

DREAM proposes intent-aware tokenization, frozen-model evaluation, and dynamic beams to refine early SID assignments and improve cold-start performance in generative recommenders on Amazon benchmarks.

Training-free image inversion for one-step diffusion models

cs.CV · 2026-05-31 · unverdicted · none · ref 54 · internal anchor

TFinv proposes iterative noise alignment and suffix learning to enable training-free inversion and editing for one-step diffusion models, achieving SOTA performance and higher efficiency than multistep methods.

When Data Is Scarce: Scaling Sparse Language Models with Repeated Training

cs.LG · 2026-05-31 · unverdicted · none · ref 17 · internal anchor

Sparse LLMs in data-scarce multi-epoch regimes follow a scaling law based on active parameters, unique tokens, repetition count, and sparsity level that predicts performance and delays data saturation.

Scaling Parallel Sequence Models to Foundation-Scale Vision Encoders

cs.CV · 2026-05-30 · unverdicted · none · ref 105 · internal anchor

C-GSPN scales 2D spatial propagation to foundation vision encoders via a fast CUDA kernel, compressed blocks, and two-stage distillation, matching ViT performance with 15% fewer parameters and 4x block speedup at 2K resolution.

Saliency-Aware Model Merging

cs.LG · 2026-05-30 · unverdicted · none · ref 18 · internal anchor

SA-Merging extends SynFlow-style saliency to task vectors, adds merge-aware modulation and iterative pruning, and applies rank-wise decomposition to LoRAs, narrowing the gap to test-time adaptation on vision and language tasks.

ProjQ: Project-and-Quantize for Adapter-Aware LLM Compression

cs.LG · 2026-05-30 · unverdicted · none · ref 57 · internal anchor

ProjQ constrains post-training quantization noise to a low-rank manifold through orthogonal subspace projection, enabling better compensation by LoRA adapters and preserving greater model plasticity than standard PTQ.

Rethinking the Role of Temperature in Large Language Model Distillation

cs.LG · 2026-05-29 · unverdicted · none · ref 15 · internal anchor

Including temperature scaling makes forward KL divergence outperform reverse KL in LLM distillation on instruction benchmarks, overturning the τ=1 preference for reverse KL.

InfoAtlas: A Foundation Model for Zero-Shot Statistical Dependence Estimate

cs.LG · 2026-05-29 · unverdicted · none · ref 210 · internal anchor

InfoAtlas is a pretrained neural model for zero-shot mutual information estimation that matches state-of-the-art accuracy with 100x speedup and handles varying dimensions via a single model.

A Pre-Training Analogue of Grokking in Language Models: Tracing Delayed Grammatical Generalization

cs.LG · 2026-05-29 · unverdicted · none · ref 34 · internal anchor

An exposure-based split on BLiMP data reveals delayed generalization in five grammatical phenomena during LLM pre-training, with post-generalization shifts in concept vector predictiveness and attention patterns.

Contribution Weights: A Geometrical Analysis of Self-Attention Transformers

cs.LG · 2026-05-29 · unverdicted · none · ref 51 · 2 links · internal anchor

Contribution Weights combine attention, value magnitude, and directional alignment to measure token influence more faithfully than attention alone, and show attention sinks actively suppress information via a convex sink-rate to output-norm relationship.

DiffoR: A Unified Continuous Generative Framework for Universal Ordinal Regression

cs.LG · 2026-05-29 · unverdicted · none · ref 59 · internal anchor

DiffOR reformulates ordinal regression as continuous generative modeling using diffusion models with dual-decoupling to capture soft semantic transitions.

MLIPilot: LLM-Driven Auto-Research for Machine-Learned Interatomic Potentials

physics.chem-ph · 2026-05-29 · unverdicted · none · ref 7 · internal anchor

MLIPilot deploys LLM agents to autonomously optimize MACE MLIP training on molecular and periodic datasets by proposing code edits and validating against a domain-specific scorecard.

Veda: Scalable Video Diffusion via Distilled Sparse Attention

cs.CV · 2026-05-28 · unverdicted · none · ref 18 · internal anchor

Veda formulates tile selection in video diffusion attention as a reconstruction problem from full attention maps, using statistics-aware and head-aware scoring to enable high sparsity with maintained quality and hardware speedups up to 5.1x end-to-end.

From GPS Points to Travel Patterns: Flexible and Semantic Trajectory Generation with LLMs

cs.AI · 2026-05-28 · unverdicted · none · ref 42 · internal anchor

HTP hierarchically generates travel patterns via RQ-VAE tokenization then uses SFT-tuned LLMs to produce conditioned trajectory sequences, outperforming baselines by 29.78% on two datasets.

STAP: A Shuffle-Tokenized App Predictor with Ultra Long Context for Vocabulary-Free Mobile App Prediction

cs.LG · 2026-05-28 · unverdicted · none · ref 18 · internal anchor

A Transformer model with app-identity shuffling and ultra-long context achieves vocabulary-free next-app prediction with cross-dataset zero-shot capability and competitive cold-start performance.

AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling

cs.CV · 2026-05-28 · unverdicted · none · ref 46 · internal anchor

AnyMo is a masked-modeling framework for any-modality human motion generation trained on the new OmniHuMo dataset of 5,000+ hours of multimodal motion sequences.

FedSmoothLoRA: Toward Smoother and Faster Convergence in Federated Low-Rank Adaptation

cs.CV · 2026-05-28 · unverdicted · none · ref 2 · internal anchor

FedSmoothLoRA improves federated LoRA fine-tuning by constructing local initializations from a round-matching matrix for cross-round continuity and a gradient-aligned matrix for client-specific guidance, yielding faster convergence than prior methods in image and text tasks.

Bilinear Coordinate Alignment for Training-Free Task-Vector Transfer

cs.LG · 2026-05-27 · unverdicted · none · ref 31 · internal anchor

BiCo transfers task vectors across models differing in width, depth, and pre-training by estimating dual-space orthogonal Procrustes mappings from one forward-backward pass on a calibration set.

Locality-Aware Redundancy Pruning for LLM Depth Compression

cs.LG · 2026-05-27 · unverdicted · none · ref 30 · internal anchor

LoRP uses a new Representation Locality Score derived from inter-layer hidden-state similarity to cluster layers and prune intra-cluster redundancies in one shot, yielding better perplexity and task accuracy than prior depth-pruning baselines across LLM families.

More Expressive Feedforward Layers: Part I. Token-Adaptive Mixing of Activations

cs.LG · 2026-05-26 · unverdicted · none · ref 38 · internal anchor

Mixture of Activations mixes activation functions token-adaptively in FFNs via lightweight gates, strictly more expressive than fixed or learnable activations, and yields lower pretraining loss from 0.12B to 2B models.

CoCoVideo: The High-Quality Commercial-Model-Based Contrastive Benchmark for AI-Generated Video Detection

cs.CV · 2026-05-26 · unverdicted · none · ref 67 · internal anchor

Introduces a commercial-model contrastive AIGC video dataset and a hybrid contrastive-MLLM detection framework claiming SOTA performance on realistic video forgery detection.

When Correct Demonstrations Hurt: Rethinking the Role of Exemplars in In-Context Learning

cs.LG · 2026-05-25 · unverdicted · none · ref 13 · internal anchor

Task-preserving perturbations of correct exemplars can degrade ICL performance by changing the effective evidence mixture used for inference.

Not All Modalities Are Equal: Instruction-Aware Gating for Multimodal Videos

cs.CV · 2026-05-25 · unverdicted · none · ref 37 · internal anchor

UniMVU applies instruction-conditioned inner-modality and modality-level gates to adaptively fuse multiple video modalities, achieving gains of up to 13.5 CIDEr on six benchmarks including AVQA and MVBench.

Channel-wise Vector Quantization

cs.CV · 2026-05-25 · unverdicted · none · ref 38 · internal anchor

CVQ replaces patch-wise vector quantization with channel-wise quantization of feature maps, enabling a next-channel autoregressive model that reports 100% codebook utilization and text-to-image scores of DPG 86.7 and GenEval 0.79.

EfficientGraph-RAG: Structured Retrieval-State Management for Cross-Task Retrieval-Augmented Generation

cs.CL · 2026-05-25 · unverdicted · none · ref 17 · internal anchor

EfficientGraph-RAG structures retrieval state with TAM, MARS and SMP, ranking first on averaged LongBench answer-quality metrics while cutting token use 3.51x on HotpotQA.

NASiC: 3D NAND-based CAM-Selected Multibit CIM Architecture for Efficient On-Device Mixture-of-Experts LLM Inference

cs.AR · 2026-05-22 · unverdicted · none · ref 3 · internal anchor

NASiC fuses CAM-based expert selection and multibit CIM computation in 3D NAND into one cycle for MoE LLM inference, claiming 4-114.8x performance and 3.9-70x energy efficiency gains over prior designs with high accuracy.

Label-Efficient Dataset Pruning via Semi-Supervised Pseudo-Labeling

cs.LG · 2026-05-22 · unverdicted · none · ref 5 · internal anchor

SemiPrune uses a small labeled subset and semi-supervised pseudo-labeling to enable supervised dataset pruning methods, achieving state-of-the-art results on domain-specific, image-corrupted, and long-tailed datasets.

Model Collapse as Cultural Evolution

cs.CL · 2026-05-21 · unverdicted · none · ref 19 · internal anchor

Iterated learning theory predicts and LLM experiments confirm non-monotonic compositionality during self-training, reframing model collapse as cultural transmission with matching human regularization patterns.

Cambrian-P: Pose-Grounded Video Understanding

cs.CV · 2026-05-21 · unverdicted · none · ref 91 · internal anchor

Cambrian-P adds per-frame camera pose tokens and a regression head to video MLLMs, delivering 4.5-6.5% gains on spatial benchmarks, generalization to other video QA tasks, and SOTA streaming pose estimation on ScanNet.

LACO: Adaptive Latent Communication for Collaborative Driving

cs.AI · 2026-05-21 · unverdicted · none · ref 31 · internal anchor

LACO introduces Iterative Latent Deliberation, Cross-Horizon Saliency Attribution, and Structured Semantic Knowledge Distillation to enable low-latency latent communication in collaborative driving while preserving performance in CARLA simulations.

Translating Signals to Languages for sEMG-Based Activity Recognition

cs.CV · 2026-05-21 · unverdicted · none · ref 79 · internal anchor

LLM-sEMG maps sEMG signals to language via a dedicated mechanism to enable LLMs to perform accurate activity recognition.

TextTeacher: What Can Language Teach About Images?

cs.CV · 2026-05-21 · unverdicted · none · ref 61 · internal anchor

TextTeacher uses frozen text embeddings from captions as semantic anchors to guide vision model training, improving ImageNet accuracy by up to 2.7 p.p. and transfer performance by 1.0 p.p. on average.

Investigating Concept Alignment Using Implausible Category Members

cs.AI · 2026-05-20 · unverdicted · none · ref 33 · internal anchor

AI models misalign with humans on concept boundaries when probed with implausible category members, such as classifying words as vehicles or vegetables as fruit.

TCARD: Nearly Balanced Two-Level Designs with Treatment Cardinality Constraints with an Application to LLM Prompt Engineering

stat.ME · 2026-05-20 · unverdicted · none · ref 40 · internal anchor

Proposes nearly balanced TCARDs that minimize the first two generalized word-length pattern components, defines Φ_BCD criterion linked to classical optimality, and constructs designs via coordinate exchange with simulation-calibrated weights for LLM prompt engineering.

UniT: Unified Geometry Learning with Group Autoregressive Transformer

cs.CV · 2026-05-20 · unverdicted · none · ref 40 · internal anchor

UniT unifies online and offline 3D geometry perception via a Group Autoregressive Transformer that processes observation groups with anchor-free point map prediction and a scale-adaptive loss.

Focus-then-Context: Subject-Centric Progressive Visual Token Reduction for Vision-Language Models

cs.CV · 2026-05-20 · conditional · none · ref 16 · internal anchor

SPpruner reduces visual tokens in VLMs via focus identification followed by context-aware scanning, retaining 22.2% tokens for 2.53x speedup on Qwen2.5-VL with negligible accuracy loss.

STELLAR: Scaling 3D Perception Large Models for Autonomous Driving

cs.CV · 2026-05-19 · unverdicted · none · ref 22 · internal anchor

STELLAR trains up to 500M-parameter multi-modal models on 50M driving scenes and reports empirical scaling trends plus new state-of-the-art results on the Waymo Open Dataset.

DEL: Digit Entropy Loss for Numerical Learning of Large Language Models

cs.CL · 2026-05-19 · conditional · none · ref 40 · internal anchor

DEL is a new loss for LLM numerical learning that applies supervised digit entropy optimization and extends to floating-point numbers, showing improved accuracy and distance metrics over prior methods on math benchmarks.

C2CServe: Leveraging NVLink-C2C for Elastic Serverless LLM Serving on MIG

cs.OS · 2026-05-19 · unverdicted · none · ref 38 · internal anchor

C2CServe is a request-granularity serverless LLM serving system that keeps weights in host memory and streams them via C2C to MIG instances, cutting cold-start latency up to 7.1x while preserving TTFT/TPOT under contention.

When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR

cs.LG · 2026-05-19 · unverdicted · none · ref 28 · internal anchor

Dynamic Gradient Gating monitors lm_head gradient norms to safely reuse rollout batches in RLVR, achieving up to 2.93x sample efficiency and 2.14x wall-clock speedup across math, ALFWorld, WebShop, and QA tasks.

GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning

cs.LG · 2026-05-19 · unverdicted · none · ref 39 · internal anchor

GAE suffers from amplified variance in imperfect-info self-play RL; VRPO with Q-boosting and multi-step Expected SARSA(λ) reduces it and improves performance on mid-to-large games.

GOAL: Graph-based Objective-Aligned Diffusion Solvers for Dynamic Multi-Objective Optimization

cs.NE · 2026-05-18 · unverdicted · none · ref 54 · internal anchor

GOAL uses conditioned diffusion on relational graphs with typed edges to produce feasible multi-objective solutions for scheduling problems, reporting 100% feasibility and sub-0.2% MAPE on FSP, JSP, and FJSP up to 20 jobs.

Lance: Unified Multimodal Modeling by Multi-Task Synergy

cs.CV · 2026-05-18 · unverdicted · none · ref 107 · 2 links · internal anchor

Lance presents a dual-stream mixture-of-experts model with modality-aware positional encoding and staged multi-task training that outperforms prior open-source unified models on image and video generation while keeping strong understanding performance.

Vision Foundation Models as Generalist Tokenizers for Image Generation

cs.CV · 2026-05-18 · unverdicted · none · ref 76 · internal anchor

VFMTok builds a generalist image tokenizer on frozen VFMs using adaptive quantization and semantic alignment, delivering gFID 1.36 for autoregressive and 1.25 for continuous generation on ImageNet with 3x faster convergence.

Acoustic Interference: A New Paradigm Weaponizing Acoustic Latent Semantic for Universal Jailbreak against Large Audio Language Models

cs.CR · 2026-05-18 · unverdicted · none · ref 42 · internal anchor

AIA generates universal interference audio infused with Acoustic Latent Semantics to bypass LALM safety alignment, achieving SOTA attack success rates on 10 models across five datasets.

TRACE: Trajectory Correction from Cross-layer Evidence for Hallucination Reduction

cs.AI · 2026-05-18 · unverdicted · none · ref 41 · internal anchor

TRACE uses cross-layer candidate trajectories inside frozen LLMs to dynamically select and apply one of three correction operators, delivering mean gains of +12.26 MC1 and +8.65 MC2 points across 15 models and 3 benchmarks with no regressions.

How Good LLMs Are at Answering Bangla Medical Visual Questions? Dataset and Benchmarking

cs.CL · 2026-05-18 · unverdicted · none · ref 32 · internal anchor

Introduces BanglaMedVQA dataset of clinically validated image-question-answer pairs and benchmarks foundation models, finding substantially lower performance than on English MedVQA especially on diagnostic questions.

KVDrive: A Holistic Multi-Tier KV Cache Management System for Long-Context LLM Inference

cs.CL · 2026-05-18 · unverdicted · none · ref 32 · internal anchor

KVDrive introduces a multi-tier KV cache management system that achieves up to 1.74x higher throughput for long-context LLM inference through adaptive cache placement, pipeline restructuring, and cross-tier coordination while preserving accuracy.

Revisiting the Adam-SGD Gap in LLM Pre-Training: The Role of Large Effective Learning Rates

cs.LG · 2026-05-18 · unverdicted · none · ref 11 · internal anchor

The Adam-SGD gap in large-batch LLM pre-training arises mainly from SGD's restricted effective learning rates caused by small gradients and output-layer spikes; clipping lets SGD recover nearly all of Adam's performance.

PULSE: Agentic Investigation with Passive Sensing for Proactive Intervention in Cancer Survivorship

cs.HC · 2026-05-17 · unverdicted · none · ref 62 · internal anchor

PULSE demonstrates that agentic LLM-based investigation of passive smartphone sensing data achieves balanced accuracies of 0.743 (with diary) and 0.713 (sensing-only) for predicting emotion regulation desire and intervention availability in 50 cancer survivors.
