LLaMA: Open and Efficient Foundation Language Models

2023 · cs.CL · arXiv 2302.13971

Canonical reference. 1099 Pith papers cite this work; 82% of classified citations use it as background.

abstract

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

citation-role summary

background 206 · method 19 · baseline 8 · other 6 · dataset 1 · extension 1

citation-polarity summary

background 198 · use method 20 · unclear 13 · baseline 7 · extend 1 · support 1 · use dataset 1
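
A quick arithmetic check on the two summaries, offered as a sketch rather than as the page's own method. The counts are copied from the summaries above; the helper name `share` is hypothetical. Both summaries total 241 classified citations, and the 82% background figure quoted on this page appears to match the polarity counts (198 of 241) rather than the role counts (206 of 241, roughly 85%).

```python
# Counts copied from the citation-role and citation-polarity summaries.
# The percentage logic below is an illustrative assumption, not Pith's code.
role_counts = {
    "background": 206, "method": 19, "baseline": 8,
    "other": 6, "dataset": 1, "extension": 1,
}
polarity_counts = {
    "background": 198, "use method": 20, "unclear": 13, "baseline": 7,
    "extend": 1, "support": 1, "use dataset": 1,
}


def share(counts, label):
    """Return the percentage of classified citations carrying `label`."""
    total = sum(counts.values())
    return 100.0 * counts[label] / total


# Both summaries should cover the same set of classified citations.
print(sum(role_counts.values()), sum(polarity_counts.values()))

# Background share under each classification, rounded to one decimal.
print(round(share(role_counts, "background"), 1))
print(round(share(polarity_counts, "background"), 1))
```

Only 241 of the 1099 citing papers' citations are classified here, so the percentage describes the classified subset, not every citing paper.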

counterfactual ablation

If this work disappeared, these are the nearest dependency candidates in Pith, weighted toward method, dataset, baseline, and extension contexts where available. This is a structural signal, not a retraction verdict.

representative citing papers

Privacy Auditing with Zero (0) Training Run

cs.CR · 2026-05-14 · unverdicted · novelty 8.0

Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.

Effective Context in Transformers: An Analysis of Fragmentation and Tokenization

cs.LG · 2026-05-13 · unverdicted · novelty 8.0

Fragmentation strictly raises optimal finite-context log-loss on Markov sources while tokenization can make a short token window equivalent to a longer source window under reliability and compression conditions.

Grid Games: The Power of Multiple Grids for Quantizing Large Language Models

cs.LG · 2026-05-12 · accept · novelty 8.0

Allowing each quantization group to select among multiple 4-bit grids improves accuracy over single-grid FP4 for both post-training and pre-training of LLMs.

Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

Adaptive scheduling of interventions in discrete diffusion language models, timed to attribute-specific commitment schedules discovered with sparse autoencoders, delivers precise multi-attribute steering up to 93% strength while preserving generation quality.

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

SignSGD provably beats SGD by a factor of d under sparse noise via matched ℓ1-norm upper and lower bounds, with an equivalent result for Muon on matrices, and this predicts faster GPT-2 pretraining.

Backdoor Attacks on Decentralised Post-Training

cs.CR · 2026-03-31 · conditional · novelty 8.0 · 2 refs

An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.

Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers

cs.SE · 2025-06-16 · conditional · novelty 8.0

First study of 1,899 MCP servers finds eight distinct vulnerabilities (only three traditional), 7.2% with general issues, 5.5% with tool poisoning, and 66% with code smells, urging MCP-specific security practices.

BEAVER: An Enterprise Benchmark for Text-to-SQL

cs.CL · 2024-09-03 · unverdicted · novelty 8.0

BEAVER is the first text-to-SQL benchmark from private enterprise data warehouses, revealing SOTA agentic frameworks achieve only 10.8% accuracy on complex real-world queries.

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

cs.CV · 2024-08-23 · conditional · novelty 8.0

MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

cs.CR · 2024-06-19 · unverdicted · novelty 8.0

AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.

AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments

cs.HC · 2024-05-13 · conditional · novelty 8.0

AgentClinic is a multimodal agent benchmark demonstrating that LLM diagnostic accuracy on MedQA drops to below one-tenth in sequential clinical simulations, with Claude-3.5 leading and large tool-use differences across models.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

Bridging Language and Items for Retrieval and Recommendation: Benchmarking LLMs as Semantic Encoders

cs.IR · 2024-03-06 · unverdicted · novelty 8.0

BLaIR is a new benchmark and 570M-review dataset showing that LLM performance rankings on recommendation tasks have little correlation with rankings on general embedding benchmarks like MTEB.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

cs.LG · 2023-12-01 · unverdicted · novelty 8.0

Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

cs.CL · 2023-11-27 · unverdicted · novelty 8.0

MMMU provides 11.5K heterogeneous college-level multimodal questions that current models solve at 56-59% accuracy, establishing a new standard for expert multimodal evaluation.

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

cs.CL · 2023-05-17 · accept · novelty 8.0

Tree of Thoughts enables language models to solve complex planning tasks by generating, evaluating, and searching over coherent intermediate thoughts in a tree, raising Game of 24 success from 4% to 74% with GPT-4.

API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

cs.CL · 2023-04-14 · conditional · novelty 8.0

API-Bank is a new benchmark and training dataset for tool-augmented LLMs that shows fine-tuned models can approach GPT-3.5 tool-use effectiveness.

Instruction Tuning with GPT-4

cs.CL · 2023-04-06 · unverdicted · novelty 8.0

GPT-4-generated instruction data produces superior zero-shot performance in finetuned LLaMA models versus prior state-of-the-art data.

Language-Assisted Super-Resolution from Real-World Low-Resolution Patches

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

LA-SR redefines unpaired super-resolution in language space by projecting images into a semantically rich representation and applying vision-language model guided losses to handle real-world degradations extracted from depth variations.

Probing Memorization of Tabular In-Context Learning

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

A new probing framework detects moderate parametric memorization signals in tabular in-context learning models under single-task fine-tuning, strongest on low-cardinality tasks, but signals largely disappear under realistic training.

A Sensitivity-Aware Test Collection for Search Among Personal Information

cs.IR · 2026-06-25 · accept · novelty 7.0

A new sensitivity-labeled test collection is released from Enron emails with crowdsourced queries, relevance judgments, and LLM extensions for evaluating sensitivity-aware search.

PatternGSL: A Structured Specification Language for Template-Free and Simulation-Ready 3D Garments

cs.CV · 2026-06-23 · unverdicted · novelty 7.0

PatternGSL is a new template-free specification language for complete sewing patterns that enables direct single-image prediction of simulation-ready garments via a vision-language model, supported by a new 300K paired dataset.

Moving Beyond Diversity: Visual Token Pruning as Subspace Reconstruction for Efficient VLMs

cs.CV · 2026-06-17 · unverdicted · novelty 7.0

SPARE reformulates visual token pruning as column subset selection to minimize reconstruction error and uses anti-relevance for context-aware selection in VLMs.

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

cs.DC · 2026-06-07 · conditional · novelty 7.0

APEX4 co-designs pure INT4 GEMM kernels with ρ-aware granularity adaptation to deliver up to 2.09× end-to-end speedup on GPUs with low ρ while keeping LLaMA-2-70B perplexity within 0.63 of FP16.

citing papers explorer

Showing 50 of 1099 citing papers.

Privacy Auditing with Zero (0) Training Run cs.CR · 2026-05-14 · unverdicted · none · ref 39 · internal anchor
Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.
Effective Context in Transformers: An Analysis of Fragmentation and Tokenization cs.LG · 2026-05-13 · unverdicted · none · ref 35 · internal anchor
Fragmentation strictly raises optimal finite-context log-loss on Markov sources while tokenization can make a short token window equivalent to a longer source window under reliability and compression conditions.
Grid Games: The Power of Multiple Grids for Quantizing Large Language Models cs.LG · 2026-05-12 · accept · none · ref 35 · internal anchor
Allowing each quantization group to select among multiple 4-bit grids improves accuracy over single-grid FP4 for both post-training and pre-training of LLMs.
Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models cs.LG · 2026-05-08 · unverdicted · none · ref 28 · internal anchor
Adaptive scheduling of interventions in discrete diffusion language models, timed to attribute-specific commitment schedules discovered with sparse autoencoders, delivers precise multi-attribute steering up to 93% strength while preserving generation quality.
When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds cs.LG · 2026-05-07 · unverdicted · none · ref 38 · internal anchor
SignSGD provably beats SGD by a factor of d under sparse noise via matched ℓ1-norm upper and lower bounds, with an equivalent result for Muon on matrices, and this predicts faster GPT-2 pretraining.
Backdoor Attacks on Decentralised Post-Training cs.CR · 2026-03-31 · conditional · none · ref 12 · 2 links · internal anchor
An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.
Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers cs.SE · 2025-06-16 · conditional · none · ref 136 · internal anchor
First study of 1,899 MCP servers finds eight distinct vulnerabilities (only three traditional), 7.2% with general issues, 5.5% with tool poisoning, and 66% with code smells, urging MCP-specific security practices.
BEAVER: An Enterprise Benchmark for Text-to-SQL cs.CL · 2024-09-03 · unverdicted · none · ref 13 · internal anchor
BEAVER is the first text-to-SQL benchmark from private enterprise data warehouses, revealing SOTA agentic frameworks achieve only 10.8% accuracy on complex real-world queries.
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? cs.CV · 2024-08-23 · conditional · none · ref 62 · internal anchor
MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.
AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents cs.CR · 2024-06-19 · unverdicted · none · ref 56 · internal anchor
AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments cs.HC · 2024-05-13 · conditional · none · ref 23 · internal anchor
AgentClinic is a multimodal agent benchmark demonstrating that LLM diagnostic accuracy on MedQA drops to below one-tenth in sequential clinical simulations, with Claude-3.5 leading and large tool-use differences across models.
ORPO: Monolithic Preference Optimization without Reference Model cs.CL · 2024-03-12 · conditional · none · ref 50 · internal anchor
ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.
Bridging Language and Items for Retrieval and Recommendation: Benchmarking LLMs as Semantic Encoders cs.IR · 2024-03-06 · unverdicted · none · ref 44 · internal anchor
BLaIR is a new benchmark and 570M-review dataset showing that LLM performance rankings on recommendation tasks have little correlation with rankings on general embedding benchmarks like MTEB.
Mamba: Linear-Time Sequence Modeling with Selective State Spaces cs.LG · 2023-12-01 · unverdicted · none · ref 105 · internal anchor
Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI cs.CL · 2023-11-27 · unverdicted · none · ref 75 · internal anchor
MMMU provides 11.5K heterogeneous college-level multimodal questions that current models solve at 56-59% accuracy, establishing a new standard for expert multimodal evaluation.
Tree of Thoughts: Deliberate Problem Solving with Large Language Models cs.CL · 2023-05-17 · accept · none · ref 33 · internal anchor
Tree of Thoughts enables language models to solve complex planning tasks by generating, evaluating, and searching over coherent intermediate thoughts in a tree, raising Game of 24 success from 4% to 74% with GPT-4.
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs cs.CL · 2023-04-14 · conditional · none · ref 17 · internal anchor
API-Bank is a new benchmark and training dataset for tool-augmented LLMs that shows fine-tuned models can approach GPT-3.5 tool-use effectiveness.
Instruction Tuning with GPT-4 cs.CL · 2023-04-06 · unverdicted · none · ref 11 · internal anchor
GPT-4-generated instruction data produces superior zero-shot performance in finetuned LLaMA models versus prior state-of-the-art data.
Language-Assisted Super-Resolution from Real-World Low-Resolution Patches cs.CV · 2026-06-30 · unverdicted · none · ref 128 · internal anchor
LA-SR redefines unpaired super-resolution in language space by projecting images into a semantically rich representation and applying vision-language model guided losses to handle real-world degradations extracted from depth variations.
Probing Memorization of Tabular In-Context Learning cs.LG · 2026-06-30 · unverdicted · none · ref 108 · internal anchor
A new probing framework detects moderate parametric memorization signals in tabular in-context learning models under single-task fine-tuning, strongest on low-cardinality tasks, but signals largely disappear under realistic training.
A Sensitivity-Aware Test Collection for Search Among Personal Information cs.IR · 2026-06-25 · accept · none · ref 63 · internal anchor
A new sensitivity-labeled test collection is released from Enron emails with crowdsourced queries, relevance judgments, and LLM extensions for evaluating sensitivity-aware search.
PatternGSL: A Structured Specification Language for Template-Free and Simulation-Ready 3D Garments cs.CV · 2026-06-23 · unverdicted · none · ref 85 · internal anchor
PatternGSL is a new template-free specification language for complete sewing patterns that enables direct single-image prediction of simulation-ready garments via a vision-language model, supported by a new 300K paired dataset.
Moving Beyond Diversity: Visual Token Pruning as Subspace Reconstruction for Efficient VLMs cs.CV · 2026-06-17 · unverdicted · none · ref 31 · internal anchor
SPARE reformulates visual token pruning as column subset selection to minimize reconstruction error and uses anti-relevance for context-aware selection in VLMs.
APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing cs.DC · 2026-06-07 · conditional · none · ref 25 · internal anchor
APEX4 co-designs pure INT4 GEMM kernels with ρ-aware granularity adaptation to deliver up to 2.09× end-to-end speedup on GPUs with low ρ while keeping LLaMA-2-70B perplexity within 0.63 of FP16.
End-to-End Text Line Detection and Ordering cs.CV · 2026-06-02 · unverdicted · none · ref 28 · internal anchor
Orli is an autoregressive image-to-sequence model that jointly detects text lines and determines their reading order on historical documents via chord-frame baselines, trained on 196k pages across ten scripts.
When Knowledge Is Not Free: Cost-Aware Evidence Selection in Retrieval-Augmented Generation cs.CL · 2026-06-01 · unverdicted · none · ref 15 · internal anchor
Defines cost-aware RAG with evidence cost tiers and shows static selectors are brittle while agentic LLM-based selection is promising but model-dependent.
RWGBench: Evaluating Scholarly Positioning in Related Work Generation cs.DL · 2026-05-30 · unverdicted · none · ref 55 · internal anchor
RWGBench is a citation-centric benchmark for related work generation built from 40k CS papers and a 100-paper test set, with multi-dimensional metrics that better match human expert judgment than standard similarity scores.
Next-Billion AI Index: The compass for AI utility and adoption in the global majority cs.CY · 2026-05-29 · unverdicted · none · ref 136 · internal anchor
Introduces nexbax, a diagnostic framework with three themes and 10 dimensions for evaluating AI economic viability, operational practicality, and societal integrity in next-billion-user contexts.
What Makes LVLMs Hallucinate Less? Unveiling the Architectural Factors Behind Hallucination Robustness cs.CV · 2026-05-29 · unverdicted · none · ref 3 · internal anchor
The study links three LVLM architectural dimensions to three hallucination types via a new benchmark, finding that language foundation quality reduces co-occurrence errors, visual encoder strength reduces similarity errors, alignment reduces uncertainty errors, and joint visual-alignment improvement
Repetition Mismatch: Why Data Mixture Experiments Don't Scale and How to Fix Them cs.LG · 2026-05-29 · conditional · none · ref 63 · 2 links · internal anchor
Repetition rate mismatch between small-scale proxies and target budgets is the main reason data mixture experiments do not scale; a subsampling procedure that equalizes repetition rates recovers optimal mixtures from 1/16-scale experiments.
Every Act Has Its Price: Compressed Moral Composition in Frontier LLMs cs.CL · 2026-05-29 · unverdicted · none · ref 51 · internal anchor
Moral Trolley Arena shows frontier LLMs produce composite moral preferences that are compressed rather than additive functions of calibrated component act strengths across Moral Foundations Theory.
Learn from your own latents and not from tokens: A sample-complexity theory cs.LG · 2026-05-26 · unverdicted · none · ref 8 · internal anchor
Latent prediction SSL recovers latent trees from PCFG data with sample complexity constant in hierarchy depth L (up to logs), unlike exponential for token-level or supervised methods.
Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems cs.AI · 2026-05-25 · unverdicted · none · ref 33 · internal anchor
AgingBench demonstrates multi-dimensional degradation in deployed AI agents through four aging mechanisms diagnosed by temporal graphs and counterfactual probes across hundreds of runs.
Phantom transitions in language model fine-tuning cs.CL · 2026-05-25 · accept · none · ref 31 · internal anchor
Apparent phase transitions during fine-tuning on near-synonym tasks are phantoms originating in the softmax readout; an order parameter isolates kinematic and structural failure modes and a few dimensionless quantities predict critical learning rates across architectures via blind test.
Large Language Model Selection with Limited Annotations cs.CL · 2026-05-24 · unverdicted · none · ref 109 · internal anchor
SELECT-LLM is the first active model selection framework for LLMs that uses expected information gain from pairwise output similarities to minimize required annotations, reporting up to 84.78% cost reduction across 23 datasets and 156 models.
Beyond Binary Edits Robust Multimodal Knowledge Editing with Adversarial Subspace Alignment cs.AI · 2026-05-22 · unverdicted · none · ref 41 · internal anchor
Introduces Latent Adversarial Robustification and Rank-Constrained Subspace Learning to enable robust generalization in multimodal knowledge editing through adversarial subspace alignment.
CachePrune: Privacy-Aware and Fine-Grained KV Cache Sharing for Efficient LLM Inference cs.CR · 2026-05-22 · unverdicted · none · ref 55 · internal anchor
CachePrune enables fine-grained, token-level KV cache reuse across LLM requests by masking sensitive segments, eliminating direct side-channel leakage while cutting TTFT by 4.5x and raising hit rates by 44% versus prior coarse-grained methods.
Brain-LLM Alignment Tracks Training Data, Not Typology cs.CL · 2026-05-21 · unverdicted · none · ref 14 · internal anchor
Training-language dominance, not English inherent properties, determines brain-LLM alignment across English, Chinese, and French, with additional independent effects from typological distance concentrated in syntactic brain regions.
A mathematical theory of balancing relational generalization and memorization cs.LG · 2026-05-21 · unverdicted · none · ref 66 · internal anchor
Introduces transitive inference with exceptions task and analytically shows kernel ridge regression balances relational generalization and memorization depending on representational geometry, with validation in finetuned language models.
Understanding Data Temporality Impact on Large Language Models Pre-training cs.CL · 2026-05-21 · unverdicted · none · ref 21 · internal anchor
Pre-training 6B LLMs on temporally ordered Common Crawl snapshots yields models with improved factual freshness and temporal precision over shuffled baselines while matching on general language understanding.
TimeGuard: Channel-wise Pool Training for Backdoor Defense in Time Series Forecasting cs.CR · 2026-05-21 · unverdicted · none · ref 126 · internal anchor
TimeGuard defends time series forecasting against backdoors via channel-wise pool training initialized by time-aware criteria and expanded with distance-regularized loss selection, improving poisoned MAE by 1.96x while keeping clean MAE within 5%.
ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs cs.AI · 2026-05-21 · unverdicted · none · ref 9 · internal anchor
ST-SimDiff is a training-free method using a spatio-temporal graph and dual similarity-difference selection to compress video tokens for MLLMs while retaining static and dynamic content.
Generative Conversational Recommender System cs.IR · 2026-05-21 · unverdicted · none · ref 27 · internal anchor
A single autoregressive model for conversational recommendation that uses semantic item IDs, predicts response intent and target first, then generates the response, reporting up to 29% Recall@1 gains.
Closed Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training cs.CV · 2026-05-20 · unverdicted · none · ref 43 · internal anchor
AutoScale is a closed-loop data engine using Graph-RAE for scene representation and Cluster-GA for importance-based retrieval to improve real-synthetic co-training for autonomous driving.
On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective cs.LG · 2026-05-20 · unverdicted · none · ref 88 · internal anchor
Chain of Thought risk decomposes into oracle-trajectory benefit and trajectory-mismatch cost, with stability determining bounded, linear, or exponential error growth.
Provable Joint Decontamination for Benchmarking Multiple Large Language Models cs.LG · 2026-05-20 · unverdicted · none · ref 165 · internal anchor
JECS aggregates per-model conformal p-values via their maximum and reconstructs a conservative envelope of the max-p null distribution to select benchmarks with global contamination rate control.
Head-Aware Key-Value Compression for Efficient Autoregressive Image Generation cs.CV · 2026-05-20 · conditional · none · ref 32 · internal anchor
HeadKV compresses KV cache for autoregressive image generation via head-aware budget allocation, early head-type identification from consistent patterns, and stratified token eviction.
RECIPE: Procedural Planning via Grounding in Instructional Video cs.CV · 2026-05-19 · unverdicted · none · ref 42 · internal anchor
RECIPE improves visual procedural planners by rewarding plans according to their grounding quality in ASR transcripts via GRPO, yielding +7–8 in-domain and up to +16 zero-shot macro-accuracy gains over base models and outperforming supervised fine-tuning on seven benchmarks.
Modality-Decoupled Online Recursive Editing cs.LG · 2026-05-19 · conditional · none · ref 16 · internal anchor
M-ORE decouples text and visual update statistics in MLLMs and applies recursive low-rank edits in an orthogonal subspace to reduce cross-modal conflict and long-horizon interference.
PMF-CL: Pareto-Minimal-Forgetting Continual Learner for Conflicting Tasks cs.LG · 2026-05-18 · unverdicted · none · ref 17 · 2 links · internal anchor
PMF-CL derives Pareto-minimal-forgetting algorithms for linear/basis-function regression and quadratic-bounded losses like logistic regression, achieving static O(d²) memory for d-parameter models.
