LLaMA: Open and Efficient Foundation Language Models

2023 · cs.CL · arXiv 2302.13971

Canonical reference. 82% of citing Pith papers cite this work as background.

1121 Pith papers citing it

Background 82% of classified citations

abstract

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

citation-role summary

background 206 · method 19 · baseline 8 · other 6 · dataset 1 · extension 1

citation-polarity summary

background 198 · use method 20 · unclear 13 · baseline 7 · extend 1 · support 1 · use dataset 1
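
The polarity counts above appear to be where the "82% of classified citations" figure comes from. The following is a minimal sketch of that arithmetic, assuming the percentage is the background count divided by the sum of all listed polarity counts; the page does not state its formula, and the names `polarity_counts` and `share_percent` are illustrative only.

```python
# Polarity counts copied from the citation-polarity summary above.
polarity_counts = {
    "background": 198,
    "use method": 20,
    "unclear": 13,
    "baseline": 7,
    "extend": 1,
    "support": 1,
    "use dataset": 1,
}


def share_percent(counts, label):
    """Return the share of `label` among all counted citations, in percent.

    Assumption: the denominator is the sum of every listed count,
    including the "unclear" bucket.
    """
    total = sum(counts.values())
    return 100.0 * counts[label] / total


total_classified = sum(polarity_counts.values())
background_share = share_percent(polarity_counts, "background")

print(total_classified, round(background_share, 1))  # prints: 241 82.2
```

Under this assumption the share rounds to 82%, matching the headline figure. The same calculation on the citation-role counts (206 of 241) gives roughly 85%, which suggests the headline is based on polarity rather than role.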

counterfactual ablation

If this work disappeared, these are the nearest dependency candidates in Pith, weighted toward method, dataset, baseline, and extension contexts where available. This is a structural signal, not a retraction verdict.

representative citing papers

SVHalluc: Benchmarking Speech-Vision Hallucination in Audio-Visual Large Language Models

eess.AS · 2026-05-31 · unverdicted · novelty 8.0

SVHalluc benchmark shows open-source audio-visual LLMs achieve near-random accuracy on semantic and temporal speech-vision alignment tasks while Gemini 2.5 Pro performs substantially better.

Privacy Auditing with Zero (0) Training Run

cs.CR · 2026-05-14 · unverdicted · novelty 8.0

Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.

Effective Context in Transformers: An Analysis of Fragmentation and Tokenization

cs.LG · 2026-05-13 · unverdicted · novelty 8.0

Fragmentation strictly raises optimal finite-context log-loss on Markov sources while tokenization can make a short token window equivalent to a longer source window under reliability and compression conditions.

Grid Games: The Power of Multiple Grids for Quantizing Large Language Models

cs.LG · 2026-05-12 · accept · novelty 8.0

Allowing each quantization group to select among multiple 4-bit grids improves accuracy over single-grid FP4 for both post-training and pre-training of LLMs.

Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

Adaptive scheduling of interventions in discrete diffusion language models, timed to attribute-specific commitment schedules discovered with sparse autoencoders, delivers precise multi-attribute steering up to 93% strength while preserving generation quality.

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

SignSGD provably beats SGD by a factor of d under sparse noise via matched ℓ1-norm upper and lower bounds, with an equivalent result for Muon on matrices, and this predicts faster GPT-2 pretraining.

Backdoor Attacks on Decentralised Post-Training

cs.CR · 2026-03-31 · conditional · novelty 8.0 · 2 refs

An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.

Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers

cs.SE · 2025-06-16 · conditional · novelty 8.0

First study of 1,899 MCP servers finds eight distinct vulnerabilities (only three traditional), 7.2% with general issues, 5.5% with tool poisoning, and 66% with code smells, urging MCP-specific security practices.

BEAVER: An Enterprise Benchmark for Text-to-SQL

cs.CL · 2024-09-03 · unverdicted · novelty 8.0

BEAVER is the first text-to-SQL benchmark from private enterprise data warehouses, revealing SOTA agentic frameworks achieve only 10.8% accuracy on complex real-world queries.

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

cs.CV · 2024-08-23 · conditional · novelty 8.0

MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

cs.CR · 2024-06-19 · unverdicted · novelty 8.0

AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.

AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments

cs.HC · 2024-05-13 · conditional · novelty 8.0

AgentClinic is a multimodal agent benchmark demonstrating that LLM diagnostic accuracy on MedQA drops to below one-tenth in sequential clinical simulations, with Claude-3.5 leading and large tool-use differences across models.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

Bridging Language and Items for Retrieval and Recommendation: Benchmarking LLMs as Semantic Encoders

cs.IR · 2024-03-06 · unverdicted · novelty 8.0

BLaIR is a new benchmark and 570M-review dataset showing that LLM performance rankings on recommendation tasks have little correlation with rankings on general embedding benchmarks like MTEB.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

cs.LG · 2023-12-01 · unverdicted · novelty 8.0

Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

cs.CL · 2023-11-27 · unverdicted · novelty 8.0

MMMU provides 11.5K heterogeneous college-level multimodal questions that current models solve at 56-59% accuracy, establishing a new standard for expert multimodal evaluation.

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

cs.CL · 2023-05-17 · accept · novelty 8.0

Tree of Thoughts enables language models to solve complex planning tasks by generating, evaluating, and searching over coherent intermediate thoughts in a tree, raising Game of 24 success from 4% to 74% with GPT-4.

API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

cs.CL · 2023-04-14 · conditional · novelty 8.0

API-Bank is a new benchmark and training dataset for tool-augmented LLMs that shows fine-tuned models can approach GPT-3.5 tool-use effectiveness.

Instruction Tuning with GPT-4

cs.CL · 2023-04-06 · unverdicted · novelty 8.0

GPT-4-generated instruction data produces superior zero-shot performance in finetuned LLaMA models versus prior state-of-the-art data.

Language-Assisted Super-Resolution from Real-World Low-Resolution Patches

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

LA-SR redefines unpaired super-resolution in language space by projecting images into a semantically rich representation and applying vision-language model guided losses to handle real-world degradations extracted from depth variations.

Probing Memorization of Tabular In-Context Learning

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

A new probing framework detects moderate parametric memorization signals in tabular in-context learning models under single-task fine-tuning, strongest on low-cardinality tasks, but signals largely disappear under realistic training.

Search for Truth from Reasoning: A Dynamic Representation Editing Framework for Steering LLM Trajectories

cs.AI · 2026-06-26 · unverdicted · novelty 7.0

DynaSteer dynamically steers LLM reasoning trajectories toward truth via pattern clustering, Fisher-LDA projection, and entropy-triggered representation edits, improving performance on MATH and generalizing to coding.

A Sensitivity-Aware Test Collection for Search Among Personal Information

cs.IR · 2026-06-25 · accept · novelty 7.0

A new sensitivity-labeled test collection is released from Enron emails with crowdsourced queries, relevance judgments, and LLM extensions for evaluating sensitivity-aware search.

Large Language Model Teaches Visual Students: Cross-Modality Transfer of Fine-Grained Conceptual Knowledge

cs.CV · 2026-06-25 · unverdicted · novelty 7.0

LaViD distills LLM conceptual knowledge to vision models via LLM-generated MCQ soft labels, outperforming vision-language distillation baselines on fine-grained benchmarks while improving robustness on spurious correlation datasets.

citing papers explorer

Showing 50 of 1121 citing papers.

MAny: Merge Anything for Multimodal Continual Instruction Tuning cs.LG · 2026-04-15 · unverdicted · none · ref 12 · internal anchor
MAny addresses dual-forgetting in multimodal continual instruction tuning via CPM and LPM merging strategies, delivering up to 8.57% accuracy gains on UCIT benchmarks without additional training.
Identifying and Mitigating Gender Cues in Academic Recommendation Letters: An Interpretability Case Study cs.LG · 2026-04-14 · unverdicted · none · ref 44 · internal anchor
Transformer models detect applicant gender in de-gendered academic recommendation letters via implicit linguistic patterns such as associations with words like 'emotional' and 'humanitarian', and removing these cues reduces but does not eliminate prediction accuracy above chance.
Please Make it Sound like Human: Encoder-Decoder vs. Decoder-Only Transformers for AI-to-Human Text Style Transfer cs.CL · 2026-04-13 · unverdicted · none · ref 11 · internal anchor
BART-large outperforms Mistral-7B in AI-to-human style transfer with higher reference similarity scores and far fewer parameters, while showing that marker shift can reflect overshoot rather than accurate transfer.
ComSim: Building Scalable Real-World Robot Data Generation via Compositional Simulation cs.RO · 2026-04-13 · unverdicted · none · ref 45 · internal anchor
Compositional Simulation generates scalable real-world robot training data by combining classical simulation with neural simulation in a closed-loop real-sim-real augmentation pipeline.
SignReasoner: Compositional Reasoning for Complex Traffic Sign Understanding via Functional Structure Units cs.CV · 2026-04-12 · unverdicted · none · ref 27 · internal anchor
SignReasoner decomposes traffic signs into functional structure units and uses a two-stage VLM post-training pipeline to achieve state-of-the-art compositional reasoning on a new benchmark.
Wearable AI in the Era of Large Sensor Models eess.SP · 2026-04-11 · unverdicted · none · ref 36 · internal anchor
Large Sensor Models trained on large-scale multimodal wearable data can provide a scalable, general framework for wearable AI by learning transferable representations across modalities and tasks.
SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models cs.CL · 2026-04-11 · unverdicted · none · ref 37 · internal anchor
SEPTQ simplifies LLM post-training quantization to two steps via static global importance scoring and mask-guided column-wise weight updates, claiming superior results over baselines in low-bit settings.
Policy-Aware Edge LLM-RAG Framework for Internet of Battlefield Things Mission Orchestration cs.NI · 2026-04-10 · unverdicted · none · ref 4 · internal anchor
PA-LLM-RAG adds policy retrieval and dual-LLM verification to enable reliable low-latency mission orchestration in simulated IoBT environments, with Gemma-2B reaching 100% policy compliance at 4.17s latency.
Customized Fusion: A Closed-Loop Dynamic Network for Adaptive Multi-Task-Aware Infrared-Visible Image Fusion cs.CV · 2026-04-10 · unverdicted · none · ref 42 · internal anchor
CLDyN establishes a closed-loop semantic transmission chain with a Requirement-driven Semantic Compensation module to make infrared-visible fusion adapt to diverse downstream tasks.
Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing cs.AI · 2026-04-09 · unverdicted · none · ref 45 · internal anchor
SAVeR adds self-auditing of internal beliefs in LLM agents via persona-based candidates and constraint-guided repairs, improving faithfulness on six benchmarks without hurting task performance.
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering cs.SE · 2026-04-09 · accept · none · ref 138 · internal anchor
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
Mitigating Entangled Steering in Large Vision-Language Models for Hallucination Reduction cs.CV · 2026-04-09 · unverdicted · none · ref 43 · internal anchor
MESA reduces hallucinations in LVLMs via controlled selective latent intervention that preserves the original token distribution.
From LLM to Silicon: RL-Driven ASIC Architecture Exploration for On-Device AI Inference cs.AR · 2026-04-08 · unverdicted · none · ref 28 · internal anchor
An RL agent using Soft Actor-Critic with Mixture-of-Experts jointly optimizes ASIC architecture, memory hierarchy, and partitioning for AI inference, achieving 29809 tokens/s for Llama 3.1 at 3nm and under 13mW for SmolVLM across 3-28nm nodes without manual retuning.
An Analysis of Artificial Intelligence Adoption in NIH-Funded Research cs.AI · 2026-04-08 · unverdicted · none · ref 18 · internal anchor
AI makes up 15.9% of NIH-funded biomedical projects in 2025 with a 13.4% funding premium, yet 79% stay in research stages, only 14.7% reach clinical deployment, and health disparities work is just 5.7% of AI projects.
A Graph-Enhanced Defense Framework for Explainable Fake News Detection with LLM cs.CL · 2026-04-08 · unverdicted · none · ref 64 · internal anchor
G-Defense builds claim-centered graphs from sub-claims, applies RAG for evidence and competing explanations, then uses graph inference to detect fake news veracity and generate intuitive explanation graphs, claiming SOTA results.
Large Language Model Assisted Discovery of Optimal Dopants for Enhanced Thermoelectric Performance in CoSb$_3$ Based Skutterudites cond-mat.mtrl-sci · 2026-04-07 · unverdicted · none · ref 8 · internal anchor
LLM-based extraction and modeling from literature data identifies new filler compositions for CoSb3 skutterudites predicted to have improved thermoelectric figure of merit, with DFT and MD validation.
Joint Knowledge Base Completion and Question Answering by Combining Large Language Models and Small Language Models cs.AI · 2026-04-07 · unverdicted · none · ref 3 · internal anchor
JCQL uses an SLM-trained KBC model as an action in an LLM agent for KBQA to reduce hallucinations, then fine-tunes the KBC model with KBQA reasoning paths, outperforming baselines on two benchmarks.
Identifying Influential N-grams in Confidence Calibration via Regression Analysis cs.CL · 2026-04-07 · unverdicted · none · ref 7 · internal anchor
Regression identifies specific n-grams in LLM reasoning that drive overconfidence, enabling calibration via their suppression without performance loss.
Controllable Singing Style Conversion with Boundary-Aware Information Bottleneck cs.SD · 2026-04-07 · unverdicted · none · ref 30 · internal anchor
A singing voice conversion system with boundary-aware information bottleneck and high-frequency augmentation achieves the best naturalness in SVCC2025 subjective tests while using less extra data than competitors.
Instruction-Tuned LLMs for Parsing and Mining Unstructured Logs on Leadership HPC Systems cs.AI · 2026-04-06 · unverdicted · none · ref 22 · internal anchor
An instruction-tuned 8B LLaMA model parses HPC logs with accuracy matching larger models and processes 600 million Frontier supercomputer logs to reveal temporal patterns and anomalies.
Solar-VLM: Multimodal Vision-Language Models for Augmented Solar Power Forecasting cs.AI · 2026-04-05 · unverdicted · none · ref 21 · internal anchor
Solar-VLM fuses time-series, satellite imagery, and text encoders with graph attention across sites to improve PV power forecasting on real data from eight Chinese stations.
BWTA: Accurate and Efficient Binarized Transformer by Algorithm-Hardware Co-design cs.LG · 2026-04-05 · unverdicted · none · ref 61 · internal anchor
BWTA achieves near full-precision accuracy on BERT and LLMs using binary weights and ternary activations, with 16-24x kernel speedups via specialized CUDA kernels.
Improving Role Consistency in Multi-Agent Collaboration via Quantitative Role Clarity cs.AI · 2026-04-03 · conditional · none · ref 25 · internal anchor
A role clarity matrix from softmax-normalized behavior-role similarities is employed as a regularizer to enhance role consistency in multi-agent LLM collaborations.
Foundation Models Defining A New Era In Sensor-based Human Activity Recognition: A Survey And Outlook eess.SP · 2026-04-03 · accept · none · ref 140 · internal anchor
The survey organizes foundation models for sensor-based HAR into a lifecycle taxonomy and identifies three trajectories: HAR-specific models from scratch, adaptation of general time-series models, and integration with large language models.
Efficient3D: A Unified Framework for Adaptive and Debiased Token Reduction in 3D MLLMs cs.CV · 2026-04-03 · unverdicted · none · ref 62 · internal anchor
Efficient3D prunes visual tokens in 3D MLLMs via DVTIE and ATR modules, reporting better performance than unpruned baselines on Scan2Cap and other benchmarks.
Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks cs.CV · 2026-04-02 · unverdicted · none · ref 51 · internal anchor
Random label bridge training aligns LLM parameters with vision tasks, and partial training of certain layers often suffices due to their foundational properties.
MolClaw: An Autonomous Agent with Hierarchical Skills for Drug Molecule Evaluation, Screening, and Optimization cs.AI · 2026-04-02 · unverdicted · none · ref 60 · 2 links · internal anchor
MolClaw deploys a hierarchical skill architecture to reach state-of-the-art results on a new benchmark of multi-step drug discovery tasks.
Benchmarking Linguistic Adaptation in Comparable-Sized LLMs: A Study of Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B on Romanized Nepali cs.CL · 2026-03-25 · unverdicted · none · ref 6 · internal anchor
Fine-tuning Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B on Romanized Nepali data enables effective generation where zero-shot fails, with Qwen3-8B performing best overall and Llama-3.1-8B showing the largest gains.
Cyberlanguage: Native Communication for the Cyber-Physical-Social-Thinking Fusion Space cs.ET · 2026-03-18 · unverdicted · none · ref 44 · internal anchor
Cyberlanguage is proposed as a four-dimensional communicative framework for the CPST fusion space, built on a Cybersign semiotic model, synchronous grammar, five-layer stack, and context-driven pragmatics to coordinate heterogeneous agents.
Attention Residuals cs.CL · 2026-03-16 · unverdicted · none · ref 52 · internal anchor
Attention Residuals replaces fixed residual summation with input-dependent softmax attention over preceding layers, and a blocked variant is shown to improve uniformity and downstream performance in a 48B-parameter model pre-trained on 1.4T tokens.
HTMuon: Improving Muon via Heavy-Tailed Spectral Correction cs.LG · 2026-03-10 · unverdicted · none · ref 27 · internal anchor
HTMuon modifies Muon to produce heavier-tailed updates and weight spectra via HT-SR theory, yielding up to 0.98 lower perplexity on LLaMA pretraining and serving as a plug-in for other Muon variants.
Trust the uncertain teacher: distilling dark knowledge via calibrated uncertainty cs.LG · 2026-02-13 · unverdicted · none · ref 15 · internal anchor
CUD reshapes the teacher's predictive distribution before distillation so that students receive calibrated uncertainty signals alongside accuracy, yielding more robust and better-calibrated models on high-cardinality and distribution-shift benchmarks.
SOCKET: SOft Collision Kernel EsTimator for Sparse Attention cs.LG · 2026-02-06 · unverdicted · none · ref 44 · internal anchor
SOCKET replaces hard LSH bucket matches with soft probabilistic collision aggregation to enable efficient, high-quality token selection for sparse attention, matching or exceeding prior methods with up to 1.5x throughput gains.
MAR: Efficient Large Language Models via Module-aware Architecture Refinement cs.AI · 2026-01-29 · unverdicted · none · ref 5 · internal anchor
MAR integrates SSMs and sparsification with new ATMN neurons and SBDS distillation to produce efficient LLMs that match dense-model performance at substantially lower inference energy.
Concise Geometric Description as a Bridge: Unleashing the Potential of LLM for Plane Geometry Problem Solving cs.AI · 2026-01-29 · unverdicted · none · ref 29 · internal anchor
An MLLM interpreter generates concise CDL descriptions from diagrams, enabling an off-the-shelf LLM to solve plane geometry problems competitively after training on only 5.5k examples.
AgroCoT: A Chain-of-Thought Benchmark for Evaluating Reasoning in Vision-Language Models for Agriculture cs.AI · 2025-11-28 · unverdicted · none · ref 41 · internal anchor
AgroCoT is a new Chain-of-Thought VQA benchmark with 4759 samples to evaluate reasoning capabilities of vision-language models in agriculture.
End-to-end Contrastive Language-Speech Pretraining Model For Long-form Spoken Question Answering cs.SD · 2025-11-12 · unverdicted · none · ref 36 · internal anchor
CLSR is an end-to-end contrastive language-speech retriever using an intermediate text-like conversion step to improve retrieval of relevant segments from long audio for spoken question answering.
Remembering Unequally: Global and Disciplinary Bias in LLM Reconstruction of Scholarly Coauthor Lists cs.CL · 2025-11-01 · unverdicted · none · ref 33 · internal anchor
LLMs show systematic bias toward highly cited scholars when reconstructing coauthor lists, with more balanced outcomes in fields like Clinical Medicine and some African regions.
LLM4Delay: Flight Delay Prediction via Cross-Modality Adaptation of Large Language Models and Aircraft Trajectory Representation cs.LG · 2025-10-24 · unverdicted · none · ref 34 · internal anchor
LLM4Delay improves flight delay prediction accuracy by using instance-level projection to adapt LLMs for integrating textual aeronautical information with multiple aircraft trajectories.
FlexPipe: Adapting Dynamic LLM Serving Through Inflight Pipeline Refactoring in Fragmented Serverless Clusters cs.DC · 2025-10-13 · unverdicted · none · ref 48 · internal anchor
FlexPipe introduces runtime pipeline refactoring for LLMs to achieve higher resource efficiency and lower latency in serverless GPU clusters with fragmentation.
Users as Annotators: LLM Preference Learning from Comparison Mode cs.CL · 2025-10-10 · unverdicted · none · ref 24 · internal anchor
Introduces a latent user quality model and EM algorithm to infer and filter noisy user-provided pairwise preferences for improved LLM alignment.
Search-R3: Unifying Reasoning and Embedding in Large Language Models cs.CL · 2025-10-08 · unverdicted · none · ref 71 · internal anchor
Search-R3 trains LLMs to output search embeddings as a direct product of step-by-step reasoning via supervised pre-training and a specialized RL environment that avoids full corpus re-encoding.
SpikingMamba: Towards Energy-Efficient Large Language Models via Knowledge Distillation from Mamba cs.NE · 2025-10-06 · unverdicted · none · ref 18 · internal anchor
SpikingMamba distills Mamba into an SNN LLM achieving 4.76x energy savings with a 4.78% zero-shot accuracy gap that narrows to 2.23% after RL.
OntoLogX: Ontology-Guided Knowledge Graph Extraction from Cybersecurity Logs with Large Language Models cs.AI · 2025-10-01 · unverdicted · none · ref 31 · internal anchor
OntoLogX is a system that applies LLMs with ontology guidance, RAG, and iterative fixes to build valid knowledge graphs from cybersecurity logs and predict ATT&CK tactics from aggregated sessions.
Low-rank Orthogonalization for Large-scale Matrix Optimization with Applications to Foundation Model Training cs.LG · 2025-09-15 · unverdicted · none · ref 50 · internal anchor
Proposes low-rank orthogonalization and derives low-rank Muon and MSGD variants that outperform standard Muon on GPT-2 and LLaMA pretraining while providing iteration complexity bounds.
Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference cs.AR · 2025-09-11 · unverdicted · none · ref 67 · internal anchor
PLENA introduces a co-designed system with three optimization pathways for long-context agentic LLM inference, claiming up to 2.23x throughput over A100 and 4.04x energy efficiency.
OneRec-V2 Technical Report cs.IR · 2025-08-28 · unverdicted · none · ref 20 · internal anchor
OneRec-V2 scales generative recommendation to 8B parameters via decoder-only design and real-world preference alignment, improving user engagement metrics in production A/B tests.
Scalable Object Detection in the Car Interior With Vision Foundation Models cs.CV · 2025-08-27 · unverdicted · none · ref 16 · internal anchor
ODAL framework distributes vision foundation models across on-board and cloud for car interior object detection, with fine-tuned LLaVA 1.5 7B reaching 89% ODAL score, 71% improvement, and outperforming GPT-4o while reducing hallucinations.
Enhancing Speech Large Language Models through Reinforced Behavior Alignment cs.CL · 2025-08-25 · unverdicted · none · ref 46 · internal anchor
Reinforced Behavior Alignment (RBA) uses self-synthesized data from a teacher LLM and reinforcement learning to close the instruction-following gap in SpeechLMs, outperforming distillation and reaching SOTA on spoken QA and speech-to-text translation benchmarks.
HFX: Joint Design of Algorithms and Systems for Multi-SLO Serving and Fast Scaling cs.DC · 2025-08-21 · unverdicted · none · ref 42 · internal anchor
HFX jointly designs scheduling and scaling for multi-SLO LLM serving, achieving up to 4.44x higher SLO attainment, 65.82% lower latency, and 49.81% lower cost than prior systems on multi-task workloads.
