hub Canonical reference

TinyLlama: An Open-Source Small Language Model

Peiyuan Zhang, Guangtao Zeng, Tianduo Wang, Wei Lu · 2024 · cs.CL · arXiv 2401.02385

Canonical reference. 100% of citing Pith papers cite this work as background.

47 Pith papers citing it

Background 100% of classified citations

open full Pith review browse 47 citing papers arXiv PDF

abstract

We present TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs. Building on the architecture and tokenizer of Llama 2, TinyLlama leverages various advances contributed by the open-source community (e.g., FlashAttention and Lit-GPT), achieving better computational efficiency. Despite its relatively small size, TinyLlama demonstrates remarkable performance in a series of downstream tasks. It significantly outperforms existing open-source language models with comparable sizes. Our model checkpoints and code are publicly available on GitHub at https://github.com/jzhang38/TinyLlama.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 5

citation-polarity summary

background 5

representative citing papers

Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm

cs.LG · 2026-05-14 · conditional · novelty 7.0

A framework to identify and convert foldable layer normalizations to RMSNorm for exact equivalence and faster inference in deep neural networks.

Precision Tracked Transformer via Kalman Filtering, Kriging and Process Noise

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Bayesian Filtering Transformer reframes attention as precision-weighted kriging and residual connections as Kalman updates, delivering gains on cold-start recommendation and noisy LLM fine-tuning tasks.

When the Ruler is Broken: Parsing-Induced Suppression in LLM-Based Security Log Evaluation

cs.CR · 2026-05-08 · conditional · novelty 7.0

Strict regex parsing of LLM security log outputs introduces systematic errors that can make functional models appear non-functional, with a 76-point accuracy gap recovered by fuzzy parsing.

Tempus: A Temporally Scalable Resource-Invariant GEMM Streaming Framework for Versal AI Edge

cs.DC · 2026-05-01 · unverdicted · novelty 7.0

Tempus delivers 607 GOPS at 10.677 W using fixed 16 AIE cores on Versal AI Edge, with 211.2x better platform-aware utility than spatial SOTA ARIES and zero URAM/DSP utilization.

BoostTaxo: Zero-Shot Taxonomy Induction via Boosting-Style Agentic Reasoning and Constraint-Aware Calibration

cs.CL · 2026-04-03 · unverdicted · novelty 7.0

BoostTaxo introduces a boosting-style LLM framework for zero-shot taxonomy induction that uses hybrid candidate selection and constraint-aware calibration to achieve superior or comparable performance to prior methods on WordNet, DBLP, and SemEval-Sci benchmarks.

Strong Teacher Not Needed? On Distillation in LLM Pretraining

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

Even small or undertrained teachers improve larger LLM students via distillation with tuned loss mixing, while stronger teachers can saturate or reverse gains and distillation aids generalization more than in-domain fit.

Beyond Temperature: Hyperfitting as a Late-Stage Geometric Expansion

cs.CL · 2026-05-21 · unverdicted · novelty 6.0

Hyperfitting improves LLM generation via context-dependent rank reordering from geometric expansion in the terminal transformer block, distinct from temperature scaling, and enables efficient Late-Stage LoRA fine-tuning.

From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents

cs.CL · 2026-05-14 · unverdicted · novelty 6.0

A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.

SpeakerLLM: A Speaker-Specialized Audio-LLM for Speaker Understanding and Verification Reasoning

cs.SD · 2026-05-14 · unverdicted · novelty 6.0

SpeakerLLM unifies speaker profiling, recording-condition understanding, and structured verification reasoning in an audio-LLM via a hierarchical tokenizer and decision traces.

Common-agency Games for Multi-Objective Test-Time Alignment

cs.GT · 2026-05-08 · unverdicted · novelty 6.0

CAGE uses common-agency games and an EPEC algorithm to compute equilibrium policies that balance multiple conflicting objectives for test-time LLM alignment.

UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

A shared global expert pool in MoE improves validation loss over per-layer experts and allows sublinear expert-parameter growth with depth.

Beyond Fine-Tuning: In-Context Learning and Chain-of-Thought for Reasoned Distractor Generation

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

LLMs prompted with few-shot examples and rationales generate better reasoned distractors for MCQs than fine-tuned contrastive models across six benchmarks.

StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models

cs.LG · 2026-04-16 · unverdicted · novelty 6.0

StoSignSGD resolves SignSGD divergence on non-smooth objectives via structural stochasticity, matching optimal convex rates and improving non-convex bounds while delivering 1.44-2.14x speedups in FP8 LLM pretraining.

EdgeCIM: A Hardware-Software Co-Design for CIM-Based Acceleration of Small Language Models

cs.AR · 2026-04-13 · unverdicted · novelty 6.0

A CIM-based hardware-software co-design in 65nm achieves up to 7.3x higher throughput and 49.59x better energy efficiency than NVIDIA Orin Nano for LLaMA3.2-1B, averaging 336 tokens/s and 173 tokens/J under INT4 across multiple SLMs.

A Physical Agentic Loop for Language-Guided Grasping with Execution-State Monitoring

cs.RO · 2026-04-08 · unverdicted · novelty 6.0

A physical agentic loop with execution-state monitoring improves robustness of language-guided grasping over open-loop execution by converting noisy telemetry into discrete outcome events that trigger retries or user escalation.

RAGShield: Detecting Numerical Claim Manipulation in Government RAG Systems

cs.CR · 2026-04-01 · conditional · novelty 6.0

RAGShield detects all numerical manipulations in government RAG systems via pattern-based value extraction and cross-source verification, achieving 0% attack success rate on 430 real IRS-derived attacks where embedding defenses miss 79-90%.

Don't Ignore the Tail: Decoupling top-K Probabilities for Efficient Language Model Distillation

cs.CL · 2026-02-24 · unverdicted · novelty 6.0

A modified divergence decouples top-K teacher probabilities from the distribution tail during distillation, yielding competitive performance on decoder models with standard compute.

Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs

cs.LG · 2025-10-21 · unverdicted · novelty 6.0

A conditional scaling law fitted on over 200 models from 80M to 3B parameters identifies architectures that deliver up to 2.1% higher accuracy and 42% higher inference throughput than LLaMA-3.2 under the same training budget.

Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding

cs.CL · 2025-09-29 · unverdicted · novelty 6.0

Speculative Verification adds a companion model that estimates draft-target alignment via information gain to dynamically set verification length, delivering up to 2x speedup over standard speculative decoding across tested models and batch sizes.

InvisibleInk: High-Utility and Low-Cost Text Generation with Differential Privacy

cs.LG · 2025-06-30 · unverdicted · novelty 6.0

InvisibleInk achieves high-utility differentially private long-form LLM text generation at 4-8x the cost of non-private generation by isolating and clipping sensitive logits and sampling from a small superset of top-k private tokens without privacy cost.

When Attention Sink Emerges in Language Models: An Empirical View

cs.CL · 2024-10-14 · accept · novelty 6.0

Attention sinks emerge in language models from softmax-induced token dependence on attention scores and do not appear when using sigmoid attention without normalization in models up to 1B parameters.

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

cs.CL · 2024-04-09 · conditional · novelty 6.0

MiniCPM 1.2B and 2.4B models reach parity with 7B-13B LLMs via model wind-tunnel scaling and a WSD scheduler that yields a higher optimal data-to-model ratio than Chinchilla scaling.

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

cs.LG · 2024-01-26 · unverdicted · novelty 6.0

EAGLE resolves feature-level uncertainty in speculative sampling via one-step token advancement, delivering 2.7x-3.5x speedup on LLaMA2-Chat 70B and doubled throughput across multiple model families and tasks.

Spectral structural distortion reveals redundant neurons in neural networks

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

A graph-spectral importance score based on layer-wise structural distortion between pre- and post-activation neuron graphs identifies removable neurons for iterative pruning without intermediate updates, followed by recovery fine-tuning.

citing papers explorer

Showing 47 of 47 citing papers.

Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm cs.LG · 2026-05-14 · conditional · none · ref 91 · internal anchor
A framework to identify and convert foldable layer normalizations to RMSNorm for exact equivalence and faster inference in deep neural networks.
Precision Tracked Transformer via Kalman Filtering, Kriging and Process Noise cs.LG · 2026-05-12 · unverdicted · none · ref 35 · internal anchor
Bayesian Filtering Transformer reframes attention as precision-weighted kriging and residual connections as Kalman updates, delivering gains on cold-start recommendation and noisy LLM fine-tuning tasks.
When the Ruler is Broken: Parsing-Induced Suppression in LLM-Based Security Log Evaluation cs.CR · 2026-05-08 · conditional · none · ref 4 · internal anchor
Strict regex parsing of LLM security log outputs introduces systematic errors that can make functional models appear non-functional, with a 76-point accuracy gap recovered by fuzzy parsing.
Tempus: A Temporally Scalable Resource-Invariant GEMM Streaming Framework for Versal AI Edge cs.DC · 2026-05-01 · unverdicted · none · ref 27 · internal anchor
Tempus delivers 607 GOPS at 10.677 W using fixed 16 AIE cores on Versal AI Edge, with 211.2x better platform-aware utility than spatial SOTA ARIES and zero URAM/DSP utilization.
BoostTaxo: Zero-Shot Taxonomy Induction via Boosting-Style Agentic Reasoning and Constraint-Aware Calibration cs.CL · 2026-04-03 · unverdicted · none · ref 29 · internal anchor
BoostTaxo introduces a boosting-style LLM framework for zero-shot taxonomy induction that uses hybrid candidate selection and constraint-aware calibration to achieve superior or comparable performance to prior methods on WordNet, DBLP, and SemEval-Sci benchmarks.
Strong Teacher Not Needed? On Distillation in LLM Pretraining cs.LG · 2026-05-22 · unverdicted · none · ref 14 · internal anchor
Even small or undertrained teachers improve larger LLM students via distillation with tuned loss mixing, while stronger teachers can saturate or reverse gains and distillation aids generalization more than in-domain fit.
Beyond Temperature: Hyperfitting as a Late-Stage Geometric Expansion cs.CL · 2026-05-21 · unverdicted · none · ref 11 · internal anchor
Hyperfitting improves LLM generation via context-dependent rank reordering from geometric expansion in the terminal transformer block, distinct from temperature scaling, and enables efficient Late-Stage LoRA fine-tuning.
From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents cs.CL · 2026-05-14 · unverdicted · none · ref 28 · internal anchor
A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.
SpeakerLLM: A Speaker-Specialized Audio-LLM for Speaker Understanding and Verification Reasoning cs.SD · 2026-05-14 · unverdicted · none · ref 39 · internal anchor
SpeakerLLM unifies speaker profiling, recording-condition understanding, and structured verification reasoning in an audio-LLM via a hierarchical tokenizer and decision traces.
Common-agency Games for Multi-Objective Test-Time Alignment cs.GT · 2026-05-08 · unverdicted · none · ref 243 · internal anchor
CAGE uses common-agency games and an EPEC algorithm to compute equilibrium policies that balance multiple conflicting objectives for test-time LLM alignment.
UniPool: A Globally Shared Expert Pool for Mixture-of-Experts cs.LG · 2026-05-07 · unverdicted · none · ref 59 · internal anchor
A shared global expert pool in MoE improves validation loss over per-layer experts and allows sublinear expert-parameter growth with depth.
Beyond Fine-Tuning: In-Context Learning and Chain-of-Thought for Reasoned Distractor Generation cs.CL · 2026-04-19 · unverdicted · none · ref 164 · internal anchor
LLMs prompted with few-shot examples and rationales generate better reasoned distractors for MCQs than fine-tuned contrastive models across six benchmarks.
StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models cs.LG · 2026-04-16 · unverdicted · none · ref 47 · internal anchor
StoSignSGD resolves SignSGD divergence on non-smooth objectives via structural stochasticity, matching optimal convex rates and improving non-convex bounds while delivering 1.44-2.14x speedups in FP8 LLM pretraining.
EdgeCIM: A Hardware-Software Co-Design for CIM-Based Acceleration of Small Language Models cs.AR · 2026-04-13 · unverdicted · none · ref 35 · internal anchor
A CIM-based hardware-software co-design in 65nm achieves up to 7.3x higher throughput and 49.59x better energy efficiency than NVIDIA Orin Nano for LLaMA3.2-1B, averaging 336 tokens/s and 173 tokens/J under INT4 across multiple SLMs.
A Physical Agentic Loop for Language-Guided Grasping with Execution-State Monitoring cs.RO · 2026-04-08 · unverdicted · none · ref 32 · internal anchor
A physical agentic loop with execution-state monitoring improves robustness of language-guided grasping over open-loop execution by converting noisy telemetry into discrete outcome events that trigger retries or user escalation.
RAGShield: Detecting Numerical Claim Manipulation in Government RAG Systems cs.CR · 2026-04-01 · conditional · none · ref 21 · internal anchor
RAGShield detects all numerical manipulations in government RAG systems via pattern-based value extraction and cross-source verification, achieving 0% attack success rate on 430 real IRS-derived attacks where embedding defenses miss 79-90%.
Don't Ignore the Tail: Decoupling top-K Probabilities for Efficient Language Model Distillation cs.CL · 2026-02-24 · unverdicted · none · ref 25 · internal anchor
A modified divergence decouples top-K teacher probabilities from the distribution tail during distillation, yielding competitive performance on decoder models with standard compute.
Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs cs.LG · 2025-10-21 · unverdicted · none · ref 51 · internal anchor
A conditional scaling law fitted on over 200 models from 80M to 3B parameters identifies architectures that deliver up to 2.1% higher accuracy and 42% higher inference throughput than LLaMA-3.2 under the same training budget.
Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding cs.CL · 2025-09-29 · unverdicted · none · ref 32 · internal anchor
Speculative Verification adds a companion model that estimates draft-target alignment via information gain to dynamically set verification length, delivering up to 2x speedup over standard speculative decoding across tested models and batch sizes.
InvisibleInk: High-Utility and Low-Cost Text Generation with Differential Privacy cs.LG · 2025-06-30 · unverdicted · none · ref 74 · internal anchor
InvisibleInk achieves high-utility differentially private long-form LLM text generation at 4-8x the cost of non-private generation by isolating and clipping sensitive logits and sampling from a small superset of top-k private tokens without privacy cost.
When Attention Sink Emerges in Language Models: An Empirical View cs.CL · 2024-10-14 · accept · none · ref 62 · internal anchor
Attention sinks emerge in language models from softmax-induced token dependence on attention scores and do not appear when using sigmoid attention without normalization in models up to 1B parameters.
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies cs.CL · 2024-04-09 · conditional · none · ref 51 · internal anchor
MiniCPM 1.2B and 2.4B models reach parity with 7B-13B LLMs via model wind-tunnel scaling and a WSD scheduler that yields a higher optimal data-to-model ratio than Chinchilla scaling.
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty cs.LG · 2024-01-26 · unverdicted · none · ref 84 · internal anchor
EAGLE resolves feature-level uncertainty in speculative sampling via one-step token advancement, delivering 2.7x-3.5x speedup on LLaMA2-Chat 70B and doubled throughput across multiple model families and tasks.
Spectral structural distortion reveals redundant neurons in neural networks cs.LG · 2026-05-14 · unverdicted · none · ref 16 · internal anchor
A graph-spectral importance score based on layer-wise structural distortion between pre- and post-activation neuron graphs identifies removable neurons for iterative pruning without intermediate updates, followed by recovery fine-tuning.
DP-LAC: Lightweight Adaptive Clipping for Differentially Private Federated Fine-tuning of Language Models cs.LG · 2026-05-11 · unverdicted · none · ref 39 · internal anchor
DP-LAC provides a new adaptive clipping technique for DP-SGD in federated LLM fine-tuning that improves accuracy by 6.6% on average without consuming additional privacy budget or requiring new hyperparameters.
HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization cs.LG · 2026-05-05 · unverdicted · none · ref 7 · 2 links · internal anchor
HeadQ applies score-space logit corrections for keys and attention-weighted surrogates for values to KV-cache quantization, removing 84-94% of excess perplexity in 2-bit key experiments across six models.
TabEmb: Joint Semantic-Structure Embedding for Table Annotation cs.LG · 2026-04-21 · unverdicted · none · ref 46 · internal anchor
TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.
Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods cs.LG · 2026-04-19 · unverdicted · none · ref 52 · internal anchor
ADAPT is an online reweighting framework for LLM training that outperforms offline data selection and mixing methods in cross-benchmark generalization under equal compute.
Acceptance Dynamics Across Cognitive Domains in Speculative Decoding cs.AI · 2026-04-16 · unverdicted · none · ref 12 · internal anchor
Empirical measurements across four NLP domains show task type is a stronger predictor of speculative decoding acceptance than tree depth, with chat uniquely achieving expected accepted length over 1 token per step.
MAR: Efficient Large Language Models via Module-aware Architecture Refinement cs.AI · 2026-01-29 · unverdicted · none · ref 28 · internal anchor
MAR integrates SSMs and sparsification with new ATMN neurons and SBDS distillation to produce efficient LLMs that match dense-model performance at substantially lower inference energy.
DisCEdge: Distributed Context Management for Large Language Models at the Edge cs.DC · 2025-11-27 · unverdicted · none · ref 44 · internal anchor
DisCEdge manages LLM context in tokenized form replicated on edge nodes, delivering up to 14.46% faster median responses, 15% lower sync overhead, and 90% smaller client requests versus baselines while ensuring consistency.
ShadowNPU: System and Algorithm Co-design for NPU-Centric On-Device LLM Inference cs.PF · 2025-08-22 · unverdicted · none · ref 83 · internal anchor
ShadowNPU presents shadowAttn, a co-designed sparse attention system that uses NPU pilot compute and techniques like graph bucketing and per-head sparsity to minimize CPU/GPU fallback during on-device LLM inference while maintaining accuracy.
Attribution-Guided Pruning for Insight and Control: Circuit Discovery and Targeted Correction in Small-scale LLMs cs.LG · 2025-06-16 · conditional · none · ref 41 · internal anchor
Attribution-guided pruning with contrastive relevance identifies behavior-specific circuits in small LLMs and removes as little as 0.03-0.3% of components to reduce toxicity or repetition while preserving general performance.
From Text to DSL: Evaluating Grammar-Based Model Generation Using Open LLMs cs.SE · 2026-05-15 · unverdicted · none · ref 23 · internal anchor
Compact open-source LLMs can produce syntactically valid, semantically complete, and inter-model consistent DSL models from text via few-shot prompting, with some 7B-12B models matching much larger ones in quality.
VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use cs.CL · 2026-05-13 · unverdicted · none · ref 60 · 3 links · internal anchor
Trains a 42M-parameter Spanish cybersecurity LLM from scratch with curriculum phases and achieves 0.23 tool-selection accuracy after SFT mixture rebalancing to 1:21 tool-use ratio.
Agentic Performance at the Edge: Insights from Benchmarking cs.AI · 2026-05-11 · unverdicted · none · ref 18 · internal anchor
Edge agentic AI quality is not a simple function of model size; robust results require joint design of model selection and tool integration, as revealed by domain-conditioned benchmarks showing accuracy-latency Pareto fronts.
OpenSOC-AI: Democratizing Security Operations with Parameter Efficient LLM Log Analysis cs.CR · 2026-04-29 · unverdicted · none · ref 8 · internal anchor
LoRA fine-tuning of TinyLlama-1.1B on 450 SOC examples produces 68% threat classification accuracy and 58% severity accuracy on 50 held-out logs, with full code, weights, and data released.
DP-FlogTinyLLM: Differentially private federated log anomaly detection using Tiny LLMs cs.CR · 2026-04-21 · unverdicted · none · ref 15 · internal anchor
DP-FLogTinyLLM combines federated learning, differential privacy, and LoRA-tuned tiny LLMs to match centralized log anomaly detection performance on Thunderbird and BGL datasets while preserving privacy.
An Empirical Study on Influence-Based Pretraining Data Selection for Code Large Language Models cs.SE · 2026-04-09 · unverdicted · none · ref 48 · internal anchor
Data-influence-score filtering using validation-set loss on downstream coding tasks improves Code-LLM performance, with the most beneficial training data varying significantly across different programming tasks.
Transformer Scalability Crisis: The First Comprehensive Empirical Analysis of Performance Walls in Modern Language Models cs.LG · 2026-05-14 · unverdicted · none · ref 43 · internal anchor
Empirical tests on 118 transformers show success falling from 88.1% at 512 tokens to 0% at 2048 tokens, with compressed models achieving 649.2 tokens/sec/M parameters versus 12.5 for large generative ones.
Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities cs.DC · 2026-04-24 · unverdicted · none · ref 200 · internal anchor
A survey synthesizing challenges, system architectures, model optimizations, deployment methods, and resource management techniques for large language model inference at the network edge.
ECG Foundation Models and Medical LLMs for Agentic Cardiovascular Intelligence at the Edge: A Review and Outlook eess.SP · 2026-04-02 · unverdicted · none · ref 114 · internal anchor
ECG foundation models for signal interpretation and medical LLMs for reasoning can be integrated into agentic systems for real-time cardiovascular intelligence on edge devices.
A Survey on Foundation Models for Personalized Federated Intelligence cs.AI · 2025-05-11 · unverdicted · none · ref 99 · internal anchor
The survey introduces personalized federated intelligence (PFI) as a framework integrating federated learning and foundation models to support privacy-aware personalization of AI models.
Will LLMs Scaling Hit the Wall? Breaking Barriers via Distributed Resources on Massive Edge Devices cs.DC · 2025-03-11 · unverdicted · none · ref 151 · internal anchor
Position paper claiming that distributed training across massive edge devices can overcome data depletion and centralized compute monopolies in LLM scaling.
SNLP: Layer-Parallel Inference via Structured Newton Corrections cs.LG · 2026-05-18 · unreviewed · ref 47 · internal anchor
PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks cs.CR · 2026-05-09 · unreviewed · ref 47 · internal anchor
Sentinel-VLA: A Metacognitive VLA Model with Active Status Monitoring for Dynamic Reasoning and Error Recovery cs.RO · 2026-05-02 · unreviewed · ref 20 · internal anchor

TinyLlama: An Open-Source Small Language Model

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer