TinyLlama: An Open-Source Small Language Model

Peiyuan Zhang, Guangtao Zeng, Tianduo Wang, Wei Lu · 2024 · cs.CL · arXiv 2401.02385

Canonical reference. 100% of citing Pith papers cite this work as background.

79 Pith papers citing it

Background 100% of classified citations

abstract

We present TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs. Building on the architecture and tokenizer of Llama 2, TinyLlama leverages various advances contributed by the open-source community (e.g., FlashAttention and Lit-GPT), achieving better computational efficiency. Despite its relatively small size, TinyLlama demonstrates remarkable performance in a series of downstream tasks. It significantly outperforms existing open-source language models with comparable sizes. Our model checkpoints and code are publicly available on GitHub at https://github.com/jzhang38/TinyLlama.

citation-role summary

background 5

citation-polarity summary

background 5

representative citing papers

CBD: API-Only LLM Black-Box Unlearning through Controlled Behavioral Divergence

cs.LG · 2026-06-26 · unverdicted · novelty 7.0

CBD is an API-only black-box unlearning method for LLMs that creates controlled behavioral divergence with auxiliary models and uses a Fisher-matrix-derived discriminative basis to balance forgetting target data with retained utility.

Explaining Attention with Program Synthesis

cs.LG · 2026-06-17 · unverdicted · novelty 7.0 · 2 refs

Language-model-guided program synthesis can approximate transformer attention heads with over 75% IoU fidelity on held-out data and allow replacing 25% of heads with only 16% average perplexity increase.

Trajectory Geometry of Transformer Representations Across Layers

cs.LG · 2026-06-08 · unverdicted · novelty 7.0

Transformer representations form trajectories showing semantic convergence in middle-to-late layers, higher curvature on reasoning tasks, bifurcation on ambiguous tokens, and a consistent three-phase cosine similarity pattern across GPT-2, TinyLlama, and Qwen2.5.

CollabSim: A CSCW-Grounded Methodology for Investigating Collaborative Competence of LLM Agents through Controlled Multi-Agent Experiments

cs.CL · 2026-06-04 · unverdicted · novelty 7.0

CollabSim is a new CSCW-grounded simulation framework that enables controlled multi-agent experiments to measure collaborative competence in LLM agents.

Representational Capacity: Geometric Limits on Feature Representation in Transformer Language Models

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

Defines representational capacity as the upper bound on distinguishable near-orthogonal directions in transformer latent spaces, derived from embedding similarity distributions and an adjusted Johnson-Lindenstrauss formula dependent on the k/d ratio.

Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm

cs.LG · 2026-05-14 · conditional · novelty 7.0

A framework to identify and convert foldable layer normalizations to RMSNorm for exact equivalence and faster inference in deep neural networks.

Precision Tracked Transformer via Kalman Filtering, Kriging and Process Noise

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Bayesian Filtering Transformer reframes attention as precision-weighted kriging and residual connections as Kalman updates, delivering gains on cold-start recommendation and noisy LLM fine-tuning tasks.

PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks

cs.CR · 2026-05-09 · unverdicted · novelty 7.0 · 2 refs

PASA is an embedding-space watermarking method for LLM text that uses semantic clusters and synchronized randomness to achieve robustness against paraphrasing while remaining distortion-free.

When the Ruler is Broken: Parsing-Induced Suppression in LLM-Based Security Log Evaluation

cs.CR · 2026-05-08 · conditional · novelty 7.0

Strict regex parsing of LLM security log outputs introduces systematic errors that can make functional models appear non-functional, with a 76-point accuracy gap recovered by fuzzy parsing.

Tempus: A Temporally Scalable Resource-Invariant GEMM Streaming Framework for Versal AI Edge

cs.DC · 2026-05-01 · unverdicted · novelty 7.0

Tempus delivers 607 GOPS at 10.677 W using fixed 16 AIE cores on Versal AI Edge, with 211.2x better platform-aware utility than spatial SOTA ARIES and zero URAM/DSP utilization.

BoostTaxo: Zero-Shot Taxonomy Induction via Boosting-Style Agentic Reasoning and Constraint-Aware Calibration

cs.CL · 2026-04-03 · unverdicted · novelty 7.0

BoostTaxo introduces a boosting-style LLM framework for zero-shot taxonomy induction that uses hybrid candidate selection and constraint-aware calibration to achieve superior or comparable performance to prior methods on WordNet, DBLP, and SemEval-Sci benchmarks.

Test Case Selection for Deep Neural Networks: A Replication Study on LLMs for Code

cs.SE · 2026-06-25 · unverdicted · novelty 6.0

Replication of TCS strategies on 17 LLM instances across three code tasks shows only partial generalization from vision DNN results, with uncertainty features aiding early failure discovery and representation features aiding accuracy estimation.

PRIME: Evaluating Prompt Resolution Under Incompatible Instructions in LLMs

cs.AI · 2026-06-21 · unverdicted · novelty 6.0

PRIME is a new evaluation framework that creates calibrated conflicts in LLM prompts and finds conflict type affects model behavior more than scale.

Tracking Representation Dynamics in Large Language Models with Persistent Homology

cs.LG · 2026-06-17 · unverdicted · novelty 6.0

Persistent homology analysis of LLM activations shows most topological reorganization occurs early in fine-tuning, with a transient peak followed by stabilization and distinct trajectories for different alignment objectives.

ARIADNE: Agnostic Routing for Inference-time Adapter DyNamic sElection

cs.AI · 2026-06-17 · unverdicted · novelty 6.0

ARIADNE routes queries to the best adapter via embedding-space centroid proximity, recovering 97.44% of upper-bound performance on 23 NLP tasks and 89.7% selection accuracy on 44 tasks without training or internal access.

RegMix-D: Dynamic Data Mixing via Proxy Training Trajectories

cs.CL · 2026-06-17 · unverdicted · novelty 6.0

RegMix-D fits regression models to proxy loss trajectories to produce dynamic data mixture schedules that outperform static RegMix and DoReMi on 25B-token Pile pretraining with a 1B model.

BLADE: Scalable Bi-level Adaptive Data Selection for LLM Training

cs.LG · 2026-06-17 · unverdicted · novelty 6.0

BLADE converts influence-based bi-level data selection into a Hessian-free penalized objective with a dynamic reference model, proves first-order convergence, and reports better performance than prior methods on LLM training.

Explaining Data Mixing Scaling Laws

cs.LG · 2026-06-06 · unverdicted · novelty 6.0

A framework using capacity competition and noise reduction under an overlapping-skills assumption explains multi-domain loss behaviors and extrapolates optimal mixtures to large scales from small-scale fits with fewer parameters.

LLM Compression with Jointly Optimizing Architectural and Quantization choices

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

A differentiable NAS framework jointly optimizes LLM architecture and mixed-precision quantization for linear layers, yielding up to 1.4x faster inference or 6% higher accuracy than sequential baselines on reasoning tasks.

MOC: Multi-Order Communication in LLM-based Multi-Agent Systems

cs.AI · 2026-06-01 · unverdicted · novelty 6.0

MOC formalizes a multi-order evidence stream and Semantic-Topological Merging algorithm that improves task performance while cutting communication costs on six datasets.

Harmonic: Hierarchical State Space Models for Efficient Long-Context Language Modeling

cs.CL · 2026-05-30 · unverdicted · novelty 6.0

Hierarchical SSM architecture Harmonic outperforms Transformers and Mamba on long-context language modeling up to 64K tokens and removes RoPE limits at 1B scale while maintaining O(L) compute.

Rethinking the Role of Temperature in Large Language Model Distillation

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

Including temperature scaling makes forward KL divergence outperform reverse KL in LLM distillation on instruction benchmarks, overturning the τ=1 preference for reverse KL.

De-attribute to Forget for LLM Unlearning

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

DareU reframes LLM unlearning as zeroing data attribution via RL rewards from an LLM classifier approximation, claiming better balance of forget quality and model utility than loss-based baselines.

Strong Teacher Not Needed? On Distillation in LLM Pretraining

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

Even small or undertrained teachers improve larger LLM students via distillation with tuned loss mixing, while stronger teachers can saturate or reverse gains and distillation aids generalization more than in-domain fit.

citing papers explorer

Showing 50 of 73 citing papers after filters.

CBD: API-Only LLM Black-Box Unlearning through Controlled Behavioral Divergence

cs.LG · 2026-06-26 · unverdicted · none · ref 41 · internal anchor

CBD is an API-only black-box unlearning method for LLMs that creates controlled behavioral divergence with auxiliary models and uses a Fisher-matrix-derived discriminative basis to balance forgetting target data with retained utility.

Explaining Attention with Program Synthesis

cs.LG · 2026-06-17 · unverdicted · none · ref 38 · 2 links · internal anchor

Language-model-guided program synthesis can approximate transformer attention heads with over 75% IoU fidelity on held-out data and allow replacing 25% of heads with only 16% average perplexity increase.

Trajectory Geometry of Transformer Representations Across Layers

cs.LG · 2026-06-08 · unverdicted · none · ref 9 · internal anchor

Transformer representations form trajectories showing semantic convergence in middle-to-late layers, higher curvature on reasoning tasks, bifurcation on ambiguous tokens, and a consistent three-phase cosine similarity pattern across GPT-2, TinyLlama, and Qwen2.5.

CollabSim: A CSCW-Grounded Methodology for Investigating Collaborative Competence of LLM Agents through Controlled Multi-Agent Experiments

cs.CL · 2026-06-04 · unverdicted · none · ref 115 · internal anchor

CollabSim is a new CSCW-grounded simulation framework that enables controlled multi-agent experiments to measure collaborative competence in LLM agents.

Representational Capacity: Geometric Limits on Feature Representation in Transformer Language Models

cs.LG · 2026-06-01 · unverdicted · none · ref 29 · internal anchor

Defines representational capacity as the upper bound on distinguishable near-orthogonal directions in transformer latent spaces, derived from embedding similarity distributions and an adjusted Johnson-Lindenstrauss formula dependent on the k/d ratio.

Precision Tracked Transformer via Kalman Filtering, Kriging and Process Noise

cs.LG · 2026-05-12 · unverdicted · none · ref 35 · internal anchor

Bayesian Filtering Transformer reframes attention as precision-weighted kriging and residual connections as Kalman updates, delivering gains on cold-start recommendation and noisy LLM fine-tuning tasks.

PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks

cs.CR · 2026-05-09 · unverdicted · none · ref 47 · 2 links · internal anchor

PASA is an embedding-space watermarking method for LLM text that uses semantic clusters and synchronized randomness to achieve robustness against paraphrasing while remaining distortion-free.

Tempus: A Temporally Scalable Resource-Invariant GEMM Streaming Framework for Versal AI Edge

cs.DC · 2026-05-01 · unverdicted · none · ref 27 · internal anchor

Tempus delivers 607 GOPS at 10.677 W using fixed 16 AIE cores on Versal AI Edge, with 211.2x better platform-aware utility than spatial SOTA ARIES and zero URAM/DSP utilization.

BoostTaxo: Zero-Shot Taxonomy Induction via Boosting-Style Agentic Reasoning and Constraint-Aware Calibration

cs.CL · 2026-04-03 · unverdicted · none · ref 29 · internal anchor

BoostTaxo introduces a boosting-style LLM framework for zero-shot taxonomy induction that uses hybrid candidate selection and constraint-aware calibration to achieve superior or comparable performance to prior methods on WordNet, DBLP, and SemEval-Sci benchmarks.

Test Case Selection for Deep Neural Networks: A Replication Study on LLMs for Code

cs.SE · 2026-06-25 · unverdicted · none · ref 72 · internal anchor

Replication of TCS strategies on 17 LLM instances across three code tasks shows only partial generalization from vision DNN results, with uncertainty features aiding early failure discovery and representation features aiding accuracy estimation.

PRIME: Evaluating Prompt Resolution Under Incompatible Instructions in LLMs

cs.AI · 2026-06-21 · unverdicted · none · ref 12 · internal anchor

PRIME is a new evaluation framework that creates calibrated conflicts in LLM prompts and finds conflict type affects model behavior more than scale.

Tracking Representation Dynamics in Large Language Models with Persistent Homology

cs.LG · 2026-06-17 · unverdicted · none · ref 11 · internal anchor

Persistent homology analysis of LLM activations shows most topological reorganization occurs early in fine-tuning, with a transient peak followed by stabilization and distinct trajectories for different alignment objectives.

ARIADNE: Agnostic Routing for Inference-time Adapter DyNamic sElection

cs.AI · 2026-06-17 · unverdicted · none · ref 28 · internal anchor

ARIADNE routes queries to the best adapter via embedding-space centroid proximity, recovering 97.44% of upper-bound performance on 23 NLP tasks and 89.7% selection accuracy on 44 tasks without training or internal access.

RegMix-D: Dynamic Data Mixing via Proxy Training Trajectories

cs.CL · 2026-06-17 · unverdicted · none · ref 16 · internal anchor

RegMix-D fits regression models to proxy loss trajectories to produce dynamic data mixture schedules that outperform static RegMix and DoReMi on 25B-token Pile pretraining with a 1B model.

BLADE: Scalable Bi-level Adaptive Data Selection for LLM Training

cs.LG · 2026-06-17 · unverdicted · none · ref 52 · internal anchor

BLADE converts influence-based bi-level data selection into a Hessian-free penalized objective with a dynamic reference model, proves first-order convergence, and reports better performance than prior methods on LLM training.

Explaining Data Mixing Scaling Laws

cs.LG · 2026-06-06 · unverdicted · none · ref 13 · internal anchor

A framework using capacity competition and noise reduction under an overlapping-skills assumption explains multi-domain loss behaviors and extrapolates optimal mixtures to large scales from small-scale fits with fewer parameters.

LLM Compression with Jointly Optimizing Architectural and Quantization choices

cs.LG · 2026-06-02 · unverdicted · none · ref 31 · internal anchor

A differentiable NAS framework jointly optimizes LLM architecture and mixed-precision quantization for linear layers, yielding up to 1.4x faster inference or 6% higher accuracy than sequential baselines on reasoning tasks.

MOC: Multi-Order Communication in LLM-based Multi-Agent Systems

cs.AI · 2026-06-01 · unverdicted · none · ref 106 · internal anchor

MOC formalizes a multi-order evidence stream and Semantic-Topological Merging algorithm that improves task performance while cutting communication costs on six datasets.

Harmonic: Hierarchical State Space Models for Efficient Long-Context Language Modeling

cs.CL · 2026-05-30 · unverdicted · none · ref 15 · internal anchor

Hierarchical SSM architecture Harmonic outperforms Transformers and Mamba on long-context language modeling up to 64K tokens and removes RoPE limits at 1B scale while maintaining O(L) compute.

Rethinking the Role of Temperature in Large Language Model Distillation

cs.LG · 2026-05-29 · unverdicted · none · ref 16 · internal anchor

Including temperature scaling makes forward KL divergence outperform reverse KL in LLM distillation on instruction benchmarks, overturning the τ=1 preference for reverse KL.

De-attribute to Forget for LLM Unlearning

cs.LG · 2026-05-29 · unverdicted · none · ref 13 · internal anchor

DareU reframes LLM unlearning as zeroing data attribution via RL rewards from an LLM classifier approximation, claiming better balance of forget quality and model utility than loss-based baselines.

Strong Teacher Not Needed? On Distillation in LLM Pretraining

cs.LG · 2026-05-22 · unverdicted · none · ref 14 · internal anchor

Even small or undertrained teachers improve larger LLM students via distillation with tuned loss mixing, while stronger teachers can saturate or reverse gains and distillation aids generalization more than in-domain fit.

Beyond Temperature: Hyperfitting as a Late-Stage Geometric Expansion

cs.CL · 2026-05-21 · unverdicted · none · ref 11 · internal anchor

Hyperfitting improves LLM generation via context-dependent rank reordering from geometric expansion in the terminal transformer block, distinct from temperature scaling, and enables efficient Late-Stage LoRA fine-tuning.

SNLP: Layer-Parallel Inference via Structured Newton Corrections

cs.LG · 2026-05-18 · unverdicted · none · ref 47 · 2 links · internal anchor

SNLP achieves up to 2.58x wall-clock speedup on 0.5B Transformers via architecture-specific Newton corrections (IDN/HCN) that enable layer-parallel inference while preserving perplexity in milder settings.

From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents

cs.CL · 2026-05-14 · unverdicted · none · ref 28 · internal anchor

A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.

SpeakerLLM: A Speaker-Specialized Audio-LLM for Speaker Understanding and Verification Reasoning

cs.SD · 2026-05-14 · unverdicted · none · ref 39 · internal anchor

SpeakerLLM unifies speaker profiling, recording-condition understanding, and structured verification reasoning in an audio-LLM via a hierarchical tokenizer and decision traces.

Common-agency Games for Multi-Objective Test-Time Alignment

cs.GT · 2026-05-08 · unverdicted · none · ref 243 · internal anchor

CAGE uses common-agency games and an EPEC algorithm to compute equilibrium policies that balance multiple conflicting objectives for test-time LLM alignment.

UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

cs.LG · 2026-05-07 · unverdicted · none · ref 59 · internal anchor

A shared global expert pool in MoE improves validation loss over per-layer experts and allows sublinear expert-parameter growth with depth.

Sentinel-VLA: A Metacognitive VLA Model with Active Status Monitoring for Dynamic Reasoning and Error Recovery

cs.RO · 2026-05-02 · unverdicted · none · ref 20 · 2 links · internal anchor

Sentinel-VLA adds metacognitive status monitoring to VLA models for on-demand reasoning and error recovery, reporting over 30% higher real-world task success than prior SOTA.

Beyond Fine-Tuning: In-Context Learning and Chain-of-Thought for Reasoned Distractor Generation

cs.CL · 2026-04-19 · unverdicted · none · ref 164 · internal anchor

LLMs prompted with few-shot examples and rationales generate better reasoned distractors for MCQs than fine-tuned contrastive models across six benchmarks.

StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models

cs.LG · 2026-04-16 · unverdicted · none · ref 47 · internal anchor

StoSignSGD resolves SignSGD divergence on non-smooth objectives via structural stochasticity, matching optimal convex rates and improving non-convex bounds while delivering 1.44-2.14x speedups in FP8 LLM pretraining.

EdgeCIM: A Hardware-Software Co-Design for CIM-Based Acceleration of Small Language Models

cs.AR · 2026-04-13 · unverdicted · none · ref 35 · internal anchor

A CIM-based hardware-software co-design in 65nm achieves up to 7.3x higher throughput and 49.59x better energy efficiency than NVIDIA Orin Nano for LLaMA3.2-1B, averaging 336 tokens/s and 173 tokens/J under INT4 across multiple SLMs.

A Physical Agentic Loop for Language-Guided Grasping with Execution-State Monitoring

cs.RO · 2026-04-08 · unverdicted · none · ref 32 · internal anchor

A physical agentic loop with execution-state monitoring improves robustness of language-guided grasping over open-loop execution by converting noisy telemetry into discrete outcome events that trigger retries or user escalation.

Don't Ignore the Tail: Decoupling top-K Probabilities for Efficient Language Model Distillation

cs.CL · 2026-02-24 · unverdicted · none · ref 25 · internal anchor

A modified divergence decouples top-K teacher probabilities from the distribution tail during distillation, yielding competitive performance on decoder models with standard compute.

Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs

cs.LG · 2025-10-21 · unverdicted · none · ref 51 · internal anchor

A conditional scaling law fitted on over 200 models from 80M to 3B parameters identifies architectures that deliver up to 2.1% higher accuracy and 42% higher inference throughput than LLaMA-3.2 under the same training budget.

Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding

cs.CL · 2025-09-29 · unverdicted · none · ref 32 · internal anchor

Speculative Verification adds a companion model that estimates draft-target alignment via information gain to dynamically set verification length, delivering up to 2x speedup over standard speculative decoding across tested models and batch sizes.

InvisibleInk: High-Utility and Low-Cost Text Generation with Differential Privacy

cs.LG · 2025-06-30 · unverdicted · none · ref 74 · internal anchor

InvisibleInk achieves high-utility differentially private long-form LLM text generation at 4-8x the cost of non-private generation by isolating and clipping sensitive logits and sampling from a small superset of top-k private tokens without privacy cost.

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

cs.LG · 2024-01-26 · unverdicted · none · ref 84 · internal anchor

EAGLE resolves feature-level uncertainty in speculative sampling via one-step token advancement, delivering 2.7x-3.5x speedup on LLaMA2-Chat 70B and doubled throughput across multiple model families and tasks.

Activation- and Influence-Aware Ranks (AIR): Function-Preserving SVD Compression for LLMs

cs.LG · 2026-06-18 · unverdicted · none · ref 205 · internal anchor

AIR augments activation-aware SVD compression of LLMs with an influence metric and a closed-form ALS update, claiming >18% perplexity improvement at 60% parameter retention and 90% less calibration data than SVD-LLM(W).

The Fine-Tuning Trap: Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning

cs.LG · 2026-06-05 · unverdicted · none · ref 14 · internal anchor

Full fine-tuning causes negative transfer and performance collapse in sub-300M SLMs on math tasks, establishing PEFT as a stability requirement.

Spike-Aware C++ INT8 Inference for Sparse Spiking Language Models on Commodity CPUs

cs.NE · 2026-06-02 · unverdicted · none · ref 10 · internal anchor

A spike-aware C++ INT8 runtime for sparse spiking LMs delivers 22.63 tokens/s single-thread on Ryzen 7, beating several Q8_0 dense models in llama.cpp while cutting weights from 3.49 GB to 1.06 GB, at the cost of higher perplexity.

MOSAIC: Efficient Mixture-of-Agent Scheduling via Adaptive Aggregation and Inference Concurrency

cs.LG · 2026-06-02 · unverdicted · none · ref 70 · internal anchor

MOSAIC uses an Integer Linear Program scheduler for expert placement and prompt assignment plus adaptive aggregation to achieve 1.7-2.3x end-to-end speedup on 4-GPU MoA workloads while keeping accuracy within 0.1pp.

Gravity-Aware Hierarchical Routing for Lightweight SensorLLM on Human Activity Recognition

eess.SP · 2026-06-01 · unverdicted · none · ref 5 · internal anchor

Introduces a lightweight gravity-aware routing head that improves macro-F1 on static classes in compressed SensorLLM for human activity recognition on the MHealth dataset.

Representation Collapse in Sequential Post-Training of Large Language Models

cs.LG · 2026-05-28 · unverdicted · none · ref 49 · internal anchor

Sequential post-training of LLMs induces representation collapse that correlates with reduced plasticity, weaker generalization, and poorer calibration, with lightweight interventions tested to mitigate it.

Spectral structural distortion reveals redundant neurons in neural networks

cs.LG · 2026-05-14 · unverdicted · none · ref 16 · internal anchor

A graph-spectral importance score based on layer-wise structural distortion between pre- and post-activation neuron graphs identifies removable neurons for iterative pruning without intermediate updates, followed by recovery fine-tuning.

DP-LAC: Lightweight Adaptive Clipping for Differentially Private Federated Fine-tuning of Language Models

cs.LG · 2026-05-11 · unverdicted · none · ref 39 · internal anchor

DP-LAC provides a new adaptive clipping technique for DP-SGD in federated LLM fine-tuning that improves accuracy by 6.6% on average without consuming additional privacy budget or requiring new hyperparameters.

HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization

cs.LG · 2026-05-05 · unverdicted · none · ref 7 · 2 links · internal anchor

HeadQ applies score-space logit corrections for keys and attention-weighted surrogates for values to KV-cache quantization, removing 84-94% of excess perplexity in 2-bit key experiments across six models.

TabEmb: Joint Semantic-Structure Embedding for Table Annotation

cs.LG · 2026-04-21 · unverdicted · none · ref 46 · internal anchor

TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.

Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods

cs.LG · 2026-04-19 · unverdicted · none · ref 52 · internal anchor

ADAPT is an online reweighting framework for LLM training that outperforms offline data selection and mixing methods in cross-benchmark generalization under equal compute.

Acceptance Dynamics Across Cognitive Domains in Speculative Decoding

cs.AI · 2026-04-16 · unverdicted · none · ref 12 · internal anchor

Empirical measurements across four NLP domains show task type is a stronger predictor of speculative decoding acceptance than tree depth, with chat uniquely achieving expected accepted length over 1 token per step.
