pith. sign in

hub Mixed citations

Textbooks Are All You Need II: phi-1.5 technical report

Mixed citation behavior. Most common role is background (64%).

40 Pith papers citing it
Background 64% of classified citations
abstract

We continue the investigation into the power of smaller Transformer-based language models as initiated by \textbf{TinyStories} -- a 10 million parameter model that can produce coherent English -- and the follow-up work on \textbf{phi-1}, a 1.3 billion parameter model with Python coding performance close to the state-of-the-art. The latter work proposed to use existing Large Language Models (LLMs) to generate ``textbook quality" data as a way to enhance the learning process compared to traditional web data. We follow the ``Textbooks Are All You Need" approach, focusing this time on common sense reasoning in natural language, and create a new 1.3 billion parameter model named \textbf{phi-1.5}, with performance on natural language tasks comparable to models 5x larger, and surpassing most non-frontier LLMs on more complex reasoning tasks such as grade-school mathematics and basic coding. More generally, \textbf{phi-1.5} exhibits many of the traits of much larger LLMs, both good -- such as the ability to ``think step by step" or perform some rudimentary in-context learning -- and bad, including hallucinations and the potential for toxic and biased generations -- encouragingly though, we are seeing improvement on that front thanks to the absence of web data. We open-source \textbf{phi-1.5} to promote further research on these urgent topics.

hub tools

citation-role summary

background 9 method 4 dataset 1

citation-polarity summary

representative citing papers

Mema: Memory-Augmented Adapter for Enhanced Vision-Language Understanding

cs.CV · 2026-02-28 · unverdicted · novelty 7.0

Mema adds a stateful memory module to vision encoders that accumulates hierarchical visual features across layers and selectively injects portions back via feedback to preserve fine-grained cues, yielding consistent gains on multimodal benchmarks.

Beyond Temperature: Hyperfitting as a Late-Stage Geometric Expansion

cs.CL · 2026-05-21 · unverdicted · novelty 6.0

Hyperfitting improves LLM generation via context-dependent rank reordering from geometric expansion in the terminal transformer block, distinct from temperature scaling, and enables efficient Late-Stage LoRA fine-tuning.

ZAYA1-8B Technical Report

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning

cs.RO · 2026-02-09 · unverdicted · novelty 6.0

R&B-EnCoRe uses self-supervised importance-weighted variational inference to distill action-predictive reasoning datasets that improve VLA performance on manipulation, navigation, and driving tasks without external verifiers.

OpenVLA: An Open-Source Vision-Language-Action Model

cs.RO · 2024-06-13 · unverdicted · novelty 6.0

OpenVLA achieves 16.5% higher task success than the 55B RT-2-X model across 29 tasks with 7x fewer parameters while enabling effective fine-tuning and quantization without performance loss.

citing papers explorer

Showing 40 of 40 citing papers.