
Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

Spectral Condition for $\mu$P under Width-Depth Scaling

cs.LG · 2026-02-28 · unverdicted · novelty 6.0

A unified spectral condition for μP under width-depth scaling reveals a transition between k=1 and k≥2 transformations per residual block and enables stable feature learning in practical architectures such as Transformers.

The Falcon Series of Open Language Models

cs.CL · 2023-11-28 · conditional · novelty 6.0

Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.

SpaDA: A Spatial Dataflow Architecture Programming Language

cs.DC · 2025-11-12 · unverdicted · novelty 5.0

SpaDA provides a concise language and multi-level compiler for spatial dataflow hardware that integrates with stencil DSLs and delivers substantial code reduction and high performance on wafer-scale engines.
