
Mechanistic design and scaling of hybrid architectures

6 Pith papers cite this work. Polarity classification is still indexing.

years: 2026 (4) · 2025 (2)

verdicts: unverdicted (6)

representative citing papers

Flash PD-SSM: Memory-Optimized Structured Sparse State-Space Models

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

Flash PD-SSM achieves FSA-level expressivity by discretely selecting one matrix from a trainable set of structured sparse transition matrices at each time step while preserving the runtime and memory efficiency of standard structured SSMs.

RubiConv -- Efficient Boundary-Respecting Convolutions

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

RubiConv enables boundary-respecting convolutions on packed sequences using an efficient algorithm that outperforms both attention and standard FFT baselines in speed.

ZAYA1-8B Technical Report

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution

cs.CL · 2025-09-17 · unverdicted · novelty 6.0

ShinkaEvolve improves sample efficiency in LLM-driven program evolution via parent sampling, code novelty rejection-sampling, and bandit LLM ensemble selection, achieving new SOTA circle packing with 150 samples and gains on math reasoning and competitive programming tasks.

Hybrid Architectures for Language Models: Systematic Analysis and Design Insights

cs.CL · 2025-10-06 · unverdicted · novelty 4.0

This work systematically compares inter-layer and intra-layer hybridization strategies for combining self-attention and Mamba-style state space models, evaluating them on language modeling, downstream tasks, long-context performance, scaling, and efficiency to derive optimal design recipes.

citing papers explorer

Showing 6 of 6 citing papers.

  • Towards Understanding Self-Pretraining for Sequence Classification · cs.LG · 2026-05-20 · unverdicted · none · ref 92

    Self-pretraining improves Transformer sequence classification by enabling learning of proximity-biased attention from positional encodings that label supervision alone cannot easily acquire from random starts.

  • Flash PD-SSM: Memory-Optimized Structured Sparse State-Space Models · cs.LG · 2026-05-18 · unverdicted · none · ref 41

    Flash PD-SSM achieves FSA-level expressivity by discretely selecting one matrix from a trainable set of structured sparse transition matrices at each time step while preserving the runtime and memory efficiency of standard structured SSMs.

  • RubiConv -- Efficient Boundary-Respecting Convolutions · cs.LG · 2026-05-08 · unverdicted · none · ref 16

    RubiConv enables boundary-respecting convolutions on packed sequences using an efficient algorithm that outperforms both attention and standard FFT baselines in speed.

  • ZAYA1-8B Technical Report · cs.AI · 2026-05-06 · unverdicted · none · ref 51

    ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

  • ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution · cs.CL · 2025-09-17 · unverdicted · none · ref 189

    ShinkaEvolve improves sample efficiency in LLM-driven program evolution via parent sampling, code novelty rejection-sampling, and bandit LLM ensemble selection, achieving new SOTA circle packing with 150 samples and gains on math reasoning and competitive programming tasks.

  • Hybrid Architectures for Language Models: Systematic Analysis and Design Insights · cs.CL · 2025-10-06 · unverdicted · none · ref 40

    This work systematically compares inter-layer and intra-layer hybridization strategies for combining self-attention and Mamba-style state space models, evaluating them on language modeling, downstream tasks, long-context performance, scaling, and efficiency to derive optimal design recipes.