Differential transformer

15 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

years

2026: 10 · 2025: 5

polarities

background 3

representative citing papers

IAFormer: Interaction-Aware Transformer network for collider data analysis

hep-ph · 2025-05-06 · unverdicted · novelty 7.0

IAFormer uses boost-invariant pairwise quantities and differential attention to create a sparse Transformer that achieves state-of-the-art classification on top-quark and quark-gluon jet datasets while using over an order of magnitude fewer parameters than prior Particle Transformer models.

FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

FocuSFT uses an inner optimization loop to adapt fast-weight parameters into a parametric memory that sharpens attention on relevant content, then conditions outer-loop supervised fine-tuning on this representation, yielding gains on long-context benchmarks.

Brain-OF: An Omnifunctional Foundation Model for fMRI, EEG and MEG

cs.LG · 2026-02-26 · unverdicted · novelty 6.0

Brain-OF is a multimodal foundation model for fMRI, EEG and MEG using any-resolution sampling, DINT attention with sparse MoE, and masked temporal-frequency pretraining on ~40 datasets to achieve superior downstream performance.

Beyond Similarity: Temporal Operator Attention for Time Series Analysis

cs.LG · 2026-05-11 · unverdicted · novelty 5.0

Temporal Operator Attention augments softmax attention with learnable sequence-space operators for signed temporal mixing and uses stochastic regularization to enable practical training, yielding consistent gains on time series benchmarks.

Hybrid Architectures for Language Models: Systematic Analysis and Design Insights

cs.CL · 2025-10-06 · unverdicted · novelty 4.0

This work systematically compares inter-layer and intra-layer hybridization strategies for combining self-attention and Mamba-style state space models, evaluating them on language modeling, downstream tasks, long-context performance, scaling, and efficiency to derive optimal design recipes.
