BERT: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova · 2019

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Depth Adaptive Efficient Visual Autoregressive Modeling

cs.CV · 2026-04-19 · unverdicted · novelty 7.0

DepthVAR adaptively allocates per-token computational depth in VAR models using a cyclic rotated scheduler and dynamic layer masking to achieve 2.3-3.1x inference speedup with minimal quality loss.

FILTR: Extracting Topological Features from Pretrained 3D Models

cs.CV · 2026-04-24 · unverdicted · novelty 6.0

FILTR predicts persistence diagrams from pretrained 3D encoders on the new DONUT benchmark, showing limited topological signals in encoders but successful approximation via learnable feed-forward.

MODIX: A Training-Free Multimodal Information-Driven Positional Index Scaling for Vision-Language Models

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

MODIX dynamically rescales positional indices in VLMs using intra-modal covariance-based entropy and inter-modal alignment scores to allocate finer granularity to informative content.

DSAA: Dual-Stage Attribute Activation for Fine-grained Open Vocabulary Detection

cs.CV · 2026-05-18 · unverdicted · novelty 5.0

DSAA improves fine-grained open-vocabulary object detection by injecting attribute priors via APA in text embeddings, modulating K/V vectors in BERT, and using an attribute-aware contrastive loss, with gains shown on the FG-OVD benchmark.

On The Application of Linear Attention in Multimodal Transformers

cs.CV · 2026-04-11 · unverdicted · novelty 4.0

Linear attention delivers significant computational savings in multimodal transformers and follows the same scaling laws as softmax attention on ViT models trained on LAION-400M with ImageNet-21K zero-shot validation.

Two-Stage Multimodal Framework for Emotion Mimicry Intensity Prediction

cs.CV · 2026-05-21 · unverdicted · novelty 3.0

A staged multimodal fusion model for predicting six continuous emotion intensities from in-the-wild video achieves 0.4722 validation and 0.57 test Pearson correlation in the EMI challenge.

citing papers explorer

Showing 6 of 6 citing papers.

Depth Adaptive Efficient Visual Autoregressive Modeling
cs.CV · 2026-04-19 · unverdicted · none · ref 12

FILTR: Extracting Topological Features from Pretrained 3D Models
cs.CV · 2026-04-24 · unverdicted · none · ref 14

MODIX: A Training-Free Multimodal Information-Driven Positional Index Scaling for Vision-Language Models
cs.CV · 2026-04-14 · unverdicted · none · ref 10

DSAA: Dual-Stage Attribute Activation for Fine-grained Open Vocabulary Detection
cs.CV · 2026-05-18 · unverdicted · none · ref 4

On The Application of Linear Attention in Multimodal Transformers
cs.CV · 2026-04-11 · unverdicted · none · ref 8

Two-Stage Multimodal Framework for Emotion Mimicry Intensity Prediction
cs.CV · 2026-05-21 · unverdicted · none · ref 4
