arXiv preprint arXiv:2503.05500

4 Pith papers cite this work. Polarity classification is still indexing.

fields: cs.CL (4)

years: 2026 (3), 2025 (1)

representative citing papers

Is She Even Relevant? When BERT Ignores Explicit Gender Cues

cs.CL · 2026-05-08 · conditional · novelty 7.0

A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.

Should We Still Pretrain Encoders with Masked Language Modeling?

cs.CL · 2025-07-01 · accept · novelty 6.0

Controlled ablations of 38 models find MLM superior to CLM on representation benchmarks while CLM offers better data efficiency and stability; a biphasic CLM-then-MLM schedule is optimal under fixed compute and improves when initialized from pretrained CLM models.

jina-embeddings-v5-text: Task-Targeted Embedding Distillation

cs.CL · 2026-02-17 · unverdicted · novelty 5.0

A distillation-plus-task-contrastive training regimen yields compact embedding models that match or exceed state-of-the-art performance for their size while supporting 32k-token contexts and quantization.

Ideology Prediction of German Political Texts

cs.CL · 2026-05-14 · unverdicted · novelty 4.0

Transformer models predict German political ideology on a continuous left-right scale, reaching F1 0.844 in-domain and MAE 0.172 on newspaper out-of-domain tests.

citing papers explorer

Showing 4 of 4 citing papers.

  • Is She Even Relevant? When BERT Ignores Explicit Gender Cues cs.CL · 2026-05-08 · conditional · none · ref 8

  • Should We Still Pretrain Encoders with Masked Language Modeling? cs.CL · 2025-07-01 · accept · none · ref 5

  • jina-embeddings-v5-text: Task-Targeted Embedding Distillation cs.CL · 2026-02-17 · unverdicted · none · ref 12

  • Ideology Prediction of German Political Texts cs.CL · 2026-05-14 · unverdicted · none · ref 2
