PyTorch: An Imperative Style, High-Performance Deep Learning Library
10 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (10)
roles: other (1)
polarities: unclear (1)
citing papers explorer
- Is She Even Relevant? When BERT Ignores Explicit Gender Cues
  A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.
- How Many Different Outputs Can a Transformer Generate?
  Transformers are limited to a linearly growing number of accessible output sequences with prompt length, with exponential decay in accessible proportion beyond a critical point, even under unbounded context.
- Forecasting Downstream Performance of LLMs With Proxy Metrics
  Proxy metrics from next-token distributions over expert solutions outperform loss and compute baselines for ranking LLMs, selecting pretraining data, and extrapolating performance across compute scales.
- How does feature learning reshape the function space?
  In the high-dimensional proportional regime, a large gradient step on a two-layer network induces a target-dependent spiked Gaussian covariance on the features, yielding a data-adaptive kernel that amplifies target-aligned eigenvalues and mixes leading eigenfunctions.
- From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents
  A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.
- torchtune: PyTorch native post-training library
  torchtune is a modular PyTorch library for LLM post-training that delivers competitive performance and memory efficiency while supporting rapid research iteration through hackable components.
- Beyond Extrapolation: Knowledge Utilization Paradigm with Bidirectional Inspiration for Time Series Forecasting
  KUP-BI distills continuation-style knowledge from a train-only historical library to supply an approximate post-target proxy that is fused into forecasting backbones for improved performance on public datasets.
- Knowledge-driven Augmentation and Retrieval for Integrative Temporal Adaptation
  KARITA integrates knowledge-driven augmentation and retrieval to improve classification performance under temporal shifts across clinical, legal, and scientific domains.
- Model-Agnostic Meta Learning for Class Imbalance Adaptation
  HAMR combines meta-learning with hardness-aware weighting and neighborhood resampling to improve minority-class performance on imbalanced NLP datasets.
- More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts