PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

other 1

citation-polarity summary

unclear 1

representative citing papers

Is She Even Relevant? When BERT Ignores Explicit Gender Cues

cs.CL · 2026-05-08 · conditional · novelty 7.0

A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.

How Many Different Outputs Can a Transformer Generate?

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Transformers are limited to a linearly growing number of accessible output sequences with prompt length, with exponential decay in accessible proportion beyond a critical point, even under unbounded context.

Forecasting Downstream Performance of LLMs With Proxy Metrics

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

Proxy metrics from next-token distributions over expert solutions outperform loss and compute baselines for ranking LLMs, selecting pretraining data, and extrapolating performance across compute scales.

How does feature learning reshape the function space?

stat.ML · 2026-05-18 · unverdicted · novelty 6.0

In the high-dimensional proportional regime, a large gradient step on a two-layer network induces a target-dependent spiked Gaussian covariance on the features, yielding a data-adaptive kernel that amplifies target-aligned eigenvalues and mixes leading eigenfunctions.

From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents

cs.CL · 2026-05-14 · unverdicted · novelty 6.0

A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.

torchtune: PyTorch native post-training library

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

torchtune is a modular PyTorch library for LLM post-training that delivers competitive performance and memory efficiency while supporting rapid research iteration through hackable components.

Beyond Extrapolation: Knowledge Utilization Paradigm with Bidirectional Inspiration for Time Series Forecasting

cs.LG · 2026-05-19 · unverdicted · novelty 5.0

KUP-BI distills continuation-style knowledge from a train-only historical library to supply an approximate post-target proxy that is fused into forecasting backbones for improved performance on public datasets.

Knowledge-driven Augmentation and Retrieval for Integrative Temporal Adaptation

cs.CL · 2026-04-23 · unverdicted · novelty 5.0

KARITA integrates knowledge-driven augmentation and retrieval to improve classification performance under temporal shifts across clinical, legal, and scientific domains.

Model-Agnostic Meta Learning for Class Imbalance Adaptation

cs.CL · 2026-04-20 · conditional · novelty 5.0

HAMR combines meta-learning with hardness-aware weighting and neighborhood resampling to improve minority-class performance on imbalanced NLP datasets.

More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts

cs.CL · 2026-05-21

citing papers explorer

Showing 10 of 10 citing papers.

Is She Even Relevant? When BERT Ignores Explicit Gender Cues
cs.CL · 2026-05-08 · conditional · none · ref 40

How Many Different Outputs Can a Transformer Generate?
cs.LG · 2026-05-21 · unverdicted · none · ref 18

Forecasting Downstream Performance of LLMs With Proxy Metrics
cs.CL · 2026-05-18 · unverdicted · none · ref 102

How does feature learning reshape the function space?
stat.ML · 2026-05-18 · unverdicted · none · ref 4

From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents
cs.CL · 2026-05-14 · unverdicted · none · ref 65

torchtune: PyTorch native post-training library
cs.LG · 2026-05-20 · unverdicted · none · ref 18

Beyond Extrapolation: Knowledge Utilization Paradigm with Bidirectional Inspiration for Time Series Forecasting
cs.LG · 2026-05-19 · unverdicted · none · ref 48

Knowledge-driven Augmentation and Retrieval for Integrative Temporal Adaptation
cs.CL · 2026-04-23 · unverdicted · none · ref 46

Model-Agnostic Meta Learning for Class Imbalance Adaptation
cs.CL · 2026-04-20 · conditional · none · ref 1

More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts
cs.CL · 2026-05-21 · unreviewed · ref 17