Neural Network Acceptability Judgments

URL https://arxiv · 2019 · arXiv 1805.12471

17 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

cs.CL · 2020-03-23 · conditional · novelty 8.0

ELECTRA replaces masked language modeling with replaced token detection, yielding contextual representations that outperform BERT at equal compute and match larger models like RoBERTa with far less compute.

PRIMETIME: Limits of LLMs in Temporal Primitives

cs.NE · 2025-04-22 · unverdicted · novelty 7.0

The PRIMETIME generator reveals that LLM datetime parsing and arithmetic primitives are individually unreliable but fully learnable via fine-tuning, enabling frontier-level accuracy on event planning with small LoRA models.

OPT: Open Pre-trained Transformer Language Models

cs.CL · 2022-05-02 · unverdicted · novelty 7.0

OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

LoRA: Low-Rank Adaptation of Large Language Models

cs.CL · 2021-06-17 · accept · novelty 7.0

Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

cs.CL · 2019-10-29 · accept · novelty 7.0

BART introduces a denoising pretraining method for seq2seq models that matches RoBERTa on GLUE and SQuAD while setting new state-of-the-art results on abstractive summarization, dialogue, and QA, with gains of up to 6 ROUGE.

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

cs.CL · 2019-09-26 · accept · novelty 7.0

ALBERT reduces BERT parameters via embedding factorization and layer sharing, adds inter-sentence coherence pretraining, and reaches SOTA on GLUE, RACE, and SQuAD with fewer parameters than BERT-large.

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

cs.CL · 2018-04-20 · unverdicted · novelty 7.0

GLUE is a multi-task benchmark for general natural language understanding that includes a diagnostic test suite and finds limited gains from current multi-task learning methods over single-task training.

STAR: Rethinking MoE Routing as Structure-Aware Subspace Learning

cs.AI · 2026-06-07 · unverdicted · novelty 6.0

STAR rethinks MoE routing as structure-aware subspace learning by adding a GHA-tracked principal subspace to standard routers, yielding more stable specialization and better performance on synthetic, language, and vision tasks.

PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts

cs.CL · 2026-05-13 · unverdicted · novelty 6.0

PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.

On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference

cs.CR · 2026-05-06 · conditional · novelty 6.0

An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.

Can You Make It Sound Like You? Post-Editing LLM-Generated Text for Personal Style

cs.CL · 2026-04-27 · conditional · novelty 6.0

Post-editing LLM text increases stylistic similarity to the user's own writing yet keeps it closer to LLM output than human text and lowers diversity.

HyperAdapt: Simple High-Rank Adaptation

cs.LG · 2025-09-23 · unverdicted · novelty 6.0

HyperAdapt performs parameter-efficient fine-tuning by row- and column-wise diagonal scaling to induce high-rank updates with only n+m trainable parameters.

HuggingFace's Transformers: State-of-the-art Natural Language Processing

cs.CL · 2019-10-09 · accept · novelty 6.0

Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

cs.CL · 2019-05-02 · accept · novelty 6.0

SuperGLUE is a new benchmark with more difficult language understanding tasks, a toolkit, and leaderboard to drive further progress beyond GLUE.

Convex Dataset Valuation for Post-Training

cs.LG · 2026-05-15 · unverdicted · novelty 5.0

A convex KMM-based valuation method that accounts for both target-task alignment and inter-dataset redundancy in gradient space outperforms standard gradient-alignment baselines for LLM post-training data selection.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

cs.CL · 2019-07-26 · accept · novelty 5.0

With better hyperparameters, more data, and longer training, an unchanged BERT-Large architecture matches or exceeds XLNet and other successors on GLUE, SQuAD, and RACE.

Calibrating Microgrid Simulations for Energy-Aware Computing Systems

cs.DC · 2026-03-14 · unverdicted · novelty 4.0

A self-calibrating testbed using Vessim and Kepler with real-node calibration achieves R² of 0.95 for computing node power approximation in microgrid simulations.

citing papers explorer

Showing 17 of 17 citing papers.

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators cs.CL · 2020-03-23 · conditional · none · ref 10
PRIMETIME: Limits of LLMs in Temporal Primitives cs.NE · 2025-04-22 · unverdicted · none · ref 69
OPT: Open Pre-trained Transformer Language Models cs.CL · 2022-05-02 · unverdicted · none · ref 4
LoRA: Low-Rank Adaptation of Large Language Models cs.CL · 2021-06-17 · accept · none · ref 55
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension cs.CL · 2019-10-29 · accept · none · ref 21
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations cs.CL · 2019-09-26 · accept · none · ref 37
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding cs.CL · 2018-04-20 · unverdicted · none · ref 48
STAR: Rethinking MoE Routing as Structure-Aware Subspace Learning cs.AI · 2026-06-07 · unverdicted · none · ref 28
PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts cs.CL · 2026-05-13 · unverdicted · none · ref 33
On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference cs.CR · 2026-05-06 · conditional · none · ref 70
Can You Make It Sound Like You? Post-Editing LLM-Generated Text for Personal Style cs.CL · 2026-04-27 · conditional · none · ref 4
HyperAdapt: Simple High-Rank Adaptation cs.LG · 2025-09-23 · unverdicted · none · ref 39
HuggingFace's Transformers: State-of-the-art Natural Language Processing cs.CL · 2019-10-09 · accept · none · ref 37
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems cs.CL · 2019-05-02 · accept · none · ref 147
Convex Dataset Valuation for Post-Training cs.LG · 2026-05-15 · unverdicted · none · ref 24
RoBERTa: A Robustly Optimized BERT Pretraining Approach cs.CL · 2019-07-26 · accept · none · ref 46
Calibrating Microgrid Simulations for Energy-Aware Computing Systems cs.DC · 2026-03-14 · unverdicted · none · ref 88
