ELECTRA replaces masked language modeling with replaced token detection, yielding contextual representations that outperform BERT at equal compute and match larger models like RoBERTa with far less compute.
Neural Network Acceptability Judgments
16 Pith papers cite this work.
representative citing papers
PRIMETIME generator reveals that LLM datetime parsing and arithmetic primitives are individually unreliable but fully learnable via fine-tuning, enabling frontier-level accuracy on event planning with small LoRA models.
OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.
Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.
BART introduces a denoising pretraining method for seq2seq models that matches RoBERTa on GLUE and SQuAD while setting new state-of-the-art results on abstractive summarization, dialogue, and QA, with gains of up to 6 ROUGE points.
ALBERT reduces BERT parameters via embedding factorization and layer sharing, adds inter-sentence coherence pretraining, and reaches SOTA on GLUE, RACE, and SQuAD with fewer parameters than BERT-large.
GLUE is a multi-task benchmark for general natural language understanding that includes a diagnostic test suite and finds limited gains from current multi-task learning methods over single-task training.
PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.
An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.
Post-editing LLM text increases stylistic similarity to the user's own writing yet keeps it closer to LLM output than human text and lowers diversity.
HyperAdapt performs parameter-efficient fine-tuning by row- and column-wise diagonal scaling to induce high-rank updates with only n+m trainable parameters.
Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.
SuperGLUE is a new benchmark with more difficult language understanding tasks, a toolkit, and leaderboard to drive further progress beyond GLUE.
A convex KMM-based valuation method that accounts for both target-task alignment and inter-dataset redundancy in gradient space outperforms standard gradient-alignment baselines for LLM post-training data selection.
With better hyperparameters, more data, and longer training, an unchanged BERT-Large architecture matches or exceeds XLNet and other successors on GLUE, SQuAD, and RACE.
A self-calibrating testbed using Vessim and Kepler with real-node calibration achieves R² of 0.95 for computing node power approximation in microgrid simulations.
citing papers explorer
- PRIMETIME: Limits of LLMs in Temporal Primitives
PRIMETIME generator reveals that LLM datetime parsing and arithmetic primitives are individually unreliable but fully learnable via fine-tuning, enabling frontier-level accuracy on event planning with small LoRA models.
- OPT: Open Pre-trained Transformer Language Models
OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.
- GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
GLUE is a multi-task benchmark for general natural language understanding that includes a diagnostic test suite and finds limited gains from current multi-task learning methods over single-task training.
- PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts
PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.
- HyperAdapt: Simple High-Rank Adaptation
HyperAdapt performs parameter-efficient fine-tuning by row- and column-wise diagonal scaling to induce high-rank updates with only n+m trainable parameters.
- Convex Dataset Valuation for Post-Training
A convex KMM-based valuation method that accounts for both target-task alignment and inter-dataset redundancy in gradient space outperforms standard gradient-alignment baselines for LLM post-training data selection.
- Calibrating Microgrid Simulations for Energy-Aware Computing Systems
A self-calibrating testbed using Vessim and Kepler with real-node calibration achieves R² of 0.95 for computing node power approximation in microgrid simulations.