hub

Deep contextualized word representations

Matthew E Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer · 2018 · cs.CL · arXiv 1802.05365

39 Pith papers cite this work. Polarity classification is still indexing.

39 Pith papers citing it

open full Pith review browse 39 citing papers arXiv PDF

abstract

We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis. We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

A Hormone-inspired Emotion Layer for Transformer language models (HELT)

cs.NE · 2026-04-13 · unverdicted · novelty 7.0

HormoneT5 augments T5 with a hormone-inspired block that predicts six continuous emotion values and uses them to modulate responses, reporting over 85% per-hormone accuracy and human preference for emotional quality.

Massive Activations in Large Language Models

cs.CL · 2024-02-27 · unverdicted · novelty 7.0

Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.

GraphCodeBERT: Pre-training Code Representations with Data Flow

cs.SE · 2020-09-17 · accept · novelty 7.0

GraphCodeBERT uses data flow graphs in pre-training to capture semantic code structure and reaches state-of-the-art results on code search, clone detection, translation, and refinement.

PubMedQA: A Dataset for Biomedical Research Question Answering

cs.CL · 2019-09-13 · unverdicted · novelty 7.0

PubMedQA supplies 273k+ biomedical QA instances that require reasoning over research abstracts to produce yes/no/maybe answers.

FinBERT: Financial Sentiment Analysis with Pre-trained Language Models

cs.CL · 2019-08-27 · unverdicted · novelty 7.0

FinBERT adapts BERT to the financial domain and outperforms prior state-of-the-art methods on financial sentiment analysis tasks.

XLNet: Generalized Autoregressive Pretraining for Language Understanding

cs.CL · 2019-06-19 · accept · novelty 7.0

XLNet is a generalized autoregressive pretraining method that learns bidirectional contexts via permutation-based factorization and outperforms BERT on 20 NLP tasks.

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

cs.CL · 2019-10-29 · accept · novelty 7.0

BART introduces a denoising pretraining method for seq2seq models that matches RoBERTa on GLUE and SQuAD while setting new state-of-the-art results on abstractive summarization, dialogue, and QA with up to 6 ROUGE gains.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

cs.LG · 2019-10-23 · unverdicted · novelty 7.0

T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colossal Clean Crawled Corpus.

Fine-Tuning Language Models from Human Preferences

cs.CL · 2019-09-18 · unverdicted · novelty 7.0

Language models fine-tuned via RL on 5k-60k human preference comparisons produce stylistically better text continuations and human-preferred summaries that sometimes copy input sentences.

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

cs.CL · 2019-09-17 · unverdicted · novelty 7.0

Intra-layer model parallelism in PyTorch enables training of 8.3B-parameter transformers, achieving SOTA perplexity of 10.8 on WikiText103 and 66.5% accuracy on LAMBADA.

Efficient Table QA via TableGrid Navigation and Progressive Inference Prompting

cs.IR · 2026-05-18 · unverdicted · novelty 6.0

Introduces TableGrid Navigation (TGN) and Progressive Inference Prompting (PIP) as training-free structured prompting frameworks that improve LLM performance on table question answering over baselines on TableBench and achieve SOTA on FeTaQa.

From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents

cs.CL · 2026-05-14 · unverdicted · novelty 6.0

A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.

Demystifying CLIP Data

cs.CV · 2023-09-28 · accept · novelty 6.0

MetaCLIP curates balanced 400M-pair subsets from CommonCrawl that outperform CLIP data, reaching 70.8% zero-shot ImageNet accuracy on ViT-B versus CLIP's 68.3%.

Scaling Laws for Transfer

cs.LG · 2021-02-02 · unverdicted · novelty 6.0

Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.

Adaptive Federated Optimization

cs.LG · 2020-02-29 · unverdicted · novelty 6.0

Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.

How Much Knowledge Can You Pack Into the Parameters of a Language Model?

cs.CL · 2020-02-10 · accept · novelty 6.0

Fine-tuned language models store knowledge in parameters to answer questions competitively with retrieval-based open-domain QA systems.

CTRL: A Conditional Transformer Language Model for Controllable Generation

cs.CL · 2019-09-11 · unverdicted · novelty 6.0

CTRL is a large conditional transformer language model that uses naturally occurring control codes to steer text generation style and content.

Good Secretaries, Bad Truck Drivers? Occupational Gender Stereotypes in Sentiment Analysis

cs.CL · 2019-06-24 · unverdicted · novelty 6.0

Authors release a new 800-sentence gender-balanced profession dataset and use it to test occupational gender stereotypes in three sentiment analysis models.

BoolXLLM: LLM-Assisted Explainability for Boolean Models

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

BoolXLLM augments an existing Boolean rule learner with LLMs for feature selection, discretization thresholds, and natural-language rule translation to improve interpretability while preserving accuracy.

ADE: Adaptive Dictionary Embeddings -- Scaling Multi-Anchor Representations to Large Language Models

cs.CL · 2026-04-27 · unverdicted · novelty 6.0

ADE scales multi-anchor word representations to transformers via Vocabulary Projection, Grouped Positional Encoding, and context-aware reweighting, achieving 98.7% fewer trainable parameters than DeBERTa-v3-base while matching or exceeding it on two text-classification benchmarks and compressing the

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

A General Language Assistant as a Laboratory for Alignment

cs.CL · 2021-12-01 · conditional · novelty 6.0

Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

CodeBERT: A Pre-Trained Model for Programming and Natural Languages

cs.CL · 2020-02-19 · unverdicted · novelty 6.0

CodeBERT pre-trains a bimodal model on code and text pairs plus unimodal data to achieve state-of-the-art results on natural language code search and code documentation generation.

HuggingFace's Transformers: State-of-the-art Natural Language Processing

cs.CL · 2019-10-09 · accept · novelty 6.0

Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.

citing papers explorer

Showing 39 of 39 citing papers.

A Hormone-inspired Emotion Layer for Transformer language models (HELT) cs.NE · 2026-04-13 · unverdicted · none · ref 55 · internal anchor
HormoneT5 augments T5 with a hormone-inspired block that predicts six continuous emotion values and uses them to modulate responses, reporting over 85% per-hormone accuracy and human preference for emotional quality.
Massive Activations in Large Language Models cs.CL · 2024-02-27 · unverdicted · none · ref 71 · internal anchor
Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.
GraphCodeBERT: Pre-training Code Representations with Data Flow cs.SE · 2020-09-17 · accept · none · ref 19 · internal anchor
GraphCodeBERT uses data flow graphs in pre-training to capture semantic code structure and reaches state-of-the-art results on code search, clone detection, translation, and refinement.
PubMedQA: A Dataset for Biomedical Research Question Answering cs.CL · 2019-09-13 · unverdicted · none · ref 42 · internal anchor
PubMedQA supplies 273k+ biomedical QA instances that require reasoning over research abstracts to produce yes/no/maybe answers.
FinBERT: Financial Sentiment Analysis with Pre-trained Language Models cs.CL · 2019-08-27 · unverdicted · none · ref 24 · internal anchor
FinBERT adapts BERT to the financial domain and outperforms prior state-of-the-art methods on financial sentiment analysis tasks.
XLNet: Generalized Autoregressive Pretraining for Language Understanding cs.CL · 2019-06-19 · accept · none · ref 27 · internal anchor
XLNet is a generalized autoregressive pretraining method that learns bidirectional contexts via permutation-based factorization and outperforms BERT on 20 NLP tasks.
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension cs.CL · 2019-10-29 · accept · none · ref 17
BART introduces a denoising pretraining method for seq2seq models that matches RoBERTa on GLUE and SQuAD while setting new state-of-the-art results on abstractive summarization, dialogue, and QA with up to 6 ROUGE gains.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer cs.LG · 2019-10-23 · unverdicted · none · ref 53
T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colossal Clean Crawled Corpus.
Fine-Tuning Language Models from Human Preferences cs.CL · 2019-09-18 · unverdicted · none · ref 21
Language models fine-tuned via RL on 5k-60k human preference comparisons produce stylistically better text continuations and human-preferred summaries that sometimes copy input sentences.
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism cs.CL · 2019-09-17 · unverdicted · none · ref 25
Intra-layer model parallelism in PyTorch enables training of 8.3B-parameter transformers, achieving SOTA perplexity of 10.8 on WikiText103 and 66.5% accuracy on LAMBADA.
Efficient Table QA via TableGrid Navigation and Progressive Inference Prompting cs.IR · 2026-05-18 · unverdicted · none · ref 15 · internal anchor
Introduces TableGrid Navigation (TGN) and Progressive Inference Prompting (PIP) as training-free structured prompting frameworks that improve LLM performance on table question answering over baselines on TableBench and achieve SOTA on FeTaQa.
From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents cs.CL · 2026-05-14 · unverdicted · none · ref 68 · internal anchor
A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.
Demystifying CLIP Data cs.CV · 2023-09-28 · accept · none · ref 105 · internal anchor
MetaCLIP curates balanced 400M-pair subsets from CommonCrawl that outperform CLIP data, reaching 70.8% zero-shot ImageNet accuracy on ViT-B versus CLIP's 68.3%.
Scaling Laws for Transfer cs.LG · 2021-02-02 · unverdicted · none · ref 184 · internal anchor
Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.
Adaptive Federated Optimization cs.LG · 2020-02-29 · unverdicted · none · ref 190 · internal anchor
Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.
How Much Knowledge Can You Pack Into the Parameters of a Language Model? cs.CL · 2020-02-10 · accept · none · ref 63 · internal anchor
Fine-tuned language models store knowledge in parameters to answer questions competitively with retrieval-based open-domain QA systems.
CTRL: A Conditional Transformer Language Model for Controllable Generation cs.CL · 2019-09-11 · unverdicted · none · ref 35 · internal anchor
CTRL is a large conditional transformer language model that uses naturally occurring control codes to steer text generation style and content.
Good Secretaries, Bad Truck Drivers? Occupational Gender Stereotypes in Sentiment Analysis cs.CL · 2019-06-24 · unverdicted · none · ref 28 · internal anchor
Authors release a new 800-sentence gender-balanced profession dataset and use it to test occupational gender stereotypes in three sentiment analysis models.
BoolXLLM: LLM-Assisted Explainability for Boolean Models cs.AI · 2026-05-12 · unverdicted · none · ref 13
BoolXLLM augments an existing Boolean rule learner with LLMs for feature selection, discretization thresholds, and natural-language rule translation to improve interpretability while preserving accuracy.
ADE: Adaptive Dictionary Embeddings -- Scaling Multi-Anchor Representations to Large Language Models cs.CL · 2026-04-27 · unverdicted · none · ref 10
ADE scales multi-anchor word representations to transformers via Vocabulary Projection, Grouped Positional Encoding, and context-aware reweighting, achieving 98.7% fewer trainable parameters than DeBERTa-v3-base while matching or exceeding it on two text-classification benchmarks and compressing the
Language Models (Mostly) Know What They Know cs.CL · 2022-07-11 · unverdicted · none · ref 100
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
A General Language Assistant as a Laboratory for Alignment cs.CL · 2021-12-01 · conditional · none · ref 45
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
CodeBERT: A Pre-Trained Model for Programming and Natural Languages cs.CL · 2020-02-19 · unverdicted · none · ref 50
CodeBERT pre-trains a bimodal model on code and text pairs plus unimodal data to achieve state-of-the-art results on natural language code search and code documentation generation.
HuggingFace's Transformers: State-of-the-art Natural Language Processing cs.CL · 2019-10-09 · accept · none · ref 175
Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.
Knowledge-aware Pronoun Coreference Resolution cs.CL · 2019-07-08 · unverdicted · none · ref 15 · internal anchor
A neural model with knowledge attention outperforms baselines on pronoun coreference in two domains and cross-domain by incorporating KG triplets.
Patent Claim Generation by Fine-Tuning OpenAI GPT-2 cs.CL · 2019-07-01 · unverdicted · none · ref 1 · internal anchor
Fine-tunes GPT-2 on patent claims, probes training steps, analyzes conditional and unconditional sampling outputs, proposes a new sampling method, and releases an email bot for exploration.
Supervise Thyself: Examining Self-Supervised Representations in Interactive Environments cs.LG · 2019-06-27 · unverdicted · none · ref 29 · internal anchor
Empirical comparison finds that self-supervised representations vary in capturing agent state and generalizing to new levels or textures depending on environment visuals and dynamics.
ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission cs.CL · 2019-04-10 · accept · none · ref 24 · internal anchor
ClinicalBERT applies BERT-style transformers to clinical notes and outperforms baselines on 30-day readmission prediction while revealing human-judged medical concept links.
Automatic Reflection Level Classification in Hungarian Student Essays cs.CL · 2026-05-04 · unverdicted · none · ref 34
Classical machine learning models outperform Hungarian transformers slightly in overall performance (71% vs 68% average score) for classifying reflection levels in student essays, though transformers handle rare classes better.
ClinQueryAgent: A Conversational Agent for Population Health Management cs.IR · 2026-04-13 · unverdicted · none · ref 170 · internal anchor
The paper introduces ClinQueryAgent, a conversational agent that converts natural language queries into database queries for population health management while keeping patient data secure, and reports its use by 128 staff across 15 NHS practices covering 148,319 patients.
Baichuan 2: Open Large-scale Language Models cs.CL · 2023-09-19 · unverdicted · none · ref 51 · internal anchor
Baichuan 2 presents 7B and 13B LLMs trained on 2.6T tokens that match or exceed similar open models on MMLU, CMMLU, GSM8K, HumanEval and excel in medicine and law.
Automatically Learning Construction Injury Precursors from Text cs.CL · 2019-07-26 · unverdicted · none · ref 23 · internal anchor
Standard NLP classifiers can surface valid injury precursors from raw construction safety reports.
Investigating Self-Attention Network for Chinese Word Segmentation cs.CL · 2019-07-26 · unverdicted · none · ref 8 · internal anchor
Self-attention networks achieve competitive results to BiLSTM-CRF on Chinese word segmentation, with BERT and word integration yielding the best reported performance on six heterogeneous domain benchmarks.
Neural Language Model Based Training Data Augmentation for Weakly Supervised Early Rumor Detection cs.CL · 2019-07-16 · unverdicted · none · ref 17 · internal anchor
Neural language model augments limited labeled rumor tweets using unlabeled event data, expanding datasets by ~200% and improving F-score by 12.1% in detection models.
Transfer Learning for Risk Classification of Social Media Posts: Model Evaluation Study cs.CL · 2019-07-04 · unverdicted · none · ref 14 · internal anchor
Finetuning GPT-1 on 150000 unlabeled Reachout.com posts then feeding the features into AutoML yields a new state-of-the-art macro F1 of 0.572 for triaging risk in 1588 labeled CLPsych 2017 posts without metadata or history.
SleepNet and DreamNet: Enriching and Reconstructing Representations for Consolidated Visual Classification cs.LG · 2024-09-03 · unverdicted · none · ref 33 · internal anchor
SleepNet and DreamNet enrich visual features via supervised pre-trained encoders and reconstruct hidden states with encoder-decoder frameworks to outperform prior state-of-the-art classifiers.
Large Language Models: A Survey cs.CL · 2024-02-09 · accept · none · ref 23
The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.
Machine Reading Comprehension: a Literature Review cs.CL · 2019-06-30 · unverdicted · none · ref 35 · internal anchor
A 2019 survey of machine reading comprehension corpora and methods.
REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations cs.CL · 2026-05-12 · unreviewed · ref 19 · internal anchor

Deep contextualized word representations

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer