pith. machine review for the scientific record.

arxiv: 1611.01734 · v3 · submitted 2016-11-06 · 💻 cs.CL · cs.NE

Recognition: unknown

Deep Biaffine Attention for Neural Dependency Parsing

Authors on Pith: no claims yet
classification: 💻 cs.CL · cs.NE
keywords: parser, graph-based, approaches, attention, biaffine, dependency, goldberg, kiperwasser
Original abstract

This paper builds off recent work from Kiperwasser & Goldberg (2016) using neural attention in a simple graph-based dependency parser. We use a larger but more thoroughly regularized parser than other recent BiLSTM-based approaches, with biaffine classifiers to predict arcs and labels. Our parser gets state of the art or near state of the art performance on standard treebanks for six different languages, achieving 95.7% UAS and 94.1% LAS on the most popular English PTB dataset. This makes it the highest-performing graph-based parser on this benchmark, outperforming Kiperwasser & Goldberg (2016) by 1.8% and 2.2%, and comparable to the highest-performing transition-based parser (Kuncoro et al., 2016), which achieves 95.8% UAS and 94.6% LAS. We also show which hyperparameter choices had a significant effect on parsing accuracy, allowing us to achieve large gains over other graph-based approaches.
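The abstract's core architectural claim is the biaffine classifier applied over BiLSTM states. Below is a minimal, illustrative sketch of the arc scorer in PyTorch; the class name, layer sizes, and initialization are assumptions for exposition, not the authors' released implementation (which also uses heavy dropout regularization and a second biaffine classifier for arc labels).

```python
import torch
import torch.nn as nn

class BiaffineArcScorer(nn.Module):
    """Illustrative sketch of deep biaffine arc scoring.
    Shapes and layer sizes are assumptions, not the paper's exact config."""

    def __init__(self, hidden_dim: int, arc_dim: int = 500):
        super().__init__()
        # Dedicated MLPs reduce each BiLSTM state to a "head" view and a
        # "dependent" view before the biaffine transform scores all pairs.
        self.head_mlp = nn.Sequential(nn.Linear(hidden_dim, arc_dim), nn.ReLU())
        self.dep_mlp = nn.Sequential(nn.Linear(hidden_dim, arc_dim), nn.ReLU())
        # Biaffine weight; the extra row acts as a head-prior bias once a
        # constant 1 is appended to each dependent vector.
        self.U = nn.Parameter(torch.empty(arc_dim + 1, arc_dim))
        nn.init.xavier_uniform_(self.U)

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        """states: (batch, seq_len, hidden_dim) BiLSTM outputs.
        Returns (batch, seq_len, seq_len) scores where entry [b, i, j]
        rates token j as the syntactic head of token i."""
        head = self.head_mlp(states)                   # (B, T, arc_dim)
        dep = self.dep_mlp(states)                     # (B, T, arc_dim)
        ones = dep.new_ones(*dep.shape[:-1], 1)
        dep = torch.cat([dep, ones], dim=-1)           # (B, T, arc_dim + 1)
        return dep @ self.U @ head.transpose(1, 2)     # (B, T, T)
```

A row-wise softmax over the returned matrix gives each token a distribution over candidate heads; at test time a maximum-spanning-tree decoder can be applied to enforce well-formed trees.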

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Relational Probing: LM-to-Graph Adaptation for Financial Prediction

    cs.CL · 2026-04 · unverdicted · novelty 6.0

    Relational Probing replaces the LM output head with a trainable relation head that induces graphs from hidden states and optimizes them end-to-end for stock trend prediction, showing gains over co-occurrence baselines (a hedged sketch of such a head appears after this list).

  2. Dependency Parsing Across the Resource Spectrum: Evaluating Architectures on High and Low-Resource Languages

    cs.CL · 2026-05 · unverdicted · novelty 5.0

    A biaffine LSTM parser outperforms transformer parsers built on models such as AfroXLMR and RemBERT in low-resource dependency parsing; the transformers gain the advantage as training data increases, with morphological complexity acting as a secondary predictor.
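For intuition on the "relation head" mentioned in citation 1 above, here is a hypothetical sketch: it assumes entity-pooled LM hidden states and scores all pairs into a soft adjacency matrix that a downstream loss can train end-to-end. Every name, shape, and design choice below is an assumption for illustration; the cited paper's actual interface is not shown on this page.

```python
import torch
import torch.nn as nn

class RelationHead(nn.Module):
    """Hypothetical relation head: maps per-entity LM hidden states to a
    soft adjacency matrix, trainable end-to-end with a downstream loss.
    All names and dimensions here are illustrative assumptions."""

    def __init__(self, hidden_dim: int, rel_dim: int = 128):
        super().__init__()
        self.src = nn.Linear(hidden_dim, rel_dim)  # entity as relation source
        self.dst = nn.Linear(hidden_dim, rel_dim)  # entity as relation target

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, n_entities, hidden_dim), e.g. mean-pooled LM states
        q = self.src(h)                                   # (B, N, rel_dim)
        k = self.dst(h)                                   # (B, N, rel_dim)
        scores = q @ k.transpose(1, 2) / q.shape[-1] ** 0.5  # scaled dot product
        return torch.sigmoid(scores)                      # soft adjacency (B, N, N)
```

Because the adjacency is produced by differentiable layers rather than fixed co-occurrence counts, gradients from the downstream prediction loss can reshape the induced graph during training.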