Deep Biaffine Attention for Neural Dependency Parsing

Timothy Dozat, Christopher D. Manning

arxiv: 1611.01734 · v3 · pith:ZONVMLECnew · submitted 2016-11-06 · 💻 cs.CL · cs.NE

classification 💻 cs.CL cs.NE

keywords parser, graph-based, approaches, attention, biaffine, dependency, goldberg, kiperwasser

abstract

This paper builds on recent work from Kiperwasser & Goldberg (2016) using neural attention in a simple graph-based dependency parser. We use a larger but more thoroughly regularized parser than other recent BiLSTM-based approaches, with biaffine classifiers to predict arcs and labels. Our parser achieves state-of-the-art or near state-of-the-art performance on standard treebanks for six different languages, reaching 95.7% UAS and 94.1% LAS on the most popular English PTB dataset. This makes it the highest-performing graph-based parser on this benchmark, outperforming Kiperwasser & Goldberg (2016) by 1.8% UAS and 2.2% LAS, and comparable to the highest-performing transition-based parser (Kuncoro et al., 2016), which achieves 95.8% UAS and 94.6% LAS. We also show which hyperparameter choices had a significant effect on parsing accuracy, allowing us to achieve large gains over other graph-based approaches.
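
The abstract mentions biaffine classifiers for predicting arcs but does not spell out the scoring function. The following is a minimal sketch, assuming a common formulation in which the score of attaching dependent i to head j is a bilinear term plus a linear term on the head representation. The names `biaffine_arc_scores`, `greedy_heads`, `U`, and `u` are hypothetical, the vectors are toy values rather than BiLSTM outputs, and the greedy argmax stands in for whatever decoding the parser actually uses.

```python
# Illustrative sketch only: a toy biaffine arc scorer in pure Python.
# Assumed form: score[i][j] = dep_i^T U head_j + u^T head_j
# (bilinear term plus a head-only linear term). Not the authors' code.

def biaffine_arc_scores(h_dep, h_head, U, u):
    """Return scores[i][j] for attaching dependent i to candidate head j."""
    scores = []
    for dep in h_dep:
        # Project the dependent vector through the bilinear weight: dep^T U
        dep_U = [sum(dep[k] * U[k][m] for k in range(len(dep)))
                 for m in range(len(U[0]))]
        row = []
        for head in h_head:
            bilinear = sum(dep_U[m] * head[m] for m in range(len(head)))
            # Linear term: how likely this token is to be a head at all
            head_term = sum(u[m] * head[m] for m in range(len(head)))
            row.append(bilinear + head_term)
        scores.append(row)
    return scores


def greedy_heads(scores):
    """Pick the highest-scoring head per dependent (no tree constraint)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Toy example: 3 tokens, 2-dimensional representations (made-up values).
h_dep = [[1, 0], [0, 1], [1, 1]]
h_head = [[1, 0], [0, 1], [2, 0]]
U = [[1, 0], [0, 1]]   # identity, so the bilinear term is a dot product
u = [0.5, 0]

scores = biaffine_arc_scores(h_dep, h_head, U, u)
print(scores[0])             # [1.5, 0.0, 3.0]
print(greedy_heads(scores))  # [2, 1, 2]
```

In this toy output, token 1 selects itself as its head, which shows why greedy selection alone does not guarantee a well-formed tree. A full parser would need a decoding step that enforces tree structure.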

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

M3DocDep: Multi-modal, Multi-page, Multi-document Dependency Chunking with Large Vision-Language Models
cs.IR · 2026-04 · unverdicted · novelty 7.0

M3DocDep is an LVLM pipeline that extracts multimodal block embeddings, scores parent-child edges with a biaffine head, decodes a valid dependency tree via MST, and produces section-path-annotated chunks, yielding rep...

Relational Probing: LM-to-Graph Adaptation for Financial Prediction
cs.CL · 2026-04 · unverdicted · novelty 6.0

Relational Probing replaces the LM output head with a trainable relation head that induces graphs from hidden states and optimizes them end-to-end for stock trend prediction, showing gains over co-occurrence baselines.

A Generative Model for Punctuation in Dependency Trees
cs.CL · 2019-06 · unverdicted · novelty 6.0

A generative model of latent underlying punctuation in dependency trees, trained on incomplete data via local likelihood maximization, produces plausible reconstructions across languages and beats baselines on restoration.

Dependency Parsing Across the Resource Spectrum: Evaluating Architectures on High and Low-Resource Languages
cs.CL · 2026-05 · unverdicted · novelty 5.0

Biaffine LSTM outperforms transformer parsers like AfroXLMR and RemBERT in low-resource dependency parsing, with transformers gaining advantage as data increases and morphological complexity as a secondary predictor.