Learning Soft Linear Constraints with Application to Citation Field Extraction

Alexandre Passos; Andrew McCallum; David Belanger; Sam Anzaroot

arxiv: 1403.1349 · v2 · pith:TWNMIZM6new · submitted 2014-03-06 · cs.CL · cs.DL · cs.IR

classification cs.CL · cs.DL · cs.IR

keywords constraints · soft · citation · technique · challenging · extraction · given · inference

Accurately segmenting a citation string into fields for authors, titles, etc. is a challenging task because the output typically obeys various global constraints. Previous work has shown that modeling soft constraints, where the model is encouraged, but not required, to obey the constraints, can substantially improve segmentation performance. For imposing hard constraints, on the other hand, dual decomposition is a popular technique for efficient prediction given existing algorithms for unconstrained inference. We extend the technique to perform prediction subject to soft constraints. Moreover, with a technique for performing inference given soft constraints, it is easy to automatically generate large families of constraints and learn their costs with a simple convex optimization problem during training. This allows us to obtain substantial gains in accuracy on a new, challenging citation extraction dataset.
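The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way dual decomposition could be extended to a soft constraint. It assumes a projected subgradient update in which the dual variable is clipped to the interval [0, cost], so that violating the constraint never costs more than the penalty. The toy inference routine and the names `unconstrained_argmax`, `soft_dual_decomposition`, `gains`, `limit`, and `cost` are hypothetical and not taken from the paper.

```python
def unconstrained_argmax(gains, lam):
    # Toy stand-in for "unconstrained inference" (an assumption): each
    # position independently takes label 1 when its gain, reduced by the
    # dual variable lam, is still positive; otherwise it takes label 0.
    return [1 if g - lam > 0 else 0 for g in gains]


def soft_dual_decomposition(gains, limit, cost, step=0.6, max_iters=100):
    # Soft constraint (hypothetical): "at most `limit` positions take
    # label 1", where each unit of violation is charged `cost`.
    lam = 0.0
    labels = unconstrained_argmax(gains, lam)
    for _ in range(max_iters):
        labels = unconstrained_argmax(gains, lam)
        violation = sum(labels) - limit  # dual subgradient at lam
        # Projected subgradient step: clip lam to [0, cost]. A hard
        # constraint would only clip at 0, leaving lam unbounded above.
        new_lam = min(max(lam + step * violation, 0.0), cost)
        if new_lam == lam:
            break  # fixed point of the projected update
        lam = new_lam
    return labels, lam


if __name__ == "__main__":
    print(soft_dual_decomposition([3.0, 1.0, -1.0], limit=1, cost=2.0))
    print(soft_dual_decomposition([3.0, 1.0, -1.0], limit=1, cost=0.5))
```

With `gains = [3.0, 1.0, -1.0]` and `limit = 1`, a cost of 2.0 leads the sketch to drop the weaker positive position, while a cost of 0.5 leaves the violation in place, because paying the penalty is cheaper than giving up a gain of 1.0. A constant step size is used here for brevity; in general, subgradient methods rely on a diminishing step size for convergence.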

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata
cs.CV 2026-05 conditional novelty 7.0

WikiVQABench is a human-curated collection of Wikipedia-based VQA items that require both visual evidence and external knowledge from Wikidata to answer correctly.

Source or It Didn't Happen: A Multi-Agent Framework for Citation Hallucination Detection
cs.CL 2026-05 accept novelty 7.0

CiteTracer detects citation hallucinations at 97.1% accuracy on synthetic and real-world benchmarks by combining structured extraction, multi-source retrieval, deterministic matching, and class-specialist agents.

Occlusion-Aware Physics-Semantic Keyframe Selection for Robust Video Editing
cs.CV 2026-05 unverdicted novelty 6.0

Occlusion-aware keyframe selection via structural, cycle-consistent tracking, and vision-language criteria improves diffusion video editing robustness without manual annotations.

DisImpact: Quantifying the Physi-Social Impact of Natural Disasters Through Social Media
cs.SI 2026-05 unverdicted novelty 6.0

DisImpact introduces a two-stage MLLM framework to classify disaster-related social media posts into ten impact categories and compute a unified physi-social impact index validated against FEMA and NASA ground-truth data.

QuadLink: Autoregressive Quad-Dominant Mesh Generation via Point-Relation Learning
cs.GR 2026-05 unverdicted novelty 6.0

QuadLink generates anisotropic quad-dominant meshes from point clouds via a hybrid centroid-conditioned vertex linking model and a Tri-to-Quad data conversion operator.

High-Fidelity Single-Image Head Modeling with Industry-Grade Topology
cs.CV 2026-05 unverdicted novelty 6.0

A single-image head reconstruction method uses coarse-to-fine optimization with normal consistency, landmarks, and geometry-aware constraints on curvature and conformality to produce meshes with industry-grade topolog...

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
cs.CV 2026-05 unverdicted novelty 6.0

UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.

Align Documents to Questions: Question-Oriented Document Rewriting for Retrieval-Augmented Generation
cs.CL 2026-04 unverdicted novelty 5.0

QREAM rewrites documents to question-focused style using iterative ICL and distilled FT models, boosting RAG performance by up to 8% relative improvement.

Are Researchers Being Replaced by Artificial Intelligence?
cs.CY 2026-04 unverdicted novelty 3.0

AI is shifting researchers from creators to curators of generated content, risking loss of intellectual ownership and genuine understanding of science.