ReaderLM-v2: HTML to markdown with a small language model. arXiv:2503.01151, 2025
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: CONDITIONAL (2)
Citing papers
- WCXB: A Multi-Type Web Content Extraction Benchmark
  WCXB provides 2,008 annotated multi-type web pages and shows extraction systems perform well on articles but diverge on structured pages.
- SUPERGLASSES: Benchmarking Vision Language Models as Intelligent Agents for AI Smart Glasses
  SUPERGLASSES is the first VQA benchmark built from actual smart glasses data, and SUPERLENS is an agent using automatic object detection, query decoupling, and multimodal search that outperforms GPT-4o by 2.19% on it.