
arXiv: 1810.09305 · v1 · pith:2GMPFPXQnew · submitted 2018-10-18 · 💻 cs.CL · cs.IR · cs.LG

WikiHow: A Large Scale Text Summarization Dataset

classification 💻 cs.CL cs.IR cs.LG
keywords wikihow articles available dataset performance summarization abstraction abstractive

Sequence-to-sequence models have recently achieved state-of-the-art performance in summarization. However, few large-scale, high-quality datasets are available, and almost all of them consist mainly of news articles with a specific writing style. Moreover, abstractive human-style systems that describe content at a deeper level require data with higher levels of abstraction. In this paper, we present WikiHow, a dataset of more than 230,000 article and summary pairs extracted and constructed from an online knowledge base written by different human authors. The articles span a wide range of topics and therefore represent a high diversity of styles. We evaluate the performance of existing methods on WikiHow to present its challenges and set baselines for further improvement.



Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation

    cs.RO 2024-03 accept novelty 8.0

    BEHAVIOR-1K introduces a benchmark of 1,000 human everyday activities in realistic simulated scenes together with the OMNIGIBSON physics simulator to evaluate embodied AI.

  2. What is this Article about? Extreme Summarization with Topic-aware Convolutional Neural Networks

    cs.CL 2019-07 unverdicted novelty 8.0

    Introduces extreme summarization as a one-sentence abstractive task, a new BBC dataset, and a topic-conditioned CNN model that outperforms extractive and abstractive baselines on automatic and human evaluations.

  3. Pro$^2$Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks

    cs.AI 2026-05 unverdicted novelty 7.0

    Pro²Assist uses multimodal egocentric perception from AR glasses to track fine-grained progress in long-horizon procedural tasks and deliver timely proactive assistance, outperforming baselines by over 21% in action u...

  4. Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

    cs.CV 2024-10 unverdicted novelty 7.0

    Janus decouples visual encoding into task-specific pathways inside a single autoregressive transformer to unify multimodal understanding and generation while outperforming earlier unified models.

  5. How Human-Like Are Large Language Models? A Register-Aware Linguistic Evaluation Framework

    cs.CL 2026-05 unverdicted novelty 6.0

    The authors introduce a register-aware evaluation framework that compares LLM outputs to human reference corpora via Biber's lexico-grammatical features and MMD across five English registers.

  6. MUDY: Multi-Granular Dynamic Candidate Contextualization for Unsupervised Keyphrase Extraction

    cs.IR 2026-05 unverdicted novelty 6.0

    MUDY improves unsupervised keyphrase extraction by combining prompt-based scoring with candidate-aware weighting and self-attention-based multi-granular scoring to capture both local and global contextual salience, ou...

  7. Learning to Control Summaries with Score Ranking

    cs.CL 2026-04 unverdicted novelty 6.0

    A score-ranking loss enables controllable summarization by aligning outputs to evaluation scores, matching SOTA performance with dimension-specific control on LLaMA, Qwen, and Mistral.

  8. Retrieval-Augmented Generation with Graphs (GraphRAG)

    cs.IR 2024-12 unverdicted novelty 5.0

    A survey proposing a holistic GraphRAG framework with components including query processor, retriever, organizer, generator, and data source, plus domain-tailored reviews, challenges, and future directions.

  9. DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

    cs.CV 2024-12 accept novelty 5.0

    DeepSeek-VL2 is a series of MoE vision-language models using dynamic tiling and latent attention that reach competitive or state-of-the-art results on VQA, OCR, document understanding and grounding with 1.0B to 4.5B a...