pith. sign in

arxiv: 2301.13848 · v1 · pith:6L7Q2QPMnew · submitted 2023-01-31 · 💻 cs.CL · cs.AI· cs.LG

Benchmarking Large Language Models for News Summarization

classification 💻 cs.CL cs.AIcs.LG
keywords humanllmssummariessummarizationevaluationfindlanguagelarge
0
0 comments X
read the original abstract

Large language models (LLMs) have shown promise for automatic summarization but the reasons behind their successes are poorly understood. By conducting a human evaluation on ten LLMs across different pretraining methods, prompts, and model scales, we make two important observations. First, we find instruction tuning, and not model size, is the key to the LLM's zero-shot summarization capability. Second, existing studies have been limited by low-quality references, leading to underestimates of human performance and lower few-shot and finetuning performance. To better evaluate LLMs, we perform human evaluation over high-quality summaries we collect from freelance writers. Despite major stylistic differences such as the amount of paraphrasing, we find that LMM summaries are judged to be on par with human written summaries.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

    cs.LG 2023-06 unverdicted novelty 6.0

    H2O evicts non-heavy-hitter tokens from the KV cache using a dynamic submodular policy, retaining recent and frequent-co-occurrence tokens to reduce memory while preserving accuracy.

  2. HieraSparse: Hierarchical Semi-Structured Sparse KV Attention

    cs.DC 2026-04 unverdicted novelty 5.0

    HieraSparse delivers a hierarchical semi-structured sparse KV attention system that achieves 1.2x KV compression and 4.57x decode attention speedup versus prior unstructured sparsity methods at equivalent sparsity, pl...

  3. Hallucination of Multimodal Large Language Models: A Survey

    cs.CV 2024-04 accept novelty 5.0

    The survey organizes causes of hallucinations in MLLMs, reviews evaluation benchmarks and metrics, and outlines mitigation approaches plus open questions.

  4. A Tree-of-Thoughts Inspired Hybrid Approach for Legal Case Judgement Summarization using LLMs

    cs.CL 2026-06 unverdicted novelty 3.0

    A tree-of-thoughts inspired hybrid extractive-abstractive LLM prompt yields better legal case judgment summaries than standard extractive or abstractive prompts.