pith.

arXiv: 1907.07323 · v1 · pith:HTJ5QSDI · submitted 2019-07-16 · 💻 cs.CL · cs.IR · cs.LG

STRASS: A Light and Effective Method for Extractive Summarization Based on Sentence Embeddings

Pith reviewed 2026-05-24 21:01 UTC · model grok-4.3

classification 💻 cs.CL · cs.IR · cs.LG
keywords extractive summarization · sentence embeddings · dense layer · document embedding transformation · CASS dataset · French court judgments · lightweight training

The pith

STRASS learns one dense layer on document embeddings to select extractive summary sentences that match human references.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces STRASS as an extractive summarization method that operates inside fixed sentence embedding spaces. It forms a summary by picking the sentences whose embeddings lie closest to a transformed version of the full document embedding. A single dense layer is trained to choose the transformation that brings the automatic selection closest to a human reference summary. Because the layer is small, training runs on a CPU at low cost and inference stays linear in the number of sentences. The authors also release the French CASS dataset of court judgments and show that the method reaches performance levels comparable to existing extractive systems.
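To make the selection step concrete, here is a minimal pure-Python sketch. It rests on assumptions the text above does not confirm: closeness is measured with cosine similarity, the dense layer is affine with no activation, the document embedding is the mean of the sentence embeddings, and the summary keeps a fixed number `k` of sentences (the original method may use a different rule, such as a similarity threshold). The names `cosine`, `dense`, and `select_sentences` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def dense(W: Sequence[Sequence[float]], b: Sequence[float], x: Sequence[float]) -> List[float]:
    """Affine dense layer y = W x + b (no activation, by assumption)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def select_sentences(
    sentence_embs: Sequence[Sequence[float]],
    doc_emb: Sequence[float],
    W: Sequence[Sequence[float]],
    b: Sequence[float],
    k: int,
) -> List[int]:
    """Indices, in document order, of the k sentences closest to the transformed document embedding."""
    target = dense(W, b, doc_emb)
    # One similarity score per sentence.
    scores = [cosine(s, target) for s in sentence_embs]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


if __name__ == "__main__":
    # Toy 2-d "embeddings"; real sentence embeddings have hundreds of dimensions.
    sentences = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    doc = [sum(col) / len(sentences) for col in zip(*sentences)]  # mean embedding (assumed)
    W = [[2.0, 0.0], [0.0, 0.5]]  # illustrative weights, not learned
    b = [0.0, 0.0]
    print(select_sentences(sentences, doc, W, b, k=2))  # [0, 2]
```

In the full method, `W` and `b` would be learned so that the selected sentences resemble the reference summary; here they are fixed by hand only to show how the transformation steers which sentences are picked.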

Core claim

STRASS creates an extractive summary by selecting the sentences whose embeddings are closest to the document embedding. The model learns a transformation of the document embedding that maximizes the similarity between the extractive summary and the ground-truth summary. Because the transformation consists of a single dense layer, training can be done on a CPU and is therefore inexpensive. Inference is also fast, with time linear in the number of sentences. On the CASS dataset, the results show that the method performs similarly to state-of-the-art extractive methods while keeping training and inference times low.

What carries the argument

A single learned dense layer that transforms the document embedding so the nearest sentence embeddings form the extractive summary.

If this is right

  • Training requires only a CPU because the learned component is a single dense layer.
  • Inference time grows linearly with the number of sentences in the input document.
  • The method can reuse any pre-trained sentence embedding model without further modification.
  • The CASS dataset supplies a new French-language benchmark of court judgments paired with summaries.
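The CPU and linear-time points above can be checked with back-of-envelope arithmetic. This is a sketch under stated assumptions: a square `d`-by-`d` dense layer with bias, sentence embeddings already computed (encoder cost excluded), and an illustrative `d = 512` that is not taken from the paper. The helper names are hypothetical.

```python
def dense_layer_params(d: int, bias: bool = True) -> int:
    """Trainable parameters of a hypothetical square d-by-d dense layer."""
    return d * d + (d if bias else 0)


def inference_mult_adds(n_sentences: int, d: int) -> int:
    """Rough multiply-add count for scoring one document:
    one d x d transform of the document embedding, then one
    d-dimensional dot product per sentence (norms ignored)."""
    return d * d + n_sentences * d


if __name__ == "__main__":
    print(dense_layer_params(512))  # 262656
    # Each extra sentence adds d = 512 multiply-adds: the cost grows linearly in n.
    print(inference_mult_adds(200, 512) - inference_mult_adds(100, 512))  # 51200
```

About a quarter of a million weights is small enough to fit comfortably on a CPU. Scoring stays linear in the number of sentences. Selecting by threshold keeps the whole step linear, whereas sorting the scores to pick a top-`k` adds an O(n log n) term.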

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same lightweight transformation might be applied to other languages or domains once suitable embeddings exist.
  • If the underlying embedding space already aligns well with summary quality, the learned layer could be replaced by a fixed rule.
  • The approach could be combined with more expressive selection mechanisms to test whether the dense layer remains the main performance driver.

Load-bearing premise

A single dense-layer transformation of the document embedding is enough to make the closest sentences match human reference summaries inside an existing embedding space.

What would settle it

On a new test collection, the ROUGE scores of STRASS summaries fall clearly below those of established extractive baselines while the embedding space and selection rule stay unchanged.
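Because this test is phrased in ROUGE terms, the sketch below shows what a ROUGE-1 comparison measures: clipped unigram overlap between a candidate and a reference, combined into an F1 score. It is deliberately simplified and is not the official ROUGE tooling. It uses lowercase whitespace tokens, with no stemming, no handling of French elisions such as "l'appel", and no confidence intervals. A real evaluation should use an established ROUGE implementation.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 over lowercase whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter '&' keeps the minimum count per token, i.e. clipped overlap.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ref = "the court rejects the appeal"
    print(rouge1_f1("the court rejects the appeal", ref))  # 1.0
    print(rouge1_f1("the appeal is rejected", ref))  # P = 2/4, R = 2/5, F1 = 4/9
```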

Original abstract

This paper introduces STRASS: Summarization by TRAnsformation Selection and Scoring. It is an extractive text summarization method which leverages the semantic information in existing sentence embedding spaces. Our method creates an extractive summary by selecting the sentences with the closest embeddings to the document embedding. The model learns a transformation of the document embedding to minimize the similarity between the extractive summary and the ground truth summary. As the transformation is only composed of a dense layer, the training can be done on CPU, therefore, inexpensive. Moreover, inference time is short and linear according to the number of sentences. As a second contribution, we introduce the French CASS dataset, composed of judgments from the French Court of cassation and their corresponding summaries. On this dataset, our results show that our method performs similarly to the state of the art extractive methods with effective training and inferring time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces STRASS, an extractive summarization method that selects sentences whose embeddings are closest to a learned dense-layer transformation of the document embedding. The transformation is trained to minimize similarity between the resulting extractive summary and ground-truth summaries. The approach is claimed to achieve performance parity with state-of-the-art extractive methods on a new French CASS dataset of court judgments while requiring only CPU training and linear-time inference.

Significance. If the optimization objective is correctly specified to produce alignment with references rather than anti-alignment, and if the reported performance holds under standard evaluation, the method offers a lightweight, parameter-efficient alternative that reuses existing sentence embeddings without heavy architectures. The CASS dataset is a useful addition for non-English summarization evaluation.

major comments (2)
  1. Abstract: the training objective is stated as learning the transformation 'to minimize the similarity between the extractive summary and the ground truth summary.' Taken literally, this inverts the intended goal; gradient descent on such a loss would drive selected sentences away from the references. No equation or section in the provided text resolves the discrepancy, and the central claim that the method aligns with human summaries rests on the optimization succeeding at alignment rather than anti-alignment.
  2. Abstract: the claim that results 'perform similarly to the state of the art' is unsupported by any quantitative metrics, baselines, ROUGE scores, statistical tests, or error bars in the supplied text, rendering the central performance assertion unverifiable from the given information.
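Comment 1 turns on the sign of the objective. A sign-correct formulation can be written as follows, using notation chosen here for illustration because the text above gives no equation. Here $s_i$ are the sentence embeddings, $d$ is the document embedding, $W, b$ are the dense-layer parameters, $r$ is the reference-summary embedding, and $e_S$ is the summary embedding. It is assumed to be the mean of the selected sentence embeddings.

```latex
\begin{aligned}
t &= W d + b, \\
S(W,b) &= \text{indices of the sentences } i \text{ with the largest } \cos(s_i, t), \\
e_{S} &= \frac{1}{|S|} \sum_{i \in S} s_i, \\
\mathcal{L}(W,b) &= 1 - \cos\big(e_{S(W,b)},\, r\big), \\
\arg\min_{W,b} \mathcal{L}(W,b) &= \arg\max_{W,b} \cos\big(e_{S(W,b)},\, r\big).
\end{aligned}
```

Minimizing this loss is the same as maximizing similarity to the reference. Minimizing $\cos(e_S, r)$ itself, as the abstract's wording literally says, would instead favor anti-aligned summaries. The hard selection $S(W,b)$ is also piecewise constant in $W$ and $b$, so gradient-based training needs some differentiable relaxation of it. The text above does not state which relaxation STRASS uses.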

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful review and for highlighting these issues in the abstract. We address each major comment below with specific responses and indicate where revisions will be made.

Point-by-point responses
  1. Referee: Abstract: the training objective is stated as learning the transformation 'to minimize the similarity between the extractive summary and the ground truth summary.' Taken literally, this inverts the intended goal; gradient descent on such a loss would drive selected sentences away from the references. No equation or section in the provided text resolves the discrepancy, and the central claim that the method aligns with human summaries rests on the optimization succeeding at alignment rather than anti-alignment.

    Authors: We appreciate the referee identifying this error. The abstract contains a wording mistake: the objective is to maximize (not minimize) the similarity between the extractive summary embedding and the ground-truth summary. The body of the manuscript correctly formulates the loss to encourage alignment via the dense-layer transformation. We will revise the abstract to state 'maximize the similarity' and ensure consistency with the method description and equations. revision: yes

  2. Referee: Abstract: the claim that results 'perform similarly to the state of the art' is unsupported by any quantitative metrics, baselines, ROUGE scores, statistical tests, or error bars in the supplied text, rendering the central performance assertion unverifiable from the given information.

    Authors: The full manuscript includes Section 4 (Experiments) with quantitative results on the CASS dataset. This section reports ROUGE-1/2/L scores for STRASS against extractive baselines (including state-of-the-art methods), along with comparisons of training and inference efficiency. The abstract summarizes these findings at a high level, as is conventional. The supporting metrics, baselines, and scores are present in the manuscript body and tables. revision: no

Circularity Check

0 steps flagged

No circularity; derivation relies on standard supervised learning against external references

Full rationale

The paper describes a method that learns a dense-layer transformation of document embeddings by optimizing against ground-truth summaries in a pre-existing sentence embedding space, then evaluates extractive performance on the external CASS dataset against state-of-the-art baselines. No equation, claim, or step reduces by construction to its own inputs. The learned map is fitted to external reference summaries rather than being defined tautologically, and no self-citation chain or uniqueness theorem is invoked as load-bearing. The derivation is therefore self-contained.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the semantic richness of pre-existing sentence embeddings and the adequacy of a single learned dense-layer transformation; no new entities are postulated.

free parameters (1)
  • dense-layer weights
    Parameters of the single transformation layer are fitted during training to minimize embedding-space distance to reference summaries.
axioms (1)
  • domain assumption: sentence embeddings from existing models capture sufficient semantic information to support summary selection via nearest-neighbor after a learned transformation.
    Invoked in the description of how the extractive summary is formed (abstract, paragraph 2).

pith-pipeline@v0.9.0 · 5699 in / 1257 out tokens · 28705 ms · 2026-05-24T21:01:10.151245+00:00 · methodology
