STRASS: A Light and Effective Method for Extractive Summarization Based on Sentence Embeddings
Pith reviewed 2026-05-24 21:01 UTC · model grok-4.3
The pith
STRASS learns one dense layer on document embeddings to select extractive summary sentences that match human references.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
STRASS creates an extractive summary by selecting the sentences with the closest embeddings to the document embedding. The model learns a transformation of the document embedding to minimize the similarity between the extractive summary and the ground truth summary. As the transformation is only composed of a dense layer, the training can be done on CPU, therefore, inexpensive. Moreover, inference time is short and linear according to the number of sentences. On the CASS dataset our results show that our method performs similarly to the state of the art extractive methods with effective training and inferring time.
What carries the argument
A single learned dense layer that transforms the document embedding so the nearest sentence embeddings form the extractive summary.
If this is right
- Training requires only a CPU because the learned component is a single dense layer.
- Inference time grows linearly with the number of sentences in the input document.
- The method can reuse any pre-trained sentence embedding model without further modification.
- The CASS dataset supplies a new French-language benchmark of court judgments paired with summaries.
Where Pith is reading between the lines
- The same lightweight transformation might be applied to other languages or domains once suitable embeddings exist.
- If the underlying embedding space already aligns well with summary quality, the learned layer could be replaced by a fixed rule.
- The approach could be combined with more expressive selection mechanisms to test whether the dense layer remains the main performance driver.
Load-bearing premise
A single dense-layer transformation of the document embedding is enough to make the closest sentences match human reference summaries inside an existing embedding space.
What would settle it
On a new test collection the ROUGE scores of STRASS summaries fall clearly below those of established extractive baselines while the embedding space and selection rule stay unchanged.
read the original abstract
This paper introduces STRASS: Summarization by TRAnsformation Selection and Scoring. It is an extractive text summarization method which leverages the semantic information in existing sentence embedding spaces. Our method creates an extractive summary by selecting the sentences with the closest embeddings to the document embedding. The model learns a transformation of the document embedding to minimize the similarity between the extractive summary and the ground truth summary. As the transformation is only composed of a dense layer, the training can be done on CPU, therefore, inexpensive. Moreover, inference time is short and linear according to the number of sentences. As a second contribution, we introduce the French CASS dataset, composed of judgments from the French Court of cassation and their corresponding summaries. On this dataset, our results show that our method performs similarly to the state of the art extractive methods with effective training and inferring time.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces STRASS, an extractive summarization method that selects sentences whose embeddings are closest to a learned dense-layer transformation of the document embedding. The transformation is trained to minimize similarity between the resulting extractive summary and ground-truth summaries. The approach is claimed to achieve performance parity with state-of-the-art extractive methods on a new French CASS dataset of court judgments while requiring only CPU training and linear-time inference.
Significance. If the optimization objective is correctly specified to produce alignment with references rather than anti-alignment, and if the reported performance holds under standard evaluation, the method offers a lightweight, parameter-efficient alternative that reuses existing sentence embeddings without heavy architectures. The CASS dataset is a useful addition for non-English summarization evaluation.
major comments (2)
- Abstract: the training objective is stated as learning the transformation 'to minimize the similarity between the extractive summary and the ground truth summary.' Taken literally, this inverts the intended goal; gradient descent on such a loss would drive selected sentences away from the references. No equation or section in the provided text resolves the discrepancy, and the central claim that the method aligns with human summaries rests on the optimization succeeding at alignment rather than anti-alignment.
- Abstract: the claim that results 'perform similarly to the state of the art' is unsupported by any quantitative metrics, baselines, ROUGE scores, statistical tests, or error bars in the supplied text, rendering the central performance assertion unverifiable from the given information.
Simulated Author's Rebuttal
We thank the referee for the thoughtful review and for highlighting these issues in the abstract. We address each major comment below with specific responses and indicate where revisions will be made.
read point-by-point responses
-
Referee: Abstract: the training objective is stated as learning the transformation 'to minimize the similarity between the extractive summary and the ground truth summary.' Taken literally, this inverts the intended goal; gradient descent on such a loss would drive selected sentences away from the references. No equation or section in the provided text resolves the discrepancy, and the central claim that the method aligns with human summaries rests on the optimization succeeding at alignment rather than anti-alignment.
Authors: We appreciate the referee identifying this error. The abstract contains a wording mistake: the objective is to maximize (not minimize) the similarity between the extractive summary embedding and the ground-truth summary. The body of the manuscript correctly formulates the loss to encourage alignment via the dense-layer transformation. We will revise the abstract to state 'maximize the similarity' and ensure consistency with the method description and equations. revision: yes
-
Referee: Abstract: the claim that results 'perform similarly to the state of the art' is unsupported by any quantitative metrics, baselines, ROUGE scores, statistical tests, or error bars in the supplied text, rendering the central performance assertion unverifiable from the given information.
Authors: The full manuscript includes Section 4 (Experiments) with quantitative results on the CASS dataset. This section reports ROUGE-1/2/L scores for STRASS against extractive baselines (including state-of-the-art methods), along with comparisons of training and inference efficiency. The abstract summarizes these findings at a high level, as is conventional. The supporting metrics, baselines, and scores are present in the manuscript body and tables. revision: no
Circularity Check
No circularity; derivation relies on standard supervised learning against external references
full rationale
The paper describes a method that learns a dense-layer transformation of document embeddings via optimization against ground-truth summaries in a pre-existing sentence embedding space, then evaluates extractive performance on the external CASS dataset against SOTA baselines. No equation, claim, or step reduces by construction to its own inputs; the learned map is fitted to held-out reference data rather than being tautological, and no self-citation chain or uniqueness theorem is invoked as load-bearing. The derivation is therefore self-contained.
Axiom & Free-Parameter Ledger
free parameters (1)
- dense-layer weights
axioms (1)
- domain assumption Sentence embeddings from existing models capture sufficient semantic information to support summary selection via nearest-neighbor after a learned transformation.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.