pith. the verified trust layer for science. sign in

arxiv: 1901.06376 · v1 · pith:2J475E6Snew · submitted 2019-01-18 · 💻 cs.IT · math.IT

An information theoretic model for summarization, and some basic results

classification 💻 cs.IT math.IT
keywords lossdistributioninformationmodelresultssemanticsummarizationbasic
0
0 comments X p. Extension
Add this Pith Number to your LaTeX paper What is a Pith Number?
\usepackage{pith}
\pithnumber{2J475E6S}

Prints a linked pith:2J475E6S badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more

read the original abstract

A basic information theoretic model for summarization is formulated. Here summarization is considered as the process of taking a report of $v$ binary objects, and producing from it a $j$ element subset that captures most of the important features of the original report, with importance being defined via an arbitrary set function endemic to the model. The loss of information is then measured by a weight average of variational distances, which we term the semantic loss. Our results include both cases where the probability distribution generating the $v$-length reports are known and unknown. In the case where it is known, our results demonstrate how to construct summarizers which minimize the semantic loss. For the case where the probability distribution is unknown, we show how to construct summarizers whose semantic loss when averaged uniformly over all possible distribution converges to the minimum.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.