arxiv: 1409.2590 · v1 · pith:5ZML44CAnew · submitted 2014-09-09 · 💻 cs.IR

Automatic Detection of Webpages that Share the Same Web Template

classification 💻 cs.IR
keywords template, webpages, extraction, detection, identifying, same, analysis, analyze

Template extraction is the process of isolating the template of a given webpage. It is widely used in several disciplines, including webpage development, content extraction, block detection, and webpage indexing. One of the main goals of template extraction is to identify a set of webpages that share the same template without having to load and analyze too many webpages beforehand. This work introduces a new technique to automatically discover a reduced set of webpages in a website that implement the template. This set is computed with a hyperlink analysis that yields a very small set with a high level of confidence.
