arxiv: 1906.08574 · v1 · pith:AUFXFXCNnew · submitted 2019-06-20 · 💻 cs.DB

Extracting Basic Graph Patterns from Triple Pattern Fragment Logs

Pith reviewed 2026-05-25 18:54 UTC · model grok-4.3

classification 💻 cs.DB
keywords Triple Pattern Fragments, SPARQL query logs, Basic Graph Patterns, Linked Data, query reconstruction, server logs

The pith

LIFT reconstructs Basic Graph Patterns from logs of single-triple requests to TPF servers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an algorithm called LIFT that groups individual triple pattern evaluations recorded in TPF server logs back into the original multi-triple Basic Graph Patterns. TPF servers only ever see and answer one triple at a time, so providers currently have no direct view of the structure of the queries users submit. LIFT relies on the sequence and timing of those single-triple requests to perform the grouping. Experiments reported in the paper indicate that the extracted patterns match the originals with good precision and recall while producing limited noise. If the method holds, TPF hosts would gain the ability to inspect query shapes without altering the TPF protocol itself.
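To make the visibility problem concrete, the following is a minimal sketch of how a client might break a two-pattern BGP into single-triple requests, so that the server log contains only individual triple patterns. The dataset, the names `serve` and `evaluate_bgp`, and the fixed left-to-right nested-loop strategy are assumptions for illustration; real TPF clients also handle paging and choose the pattern order themselves.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Toy dataset standing in for the data behind a TPF server (made up).
DATA: List[Triple] = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "livesIn", "Nantes"),
    ("carol", "livesIn", "Paris"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def substitute(pattern: Triple, binding: Dict[str, str]) -> Triple:
    """Replace already-bound variables in a pattern by their values."""
    s, p, o = (binding.get(term, term) for term in pattern)
    return (s, p, o)


def serve(pattern: Triple, log: List[Triple]) -> List[Dict[str, str]]:
    """Answer ONE triple pattern and record it: all the server ever sees."""
    log.append(pattern)
    results = []
    for triple in DATA:
        binding: Dict[str, str] = {}
        matched = True
        for term, value in zip(pattern, triple):
            if is_var(term):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            results.append(binding)
    return results


def evaluate_bgp(bgp: List[Triple], log: List[Triple],
                 binding: Optional[Dict[str, str]] = None) -> List[Dict[str, str]]:
    """Client-side nested-loop evaluation, one single-pattern request at a time."""
    binding = binding or {}
    if not bgp:
        return [binding]
    first, rest = bgp[0], bgp[1:]
    solutions = []
    for found in serve(substitute(first, binding), log):
        solutions.extend(evaluate_bgp(rest, log, {**binding, **found}))
    return solutions


log: List[Triple] = []
solutions = evaluate_bgp(
    [("alice", "knows", "?x"), ("?x", "livesIn", "?city")], log
)
for entry in log:
    print(entry)
```

In this toy run the log holds three single-pattern requests, and the variable `?x` of the second pattern has already been replaced by a constant in each of them. The join structure of the original BGP is therefore not stated anywhere in the log.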

Core claim

LIFT extracts BGPs of executed queries from TPF server logs with good precision and good recall while generating limited noise.

What carries the argument

The LIFT algorithm, which processes the ordered sequence and timing of single-triple log entries to group them into the original multi-triple Basic Graph Patterns.
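As a rough illustration of grouping by order and timing, here is a deliberately naive sketch. It is not the LIFT algorithm, whose steps are not described on this page. It simply assigns consecutive requests from the same client to one candidate group when they arrive within a time gap. The `LogEntry` fields, the function name `group_by_gap`, and the `gap` parameter are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class LogEntry:
    client: str      # hypothetical client identifier, e.g. an IP address
    time: float      # request time in seconds
    pattern: Triple  # the single triple pattern that was requested


def group_by_gap(entries: List[LogEntry], gap: float = 1.0) -> List[List[Triple]]:
    """Group patterns per client while consecutive requests are at most `gap` apart."""
    groups: List[List[Triple]] = []
    last_seen: Dict[str, Tuple[float, int]] = {}  # client -> (last time, group index)
    for entry in sorted(entries, key=lambda e: e.time):
        previous = last_seen.get(entry.client)
        if previous is not None and entry.time - previous[0] <= gap:
            index = previous[1]          # continue the client's current group
        else:
            groups.append([])            # start a new candidate group
            index = len(groups) - 1
        if entry.pattern not in groups[index]:
            groups[index].append(entry.pattern)
        last_seen[entry.client] = (entry.time, index)
    return groups


entries = [
    LogEntry("a", 0.0, ("?s", "knows", "?o")),
    LogEntry("b", 0.1, ("?s", "type", "Person")),
    LogEntry("a", 0.2, ("?o", "livesIn", "?c")),
    LogEntry("a", 5.0, ("?s", "name", "?n")),
]
for group in group_by_gap(entries, gap=1.0):
    print(group)
```

A purely time-based grouping like this would merge unrelated queries that a client sends back to back and split slow ones. That is the kind of noise the precision and recall figures are meant to quantify.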

If this is right

  • TPF data providers obtain visibility into the structure of the queries their servers actually execute.
  • Query-log analysis becomes feasible for TPF without requiring clients to send full SPARQL queries.
  • Noise introduced by the reconstruction process remains low enough that downstream analyses of query shapes remain reliable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same log-based grouping technique could be tested on other fragment interfaces that break queries into smaller requests.
  • If reconstruction accuracy varies with query shape, future work could classify which BGP structures are easiest or hardest to recover.
  • Reconstructed BGPs could feed into workload-aware caching or index decisions at the TPF server without exposing raw client queries.

Load-bearing premise

The ordering and timing information present in TPF server logs is sufficient for an algorithm to correctly group and reconstruct the original multi-triple Basic Graph Patterns.

What would settle it

Run a controlled test in which known multi-triple BGPs are submitted to a TPF server, collect the resulting log, apply LIFT, and measure how often the output BGPs exactly match the submitted ones.
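The scoring step of such a test could look like the sketch below. It assumes exact-match scoring, where an extracted BGP counts as correct only if its set of triple patterns equals a submitted one. The paper may define precision and recall differently, and the function name `exact_match_scores` is hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def exact_match_scores(submitted: Iterable[Iterable[Triple]],
                       extracted: Iterable[Iterable[Triple]]) -> Tuple[float, float]:
    """Return (precision, recall) where a BGP is a set of triple patterns."""
    truth: Set[FrozenSet[Triple]] = {frozenset(bgp) for bgp in submitted}
    found: Set[FrozenSet[Triple]] = {frozenset(bgp) for bgp in extracted}
    hits = len(truth & found)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


bgp_a = [("?x", "knows", "?y"), ("?y", "livesIn", "?c")]
bgp_b = [("?x", "type", "Person")]
noise = [("?x", "knows", "?y")]  # an incomplete reconstruction of bgp_a

precision, recall = exact_match_scores([bgp_a, bgp_b], [bgp_a, noise])
print(precision, recall)
```

Using sets makes the comparison independent of the order of patterns within a BGP, but it treats a partially recovered BGP as a complete miss. A per-triple-pattern score would be more forgiving.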

Figures

Figures reproduced from arXiv: 1906.08574 by Desmontils Emmanuel, Molli Pascal, Nassopoulos Georges, Serrano-Alvarado Patricia.

Figure 1: Concurrent execution of queries Q1 and Q2. TPF clients decompose SPARQL queries into a sequence of triple pattern queries
Figure 2: Examples of simplified TPF logs
Figure 3: TPF log and CTP List produced by Algorithm 2 with
Figure 4: CTP List and DTP Graph produced by Algorithm 3 with
Figure 5: Connected components of the DTP Graph produced by the execution of
Figure 6: Precision and recall by query of the TPF web site.
Figure 7: LIFT deductions for Q7 and Q8. Prefix dbpo corresponds to dbpedia-owl.
  4.2 Does LIFT results resist to concurrency? We implemented a tool to shuffle several TPF logs according to different parameters. Thus, given E(Q1), ..., E(Qn), we are able to produce different significant representations of E(Q1 ∥ ... ∥ Qn). We grouped queries targeting the same dataset into a set of randomly chosen queries as shown i…
Figure 8: Precision and recall.
  4.3 Evaluation of LIFT with real SPARQL queries. The goal of this experiment is to evaluate to which extent LIFT is able to deduce BGPs of an important number of real user queries. We analyzed 10 hours of one day (2015-10-30) of the log of the DBpedia SPARQL endpoint of USEWOD 2016 dataset [5]. From 380,834 http requests containing SPARQL queries, we analyzed 14,259 queries that repres…
Figure 9: Recurrent BGPs extracted from the TPF log of USEWOD 2016.
Original abstract

The Triple Pattern Fragment (TPF) approach is de-facto a new way to publish Linked Data at low cost and with high server availability. However, data providers hosting TPF servers are not able to analyze the SPARQL queries they execute because they only receive and evaluate queries with one triple pattern. In this paper, we propose LIFT: an algorithm to extract Basic Graph Patterns (BGPs) of executed queries from TPF server logs. Experiments show that LIFT extracts BGPs with good precision and good recall generating limited noise.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper proposes LIFT, an algorithm to extract Basic Graph Patterns (BGPs) from Triple Pattern Fragment (TPF) server logs. TPF servers receive and evaluate only single-triple-pattern queries, so full SPARQL queries cannot be analyzed directly. LIFT reconstructs BGPs by grouping single-triple requests using ordering and timing information present in the logs. The abstract states that experiments demonstrate good precision and recall while generating limited noise.

Significance. If the reconstruction claims hold, the work would allow TPF data providers to recover query-structure information from existing logs without protocol changes. This addresses a practical gap in Linked Data publishing by enabling workload analysis, caching improvements, and server optimization that are currently unavailable.

major comments (1)
  1. [Abstract] Abstract: the central claim that LIFT extracts BGPs 'with good precision and good recall generating limited noise' is unsupported because the abstract supplies no quantitative metrics, no dataset descriptions, no baselines, and no experimental protocol. This absence makes the soundness of the reconstruction approach impossible to assess from the provided text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for identifying this issue with the abstract. We agree that the current wording leaves the central experimental claims without sufficient supporting detail for readers to evaluate them directly from the abstract alone.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that LIFT extracts BGPs 'with good precision and good recall generating limited noise' is unsupported because the abstract supplies no quantitative metrics, no dataset descriptions, no baselines, and no experimental protocol. This absence makes the soundness of the reconstruction approach impossible to assess from the provided text.

    Authors: We accept the referee's point. While the full paper contains the requested experimental details (datasets, metrics, baselines, and protocol), the abstract does not. In the revised manuscript we will expand the abstract to include concrete precision and recall figures, the number and nature of the evaluation datasets, and a brief statement of the experimental protocol so that the central claim is directly supported by numbers rather than qualitative phrasing.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper presents LIFT as an algorithm that groups single-triple requests from TPF logs into BGPs using ordering and timing data. No equations, fitted parameters, predictions, or self-citations appear in the abstract or description that would reduce any claim to its own inputs by construction. The central claim rests on experimental validation of precision/recall rather than any self-referential derivation or uniqueness theorem. This is a standard algorithmic contribution without detectable circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no equations, parameters, or background assumptions that can be audited.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

  1. [1]

    W. Beek, L. Rietveld, H. R. Bazoobandi, J. Wielemaker, and S. Schlobach. LOD Laundromat: A Uniform Way of Publishing Other People’s Dirty Data. In ISWC Conference, 2014

  2. [2]

    M. A. Gallego, J. D. Fernández, M. A. Martínez-Prieto, and P. de la Fuente. An Empirical Study of Real-World SPARQL Queries. In USEWOD Workshop, 2011

  3. [3]

    J. Han, M. Kamber, and J. Pei. Data Mining: Concepts and Techniques. Elsevier, 2011

  4. [4]

    J. Lorey and F. Naumann. Detecting SPARQL Query Templates for Data Prefetching. In ESWC Conference, 2013

  5. [5]

    L.-R. Markus, A. Saud, B. Bettina, and H. Laura. USEWOD Research Dataset.

  6. [6]

    http://dx.doi.org/10.5258/SOTON/385344

  7. [7]

    K. Möller, M. Hausenblas, R. Cyganiak, G. Grimnes, and S. Handschuh. Learning from Linked Open Data Usage: Patterns & Metrics. In WebSci10: Extending the Frontiers of Society On-Line, 2010

  8. [8]

    C. H. Mooney and J. F. Roddick. Sequential Pattern Mining–Approaches and Algorithms. ACM Computing Surveys (CSUR), 45(2):19, 2013

  9. [9]

    M. Morsey, J. Lehmann, S. Auer, and A.-C. N. Ngomo. DBpedia SPARQL Benchmark–Performance Assessment with Real Queries on Real Data. In ISWC Conference, 2011

  10. [10]

    G. Nassopoulos, P. Serrano-Alvarado, P. Molli, and E. Desmontils. FETA: Federated QuEry TrAcking for Linked Data. In DEXA Conference, 2016

  11. [11]

    F. Picalausa and S. Vansummeren. What are Real SPARQL Queries Like? In SWIM Workshop, 2011

  12. [12]

    A. Raghuveer. Characterizing Machine Agent Behavior through SPARQL Query Mining. In USEWOD Workshop, 2012

  13. [13]

    L. Rietveld, R. Hoekstra, et al. Man vs. Machine: Differences in SPARQL queries. In USEWOD Workshop, 2014

  14. [14]

    M. V. Sande, R. Verborgh, J. V. Herwegen, E. Mannens, and R. V. de Walle. Opportunistic Linked Data Querying Through Approximate Membership Metadata. In ISWC Conference, 2015

  15. [15]

    R. Verborgh, E. Mannens, and R. Van de Walle. Initial Usage Analysis of DBpedia’s Triple Pattern Fragments. In USEWOD Workshop, 2015

  16. [16]

    R. Verborgh, M. Vander Sande, O. Hartig, J. Van Herwegen, L. De Vocht, B. De Meester, G. Haesendonck, and P. Colpaert. Triple Pattern Fragments: a Low-cost Knowledge Graph Interface for the Web. Journal of Web Semantics, 37–38, Mar. 2016