archive
Every paper Pith has read. Search by title, abstract, or pith.
446 papers in cs.DB · page 9
-
eST-Miner adaptations guarantee minimum fitness in Petri nets
Discovering Process Models With Long-Term Dependencies While Providing Guarantees and Filtering Infrequent Behavior Patterns
-
Live temporal queries run on any RDF triplestore
Time travel for knowledge graphs: live queries over RDF change histories
-
SAT encodings compute answers under optimal repairs for any priority relation
Querying Inconsistent Prioritized Data with ORBITS: Algorithms, Implementation, and Experiments
-
Polynomial bicomodules model database aggregation alongside querying
Functorial aggregation
-
Lookup algorithm validates UTF-8 in under one instruction per byte
Validating UTF-8 In Less Than One Instruction Per Byte
-
OLAP engines waste 25-82% of CPU cycles on stalls
Micro-architectural Analysis of OLAP: Limitations and Opportunities
-
qwLSH speeds up LSH query workloads with cache cost models
qwLSH: Cache-conscious Indexing for Processing Similarity Search Query Workloads in High-Dimensional Spaces
-
Visual queries retrieve scatterplots by region or similarity
SCATTERSEARCH: Visual Querying of Scatterplot Visualizations
-
Range mode enumeration runs in output-linear time
Enumerating Range Modes
-
SameAs links often break identity in the Web of Data
The sameAs Problem: A Survey on Identity Management in the Web of Data
-
Algorithm plus editor builds ShEx and SHACL schemas from RDF samples
Semi Automatic Construction of ShEx and SHACL Schemas
-
Hybrid SSP model evaluates recursive Datalog queries correctly
A Case for Stale Synchronous Distributed Model for Declarative Recursive Computation
-
Query structure decides if minimizing deletions to drop k outputs is easy or hard
Generalized Deletion Propagation on Counting Conjunctive Query Answers
-
Cluster sampling slashes KG accuracy evaluation cost by 60-80%
Efficient Knowledge Graph Accuracy Evaluation
-
Apriori prunes rules via minimum support thresholds
Association rule mining and itemset-correlation based variants
-
Aggregators from social choice preserve database constraints
Social Choice Methods for Database Aggregation
-
Crunchbase data released as 347 million RDF triples
Linked Crunchbase: A Linked Data API and RDF Data Set About Innovative Companies
-
Survey: data quality tools rarely implement general metrics
A Survey of Data Quality Measurement and Monitoring Tools
-
Adaptive noise cuts error for private trajectory range queries
A Differentially Private Algorithm for Range Queries on Trajectories
-
TigerGraph outperforms Neo4j by 100x on social network queries
In-Depth Benchmarking of Graph Database Systems with the Linked Data Benchmark Council (LDBC) Social Network Benchmark (SNB)
-
LID selects query sets across difficulty levels in NN search
The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search
-
Templates cover 90% of ontology competency questions
CLaRO: a Data-driven CNL for Specifying Competency Questions
-
Blockchain log speeds genomic audit queries 10x with 25% less storage
Efficient logging and querying for Blockchain-based cross-site genomic dataset access audit
-
Random walk infers beliefs to score BI query interestingness
A Subjective Interestingness measure for Business Intelligence explorations
-
Query sequence models catch database ransomware
Hands Off my Database: Ransomware Detection in Databases through Dynamic Analysis of Query Sequences
-
DOD-ETL runs ETL workloads up to 10 times faster
DOD-ETL: Distributed On-Demand ETL for Near Real-Time Business Intelligence
-
Chow-Liu trees cut selectivity error by 10x on TPC-DS
An Approach Based on Bayesian Networks for Query Selectivity Estimation
-
Lightweight determinism enables exactly-once streaming with low overhead
Delivery, consistency, and determinism: rethinking guarantees in distributed stream processing
-
SQL features separate ad-hoc queries into coherent explorations
Detecting coherent explorations in SQL workloads
-
Panel fusion scales via partitioned min-cost flow
Scalable Panel Fusion Using Distributed Min Cost Flow
-
One model generates every key-value store design
Learning Key-Value Store Design
-
LDP mechanisms yield conditional estimates on key-value data
Conditional Analysis for Key-Value Data with Local Differential Privacy
-
Trajectory data trains model to rank navigation paths
PathRank: A Multi-Task Learning Framework to Rank Paths in Spatial Networks
-
Redefined model unifies property graph exchange across databases
Property Graph Exchange Format
-
Software automates traceable metabolomics data mining process
Computer-Aided Data Mining: Automating a Novel Knowledge Discovery and Data Mining Process Model for Metabolomics
-
Spark spatial queries run 10x faster with new scheduler
LocationSpark: In-memory Distributed Spatial Query Processing and Optimization
-
Graphyti hits 80% of in-memory speed in semi-external memory
Graphyti: A Semi-External Memory Graph Library for FlashGraph
-
Event patterns embed type-safely for arbitrary numbers of variables
Type-safe, Polyvariadic Event Correlation
-
Sparse graph turns hash collisions into constant-time probes
HashGraph -- Scalable Hash Tables Using A Sparse Graph Data Structure
-
RDF conversion links PV and weather records across sources
Interlinking Heterogeneous Data for Smart Energy Systems
-
280 RDF datasets yield measures to characterize Semantic Web graphs
A Software Framework and Datasets for the Analysis of Graph Measures on RDF Graphs
-
Query rewriting beats critical instances for RDF rule checks
Rule Applicability on RDF Triplestore Schemas
-
Framework scores dataset snippets on query match and coverage
A Framework for Evaluating Snippet Generation for Dataset Search
-
Quantile sketches halve error bound at fixed size
Streaming Quantiles Algorithms with Small Space and Update Time
-
Alexa trivia game fills knowledge base gaps via competition
DataPop: Knowledge Base Population using Distributed Voice Enabled Devices
-
Graphical model surfaces more novel facts from tables
Extracting Novel Facts from Tables for Knowledge Graph Completion (Extended version)
-
Declarative queries over event history resolve visualization concurrency
Programming with Timespans in Interactive Visualizations
-
LDP mechanisms cut worst-case noise for mixed data types
Collecting and Analyzing Multidimensional Data with Local Differential Privacy
-
Low-complexity algorithms mine patterns from multi-source student data
Multi-source Relations for Contextual Data Mining in Learning Analytics
-
DBpedia gains 7.7M facts from 703k Wikipedia category axioms
Uncovering the Semantics of Wikipedia Categories