archive

Every paper Pith has read.

446 papers in cs.DB · page 3

cs.AI 2026-05-04 reviewed

Event languages mapped to one Temporal Datalog engine for streams
Efficient Temporal Datalog Materialisation for Composite Event Recognition

Periklis Mantenoglou
cs.DB 2026-05-04 reviewed

eBPF scheduler doubles throughput for time-sensitive DB tasks
Unfair by design: eBPF-based scheduling of mixed database workloads

Carl-Elliott Bilodeau-Savaria +3
cs.DB 2026-05-04 reviewed

2-bit vectors build ANN graphs for 16x faster search
QuIVer: Rethinking ANN Graph Topology via Training-Free Binary Quantization

Wenxuan Xiao +2
cs.DB 2026-05-04 reviewed

2-bit quantization builds ANN graphs without training
QuIVer: Rethinking ANN Graph Topology via Training-Free Binary Quantization

Wenxuan Xiao +2
cs.DB 2026-05-04 reviewed

Binary quantization builds ANN graphs for 88% recall
QuIVer: Rethinking ANN Graph Topology via Training-Free Binary Quantization

Wenxuan Xiao +2
cs.DB 2026-05-03 reviewed

Dual HNSW graphs enable fast search for any Lp metric
U-HNSW: An Efficient Graph-based Solution to ANNS Under Universal Lp Metrics

Huayi Wang +2
cs.CR 2026-05-03 reviewed

Predictions let private query streams reach near-offline utility
LAPRAS: Learning-Augmented PRivate Answering for linear query Streams

Pranay Mundra +3
cs.DC 2026-05-03 reviewed

Decentralized geohash sampling cuts geospatial stream latency
Decentralized Stratified Sampling for Low-Latency Approximate Geospatial Data Stream Processing in Edge-Cloud Architectures

Isam Mashhour Al Jawarneh +3
cs.CR 2026-05-03 reviewed

Prompt-conditioned masking traces RAG poison to exact characters
Needle-in-RAG: Prompt-Conditioned Character-Level Traceback of Poisoned Spans in Retrieved Evidence

Huining Cui +1
cs.DB 2026-05-02 reviewed

This paper proposes Action Units as structured extensions to knowledge representations…
Actionable Understanding: Action Units for Bridging the Knowledge-Action Gap in Post-FAIR Knowledge Infrastructures

Lars Vogt
cs.DB 2026-05-02 reviewed

Lattice merges co-accessed vectors to cut authorized search cost
Don't Stir the Pot! Authorized Vector Data Retrieval via Access-Aware Indexing

Shanshan Han +2
cs.DB 2026-05-02 reviewed

Lattice method balances duplication and search for authorized vectors
Don't Stir the Pot! Authorized Vector Data Retrieval via Access-Aware Indexing

Shanshan Han +2
cs.DB 2026-05-02 reviewed

Five patterns decouple writes from reads in search engines
Write-Read Decoupling in Modern Large-Scale Search Engines: Architectures, Techniques, and Emerging Approaches

Xin Liang +6
cs.DB 2026-05-01 reviewed

Team projects built into database courses raise grades and teamwork scores
Complete Integration of Team Project-based Learning into a Database Syllabus

S. Iserte +5
cs.DB 2026-05-01 reviewed

One abstraction unifies database evolution
Living Databases: A Unified Model for Continuous Schema Evolution, Versioning, and Transformations

Amol Deshpande
cs.DB 2026-05-01 reviewed

Execution-verified renamings recover Text-to-SQL accuracy on noisy schemas
EGREFINE: An Execution-Grounded Optimization Framework for Text-to-SQL Schema Refinement

Jiaqian Wang +4
cs.CR 2026-05-01 reviewed

Framework makes shuffle-DP protocols resist poisoning
Defense against Poisoning Attacks under Shuffle-DP

Siyi Wang +8
cs.DB 2026-05-01 reviewed

SPARQL multiset patterns match Datalog and relational algebra
Multiset semantics in SPARQL, Relational Algebra and Datalog

Renzo Angles +2
cs.DB 2026-04-30 reviewed

Two-phase sampling cuts online aggregation cost up to 3x
Index-Assisted Stratified Sampling for Online Aggregation

Yunnan Yu +1
cs.DB 2026-04-30 reviewed

Tailwind speeds TPC-H queries 1.38x on average
Tailwind: A Practical Framework for Query Accelerators

Geoffrey X. Yu +2
cs.CL 2026-04-30 reviewed

Templates from past queries boost Text-to-SQL accuracy 36%
Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding

Smit Jivani +2
cs.CV 2026-04-30 reviewed

GUI agents hit exact states only 23 percent of the time
FineState-Bench: Benchmarking State-Conditioned Grounding for Fine-grained GUI State Setting

Fengxian Ji +7
cs.AI 2026-04-30 reviewed

ObjectGraph cuts agent document tokens by 95% without accuracy loss
ObjectGraph: From Document Injection to Knowledge Traversal -- A Native File Format for the Agentic Era

Mohit Dubey +1
cs.DB 2026-04-29 reviewed

Synthetic databases reveal 3-14 percent drops in text-to-SQL accuracy
SynSQL: Synthesizing Relational Databases for Robust Evaluation of Text-to-SQL Systems

Mohammadamin Habibollah +1
cs.DB 2026-04-29 reviewed

One model unifies table discovery from text and table queries
Unified Data Discovery across Query Modalities and User Intents

Tingting Wang +6
cs.DB 2026-04-29 reviewed

Graphify turns GraphQL into single optimized Gremlin queries in linear time
Graphify: Automated Synthesis of Type-Safe Graph Backends via $O(S)$ GraphQL-to-Gremlin Transpilation

Johannes Graf
cs.SD 2026-04-29 reviewed

Non-speech audio reveals spurious correlations in speech data
A Toolkit for Detecting Spurious Correlations in Speech Datasets

Lara Gauder +5
cs.DB 2026-04-29 reviewed

LLM search aligns pivot table schemas at 88% accuracy
PiLLar: Matching for Pivot Table Schema via LLM-guided Monte-Carlo Tree Search

Yunjun Gao +3
cs.DB 2026-04-29 reviewed

LLM assistant cuts big data support tickets by 20.8%
SiriusHelper: An LLM Agent-Based Operations Assistant for Big Data Platforms

Yu Shen +16
cs.DB 2026-04-28 reviewed

Evergreen converts verification of claims in LLM-generated semantic aggregates into…
Evergreen: Efficient Claim Verification for Semantic Aggregates

Alexander W. Lee +5
cs.DB 2026-04-28 reviewed

CacheRAG turns stateless KGQA planning into cached learning
CacheRAG: A Semantic Caching System for Retrieval-Augmented Generation in Knowledge Graph Question Answering

Yushi Sun +1
cs.IR 2026-04-28 reviewed

Semantic search runs on 166 million clinical notes at $4k per month
Health System Scale Semantic Search Across Unstructured Clinical Notes

Faith Wavinya Mutinda +16
cs.DS 2026-04-28 reviewed

Streaming sampler approximates graphlets in constant passes
An Efficient Streaming Algorithm for Approximating Graphlet Distributions

Marco Bressan +3
cs.DB 2026-04-28 reviewed

Negative patterns raise viral classification accuracy
Mining Negative Sequential Patterns to Improve Viral Genomic Feature Representation and Classification

Wenxi Zhu +2
cs.DB 2026-04-28 reviewed

VisualNeo connects visual queries to Neo4j for graph searches
VisualNeo: Bridging the Gap between Visual Query Interfaces and Graph Query Engines

Kai Huang +7
cs.DB 2026-04-28 reviewed

Algorithms hide all sensitive cross-level utility patterns without fakes
Cross-level Privacy Preserving Utility Mining

Jiahong Cai +2
cs.LG 2026-04-28 reviewed

RL learns to clean tabular data for foundation model priors
Prior-Aligned Data Cleaning for Tabular Foundation Models

Laure Berti-Equille
cs.DC 2026-04-27 reviewed

Fixed-input lock keeps Spark policy outputs identical under repartitioning
Spark Policy Toolkit: Semantic Contracts and Scalable Execution for Policy Learning in Spark

Zeyu Bai
cs.CR 2026-04-27 reviewed

Dynamic attacks slow ALEX lookups up to 2.8x
Poisoning Learned Index Structures: Static and Dynamic Adversarial Attacks on ALEX

Allen Jue
cs.DB 2026-04-27 reviewed

Autoencoder rewrites speed hybrid vector queries 2x on average
BoomHQ: Learning to Boost Multiple Hybrid Queries on Vector DBMSs

Ermu Qiu +6
cs.DB 2026-04-27 reviewed

Sliding window finds dense patterns exactly without gap parameters
Exact Mining of Dense Patterns via Direct Evaluation of Local Interval Frequency Using a Sliding Window

Taihei Takahashi +3
cs.IR 2026-04-27 reviewed

Late materialization slashes storage for long user sequences in DLRMs
Versioned Late Materialization for Ultra-Long Sequence Training in Recommendation Systems at Scale

Liang Guo +10
cs.DB 2026-04-27 reviewed

IM chat turns natural language into complete data reports
DataClaw: An Autonomous Data Agent with Instant Messaging Integration

Huahang Li +5
cs.CL 2026-04-27 reviewed

RL distills agentic reasoning into private product mapping models
EPM-RL: Reinforcement Learning for On-Premise Product Mapping in E-Commerce

Minhyeong Yu +1
cs.DB 2026-04-26 reviewed

SEMA-SQL mixes SQL with LLM semantics for natural language database questions
SEMA-SQL: Beyond Traditional Relational Querying with Large Language Models

Yin Lin +6
cs.DB 2026-04-26 reviewed

SEMA-SQL mixes SQL with LLM reasoning for semantic database queries
SEMA-SQL: Beyond Traditional Relational Querying with Large Language Models

Yin Lin +6
cs.DS 2026-04-24 reviewed

Branchwidth approximates submodular width to within 3/2
Cuts and Gauges for Submodular Width

Matthias Lanzinger
cs.DB 2026-04-24 reviewed

Dataset released for 10,000 early AI agents on Ethereum
A dataset of early blockchain-registered AI agents on Ethereum

Yulin Liu
cs.DB 2026-04-24 reviewed

Atomic RDF Datasets can serve as standardized messages for streaming
It's Time to Standardize RDF Messages

Pieter Colpaert +1
cs.LO 2026-04-24 reviewed

Formal library verifies chase as universal model
The Chase in Lean -- Crafting a Formal Library for Existential Rule Research

Lukas Gerlach