archive

Every paper Pith has read.

446 papers in cs.DB · page 4

cs.DB 2026-04-24 reviewed

Bounded self-joins make fact relevance as easy as query evaluation
How Hard is it to Decide if a Fact is Relevant to a Query?

Meghyn Bienvenu +2
cs.DB 2026-04-24 reviewed

Unified model pivots database migration across heterogeneous systems
A Model-Driven Approach to Database Migration with a Unified Data Model

María J. Ortín +2
cs.DB 2026-04-24 reviewed

Maximal-clique index speeds filtered nearest-neighbor search
MCI: A Maximal Clique Index for Efficient Arbitrary-Filtered Approximate Nearest Neighbor Search

Xiaowei Ye +5
cs.DB 2026-04-23 reviewed

ESPRESSO scales keyword search over Solid pods with privacy
Implementation and Privacy Guarantees for Scalable Keyword Search on SOLID-based Decentralized Data with Granular Visibility Constraints

Mohamed Ragab +7
cs.LG 2026-04-23 reviewed

Best tabular embedding model varies by task and level
Towards Universal Tabular Embeddings: A Benchmark Across Data Tasks

Liane Vogel +5
cs.LO 2026-04-23 reviewed

ASP(Q) first implements globally-optimal repairs for inconsistent data
Using ASP(Q) to Handle Inconsistent Prioritized Data

Meghyn Bienvenu +3
cs.DC 2026-04-23 reviewed

Delta Lake loads fastest, Iceberg saves most space
Research on the efficiency of data loading and storage in Data Lakehouse architectures for the formation of analytical data systems

Ivan Borodii +1
cs.DB 2026-04-23 reviewed

Query algebra and wrappers replace LLM agents for enterprise data
An Alternate Agentic AI Architecture (It's About the Data)

Fabian Wenz +4
cs.DB 2026-04-23 reviewed

SQLyzr adds diverse metrics and realism to text-to-SQL evaluation
A Demonstration of SQLyzr: A Platform for Fine-Grained Text-to-SQL Evaluation and Analysis

Sepideh Abedini +1
cs.DL 2026-04-22 reviewed

Only 150k scientific posters shared across 86 platforms
The State of Scientific Poster Sharing and Reuse

Aydan Gasimova +5
cs.AR 2026-04-22 reviewed

FPGA level-wise batch search speeds B+ tree lookups 4.9x
Efficient Batch Search Algorithm for B+ Tree Index Structures with Level-Wise Traversal on FPGAs

Max Tzschoppe +3
cs.LO 2026-04-22 reviewed

ShEx and SHACL match on large recursive fragments via duality
Common Foundations for Recursive Shape Languages

Shqiponja Ahmetaj +11
cs.IR 2026-04-22 reviewed

Self-aware embeddings double RAG accuracy on versioned queries
Self-Aware Vector Embeddings for Retrieval-Augmented Generation: A Neuroscience-Inspired Framework for Temporal, Confidence-Weighted, and Relational Knowledge

Naizhong Xu
cs.DB 2026-04-22 reviewed

New framework checks isolation levels without database internals
Making Transaction Isolation Checking Practical

Jian Zhang +2
cs.RO 2026-04-22 reviewed

Vision-based tactile dataset scales bimanual robot data
VTouch++: A Multimodal Dataset with Vision-Based Tactile Enhancement for Bimanual Manipulation

Qianxi Hua +6
cs.DB 2026-04-22 reviewed

Low-dim stats cut noise in private power-law exponent estimates
Estimating Power-Law Exponent with Edge Differential Privacy

Adam Tan +2
cs.DB 2026-04-22 reviewed

ML model predicts query slot-time before execution
Pre-Execution Query Slot-Time Prediction in Cloud Data Warehouses: A Feature-Scoped Machine Learning Approach

Prashant Kumar Pathak
cs.DB 2026-04-22 reviewed

LLM agent finds minimal data sets for analysis at 83% F1
An Agentic Approach to Metadata Reasoning

Jiani Zhang +4
cs.DB 2026-04-22 reviewed

Garfield cuts RFANNS index size 4.4x and raises throughput 120x
A GPU-Accelerated Framework for Multi-Attribute Range Filtered Approximate Nearest Neighbor Search

Zhonggen Li +4
cs.DB 2026-04-22 reviewed

First GPU Datalog engine uses WCOJ to avoid memory blowup
Scaling Worst-Case Optimal Datalog to GPUs

Yihao Sun +4
cs.DB 2026-04-21 reviewed

GPU pipeline speeds 3D polyhedral spatial joins by 9x
3DPipe: A Pipelined GPU Framework for Scalable Generalized Spatial Join over Polyhedral Objects

Lyuheng Yuan +3
cs.LG 2026-04-21 reviewed

RaBitQ outperforms TurboQuant on most quantization tasks
Revisiting RaBitQ and TurboQuant: A Symmetric Comparison of Methods, Theory, and Experiments

Jianyang Gao +7
cs.DB 2026-04-21 reviewed

Online schema alignment recovers full results in decentralized queries
Demonstrating Online Schema Alignment in Decentralized Knowledge Graphs Querying

Bryan-Elliott Tam +2
cs.DB 2026-04-21 reviewed

Monotonic embeddings prune more vertices in subgraph matching
LIVE: Learnable Monotonic Vertex Embedding for Efficient Exact Subgraph Matching (Technical Report)

Yutong Ye +7
cs.DB 2026-04-21 reviewed

Heuristic partitioning cuts multi-tenant query P95 latency from 61s to 2s
Heuristic Search Space Partitioning for Low-Latency Multi-Tenant Cloud Queries

Prashant Kumar Pathak +2
cs.AI 2026-04-21 reviewed

Tool-augmented LLMs beat static ones on warehouse graph reasoning
DW-Bench: Benchmarking LLMs on Data Warehouse Graph Topology Reasoning

Ahmed G.A.H Ahmed +1
cs.DB 2026-04-20 reviewed

Open data model v3 fixes wastewater surveillance data sharing
The Public Health and Environmental Surveillance Open Data Model (PHES-ODM) Version 3: An Open, Relational Data Model and Interoperability Framework for Wastewater Surveillance

Mathew Thomson +8
cs.AI 2026-04-20 reviewed

Modular adapters beat fine-tuning on hard SQL queries
LeGo-Code: Can Modular Curriculum Learning Advance Complex Code Generation? Insights from Text-to-SQL

Salmane Chafik +2
cs.SI 2026-04-20 reviewed

Topology grouping cuts token use 50-90% in LLM social simulations
Topology-Aware LLM-Driven Social Simulation: A Unified Framework for Efficient and Realistic Agent Dynamics

Yuwei Xu +5
cs.CL 2026-04-20 reviewed

Syntactic tests flag contamination in old NL2SQL benchmarks
SPENCE: A Syntactic Probe for Detecting Contamination in NL2SQL Benchmarks

Mohammadtaher Safarzadeh +4
cs.AI 2026-04-19 reviewed

Database probing and rule checks raise text-to-SQL accuracy 5%
PV-SQL: Synergizing Database Probing and Rule-based Verification for Text-to-SQL Agents

Yuan Tian +1
cs.DB 2026-04-19 reviewed

Branchable databases slow reads up to 4000x as agent branches deepen
BranchBench: Aligning Database Branching with Agentic Demands

Elaine Ang +5
cs.AI 2026-04-18 reviewed

New benchmark finds AI agents falter on complex personalized home tasks
PersonalHomeBench: Evaluating Agents in Personalized Smart Homes

Nikhil Verma +7
cs.DB 2026-04-17 reviewed

Flipped indexing delivers 6.5x lower GPU query latency with dynamic updates
FliX: Flipped-Indexing for Scalable GPU Queries and Updates

Rosina Kharal +3
cs.SE 2026-04-17 reviewed

QMutBench gives 700k quantum mutants to benchmark tests
QMutBench: A Dataset of Quantum Circuit Mutants

Eñaut Mendiluze Usandizaga +3
cs.DB 2026-04-17 reviewed

Policy structure dictates database optimizer plans
Compliance in Databases: A Study of Structural Policies and Query Optimization

Ahana Pradhan +3
cs.DB 2026-04-17 reviewed

Agent autonomy pushes humans to supervisor roles in visual analytics
Exploring Agentic Visual Analytics: A Co-Evolutionary Framework of Roles and Workflows

Tianqi Luo +2
cs.CV 2026-04-17 reviewed

Event cameras enable lip-motion speaker ID across new views and lights
NeuroLip: An Event-driven Spatiotemporal Learning Framework for Cross-Scene Lip-Motion-based Visual Speaker Recognition

Junguang Yao +3
cs.DB 2026-04-17 reviewed

Response feedback backpropagates to refine KG-RAG by 7.34%
EvoRAG: Making Knowledge Graph-based RAG Automatically Evolve through Feedback-driven Backpropagation

Zhenbo Fu +7
cs.DB 2026-04-16 reviewed

Small model attention prunes long docs to 10% for big QA
SAGE: Selective Attention-Guided Extraction for Token-Efficient Document Indexing

Xinzhi Wang +7
cs.AI 2026-04-16 reviewed

Layer treats LLMs and web as databases for natural language data queries
Blue Data Intelligence Layer: Streaming Data and Agents for Multi-source Multi-modal Data-Centric Applications

Moin Aminnaseri +19
cs.DB 2026-04-16 reviewed

SQL and Python agreement on tiny database picks correct queries
DPC: Training-Free Text-to-SQL Candidate Selection via Dual-Paradigm Consistency

Boyan Li +3
cs.DB 2026-04-16 reviewed

Four-layer architecture unifies reconciliation and anomaly detection
Data Engineering Patterns for Cross-System Reconciliation in Regulated Enterprises: Architecture, Anomaly Detection, and Governance

Zhijun Qiu
cs.DB 2026-04-16 reviewed

PP-FP-tree finds top keyword k-core communities in public-private graphs
Efficient Community Search on Attributed Public-Private Graphs

Yuqi Chen +2
cs.DB 2026-04-16 reviewed

RELOAD is a learned query optimizer that reduces individual query performance regressions…
RELOAD: A Robust and Efficient Learned Query Optimizer for Database Systems

Seokwon Lee +5
cs.DB 2026-04-15 reviewed

PIM hardware speeds R-tree queries up to 3.66x with less energy
Parallel R-tree-based Spatial Query Processing on a Commercial Processing-in-Memory System

Tasmia Jannat +2
cs.AI 2026-04-15 reviewed

Beliefs and policies declaratively control LLM pipelines
Credo: Declarative Control of LLM Pipelines via Beliefs and Policies

Duo Lu +2
cs.CL 2026-04-15 reviewed

GLOW hybrid boosts open-world QA on incomplete KGs by 38% on average
Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs

Hussein Abdallah +3