archive

Every paper Pith has read.

446 papers in cs.DB · page 5

cs.DC 2026-04-15 reviewed

OffloadFS moves database compaction to storage nodes for 3.36x speedup
OffloadFS: Leveraging Disaggregated Storage for Computation Offloading

Sungho Moon +6
cs.CL 2026-04-15 reviewed

Benchmark reveals 9% drop for Indic languages in Text-to-SQL
IndicDB -- Benchmarking Multilingual Text-to-SQL Capabilities in Indian Languages

Aviral Dawar +3
cs.DB 2026-04-15 reviewed

Self-healing LLM loop boosts NL-to-SQL accuracy by up to 9 points
SQL Query Engine: A Self-Healing LLM Pipeline for Natural Language to PostgreSQL Translation

Muhammad Adeel Ijaz
cs.DC 2026-04-14 reviewed

DySkew cuts UDF skew delays with runtime data swaps
DySkew: Dynamic Data Redistribution for Skew-Resilient Snowpark UDF Execution

Chenwei Xie +10
cs.LG 2026-04-14 reviewed

Log anomalies detected directly on compressed bytes
CLAD: Efficient Log Anomaly Detection Directly on Compressed Representations

Benzhao Tang +1
cs.DB 2026-04-14 reviewed

ROSE judges NL2SQL by user intent not reference SQL match
ROSE: An Intent-Centered Evaluation Metric for NL2SQL

Wenqi Pei +5
cs.RO 2026-04-14 reviewed

Panoramic 3D datasets reach 96% place categorization accuracy
Multi-modal panoramic 3D outdoor datasets for place categorization

Hojung Jung +4
cs.DB 2026-04-14 reviewed

Workflow builds reproducible 582k-paper chemistry corpus
Lit2Vec: A Reproducible Workflow for Building a Legally Screened Chemistry Corpus from S2ORC for Downstream Retrieval and Text Mining

Mahmoud Amiri +3
cs.CR 2026-04-14 reviewed

Three verification layers catch outsourced anonymization errors
VeriX-Anon: A Multi-Layered Framework for Mathematically Verifiable Outsourced Target-Driven Data Anonymization

Miit Daga +1
cs.DB 2026-04-13 reviewed

Benchmark reveals accuracy and efficiency gaps in NL2SQL methods
NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions

Shizheng Hou +5
cs.DB 2026-04-13 reviewed

GraphAlg Core simulated in matrix language with induction
Foundations of the GraphAlg Language

Daan de Graaf +2
cs.DB 2026-04-13 reviewed

Ozone unifies four traffic datasets to cut experiment setup time 85%
Ozone: A Unified Platform for Transportation Research

Ou Zheng +13
cs.DB 2026-04-12 reviewed

NL queries must sometimes invent their own data target
Natural Language to What? A Vision for Intermediate Representations in NL-to-X Querying

Shengqi Li +1
cs.DB 2026-04-12 reviewed

Fine-grained GPU model scales subgraph matching to larger queries
gMatch: Fine-Grained and Hardware-Efficient Subgraph Matching on GPUs

Weitian Chen +5
cs.AI 2026-04-11 reviewed

Knowledge graph unifies AI artifact management across platforms
Gypscie: A Cross-Platform AI Artifact Management System

Fabio Porto +6
cs.CL 2026-04-11 reviewed

Benchmark lets dialogue clarify ambiguous table questions
ODUTQA-MDC: A Task for Open-Domain Underspecified Tabular QA with Multi-turn Dialogue-based Clarification

Zhensheng Wang +5
cs.CV 2026-04-11 reviewed

New benchmark and alignment model raise VCOD performance on motion-heavy video
YUV20K: A Complexity-Driven Benchmark and Trajectory-Aware Alignment Model for Video Camouflaged Object Detection

Yiyu Liu +3
cs.DB 2026-04-10 reviewed

Dynamic programming chooses semantic filter positions to cut hybrid query costs
PLOP: Cost-Based Placement of Semantic Operators in Hybrid Query Plans

Qiuyang Mang +7
cs.DB 2026-04-10 reviewed

Catalog defines 35 data error types in three categories
A Catalog of Data Errors

Divya Bhadauria +4
cs.DB 2026-04-10 reviewed

Decoupling vectors from indexes cuts storage by up to 58%
Decoupling Vector Data and Index Storage for Space Efficiency

Yuanming Ren +5
cs.DB 2026-04-10 reviewed

Proprietary tools lead in data quality measurement and LLM features
Evaluating Data Quality Tools: Measurement Capabilities and LLM Integration

Tobias Rehberger +3
cs.CL 2026-04-10 reviewed

Constraint solver matches patients to 32-72% more trials
SatIR: Scalable High-Recall Constraint-Satisfaction-Based Information Retrieval for Clinical Trials Matching

Cyrus Zhou +5
cs.DB 2026-04-09 reviewed

Constraint-guided LLMs generate graph queries with 31.6% F1 gains
Graph Query Generation with Constraint-guided Large Language Agents

Mengying Wang +5
cs.MM 2026-04-09 reviewed

Fine-tuned LLMs translate QoS to QoE and back with strong accuracy
QoS-QoE Translation with Large Language Model

Yingjie Yu +5
cs.DS 2026-04-09 reviewed

Color coding counts hypergraphlets faster on (α,β)-nice hypergraphs
Counting HyperGraphlets via Color Coding: a Quadratic Barrier and How to Break It

Marco Bressan +2
cs.DB 2026-04-09 reviewed

Dynamic graph method speeds up large language model training
GRACE: A Dynamic Coreset Selection Framework for Large Language Model Optimization

Tianhao Tang +2
cs.DB 2026-04-09 reviewed

Foreign-key graph traversal yields diverse SQL workloads for optimizer training
SynQL: A Controllable and Scalable Rule-Based Framework for SQL Workload Synthesis for Performance Benchmarking

Kahan Mehta +1
cs.CR 2026-04-08 reviewed

PostRI gives DP medians higher utility with post-release error intervals
Interpreting the Error of Differentially Private Median Queries through Randomization Intervals

Thomas Humphries +4
cs.DB 2026-04-08 reviewed

Agentic views help LLMs decompose complex Text-to-SQL queries
AV-SQL: Decomposing Complex Text-to-SQL Queries with Agentic Views

Minh Tam Pham +5
cs.CR 2026-04-08 reviewed

VulGD builds dynamic graph of vulnerabilities with LLM embeddings
VulGD: A LLM-Powered Dynamic Open-Access Vulnerability Graph Database

Luat Do +3
cs.DB 2026-04-08 reviewed

Small models outperform rules and LLMs in SQL query rewriting
LASER: A Data-Centric Method for Low-Cost and Efficient SQL Rewriting based on SQL-GRPO

Jiahui Li +5
cs.CL 2026-04-08 reviewed

LLMs vary SQL structure even when results match
SQLStructEval: Structural Evaluation of LLM Text-to-SQL Generation

Yixi Zhou +6
cs.DB 2026-04-08 reviewed

CubeGraph stitches per-cell vector graphs for fast hybrid spatial search
CubeGraph: Efficient Retrieval-Augmented Generation for Spatial and Temporal Data

Mingyu Yang +2
cs.DB 2026-04-08 reviewed

Non-forking chains cut Verkle Trie storage by 97.8%
SonicDB S6: A Storage-Efficient Verkle Trie for High-Throughput Blockchains

Luigi Crisci +3
cs.DB 2026-04-08 reviewed

Verkle Trie storage slashed 98% for 300ms blocks
SonicDB S6: A Storage-Efficient Verkle Trie for High-Throughput Blockchains

Luigi Crisci +3
cs.DB 2026-04-08 reviewed

Co-evolved evaluators find 6.8x faster database algorithms
AI-Driven Research for Databases

Audrey Cheng +7
cs.DB 2026-04-07 reviewed

Bayesian network models missingness to build a probabilistic database for querying
Database Querying under Missing Values Governed by Missingness Mechanisms

Leopoldo Bertossi +2
cs.AI 2026-04-07 reviewed

Toolkit pairs Python pipelines with AI chat for data harmonization
BDI-Kit Demo: A Toolkit for Programmable and Conversational Data Harmonization

Roque Lopez +3
cs.SE 2026-04-07 reviewed

ICT models become useful when treated as traceable open graphs
All LCA models are wrong. Are some of them useful? Towards open computational LCA in ICT

Vincent Corlay +8
cs.DB 2026-04-07 reviewed

Equivalence proofs compose correct database stores
CobbleDB: Modelling Levelled Storage by Composition

Emilie Ma (UBC) +3
cs.CR 2026-04-07 reviewed

A few central vectors poison nearly all top-k results
Can You Trust the Vectors in Your Vector Database? Black-Hole Attack from Embedding Space Defects

Hanxi Li +7
cs.DB 2026-04-07 reviewed

Spatiotemporal warehouse lifts entity extraction F1 by 4.37%
STIndex: A Context-Aware Multi-Dimensional Spatiotemporal Information Extraction System

Wenxiao Zhang +7
cs.DB 2026-04-06 reviewed

PANDA derives optimal query plans from information bounds
Query Optimization and Evaluation via Information Theory: A Tutorial

Mahmoud Abo Khamis +2
cs.DB 2026-04-06 reviewed

Adaptive probing estimates high-dimensional similarity cardinalities
Cardinality Estimation for High Dimensional Similarity Queries with Adaptive Bucket Probing

Zhonghan Chen +3
cs.DB 2026-04-04 reviewed

LLM operators bring semantic processing to text streams
VectraFlow: Long-Horizon Semantic Processing over Data and Event Streams with LLMs

Shu Chen +3
cs.DB 2026-04-04 reviewed

LLM turns changing web pages into verified JSON via embedding checks
Method for Aggregating Unstructured Data Using Large Language Models

Vsevolod Lazebnyi +3
cs.DC 2026-04-03 reviewed

Pessimistic sync cuts redundant I/Os in disaggregated KV stores
CIDER: Boosting Memory-Disaggregated Key-Value Stores with Pessimistic Synchronization

Yuxuan Du +4
cs.DB 2026-04-03 reviewed

Workshop maps LLM-graph integration for data systems
LLM+Graph@VLDB'2025 Workshop Summary

Yixiang Fang +4