archive

Every paper Pith has read.

446 papers in cs.DB · page 2

cs.DL 2026-05-13 reviewed

Graph links 200k research repos to papers and artifacts
SemRepo: A Knowledge Graph for Research Software and Its Scholarly Ecosystem

Abdul Rafay +3
cs.DB 2026-05-13 reviewed

Benchmark shows top multimodal models lag on e-commerce
OxyEcomBench: Benchmarking Multimodal Foundation Models across E-Commerce Ecosystems

Yong Liu +7
cs.CV 2026-05-12 reviewed

3D primitives in code raise VLM spatial scores up to 17 percent
3D Primitives are a Spatial Language for VLMs

Junze Liu +10
eess.SP 2026-05-12 reviewed

Commercial 5G dataset aids AI handover and beam management
Enabling AI-Native Mobility in 6G: A Real-World Dataset for Handover, Beam Management, and Timing Advance

Mannam Veera Narayana +3
cs.DB 2026-05-12 reviewed

Chase termination undecidable even for decidable queries
Will My Favorite Chases Terminate if Evaluating Conjunctive Queries Does? One Does Not Simply Decide This

Lucas Larroque +1
cs.DB 2026-05-12 reviewed

Separating instances pick correct NL2SQL candidate
Data-aware candidate selection in NL2SQL translation via small separating instances

Stanislav Kikot +2
cs.IR 2026-05-12 reviewed

BatchBench framework equalizes autoscaling policy tests
BatchBench: Toward a Workload-Aware Benchmark for Autoscaling Policies in Big Data Batch Processing -- A Proposed Framework

Venkata Krishna Prasanth Budigi +1
cs.DB 2026-05-12 reviewed

Graph queries for optimization reveal hidden data flaws
Graph-Grounded Optimization: Rao-Family Metaheuristics, Classical OR, and SLM-Driven Formulation over Knowledge Graphs

Madhulatha Mandarapu (samyama.ai) +1
cs.DB 2026-05-12 reviewed

Knowledge graphs source optimization problems via queries
Graph-Grounded Optimization: Rao-Family Metaheuristics, Classical OR, and SLM-Driven Formulation over Knowledge Graphs

Madhulatha Mandarapu (samyama.ai) +1
cs.DB 2026-05-12 reviewed

Replicas detect and repair database corruption without stopping work
PROTECT-DB: Protecting Data using Replicated State Machines: Efficient Corruption Detection & Recovery

Anant Utgikar +1
cs.AI 2026-05-12 reviewed

LLMs cannot always be correct
A CAP-like Trilemma for Large Language Models: Correctness, Non-bias, and Utility under Semantic Underdetermination

Vinu Ellampallil Venugopal
cs.LG 2026-05-12 reviewed

Benchmark with 40 epidemic datasets enables fair model comparisons
EpiCastBench: Datasets and Benchmarks for Multivariate Epidemic Forecasting

Madhurima Panja +4
cs.LG 2026-05-12 reviewed

Relational signals lift membership inference on tabular diffusion models
FERMI: Exploiting Relations for Membership Inference Against Tabular Diffusion Models

Abtin Mahyar +3
cs.DB 2026-05-11 reviewed

SHACL-DS validates named graphs faster than standard SHACL
Keeping track of errors: A study of SHACL-DS for RDF dataset validation on the ERA RINF Knowledge Graph

Davan Chiem Dao +2
cs.DB 2026-05-11 reviewed

Single GPU kernel fuses IO and query steps for faster analytics
Data Path Fusion in GPU for Analytical Query Processing

Tsuyoshi Ozawa +1
cs.DB 2026-05-11 reviewed

Text2Cypher must reason across multiple graph databases
Toward Multi-Database Query Reasoning for Text2Cypher

Makbule Gulcin Ozsoy
cs.AI 2026-05-11 reviewed

Autonomous objects resolve over half of scientific data conflicts
Autonomous FAIR Digital Objects: From Passive Assertions to Active Knowledge

Zeyd Boukhers +3
cs.DB 2026-05-11 reviewed

Cloud GPUs speed graph index construction by 9x at 6x lower cost
ScaleGANN: Accelerate Large-Scale ANN Indexing by Cost-effective Cloud GPUs

Lan Lu +7
cs.IR 2026-05-11 reviewed

Graph of codecs compresses data smaller and faster
OpenZL: Using Graphs to Compress Smaller and Faster

Yann Collet +12
cs.CL 2026-05-10 reviewed

Home activity benchmark shows AI question-answering gaps
HOME-KGQA: A Benchmark Dataset for Multimodal Knowledge Graph Question Answering on Household Daily Activities

Shusaku Egami +7
cs.DB 2026-05-09 reviewed

Krone decomposes logs into entity-action-status units for modular anomaly detection
Detect, Localize, and Explain: Interactive Hierarchical Log Anomaly Analytics with LLM Augmentation

Lei Ma +7
cs.AI 2026-05-09 reviewed

One-to-one matching boosts ontology alignment precision
Open Ontologies: Tool-Augmented Ontology Engineering with Stable Matching Alignment

Fabio Rovai
cs.LG 2026-05-09 reviewed

Urine biomarkers add reliable signal to chlamydia risk models
Machine Learning-Based Pre-Test Risk Stratification for PCR-Confirmed Chlamydia Using Patient-Reported Data and Urine Biomarkers

Mehrab Mahdian +3
cs.DB 2026-05-09 reviewed

Personalized privacy cuts infinite stream estimation error by 53.6%
Personalized w-Event Privacy for Infinite Stream Estimation

Leilei Du +6
cs.AI 2026-05-09 reviewed

Diagnosis consistency links to actual causality for AI explanations
Reconciling Consistency-Based Diagnosis with Actual-Causality-Based Explanations

Leopoldo Bertossi
cs.DB 2026-05-09 reviewed

LLMs fall short on natural language data prep tasks
PrepBench: How Far Are We from Natural-Language-Driven Data Preparation?

Jingzhe Xu +3
cs.DB 2026-05-09 reviewed

Elastic scheduling meets stream deadlines at lowest cost
Elastic Scheduling of Intermittent Query Processing in a Cluster Environment

Saranya Chandrasekaran +1
cs.DB 2026-05-09 reviewed

Elastic batch scheduling meets deadlines at lower cost
Elastic Scheduling of Intermittent Query Processing in a Cluster Environment

Saranya Chandrasekaran +1
cs.DB 2026-05-08 reviewed

Heavy-light partitioning maintains arbitrary joins under updates
Maintaining Queries under Updates Using Heavy-Light Partitioning of the Input Relations

Mahmoud Abo-Khamis +4
cs.DB 2026-05-07 reviewed

SkipDisk hits 63% HNSW latency at 20% memory
Low-Latency Out-of-Core ANN Search in High-Dimensional Space

Ziwen Song +3
cs.DB 2026-05-07 reviewed

Query rewrite rules written once deploy across database engines
An Extensible and Verifiable Language for Query Rewrite Rules

Sicheng Pan +5
cs.DB 2026-05-07 reviewed

Every query reduces to Filter
Anatomy of a Query: W5H Dimensions and FAR Patterns for Text-to-SQL Evaluation

Vicki Stover Hertzberg +2
cond-mat.mtrl-sci 2026-05-06 reviewed

Diversity selection builds versatile materials datasets
Building informative materials datasets beyond targeted objectives

Rafael Espinosa Castañeda +8
cs.DB 2026-05-06 reviewed

Caching cuts redundant CBO calls in cost-based query rewrite
Efficient Cost-Based Rewrite in a Bottom-Up Optimizer

Qi Cheng +6
cs.LG 2026-05-06 reviewed

Only solution concentration ranks consistently across electrospinning ML models
Cross-Model Consistency of Feature Importance in Electrospinning: Separating Robust from Model-Dependent Features

Mehrab Mahdian +2
cs.LG 2026-05-06 reviewed

Concentration alone has zero rank variability in electrospinning models
Cross-Model Consistency of Feature Importance in Electrospinning: Separating Robust from Model-Dependent Features

Mehrab Mahdian +2
cs.DB 2026-05-06 reviewed

Hierarchical agents clean messy time series without ground truth
AegisTS: A Hierarchical Agent System with Reinforcement Learning for Multivariate Time Series Data Cleaning

Yuhan Shi +4
cs.LG 2026-05-05 reviewed

Fused soil dataset pretrains model to capture real processes
LUCAS-MEGA: A Large-Scale Multimodal Dataset for Representation Learning in Soil-Environment Systems

Kuangdai Leng +3
cs.LG 2026-05-05 reviewed

Fused soil dataset pretrains representations matching real processes
LUCAS-MEGA: A Large-Scale Multimodal Dataset for Representation Learning in Soil-Environment Systems

Kuangdai Leng +3
cs.DB 2026-05-05 reviewed

Database repairs match preferred extensions in SETAFs
Inconsistent Databases and Argumentation Frameworks with Collective Attacks

Yasir Mahmood +3
cs.DB 2026-05-05 reviewed

ConRAD introduces a framework that applies conformal risk control inside neural graph…
ConRAD: Conformal Risk-Aware Neural Databases

Sonia Horchidan +6
cs.DB 2026-05-05 reviewed

Sliced kd-trees speed up multi-dimensional queries in memory
In-memory Multidimensional Indexing Using the skd-tree

Achilleas Michalopoulos +2
cs.AI 2026-05-05 reviewed

AI agents average 45 percent on workspace tasks with 20k files
Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

Zirui Tang +21
cs.AI 2026-05-05 reviewed

AI agents top out at 60% on workspace file dependency tasks
Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

Zirui Tang +21
cs.AI 2026-05-05 reviewed

Agents reach 68.7% on workspace tasks with big file sets
Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

Zirui Tang +21
cs.AI 2026-05-05 reviewed

Agents hit 43% average on realistic workspace tasks
Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

Zirui Tang +21
cs.DB 2026-05-05 reviewed

3B model hits 85% Text-to-SQL accuracy using fine-grained rewards
FINER-SQL: Boosting Small Language Models for Text-to-SQL

Thanh Dat Hoang +6
cs.SE 2026-05-05 reviewed

AI models recover semantics from legacy database code
Semantic Reverse Engineering Legacy Software Applications with ChatGPT, Gemini AI, and Claude AI

Christian Mancas +1
cs.IR 2026-05-05 reviewed

Unified PostgreSQL layer cuts RAG latency up to 92%
Beyond Similarity Search: A Unified Data Layer for Production RAG Systems

Venkata Krishna Prasanth Budigi +1
cs.DB 2026-05-04 reviewed

Static checker catches JDBC type errors before runtime
Static Type Checking for Database Access Code

Thomas James Kirz +3