archive

Every paper Pith has read.

446 papers in cs.DB · page 6

cs.DB 2026-04-03 reviewed

Native graph index cuts multi-vector search latency up to 14 times
Unified and Efficient Approach for Multi-Vector Similarity Search

Binhan Yang +6
cs.DB 2026-04-03 reviewed

DCO shortcuts unstable in vector search benchmarks
Distance Comparison Operations Are Not Silver Bullets in Vector Similarity Search: A Benchmark Study on Their Merits and Limits

Zhuanglin Zheng +5
cs.DB 2026-04-03 reviewed

Clustering gives LLMs full dataset context for semantic tasks
Semantic Data Processing with Holistic Data Understanding

Youran Sun +4
cs.DB 2026-04-02 reviewed

ReCAP makes relational DBs up to 400,000x faster on constrained path queries
Efficient Path Query Processing in Relational Database Systems

Diego Rivera Correa +1
cs.DB 2026-04-02 reviewed

Hybrid query system cuts cost while raising accuracy on mixed structured-text tables
OmniTQA: A Cost-Aware System for Hybrid Query Processing over Semi-Structured Data

Nima Shahbazi +3
cs.DB 2026-04-02 reviewed

Bucket collector speeds large-k ANN search up to 3.8x
BBC: Improving Large-k Approximate Nearest Neighbor Search with a Bucket-based Result Collector

Ziqi Yin +4
cs.CL 2026-04-02 reviewed

Hybrid memory beats state-of-the-art LLM agent methods
Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework

Yanchen Wu +9
cs.DB 2026-04-02 reviewed

LLM tool adds database functions 34 percent more accurately
Automating Database-Native Function Code Synthesis with LLMs

Wei Zhou +6
cs.DB 2026-03-31 reviewed

Unifying 8,000 atomistic simulations into one queryable graph
Ontology-based knowledge graph infrastructure for interoperable atomistic simulation data

Abril Azocar Guzman +3
cs.DB 2026-03-31 reviewed

GPU bucketing delivers 240x faster hybrid searches
GRAB-ANNS: High-Throughput Indexing and Hybrid Search via GPU-Native Bucketing

Xinkui Zhao +5
cs.DB 2026-03-30 reviewed

Query focus cuts RAG response time by 40 percent
QCFuse: Query-Centric Cache Fusion for Efficient RAG Inference

Jianxin Yan +8
cs.DB 2026-03-29 reviewed

Platform structures electrospinning data including failures for predictive use
Electrospinning-Data.org: A FAIR, Structured Knowledge Resource for Nanofiber Fabrication

Mehrab Mahdian +2
cs.DB 2026-03-29 reviewed

Enzyme cuts daily pipeline compute by billions of CPU seconds
Enzyme: Incremental View Maintenance for Data Engineering

Ritwik Yadav +18
cs.DB 2026-03-29 reviewed

Streaming context cuts LLM first-token latency by up to 11x
Stream2LLM: Overlap Context Streaming and Prefill for Reduced Time-to-First-Token (TTFT)

Rajveer Bachkaniwala +4
cs.DB 2026-03-27 reviewed

Ontology encodes EU Data Act rules for SPARQL compliance checks
DAOnt: A Formal Ontology for EU Data Act Compliance

Sheyla Leyva-Sánchez +4
cs.DB 2026-03-26 reviewed

Refutational normalization speeds up complete JSON schema checks
JSON Schema Inclusion through Refutational Normalization: Reconciling Efficiency and Completeness

Mohamed-Amine Baazizi +6
cs.DB 2026-03-24 reviewed

Survey maps NLIDB methods for spatial-temporal databases
Natural Language Interfaces for Spatial and Temporal Databases: A Comprehensive Overview of Methods, Taxonomy, and Future Directions

Samya Acharja +1
cs.DB 2026-03-24 reviewed

Value-based quadtree cuts spatial query time by 90%
Spatial Analysis on Value-Based Quadtrees of Rasterized Vector Data

Diana Baumann +3
cs.DB 2026-03-23 reviewed

Embedding random tests inside the DBMS finds 23 bugs with a higher true-positive rate
DIRT: Database-Integrated Random Testing

Alperen Keles +3
cs.RO 2026-03-18 reviewed

Hybrid decoding speeds robot VLA models up to 2.45x
HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness

Zihao Zheng +10
cs.DB 2026-03-17 reviewed

Assumptions enable clear dynamic relationships in object event logs
Detecting Dynamic Relationships in Object-Centric Event Logs

Alessandro Gianola +5
cs.DB 2026-03-17 reviewed

Itemset mining groups cities by shared land use patterns
Exploring Urban Land Use Patterns by Pattern Mining and Unsupervised Learning

Zdena Dobesova +2
cs.DB 2026-03-16 reviewed

Proxy models cut AI query costs by over 100x
100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models

Yeounoh Chung +11
cs.CL 2026-03-16 reviewed

Fixes for one agent model improve 13 others across seven families
Beyond Benchmark Islands: Toward Representative Trustworthiness Evaluation for Agentic AI

Jinhu Qi +6
cs.DB 2026-03-15 reviewed

Catalog system converts natural language to PromQL queries in 1.1 seconds
From Natural Language to PromQL: A Catalog-Driven Framework with Dynamic Temporal Resolution for Cloud-Native Observability

Twinkll Sisodia
cs.DS 2026-03-15 reviewed

Dynamic counters deliver sublinear error for growing stream sketches
Sublime: Sublinear Error & Space for Unbounded Skewed Streams

Navid Eslami +3
cs.DB 2026-03-13 reviewed

Jaguar evaluates queries in N^(submodular width + ε) time
Jaguar: A Primal Algorithm for Conjunctive Query Evaluation in Submodular-Width Time

Mahmoud Abo Khamis +1
cs.DB 2026-03-13 reviewed

DSL lets LLMs produce consistent sensor triggers
A Domain-Specific Language for LLM-Driven Trigger Generation in Multimodal Data Collection

Philipp Reis +5
cs.DB 2026-03-13 reviewed

One graph index covers every range for filtered ANN search
RNSG: A Range-Aware Graph Index for Efficient Range-Filtered Approximate Nearest Neighbor Search

Zhiqiu Zou +5
cs.DB 2026-03-13 reviewed

Optimal sampling for matrix, star, and chain join-project queries
Towards Output-Optimal Uniform Sampling and Approximate Counting for Join-Project Queries

Xiao Hu +1
cs.DB 2026-03-12 reviewed

Toolkit auto-generates standard APIs for materials datasets
optimade-maker: Automated generation of interoperable materials APIs from static datasets

Kristjan Eimre +7
cs.DB 2026-03-11 reviewed

Self-evolved cycles reach 83.1% MongoDB query accuracy
Draft-Refine-Optimize: Self-Evolved Learning for Natural Language to MongoDB Query Generation

Mingwei Ye +5
cs.DB 2026-03-10 reviewed

Real-time terminology queries improve LLM metadata accuracy
Automated Standardization of Legacy Biomedical Metadata Using an Ontology-Constrained LLM Agent

Josef Hardi +5
cs.DB 2026-03-10 reviewed

GeoBenchr benchmarks spatiotemporal DBs on real workloads
GeoBenchr: An Application-Centric Benchmarking Suite for Spatiotemporal Database Platforms

Tim C. Rese +4
cs.DB 2026-03-10 reviewed

LLM index tuning outperforms DTA in some cases
Evaluating the Practical Effectiveness of LLM-Driven Index Tuning with Microsoft Database Tuning Advisor

Xiaoying Wang +3
cs.DC 2026-03-06 reviewed

OMA retains Kubernetes crash evidence past the evidence horizon
Operational Memory Architecture for Kubernetes: Preserving Causal Context Across the Evidence Horizon

Shamsher Khan
cs.CL 2026-03-05 reviewed

DEBISS corpus supplies annotated spoken debates for NLP tasks
DEBISS: a Corpus of Individual, Semi-structured and Spoken Debates

Klaywert Danillo Ferreira de Souza +3
cs.LG 2026-03-04 reviewed

LLM templates let GNNs run 28x faster on huge graphs with 98% less memory
An LLM-Guided Query-Aware Inference System for GNN Models on Large Knowledge Graphs

Waleed Afandi +3
cs.DB 2026-03-04 reviewed

Mined constraints create realistic SQL query test cases
SpotIt+: Verification-based Text-to-SQL Evaluation with Database Constraints

Andrew Tremante +5
cs.LG 2026-03-04 reviewed

Synthetic pre-training produces a relational in-context learner
Relational In-Context Learning via Synthetic Pre-training with Structural Prior

Yanbo Wang +3
cs.DB 2026-03-03 reviewed

Graphs gain normal forms that cover edge dependencies
Graph-Native Normalization

Johannes Schrott +2
cs.DB 2026-03-03 reviewed

Taxonomy groups LLM database operators into five categories
Large Language Model-Enhanced Relational Operators: Taxonomy, Benchmark, and Analysis

Yunxiang Su +7
cs.DB 2026-02-27 reviewed

Python functions generate ocean RDF without semantic web tools
A Pythonic Functional Approach for Semantic Data Harmonisation in the ILIAD Project

Erik Johan Nystad +1
cs.AI 2026-02-27 reviewed

Item-level data required for rigorous AI evaluation
AI Evaluation Should Require Standardized Item-Level Data Releases

Han Jiang +8
cs.DB 2026-02-27 reviewed

LLM agents generate large table datasets for table recognition
TableNet A Large-Scale Table Dataset with LLM-Powered Autonomous

Ruilin Zhang +1
cs.DB 2026-02-26 reviewed

On-disk vector search matches in-memory speed
AlayaLaser: Efficient Index Layout and Search Strategy for Large-scale High-dimensional Vector Similarity Search

Weijian Chen +6