archive

Every paper Pith has read. Search by title, abstract, or pith.

446 papers in cs.DB · page 1

cs.DB 2026-05-22 reviewed

CHRONOS unifies index decay, pricing and privacy in data markets
CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces

Joydeep Chandra
cs.DB 2026-05-22 reviewed

Learned indexes lift RocksDB to 2.1X read throughput with few changes
A Pragmatic Approach to Learned Indexing in RocksDB: Targeted Optimizations with Minimal System Modification

Shubham Vashisth +3
cs.DB 2026-05-22 reviewed

LLM search boosts blockchain throughput 211% using 8x fewer tests
BCTuner: LLM-Guided Monte Carlo Tree Search for Efficient Blockchain Knob Tuning

Yaoyi Deng +6
cs.DB 2026-05-21 reviewed

LLMs infer conceptual schemas from table headers and values
Conceptual Schema Inference for Tabular Datasets using Large Language Models

Zhenyu Wu +2
cs.CL 2026-05-21 reviewed

BERT classifier labels 55k Ming-Qing letters from title lists
A Fine-Tuned BERT Classifier for Personal-Letter Titles in Late-Ming and Early-Qing Collected Works

Queenie Luo
cs.SE 2026-05-21 reviewed

Flipping optimization branches reveals 21 DBMS performance bugs
Finding Performance Issues in Database Systems by Exploiting Dormant Code Paths

Jinsheng Ba +1
cs.DB 2026-05-21 reviewed

Three measures quantify database unfairness under differential privacy
Measuring Database Unfairness via Dependency Quantification Under Differential Privacy

Mariia Vologdin +2
cs.DB 2026-05-21 reviewed

Benchmark shows LLMs drop on complex geospatial questions
GS-QA: A Benchmark for Geospatial Question Answering

Majid Saeedan +3
cs.AI 2026-05-21 reviewed

Benchmark compares 12 pipelines for knowledge graph integration
Evaluation of Pipelines for Data Integration into Knowledge Graphs

Marvin Hofer +1
cs.AR 2026-05-21 reviewed

Co-design speeds vector search up to 8.4 times over CPU
NasZip: Software and Hardware Co-Design to Accelerate Approximate Nearest Neighbor Search with DIMM-Based Near-Data Processing

Cheng Zou +8
cs.CR 2026-05-20 reviewed

Polars analytics run at 1.5x cost inside SGX2 enclaves
Polars inside Intel SGX2 Enclaves: An Empirical Study of Confidential Analytical Query Processing

Wei Wang +2
cs.CL 2026-05-20 reviewed

DivSkill-SQL lifts Text-to-SQL accuracy by up to 11 points
Residual Skill Optimization for Text-to-SQL Ensembles

Jiongli Zhu +10
cs.AI 2026-05-20 reviewed

EMOD 3.0 expands AOP-Wiki data model for AI and NAMs
AOP-Wiki EMOD 3.0: Data Model Expansions and Content Evaluation Framework for Using Agentic AI to Improve Integration between AOPs and New Approach Methodologies (NAMs)

Virginia K. Hench +4
cs.LG 2026-05-20 reviewed

Local updates cut Shapley recompute cost by 1000 times
Dynamic Shapley Computation

Xuan Yang +3
cs.DB 2026-05-19 reviewed

Transaction research keeps going as systems change
Fifty Years of Transaction Processing Research (extended)

Philip A. Bernstein (Microsoft Research)
cs.DL 2026-05-19 reviewed

One in eight OpenAlex abstracts has integrity issues
One in Eight OpenAlex Abstracts Has Integrity Issues

Seorin Kim +2
cs.SE 2026-05-19 reviewed

Agent skills from expert methods beat docs for PostgreSQL tuning
A Case for Agentic Tuning: From Documentation to Action in PostgreSQL

Hongyu Lin +6
cs.LG 2026-05-19 reviewed

Block-sphere quantizer lowers MSE and inner-product error
Block-Sphere Vector Quantization

Heesang Ann +2
cs.SE 2026-05-19 reviewed

Health data lakehouse shown usable for mixed-skill teams
OpenHealth Lake: Designing and testing a data lakehouse platform for health applications

Danilo Silva +5
cs.HC 2026-05-19 reviewed

Protocol captures synchronized multimodal meeting data
AffectAI-Capture: A Reproducible Multimodal Protocol for Small-Group Meeting Research

Meisam Jamshidi Seikavandi +8
cs.AI 2026-05-19 reviewed

Dataset records affect at group
GroupAffect-4: A Multimodal Dataset of Four-Person Collaborative Interaction

Meisam Jamshidi Seikavandi +12
cs.AI 2026-05-19 reviewed

Benchmark shows attention models scale better than RNNs on sequences
CogScale: Scalable Benchmark for Sequence Processing

Yannis Bendi-Ouis (Mnemosyne) +2
cs.DC 2026-05-19 reviewed

LatentBox cuts AI image storage by 78.7% using latents
LatentBox: Storing AI-Generated Images at Scale via a Latent-First Design

Zirui Wang +6
cs.DC 2026-05-19 reviewed

Latent storage cuts AI image needs by 78.7 percent
LatentBox: Storing AI-Generated Images at Scale via a Latent-First Design

Zirui Wang +6
cs.DB 2026-05-19 reviewed

ANNS updates run in I/O stalls for 2.68× faster inserts
Leveraging I/O Stalls for Efficient Scheduling in ANNS

Juncheng Zhang +3
cs.CV 2026-05-19 reviewed

Hierarchical rewards raise text accuracy in image generators
TextAlign: Preference Alignment for Text Rendering with Hierarchical Rewards

Mingxuan Cui +8
cs.DB 2026-05-19 reviewed

Example bundles auto-generate package queries with aggregate rules
Example-Driven Intent Synthesis for Constrained Data Bundle Retrieval: Focused Text Snippet Extraction and Beyond

Whanhee Cho +6
cs.DB 2026-05-18 reviewed

Packed Plan Forests encode feasible NL database plans polynomially
Feasible Plan Generation with Ambiguity-Boundedness in Cross-Model Query Processing

Subhasis Dasgupta +1
cs.LG 2026-05-18 reviewed

Two-level router cuts log QA latency 55%
LogRouter: Adaptive Two-Level LLM Routing for Log Question Answering in Big Data Systems

Mert Coskuner +2
cs.OS 2026-05-18 reviewed

Vector search cuts SSD reads by verifying attributes after retrieval
PipeANN-Filter: An Efficient Filtered Vector Search System on SSD

Hao Guo +2
cs.DB 2026-05-18 reviewed

DHNs capture unary negation fragment and counting extensions
Expressive Power of Deep Homomorphism Networks over Relational Databases

Moritz Schönherr +5
cs.LG 2026-05-18 reviewed

Agentic planner cuts big data query latency by 23%
Agentic Cost-Aware Query Planning with Knowledge Distillation for Big Data Analytics

Mahdi Naser-Moghadasi
cs.DL 2026-05-17 reviewed

Open-source ranking matches JCR for journals and conferences
General Science Ranking (GSR): An Open-Source, Citation-Normalized Journal and Conference Classification System for Computer Science and Medicine

Zhikai Yu
cs.DL 2026-05-17 reviewed

Open-source ranking places CS conferences with journals
General Science Ranking (GSR): An Open-Source, Citation-Normalized Journal and Conference Classification System for Computer Science and Medicine

Zhikai Yu
cs.LG 2026-05-17 reviewed

Coordinate heterogeneity predicts binary quantization recall
Covariance Structure and Coordinate Heterogeneity Govern Binary Quantization of Contrastive Embeddings

Wenxuan Xiao
cs.LG 2026-05-17 reviewed

Fixed rotation and scalar quantizer keeps IVF recall stable in streaming data
IVF-TQ: Calibration-Free Streaming Vector Search via a Codebook-Free Residual Layer

Tarun Sharma
cs.LG 2026-05-17 reviewed

Codebook-free layer keeps ANN recall stable under streaming
IVF-TQ: Calibration-Free Streaming Vector Search via a Codebook-Free Residual Layer

Tarun Sharma
cs.DB 2026-05-16 reviewed

BBRes finds maximum defective cliques faster with early branch termination
Revisiting the Maximum Defective Clique Problem: Faster Branching and a Tighter Upper Bound

Kewu Yang +3
cs.LG 2026-05-15 reviewed

MetaEns selects better outlier ensembles with fewer models
Automatic Unsupervised Ensemble Outlier Model Selection--Extended Version

Hong-Phuc Phan +5
cs.CL 2026-05-15 reviewed

One framework turns utility numbers into readable bills with carbon totals
A Generative AI Framework for Intelligent Utility Billing CO2 Analytics and Sustainable Resource Optimisation

Pavan Manjunath +1
cs.DB 2026-05-15 reviewed

Hybrid LM-GNN narrows gap to RDL on relational prediction
Towards Foundation Models for Relational Databases with Language Models and Graph Neural Networks

Jingcheng Wu +6
cs.DB 2026-05-15 reviewed

Enriched ontology lifts AUC in database lineage prediction
Relational Database Data Lineage Ontology

Jakub Dutkiewicz +2
cs.DB 2026-05-15 reviewed

Smaller vector indexes let GPUs beat CPUs for both SQL and search
To GPU or Not to GPU: Vector Search in Relational Engines

Vasilis Mageirakos +5
cs.DB 2026-05-15 reviewed

Fairness optimization cuts bias in RAG retrieval
Fairness-Aware Retrieval Optimization for Retrieval-Augmented Generation

Yingqi Zhao +3
cs.LG 2026-05-15 reviewed

Gaussian attention lifts relational graph performance by up to 13.8%
Gaussian Relational Graph Transformer

Zezhong Ding +3
cs.DS 2026-05-14 reviewed

Hybrid sketches match best space bounds for dynamic graph connectivity
Hybrid Sketching Methods for Dynamic Connectivity on Sparse Graphs

Quinten De Man +4
cs.DB 2026-05-14 reviewed

Retrieval augments schema graphs for relational database predictions
From Schema to Signal: Retrieval-Augmented Modeling for Relational Data Analytics

Lingze Zeng +5
cs.CV 2026-05-13 reviewed

Stage-wise DPO reduces hallucinations in vision-language models
Reducing Hallucination in Vision-Language Models via Stage-wise Preference Optimization under Distribution Shift

Qinwu Xu
cs.AR 2026-05-13 reviewed

FPGA lock agents boost OLTP throughput 51X over CPUs
FPGA-Accelerated Lock Management and Transaction Processing: Architecture, Optimization, and Design Space Exploration

Shien Zhu +1
cs.LO 2026-05-13 reviewed

EL⊥⪯ extends DL-Lite with reachability in NL
A Horn extension of DL-Lite with NL data complexity

Janos Arpasi +2