archive
Every paper Pith has read.
446 papers in cs.DB · page 1
-
CHRONOS unifies index decay, pricing and privacy in data markets
CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces
-
Learned indexes lift RocksDB to 2.1X read throughput with few changes
A Pragmatic Approach to Learned Indexing in RocksDB: Targeted Optimizations with Minimal System Modification
-
LLM search boosts blockchain throughput 211% using 8x fewer tests
BCTuner: LLM-Guided Monte Carlo Tree Search for Efficient Blockchain Knob Tuning
-
LLMs infer conceptual schemas from table headers and values
Conceptual Schema Inference for Tabular Datasets using Large Language Models
-
BERT classifier labels 55k Ming-Qing letters from title lists
A Fine-Tuned BERT Classifier for Personal-Letter Titles in Late-Ming and Early-Qing Collected Works
-
Flipping optimization branches reveals 21 DBMS performance bugs
Finding Performance Issues in Database Systems by Exploiting Dormant Code Paths
-
Three measures quantify database unfairness under differential privacy
Measuring Database Unfairness via Dependency Quantification Under Differential Privacy
-
Benchmark shows LLMs drop on complex geospatial questions
GS-QA: A Benchmark for Geospatial Question Answering
-
Benchmark compares 12 pipelines for knowledge graph integration
Evaluation of Pipelines for Data Integration into Knowledge Graphs
-
Co-design speeds vector search up to 8.4 times over CPU
NasZip: Software and Hardware Co-Design to Accelerate Approximate Nearest Neighbor Search with DIMM-Based Near-Data Processing
-
Polars analytics run at 1.5x cost inside SGX2 enclaves
Polars inside Intel SGX2 Enclaves: An Empirical Study of Confidential Analytical Query Processing
-
DivSkill-SQL lifts Text-to-SQL accuracy by up to 11 points
Residual Skill Optimization for Text-to-SQL Ensembles
-
EMOD 3.0 expands AOP-Wiki data model for AI and NAMs
AOP-Wiki EMOD 3.0: Data Model Expansions and Content Evaluation Framework for Using Agentic AI to Improve Integration between AOPs and New Approach Methodologies (NAMs)
-
Local updates cut Shapley recompute cost by 1000 times
Dynamic Shapley Computation
-
Transaction research keeps going as systems change
Fifty Years of Transaction Processing Research (extended)
-
One in eight OpenAlex abstracts has integrity issues
One in Eight OpenAlex Abstracts Has Integrity Issues
-
Agent skills from expert methods beat docs for PostgreSQL tuning
A Case for Agentic Tuning: From Documentation to Action in PostgreSQL
-
Block-sphere quantizer lowers MSE and inner-product error
Block-Sphere Vector Quantization
-
Health data lakehouse shown usable for mixed-skill teams
OpenHealth Lake: Designing and testing a data lakehouse platform for health applications
-
Protocol captures synchronized multimodal meeting data
AffectAI-Capture: A Reproducible Multimodal Protocol for Small-Group Meeting Research
-
Dataset records affect at group
GroupAffect-4: A Multimodal Dataset of Four-Person Collaborative Interaction
-
Benchmark shows attention models scale better than RNNs on sequences
CogScale: Scalable Benchmark for Sequence Processing
-
LatentBox cuts AI image storage by 78.7% using latents
LatentBox: Storing AI-Generated Images at Scale via a Latent-First Design
-
Latent storage cuts AI image needs by 78.7 percent
LatentBox: Storing AI-Generated Images at Scale via a Latent-First Design
-
ANNS updates run in I/O stalls for 2.68× faster inserts
Leveraging I/O Stalls for Efficient Scheduling in ANNS
-
Hierarchical rewards raise text accuracy in image generators
TextAlign: Preference Alignment for Text Rendering with Hierarchical Rewards
-
Example bundles auto-generate package queries with aggregate rules
Example-Driven Intent Synthesis for Constrained Data Bundle Retrieval: Focused Text Snippet Extraction and Beyond
-
Packed Plan Forests encode feasible NL database plans polynomially
Feasible Plan Generation with Ambiguity-Boundedness in Cross-Model Query Processing
-
Two-level router cuts log QA latency 55%
LogRouter: Adaptive Two-Level LLM Routing for Log Question Answering in Big Data Systems
-
Vector search cuts SSD reads by verifying attributes after retrieval
PipeANN-Filter: An Efficient Filtered Vector Search System on SSD
-
DHNs capture unary negation fragment and counting extensions
Expressive Power of Deep Homomorphism Networks over Relational Databases
-
Agentic planner cuts big data query latency by 23%
Agentic Cost-Aware Query Planning with Knowledge Distillation for Big Data Analytics
-
Open-source ranking matches JCR for journals and conferences
General Science Ranking (GSR): An Open-Source, Citation-Normalized Journal and Conference Classification System for Computer Science and Medicine
-
Open-source ranking places CS conferences with journals
General Science Ranking (GSR): An Open-Source, Citation-Normalized Journal and Conference Classification System for Computer Science and Medicine
-
Coordinate heterogeneity predicts binary quantization recall
Covariance Structure and Coordinate Heterogeneity Govern Binary Quantization of Contrastive Embeddings
-
Fixed rotation and scalar quantizer keeps IVF recall stable in streaming data
IVF-TQ: Calibration-Free Streaming Vector Search via a Codebook-Free Residual Layer
-
Codebook-free layer keeps ANN recall stable under streaming
IVF-TQ: Calibration-Free Streaming Vector Search via a Codebook-Free Residual Layer
-
BBRes finds maximum defective cliques faster with early branch termination
Revisiting the Maximum Defective Clique Problem: Faster Branching and a Tighter Upper Bound
-
MetaEns selects better outlier ensembles with fewer models
Automatic Unsupervised Ensemble Outlier Model Selection--Extended Version
-
One framework turns utility numbers into readable bills with carbon totals
A Generative AI Framework for Intelligent Utility Billing CO2 Analytics and Sustainable Resource Optimisation
-
Hybrid LM-GNN narrows gap to RDL on relational prediction
Towards Foundation Models for Relational Databases with Language Models and Graph Neural Networks
-
Enriched ontology lifts AUC in database lineage prediction
Relational Database Data Lineage Ontology
-
Smaller vector indexes let GPUs beat CPUs for both SQL and search
To GPU or Not to GPU: Vector Search in Relational Engines
-
Fairness optimization cuts bias in RAG retrieval
Fairness-Aware Retrieval Optimization for Retrieval-Augmented Generation
-
Gaussian attention lifts relational graph performance by up to 13.8%
Gaussian Relational Graph Transformer
-
Hybrid sketches match best space bounds for dynamic graph connectivity
Hybrid Sketching Methods for Dynamic Connectivity on Sparse Graphs
-
Retrieval augments schema graphs for relational database predictions
From Schema to Signal: Retrieval-Augmented Modeling for Relational Data Analytics
-
Stage-wise DPO reduces hallucinations in vision-language models
Reducing Hallucination in Vision-Language Models via Stage-wise Preference Optimization under Distribution Shift
-
FPGA lock agents boost OLTP throughput 51X over CPUs
FPGA-Accelerated Lock Management and Transaction Processing: Architecture, Optimization, and Design Space Exploration
-
$EL_{\bot}^{\preceq}$ extends DL-Lite with reachability in NL
A Horn extension of DL-Lite with NL data complexity