archive
Every paper Pith has read. Search by title, abstract, or pith.
446 papers in cs.DB · page 8
-
Wikipedia tables contradict across languages
Factual Inconsistencies in Multilingual Wikipedia Tables
-
Meta-learning enables real-time self-healing in databases
Efficient and Scalable Self-Healing Databases Using Meta-Learning and Dependency-Driven Recovery
-
Serverless functions scale big spatiotemporal queries via parallel subqueries
Towards Serverless Processing of Spatiotemporal Big Data Queries
-
LLMs let researchers query MIMIC-IV in plain English
M3: Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis
-
Contaminated synthetic data beats clean synthetic data for ML training
PuckTrick: A Library for Making Synthetic Data More Realistic
-
Localized subsets raise imputation accuracy in text tables
LDI: Localized Data Imputation for Text-Rich Tables
-
Hermes packs aggregates into ciphertexts for constant-time FHE queries
Hermes: Efficient Global Homomorphic Aggregation over Mutable Packed Ciphertexts
-
SCT framework yields consistent LLM agents for stakeholder views
Persona Alchemy: Designing, Evaluating, and Implementing Psychologically-Grounded LLM Agents for Diverse Stakeholder Representation
-
LSM-VEC cuts memory 66% for dynamic billion-scale vector search
LSM-VEC: A Large-Scale Disk-Based System for Dynamic Vector Search
-
Model classifies personal data for GDPR-compliant reuse
Enabling the Reuse of Personal Data in Research: A Classification Model for Legal Compliance
-
AI models classify food processing levels from nutrient and text data
Informatics for Food Processing
-
MojoFrame reaches 4.6x speedup on TPC-H queries
MojoFrame: Dataframe Library in Mojo Language
-
Multi-vector index tuning cuts search latency up to 8 times
MINT: Multi-Vector Search Index Tuning
-
TurboQuant hits near-optimal quantization within 2.7x bound
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
-
vMODB unifies events and data for ACID microservices
vMODB: Unifying Event and Data Management for Distributed Asynchronous Applications
-
Learned index speeds up distributed spatial queries
LiLIS: A Lightweight Distributed Learned Index Framework for Spatial Decision Analysis
-
Knowledge graph retrieval lifts LLM cell annotation scores
ReCellTy: Domain-Specific Knowledge Graph Retrieval-Augmented LLMs Reasoning Workflow for Single-Cell Annotation
-
Compass lifts compound AI goodput 2.4 to 5.1 times
Compass: SLO-aware Query Planner for Compound AI Serving at Scale
-
Partial rewards let 14B model top 400B+ ones on Text2SQL
Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL
-
FlockMTL adds LLM functions and RAG to DuckDB
Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB
-
Query syntax decides when database scores align
Causality-Based Scores Alignment in Explainable Data Management
-
NeurBench adds drift factor to benchmark learned DB components
NeurBench: A Benchmark Suite for Learned Database Components with Drift Modeling
-
New graph RAG combinations beat prior leaders on QA tasks
In-depth Analysis of Graph-based RAG in a Unified Framework
-
Knapsack method lets 1.6B model beat larger LLMs at schema linking
Knapsack Optimization-based Schema Linking for LLM-based Text-to-SQL Generation
-
Satisfiability of GNTC is 2ExpTime-complete
Guarded Negation Transitive Closure Logic
-
UCPDL+ matches UNFO* and contains ICPDL plus CQ
A Common Ancestor of PDL, Conjunctive Queries, and Unary Negation First-order
-
Benchmark finds no GNN wins on both accuracy and speed for graphs
OpenGLT: A Comprehensive Benchmark of Graph Neural Networks for Graph-Level Tasks
-
Model repository reuses ER classifiers across sources with few labels
Efficient Model Repository for Entity Resolution: Construction, Search, and Integration
-
Transformed dependencies let chase answer queries without full model
Goal-Driven Query Answering over First- and Second-Order Dependencies with Equality
-
Paper proposes four multi-dimensional data operations, including drill-down
Advancing Object-Centric Process Mining with Multi-Dimensional Data Operations
-
Algorithm finds minimal-width positive first-order sentence via rewriting
Optimally Rewriting Formulas and Database Queries: A Confluence of Term Rewriting, Structural Decomposition, and Complexity
-
Matrix Profile leads on multidimensional anomaly detection
Matrix Profile for Anomaly Detection on Multidimensional Time Series
-
Signed measures give selectivity models OOD generalization bounds
A Practical Theory of Generalization in Selectivity Learning
-
Metadata-lake catalogs virtual data-lakes
DatAasee -- A Metadata-Lake as Metadata Catalog for a Virtual Data-Lake
-
Enterprise text-to-SQL benchmark shows 10.8% SOTA accuracy
BEAVER: An Enterprise Benchmark for Text-to-SQL
-
CycleTrajectory enriches GPS tracks with 5.64% matching error
CycleTrajectory: An End-to-End Pipeline for Enriching and Analyzing GPS Trajectories to Understand Cycling Behavior and Environment
-
Automated system finds data dependencies to speed queries 35%
Enabling Data Dependency-based Query Optimization
-
Optimizations speed navigational graph queries by orders of magnitude
Optimizing Navigational Graph Queries
-
CHESS hits 71.10% on BIRD text-to-SQL with 83% fewer LLM calls
CHESS: Contextual Harnessing for Efficient SQL Synthesis
-
Graph metrics fix entity clusters on data with duplicates
Graph-based Active Learning for Entity Cluster Repair
-
Parallel two-stage method scales symbolic time series approximation
Parallel Two-Stage Approach for Joint Symbolic Approximation of Time Series
-
Algorithm answers spatial queries with added connectivity rules
QQESPM: A Quantitative and Qualitative Spatial Pattern Matching Algorithm
-
New indicator ties privacy parameter choice to real dataset risks
Within-Dataset Disclosure Risk for Differential Privacy
-
Auto-FP can be solved by modeling it as an HPO or NAS problem
Auto-FP: An Experimental Study of Automated Feature Preprocessing for Tabular Data
-
Graph of whaling routes helps normalize catch-based population maps
WhaleVis: Visualizing the History of Commercial Whaling
-
Fine-tuned CLIP matches human tastes for AI images better than before
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
-
Convex program over variances scales marginal query release to 100 attributes
ResidualPlanner+: a scalable matrix mechanism for marginals and beyond
-
Multi-master protocol raises geo-database throughput 7x
GeoGauss: Strongly Consistent and Light-Coordinated OLTP for Geo-Replicated SQL Database
-
Direct access to aggregate query answers holds under same conditions
Direct Access for Answers to Conjunctive Queries with Aggregation
-
CRCW PRAMs evaluate queries in constant time with work O(T^{1+ε})
Work-Efficient Query Evaluation in Constant Time with PRAMs