archive

Every paper Pith has read.

446 papers in cs.DB · page 8

cs.CL 2025-07-24 reviewed

Wikipedia tables contradict across languages
Factual Inconsistencies in Multilingual Wikipedia Tables

Silvia Cappa +5
cs.DB 2025-07-18 reviewed

Meta-learning enables real-time self-healing in databases
Efficient and Scalable Self-Healing Databases Using Meta-Learning and Dependency-Driven Recovery

Joydeep Chandra +1
cs.DB 2025-07-08 reviewed

Serverless functions scale big spatiotemporal queries via parallel subqueries
Towards Serverless Processing of Spatiotemporal Big Data Queries

Diana Baumann +2
cs.IR 2025-06-27 reviewed

LLMs let researchers query MIMIC-IV in plain English
M3: Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis

Rafi Al Attrach +5
cs.LG 2025-06-23 reviewed

Contaminated synthetic data beats clean synthetic data for ML training
PuckTrick: A Library for Making Synthetic Data More Realistic

Alessandra Agostini +2
cs.DB 2025-06-19 reviewed

Localized subsets raise imputation accuracy in text tables
LDI: Localized Data Imputation for Text-Rich Tables

Soroush Omidvartehrani +1
cs.CR 2025-06-03 reviewed

Hermes packs aggregates into ciphertexts for constant-time FHE queries
Hermes: Efficient Global Homomorphic Aggregation over Mutable Packed Ciphertexts

Dongfang Zhao
cs.MA 2025-05-23 reviewed

SCT framework yields consistent LLM agents for stakeholder views
Persona Alchemy: Designing, Evaluating, and Implementing Psychologically-Grounded LLM Agents for Diverse Stakeholder Representation

Sola Kim +2
cs.DB 2025-05-22 reviewed

LSM-VEC cuts memory 66% for dynamic billion-scale vector search
LSM-VEC: A Large-Scale Disk-Based System for Dynamic Vector Search

Shurui Zhong +2
cs.CY 2025-05-21 reviewed

Model classifies personal data for GDPR-compliant reuse
Enabling the Reuse of Personal Data in Research: A Classification Model for Legal Compliance

Eduard Mata i Noguera +2
cs.CL 2025-05-20 reviewed

AI models classify food processing levels from nutrient and text data
Informatics for Food Processing

Gordana Ispirova +2
cs.DB 2025-05-07 reviewed

MojoFrame reaches 4.6x speedup on TPC-H queries
MojoFrame: Dataframe Library in Mojo Language

Shengya Huang +3
cs.DB 2025-04-28 reviewed

Multi-vector index tuning cuts search latency up to 8 times
MINT: Multi-Vector Search Index Tuning

Jiongli Zhu +5
cs.LG 2025-04-28 reviewed

TurboQuant hits near-optimal quantization within 2.7x bound
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

Amir Zandieh +3
cs.DB 2025-04-28 reviewed

vMODB unifies events and data for ACID microservices
vMODB: Unifying Event and Data Management for Distributed Asynchronous Applications

Rodrigo Laigner +1
cs.DB 2025-04-26 reviewed

Learned index speeds up distributed spatial queries
LiLIS: A Lightweight Distributed Learned Index Framework for Spatial Decision Analysis

Zhongpu Chen +2
cs.CL 2025-04-24 reviewed

Knowledge graph retrieval lifts LLM cell annotation scores
ReCellTy: Domain-Specific Knowledge Graph Retrieval-Augmented LLMs Reasoning Workflow for Single-Cell Annotation

Dezheng Han +5
cs.DB 2025-04-23 reviewed

Compass lifts compound AI goodput 2.4 to 5.1 times
Compass: SLO-aware Query Planner for Compound AI Serving at Scale

Banruo Liu +4
cs.LG 2025-04-21 reviewed

Partial rewards let 14B model top 400B+ ones on Text2SQL
Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL

Simone Papicchio +3
cs.DB 2025-04-01 reviewed

FlockMTL adds LLM functions and RAG to DuckDB
Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB

Anas Dorbani +3
cs.DB 2025-03-18 reviewed

Query syntax decides when database scores align
Causality-Based Scores Alignment in Explainable Data Management

Felipe Azua +1
cs.DB 2025-03-18 reviewed

NeurBench adds drift factor to benchmark learned DB components
NeurBench: A Benchmark Suite for Learned Database Components with Drift Modeling

Zhanhao Zhao +7
cs.IR 2025-03-06 reviewed

New graph RAG combinations beat prior leaders on QA tasks
In-depth Analysis of Graph-based RAG in a Unified Framework

Yingli Zhou +10
cs.CL 2025-02-18 reviewed

Knapsack method lets 1.6B model beat larger LLMs at schema linking
Knapsack Optimization-based Schema Linking for LLM-based Text-to-SQL Generation

Zheng Yuan +6
cs.LO 2025-01-25 reviewed

Satisfiability of GNTC is 2ExpTime-complete
Guarded Negation Transitive Closure Logic

Diego Figueira +2
cs.LO 2025-01-20 reviewed

UCPDL+ matches UNFO* and contains ICPDL plus CQ
A Common Ancestor of PDL, Conjunctive Queries, and Unary Negation First-order

Diego Figueira +1
cs.LG 2025-01-01 reviewed

Benchmark finds no GNN wins on both accuracy and speed for graphs
OpenGLT: A Comprehensive Benchmark of Graph Neural Networks for Graph-Level Tasks

Haoyang Li +6
cs.DB 2024-12-12 reviewed

Model repository reuses ER classifiers across sources with few labels
Efficient Model Repository for Entity Resolution: Construction, Search, and Integration

Victor Christen +1
cs.AI 2024-12-12 reviewed

Transformed dependencies let chase answer queries without full model
Goal-Driven Query Answering over First- and Second-Order Dependencies with Equality

Efthymia Tsamoura +1
cs.DB 2024-11-30 reviewed

Four multi-dimensional data operations, including drill-down, for object-centric process mining
Advancing Object-Centric Process Mining with Multi-Dimensional Data Operations

Shahrzad Khayatbashi +2
cs.LO 2024-11-15 reviewed

Algorithm finds minimal-width positive first-order sentence via rewriting
Optimally Rewriting Formulas and Database Queries: A Confluence of Term Rewriting, Structural Decomposition, and Complexity

Hubie Chen +1
cs.LG 2024-09-14 reviewed

Matrix Profile leads on multidimensional anomaly detection
Matrix Profile for Anomaly Detection on Multidimensional Time Series

Chin-Chia Michael Yeh +13
stat.ML 2024-09-11 reviewed

Signed measures give selectivity models OOD generalization bounds
A Practical Theory of Generalization in Selectivity Learning

Peizhi Wu +3
cs.DB 2024-09-09 reviewed

Metadata-lake catalogs virtual data-lakes
DatAasee -- A Metadata-Lake as Metadata Catalog for a Virtual Data-Lake

Christian Himpe
cs.CL 2024-09-03 reviewed

Enterprise text-to-SQL benchmark shows 10.8% SOTA accuracy
BEAVER: An Enterprise Benchmark for Text-to-SQL

Peter Baile Chen +8
cs.DB 2024-06-14 reviewed

CycleTrajectory enriches GPS tracks with 5.64% matching error
CycleTrajectory: An End-to-End Pipeline for Enriching and Analyzing GPS Trajectories to Understand Cycling Behavior and Environment

Meihui Wang +3
cs.DB 2024-06-11 reviewed

Automated system finds data dependencies to speed queries 35%
Enabling Data Dependency-based Query Optimization

Daniel Lindner +2
cs.DB 2024-06-08 reviewed

Optimizations speed navigational graph queries by orders of magnitude
Optimizing Navigational Graph Queries

Thomas Mulder +2
cs.LG 2024-05-27 reviewed

CHESS hits 71.10% on BIRD text-to-SQL with 83% fewer LLM calls
CHESS: Contextual Harnessing for Efficient SQL Synthesis

Shayan Talaei +4
cs.LG 2024-01-26 reviewed

Graph metrics fix entity clusters on data with duplicates
Graph-based Active Learning for Entity Cluster Repair

Victor Christen +4
cs.DS 2023-12-30 reviewed

Parallel two-stage method scales symbolic time series approximation
Parallel Two-Stage Approach for Joint Symbolic Approximation of Time Series

Xinye Chen
cs.DB 2023-12-14 reviewed

Algorithm answers spatial queries with added connectivity rules
QQESPM: A Quantitative and Qualitative Spatial Pattern Matching Algorithm

Carlos Minervino +3
cs.DB 2023-10-19 reviewed

New indicator ties privacy parameter choice to real dataset risks
Within-Dataset Disclosure Risk for Differential Privacy

Zhiru Zhu +1
cs.LG 2023-10-04 reviewed

Auto-FP can be solved by modeling it as an HPO or NAS problem
Auto-FP: An Experimental Study of Automated Feature Preprocessing for Tabular Data

Danrui Qi +3
cs.DB 2023-08-08 reviewed

Graph of whaling routes helps normalize catch-based population maps
WhaleVis: Visualizing the History of Commercial Whaling

Ameya Patil +3
cs.CV 2023-06-15 reviewed

Fine-tuned CLIP matches human tastes for AI images better than before
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Xiaoshi Wu +6
cs.DB 2023-05-14 reviewed

Convex program over variances scales marginal query release to 100 attributes
ResidualPlanner+: a scalable matrix mechanism for marginals and beyond

Yingtai Xiao +5
cs.DB 2023-04-19 reviewed

Multi-master protocol raises geo-database throughput 7x
GeoGauss: Strongly Consistent and Light-Coordinated OLTP for Geo-Replicated SQL Database

Weixing Zhou +13
cs.DB 2023-03-09 reviewed

Direct access to aggregate query answers holds under same conditions
Direct Access for Answers to Conjunctive Queries with Aggregation

Idan Eldar +2
cs.DB 2023-01-19 reviewed

CRCW PRAMs evaluate queries in constant time with work O(T^{1+ε})
Work-Efficient Query Evaluation in Constant Time with PRAMs

Jens Keppeler +2