archive

Every paper Pith has read.

446 papers in cs.DB · page 7

cs.IR 2026-02-26 reviewed

Tree structure lifts document QA accuracy 6-61 percent
MoDora: Tree-Based Semi-Structured Document Analysis System

Bangrui Xu +10
cs.CR 2026-02-26 reviewed

DPSQL+ adds minimum frequency rule to private SQL queries
DPSQL+: A Differentially Private SQL Library with a Minimum Frequency Rule

Tomoya Matsumoto +3
cs.DB 2026-02-25 reviewed

Thresholds on modal operators decompose fuzzy contexts into independent subcontexts
Decomposition of contexts into independent subcontexts based on thresholds

Roberto G. Aragón +2
cs.DB 2026-02-25 reviewed

Fuzzy contexts split into independent subcontexts via lattice blocks
Independent subcontexts and blocks of concept lattices. Definitions and relationships to decompose fuzzy contexts

Roberto G. Aragón +2
cs.DB 2026-02-25 reviewed

Text-to-SQL benchmarks miss big-data cost penalties
Both Ends Count! Just How Good are LLM Agents at "Text-to-Big SQL"?

Germán T. Eizaguirre +2
cs.DB 2026-02-21 reviewed

SmartNIC offloads Parquet decoding to speed lake queries
Should I Hide My Duck in the Lake?

Jonas Dann +1
cs.CY 2026-02-19 reviewed

172 open datasets found in learning analytics papers
Open Datasets in Learning Analytics: Trends, Challenges, and Best PRACTICE

Valdemar Švábenský +3
cs.AI 2026-02-19 reviewed

Sonar-TS searches time series with SQL then verifies with code
Sonar-TS: Search-Then-Verify Natural Language Querying for Time Series Databases

Zhao Tan +7
cs.DB 2026-02-17 reviewed

Algorithm maps workflow nets to POWL models preserving behavior
Hierarchical Decomposition of Separable Workflow-Nets

Humam Kourani +2
cs.DB 2026-02-12 reviewed

Algorithm decides SPARQL pattern satisfiability on Façade-X
Towards a theory of Façade-X data access: satisfiability of SPARQL basic graph patterns

Luigi Asprino +1
cs.LG 2026-02-09 reviewed

Variable local prompts cut conflicts in federated vision learning
SDFed: Bridging Local Global Discrepancy via Subspace Refinement and Divergence Control in Federated Prompt Learning

Yicheng Di +4
cs.DC 2026-02-09 reviewed

Original papers outperform tutorials for system design mastery
The Computer System Trail

Sushant Kumar Gupta
cs.DB 2026-02-07 reviewed

KRONE turns flat logs into hierarchies for 10% better anomaly F1
KRONE: Scalable LLM-Augmented Log Anomaly Detection via Hierarchical Abstraction

Lei Ma +7
cs.AI 2026-02-02 reviewed

LLMs need intrinsic strategies for data insight agency
Hunt Instead of Wait: Evaluating Deep Data Research on Large Language Models

Wei Liu +4
cs.DC 2026-01-29 reviewed

Primary access hints speed Ethereum replay 25x
Ira: Efficient Transaction Replay for Distributed Systems

Adithya Bhat +2
cs.DB 2026-01-28 reviewed

Context packs raise LLM schema matching accuracy
ConStruM: A Structure-Guided LLM Framework for Context-Aware Schema Matching

Houming Chen +2
cs.IR 2026-01-26 reviewed

HyEm lets Euclidean indexes retrieve hyperbolic ontology embeddings
HyEm: Query-Adaptive Hyperbolic Retrieval for Biomedical Ontologies via Euclidean Vector Indexing

Ou Deng +3
cs.AI 2026-01-25 reviewed

Self-refinement and voting reach 86 percent SQL accuracy
LLM-Based SQL Generation: Prompting, Self-Refinement, and Adaptive Weighted Majority Voting

Yu-Jie Yang +2
cs.DB 2026-01-23 reviewed

New predict operator lets SQL run LLM calls inside the database
iPDB -- Optimizing Semantic SQL Queries

Udesh Kumarasinghe +4
cs.DB 2026-01-15 reviewed

Algorithm turns math database schemes into relational apps
Translating database mathematical schemes into relational database software applications with MatBase

Christian Mancas +1
cs.DB 2026-01-14 reviewed

Natural language turns into SQL for cross-domain data exploration
TiInsight: A SQL-based Automated Exploratory Data Analysis System through Large Language Models

Jun-Peng Zhu +12
cs.DB 2026-01-09 reviewed

Honeynet dataset logs 132k attacks across four Azure regions
Descriptor: Multi-Regional Cloud Honeypot Dataset (MURHCAD)

Enrique Feito-Casares +2
cs.DB 2026-01-09 reviewed

Online tool mines MLCS from sequences up to length 5000
OVT-MLCS: An Online Visual Tool for MLCS Mining from Long or Big Sequences

Zhi Wang +5
cs.DB 2026-01-08 reviewed

Temporal attribution tracks dataflow dependencies lightly over time
Toward Temporal Attribution Analytics in Dataflows

Chrysanthi Kosyfaki +3
cs.DB 2025-12-11 reviewed

PANDAExpress drops polylog factor from query runtime
PANDAExpress: a Simpler and Faster PANDA Algorithm

Mahmoud Abo Khamis +2
cs.DB 2025-12-07 reviewed

OSM+ delivers billion-vertex global road graph for city experiments
OSM+: Billion-Level OpenStreetMap Dataset for City-wide Experiments

Guanjie Zheng +8
cs.DB 2025-12-05 reviewed

LLMs auto-swapped for cheaper models on repeated tasks
Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement

Nils Strassenburg +2
cs.DB 2025-12-02 reviewed

PyTorch I/O tweaks yield 3x faster distributed GPU queries
PystachIO: Efficient Distributed GPU Query Processing with PyTorch over Fast Networks & Fast Storage

Jigao Luo +3
cs.DB 2025-12-01 reviewed

LLM agents repair 8771 invalid MOF database entries
LitMOF: An LLM Multi-Agent for Literature-Validated Metal-Organic Frameworks Database Correction and Expansion

Honghui Kim +2
cs.DB 2025-11-29 reviewed

Algorithm converts math data models to entity-relationship diagrams
MatBase algorithm for translating (E)MDM schemes into E-R data models

Christian Mancas +1
cs.DC 2025-11-27 reviewed

Tokenized context speeds edge LLM responses by up to 14%
DisCEdge: Distributed Context Management for Large Language Models at the Edge

Mohammadreza Malekabbasi +2
cs.AI 2025-11-27 reviewed

Neural query models match path counting after relaxation
Counting Still Counts: Understanding Neural Complex Query Answering Through Query Relaxation

Yannick Brunink +3
cs.DB 2025-11-26 reviewed

Hybrid index cuts tail latency 98% under mixed workloads
HIRE: A Hybrid Learned Index for Robust and Efficient Performance under Mixed Workloads

Xinyi Zhang +3
cs.DB 2025-11-19 reviewed

Answer-set programs compute sufficient explanations for database queries
Sufficient Explanations in Databases and their Connections to Database Repairs

Leopoldo Bertossi +1
cs.PL 2025-11-19 reviewed

Compiler derives pruning rules for tree queries
Bonsai: Compiling Queries to Pruned Tree Traversals

Alexander J Root +6
cs.DB 2025-11-18 reviewed

Gradient descent finds join orders with cost matching or beating discrete search
Gradient-Based Join Ordering

Tim Schwabe +1
cs.DB 2025-11-12 reviewed

GNN-PE scales exact subgraph matching to distributed clusters
Efficient Distributed Exact Subgraph Matching via GNN-PE: Load Balancing, Cache Optimization, and Query Plan Ranking

Yu Wang +3
cs.DB 2025-11-10 reviewed

AISQL speeds semantic queries 2-70x via cost-aware planning and cascades
Cortex AISQL: A Production SQL Engine for Unstructured Data

Paweł Liskowski +13
cs.DS 2025-10-31 reviewed

Learned models break entropy barrier in static functions
Learned Static Function Data Structures

Stefan Hermann +3
cs.DB 2025-10-29 reviewed

DGAI separates vectors from graphs for 8x faster ANN updates
DGAI: Decoupled On-Disk Graph-Based ANN Index for Efficient Updates and Queries

Jiahao Lou +8
cs.DB 2025-10-20 reviewed

Orchestration reaches 89.8% Text-to-SQL accuracy on Spider
DeepEye-SQL: A Software-Engineering-Inspired Text-to-SQL Framework

Boyan Li +4
cs.CL 2025-10-12 reviewed

Database feedback and memory improve multi-turn SQL accuracy
MTSQL-R1: Towards Long-Horizon Multi-Turn Text-to-SQL via Agentic Training

Taicheng Guo +6
cs.DB 2025-10-11 reviewed

Algorithms mine low-utility sequences faster with less memory
Efficient Mining of Low-Utility Sequential Patterns

Jian Zhu +3
cs.DB 2025-09-18 reviewed

Partial orders from event data yield sound process models
Revealing Inherent Concurrency in Event Data: A Partial Order Approach to Process Discovery

Humam Kourani +2
cs.DB 2025-09-16 reviewed

ScaleDoc filters 85% of LLM calls on large document sets
ScaleDoc: Scaling LLM-based Predicates over Large Document Collections

Hengrui Zhang +3
cs.DB 2025-09-12 reviewed

TurtleKV adapts key-value stores to shifting read and write demands
Dynamic read & write optimization with TurtleKV

Tony Astolfi +5
cs.HC 2025-09-05 reviewed

Users cannot tell acted idle animations from genuine ones
Evaluating Idle Animation Believability: a User Perspective

Eneko Atxa Landa +4
cs.DB 2025-08-30 reviewed

Optimizer picks best LLM sort paths at runtime
Access Paths for Efficient Ordering with Large Language Models

Fuheng Zhao +9
cs.DB 2025-08-07 reviewed

Versioned views and when-then rules make prompts adaptive in LLM pipelines
Making Prompts First-Class Citizens for Adaptive LLM Pipelines

Ugur Cetintemel +5
cs.LG 2025-08-05 reviewed

Merged T1D data yields 149 million glucose readings from 2510 subjects
Presenting DiaData for Research on Type 1 Diabetes

Beyza Cinar +1