archive

Every paper Pith has read.

446 papers in cs.DB · page 8

  1. cs.CL 2025-07-24 reviewed
    Wikipedia tables contradict across languages

    Factual Inconsistencies in Multilingual Wikipedia Tables

    Silvia Cappa +5

  2. cs.DB 2025-07-18 reviewed
    Meta-learning enables real-time self-healing in databases

    Efficient and Scalable Self-Healing Databases Using Meta-Learning and Dependency-Driven Recovery

    Joydeep Chandra +1

  3. cs.DB 2025-07-08 reviewed
    Serverless functions scale big spatiotemporal queries via parallel subqueries

    Towards Serverless Processing of Spatiotemporal Big Data Queries

    Diana Baumann +2

  4. cs.IR 2025-06-27 reviewed
    LLMs let researchers query MIMIC-IV in plain English

    M3: Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis

    Rafi Al Attrach +5

  5. cs.LG 2025-06-23 reviewed
    Contaminated synthetic data beats clean synthetic data for ML training

    PuckTrick: A Library for Making Synthetic Data More Realistic

    Alessandra Agostini +2

  6. cs.DB 2025-06-19 reviewed
    Localized subsets raise imputation accuracy in text tables

    LDI: Localized Data Imputation for Text-Rich Tables

    Soroush Omidvartehrani +1

  7. cs.CR 2025-06-03 reviewed
    Hermes packs aggregates into ciphertexts for constant-time FHE queries

    Hermes: Efficient Global Homomorphic Aggregation over Mutable Packed Ciphertexts

    Dongfang Zhao

  8. cs.MA 2025-05-23 reviewed
    SCT framework yields consistent LLM agents for stakeholder views

    Persona Alchemy: Designing, Evaluating, and Implementing Psychologically-Grounded LLM Agents for Diverse Stakeholder Representation

    Sola Kim +2

  9. cs.DB 2025-05-22 reviewed
    LSM-VEC cuts memory 66% for dynamic billion-scale vector search

    LSM-VEC: A Large-Scale Disk-Based System for Dynamic Vector Search

    Shurui Zhong +2

  10. cs.CY 2025-05-21 reviewed
    Model classifies personal data for GDPR-compliant reuse

    Enabling the Reuse of Personal Data in Research: A Classification Model for Legal Compliance

    Eduard Mata i Noguera +2

  11. cs.CL 2025-05-20 reviewed
    AI models classify food processing levels from nutrient and text data

    Informatics for Food Processing

    Gordana Ispirova +2

  12. cs.DB 2025-05-07 reviewed
    MojoFrame reaches 4.6x speedup on TPC-H queries

    MojoFrame: Dataframe Library in Mojo Language

    Shengya Huang +3

  13. cs.DB 2025-04-28 reviewed
    Multi-vector index tuning cuts search latency up to 8 times

    MINT: Multi-Vector Search Index Tuning

    Jiongli Zhu +5

  14. cs.LG 2025-04-28 reviewed
    TurboQuant hits near-optimal quantization within 2.7x bound

    TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

    Amir Zandieh +3

  15. cs.DB 2025-04-28 reviewed
    vMODB unifies events and data for ACID microservices

    vMODB: Unifying Event and Data Management for Distributed Asynchronous Applications

    Rodrigo Laigner +1

  16. cs.DB 2025-04-26 reviewed
    Learned index speeds up distributed spatial queries

    LiLIS: A Lightweight Distributed Learned Index Framework for Spatial Decision Analysis

    Zhongpu Chen +2

  17. cs.CL 2025-04-24 reviewed
    Knowledge graph retrieval lifts LLM cell annotation scores

    ReCellTy: Domain-Specific Knowledge Graph Retrieval-Augmented LLMs Reasoning Workflow for Single-Cell Annotation

    Dezheng Han +5

  18. cs.DB 2025-04-23 reviewed
    Compass lifts compound AI goodput 2.4 to 5.1 times

    Compass: SLO-aware Query Planner for Compound AI Serving at Scale

    Banruo Liu +4

  19. cs.LG 2025-04-21 reviewed
    Partial rewards let 14B model top 400B+ ones on Text2SQL

    Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL

    Simone Papicchio +3

  20. cs.DB 2025-04-01 reviewed
    FlockMTL adds LLM functions and RAG to DuckDB

    Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB

    Anas Dorbani +3

  21. cs.DB 2025-03-18 reviewed
    Query syntax decides when database scores align

    Causality-Based Scores Alignment in Explainable Data Management

    Felipe Azua +1

  22. cs.DB 2025-03-18 reviewed
    NeurBench adds drift factor to benchmark learned DB components

    NeurBench: A Benchmark Suite for Learned Database Components with Drift Modeling

    Zhanhao Zhao +7

  23. cs.IR 2025-03-06 reviewed
    New graph RAG combinations beat prior leaders on QA tasks

    In-depth Analysis of Graph-based RAG in a Unified Framework

    Yingli Zhou +10

  24. cs.CL 2025-02-18 reviewed
    Knapsack method lets 1.6B model beat larger LLMs at schema linking

    Knapsack Optimization-based Schema Linking for LLM-based Text-to-SQL Generation

    Zheng Yuan +6

  25. cs.LO 2025-01-25 reviewed
    Satisfiability of GNTC is 2ExpTime-complete

    Guarded Negation Transitive Closure Logic

    Diego Figueira +2

  26. cs.LO 2025-01-20 reviewed
    UCPDL+ matches UNFO* and contains ICPDL plus CQ

    A Common Ancestor of PDL, Conjunctive Queries, and Unary Negation First-order

    Diego Figueira +1

  27. cs.LG 2025-01-01 reviewed
    Benchmark finds no GNN wins on both accuracy and speed for graphs

    OpenGLT: A Comprehensive Benchmark of Graph Neural Networks for Graph-Level Tasks

    Haoyang Li +6

  28. cs.DB 2024-12-12 reviewed
    Model repository reuses ER classifiers across sources with few labels

    Efficient Model Repository for Entity Resolution: Construction, Search, and Integration

    Victor Christen +1

  29. cs.AI 2024-12-12 reviewed
    Transformed dependencies let chase answer queries without full model

    Goal-Driven Query Answering over First- and Second-Order Dependencies with Equality

    Efthymia Tsamoura +1

  30. cs.DB 2024-11-30 reviewed
    The paper proposes four operations—drill-down

    Advancing Object-Centric Process Mining with Multi-Dimensional Data Operations

    Shahrzad Khayatbashi +2

  31. cs.LO 2024-11-15 reviewed
    Algorithm finds minimal-width positive first-order sentence via rewriting

    Optimally Rewriting Formulas and Database Queries: A Confluence of Term Rewriting, Structural Decomposition, and Complexity

    Hubie Chen +1

  32. cs.LG 2024-09-14 reviewed
    Matrix Profile leads on multidimensional anomaly detection

    Matrix Profile for Anomaly Detection on Multidimensional Time Series

    Chin-Chia Michael Yeh +13

  33. stat.ML 2024-09-11 reviewed
    Signed measures give selectivity models OOD generalization bounds

    A Practical Theory of Generalization in Selectivity Learning

    Peizhi Wu +3

  34. cs.DB 2024-09-09 reviewed
    Metadata-lake catalogs virtual data-lakes

    DatAasee -- A Metadata-Lake as Metadata Catalog for a Virtual Data-Lake

    Christian Himpe

  35. cs.CL 2024-09-03 reviewed
    Enterprise text-to-SQL benchmark shows 10.8% SOTA accuracy

    BEAVER: An Enterprise Benchmark for Text-to-SQL

    Peter Baile Chen +8

  36. cs.DB 2024-06-14 reviewed
    CycleTrajectory enriches GPS tracks with 5.64% matching error

    CycleTrajectory: An End-to-End Pipeline for Enriching and Analyzing GPS Trajectories to Understand Cycling Behavior and Environment

    Meihui Wang +3

  37. cs.DB 2024-06-11 reviewed
    Automated system finds data dependencies to speed queries 35%

    Enabling Data Dependency-based Query Optimization

    Daniel Lindner +2

  38. cs.DB 2024-06-08 reviewed
    Optimizations speed navigational graph queries by orders of magnitude

    Optimizing Navigational Graph Queries

    Thomas Mulder +2

  39. cs.LG 2024-05-27 reviewed
    CHESS hits 71.10% on BIRD text-to-SQL with 83% fewer LLM calls

    CHESS: Contextual Harnessing for Efficient SQL Synthesis

    Shayan Talaei +4

  40. cs.LG 2024-01-26 reviewed
    Graph metrics fix entity clusters on data with duplicates

    Graph-based Active Learning for Entity Cluster Repair

    Victor Christen +4

  41. cs.DS 2023-12-30 reviewed
    Parallel two-stage method scales symbolic time series approximation

    Parallel Two-Stage Approach for Joint Symbolic Approximation of Time Series

    Xinye Chen

  42. cs.DB 2023-12-14 reviewed
    Algorithm answers spatial queries with added connectivity rules

    QQESPM: A Quantitative and Qualitative Spatial Pattern Matching Algorithm

    Carlos Minervino +3

  43. cs.DB 2023-10-19 reviewed
    New indicator ties privacy parameter choice to real dataset risks

    Within-Dataset Disclosure Risk for Differential Privacy

    Zhiru Zhu +1

  44. cs.LG 2023-10-04 reviewed
    Auto-FP solved by modeling as HPO or NAS problem

    Auto-FP: An Experimental Study of Automated Feature Preprocessing for Tabular Data

    Danrui Qi +3

  45. cs.DB 2023-08-08 reviewed
    Graph of whaling routes helps normalize catch-based population maps

    WhaleVis: Visualizing the History of Commercial Whaling

    Ameya Patil +3

  46. cs.CV 2023-06-15 reviewed
    Fine-tuned CLIP matches human tastes for AI images better than before

    Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

    Xiaoshi Wu +6

  47. cs.DB 2023-05-14 reviewed
    Convex program over variances scales marginal query release to 100 attributes

    ResidualPlanner+: a scalable matrix mechanism for marginals and beyond

    Yingtai Xiao +5

  48. cs.DB 2023-04-19 reviewed
    Multi-master protocol raises geo-database throughput 7x

    GeoGauss: Strongly Consistent and Light-Coordinated OLTP for Geo-Replicated SQL Database

    Weixing Zhou +13

  49. cs.DB 2023-03-09 reviewed
    Direct access to aggregate query answers holds under same conditions

    Direct Access for Answers to Conjunctive Queries with Aggregation

    Idan Eldar +2

  50. cs.DB 2023-01-19 reviewed
    CRCW PRAMs evaluate queries in constant time with work O(T^{1+ε})

    Work-Efficient Query Evaluation in Constant Time with PRAMs

    Jens Keppeler +2