pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

446 papers in cs.DB · page 5

  1. cs.DC 2026-04-15 reviewed
    OffloadFS moves database compaction to storage nodes for 3.36x speedup

    OffloadFS: Leveraging Disaggregated Storage for Computation Offloading

    Sungho Moon +6

  2. cs.CL 2026-04-15 reviewed
    Benchmark reveals 9% drop for Indic languages in Text-to-SQL

    IndicDB -- Benchmarking Multilingual Text-to-SQL Capabilities in Indian Languages

    Aviral Dawar +3

  3. cs.DB 2026-04-15 reviewed
    Self-healing LLM loop boosts NL-to-SQL accuracy up to 9 points

    SQL Query Engine: A Self-Healing LLM Pipeline for Natural Language to PostgreSQL Translation

    Muhammad Adeel Ijaz

  4. cs.DC 2026-04-14 reviewed
    DySkew cuts UDF skew delays with runtime data swaps

    DySkew: Dynamic Data Redistribution for Skew-Resilient Snowpark UDF Execution

    Chenwei Xie +10

  5. cs.LG 2026-04-14 reviewed
    Log anomalies detected directly on compressed bytes

    CLAD: Efficient Log Anomaly Detection Directly on Compressed Representations

    Benzhao Tang +1

  6. cs.DB 2026-04-14 reviewed
    ROSE judges NL2SQL by user intent not reference SQL match

    ROSE: An Intent-Centered Evaluation Metric for NL2SQL

    Wenqi Pei +5

  7. cs.RO 2026-04-14 reviewed
    Panoramic 3D datasets reach 96% place categorization accuracy

    Multi-modal panoramic 3D outdoor datasets for place categorization

    Hojung Jung +4

  8. cs.DB 2026-04-14 reviewed
    Workflow builds reproducible 582k-paper chemistry corpus

    Lit2Vec: A Reproducible Workflow for Building a Legally Screened Chemistry Corpus from S2ORC for Downstream Retrieval and Text Mining

    Mahmoud Amiri +3

  9. cs.CR 2026-04-14 reviewed
    Three verification layers catch outsourced anonymization errors

    VeriX-Anon: A Multi-Layered Framework for Mathematically Verifiable Outsourced Target-Driven Data Anonymization

    Miit Daga +1

  10. cs.DB 2026-04-13 reviewed
    Benchmark reveals accuracy and efficiency gaps in NL2SQL methods

    NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions

    Shizheng Hou +5

  11. cs.DB 2026-04-13 reviewed
    GraphAlg Core simulated in matrix language with induction

    Foundations of the GraphAlg Language

    Daan de Graaf +2

  12. cs.DB 2026-04-13 reviewed
    Ozone unifies four traffic datasets to cut experiment setup time 85%

    Ozone: A Unified Platform for Transportation Research

    Ou Zheng +13

  13. cs.DB 2026-04-13 reviewed
    Ozone unifies traffic datasets to cut setup time 85%

    Ozone: A Unified Platform for Transportation Research

    Ou Zheng +13

  14. cs.DB 2026-04-12 reviewed
    NL queries must sometimes invent their own data target

    Natural Language to What? A Vision for Intermediate Representations in NL-to-X Querying

    Shengqi Li +1

  15. cs.DB 2026-04-12 reviewed
    Fine-grained GPU model scales subgraph matching to larger queries

    gMatch: Fine-Grained and Hardware-Efficient Subgraph Matching on GPUs

    Weitian Chen +5

  16. cs.AI 2026-04-11 reviewed
    Knowledge graph unifies AI artifact management across platforms

    Gypscie: A Cross-Platform AI Artifact Management System

    Fabio Porto +6

  17. cs.CL 2026-04-11 reviewed
    Benchmark lets dialogue clarify ambiguous table questions

    ODUTQA-MDC: A Task for Open-Domain Underspecified Tabular QA with Multi-turn Dialogue-based Clarification

    Zhensheng Wang +5

  18. cs.CV 2026-04-11 reviewed
    New benchmark and alignment model raise VCOD performance on motion-heavy video

    YUV20K: A Complexity-Driven Benchmark and Trajectory-Aware Alignment Model for Video Camouflaged Object Detection

    Yiyu Liu +3

  19. cs.DB 2026-04-10 reviewed
    Dynamic programming chooses semantic filter positions to cut hybrid query costs

    PLOP: Cost-Based Placement of Semantic Operators in Hybrid Query Plans

    Qiuyang Mang +7

  20. cs.DB 2026-04-10 reviewed
    Catalog defines 35 data error types in three categories

    A Catalog of Data Errors

    Divya Bhadauria +4

  21. cs.DB 2026-04-10 reviewed
    Decoupling vectors from indexes cuts storage by up to 58%

    Decoupling Vector Data and Index Storage for Space Efficiency

    Yuanming Ren +5

  22. cs.DB 2026-04-10 reviewed
    Decoupling vectors from indexes cuts storage by up to 59%

    Decoupling Vector Data and Index Storage for Space Efficiency

    Yuanming Ren +5

  23. cs.DB 2026-04-10 reviewed
    Proprietary tools top data quality metrics and LLM features

    Evaluating Data Quality Tools: Measurement Capabilities and LLM Integration

    Tobias Rehberger +3

  24. cs.CL 2026-04-10 reviewed
    Constraint solver matches patients to 32-72% more trials

    SatIR: Scalable High-Recall Constraint-Satisfaction-Based Information Retrieval for Clinical Trials Matching

    Cyrus Zhou +5

  25. cs.DB 2026-04-09 reviewed
    Constraint-guided LLMs generate graph queries with 31.6% F1 gains

    Graph Query Generation with Constraint-guided Large Language Agents

    Mengying Wang +5

  26. cs.MM 2026-04-09 reviewed
    Fine-tuned LLMs translate QoS to QoE and back with strong accuracy

    QoS-QoE Translation with Large Language Model

    Yingjie Yu +5

  27. cs.DS 2026-04-09 reviewed
    Color coding counts hypergraphlets faster on (α,β)-nice hypergraphs

    Counting HyperGraphlets via Color Coding: a Quadratic Barrier and How to Break It

    Marco Bressan +2

  28. cs.DB 2026-04-09 reviewed
    Dynamic graph method speeds up large language model training

    GRACE: A Dynamic Coreset Selection Framework for Large Language Model Optimization

    Tianhao Tang +2

  29. cs.DB 2026-04-09 reviewed
    FK graph traversal yields diverse SQL workloads for optimizer training

    SynQL: A Controllable and Scalable Rule-Based Framework for SQL Workload Synthesis for Performance Benchmarking

    Kahan Mehta +1

  30. cs.CR 2026-04-08 reviewed
    PostRI gives DP medians higher utility with post-release error intervals

    Interpreting the Error of Differentially Private Median Queries through Randomization Intervals

    Thomas Humphries +4

  31. cs.DB 2026-04-08 reviewed
    Agent views help AI write complex SQL queries

    AV-SQL: Decomposing Complex Text-to-SQL Queries with Agentic Views

    Minh Tam Pham +5

  32. cs.CR 2026-04-08 reviewed
    VulGD builds dynamic graph of vulnerabilities with LLM embeddings

    VulGD: A LLM-Powered Dynamic Open-Access Vulnerability Graph Database

    Luat Do +3

  33. cs.DB 2026-04-08 reviewed
    Small models outperform rules and LLMs in SQL query rewriting

    LASER: A Data-Centric Method for Low-Cost and Efficient SQL Rewriting based on SQL-GRPO

    Jiahui Li +5

  34. cs.CL 2026-04-08 reviewed
    LLMs vary SQL structure even when results match

    SQLStructEval: Structural Evaluation of LLM Text-to-SQL Generation

    Yixi Zhou +6

  35. cs.DB 2026-04-08 reviewed
    CubeGraph stitches per-cell vector graphs for fast hybrid spatial search

    CubeGraph: Efficient Retrieval-Augmented Generation for Spatial and Temporal Data

    Mingyu Yang +2

  36. cs.DB 2026-04-08 reviewed
    Non-forking chains cut Verkle Trie storage by 97.8%

    SonicDB S6: A Storage-Efficient Verkle Trie for High-Throughput Blockchains

    Luigi Crisci +3

  37. cs.DB 2026-04-08 reviewed
    Verkle Trie storage slashed 98% for 300ms blocks

    SonicDB S6: A Storage-Efficient Verkle Trie for High-Throughput Blockchains

    Luigi Crisci +3

  38. cs.DB 2026-04-08 reviewed
    Co-evolved evaluators find 6.8x faster database algorithms

    AI-Driven Research for Databases

    Audrey Cheng +7

  39. cs.DB 2026-04-07 reviewed
    Bayesian net models missingness to create probabilistic DB for queries

    Database Querying under Missing Values Governed by Missingness Mechanisms

    Leopoldo Bertossi +2

  40. cs.AI 2026-04-07 reviewed
    Toolkit pairs Python pipelines with AI chat for data harmonization

    BDI-Kit Demo: A Toolkit for Programmable and Conversational Data Harmonization

    Roque Lopez +3

  41. cs.SE 2026-04-07 reviewed
    ICT models become useful when treated as traceable open graphs

    All LCA models are wrong. Are some of them useful? Towards open computational LCA in ICT

    Vincent Corlay +8

  42. cs.DB 2026-04-07 reviewed
    Equivalence proofs compose correct database stores

    CobbleDB: Modelling Levelled Storage by Composition

    Emilie Ma (UBC) +3

  43. cs.CR 2026-04-07 reviewed
    Few central vectors poison nearly all top-k results

    Can You Trust the Vectors in Your Vector Database? Black-Hole Attack from Embedding Space Defects

    Hanxi Li +7

  44. cs.DB 2026-04-07 reviewed
    Spatiotemporal warehouse lifts entity extraction F1 by 4.37%

    STIndex: A Context-Aware Multi-Dimensional Spatiotemporal Information Extraction System

    Wenxiao Zhang +7

  45. cs.DB 2026-04-06 reviewed
    PANDA derives optimal query plans from information bounds

    Query Optimization and Evaluation via Information Theory: A Tutorial

    Mahmoud Abo Khamis +2

  46. cs.DB 2026-04-06 reviewed
    Adaptive probing estimates high-dim similarity cardinalities

    Cardinality Estimation for High Dimensional Similarity Queries with Adaptive Bucket Probing

    Zhonghan Chen +3

  47. cs.DB 2026-04-04 reviewed
    LLM operators bring semantic processing to text streams

    VectraFlow: Long-Horizon Semantic Processing over Data and Event Streams with LLMs

    Shu Chen +3

  48. cs.DB 2026-04-04 reviewed
    LLM turns changing web pages into verified JSON via embedding checks

    Method for Aggregating Unstructured Data Using Large Language Models

    Vsevolod Lazebnyi +3

  49. cs.DC 2026-04-03 reviewed
    Pessimistic sync cuts redundant I/Os in disaggregated KV stores

    CIDER: Boosting Memory-Disaggregated Key-Value Stores with Pessimistic Synchronization

    Yuxuan Du +4

  50. cs.DB 2026-04-03 reviewed
    Workshop maps LLM-graph integration for data systems

    LLM+Graph@VLDB'2025 Workshop Summary

    Yixiang Fang +4