pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

446 papers in cs.DB · page 2

  1. cs.DL 2026-05-13 reviewed
    Graph links 200k research repos to papers and artifacts

    SemRepo: A Knowledge Graph for Research Software and Its Scholarly Ecosystem

    Abdul Rafay +3

  2. cs.DB 2026-05-13 reviewed
    Benchmark shows top multimodal models lag on e-commerce

    OxyEcomBench: Benchmarking Multimodal Foundation Models across E-Commerce Ecosystems

    Yong Liu +7

  3. cs.CV 2026-05-12 reviewed
    3D primitives in code raise VLM spatial scores up to 17 percent

    3D Primitives are a Spatial Language for VLMs

    Junze Liu +10

  4. eess.SP 2026-05-12 reviewed
    Commercial 5G dataset aids AI handover and beam management

    Enabling AI-Native Mobility in 6G: A Real-World Dataset for Handover, Beam Management, and Timing Advance

    Mannam Veera Narayana +3

  5. cs.DB 2026-05-12 reviewed
    Chase termination undecidable even for decidable queries

    Will My Favorite Chases Terminate if Evaluating Conjunctive Queries Does? One Does Not Simply Decide This

    Lucas Larroque +1

  6. cs.DB 2026-05-12 reviewed
    Separating instances pick correct NL2SQL candidate

    Data-aware candidate selection in NL2SQL translation via small separating instances

    Stanislav Kikot +2

  7. cs.IR 2026-05-12 reviewed
    BatchBench framework equalizes autoscaling policy tests

    BatchBench: Toward a Workload-Aware Benchmark for Autoscaling Policies in Big Data Batch Processing -- A Proposed Framework

    Venkata Krishna Prasanth Budigi +1

  8. cs.DB 2026-05-12 reviewed
    Graph queries for optimization reveal hidden data flaws

    Graph-Grounded Optimization: Rao-Family Metaheuristics, Classical OR, and SLM-Driven Formulation over Knowledge Graphs

    Madhulatha Mandarapu (samyama.ai) +1

  9. cs.DB 2026-05-12 reviewed
    Knowledge graphs source optimization problems via queries

    Graph-Grounded Optimization: Rao-Family Metaheuristics, Classical OR, and SLM-Driven Formulation over Knowledge Graphs

    Madhulatha Mandarapu (samyama.ai) +1

  10. cs.DB 2026-05-12 reviewed
    Replicas detect and repair database corruption without stopping work

    PROTECT-DB: Protecting Data using Replicated State Machines: Efficient Corruption Detection & Recovery

    Anant Utgikar +1

  11. cs.AI 2026-05-12 reviewed
    LLMs cannot always be correct

    A CAP-like Trilemma for Large Language Models: Correctness, Non-bias, and Utility under Semantic Underdetermination

    Vinu Ellampallil Venugopal

  12. cs.LG 2026-05-12 reviewed
    Benchmark with 40 epidemic datasets enables fair model comparisons

    EpiCastBench: Datasets and Benchmarks for Multivariate Epidemic Forecasting

    Madhurima Panja +4

  13. cs.LG 2026-05-12 reviewed
    Relational signals lift membership inference on tabular diffusion models

    FERMI: Exploiting Relations for Membership Inference Against Tabular Diffusion Models

    Abtin Mahyar +3

  14. cs.DB 2026-05-11 reviewed
    SHACL-DS validates named graphs faster than standard SHACL

    Keeping track of errors: A study of SHACL-DS for RDF dataset validation on the ERA RINF Knowledge Graph

    Davan Chiem Dao +2

  15. cs.DB 2026-05-11 reviewed
    Single GPU kernel fuses IO and query steps for faster analytics

    Data Path Fusion in GPU for Analytical Query Processing

    Tsuyoshi Ozawa +1

  16. cs.DB 2026-05-11 reviewed
    Text2Cypher must reason across multiple graph databases

    Toward Multi-Database Query Reasoning for Text2Cypher

    Makbule Gulcin Ozsoy

  17. cs.AI 2026-05-11 reviewed
    Autonomous objects resolve over half of scientific data conflicts

    Autonomous FAIR Digital Objects: From Passive Assertions to Active Knowledge

    Zeyd Boukhers +3

  18. cs.DB 2026-05-11 reviewed
    Cloud GPUs speed graph index construction by 9x at 6x lower cost

    ScaleGANN: Accelerate Large-Scale ANN Indexing by Cost-effective Cloud GPUs

    Lan Lu +7

  19. cs.IR 2026-05-11 reviewed
    Graph of codecs compresses data smaller and faster

    OpenZL: Using Graphs to Compress Smaller and Faster

    Yann Collet +12

  20. cs.CL 2026-05-10 reviewed
    Home activity benchmark shows AI question-answering gaps

    HOME-KGQA: A Benchmark Dataset for Multimodal Knowledge Graph Question Answering on Household Daily Activities

    Shusaku Egami +7

  21. cs.DB 2026-05-09 reviewed
    Krone decomposes logs into entity-action-status units for modular anomaly detection

    Detect, Localize, and Explain: Interactive Hierarchical Log Anomaly Analytics with LLM Augmentation

    Lei Ma +7

  22. cs.AI 2026-05-09 reviewed
    One-to-one matching boosts ontology alignment precision

    Open Ontologies: Tool-Augmented Ontology Engineering with Stable Matching Alignment

    Fabio Rovai

  23. cs.LG 2026-05-09 reviewed
    Urine biomarkers add reliable signal to chlamydia risk models

    Machine Learning-Based Pre-Test Risk Stratification for PCR-Confirmed Chlamydia Using Patient-Reported Data and Urine Biomarkers

    Mehrab Mahdian +3

  24. cs.DB 2026-05-09 reviewed
    Personalized privacy cuts infinite stream estimation error by 53.6%

    Personalized w-Event Privacy for Infinite Stream Estimation

    Leilei Du +6

  25. cs.AI 2026-05-09 reviewed
    Diagnosis consistency links to actual causality for AI explanations

    Reconciling Consistency-Based Diagnosis with Actual-Causality-Based Explanations

    Leopoldo Bertossi

  26. cs.DB 2026-05-09 reviewed
    LLMs fall short on natural language data prep tasks

    PrepBench: How Far Are We from Natural-Language-Driven Data Preparation?

    Jingzhe Xu +3

  27. cs.DB 2026-05-09 reviewed
    Elastic scheduling meets stream deadlines at lowest cost

    Elastic Scheduling of Intermittent Query Processing in a Cluster Environment

    Saranya Chandrasekaran +1

  28. cs.DB 2026-05-09 reviewed
    Elastic batch scheduling meets deadlines at lower cost

    Elastic Scheduling of Intermittent Query Processing in a Cluster Environment

    Saranya Chandrasekaran +1

  29. cs.DB 2026-05-08 reviewed
    Heavy-light partitioning maintains arbitrary joins under updates

    Maintaining Queries under Updates Using Heavy-Light Partitioning of the Input Relations

    Mahmoud Abo-Khamis +4

  30. cs.DB 2026-05-07 reviewed
    SkipDisk hits 63% HNSW latency at 20% memory

    Low-Latency Out-of-Core ANN Search in High-Dimensional Space

    Ziwen Song +3

  31. cs.DB 2026-05-07 reviewed
    Query rewrite rules written once deploy across database engines

    An Extensible and Verifiable Language for Query Rewrite Rules

    Sicheng Pan +5

  32. cs.DB 2026-05-07 reviewed
    Every query reduces to Filter

    Anatomy of a Query: W5H Dimensions and FAR Patterns for Text-to-SQL Evaluation

    Vicki Stover Hertzberg +2

  33. cond-mat.mtrl-sci 2026-05-06 reviewed
    Diversity selection builds versatile materials datasets

    Building informative materials datasets beyond targeted objectives

    Rafael Espinosa Castañeda +8

  34. cs.DB 2026-05-06 reviewed
    Caching cuts redundant CBO calls in cost-based query rewrite

    Efficient Cost-Based Rewrite in a Bottom-Up Optimizer

    Qi Cheng +6

  35. cs.LG 2026-05-06 reviewed
    Only solution concentration ranks consistently across electrospinning ML models

    Cross-Model Consistency of Feature Importance in Electrospinning: Separating Robust from Model-Dependent Features

    Mehrab Mahdian +2

  36. cs.LG 2026-05-06 reviewed
    Concentration alone has zero rank variability in electrospinning models

    Cross-Model Consistency of Feature Importance in Electrospinning: Separating Robust from Model-Dependent Features

    Mehrab Mahdian +2

  37. cs.DB 2026-05-06 reviewed
    Hierarchical agents clean messy time series without ground truth

    AegisTS: A Hierarchical Agent System with Reinforcement Learning for Multivariate Time Series Data Cleaning

    Yuhan Shi +4

  38. cs.LG 2026-05-05 reviewed
    Fused soil dataset pretrains model to capture real processes

    LUCAS-MEGA: A Large-Scale Multimodal Dataset for Representation Learning in Soil-Environment Systems

    Kuangdai Leng +3

  39. cs.LG 2026-05-05 reviewed
    Fused soil dataset pretrains representations matching real processes

    LUCAS-MEGA: A Large-Scale Multimodal Dataset for Representation Learning in Soil-Environment Systems

    Kuangdai Leng +3

  40. cs.DB 2026-05-05 reviewed
    Database repairs match preferred extensions in SETAFs

    Inconsistent Databases and Argumentation Frameworks with Collective Attacks

    Yasir Mahmood +3

  41. cs.DB 2026-05-05 reviewed
    ConRAD introduces a framework that applies conformal risk control inside neural graph…

    ConRAD: Conformal Risk-Aware Neural Databases

    Sonia Horchidan +6

  42. cs.DB 2026-05-05 reviewed
    Sliced kd-trees speed up multi-dimensional queries in memory

    In-memory Multidimensional Indexing Using the skd-tree

    Achilleas Michalopoulos +2

  43. cs.AI 2026-05-05 reviewed
    AI agents average 45 percent on workspace tasks with 20k files

    Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

    Zirui Tang +21

  44. cs.AI 2026-05-05 reviewed
    AI agents top out at 60% on workspace file dependency tasks

    Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

    Zirui Tang +21

  45. cs.AI 2026-05-05 reviewed
    Agents reach 68.7% on workspace tasks with big file sets

    Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

    Zirui Tang +21

  46. cs.AI 2026-05-05 reviewed
    Agents hit 43% average on realistic workspace tasks

    Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

    Zirui Tang +21

  47. cs.DB 2026-05-05 reviewed
    3B model hits 85% Text-to-SQL accuracy using fine-grained rewards

    FINER-SQL: Boosting Small Language Models for Text-to-SQL

    Thanh Dat Hoang +6

  48. cs.SE 2026-05-05 reviewed
    AI models recover semantics from legacy database code

    Semantic Reverse Engineering Legacy Software Applications with ChatGPT, Gemini AI, and Claude AI

    Christian Mancas +1

  49. cs.IR 2026-05-05 reviewed
    Unified PostgreSQL layer cuts RAG latency up to 92%

    Beyond Similarity Search: A Unified Data Layer for Production RAG Systems

    Venkata Krishna Prasanth Budigi +1

  50. cs.DB 2026-05-04 reviewed
    Static checker catches JDBC type errors before runtime

    Static Type Checking for Database Access Code

    Thomas James Kirz +3