archive

Every paper Pith has read.

446 papers in cs.DB · page 6

  1. cs.DB 2026-04-03 reviewed
    Native graph index cuts multi-vector search latency up to 14 times

    Unified and Efficient Approach for Multi-Vector Similarity Search

    Binhan Yang +6

  2. cs.DB 2026-04-03 reviewed
    DCO shortcuts unstable in vector search benchmarks

    Distance Comparison Operations Are Not Silver Bullets in Vector Similarity Search: A Benchmark Study on Their Merits and Limits

    Zhuanglin Zheng +5

  3. cs.DB 2026-04-03 reviewed
    Clustering gives LLMs full dataset context for semantic tasks

    Semantic Data Processing with Holistic Data Understanding

    Youran Sun +4

  4. cs.DB 2026-04-02 reviewed
    ReCAP makes relational DBs up to 400000x faster on constrained path queries

    Efficient Path Query Processing in Relational Database Systems

    Diego Rivera Correa +1

  5. cs.DB 2026-04-02 reviewed
    Hybrid query system cuts cost while raising accuracy on mixed structured-text tables

    OmniTQA: A Cost-Aware System for Hybrid Query Processing over Semi-Structured Data

    Nima Shahbazi +3

  6. cs.DB 2026-04-02 reviewed
    Bucket collector speeds large-k ANN search up to 3.8x

    BBC: Improving Large-k Approximate Nearest Neighbor Search with a Bucket-based Result Collector

    Ziqi Yin +4

  7. cs.CL 2026-04-02 reviewed
    Hybrid memory beats state-of-the-art LLM agent methods

    Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework

    Yanchen Wu +9

  8. cs.DB 2026-04-02 reviewed
    LLM tool adds database functions 34 percent more accurately

    Automating Database-Native Function Code Synthesis with LLMs

    Wei Zhou +6

  9. cs.DB 2026-03-31 reviewed
    Unifying 8,000 atomistic simulations into one queryable graph

    Ontology-based knowledge graph infrastructure for interoperable atomistic simulation data

    Abril Azocar Guzman +3

  10. cs.DB 2026-03-31 reviewed
    GPU bucketing delivers 240x faster hybrid searches

    GRAB-ANNS: High-Throughput Indexing and Hybrid Search via GPU-Native Bucketing

    Xinkui Zhao +5

  11. cs.DB 2026-03-30 reviewed
    Query focus cuts RAG response time by 40 percent

    QCFuse: Query-Centric Cache Fusion for Efficient RAG Inference

    Jianxin Yan +8

  12. cs.DB 2026-03-29 reviewed
    Platform structures electrospinning data including failures for predictive use

    Electrospinning-Data.org: A FAIR, Structured Knowledge Resource for Nanofiber Fabrication

    Mehrab Mahdian +2

  13. cs.DB 2026-03-29 reviewed
    Enzyme cuts daily pipeline compute by billions of CPU seconds

    Enzyme: Incremental View Maintenance for Data Engineering

    Ritwik Yadav +18

  14. cs.DB 2026-03-29 reviewed
    Streaming context cuts LLM first-token latency by up to 11x

    Stream2LLM: Overlap Context Streaming and Prefill for Reduced Time-to-First-Token (TTFT)

    Rajveer Bachkaniwala +4

  15. cs.DB 2026-03-29 reviewed
    Streaming overlaps cut LLM first response time by 11x

    Stream2LLM: Overlap Context Streaming and Prefill for Reduced Time-to-First-Token (TTFT)

    Rajveer Bachkaniwala +4

  16. cs.DB 2026-03-27 reviewed
    Ontology encodes EU Data Act rules for SPARQL compliance checks

    DAOnt: A Formal Ontology for EU Data Act Compliance

    Sheyla Leyva-Sánchez +4

  17. cs.DB 2026-03-26 reviewed
    Refutational normalization speeds up complete JSON schema checks

    JSON Schema Inclusion through Refutational Normalization: Reconciling Efficiency and Completeness

    Mohamed-Amine Baazizi +6

  18. cs.DB 2026-03-24 reviewed
    Survey maps NLIDB methods for spatial-temporal databases

    Natural Language Interfaces for Spatial and Temporal Databases: A Comprehensive Overview of Methods, Taxonomy, and Future Directions

    Samya Acharja +1

  19. cs.DB 2026-03-24 reviewed
    Value-based quadtree cuts spatial query time by 90%

    Spatial Analysis on Value-Based Quadtrees of Rasterized Vector Data

    Diana Baumann +3

  20. cs.DB 2026-03-24 reviewed
    Value-based quadtree cuts point-in-polygon latency by 90%

    Spatial Analysis on Value-Based Quadtrees of Rasterized Vector Data

    Diana Baumann +3

  21. cs.DB 2026-03-23 reviewed
    Embedding random tests inside the DBMS finds 23 bugs with higher true positives

    DIRT: Database-Integrated Random Testing

    Alperen Keles +3

  22. cs.RO 2026-03-18 reviewed
    Hybrid decoding speeds robot VLA models up to 2.45x

    HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness

    Zihao Zheng +10

  23. cs.DB 2026-03-17 reviewed
    Assumptions enable clear dynamic relationships in object event logs

    Detecting Dynamic Relationships in Object-Centric Event Logs

    Alessandro Gianola +5

  24. cs.DB 2026-03-17 reviewed
    Itemset mining groups cities by shared land use patterns

    Exploring Urban Land Use Patterns by Pattern Mining and Unsupervised Learning

    Zdena Dobesova +2

  25. cs.DB 2026-03-16 reviewed
    Proxy models cut AI query costs by over 100x

    100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models

    Yeounoh Chung +11

  26. cs.CL 2026-03-16 reviewed
    Fixes for one agent model improve 13 others across seven families

    Beyond Benchmark Islands: Toward Representative Trustworthiness Evaluation for Agentic AI

    Jinhu Qi +6

  27. cs.DB 2026-03-15 reviewed
    Catalog system converts natural language to PromQL queries in 1.1 seconds

    From Natural Language to PromQL: A Catalog-Driven Framework with Dynamic Temporal Resolution for Cloud-Native Observability

    Twinkll Sisodia

  28. cs.DS 2026-03-15 reviewed
    Dynamic counters deliver sublinear error for growing stream sketches

    Sublime: Sublinear Error & Space for Unbounded Skewed Streams

    Navid Eslami +3

  29. cs.DB 2026-03-13 reviewed
    Jaguar evaluates queries in N to the submodular width plus epsilon

    Jaguar: A Primal Algorithm for Conjunctive Query Evaluation in Submodular-Width Time

    Mahmoud Abo Khamis +1

  30. cs.DB 2026-03-13 reviewed
    DSL lets LLMs produce consistent sensor triggers

    A Domain-Specific Language for LLM-Driven Trigger Generation in Multimodal Data Collection

    Philipp Reis +5

  31. cs.DB 2026-03-13 reviewed
    One graph index covers every range for filtered ANN search

    RNSG: A Range-Aware Graph Index for Efficient Range-Filtered Approximate Nearest Neighbor Search

    Zhiqiu Zou +5

  32. cs.DB 2026-03-13 reviewed
    Optimal sampling for matrix, star, and chain join-project queries

    Towards Output-Optimal Uniform Sampling and Approximate Counting for Join-Project Queries

    Xiao Hu +1

  33. cs.DB 2026-03-12 reviewed
    Toolkit auto-generates standard APIs for materials datasets

    optimade-maker: Automated generation of interoperable materials APIs from static datasets

    Kristjan Eimre +7

  34. cs.DB 2026-03-12 reviewed
    Toolkit turns raw materials data into standard APIs

    optimade-maker: Automated generation of interoperable materials APIs from static datasets

    Kristjan Eimre +7

  35. cs.DB 2026-03-11 reviewed
    Self-evolved cycles reach 83.1% MongoDB query accuracy

    Draft-Refine-Optimize: Self-Evolved Learning for Natural Language to MongoDB Query Generation

    Mingwei Ye +5

  36. cs.DB 2026-03-10 reviewed
    Real-time terminology queries improve LLM metadata accuracy

    Automated Standardization of Legacy Biomedical Metadata Using an Ontology-Constrained LLM Agent

    Josef Hardi +5

  37. cs.DB 2026-03-10 reviewed
    GeoBenchr benchmarks spatiotemporal DBs on real workloads

    GeoBenchr: An Application-Centric Benchmarking Suite for Spatiotemporal Database Platforms

    Tim C. Rese +4

  38. cs.DB 2026-03-10 reviewed
    LLM index tuning outperforms DTA in some cases

    Evaluating the Practical Effectiveness of LLM-Driven Index Tuning with Microsoft Database Tuning Advisor

    Xiaoying Wang +3

  39. cs.DC 2026-03-06 reviewed
    OMA retains Kubernetes crash evidence past the evidence horizon

    Operational Memory Architecture for Kubernetes: Preserving Causal Context Across the Evidence Horizon

    Shamsher Khan

  40. cs.CL 2026-03-05 reviewed
    DEBISS corpus supplies annotated spoken debates for NLP tasks

    DEBISS: a Corpus of Individual, Semi-structured and Spoken Debates

    Klaywert Danillo Ferreira de Souza +3

  41. cs.LG 2026-03-04 reviewed
    LLM templates let GNNs run 28x faster on huge graphs with 98% less memory

    An LLM-Guided Query-Aware Inference System for GNN Models on Large Knowledge Graphs

    Waleed Afandi +3

  42. cs.DB 2026-03-04 reviewed
    Mined constraints create realistic SQL query test cases

    SpotIt+: Verification-based Text-to-SQL Evaluation with Database Constraints

    Andrew Tremante +5

  43. cs.LG 2026-03-04 reviewed
    Synthetic pre-training produces relational in-context learner

    Relational In-Context Learning via Synthetic Pre-training with Structural Prior

    Yanbo Wang +3

  44. cs.DB 2026-03-03 reviewed
    Graphs gain normal forms that cover edge dependencies

    Graph-Native Normalization

    Johannes Schrott +2

  45. cs.DB 2026-03-03 reviewed
    Taxonomy groups LLM database operators into five categories

    Large Language Model-Enhanced Relational Operators: Taxonomy, Benchmark, and Analysis

    Yunxiang Su +7

  46. cs.DB 2026-02-27 reviewed
    Python functions generate ocean RDF without semantic web tools

    A Pythonic Functional Approach for Semantic Data Harmonisation in the ILIAD Project

    Erik Johan Nystad +1

  47. cs.AI 2026-02-27 reviewed
    Item-level data required for rigorous AI evaluation

    AI Evaluation Should Require Standardized Item-Level Data Releases

    Han Jiang +8

  48. cs.AI 2026-02-27 reviewed
    Item-level data releases required for valid AI benchmarks

    AI Evaluation Should Require Standardized Item-Level Data Releases

    Han Jiang +8

  49. cs.DB 2026-02-27 reviewed
    LLM agents generate large table datasets for recognition

    TableNet A Large-Scale Table Dataset with LLM-Powered Autonomous

    Ruilin Zhang +1

  50. cs.DB 2026-02-26 reviewed
    On-disk vector search matches in-memory speed

    AlayaLaser: Efficient Index Layout and Search Strategy for Large-scale High-dimensional Vector Similarity Search

    Weijian Chen +6