pith.

archive

Every paper Pith has read.

446 papers in cs.DB · page 7

  1. cs.IR 2026-02-26 reviewed
    Tree structure lifts document QA accuracy 6-61 percent

    MoDora: Tree-Based Semi-Structured Document Analysis System

    Bangrui Xu +10

  2. cs.CR 2026-02-26 reviewed
    DPSQL+ adds minimum frequency rule to private SQL queries

    DPSQL+: A Differentially Private SQL Library with a Minimum Frequency Rule

    Tomoya Matsumoto +3

  3. cs.DB 2026-02-25 reviewed
    Thresholds on modal operators decompose fuzzy contexts into independent subcontexts

    Decomposition of contexts into independent subcontexts based on thresholds

    Roberto G. Aragón +2

  4. cs.DB 2026-02-25 reviewed
    Fuzzy contexts split into independent subcontexts via lattice blocks

    Independent subcontexts and blocks of concept lattices. Definitions and relationships to decompose fuzzy contexts

    Roberto G. Aragón +2

  5. cs.DB 2026-02-25 reviewed
    Text-to-SQL benchmarks miss big-data cost penalties

    Both Ends Count! Just How Good are LLM Agents at "Text-to-Big SQL"?

    Germán T. Eizaguirre +2

  6. cs.DB 2026-02-21 reviewed
    SmartNIC offloads Parquet decoding to speed lake queries

    Should I Hide My Duck in the Lake?

    Jonas Dann +1

  7. cs.CY 2026-02-19 reviewed
    172 open datasets found in learning analytics papers

    Open Datasets in Learning Analytics: Trends, Challenges, and Best PRACTICE

    Valdemar Švábenský +3

  8. cs.AI 2026-02-19 reviewed
    Sonar-TS searches time series with SQL then verifies with code

    Sonar-TS: Search-Then-Verify Natural Language Querying for Time Series Databases

    Zhao Tan +7

  9. cs.DB 2026-02-17 reviewed
    Algorithm maps workflow nets to POWL models preserving behavior

    Hierarchical Decomposition of Separable Workflow-Nets

    Humam Kourani +2

  10. cs.DB 2026-02-12 reviewed
    Algorithm decides SPARQL pattern satisfiability on Façade-X

    Towards a theory of Façade-X data access: satisfiability of SPARQL basic graph patterns

    Luigi Asprino +1

  11. cs.LG 2026-02-09 reviewed
    Variable local prompts cut conflicts in federated vision learning

    SDFed: Bridging Local Global Discrepancy via Subspace Refinement and Divergence Control in Federated Prompt Learning

    Yicheng Di +4

  12. cs.DC 2026-02-09 reviewed
    Original papers outperform tutorials for system design mastery

    The Computer System Trail

    Sushant Kumar Gupta

  13. cs.DB 2026-02-07 reviewed
    KRONE turns flat logs into hierarchies for 10% better anomaly F1

    KRONE: Scalable LLM-Augmented Log Anomaly Detection via Hierarchical Abstraction

    Lei Ma +7

  14. cs.AI 2026-02-02 reviewed
    LLMs need intrinsic strategies for data insight agency

    Hunt Instead of Wait: Evaluating Deep Data Research on Large Language Models

    Wei Liu +4

  15. cs.DC 2026-01-29 reviewed
    Primary access hints speed Ethereum replay 25x

    Ira: Efficient Transaction Replay for Distributed Systems

    Adithya Bhat +2

  16. cs.DB 2026-01-28 reviewed
    Context packs raise LLM schema matching accuracy

    ConStruM: A Structure-Guided LLM Framework for Context-Aware Schema Matching

    Houming Chen +2

  17. cs.IR 2026-01-26 reviewed
    HyEm lets Euclidean indexes retrieve hyperbolic ontology embeddings

    HyEm: Query-Adaptive Hyperbolic Retrieval for Biomedical Ontologies via Euclidean Vector Indexing

    Ou Deng +3

  18. cs.AI 2026-01-25 reviewed
    Self-refinement and voting reach 86 percent SQL accuracy

    LLM-Based SQL Generation: Prompting, Self-Refinement, and Adaptive Weighted Majority Voting

    Yu-Jie Yang +2

  19. cs.DB 2026-01-23 reviewed
    New predict operator lets SQL run LLM calls inside the database

    iPDB -- Optimizing Semantic SQL Queries

    Udesh Kumarasinghe +4

  20. cs.DB 2026-01-15 reviewed
    Algorithm turns math database schemes into relational apps

    Translating database mathematical schemes into relational database software applications with MatBase

    Christian Mancas +1

  21. cs.DB 2026-01-14 reviewed
    Natural language turns into SQL for cross-domain data exploration

    TiInsight: A SQL-based Automated Exploratory Data Analysis System through Large Language Models

    Jun-Peng Zhu +12

  22. cs.DB 2026-01-09 reviewed
    Honeynet dataset logs 132k attacks across four Azure regions

    Descriptor: Multi-Regional Cloud Honeypot Dataset (MURHCAD)

    Enrique Feito-Casares +2

  23. cs.DB 2026-01-09 reviewed
    Online tool mines MLCS from sequences up to length 5000

    OVT-MLCS: An Online Visual Tool for MLCS Mining from Long or Big Sequences

    Zhi Wang +5

  24. cs.DB 2026-01-08 reviewed
    Temporal attribution tracks dataflow dependencies lightly over time

    Toward Temporal Attribution Analytics in Dataflows

    Chrysanthi Kosyfaki +3

  25. cs.DB 2025-12-11 reviewed
    PANDAExpress drops polylog factor from query runtime

    PANDAExpress: a Simpler and Faster PANDA Algorithm

    Mahmoud Abo Khamis +2

  26. cs.DB 2025-12-07 reviewed
    OSM+ delivers billion-vertex global road graph for city experiments

    OSM+: Billion-Level OpenStreetMap Dataset for City-wide Experiments

    Guanjie Zheng +8

  27. cs.DB 2025-12-05 reviewed
    LLMs auto-swapped for cheaper models on repeated tasks

    Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement

    Nils Strassenburg +2

  28. cs.DB 2025-12-02 reviewed
    PyTorch I/O tweaks yield 3x faster distributed GPU queries

    PystachIO: Efficient Distributed GPU Query Processing with PyTorch over Fast Networks & Fast Storage

    Jigao Luo +3

  29. cs.DB 2025-12-01 reviewed
    LLM agents repair 8771 invalid MOF database entries

    LitMOF: An LLM Multi-Agent for Literature-Validated Metal-Organic Frameworks Database Correction and Expansion

    Honghui Kim +2

  30. cs.DB 2025-11-29 reviewed
    Algorithm converts math data models to entity-relationship diagrams

    MatBase algorithm for translating (E)MDM schemes into E-R data models

    Christian Mancas +1

  31. cs.DC 2025-11-27 reviewed
    Tokenized context speeds edge LLM responses by up to 14%

    DisCEdge: Distributed Context Management for Large Language Models at the Edge

    Mohammadreza Malekabbasi +2

  32. cs.AI 2025-11-27 reviewed
    Neural query models match path counting after relaxation

    Counting Still Counts: Understanding Neural Complex Query Answering Through Query Relaxation

    Yannick Brunink +3

  33. cs.DB 2025-11-26 reviewed
    Hybrid index cuts tail latency 98% under mixed workloads

    HIRE: A Hybrid Learned Index for Robust and Efficient Performance under Mixed Workloads

    Xinyi Zhang +3

  34. cs.DB 2025-11-19 reviewed
    Answer-set programs compute sufficient explanations for database queries

    Sufficient Explanations in Databases and their Connections to Database Repairs

    Leopoldo Bertossi +1

  35. cs.PL 2025-11-19 reviewed
    Compiler derives pruning rules for tree queries

    Bonsai: Compiling Queries to Pruned Tree Traversals

    Alexander J Root +6

  36. cs.DB 2025-11-18 reviewed
  37. cs.DB 2025-11-12 reviewed
    GNN-PE scales exact subgraph matching to distributed clusters

    Efficient Distributed Exact Subgraph Matching via GNN-PE: Load Balancing, Cache Optimization, and Query Plan Ranking

    Yu Wang +3

  38. cs.DB 2025-11-10 reviewed
    AISQL speeds semantic queries 2-70x via cost-aware planning and cascades

    Cortex AISQL: A Production SQL Engine for Unstructured Data

    Paweł Liskowski +13

  39. cs.DS 2025-10-31 reviewed
    Learned models break entropy barrier in static functions

    Learned Static Function Data Structures

    Stefan Hermann +3

  40. cs.DB 2025-10-29 reviewed
    DGAI separates vectors from graphs for 8x faster ANN updates

    DGAI: Decoupled On-Disk Graph-Based ANN Index for Efficient Updates and Queries

    Jiahao Lou +8

  41. cs.DB 2025-10-20 reviewed
    Orchestration reaches 89.8% Text-to-SQL accuracy on Spider

    DeepEye-SQL: A Software-Engineering-Inspired Text-to-SQL Framework

    Boyan Li +4

  42. cs.CL 2025-10-12 reviewed
    Database feedback and memory improve multi-turn SQL accuracy

    MTSQL-R1: Towards Long-Horizon Multi-Turn Text-to-SQL via Agentic Training

    Taicheng Guo +6

  43. cs.DB 2025-10-11 reviewed
    Algorithms mine low-utility sequences faster with less memory

    Efficient Mining of Low-Utility Sequential Patterns

    Jian Zhu +3

  44. cs.DB 2025-09-18 reviewed
    Partial orders from event data yield sound process models

    Revealing Inherent Concurrency in Event Data: A Partial Order Approach to Process Discovery

    Humam Kourani +2

  45. cs.DB 2025-09-16 reviewed
    ScaleDoc filters 85% of LLM calls on large document sets

    ScaleDoc: Scaling LLM-based Predicates over Large Document Collections

    Hengrui Zhang +3

  46. cs.DB 2025-09-12 reviewed
    TurtleKV adapts key-value stores to shifting read and write demands

    Dynamic read & write optimization with TurtleKV

    Tony Astolfi +5

  47. cs.HC 2025-09-05 reviewed
    Users cannot tell acted idle animations from genuine ones

    Evaluating Idle Animation Believability: a User Perspective

    Eneko Atxa Landa +4

  48. cs.DB 2025-08-30 reviewed
    Optimizer picks best LLM sort paths at runtime

    Access Paths for Efficient Ordering with Large Language Models

    Fuheng Zhao +9

  49. cs.DB 2025-08-07 reviewed
    Versioned views and when-then rules make prompts adaptive in LLM pipelines

    Making Prompts First-Class Citizens for Adaptive LLM Pipelines

    Ugur Cetintemel +5

  50. cs.LG 2025-08-05 reviewed
    Merged T1D data yields 149 million glucose readings from 2510 subjects

    Presenting DiaData for Research on Type 1 Diabetes

    Beyza Cinar +1