Optimizing LLM Queries in Relational Data Analytics Workloads

Amog Kamsetty; Asim Biswal; Audrey Cheng; Ion Stoica; Joseph E. Gonzalez; Liana Patel; Luis Gaspar Schroeder; Matei Zaharia; Shiyi Cao; Shu Liu

arxiv: 2403.05821 · v2 · pith:PD6LJPDAnew · submitted 2024-03-09 · 💻 cs.LG · cs.DB

Shu Liu, Asim Biswal, Amog Kamsetty, Audrey Cheng, Luis Gaspar Schroeder, Liana Patel, Shiyi Cao, Xiangxi Mo, Ion Stoica, Joseph E. Gonzalez, Matei Zaharia

keywords data analytics models cost language large llms openai

Batch data analytics is a growing application for Large Language Models (LLMs). LLMs enable users to perform a wide range of natural language tasks, such as classification, entity extraction, and translation, over large datasets. However, LLM inference is highly costly and slow: for example, an NVIDIA L4 GPU running Llama3-8B can only process 6 KB of text per second, taking about a day to handle 15 GB of data; processing a similar amount of data costs around $10K on OpenAI's GPT-4o. In this paper, we propose novel techniques that can significantly reduce the cost of LLM calls for relational data analytics workloads. Our key contribution is developing efficient algorithms for reordering the rows and the fields within each row of an input table to maximize key-value (KV) cache reuse when performing LLM serving. As such, our approach can be easily applied to existing analytics systems and serving platforms. Our evaluation shows that our solution can yield up to 3.4x improvement in job completion time on a benchmark of diverse LLM-based queries using Llama 3 models. Our solution also achieves a 32% cost savings under OpenAI and Anthropic pricing models.
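The reordering idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it uses a simple heuristic chosen here for illustration (place fields with the fewest distinct values first, then sort rows lexicographically), and the helper names `order_fields`, `reorder`, and `shared_prefix_fields` are hypothetical. The count of shared leading field values between consecutive rows serves only as a rough proxy for prefix (KV) cache reuse.

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]


def order_fields(rows: Sequence[Row]) -> List[str]:
    """Hypothetical heuristic: fields with fewer distinct values go first,
    so that many rows can share the same leading values."""
    fields = list(rows[0].keys())
    return sorted(fields, key=lambda f: (len({r[f] for r in rows}), f))


def reorder(rows: Sequence[Row]) -> List[List[str]]:
    """Serialize each row in the chosen field order, then sort the rows so
    that rows with identical leading values end up adjacent."""
    fields = order_fields(rows)
    return sorted([r[f] for f in fields] for r in rows)


def shared_prefix_fields(serialized: Sequence[Sequence[str]]) -> int:
    """Count leading field values each row shares with the previous row
    (a rough stand-in for prefix cache hits, not a real cache model)."""
    total = 0
    for prev, cur in zip(serialized, serialized[1:]):
        for a, b in zip(prev, cur):
            if a != b:
                break
            total += 1
    return total


# Toy table (made-up values, for illustration only).
rows = [
    {"review": "great", "product": "A", "category": "toys"},
    {"review": "bad", "product": "B", "category": "toys"},
    {"review": "okay", "product": "A", "category": "toys"},
    {"review": "fine", "product": "B", "category": "toys"},
]

baseline = [list(r.values()) for r in rows]  # original field and row order
optimized = reorder(rows)

print(shared_prefix_fields(baseline))   # 0
print(shared_prefix_fields(optimized))  # 5
```

In the original order every row starts with a unique review, so no prefix is shared. After reordering, the constant `category` and the low-cardinality `product` lead each row, so consecutive rows share leading values that a serving engine with prefix caching could in principle reuse.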

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SEMA-SQL: Beyond Traditional Relational Querying with Large Language Models
cs.DB 2026-04 unverdicted novelty 6.0

SEMA-SQL formalizes Hybrid Relational Algebra to let users pose natural language questions answered by automatically generated queries that combine relational operators with LLM semantic reasoning, cutting LLM calls b...

SEMA-SQL: Beyond Traditional Relational Querying with Large Language Models
cs.DB 2026-04 unverdicted novelty 6.0

SEMA-SQL automates natural language to efficient hybrid queries combining relational algebra with LLM semantic operations via a new Hybrid Relational Algebra abstraction.

SGLang: Efficient Execution of Structured Language Model Programs
cs.AI 2023-12 conditional novelty 6.0

SGLang is a new system that speeds up structured LLM programs by up to 6.4x using RadixAttention for KV cache reuse and compressed finite state machines for output decoding.