AnnoRetrieve uses auto-generated structured schemas and queries to retrieve information from unstructured documents more efficiently and accurately than embedding-based methods.
CoRR abs/2405.04674 (2024)
9 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- AnnoRetrieve: Efficient Structured Retrieval for Unstructured Document Analysis
  AnnoRetrieve uses auto-generated structured schemas and queries to retrieve information from unstructured documents more efficiently and accurately than embedding-based methods.
- Large Language Model-Enhanced Relational Operators: Taxonomy, Benchmark, and Analysis
  The authors define a taxonomy for LLM-enhanced relational operators categorized into Select, Match, Impute, Cluster and Order, and release LROBench to evaluate single and multi-operator queries on semantic database processing.
- SEMA-SQL: Beyond Traditional Relational Querying with Large Language Models
  SEMA-SQL automates natural language to efficient hybrid queries combining relational algebra with LLM semantic operations via a new Hybrid Relational Algebra abstraction.
- Semantic Data Processing with Holistic Data Understanding
  HoldUp uses LLM-guided clustering to provide holistic dataset context for semantic operators, yielding up to 33% higher classification accuracy and 30% higher scoring accuracy than row-by-row LLM processing across 15 datasets.
- Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework
  A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art on standard tasks.
- MoDora: Tree-Based Semi-Structured Document Analysis System
  MoDora introduces local-alignment aggregation, a Component-Correlation Tree, and question-type-aware retrieval to improve accuracy on semi-structured document QA by 5.97-61.07% over baselines.
- Cortex AISQL: A Production SQL Engine for Unstructured Data
  Snowflake's Cortex AISQL adds native semantic operations to SQL via AI-aware optimization, adaptive model cascades, and semantic join rewriting, delivering 2-70x speedups in production workloads.
- In-depth Analysis of Graph-based RAG in a Unified Framework
  A unified framework and large-scale comparison of graph-based RAG methods on QA tasks yields new high-performing variants obtained by recombining existing components.
- Understanding LLM Performance Degradation in Multi-Instance Processing: The Roles of Instance Count and Context Length
  LLMs show performance degradation in multi-instance processing driven more strongly by instance count than by context length.