SiloFuse: Cross-silo Synthetic Data Generation with Latent Tabular Diffusion Models

Bowen Zheng, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ming Chen, Ji-Rong Wen · 2024 · arXiv 0146.2024

Canonical reference. 100% of citing Pith papers cite this work as background.

44 Pith papers citing it

Background 100% of classified citations

citation-role summary

background 8

citation-polarity summary

background 8

representative citing papers

ReSequel: Robust LLM-assisted Query Rewriting and Optimization using Templatization and Sampling

cs.DB · 2026-06-18 · conditional · novelty 7.0

ReSequel uses LLMs guided by metadata-derived templates and sampling-based verification to rewrite SQL queries, delivering up to 16x workload speedups over native DBMSs and 22x over prior LLM baselines across eight benchmarks and three systems.

A Fast Gaussian Mechanism under Continual Observation, with Applications

cs.DS · 2026-06-10 · unverdicted · novelty 7.0

A new data structure samples any entry of the noise vector in constant time while exactly reproducing the binary tree Gaussian mechanism distribution, applied to DP CountSketches for improved range counting and join size estimation.

Arbitrage-free Data Pricing

cs.GT · 2026-06-09 · unverdicted · novelty 7.0

The paper shows that arbitrage-free information pricing is computationally hard in general, provides a branch-and-bound algorithm, and proves that for threshold utilities arbitrage-freeness reduces to Blackwell dominance, unifying prior query and model pricing results.

Generative Conversational Recommender System

cs.IR · 2026-05-21 · unverdicted · novelty 7.0

A single autoregressive model for conversational recommendation that uses semantic item IDs, predicts response intent and target first, then generates the response, reporting up to 29% Recall@1 gains.

DARE-EEG: A Foundation Model for Mining Dual-Aligned Representation of EEG

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

DARE-EEG is a self-supervised EEG foundation model that enforces mask-invariance via contrastive mask alignment and momentum anchor alignment, plus conv-linear-probing for heterogeneous setups, achieving SOTA accuracy and cross-dataset portability.

Strikingness-Aware Evaluation for Temporal Knowledge Graph Reasoning

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

A rule-based strikingness measure is added to TKGR metrics to weight rare events higher, revealing that models weaken on striking events and ensemble gains come mostly from trivial fits.

U-HNSW: An Efficient Graph-based Solution to ANNS Under Universal Lp Metrics

cs.DB · 2026-05-03 · unverdicted · novelty 7.0

U-HNSW is the first graph-based index for approximate nearest neighbor search under all Lp metrics (0 < p <= 2) simultaneously, using L1/L2 HNSW graphs plus early-termination verification to beat MLSH query times.

Limitations of LTI Koopman Modeling for Nonlinear Control Systems

math.OC · 2026-04-28 · unverdicted · novelty 7.0

Exact LTI Koopman models for nonlinear control systems require affine linear dynamics under controllability and coordinate inclusion assumptions.

SynHAT: A Two-stage Coarse-to-Fine Diffusion Framework for Synthesizing Human Activity Traces

cs.AI · 2026-04-16 · unverdicted · novelty 7.0

SynHAT uses a novel two-stage spatio-temporal diffusion framework with Latent Spatio-Temporal U-Net to synthesize realistic human activity traces, outperforming baselines by 52% on spatial and 33% on temporal metrics across four cities.

NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions

cs.DB · 2026-04-13 · conditional · novelty 7.0 · 2 refs

NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.

GRAB-ANNS: High-Throughput Indexing and Hybrid Search via GPU-Native Bucketing

cs.DB · 2026-03-31 · unverdicted · novelty 7.0

GRAB-ANNS is a new GPU graph index that achieves up to 240x higher hybrid search throughput via bucket layouts and hybrid intra/inter-bucket edges.

Sublime: Sublinear Error & Space for Unbounded Skewed Streams

cs.DS · 2026-03-15 · unverdicted · novelty 7.0 · 2 refs

Sublime generalizes Count-Min and Count Sketch with dynamically elongating counters and expanding counter arrays to deliver sublinear error growth and lower memory use on skewed unbounded streams.

An LLM-Guided Query-Aware Inference System for GNN Models on Large Knowledge Graphs

cs.LG · 2026-03-04 · unverdicted · novelty 7.0

KG-WISE decomposes GNN models and uses LLM-generated query templates for partial loading of relevant components, achieving up to 28x faster inference and 98% lower memory on KGs with up to 42 million nodes while preserving accuracy.

Learned Static Function Data Structures

cs.DS · 2025-10-31 · accept · novelty 7.0

Learned static functions combine per-key ML-predicted prefix codes with classic static function storage to compress static key-value mappings beyond zero-order entropy limits.

Dynamic read & write optimization with TurtleKV

cs.DB · 2025-09-12 · conditional · novelty 7.0

TurtleKV uses a balanced TurtleTree on-disk structure and flexible memory tuning knobs to deliver strong performance across inserts, mixed workloads, point queries, and scans in YCSB tests, matching or beating SplinterDB, RocksDB, and WiredTiger.

Diffusion and Flow Matching Models for Tabular Data: A Survey

cs.LG · 2025-02-24 · unverdicted · novelty 7.0

First dedicated survey organizing diffusion and flow matching models for tabular data synthesis, imputation, anomaly detection, and related tasks, covering literature from 2015 to 2026 and highlighting open problems.

EcoTable: Cost-effective Table Integration in Data Lakes for Natural Language Queries

cs.DB · 2026-06-25 · unverdicted · novelty 6.0 · 2 refs

EcoTable is the first NL-based data integration framework that builds a join-likelihood graph, uses two-stage schema linking and Steiner tree search to find paths, then generates transformations with LLMs, reporting >30% accuracy gain and 5x lower cost on four real-world datasets.

Can Aggregate Invariants Accelerate Continuous Subgraph Matching? Limits, Laws, and a Dynamic Spectral Index

cs.AI · 2026-06-23 · unverdicted · novelty 6.0

Spectral aggregate tests prune up to 51% of candidates in CSM but leave enumeration intermediates unchanged beyond initial bindings across tested workloads.

Disk-Based Interval Indexes Under the Increasing Ending Time Assumption

cs.DB · 2026-06-22 · unverdicted · novelty 6.0

CEB and TIDE are two-layer append-only B+-tree indexes for intervals under the increasing ending time assumption that claim smaller size, faster insertions, and superior query performance over prior art.

A Risk Decomposition Framework for Pre-Hoc Fine-Tuning Prediction

cs.LG · 2026-06-16 · unverdicted · novelty 6.0

Formulates pre-hoc fine-tuning prediction as stochastic estimation, proves lower bound on optimization variance decay rate, and introduces a three-regime predictability phase diagram.

SIDInspector: A Mapping-First Diagnostic Resource for Semantic-ID Tokenizers

cs.IR · 2026-06-09 · accept · novelty 6.0

SIDInspector provides a standardized adapter contract and mapping-level probes for Semantic-ID tokenizers, with empirical contrasts showing high aliasing in GRID-style exports and superior prefix alignment from deterministic controls on Musical items.

ANNS-AMP: Accelerating Approximate Nearest Neighbor Search via Adaptive Mixed-Precision Computing

cs.PF · 2026-06-05 · unverdicted · novelty 6.0

ANNS-AMP adapts distance-computation precision to vector-space regions via a lightweight cluster-level predictor and a bit-serial accelerator, delivering 163.76x/10.57x/2.06x average speedups and 1100x/39.41x/6.66x energy reductions versus CPU/GPU/custom baselines with <2.7% accuracy loss.

ANN Search: Recall What Matters

cs.IR · 2026-06-03 · conditional · novelty 6.0

ANN search quality is better assessed by 1/Ratio@k than Recall@k because the former tracks downstream task utility more closely while allowing substantially lower computational cost.

Language Models Compare Quantities Using Number-specific and Unit-specific Heuristics

cs.CL · 2026-06-02 · unverdicted · novelty 6.0

LMs compare unit quantities via number-specific and unit-specific heuristics rather than unified scale conversion, evidenced by degraded accuracy near boundaries, linear surrogate predictions, and causal subspace interventions.

citing papers explorer

Showing 39 of 39 citing papers after filters.

ReSequel: Robust LLM-assisted Query Rewriting and Optimization using Templatization and Sampling cs.DB · 2026-06-18 · conditional · none · ref 108
ReSequel uses LLMs guided by metadata-derived templates and sampling-based verification to rewrite SQL queries, delivering up to 16x workload speedups over native DBMSs and 22x over prior LLM baselines across eight benchmarks and three systems.
A Fast Gaussian Mechanism under Continual Observation, with Applications cs.DS · 2026-06-10 · unverdicted · none · ref 42
A new data structure samples any entry of the noise vector in constant time while exactly reproducing the binary tree Gaussian mechanism distribution, applied to DP CountSketches for improved range counting and join size estimation.
Arbitrage-free Data Pricing cs.GT · 2026-06-09 · unverdicted · none · ref 7
The paper shows that arbitrage-free information pricing is computationally hard in general, provides a branch-and-bound algorithm, and proves that for threshold utilities arbitrage-freeness reduces to Blackwell dominance, unifying prior query and model pricing results.
Generative Conversational Recommender System cs.IR · 2026-05-21 · unverdicted · none · ref 40
A single autoregressive model for conversational recommendation that uses semantic item IDs, predicts response intent and target first, then generates the response, reporting up to 29% Recall@1 gains.
DARE-EEG: A Foundation Model for Mining Dual-Aligned Representation of EEG cs.AI · 2026-05-18 · unverdicted · none · ref 4
DARE-EEG is a self-supervised EEG foundation model that enforces mask-invariance via contrastive mask alignment and momentum anchor alignment, plus conv-linear-probing for heterogeneous setups, achieving SOTA accuracy and cross-dataset portability.
Strikingness-Aware Evaluation for Temporal Knowledge Graph Reasoning cs.AI · 2026-05-13 · unverdicted · none · ref 24
A rule-based strikingness measure is added to TKGR metrics to weight rare events higher, revealing that models weaken on striking events and ensemble gains come mostly from trivial fits.
U-HNSW: An Efficient Graph-based Solution to ANNS Under Universal Lp Metrics cs.DB · 2026-05-03 · unverdicted · none · ref 24
U-HNSW is the first graph-based index for approximate nearest neighbor search under all Lp metrics (0 < p <= 2) simultaneously, using L1/L2 HNSW graphs plus early-termination verification to beat MLSH query times.
Limitations of LTI Koopman Modeling for Nonlinear Control Systems math.OC · 2026-04-28 · unverdicted · none · ref 52
Exact LTI Koopman models for nonlinear control systems require affine linear dynamics under controllability and coordinate inclusion assumptions.
SynHAT: A Two-stage Coarse-to-Fine Diffusion Framework for Synthesizing Human Activity Traces cs.AI · 2026-04-16 · unverdicted · none · ref 31
SynHAT uses a novel two-stage spatio-temporal diffusion framework with Latent Spatio-Temporal U-Net to synthesize realistic human activity traces, outperforming baselines by 52% on spatial and 33% on temporal metrics across four cities.
NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions cs.DB · 2026-04-13 · conditional · none · ref 56 · 2 links
NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.
GRAB-ANNS: High-Throughput Indexing and Hybrid Search via GPU-Native Bucketing cs.DB · 2026-03-31 · unverdicted · none · ref 27
GRAB-ANNS is a new GPU graph index that achieves up to 240x higher hybrid search throughput via bucket layouts and hybrid intra/inter-bucket edges.
Sublime: Sublinear Error & Space for Unbounded Skewed Streams cs.DS · 2026-03-15 · unverdicted · none · ref 17 · 2 links
Sublime generalizes Count-Min and Count Sketch with dynamically elongating counters and expanding counter arrays to deliver sublinear error growth and lower memory use on skewed unbounded streams.
An LLM-Guided Query-Aware Inference System for GNN Models on Large Knowledge Graphs cs.LG · 2026-03-04 · unverdicted · none · ref 16
KG-WISE decomposes GNN models and uses LLM-generated query templates for partial loading of relevant components, achieving up to 28x faster inference and 98% lower memory on KGs with up to 42 million nodes while preserving accuracy.
EcoTable: Cost-effective Table Integration in Data Lakes for Natural Language Queries cs.DB · 2026-06-25 · unverdicted · none · ref 19 · 2 links
EcoTable is the first NL-based data integration framework that builds a join-likelihood graph, uses two-stage schema linking and Steiner tree search to find paths, then generates transformations with LLMs, reporting >30% accuracy gain and 5x lower cost on four real-world datasets.
Can Aggregate Invariants Accelerate Continuous Subgraph Matching? Limits, Laws, and a Dynamic Spectral Index cs.AI · 2026-06-23 · unverdicted · none · ref 17
Spectral aggregate tests prune up to 51% of candidates in CSM but leave enumeration intermediates unchanged beyond initial bindings across tested workloads.
Disk-Based Interval Indexes Under the Increasing Ending Time Assumption cs.DB · 2026-06-22 · unverdicted · none · ref 1
CEB and TIDE are two-layer append-only B+-tree indexes for intervals under the increasing ending time assumption that claim smaller size, faster insertions, and superior query performance over prior art.
A Risk Decomposition Framework for Pre-Hoc Fine-Tuning Prediction cs.LG · 2026-06-16 · unverdicted · none · ref 174
Formulates pre-hoc fine-tuning prediction as stochastic estimation, proves lower bound on optimization variance decay rate, and introduces a three-regime predictability phase diagram.
SIDInspector: A Mapping-First Diagnostic Resource for Semantic-ID Tokenizers cs.IR · 2026-06-09 · accept · none · ref 36
SIDInspector provides a standardized adapter contract and mapping-level probes for Semantic-ID tokenizers, with empirical contrasts showing high aliasing in GRID-style exports and superior prefix alignment from deterministic controls on Musical items.
ANNS-AMP: Accelerating Approximate Nearest Neighbor Search via Adaptive Mixed-Precision Computing cs.PF · 2026-06-05 · unverdicted · none · ref 4
ANNS-AMP adapts distance-computation precision to vector-space regions via a lightweight cluster-level predictor and a bit-serial accelerator, delivering 163.76x/10.57x/2.06x average speedups and 1100x/39.41x/6.66x energy reductions versus CPU/GPU/custom baselines with <2.7% accuracy loss.
ANN Search: Recall What Matters cs.IR · 2026-06-03 · conditional · none · ref 66
ANN search quality is better assessed by 1/Ratio@k than Recall@k because the former tracks downstream task utility more closely while allowing substantially lower computational cost.
Language Models Compare Quantities Using Number-specific and Unit-specific Heuristics cs.CL · 2026-06-02 · unverdicted · none · ref 9
LMs compare unit quantities via number-specific and unit-specific heuristics rather than unified scale conversion, evidenced by degraded accuracy near boundaries, linear surrogate predictions, and causal subspace interventions.
SOLANET: Distributed Neighbor Graph Construction on GPU-Accelerated Systems cs.DC · 2026-05-26 · unverdicted · none · ref 20
SOLANET is a distributed GPU toolkit for neighbor graph construction that reports 11X speedup on 512 APUs for 1B points and 6.9X for 2B points.
LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation cs.IR · 2026-05-11 · unverdicted · none · ref 65
LASAR uses two-stage supervised training plus reinforcement learning to ground semantic IDs, align latent reasoning trajectories to CoT hidden states via KL divergence, and adaptively choose reasoning depth, halving average steps while improving quality on three datasets.
Generalized Category Discovery in Federated Graph Learning cs.LG · 2026-05-05 · unverdicted · none · ref 25
GCD-FGL mitigates neighborhood absorption and global semantic inconsistency in federated generalized category discovery, delivering +4.86 average HRScore gain over baselines on five graph datasets.
GRACE: A Dynamic Coreset Selection Framework for Large Language Model Optimization cs.DB · 2026-04-09 · unverdicted · none · ref 53 · 2 links
GRACE dynamically constructs and updates coresets for LLM training using representation diversity, gradient-based importance, and k-NN graph propagation to improve efficiency and performance.
Towards Efficient and Generalizable Retrieval: Adaptive Semantic Quantization and Residual Knowledge Transfer cs.IR · 2026-02-27 · unverdicted · none · ref 34
SA²CRQ uses sequential adaptive residual quantization based on path entropy plus anchored curriculum regularization from head items to improve both efficiency and cold-start performance in generative retrieval.
Towards Federated Long-Tailed Graph Learning: An Energy-Guided Dual Decoupling Approach cs.AI · 2026-06-23 · unverdicted · none · ref 11
FedEPD decouples topological purification from semantic recalibration using energy-guided pruning and prototype injection to improve minority performance in federated long-tailed graph learning.
TuneAhead: Predicting Fine-tuning Performance Before Full Training Begins cs.LG · 2026-06-16 · unverdicted · none · ref 107
TuneAhead predicts fine-tuning performance from meta-features and short probes, reporting RMSE 1.47 and 95.1% of predictions within 3 points on 370 held-out runs of Qwen2.5-7B.
HRNN: A Hybrid Graph Index for Approximate Reverse k-Nearest Neighbor Search on High-Dimensional Vectors cs.DB · 2026-06-02 · unverdicted · none · ref 42
HRNN combines a navigation graph, ranked KNN graph, and reverse-neighbor lists with proxy-based candidate generation and materialized kNN-radii to achieve up to 10x higher throughput for approximate RkNN on datasets up to 10M vectors.
SemStruct: Contextualizing Semantic Embeddings with Structural Information for Schema Matching cs.LG · 2026-05-29 · unverdicted · none · ref 5
SemStruct models tables as heterogeneous graphs with GNNs on frozen PLM embeddings to incorporate row co-occurrences for schema matching and reports SOTA results on Valentine and SOTAB-SM benchmarks.
Co-Designing Graph-based Approximate Nearest Neighbor Search at Billion Scale for Processing-in-Memory cs.AR · 2026-05-25 · unverdicted · none · ref 47
Co-design of 14.5x compacted index, asynchronous scheduler, and multiplication-free kernel for PIM-based graph ANNS delivers up to 20x CPU and 17.1x GPU throughput on billion-scale benchmarks.
TwiSTAR: Think Fast, Think Slow, Then Act, Generative Recommendation with Adaptive Reasoning cs.IR · 2026-05-12 · unverdicted · none · ref 18
TwiSTAR learns to switch between fast SID retrieval and slow rationale-generating reasoning in generative recommendation, yielding better accuracy-latency trade-offs on three datasets.
TabEmb: Joint Semantic-Structure Embedding for Table Annotation cs.LG · 2026-04-21 · unverdicted · none · ref 8
TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.
LogCopilot: Automating Log Aggregation Analysis through Large Language Models cs.SE · 2026-06-13 · unverdicted · none · ref 43
LogCopilot is an LLM framework that builds a hierarchical knowledge base from logs and generates/executes LogQL queries from natural language instructions, reporting 76.8% average accuracy across four datasets.
Bridging Short Videos and Live Streams: Reasoning-Guided Multimodal LLMs for Cross-Domain Representation Learning cs.IR · 2026-06-03 · unverdicted · none · ref 18
RGCD-Rep distills cross-domain reasoning from a frozen MLLM teacher and learns decomposed transferable item representations via two-stage training, yielding gains in offline experiments and production A/B tests on a live streaming platform.
Bridging Classification and Reconstruction: Cooperative Time Series Anomaly Detection cs.LG · 2026-05-25 · unverdicted · none · ref 28
CoAD unifies outlier exposure classification and masked autoencoder reconstruction in a cooperative loop to detect subtle and prolonged time series anomalies.
A Pragmatic Approach to Learned Indexing in RocksDB: Targeted Optimizations with Minimal System Modification cs.DB · 2026-05-22 · unverdicted · none · ref 44
MountDB extends RocksDB with Memtable-level model reuse and a block-aware learned disk index, reporting up to 1.5X write and 2.1X read throughput over state-of-the-art on large-scale workloads.
To GPU or Not to GPU: Vector Search in Relational Engines cs.DB · 2026-05-15 · conditional · none · ref 36
Relational engines achieve faster SQL+vector-search queries on GPU than CPU when using compact vector indexes and fast interconnects, reversing the CPU-only design in current systems.
PaAno+: Multiscale Encoding and Cross-Variable Attention for Time Series Anomaly Detection cs.LG · 2026-06-18 · unverdicted · none · ref 56
PaAno+ extends the original PaAno with multiscale feature extraction, cross-variable fusion attention, and a temporal patch sorting pretext task to report state-of-the-art results on the TSB-AD benchmark for univariate and multivariate anomaly detection.
