ReSequel uses LLMs guided by metadata-derived templates and sampling-based verification to rewrite SQL queries, delivering up to 16x workload speedups over native DBMSs and 22x over prior LLM baselines across eight benchmarks and three systems.
SiloFuse: Cross-silo Synthetic Data Generation with Latent Tabular Diffusion Models
Canonical reference. 100% of citing Pith papers cite this work as background.
Citation roles: background (6)
Citation polarities: background (6)
Representative citing papers
A new data structure samples any entry of the noise vector in constant time while exactly reproducing the binary tree Gaussian mechanism distribution, applied to DP CountSketches for improved range counting and join size estimation.
The paper shows that arbitrage-free information pricing is computationally hard in general, provides a branch-and-bound algorithm, and proves that for threshold utilities arbitrage-freeness reduces to Blackwell dominance, unifying prior query and model pricing results.
A single autoregressive model for conversational recommendation that uses semantic item IDs, predicts response intent and target first, then generates the response, reporting up to 29% Recall@1 gains.
DARE-EEG is a self-supervised EEG foundation model that enforces mask-invariance via contrastive mask alignment and momentum anchor alignment, plus conv-linear-probing for heterogeneous setups, achieving SOTA accuracy and cross-dataset portability.
A rule-based strikingness measure is added to TKGR metrics to weight rare events higher, revealing that models weaken on striking events and ensemble gains come mostly from trivial fits.
U-HNSW is the first graph-based index for approximate nearest neighbor search under all Lp metrics (0 < p <= 2) simultaneously, using L1/L2 HNSW graphs plus early-termination verification to beat MLSH query times.
Exact LTI Koopman models for nonlinear control systems require affine linear dynamics under controllability and coordinate inclusion assumptions.
SynHAT uses a novel two-stage spatio-temporal diffusion framework with Latent Spatio-Temporal U-Net to synthesize realistic human activity traces, outperforming baselines by 52% on spatial and 33% on temporal metrics across four cities.
NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.
GRAB-ANNS is a new GPU graph index that achieves up to 240x higher hybrid search throughput via bucket layouts and hybrid intra/inter-bucket edges.
Sublime generalizes Count-Min and Count Sketch with dynamically elongating counters and expanding counter arrays to deliver sublinear error growth and lower memory use on skewed unbounded streams.
KG-WISE decomposes GNN models and uses LLM-generated query templates for partial loading of relevant components, achieving up to 28x faster inference and 98% lower memory on KGs with up to 42 million nodes while preserving accuracy.
Learned static functions combine per-key ML-predicted prefix codes with classic static function storage to compress static key-value mappings beyond zero-order entropy limits.
TurtleKV uses a balanced TurtleTree on-disk structure and flexible memory tuning knobs to deliver strong performance across inserts, mixed workloads, point queries, and scans in YCSB tests, matching or beating SplinterDB, RocksDB, and WiredTiger.
The first dedicated survey of diffusion and flow matching models for tabular data synthesis, imputation, anomaly detection, and related tasks, covering literature from 2015 to 2026 and highlighting open problems.
EcoTable is the first NL-based data integration framework that builds a join-likelihood graph, uses two-stage schema linking and Steiner tree search to find paths, then generates transformations with LLMs, reporting >30% accuracy gain and 5x lower cost on four real-world datasets.
Spectral aggregate tests prune up to 51% of candidates in CSM but leave enumeration intermediates unchanged beyond initial bindings across tested workloads.
CEB and TIDE are two-layer append-only B+-tree indexes for intervals under the increasing ending time assumption that claim smaller size, faster insertions, and superior query performance over prior art.
The paper formulates pre-hoc fine-tuning prediction as stochastic estimation, proves a lower bound on the optimization variance decay rate, and introduces a three-regime predictability phase diagram.
SIDInspector provides a standardized adapter contract and mapping-level probes for Semantic-ID tokenizers, with empirical contrasts showing high aliasing in GRID-style exports and superior prefix alignment from deterministic controls on Musical items.
ANNS-AMP adapts distance-computation precision to vector-space regions via a lightweight cluster-level predictor and a bit-serial accelerator, delivering 163.76x/10.57x/2.06x average speedups and 1100x/39.41x/6.66x energy reductions versus CPU/GPU/custom baselines with <2.7% accuracy loss.
ANN search quality is better assessed by 1/Ratio@k than Recall@k because the former tracks downstream task utility more closely while allowing substantially lower computational cost.
LMs compare unit quantities via number-specific and unit-specific heuristics rather than unified scale conversion, evidenced by degraded accuracy near boundaries, linear surrogate predictions, and causal subspace interventions.