Canonical reference

SiloFuse: Cross-silo Synthetic Data Generation with Latent Tabular Diffusion Models

2024 · arXiv 0146.2024

Canonical reference. 100% of citing Pith papers cite this work as background.

21 Pith papers citing it

Background 100% of classified citations

citation-role summary

background 8

citation-polarity summary

background 8

representative citing papers

Generative Conversational Recommender System

cs.IR · 2026-05-21 · unverdicted · novelty 7.0

A single autoregressive model for conversational recommendation that uses semantic item IDs, predicts response intent and target first, then generates the response, reporting up to 29% Recall@1 gains.

DARE-EEG: A Foundation Model for Mining Dual-Aligned Representation of EEG

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

DARE-EEG is a self-supervised EEG foundation model that enforces mask-invariance via contrastive mask alignment and momentum anchor alignment, plus conv-linear-probing for heterogeneous setups, achieving SOTA accuracy and cross-dataset portability.

Strikingness-Aware Evaluation for Temporal Knowledge Graph Reasoning

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

A rule-based strikingness measure is added to TKGR metrics to weight rare events higher, revealing that models weaken on striking events and ensemble gains come mostly from trivial fits.

SynHAT: A Two-stage Coarse-to-Fine Diffusion Framework for Synthesizing Human Activity Traces

cs.AI · 2026-04-16 · unverdicted · novelty 7.0

SynHAT uses a novel two-stage spatio-temporal diffusion framework with Latent Spatio-Temporal U-Net to synthesize realistic human activity traces, outperforming baselines by 52% on spatial and 33% on temporal metrics across four cities.

NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions

cs.DB · 2026-04-13 · conditional · novelty 7.0 · 2 refs

NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.

GRAB-ANNS: High-Throughput Indexing and Hybrid Search via GPU-Native Bucketing

cs.DB · 2026-03-31 · unverdicted · novelty 7.0

GRAB-ANNS is a new GPU graph index that achieves up to 240x higher hybrid search throughput via bucket layouts and hybrid intra/inter-bucket edges.

Sublime: Sublinear Error & Space for Unbounded Skewed Streams

cs.DS · 2026-03-15 · unverdicted · novelty 7.0 · 2 refs

Sublime generalizes Count-Min and Count Sketch with dynamically elongating counters and expanding counter arrays to deliver sublinear error growth and lower memory use on skewed unbounded streams.

An LLM-Guided Query-Aware Inference System for GNN Models on Large Knowledge Graphs

cs.LG · 2026-03-04 · unverdicted · novelty 7.0

KG-WISE decomposes GNN models and uses LLM-generated query templates for partial loading of relevant components, achieving up to 28x faster inference and 98% lower memory on KGs with up to 42 million nodes while preserving accuracy.

Learned Static Function Data Structures

cs.DS · 2025-10-31 · accept · novelty 7.0

Learned static functions combine per-key ML-predicted prefix codes with classic static function storage to compress static key-value mappings beyond zero-order entropy limits.

Dynamic read & write optimization with TurtleKV

cs.DB · 2025-09-12 · conditional · novelty 7.0

TurtleKV uses a balanced TurtleTree on-disk structure and flexible memory tuning knobs to deliver strong performance across inserts, mixed workloads, point queries, and scans in YCSB tests, matching or beating SplinterDB, RocksDB, and WiredTiger.

Diffusion and Flow Matching Models for Tabular Data: A Survey

cs.LG · 2025-02-24 · unverdicted · novelty 7.0

First dedicated survey organizing diffusion and flow matching models for tabular data synthesis, imputation, anomaly detection, and related tasks, covering literature from 2015 to 2026 and highlighting open problems.

LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation

cs.IR · 2026-05-11 · unverdicted · novelty 6.0

LASAR uses two-stage supervised training plus reinforcement learning to ground semantic IDs, align latent reasoning trajectories to CoT hidden states via KL divergence, and adaptively choose reasoning depth, halving average steps while improving quality on three datasets.

Generalized Category Discovery in Federated Graph Learning

cs.LG · 2026-05-05 · unverdicted · novelty 6.0

GCD-FGL mitigates neighborhood absorption and global semantic inconsistency in federated generalized category discovery, delivering +4.86 average HRScore gain over baselines on five graph datasets.

GRACE: A Dynamic Coreset Selection Framework for Large Language Model Optimization

cs.DB · 2026-04-09 · unverdicted · novelty 6.0 · 2 refs

GRACE dynamically constructs and updates coresets for LLM training using representation diversity, gradient-based importance, and k-NN graph propagation to improve efficiency and performance.

Towards Efficient and Generalizable Retrieval: Adaptive Semantic Quantization and Residual Knowledge Transfer

cs.IR · 2026-02-27 · unverdicted · novelty 6.0

SA²CRQ uses sequential adaptive residual quantization based on path entropy plus anchored curriculum regularization from head items to improve both efficiency and cold-start performance in generative retrieval.

BDIViz: An Interactive Visualization System for Biomedical Schema Matching with LLM-Powered Validation

cs.HC · 2025-07-22 · unverdicted · novelty 6.0

BDIViz is a visual analytics system that uses an ensemble of matching algorithms plus LLM validation and interactive heatmaps to improve accuracy and reduce time in biomedical schema matching.

TwiSTAR: Think Fast, Think Slow, Then Act, Generative Recommendation with Adaptive Reasoning

cs.IR · 2026-05-12 · unverdicted · novelty 5.0

TwiSTAR learns to switch between fast SID retrieval and slow rationale-generating reasoning in generative recommendation, yielding better accuracy-latency trade-offs on three datasets.

TabEmb: Joint Semantic-Structure Embedding for Table Annotation

cs.LG · 2026-04-21 · unverdicted · novelty 5.0

TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.

Efficient Model Repository for Entity Resolution: Construction, Search, and Integration

cs.DB · 2024-12-12 · unverdicted · novelty 5.0

MoRER builds an ER model repository via feature distribution clustering of tasks, achieving competitive results with limited labels versus active learning, transfer learning, and self-supervised methods on three multi-source datasets.

A Pragmatic Approach to Learned Indexing in RocksDB: Targeted Optimizations with Minimal System Modification

cs.DB · 2026-05-22 · unverdicted · novelty 4.0

MountDB extends RocksDB with Memtable-level model reuse and a block-aware learned disk index, reporting up to 1.5X write and 2.1X read throughput over state-of-the-art on large-scale workloads.

To GPU or Not to GPU: Vector Search in Relational Engines

cs.DB · 2026-05-15 · conditional · novelty 4.0

Relational engines achieve faster SQL+vector-search queries on GPU than CPU when using compact vector indexes and fast interconnects, reversing the CPU-only design in current systems.

citing papers explorer

Showing 21 of 21 citing papers.

Generative Conversational Recommender System cs.IR · 2026-05-21 · unverdicted · none · ref 40
A single autoregressive model for conversational recommendation that uses semantic item IDs, predicts response intent and target first, then generates the response, reporting up to 29% Recall@1 gains.
DARE-EEG: A Foundation Model for Mining Dual-Aligned Representation of EEG cs.AI · 2026-05-18 · unverdicted · none · ref 4
DARE-EEG is a self-supervised EEG foundation model that enforces mask-invariance via contrastive mask alignment and momentum anchor alignment, plus conv-linear-probing for heterogeneous setups, achieving SOTA accuracy and cross-dataset portability.
Strikingness-Aware Evaluation for Temporal Knowledge Graph Reasoning cs.AI · 2026-05-13 · unverdicted · none · ref 24
A rule-based strikingness measure is added to TKGR metrics to weight rare events higher, revealing that models weaken on striking events and ensemble gains come mostly from trivial fits.
SynHAT: A Two-stage Coarse-to-Fine Diffusion Framework for Synthesizing Human Activity Traces cs.AI · 2026-04-16 · unverdicted · none · ref 31
SynHAT uses a novel two-stage spatio-temporal diffusion framework with Latent Spatio-Temporal U-Net to synthesize realistic human activity traces, outperforming baselines by 52% on spatial and 33% on temporal metrics across four cities.
NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions cs.DB · 2026-04-13 · conditional · none · ref 56 · 2 links
NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.
GRAB-ANNS: High-Throughput Indexing and Hybrid Search via GPU-Native Bucketing cs.DB · 2026-03-31 · unverdicted · none · ref 27
GRAB-ANNS is a new GPU graph index that achieves up to 240x higher hybrid search throughput via bucket layouts and hybrid intra/inter-bucket edges.
Sublime: Sublinear Error & Space for Unbounded Skewed Streams cs.DS · 2026-03-15 · unverdicted · none · ref 17 · 2 links
Sublime generalizes Count-Min and Count Sketch with dynamically elongating counters and expanding counter arrays to deliver sublinear error growth and lower memory use on skewed unbounded streams.
An LLM-Guided Query-Aware Inference System for GNN Models on Large Knowledge Graphs cs.LG · 2026-03-04 · unverdicted · none · ref 16
KG-WISE decomposes GNN models and uses LLM-generated query templates for partial loading of relevant components, achieving up to 28x faster inference and 98% lower memory on KGs with up to 42 million nodes while preserving accuracy.
Learned Static Function Data Structures cs.DS · 2025-10-31 · accept · none · ref 84
Learned static functions combine per-key ML-predicted prefix codes with classic static function storage to compress static key-value mappings beyond zero-order entropy limits.
Dynamic read & write optimization with TurtleKV cs.DB · 2025-09-12 · conditional · none · ref 42
TurtleKV uses a balanced TurtleTree on-disk structure and flexible memory tuning knobs to deliver strong performance across inserts, mixed workloads, point queries, and scans in YCSB tests, matching or beating SplinterDB, RocksDB, and WiredTiger.
Diffusion and Flow Matching Models for Tabular Data: A Survey cs.LG · 2025-02-24 · unverdicted · none · ref 123
First dedicated survey organizing diffusion and flow matching models for tabular data synthesis, imputation, anomaly detection, and related tasks, covering literature from 2015 to 2026 and highlighting open problems.
LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation cs.IR · 2026-05-11 · unverdicted · none · ref 65
LASAR uses two-stage supervised training plus reinforcement learning to ground semantic IDs, align latent reasoning trajectories to CoT hidden states via KL divergence, and adaptively choose reasoning depth, halving average steps while improving quality on three datasets.
Generalized Category Discovery in Federated Graph Learning cs.LG · 2026-05-05 · unverdicted · none · ref 25
GCD-FGL mitigates neighborhood absorption and global semantic inconsistency in federated generalized category discovery, delivering +4.86 average HRScore gain over baselines on five graph datasets.
GRACE: A Dynamic Coreset Selection Framework for Large Language Model Optimization cs.DB · 2026-04-09 · unverdicted · none · ref 53 · 2 links
GRACE dynamically constructs and updates coresets for LLM training using representation diversity, gradient-based importance, and k-NN graph propagation to improve efficiency and performance.
Towards Efficient and Generalizable Retrieval: Adaptive Semantic Quantization and Residual Knowledge Transfer cs.IR · 2026-02-27 · unverdicted · none · ref 34
SA²CRQ uses sequential adaptive residual quantization based on path entropy plus anchored curriculum regularization from head items to improve both efficiency and cold-start performance in generative retrieval.
BDIViz: An Interactive Visualization System for Biomedical Schema Matching with LLM-Powered Validation cs.HC · 2025-07-22 · unverdicted · none · ref 15
BDIViz is a visual analytics system that uses an ensemble of matching algorithms plus LLM validation and interactive heatmaps to improve accuracy and reduce time in biomedical schema matching.
TwiSTAR: Think Fast, Think Slow, Then Act, Generative Recommendation with Adaptive Reasoning cs.IR · 2026-05-12 · unverdicted · none · ref 18
TwiSTAR learns to switch between fast SID retrieval and slow rationale-generating reasoning in generative recommendation, yielding better accuracy-latency trade-offs on three datasets.
TabEmb: Joint Semantic-Structure Embedding for Table Annotation cs.LG · 2026-04-21 · unverdicted · none · ref 8
TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.
Efficient Model Repository for Entity Resolution: Construction, Search, and Integration cs.DB · 2024-12-12 · unverdicted · none · ref 49
MoRER builds an ER model repository via feature distribution clustering of tasks, achieving competitive results with limited labels versus active learning, transfer learning, and self-supervised methods on three multi-source datasets.
A Pragmatic Approach to Learned Indexing in RocksDB: Targeted Optimizations with Minimal System Modification cs.DB · 2026-05-22 · unverdicted · none · ref 44
MountDB extends RocksDB with Memtable-level model reuse and a block-aware learned disk index, reporting up to 1.5X write and 2.1X read throughput over state-of-the-art on large-scale workloads.
To GPU or Not to GPU: Vector Search in Relational Engines cs.DB · 2026-05-15 · conditional · none · ref 36
Relational engines achieve faster SQL+vector-search queries on GPU than CPU when using compact vector indexes and fast interconnects, reversing the CPU-only design in current systems.
