SiloFuse: Cross-silo Synthetic Data Generation with Latent Tabular Diffusion Models
Canonical reference. 100% of citing Pith papers cite this work as background.
Citation roles: background (6)
Citation polarities: background (6)
Citing papers
-
Generative Conversational Recommender System
A single autoregressive model for conversational recommendation that uses semantic item IDs, predicts response intent and target first, then generates the response, reporting up to 29% Recall@1 gains.
-
DARE-EEG: A Foundation Model for Mining Dual-Aligned Representation of EEG
DARE-EEG is a self-supervised EEG foundation model that enforces mask-invariance via contrastive mask alignment and momentum anchor alignment, plus conv-linear-probing for heterogeneous setups, achieving SOTA accuracy and cross-dataset portability.
-
Strikingness-Aware Evaluation for Temporal Knowledge Graph Reasoning
A rule-based strikingness measure is added to TKGR metrics to weight rare events more heavily, revealing that models perform worse on striking events and that ensemble gains come mostly from trivial fits.
-
SynHAT: A Two-stage Coarse-to-Fine Diffusion Framework for Synthesizing Human Activity Traces
SynHAT uses a novel two-stage spatio-temporal diffusion framework with a Latent Spatio-Temporal U-Net to synthesize realistic human activity traces, outperforming baselines by 52% on spatial and 33% on temporal metrics across four cities.
-
NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions
NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.
-
GRAB-ANNS: High-Throughput Indexing and Hybrid Search via GPU-Native Bucketing
GRAB-ANNS is a new GPU graph index that achieves up to 240x higher hybrid search throughput via bucket layouts and hybrid intra/inter-bucket edges.
-
Sublime: Sublinear Error & Space for Unbounded Skewed Streams
Sublime generalizes Count-Min and Count Sketch with dynamically elongating counters and expanding counter arrays to deliver sublinear error growth and lower memory use on skewed unbounded streams.
-
An LLM-Guided Query-Aware Inference System for GNN Models on Large Knowledge Graphs
KG-WISE decomposes GNN models and uses LLM-generated query templates for partial loading of relevant components, achieving up to 28x faster inference and 98% lower memory on KGs with up to 42 million nodes while preserving accuracy.
-
Learned Static Function Data Structures
Learned static functions combine per-key ML-predicted prefix codes with classic static function storage to compress static key-value mappings beyond zero-order entropy limits.
-
Dynamic read & write optimization with TurtleKV
TurtleKV uses a balanced TurtleTree on-disk structure and flexible memory tuning knobs to deliver strong performance across inserts, mixed workloads, point queries, and scans in YCSB tests, matching or beating SplinterDB, RocksDB, and WiredTiger.
-
Diffusion and Flow Matching Models for Tabular Data: A Survey
The first dedicated survey organizing diffusion and flow matching models for tabular data synthesis, imputation, anomaly detection, and related tasks, covering literature from 2015 to 2026 and highlighting open problems.
-
LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation
LASAR uses two-stage supervised training plus reinforcement learning to ground semantic IDs, align latent reasoning trajectories to CoT hidden states via KL divergence, and adaptively choose reasoning depth, halving average steps while improving quality on three datasets.
-
Generalized Category Discovery in Federated Graph Learning
GCD-FGL mitigates neighborhood absorption and global semantic inconsistency in federated generalized category discovery, delivering an average HRScore gain of +4.86 over baselines on five graph datasets.
-
GRACE: A Dynamic Coreset Selection Framework for Large Language Model Optimization
GRACE dynamically constructs and updates coresets for LLM training using representation diversity, gradient-based importance, and k-NN graph propagation to improve efficiency and performance.
-
Towards Efficient and Generalizable Retrieval: Adaptive Semantic Quantization and Residual Knowledge Transfer
SA²CRQ uses sequential adaptive residual quantization based on path entropy plus anchored curriculum regularization from head items to improve both efficiency and cold-start performance in generative retrieval.
-
BDIViz: An Interactive Visualization System for Biomedical Schema Matching with LLM-Powered Validation
BDIViz is a visual analytics system that combines an ensemble of matching algorithms with LLM validation and interactive heatmaps to improve accuracy and reduce the time spent on biomedical schema matching.
-
TwiSTAR: Think Fast, Think Slow, Then Act, Generative Recommendation with Adaptive Reasoning
TwiSTAR learns to switch between fast SID retrieval and slow rationale-generating reasoning in generative recommendation, yielding better accuracy-latency trade-offs on three datasets.
-
TabEmb: Joint Semantic-Structure Embedding for Table Annotation
TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.
-
Efficient Model Repository for Entity Resolution: Construction, Search, and Integration
MoRER builds an ER model repository via feature distribution clustering of tasks, achieving competitive results with limited labels versus active learning, transfer learning, and self-supervised methods on three multi-source datasets.
-
A Pragmatic Approach to Learned Indexing in RocksDB: Targeted Optimizations with Minimal System Modification
MountDB extends RocksDB with Memtable-level model reuse and a block-aware learned disk index, reporting up to 1.5x higher write and 2.1x higher read throughput than state-of-the-art systems on large-scale workloads.
-
To GPU or Not to GPU: Vector Search in Relational Engines
Relational engines achieve faster SQL+vector-search queries on GPU than CPU when using compact vector indexes and fast interconnects, reversing the CPU-only design in current systems.