Deep Sets

Alexander Smola; Barnabas Poczos; Manzil Zaheer; Ruslan Salakhutdinov; Satwik Kottur; Siamak Ravanbakhsh

arxiv: 1703.06114 · v3 · pith:TVNBXLHNnew · submitted 2017-03-10 · 💻 cs.LG · stat.ML

Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, Alexander Smola

keywords functions · sets · cite · deep · invariant · permutation · defined · detection

We study the problem of designing models for machine learning tasks defined on \emph{sets}. In contrast to the traditional approach of operating on fixed-dimensional vectors, we consider objective functions defined on sets that are invariant to permutations. Such problems are widespread, ranging from estimation of population statistics \cite{poczos13aistats}, to anomaly detection in piezometer data of embankment dams \cite{Jung15Exploration}, to cosmology \cite{Ntampaka16Dynamical,Ravanbakhsh16ICML1}. Our main theorem characterizes the permutation invariant functions and provides a family of functions to which any permutation invariant objective function must belong. This family of functions has a special structure which enables us to design a deep network architecture that can operate on sets and which can be deployed in a variety of scenarios, including both unsupervised and supervised learning tasks. We also derive the necessary and sufficient conditions for permutation equivariance in deep models. We demonstrate the applicability of our method on population statistic estimation, point cloud classification, set expansion, and outlier detection.
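
The abstract does not spell out the family of functions. In the paper it is the sum-decomposition form f(X) = rho(sum over x in X of phi(x)), and the equivariance condition for a layer sigma(Theta x) is Theta = lambda I + gamma 1 1^T. The NumPy sketch below illustrates both and checks them numerically. The layer sizes, random weights, tanh nonlinearity, and the names `phi`, `rho`, `invariant_f`, and `equivariant_layer` are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and random weights, chosen only for illustration.
D_IN, D_HID, D_OUT = 3, 8, 2
W_phi = rng.normal(size=(D_IN, D_HID))
W_rho = rng.normal(size=(D_HID, D_OUT))

def phi(x):
    """Per-element embedding, applied to each set element independently."""
    return np.tanh(x @ W_phi)

def rho(z):
    """Readout applied to the pooled set representation."""
    return z @ W_rho

def invariant_f(X):
    """rho(sum_x phi(x)): the output does not depend on the row order of X."""
    return rho(phi(X).sum(axis=0))

def equivariant_layer(X, lam=0.7, gam=-0.3):
    """sigma(lam * x + gam * sum(x)), i.e. Theta = lam*I + gam*1 1^T per feature.

    Permuting the rows of X permutes the output rows in the same way.
    """
    return np.tanh(lam * X + gam * X.sum(axis=0, keepdims=True))

X = rng.normal(size=(5, D_IN))   # a set of 5 elements, each in R^3
perm = rng.permutation(5)        # an arbitrary reordering of the set

# Invariance: same output for any ordering of the elements.
inv_ok = np.allclose(invariant_f(X), invariant_f(X[perm]))
# Equivariance: permuting the input permutes the output identically.
equi_ok = np.allclose(equivariant_layer(X)[perm], equivariant_layer(X[perm]))
print(inv_ok, equi_ok)
```

Sum pooling is used here because it matches the decomposition in the theorem; other symmetric pooling operations such as mean or max would also give an order-independent output, but they are a different design choice.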

Forward citations

Cited by 20 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Patch Hierarchical Attention Transformer for Efficient Particle Jet Tagging
hep-ex 2026-05 unverdicted novelty 7.0

PHAT-JeT combines geometric message-passing with hierarchical patch attention to reach state-of-the-art accuracy and background rejection among resource-constrained jet tagging models on four benchmarks.
Higgs Physics with the XFEL Compton $\boldsymbol{\gamma\gamma}$ Collider Concept at $\boldsymbol{\sqrt{s}=125}$ GeV
hep-ph 2026-05 unverdicted novelty 7.0

An XFEL Compton gamma-gamma collider at 125 GeV with a set transformer deep learning classifier on particle-flow point clouds can achieve high-precision Higgs measurements across hadronic, semi-leptonic, and leptonic ...
Modeling isotropic polyconvex hyperelasticity by neural networks -- sufficient and necessary criteria for compressible and incompressible materials
cs.CE 2026-03 conditional novelty 7.0

CSSV-NNs and inc-CSSV-NNs provide universal approximation of frame-indifferent isotropic polyconvex hyperelastic energies, showing Ball's criterion is sufficient but not necessary.
Fermi Sets: Universal and interpretable neural architectures for fermions
cond-mat.str-el 2026-01 unverdicted novelty 7.0

Fermi Sets achieve universal approximation of fermionic wavefunctions using K antisymmetric bases times symmetric neural networks, where K equals 1 in 1D, 2 in 2D, and grows linearly with particle number in higher dimensions.
Monotone and Separable Set Functions: Characterizations and Neural Models
cs.LG 2025-10 unverdicted novelty 7.0

Characterizes monotone separating set functions with dimension bounds, proves non-existence on infinite domains, and introduces a Holder-stable neural model with a weak version of the property for universal monotone a...
Velocityformer: Broken-Symmetry-Matched Equivariant Graph Transformers for Cosmological Velocity Reconstruction
astro-ph.CO 2026-05 unverdicted novelty 6.0

Velocityformer achieves 35% higher velocity correlation than linear theory by matching graph transformer inductive bias to the line-of-sight broken symmetry and conditioning on long-wavelength physics, while training ...
It Just Takes Two: Scaling Amortized Inference to Large Sets
cs.LG 2026-05 unverdicted novelty 6.0

A mean-pool deep set trained on sets of size at most two produces an encoder that generalizes to arbitrary sizes, decoupling representation learning from posterior modeling and making training cost independent of depl...
Tokenised Flow Matching for Hierarchical Simulation Based Inference
cs.LG 2026-04 unverdicted novelty 6.0

TFMPE combines likelihood factorisation with tokenised flow matching to enable efficient hierarchical SBI from single-site simulations, producing well-calibrated posteriors at lower computational cost on a new benchma...
Temporally Extended Mixture-of-Experts Models
cs.LG 2026-04 unverdicted novelty 6.0

Temporally extended MoE layers using the option-critic framework with deliberation costs cut switching rates below 5% while retaining most capability on MATH, MMLU, and MMMLU.
Diffusion-Based Point-Cloud Generation of Heavy-Ion Events
hep-ph 2026-04 unverdicted novelty 6.0

A two-stage score-driven diffusion model with Point-Edge Transformer generates realistic high-multiplicity heavy-ion events as point clouds.
Signal Decomposition Reveals Structure in Insider Threat Detection under Sparse Temporal Data
cs.CR 2026-02 unverdicted novelty 6.0

Separating presence from magnitude in sparse temporal audit data lets a dual-channel autoencoder focus learning on anomalous activity for insider threat detection.
Two-Layer Reinforcement Learning-Assisted Joint Beamforming and Trajectory Optimization for Multi-UAV Downlink Communications
eess.SP 2026-01 unverdicted novelty 6.0

A hierarchical GNN-RL framework for joint beamforming and trajectory optimization in multi-UAV systems outperforms baselines in sum rate, convergence, and generalization.
Drawback of Enforcing Equivariance and its Compensation via the Lens of Expressive Power
cs.LG 2025-12 unverdicted novelty 6.0

Enforcing equivariance reduces expressive power in 2-layer ReLU networks but enlarging the model compensates with proven size bounds and yields lower hypothesis space dimensionality for better generalization.
Efficient Inference for Coupled Hidden Markov Models in Continuous Time and Discrete Space
stat.ML 2025-10 unverdicted novelty 6.0

Proposes Latent Interacting Particle Systems with an efficient parameterization of twist potentials to enable approximate posterior inference for coupled continuous-time hidden Markov models via twisted sequential Mon...
Thermodynamically consistent machine learning model for excess Gibbs energy
cs.LG 2025-09 unverdicted novelty 6.0

HANNA is a thermodynamically consistent ML model for predicting excess Gibbs energy from molecular structures, trained on various binary mixture data and extended to multi-component mixtures using geometric projection.
First evidence for mixing-induced $CP$ violation in B$^0_\mathrm{s}$ $\to$ J/$\psi\,\phi$(1020) decays in pp collisions at $\sqrt{s} = $ 13 TeV
hep-ex 2024-12 unverdicted novelty 6.0

First evidence for non-zero φ_s in B_s^0 → J/ψ φ decays at 3.2σ from CMS 13 TeV data combined with prior 8 TeV result.
Spatially Aware Linear Transformer (SAL-T) for Particle Jet Tagging
cs.LG 2025-10 unverdicted novelty 5.0

SAL-T enhances the Linformer with spatially aware kinematic partitioning and convolutions to match full-attention transformer performance on jet tagging while keeping linear complexity and lower latency.
The LHCb Experiment
hep-ex 2026-05 unverdicted novelty 2.0

This review summarizes the historical motivation, detector design, experimental techniques, and major physics results of the LHCb experiment at the LHC.
A Review of Diffusion-based Simulation-Based Inference: Foundations and Applications in Non-Ideal Data Scenarios
cs.LG 2025-12 accept novelty 2.0

A synthesis of diffusion-based simulation-based inference methods that address model misspecification, irregular observations, and missing data in scientific applications.
Software and computing for Run 3 of the ATLAS experiment at the LHC
hep-ex 2024-04 unverdicted novelty 2.0

ATLAS reports on its Run 3 software infrastructure for data management, workflows, databases, validation, and physics analysis tools at the LHC.