Mixed citations
Hybrid cloud and HPC approach to high-performance dataframes
Mixed citation behavior. Most common role is background (57%).
citing papers explorer
- DeTox-Fed: Detecting Toxic Conversations in the Fediverse with Federated Graph Neural Networks
  DeTox-Fed uses federated graph neural networks on local conversation graphs to detect toxic discussions in the Fediverse while keeping all raw data and labels on individual instances.
- Identifiable Multimodal Causal Representation Learning under Partial Latent Sharing
  Establishes component-wise identifiability guarantees for partially shared causal latents in multimodal nonlinear mixing and introduces a differentiable Wasserstein-based module for recovery.
- What Should Explanations Contain? A Human-Centered Explanation Content Model for Local, Post-Hoc Explanations
  A 14-code content model for local post-hoc AI explanations, derived from 325 user statements and validated by experts with high reliability scores.
- Divide-then-Diagnose: Weaving Clinician-Inspired Contexts for Ultra-Long Capsule Endoscopy Videos
  Introduces the diagnosis-driven CE video summarization task, the VideoCAP dataset with 240 annotated videos, and the DiCE framework that outperforms prior methods by screening candidates then weaving them into diagnostic contexts.
- Rashomon Sets and Model Multiplicity in Federated Learning
  The work provides the first formal definitions of Rashomon sets for federated learning and introduces a multiplicity-aware training pipeline evaluated on standard benchmarks.
- Capture the Flags: Family-Based Evaluation of Agentic LLMs via Semantics-Preserving Transformations
  Agentic LLMs remain robust to renaming and insertion but degrade on composed transformations and deeper obfuscation in CTF tasks, enabled by a new Evolve-CTF tool for generating equivalent challenge families.
- On Privacy Leakage in Tabular Diffusion Models: Influential Factors, Attacker Knowledge, and Metrics
  Tabular diffusion models leak membership information via attacks even with partial attacker knowledge, and common heuristic privacy metrics like distance-to-closest-record are unreliable.
- M-CaStLe: Uncovering Local Causal Structures in Multivariate Space-Time Gridded Data
  M-CaStLe generalizes local stencil-based causal discovery to the multivariate case and decomposes resulting graphs into reaction and spatial components for interpretation in space-time gridded data.
- KubePACS: Kubernetes Cluster Using Performant, Highly Available, and Cost Efficient Spot Instances
  KubePACS formulates spot instance selection as a multi-objective ILP problem solved with GSS, integrated with Karpenter, and reports on average 55% higher performance per dollar than prior tools.
- Label-Free Detection of Governance Evidence Degradation in Risk Decision Systems
  A composite multi-proxy framework detects harmful drift in label-free risk decision systems and enables graduated governance alerts.
- MONETA: Multimodal Industry Classification through Geographic Information with Multi Agent Systems
  MONETA is the first multimodal benchmark for industry classification using text and geographic sources, with MLLM baselines at 62-74% accuracy and up to 22.8% gains from multi-turn context enrichment and explanations.
- AAFLOW: Scalable Patterns for Agentic AI Workflows
  AAFLOW is a unified distributed runtime that models agentic workflows as operators with a zero-copy data plane using Apache Arrow and Cylon, achieving up to 4.64x pipeline speedup through improved data flow and batching.
- LLM-Enhanced Topical Trend Detection at Snapchat
  Snapchat's deployed system detects emerging topical trends in short videos via multimodal extraction, time-series burst detection, and LLM consolidation, achieving high precision over six months of human evaluation and improving content freshness in production.
- Enabling users to work sustainably on shared institute computing resources
  VISPA adds voluntary energy monitoring, green-window job scheduling, and user feedback systems to a physics cluster to cut greenhouse-gas emissions through greater resource awareness.
- Reveal-to-Revise: Explainable Bias-Aware Generative Modeling with Multimodal Attention
  Reveal-to-Revise integrates cross-modal attention fusion, Grad-CAM++ attribution, and bias feedback in a conditional attention WGAN-GP and reports high accuracy, F1, and fairness metrics on multimodal MNIST variants and toxic text tasks.
- Towards the Anonymization of the Language Modeling
  The authors introduce MLM and CLM specialization methods that avoid memorizing identifiers in sensitive training data while aiming for a privacy-utility tradeoff on medical datasets.
- AI-Driven Security Alert Screening and Alert Fatigue Mitigation in Security Operations Centers: A Comprehensive Survey
  A literature survey synthesizes 119 studies on AI-driven alert screening into a four-stage taxonomy of filtering, triage, correlation, and generative augmentation while identifying gaps in deployment realism and robustness.
- VulGD: A LLM-Powered Dynamic Open-Access Vulnerability Graph Database
  VulGD is a dynamic open-access graph database that aggregates vulnerability data from multiple sources and uses LLM embeddings to enable more accurate risk assessment and threat prioritization.