hub Mixed citations
Bernstam, Martin J Citardi, and Hua Xu
Mixed citation behavior. Most common role is background (57%).
representative citing papers
Authors introduce MLM and CLM specialization methods that avoid memorizing identifiers in sensitive training data while aiming for a privacy-utility tradeoff on medical datasets.
citing papers explorer
-
DeTox-Fed: Detecting Toxic Conversations in the Fediverse with Federated Graph Neural Networks
DeTox-Fed uses federated graph neural networks on local conversation graphs to detect toxic discussions in the Fediverse while keeping all raw data and labels on individual instances.
-
Identifiable Multimodal Causal Representation Learning under Partial Latent Sharing
Establishes component-wise identifiability guarantees for partially shared causal latents in multimodal nonlinear mixing and introduces a differentiable Wasserstein-based module for recovery.
-
What Should Explanations Contain? A Human-Centered Explanation Content Model for Local, Post-Hoc Explanations
A 14-code content model for local post-hoc AI explanations, derived from 325 user statements and validated by experts with high reliability scores.
-
Divide-then-Diagnose: Weaving Clinician-Inspired Contexts for Ultra-Long Capsule Endoscopy Videos
Introduces the diagnosis-driven CE video summarization task, the VideoCAP dataset with 240 annotated videos, and the DiCE framework that outperforms prior methods by screening candidates then weaving them into diagnostic contexts.
-
Rashomon Sets and Model Multiplicity in Federated Learning
The work provides the first formal definitions of Rashomon sets for federated learning and introduces a multiplicity-aware training pipeline evaluated on standard benchmarks.
-
Capture the Flags: Family-Based Evaluation of Agentic LLMs via Semantics-Preserving Transformations
Agentic LLMs remain robust to renaming and insertion but degrade on composed transformations and deeper obfuscation in CTF tasks, enabled by a new Evolve-CTF tool for generating equivalent challenge families.
-
On Privacy Leakage in Tabular Diffusion Models: Influential Factors, Attacker Knowledge, and Metrics
Tabular diffusion models leak membership information via attacks even with partial attacker knowledge, and common heuristic privacy metrics like distance-to-closest-record are unreliable.
-
M-CaStLe: Uncovering Local Causal Structures in Multivariate Space-Time Gridded Data
M-CaStLe generalizes local stencil-based causal discovery to the multivariate case and decomposes resulting graphs into reaction and spatial components for interpretation in space-time gridded data.
-
Label-Free Detection of Governance Evidence Degradation in Risk Decision Systems
A composite multi-proxy framework detects harmful drift in label-free risk decision systems and enables graduated governance alerts.
-
MONETA: Multimodal Industry Classification through Geographic Information with Multi Agent Systems
MONETA is the first multimodal benchmark for industry classification using text and geographic sources, with MLLM baselines at 62-74% accuracy and up to 22.8% gains from multi-turn context enrichment and explanations.
-
Protein Representation Learning with Secondary-Structure and Energy-Filtered Hydrogen-Bond Graphs
Secondary-structure-aware GNN using energy-filtered hydrogen-bond edges improves protein representation learning on standard benchmarks.
-
AAFLOW: Scalable Patterns for Agentic AI Workflows
AAFLOW is a unified distributed runtime that models agentic workflows as operators with a zero-copy data plane using Apache Arrow and Cylon, achieving up to 4.64x pipeline speedup through improved data flow and batching.
-
KubePACS: Kubernetes Cluster Using Performant, Highly Available, and Cost Efficient Spot Instances
KubePACS formulates spot instance selection as an ILP problem solved via Golden Section Search, integrating performance benchmarks and Spot Placement Scores to achieve, on average, 55% higher performance per dollar than prior tools.
-
Data Architectures and their Technical Requirements (DATER)
DATER is a new conceptual framework that analyzes six modern data architectures by historical context, defining features, and conformance to technical requirement dimensions.
-
PairWise Image Finder: An Open-source Tool for Finding Visually Aligned Street-Level Image Pairs for Urban Perception Studies
PairWise is an open-source tool that finds visually aligned street-level image pairs by integrating feature matching with semantic segmentation mask alignment and outputs quantitative alignment metrics for filtering.
-
Federated Learning for Multivariate Time Series Anomaly Detection in Industrial Automation
Introduces a cyclic-dynamics dataset for industrial MTSAD and benchmarks federated anomaly detection methods on it and a public dataset.
-
LLM-Enhanced Topical Trend Detection at Snapchat
Snapchat's deployed system detects emerging topical trends in short videos via multimodal extraction, time-series burst detection, and LLM consolidation, achieving high precision over six months of human evaluation and improving content freshness in production.
-
Enabling users to work sustainably on shared institute computing resources
VISPA adds voluntary energy monitoring, green-window job scheduling, and user feedback systems to a physics cluster to cut greenhouse-gas emissions through greater resource awareness.
-
Categorizing Mathematical Concepts with LLM Voting Ensembles in Mathswitch
This paper evaluates LLM voting ensembles for filtering noise in Wikidata-sourced mathematical concepts in Mathswitch, using MathWorld identifiers as positive control and grouping disagreements into three categories.
-
AI-Driven Security Alert Screening and Alert Fatigue Mitigation in Security Operations Centers: A Comprehensive Survey
A literature survey synthesizes 119 studies on AI-driven alert screening into a four-stage taxonomy of filtering, triage, correlation, and generative augmentation while identifying gaps in deployment realism and robustness.
-
VulLink: A Dynamic Open-Access Vulnerability Graph Database for Cybersecurity Data Mining