PromptDx adds a differentiable adapter to align multimodal data with a pre-trained TabPFN-style ICL engine, achieving strong Alzheimer's diagnosis performance with only 1% context samples.
arXiv preprint arXiv:2410.07348 (2024)
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 4roles
baseline 1polarities
baseline 1representative citing papers
Split-MoPE integrates split learning with predefined-expert routing to maximize usable data in vertical federated learning under sample misalignment, delivering state-of-the-art accuracy in one communication round plus built-in robustness and per-sample contribution scores.
ZEDA injects zero-output experts and uses two-stage self-distillation to adapt post-trained MoE models into dynamic ones that skip over half the experts, yielding 1.2x inference speedup with small accuracy drops.
BEAM uses binary expert activation masks trained end-to-end to achieve dynamic sparsity in MoE models, cutting FLOPs by 85% with over 98% performance retention.
citing papers explorer
-
PromptDx: Differentiable Prompt Tuning for Multimodal In-Context Alzheimer's Diagnosis
PromptDx adds a differentiable adapter to align multimodal data with a pre-trained TabPFN-style ICL engine, achieving strong Alzheimer's diagnosis performance with only 1% context samples.
-
Mixture of Predefined Experts: Maximizing Data Usage on Vertical Federated Learning
Split-MoPE integrates split learning with predefined-expert routing to maximize usable data in vertical federated learning under sample misalignment, delivering state-of-the-art accuracy in one communication round plus built-in robustness and per-sample contribution scores.
-
Post-Trained MoE Can Skip Half Experts via Self-Distillation
ZEDA injects zero-output experts and uses two-stage self-distillation to adapt post-trained MoE models into dynamic ones that skip over half the experts, yielding 1.2x inference speedup with small accuracy drops.
-
BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE
BEAM uses binary expert activation masks trained end-to-end to achieve dynamic sparsity in MoE models, cutting FLOPs by 85% with over 98% performance retention.