hub

Adaptive Mixtures of Local Experts

Robert A · 1991 · DOI 10.1162/neco.1991.3.1.79

17 Pith papers cite this work. Polarity classification is still indexing.

17 Pith papers citing it

open at publisher browse 17 citing papers

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

background 2 method 1

citation-polarity summary

background 1 support 1 use method 1

representative citing papers

Beyond Routing: Characterising Expert Tuning and Representation in Vision Mixture-of-Experts

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

Expert specialization in vision MoE models is dominated by a stable animate-inanimate distinction visible from gating to readout, with broader tuning to continuous visual and semantic dimensions rather than narrow categorical preferences.

Are Independently Estimated View Uncertainties Comparable? Unified Routing for Trusted Multi-View Classification

cs.LG · 2026-04-10 · unverdicted · novelty 7.0

TMUR replaces independent evidential fusion with a unified router that observes global multi-view context to produce reliable expert weights and address scale bias in view uncertainties.

On the Decompositionality of Neural Networks

cs.LO · 2026-04-09 · unverdicted · novelty 7.0

Neural decompositionality is defined via decision-boundary semantic preservation, and language transformers largely satisfy it under SAVED while vision models often do not.

Hypothesis generation and updating in large language models

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

LLMs exhibit Bayesian-like hypothesis updating with strong-sampling bias and an evaluation-generation gap but generalize poorly outside observed data.

SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

SAMoRA is a parameter-efficient fine-tuning framework that uses semantic-aware routing and task-adaptive scaling within a Mixture of LoRA Experts to improve multi-task performance and generalization over prior methods.

TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models

cs.LG · 2026-04-07 · unverdicted · novelty 6.0

TalkLoRA equips MoE-LoRA experts with a communication module that smooths routing dynamics and improves performance on language tasks under similar parameter budgets.

A Human-Centric Framework for Data Attribution in Large Language Models

cs.CY · 2026-02-11 · unverdicted · novelty 6.0

Introduces a parameter-driven framework for data attribution in LLMs that enables negotiation among creators, users, and intermediaries to meet stakeholder goals within the data economy.

Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts

cs.CL · 2025-09-26 · unverdicted · novelty 6.0

EMoE trains MoE models so they maintain performance when the number of activated experts changes at inference, expanding the usable range to 2-3 times the training k with higher peak results.

Soft Learning

cs.LG · 2026-05-16 · unverdicted · novelty 5.0

Soft Learning optimally combines heterogeneous ML specialists via cross-validated non-negative least squares, achieving top performance on 70% of 37 datasets with formal guarantees and 72-435x CPU speedups over deep networks.

When Does Sparse MoE Help in Vision? The Role of Backbone Compute Leverage in Sparse Routing

cs.CV · 2026-05-15 · unverdicted · novelty 5.0

Sparse MoE vision models show positive accuracy gaps only when routing a substantial compute fraction ρ and using k≥2 experts at large scale; batch-axis dispatch is identified as a key failure mode.

A graph-based Neural Network surrogate model for accelerating semi-analytical model of galaxy formation and evolution

astro-ph.GA · 2026-04-25 · conditional · novelty 5.0

A conditional graph neural network serves as an accurate and fast surrogate for semi-analytic galaxy formation models, predicting key properties across cosmic time.

STK-Adapter: Incorporating Evolving Graph and Event Chain for Temporal Knowledge Graph Extrapolation

cs.IR · 2026-04-21 · unverdicted · novelty 5.0

STK-Adapter adds Spatial-Temporal MoE, Event-Aware MoE, and Cross-Modality Alignment MoE to integrate evolving TKG graphs and event chains into LLMs, reducing information loss and improving extrapolation performance over prior methods.

Attention Residuals

cs.CL · 2026-03-16 · unverdicted · novelty 5.0

Attention Residuals replaces fixed residual summation with input-dependent softmax attention over preceding layers, and a blocked variant is shown to improve uniformity and downstream performance in a 48B-parameter model pre-trained on 1.4T tokens.

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

cs.CL · 2024-01-11 · unverdicted · novelty 5.0

DeepSeekMoE 2B matches GShard 2.9B performance and approaches a dense 2B model; the 16B version matches LLaMA2-7B at 40% compute by using fine-grained expert segmentation plus shared experts.

Apparent Age Estimation: Challenges and Outcomes

cs.LG · 2026-04-03 · unverdicted · novelty 3.0

AMRL reaches state-of-the-art accuracy on apparent age estimation yet exhibits clear performance drops for Asian and African American groups due to inconsistent feature focus, showing that technical tweaks are not enough without diverse localized data.

Dynamic Meta-Metrics: Source-Sentence Conditioned Weighting for MT Evaluation

cs.CL · 2026-05-09

Soft-to-Hard Routing in Sparse Mixture-of-Experts Models

cs.LG · 2026-05-04

citing papers explorer

Showing 17 of 17 citing papers.

Beyond Routing: Characterising Expert Tuning and Representation in Vision Mixture-of-Experts cs.CV · 2026-05-20 · unverdicted · none · ref 2
Expert specialization in vision MoE models is dominated by a stable animate-inanimate distinction visible from gating to readout, with broader tuning to continuous visual and semantic dimensions rather than narrow categorical preferences.
Are Independently Estimated View Uncertainties Comparable? Unified Routing for Trusted Multi-View Classification cs.LG · 2026-04-10 · unverdicted · none · ref 18
TMUR replaces independent evidential fusion with a unified router that observes global multi-view context to produce reliable expert weights and address scale bias in view uncertainties.
On the Decompositionality of Neural Networks cs.LO · 2026-04-09 · unverdicted · none · ref 20
Neural decompositionality is defined via decision-boundary semantic preservation, and language transformers largely satisfy it under SAVED while vision models often do not.
Hypothesis generation and updating in large language models cs.LG · 2026-05-07 · unverdicted · none · ref 146
LLMs exhibit Bayesian-like hypothesis updating with strong-sampling bias and an evaluation-generation gap but generalize poorly outside observed data.
SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning cs.CL · 2026-04-21 · unverdicted · none · ref 12
SAMoRA is a parameter-efficient fine-tuning framework that uses semantic-aware routing and task-adaptive scaling within a Mixture of LoRA Experts to improve multi-task performance and generalization over prior methods.
TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models cs.LG · 2026-04-07 · unverdicted · none · ref 13
TalkLoRA equips MoE-LoRA experts with a communication module that smooths routing dynamics and improves performance on language tasks under similar parameter budgets.
A Human-Centric Framework for Data Attribution in Large Language Models cs.CY · 2026-02-11 · unverdicted · none · ref 92
Introduces a parameter-driven framework for data attribution in LLMs that enables negotiation among creators, users, and intermediaries to meet stakeholder goals within the data economy.
Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts cs.CL · 2025-09-26 · unverdicted · none · ref 18
EMoE trains MoE models so they maintain performance when the number of activated experts changes at inference, expanding the usable range to 2-3 times the training k with higher peak results.
Soft Learning cs.LG · 2026-05-16 · unverdicted · none · ref 29
Soft Learning optimally combines heterogeneous ML specialists via cross-validated non-negative least squares, achieving top performance on 70% of 37 datasets with formal guarantees and 72-435x CPU speedups over deep networks.
When Does Sparse MoE Help in Vision? The Role of Backbone Compute Leverage in Sparse Routing cs.CV · 2026-05-15 · unverdicted · none · ref 1
Sparse MoE vision models show positive accuracy gaps only when routing a substantial compute fraction ρ and using k≥2 experts at large scale; batch-axis dispatch is identified as a key failure mode.
A graph-based Neural Network surrogate model for accelerating semi-analytical model of galaxy formation and evolution astro-ph.GA · 2026-04-25 · conditional · none · ref 22
A conditional graph neural network serves as an accurate and fast surrogate for semi-analytic galaxy formation models, predicting key properties across cosmic time.
STK-Adapter: Incorporating Evolving Graph and Event Chain for Temporal Knowledge Graph Extrapolation cs.IR · 2026-04-21 · unverdicted · none · ref 77
STK-Adapter adds Spatial-Temporal MoE, Event-Aware MoE, and Cross-Modality Alignment MoE to integrate evolving TKG graphs and event chains into LLMs, reducing information loss and improving extrapolation performance over prior methods.
Attention Residuals cs.CL · 2026-03-16 · unverdicted · none · ref 20
Attention Residuals replaces fixed residual summation with input-dependent softmax attention over preceding layers, and a blocked variant is shown to improve uniformity and downstream performance in a 48B-parameter model pre-trained on 1.4T tokens.
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models cs.CL · 2024-01-11 · unverdicted · none · ref 26
DeepSeekMoE 2B matches GShard 2.9B performance and approaches a dense 2B model; the 16B version matches LLaMA2-7B at 40% compute by using fine-grained expert segmentation plus shared experts.
Apparent Age Estimation: Challenges and Outcomes cs.LG · 2026-04-03 · unverdicted · none · ref 8
AMRL reaches state-of-the-art accuracy on apparent age estimation yet exhibits clear performance drops for Asian and African American groups due to inconsistent feature focus, showing that technical tweaks are not enough without diverse localized data.
Dynamic Meta-Metrics: Source-Sentence Conditioned Weighting for MT Evaluation cs.CL · 2026-05-09 · unreviewed · ref 18
Soft-to-Hard Routing in Sparse Mixture-of-Experts Models cs.LG · 2026-05-04 · unreviewed · ref 6

Adaptive Mixtures of Local Experts

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer