BoolXLLM augments an existing Boolean rule learner with LLMs for feature selection, discretization thresholds, and natural-language rule translation to improve interpretability while preserving accuracy.
Journal of Machine Learning Research , volume=
5 Pith papers cite this work. Polarity classification is still indexing.
years
2026 5representative citing papers
Pre-trained MoE models exhibit deep-layer routing collapse for low-resource languages like Hebrew, largely corrected by continual pre-training on balanced bilingual data, with consistent patterns observed in Japanese.
Sparse MoE vision models show positive accuracy gaps only when routing a substantial compute fraction ρ and using k≥2 experts at large scale; batch-axis dispatch is identified as a key failure mode.
At tiny scale, MoE transformers lower validation loss versus dense models when active parameters match but raise it when total stored parameters match.
VLA-GSE uses spectral decomposition of the VLA backbone to create generalized and specialized experts, enabling effective robot task adaptation while updating only 2.51% of parameters and achieving 81.2% zero-shot success on LIBERO-Plus.
citing papers explorer
-
BoolXLLM: LLM-Assisted Explainability for Boolean Models
BoolXLLM augments an existing Boolean rule learner with LLMs for feature selection, discretization thresholds, and natural-language rule translation to improve interpretability while preserving accuracy.
-
Mixture of Experts for Low-Resource LLMs
Pre-trained MoE models exhibit deep-layer routing collapse for low-resource languages like Hebrew, largely corrected by continual pre-training on balanced bilingual data, with consistent patterns observed in Japanese.
-
When Does Sparse MoE Help in Vision? The Role of Backbone Compute Leverage in Sparse Routing
Sparse MoE vision models show positive accuracy gaps only when routing a substantial compute fraction ρ and using k≥2 experts at large scale; batch-axis dispatch is identified as a key failure mode.
-
Dense vs Sparse Pretraining at Tiny Scale: Active-Parameter vs Total-Parameter Matching
At tiny scale, MoE transformers lower validation loss versus dense models when active parameters match but raise it when total stored parameters match.
-
VLA-GSE: Boosting Parameter-Efficient Fine-Tuning in VLA with Generalized and Specialized Experts
VLA-GSE uses spectral decomposition of the VLA backbone to create generalized and specialized experts, enabling effective robot task adaptation while updating only 2.51% of parameters and achieving 81.2% zero-shot success on LIBERO-Plus.