A Judge-Aware Gated Multi-Task Learning architecture with outcome taxonomy supervision achieves state-of-the-art accuracy on 13,937 UK Employment Tribunal decisions. It uses an order of magnitude fewer parameters than generative supervised fine-tuning (SFT) baselines built on a 26B-parameter model.
Your Mixture-of-Experts LLM Is Secretly an Embedding Model for Free
2 Pith papers cite this work.