MIPIC: Matryoshka Representation Learning via Self-Distilled Intra-Relational and Progressive Information Chaining

· 2026 · cs.CL · arXiv 2604.24374

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

Representation learning is fundamental to NLP, but building embeddings that work well at different computational budgets is challenging. Matryoshka Representation Learning (MRL) offers a flexible inference paradigm through nested embeddings; however, learning such structures requires explicit coordination of how information is arranged across embedding dimensionality and model depth. In this work, we propose MIPIC (Matryoshka Representation Learning via Self-Distilled Intra-Relational Alignment and Progressive Information Chaining), a unified training framework designed to produce structurally coherent and semantically compact Matryoshka representations. MIPIC promotes cross-dimensional structural consistency through Self-Distilled Intra-Relational Alignment (SIA), which aligns token-level geometric and attention-driven relations between full and truncated representations using top-k CKA self-distillation. Complementarily, it enables depth-wise semantic consolidation via Progressive Information Chaining (PIC), a scaffolded alignment strategy that incrementally transfers mature task semantics from deeper layers into earlier layers. Extensive experiments on STS, NLI, and classification benchmarks (spanning models from TinyBERT to BGEM3, Qwen3) demonstrate that MIPIC yields Matryoshka representations that are highly competitive across all capacities, with significant performance advantages observed under extreme low-dimensional.

representative citing papers

MM-Matryoshka: Towards Budget-Elastic Visual Document Retrieval via a 2D Multimodal Matryoshka Training Framework

cs.CV · 2026-06-03 · unverdicted · novelty 6.0

MM-Matryoshka is a 2D Matryoshka training framework enabling budget-elastic ColPali-style multi-vector visual document retrieval along dimension and layer without separate models per budget.

citing papers explorer

Showing 1 of 1 citing paper.

MM-Matryoshka: Towards Budget-Elastic Visual Document Retrieval via a 2D Multimodal Matryoshka Training Framework cs.CV · 2026-06-03 · unverdicted · none · ref 68 · internal anchor
MM-Matryoshka is a 2D Matryoshka training framework enabling budget-elastic ColPali-style multi-vector visual document retrieval along dimension and layer without separate models per budget.

MIPIC: Matryoshka Representation Learning via Self-Distilled Intra-Relational and Progressive Information Chaining

fields

years

verdicts

representative citing papers

citing papers explorer