It is also surprising to me; quite a cool side effect. You'll find similar supporting results on p. 5 of 'Matryoshka Representation Learning'. In fact, the graphs on that page suggest that MRL-E/tied MRL underperforms their vanilla FF model at low dimensions, and the differences between their baseline and MRL aren't that significant either (in those specific graphs). Funnily enough, Sentence Transformers implements MRL-E/tied MRL, not untied MRL.
@tomaarsen's results seem to be quite different, in that MRL-E is winning against no MRL.
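For reference, here's a minimal PyTorch sketch of the tied vs. untied distinction mentioned above, in the paper's classification setting (not Sentence Transformers' actual contrastive implementation). The class names, dimension list, and loss weighting are illustrative assumptions, not code from either source.

```python
# Sketch only: untied MRL uses a separate classifier per truncated embedding
# size, while tied MRL-E reuses slices of a single full-dimension classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

NESTING_DIMS = [64, 128, 256, 512]   # truncation points, smallest first (illustrative)
FULL_DIM, NUM_CLASSES = 512, 10      # illustrative sizes

class UntiedMRLHead(nn.Module):
    """Untied MRL: one independent classifier per nesting dimension."""
    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(d, NUM_CLASSES) for d in NESTING_DIMS])

    def forward(self, z):  # z: (batch, FULL_DIM)
        return [head(z[:, :d]) for d, head in zip(NESTING_DIMS, self.heads)]

class TiedMRLEHead(nn.Module):
    """Tied MRL-E: the first d columns of a single classifier serve each prefix."""
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(FULL_DIM, NUM_CLASSES)

    def forward(self, z):
        return [F.linear(z[:, :d], self.head.weight[:, :d], self.head.bias)
                for d in NESTING_DIMS]

def mrl_loss(logits_per_dim, labels):
    # Unweighted sum of cross-entropy over all nesting dims (the paper allows weights).
    return sum(F.cross_entropy(logits, labels) for logits in logits_per_dim)

# Toy usage
z = torch.randn(4, FULL_DIM)
labels = torch.randint(0, NUM_CLASSES, (4,))
loss_untied = mrl_loss(UntiedMRLHead()(z), labels)
loss_tied = mrl_loss(TiedMRLEHead()(z), labels)
```

The tied variant adds no parameters beyond the full-dimension head, which is presumably part of why it's the more convenient form for a library to ship.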