Papers
arxiv:2507.04886

Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations

Published on Jul 7
Submitted by Bochkov on Jul 11
Authors: Bochkov

Abstract

AI-generated summary: Transformer models equipped with fixed, visually derived embeddings outperform those with trainable embeddings on a reasoning benchmark, challenging the traditional role of embeddings in LLMs.

Understanding the locus of semantic representation in large language models (LLMs) is crucial for interpretability and architectural innovation. The dominant paradigm posits that trainable input embeddings serve as foundational "meaning vectors." This paper challenges that view. We construct Transformer models where the embedding layer is entirely frozen, with vectors derived not from data, but from the visual structure of Unicode glyphs. These non-semantic, precomputed visual embeddings are fixed throughout training. Our method is compatible with any tokenizer, including a novel Unicode-centric tokenizer we introduce to ensure universal text coverage. Despite the absence of trainable, semantically initialized embeddings, our models converge, generate coherent text, and, critically, outperform architecturally identical models with trainable embeddings on the MMLU reasoning benchmark. We attribute this to "representational interference" in conventional models, where the embedding layer is burdened with learning both structural and semantic features. Our results indicate that high-level semantics are not inherent to input embeddings but are an emergent property of the Transformer's compositional architecture and data scale. This reframes the role of embeddings from meaning containers to structural primitives. We release all code and models to foster further research.
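
For readers who want a concrete picture of "visually derived" embeddings, the short sketch below shows one plausible construction: each Unicode codepoint is rendered to a small glyph bitmap, and the flattened pixels are mapped to the model width by a fixed, seeded projection. This is an illustrative reconstruction under our own assumptions (16x16 bitmaps, the default PIL font, a printable-ASCII toy vocabulary, and a random projection), not the authors' released pipeline.

# Illustrative sketch only: rasterize each Unicode codepoint to a glyph bitmap
# and turn the pixels into a fixed embedding table that is never trained.
import numpy as np
from PIL import Image, ImageDraw, ImageFont

GLYPH_SIZE = 16   # assumed 16x16 rasterization per glyph
MODEL_DIM = 256   # assumed Transformer hidden size

def render_glyph(codepoint: int, size: int = GLYPH_SIZE) -> np.ndarray:
    """Rasterize one Unicode codepoint to a size x size grayscale array in [0, 1]."""
    img = Image.new("L", (size, size), color=0)
    ImageDraw.Draw(img).text((0, 0), chr(codepoint), fill=255, font=ImageFont.load_default())
    return np.asarray(img, dtype=np.float32) / 255.0

def build_visual_embeddings(codepoints, dim: int = MODEL_DIM) -> np.ndarray:
    """Stack flattened glyph bitmaps and project them to the model dimension.
    The projection is seeded, so the matrix is deterministic and can be frozen
    before training ever starts."""
    pixels = np.stack([render_glyph(cp).ravel() for cp in codepoints])   # (V, 16*16)
    rng = np.random.default_rng(0)
    projection = rng.standard_normal((pixels.shape[1], dim)) / np.sqrt(pixels.shape[1])
    return (pixels @ projection).astype(np.float32)                      # (V, dim)

if __name__ == "__main__":
    vocab = list(range(0x20, 0x7F))     # toy vocabulary: printable ASCII codepoints
    table = build_visual_embeddings(vocab)
    print(table.shape)                  # (95, 256)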

Community

Paper author · Paper submitter (edited 17 days ago)

How does an LLM understand the meaning of 'wRiTe' when its building blocks—the individual character tokens 'w', 'R', 'i'—have no semantic content? This simple question challenges the very foundation of modern AI.
Our paper argues that high-level meaning is not contained in embeddings, but is constructed by the Transformer architecture. We prove this by replacing standard trainable embeddings with a completely frozen layer derived from the raw visual structure of Unicode glyphs. These non-semantic vectors are fixed before training even begins.
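As a minimal sketch of what this looks like in practice, a frozen glyph-derived table drops into a standard Transformer LM as shown below. This is illustrative only, not the released implementation: the layer and head counts, the random stand-in matrix, and the omission of positional encodings are simplifications.

# Minimal PyTorch sketch: a precomputed glyph matrix (e.g. the table built in
# the previous example) loaded with freeze=True, so only the Transformer
# blocks and the output head receive gradients.
import torch
import torch.nn as nn

class FrozenVisualEmbeddingLM(nn.Module):
    def __init__(self, visual_matrix: torch.Tensor, n_layers: int = 4, n_heads: int = 4):
        super().__init__()
        vocab_size, dim = visual_matrix.shape
        # Frozen, non-trainable input embeddings derived from glyph pixels.
        self.embed = nn.Embedding.from_pretrained(visual_matrix, freeze=True)
        block = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, num_layers=n_layers)
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Causal mask so the encoder stack behaves like a decoder-only LM.
        mask = nn.Transformer.generate_square_subsequent_mask(token_ids.size(1))
        return self.lm_head(self.blocks(self.embed(token_ids), mask=mask))

model = FrozenVisualEmbeddingLM(torch.randn(95, 256))    # stand-in for a real glyph table
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=3e-4)        # the embeddings never appear here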
The result is paradigm-shifting: our models not only converge but consistently outperform architecturally identical models with trainable embeddings on the MMLU reasoning benchmark. This reveals a core principle for development: Induction. Instead of forcing a model to guess all its knowledge at once, we give it simple, immutable rules (the visual form of characters) and let it build complexity from there.
It’s the difference between trying to freeze an entire lake instantly, versus letting a solid sheet of ice form layer by layer. It’s the power of a locomotive moving an entire train by first conquering the inertia of a single car.
This foundational discovery unlocks a powerful new methodology. In our follow-up paper (arXiv:2507.07129, https://huggingface.co/papers/2507.07129), we demonstrate the practical payoff: merging expert models like LEGOs and "growing" powerful AI systems incrementally.
This two-part work presents a blueprint for a more modular, efficient, and scalable future for AI.

Models citing this paper 26

Collections including this paper 6