The Birth of Knowledge: Emergent Features across Time, Space, and Scale in Large Language Models
Abstract
Using sparse autoencoders, the study identifies when interpretable categorical features emerge in large language models across training time, transformer layers, and model scale, and finds that early-layer semantic features unexpectedly reactivate in later layers.
This paper studies the emergence of interpretable categorical features within large language models (LLMs), analyzing their behavior across training checkpoints (time), transformer layers (space), and varying model sizes (scale). Using sparse autoencoders for mechanistic interpretability, we identify when and where specific semantic concepts emerge within neural activations. Results indicate clear temporal and scale-specific thresholds for feature emergence across multiple domains. Notably, spatial analysis reveals unexpected semantic reactivation, with early-layer features re-emerging at later layers, challenging standard assumptions about representational dynamics in transformer models.
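The core pipeline described in the abstract can be illustrated with a minimal sketch: a sparse autoencoder encodes model activations into an overcomplete feature dictionary, and a feature counts as "emerged" once its mean activation crosses a threshold. All names, dimensions, and the threshold criterion below are illustrative assumptions, not the paper's actual implementation, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL, D_FEATS = 64, 256  # hypothetical hidden size and dictionary size

# Hypothetical SAE weights; in practice these are trained to minimize
# reconstruction error plus an L1 sparsity penalty on feature activations.
W_enc = rng.normal(0, 0.1, (D_MODEL, D_FEATS))
b_enc = np.zeros(D_FEATS)
W_dec = rng.normal(0, 0.1, (D_FEATS, D_MODEL))

def sae_features(acts):
    """Encode activations into the sparse feature space (ReLU encoder)."""
    return np.maximum(acts @ W_enc + b_enc, 0.0)

def sae_loss(acts, l1_coeff=1e-3):
    """Reconstruction MSE plus L1 sparsity penalty on the features."""
    feats = sae_features(acts)
    recon = feats @ W_dec
    mse = np.mean((recon - acts) ** 2)
    return mse + l1_coeff * np.mean(np.abs(feats))

def emerged_features(acts, threshold=0.5):
    """Indices of features whose mean activation exceeds a threshold,
    a simple stand-in for an emergence criterion applied per checkpoint,
    layer, or model size."""
    mean_act = sae_features(acts).mean(axis=0)
    return np.flatnonzero(mean_act > threshold)

# Fake batch of residual-stream activations for one (checkpoint, layer) pair.
acts = rng.normal(0, 1.0, (128, D_MODEL))
print(sae_loss(acts), emerged_features(acts).shape)
```

Sweeping `emerged_features` over checkpoints gives the temporal axis, over layers the spatial axis, and over model sizes the scale axis discussed in the abstract.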
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Probing the Vulnerability of Large Language Models to Polysemantic Interventions (2025)
- Emergent Specialization: Rare Token Neurons in Language Models (2025)
- TRACE for Tracking the Emergence of Semantic Representations in Transformers (2025)
- Multi-Scale Probabilistic Generation Theory: A Hierarchical Framework for Interpreting Large Language Models (2025)
- Interpreting the Linear Structure of Vision-language Model Embedding Spaces (2025)
- How Syntax Specialization Emerges in Language Models (2025)
- Exploring How LLMs Capture and Represent Domain-Specific Knowledge (2025)