Xet History & Overview
In August 2024 Hugging Face acquired XetHub, a seed-stage startup based in Seattle, to replace Git LFS on the Hub.
Like Git LFS, a Xet-backed repository uses S3 as its remote storage, with a .gitattributes
file at the repository root identifying which files should be stored remotely.


A Git LFS pointer file provides metadata to locate the actual file contents in remote storage:
- SHA256: Provides a unique identifier for the actual large file. This identifier is generated by computing the SHA-256 hash of the file’s contents.
- Pointer size: The size of the pointer file stored in the Git repository.
- Size of the remote file: Indicates the size of the actual large file in bytes. This metadata is useful for both verification purposes and for managing storage and transfer operations.
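For reference, a Git LFS pointer is a short text file with a version line, the SHA-256 object ID, and the size in bytes of the real file. The sketch below builds one in Python; lfs_pointer is a hypothetical helper, and it covers only the Git LFS fields listed above, not the additional Xet hash field.

```python
import hashlib

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(data: bytes) -> str:
    """Build the text of a Git LFS pointer file for the given file contents (hypothetical helper)."""
    oid = hashlib.sha256(data).hexdigest()  # SHA-256 of the actual file contents
    # "size" records the size of the remote (large) file, not of the pointer.
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(data)}\n"


if __name__ == "__main__":
    contents = b"example model weights"
    pointer = lfs_pointer(contents)
    print(pointer, end="")
    # The pointer committed to Git stays small no matter how large the file is.
    print("pointer size:", len(pointer.encode()), "bytes")
```

Only this small pointer is committed to the Git repository; the file contents it identifies live in remote storage.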
A Xet pointer includes all of this information by design (see the section on backwards compatibility with Git LFS), with the addition of a Xet-backed hash
field for referencing the file in Xet storage.


Unlike Git LFS, which deduplicates at the file level, Xet-enabled repositories deduplicate at the byte level. When a file backed by Xet storage is updated, only the modified data is uploaded to remote storage, which significantly reduces network transfers. For many workflows, such as incremental updates to model checkpoints or appending or inserting new data into a dataset, this speeds up iteration for you and your collaborators. To learn more about deduplication in Xet storage, refer to Deduplication.
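To illustrate why byte-level deduplication uploads only modified data, here is a minimal sketch of content-defined chunking, a common technique for this kind of deduplication. It is not Xet's implementation: the gear table, boundary mask, chunk-size limits, and the helper names chunk and upload are illustrative assumptions. Chunk boundaries depend on local content rather than fixed offsets, so inserting bytes into a file changes only the chunks near the edit, and only those chunks are sent.

```python
import hashlib
import random

# Illustrative parameters (assumptions, not Xet's real values).
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # one random 64-bit value per byte value
U64 = (1 << 64) - 1
BOUNDARY_MASK = ((1 << 12) - 1) << 52  # top 12 bits zero -> ~4 KiB expected gap after MIN_CHUNK
MIN_CHUNK = 1024
MAX_CHUNK = 16 * 1024


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a gear rolling hash."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shifting left drops old bytes, so h depends only on the last 64 bytes.
        h = ((h << 1) + GEAR[byte]) & U64
        length = i + 1 - start
        if (length >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def upload(data: bytes, store: dict[str, bytes]) -> int:
    """Send only chunks the store lacks; return the number of bytes uploaded."""
    sent = 0
    for c in chunk(data):
        key = hashlib.sha256(c).hexdigest()  # chunks are addressed by content hash
        if key not in store:
            store[key] = c
            sent += len(c)
    return sent


if __name__ == "__main__":
    original = random.Random(1).randbytes(256 * 1024)
    # Insert 100 bytes in the middle of the file.
    edited = original[:100_000] + b"x" * 100 + original[100_000:]
    store: dict[str, bytes] = {}
    print("first upload:", upload(original, store), "bytes")
    print("after edit:  ", upload(edited, store), "bytes")
```

With fixed-size blocks, the same insertion would shift every later block boundary and force most of the file to be re-sent; content-defined boundaries resynchronize shortly after the edit.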