AGENTS.md - Requirements & Decisions
This document records the requirements, architecture decisions, and rationale for the S3 to HF Bucket Importer.
Requirements
Functional
OAuth Login: Users sign in with Hugging Face OAuth. Scopes: manage-repos (create buckets), jobs (run Jobs), plus default openid + profile.
S3 Configuration: Users provide AWS credentials (Access Key ID, Secret Access Key), region, bucket name, optional endpoint URL (for S3-compatible services like MinIO), and optional source prefix.
File Browser: Display S3 bucket contents in a tree view with:
- Lazy loading (only load folder contents on expand)
- Checkboxes for file/folder selection
- Select all / Deselect all
- File count and size statistics
- CORS fallback mode if browser can't access S3 directly
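A minimal sketch of the lazy-loading step, assuming each folder expansion issues one S3 ListObjectsV2 request with Delimiter "/" and Prefix set to the folder path. The response then carries subfolders in CommonPrefixes and files in Contents. The helper names toTreeNodes and summarize and the node shape are hypothetical, not taken from app.js.

```javascript
// Convert one ListObjectsV2 response into tree nodes for a single folder level.
// Assumes the request used Delimiter "/" and Prefix = the folder being expanded.
function toTreeNodes(response, prefix = "") {
  const folders = (response.CommonPrefixes ?? []).map((p) => ({
    type: "folder",
    key: p.Prefix,
    name: p.Prefix.slice(prefix.length).replace(/\/$/, ""),
    loaded: false, // children are fetched only when the folder is expanded
  }));
  const files = (response.Contents ?? [])
    .filter((o) => o.Key !== prefix) // skip a zero-byte "folder marker" object, if present
    .map((o) => ({
      type: "file",
      key: o.Key,
      name: o.Key.slice(prefix.length),
      size: o.Size ?? 0,
    }));
  return [...folders, ...files];
}

// File count and total size for the statistics line (files at this level only).
function summarize(nodes) {
  const files = nodes.filter((n) => n.type === "file");
  return { count: files.length, bytes: files.reduce((sum, f) => sum + f.size, 0) };
}
```

Responses can be truncated at 1000 keys, so a real implementation would also follow NextContinuationToken before marking a folder as loaded.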
Destination Configuration: User provides:
- Bucket name (bucket or namespace/bucket format)
- Optional destination prefix
- Auto-create bucket if it doesn't exist
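The two accepted bucket name formats can be normalized with a small parser. This is a sketch: the function name parseDestination is hypothetical, and falling back to the signed-in user's namespace when none is given is an assumption.

```javascript
// Parse "bucket" or "namespace/bucket" into its parts.
// defaultNamespace is assumed to be the signed-in user's username.
function parseDestination(input, defaultNamespace) {
  const trimmed = input.trim().replace(/^\/+|\/+$/g, "");
  const parts = trimmed.split("/");
  if (parts.length === 1 && parts[0]) {
    return { namespace: defaultNamespace, bucket: parts[0] };
  }
  if (parts.length === 2 && parts[0] && parts[1]) {
    return { namespace: parts[0], bucket: parts[1] };
  }
  throw new Error(`Expected "bucket" or "namespace/bucket", got "${input}"`);
}
```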
Import Execution: Launch an HF Job that:
- Installs huggingface_hub[s3] from branch cursor/s3-to-hf-bucket-ingestion-f144
- Runs hf buckets import with appropriate arguments
- Passes S3 credentials as encrypted secrets
- Uses CPU hardware (I/O-bound task)
No Local Storage: Nothing persists in the browser. Page refresh = start over. All credentials are in-memory only.
Non-Functional
- Professional, confidence-inspiring design
- Dark theme inspired by HF Buckets announcement
- Responsive (works on mobile)
- Clear error messages and graceful degradation
Architecture Decisions
Decision 1: No Build Step
Choice: Vanilla HTML/CSS/JS with ES modules from CDN (esm.sh)
Alternatives considered:
- Vite + vanilla JS (tree-shaking, npm packages)
- React/Vue/Svelte (component model)
Rationale:
- Simplest deployment: just upload files to a static Space
- No node_modules, no package.json, no build config
- CDN imports are cached by the browser across sessions
- The app is a single page with ~4 states - doesn't warrant a framework
- AWS SDK v3 is ~500KB from CDN but cached effectively
- Easy for anyone to maintain: just HTML/CSS/JS, no tooling knowledge needed
Decision 2: CDN Provider (esm.sh)
Choice: esm.sh for ES module CDN
Rationale:
- Reliable ESM CDN that handles CommonJS -> ESM conversion
- Supports import maps natively
- Handles transitive dependencies for AWS SDK v3
- Alternative was jsdelivr+esm or unpkg, but esm.sh has better ESM support
Decision 3: CORS Fallback Strategy
Choice: Try browser-side S3 listing, gracefully degrade to manual mode
Alternatives considered:
- Skip browser listing entirely (simpler but less interactive)
- Require CORS (better UX but higher friction)
Rationale:
- Most S3 buckets won't have CORS configured for our domain
- Attempting listing first gives the best UX for configured buckets
- Manual fallback (include/exclude patterns) lets everyone use the tool
- Clear CORS instructions help users enable browsing if they want
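The try-then-degrade flow can be sketched as below. listFn is a hypothetical async function that lists one S3 prefix from the browser, and the returned mode values are illustrative names, not taken from app.js.

```javascript
// Attempt a browser-side listing; on any failure fall back to manual mode,
// where the user types include/exclude patterns instead of browsing.
async function listOrFallback(listFn) {
  try {
    const entries = await listFn();
    return { mode: "browse", entries };
  } catch (err) {
    // Browsers do not expose CORS details to scripts, so this sketch treats
    // every failure (CORS, network, bad credentials) the same way.
    return { mode: "manual", reason: String(err?.message ?? err) };
  }
}
```

A fuller version might inspect the error first, so that a clear "access denied" response from S3 is reported as a credentials problem rather than hidden behind the fallback.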
Decision 4: Selection-to-Pattern Mapping
Choice: Compute include or exclude patterns, whichever set is smaller
Rationale:
- hf buckets import supports --include and --exclude with fnmatch patterns
- When most files are selected, use --exclude for the few deselected
- When few files are selected, use --include for the selected ones
- Folder-level patterns (e.g., folder/*) keep command lines short
- Shell-escaping prevents injection in the Job command
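One way to compute the smaller pattern set is sketched below. It assumes the full list of object keys is known and that the fnmatch wildcard * also matches "/" (as in Python's fnmatch), so folder/* covers nested files. All function names are hypothetical.

```javascript
// Escape fnmatch metacharacters so a literal key only matches itself.
function escapeFnmatch(key) {
  return key.replace(/[*?\[]/g, (ch) => `[${ch}]`);
}

// Turn keys into patterns, collapsing to "folder/*" whenever no key from the
// opposite set lives under that folder.
function toPatterns(keys, otherKeys) {
  const patterns = new Set();
  for (const key of keys) {
    const parts = key.split("/");
    let pattern = escapeFnmatch(key);
    // Try the shortest folder prefix first: "a/", then "a/b/", ...
    for (let i = 1; i < parts.length; i++) {
      const folder = parts.slice(0, i).join("/") + "/";
      if (!otherKeys.some((k) => k.startsWith(folder))) {
        pattern = escapeFnmatch(folder) + "*";
        break;
      }
    }
    patterns.add(pattern);
  }
  return [...patterns];
}

// Pick whichever of include/exclude needs fewer patterns.
function selectionToPatterns(allKeys, selectedKeys) {
  const selected = new Set(selectedKeys);
  const chosen = allKeys.filter((k) => selected.has(k));
  const skipped = allKeys.filter((k) => !selected.has(k));
  if (chosen.length === 0) return { mode: "none", patterns: [] };
  if (skipped.length === 0) return { mode: "all", patterns: [] };
  const include = toPatterns(chosen, skipped);
  const exclude = toPatterns(skipped, chosen);
  return include.length <= exclude.length
    ? { mode: "include", patterns: include }
    : { mode: "exclude", patterns: exclude };
}
```

With lazy loading, folders that were never expanded have no known keys, so a real implementation would need to treat a checked or unchecked folder as a single folder/* pattern rather than enumerating its files.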
Decision 5: Job Configuration
Choice: python:3.12 Docker image with pip install + bash -c
Alternatives considered:
- Custom Docker image with huggingface_hub pre-installed
- ghcr.io/astral-sh/uv image with uv run
Rationale:
- python:3.12 is a standard, well-maintained image
- pip install from git is simple and reliable
- bash -c allows chaining install + import in one command
- Once hf buckets import is released, change to pip install 'huggingface_hub[s3]>=X.Y.Z'
- CPU-only hardware (cpu-basic) is sufficient - this is purely I/O-bound
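A sketch of how the chained bash -c command might be assembled, with every interpolated value wrapped in single quotes. The names shellQuote and buildJobCommand are hypothetical. The example's GitHub repository URL and the positional arguments passed to hf buckets import are assumptions, since this document does not spell them out.

```javascript
// Wrap a value in single quotes; an embedded ' becomes '\'' (close, escaped quote, reopen).
function shellQuote(value) {
  return `'${String(value).replace(/'/g, `'\\''`)}'`;
}

// Build the Job command: install the package, then run the import.
function buildJobCommand(installSpec, importArgs) {
  const install = `pip install ${shellQuote(installSpec)}`;
  const run = ["hf", "buckets", "import", ...importArgs.map(shellQuote)].join(" ");
  return ["bash", "-c", `${install} && ${run}`];
}

// Example with hypothetical arguments (source/destination order is assumed).
const exampleCommand = buildJobCommand(
  "huggingface_hub[s3] @ git+https://github.com/huggingface/huggingface_hub@cursor/s3-to-hf-bucket-ingestion-f144",
  ["s3://my-bucket/data", "my-namespace/my-bucket", "--include", "images/*"]
);
```

Quoting flags such as '--include' is harmless to the shell and keeps the builder uniform: every element is quoted, so nothing depends on deciding which values are "safe".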
Decision 6: Bucket Creation
Choice: Call POST /api/repos/create with type: "bucket" before submitting the Job
Rationale:
- The Job needs the bucket to exist before it can write to it
- Creating from the browser (pre-Job) ensures the user has proper permissions
- Handle 409 (already exists) gracefully
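The create-then-tolerate-409 flow could look like the sketch below. Only the endpoint path and type: "bucket" come from this document. The body fields name and organization, the https://huggingface.co base URL, and the helper names are assumptions to verify against the Hub API.

```javascript
// Build the request for POST /api/repos/create (field names partly assumed).
function buildCreateBucketRequest(token, namespace, bucket) {
  return {
    url: "https://huggingface.co/api/repos/create",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      // "name" and "organization" are assumed field names.
      body: JSON.stringify({ type: "bucket", name: bucket, organization: namespace }),
    },
  };
}

// Map the HTTP status to an outcome; 409 means the bucket already exists.
function interpretCreateStatus(status) {
  if (status >= 200 && status < 300) return "created";
  if (status === 409) return "exists";
  throw new Error(`Bucket creation failed with HTTP ${status}`);
}

// fetchFn is injectable so the flow can be exercised without network access.
async function ensureBucket(token, namespace, bucket, fetchFn = fetch) {
  const { url, init } = buildCreateBucketRequest(token, namespace, bucket);
  const res = await fetchFn(url, init);
  return interpretCreateStatus(res.status);
}
```

Any other status (for example 401 or 403) surfaces as an error before the Job is submitted, which is the point of creating the bucket from the browser first.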
Decision 7: Token Handling
Choice: Pass OAuth token as Job secret (HF_TOKEN), S3 credentials as secrets too
Rationale:
- Secrets are encrypted server-side by the Jobs API
- Injected as environment variables at runtime
- Never appear in logs or the HF UI
- The hf CLI automatically uses HF_TOKEN for authentication
- boto3/s3fs automatically uses AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
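A sketch of a Job payload that keeps sensitive values under secrets and non-sensitive settings under environment. The payload field names (dockerImage, flavor, command, environment, secrets) are assumptions about the Jobs API, not confirmed by this document. Passing the region and endpoint through AWS_DEFAULT_REGION and AWS_ENDPOINT_URL is also an assumption about how the import reads them.

```javascript
// Assemble the Job payload. Anything secret goes under "secrets" only.
function buildJobPayload({ command, hfToken, s3 }) {
  const secrets = {
    HF_TOKEN: hfToken,
    AWS_ACCESS_KEY_ID: s3.accessKeyId,
    AWS_SECRET_ACCESS_KEY: s3.secretAccessKey,
  };
  // Non-sensitive settings can travel as plain environment variables.
  const environment = {};
  if (s3.region) environment.AWS_DEFAULT_REGION = s3.region;
  if (s3.endpointUrl) environment.AWS_ENDPOINT_URL = s3.endpointUrl;
  return {
    dockerImage: "python:3.12", // assumed field name
    flavor: "cpu-basic", // assumed field name
    command,
    environment,
    secrets,
  };
}
```

Keeping the two maps separate makes it easy to check that no credential ends up in the command string or in the plain environment.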
OAuth Scopes
| Scope | Purpose |
|---|---|
| openid | ID token (always included) |
| profile | Username, avatar (always included) |
| manage-repos | Create and manage buckets |
| jobs | Create and monitor HF Jobs |
API Endpoints Used
| Method | Endpoint | Purpose |
|---|---|---|
| GET | /api/whoami-v2 | Get user info + orgs |
| POST | /api/repos/create | Create destination bucket |
| POST | /api/jobs/{namespace} | Submit import Job |
| GET | /api/jobs/{namespace}/{id} | Poll Job status |
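Polling the status endpoint can be sketched as a bounded loop. The terminal stage names below and the https://huggingface.co base URL are assumptions to check against the Jobs API. getStage is a hypothetical async function that fetches the status URL and extracts the current stage.

```javascript
// Stage names assumed to mean "the Job will not change state again".
const TERMINAL_STAGES = new Set(["COMPLETED", "ERROR", "CANCELED", "DELETED"]);

function jobStatusUrl(namespace, jobId) {
  return `https://huggingface.co/api/jobs/${encodeURIComponent(namespace)}/${encodeURIComponent(jobId)}`;
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call getStage until it reports a terminal stage or attempts run out.
async function pollJob(getStage, { intervalMs = 5000, maxAttempts = 120, sleep = defaultSleep } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const stage = await getStage();
    if (TERMINAL_STAGES.has(stage)) return stage;
    await sleep(intervalMs);
  }
  throw new Error("Timed out waiting for the Job to finish");
}
```

Because nothing persists in the browser, a page refresh stops the polling loop. The Job itself keeps running on HF infrastructure.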
Security Model
S3 credentials: Used in the browser only to sign S3 listing requests, and otherwise transmitted only to the HF Jobs API as encrypted secrets. Never stored in localStorage, cookies, or any persistent storage.
OAuth token: Held in JavaScript memory only. Not persisted. Passed to Job as encrypted secret for HF API authentication.
Shell injection prevention: All user inputs interpolated into the Job command are escaped using single-quote wrapping with internal quote escaping.
No backend: The static site has no server-side code. All API calls are made directly from the browser to S3 and HF APIs.
File Structure
s3-importer/
├── README.md # Space YAML frontmatter + project docs
├── AGENTS.md # This file
├── index.html # HTML structure + import maps
├── style.css # Dark theme (CSS custom properties)
└── app.js # Application logic (ES module, ~500 lines)
Future Improvements
- Show Job logs inline (stream from /api/jobs/{namespace}/{id}/logs)
- Support AWS session tokens (for temporary credentials / assumed roles)
- Remember recent imports (optional, opt-in localStorage)
- Support importing to existing bucket with merge strategy
- Add a "dry run" option that shows what would be imported
- Update huggingface_hub install to stable release once available