nyuuzyou
272 followers · 24 following
https://ducks.party/donate
AI & ML interests
None yet
Recent Activity

reacted to nouamanetazi's post, about 3 hours ago:
After training SmolLM3 on 384 H100s for nearly a month, I've come to realize something most people overlook: infrastructure is the make-or-break factor in LLM training.

Everyone talks about model architecture and data quality. And yes, those matter immensely. But here's what nobody tells you: when your training run fails at 2 AM because of mysterious NCCL errors, or when your expensive GPU cluster is running at …% efficiency, the problem isn't your model. It's most probably a misuse of the hardware.

Questions that seemed simple but had no clear answers: Why is MoE training slower than dense models? Which NCCL flags should we actually set? How often should we checkpoint without killing throughput?

That's why we built The Smol Training Playbook: a complete guide covering everything from model architecture and data curation to the SmolLM3 training marathon, post-training techniques, and crucially, the infrastructure layer that most teams get wrong.

We validated real vs theoretical bandwidth across the entire stack: HBM hitting 3 TB/s, NVLink 4.0 reaching … GB/s, EFA landing at … GB/s. Then we ran collective operations across 128 GPUs (16 nodes, 8xH100s each) and measured how performance degrades at scale: all-reduce drops from … GB/s on a single node to …-… GB/s across 16 nodes.

If you've ever wondered why your training runs are slower than they should be, or you're planning to scale up and want to avoid expensive mistakes, this guide might save you weeks of debugging.

The Smol Training Playbook: https://lnkd.in/e5MKXUHS

Shared with ❤️ by the HuggingFace team
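The all-reduce figures quoted in the post are "bus bandwidths" in the sense used by NVIDIA's nccl-tests: the raw data-size-over-time rate (algorithm bandwidth) is rescaled by the per-link traffic factor of a ring all-reduce, 2*(n-1)/n, so that results are comparable across different numbers of GPUs. A minimal sketch of that conversion (the function name and interface are mine, not from the post):

```python
def allreduce_bus_bandwidth(bytes_per_rank: float, seconds: float, n_ranks: int) -> float:
    """Convert an all-reduce timing into nccl-tests-style bus bandwidth (GB/s).

    algbw = data size / time. A ring all-reduce moves each element over
    2*(n-1) links in n steps, so busbw = algbw * 2*(n-1)/n; this makes the
    number directly comparable to a single link's peak bandwidth.
    """
    algbw = bytes_per_rank / seconds           # bytes/s processed per rank
    busbw = algbw * 2 * (n_ranks - 1) / n_ranks
    return busbw / 1e9                         # convert bytes/s -> GB/s
```

For example, all-reducing 1 GB per rank across 16 ranks in one second gives a bus bandwidth of 1.875 GB/s (the 2*(n-1)/n factor approaches 2 as the rank count grows), which is why single-node and 16-node numbers can be compared on the same scale.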
liked a model, about 3 hours ago: utter-project/TowerVision-9B
reacted to mitkox's post, 11 days ago:
I'm just reading that Ryzen AI 395 has to be 30% slower than DGX Spark in LLM inferencing… and only 96GB GPU RAM… good I haven't RTFM upfront, so I made the AMD faster with 128GB unified RAM. Z2 mini G1a can run Qwen3 Coder 30B BF16 at 26.8 tok/sec in ~60GB GPU RAM
nyuuzyou's datasets (138, sorted by recently updated)
nyuuzyou/bordaru-posts · Viewer · Updated Aug 3, 2024 · 5.25M · 9 · 1
nyuuzyou/nopaste-paefchen-archive · Viewer · Updated Jul 22, 2024 · 1.72M · 6
nyuuzyou/cmc-posts · Viewer · Updated Jul 22, 2024 · 1.23M · 3
nyuuzyou/smartlab-posts · Viewer · Updated Jul 13, 2024 · 950k · 4
nyuuzyou/moshub-code · Updated Jul 10, 2024 · 46 · 2
nyuuzyou/gitflic-code · Preview · Updated Jul 9, 2024 · 67 · 2
nyuuzyou/gitverse-code · Preview · Updated Jul 6, 2024 · 30 · 2
nyuuzyou/3dnews-articles · Viewer · Updated Feb 29, 2024 · 54.2k · 5 · 2
nyuuzyou/EMERCOM-questions · Viewer · Updated Feb 23, 2024 · 25.7k · 12 · 1
nyuuzyou/9111-questions · Preview · Updated Feb 19, 2024 · 25 · 7
nyuuzyou/rutube-channels · Viewer · Updated Feb 18, 2024 · 26.8M · 11 · 2
nyuuzyou/PM-products · Viewer · Updated Feb 4, 2024 · 11.3k · 14 · 1
nyuuzyou/ke-products · Viewer · Updated Jan 29, 2024 · 2.06M · 20 · 1
nyuuzyou/wb-feedbacks · Viewer · Updated Jan 17, 2024 · 194M · 95 · 5
nyuuzyou/wb-products · Viewer · Updated Jan 16, 2024 · 336M · 70 · 5
nyuuzyou/stickers · Updated Jan 15, 2024 · 186 · 6
nyuuzyou/wb-questions · Viewer · Updated Jan 15, 2024 · 7.41M · 8 · 3
nyuuzyou/AnimeHeadsv3 · Viewer · Updated Jul 2, 2023 · 10.9k · 69 · 5