ajibawa-2023
AI & ML interests
LLM, RL, DL, ML, AGI. Developing LLMs (preferably fully fine-tuned) for various use cases.
Recent Activity
Reacted to fdaudens's post with 🔥 (12 days ago)
You might not have heard of Moonshot AI — but within 24 hours, their new model Kimi K2 shot to the top of Hugging Face’s trending leaderboard.
So… who are they, and why does it matter?
Had a lot of fun co-writing this blog post with @xianbao, with key insights translated from Chinese, to unpack how this startup built a model that outperforms GPT-4.1, Claude Opus, and DeepSeek V3 on several major benchmarks.
🧵 A few standout facts:
1. From zero to a $3.3B valuation in 18 months:
Founded in March 2023, Moonshot is now backed by Alibaba, Tencent, Meituan, and HongShan.
2. A CEO who thinks from the end:
Yang Zhilin (31) previously worked at Meta AI, Google Brain, and Carnegie Mellon. His vision? Nothing less than AGI — still a rare ambition among Chinese AI labs.
3. A trillion-parameter model that’s surprisingly efficient:
Kimi K2 uses a mixture-of-experts architecture (32B active parameters per token) and dominates on coding and math benchmarks.
4. The secret weapon, the Muon optimizer:
A new optimizer that doubles training efficiency, halves memory use, and was used to train on 15.5T tokens with zero failures. Big implications.
Most importantly, their move from closed to open source signals a broader shift in China’s AI scene — following Baidu’s pivot. But as Yang puts it: “Users are the only real leaderboard.”
👇 Check out the full post to explore what Kimi K2 can do, how to try it, and why it matters for the future of open-source LLMs:
https://huggingface.co/blog/fdaudens/moonshot-ai-kimi-k2-explained
Hi all, I recently released two audio datasets generated from my earlier dataset, ajibawa-2023/Children-Stories-Collection.
First audio dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection-Large contains 5,600+ stories in .mp3 format.
Second audio dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection contains 600 stories in .mp3 format.
New dataset: Software-Architecture
Link: ajibawa-2023/Software-Architecture
I am releasing a large dataset covering topics related to software architecture. It consists of around 450,000 lines of data in JSONL format. I have included the following topics: Architectural Frameworks, Architectural Patterns for Reliability, Architectural Patterns for Scalability, Architectural Patterns, Architectural Quality Attributes, Architectural Testing, Architectural Views, Architectural Decision-Making, Advanced Research, Cloud-Based Architectures, Component-Based Architecture, Data Architecture, Emerging Trends, Event-Driven Architecture, Evolvability and Maintainability, Microservices and Monolithic, Microservices Architecture, Security Architecture, Service-Oriented Architecture, Software Design Principles, and many more!
This dataset is useful for LLM development, especially for anyone building LLMs focused on software development. It should also be useful to researchers.
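The post describes the data only as JSONL and does not list its field names, so the sketch below assumes no particular schema. It is a minimal, stdlib-only reader, assuming the dataset file has already been downloaded locally, that streams one record per line and tallies the top-level keys so you can inspect the structure before using it for training. (The Hugging Face `datasets` library's `load_dataset` function is another way to load datasets from the Hub.)

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[object]:
    """Yield one parsed JSON value per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines, e.g. a trailing empty line
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line so malformed records are easy to find.
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc


def summarize_jsonl(path: Path) -> tuple[int, Counter]:
    """Return the record count and how often each top-level key appears."""
    count = 0
    keys: Counter = Counter()
    for record in iter_jsonl(path):
        count += 1
        if isinstance(record, dict):  # only JSON objects have keys to tally
            keys.update(record.keys())
    return count, keys
```

For example, `summarize_jsonl(Path("software_architecture.jsonl"))` (a hypothetical local file name) returns the number of records and a `Counter` of key frequencies. Reading line by line keeps memory use flat even for a file of roughly 450,000 lines.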
Models (32)
ajibawa-2023/Python-Code-13B • Text Generation • 13B • 1.41k downloads • 6 likes
ajibawa-2023/Young-Children-Storyteller-Mistral-7B • Text Generation • 7B • 92 downloads • 21 likes
ajibawa-2023/SlimOrca-Llama-3-8B • Text Generation • 8B • 28 downloads • 4 likes
ajibawa-2023/Code-Llama-3-8B • Text Generation • 8B • 583 downloads • 31 likes
ajibawa-2023/Uncensored-Frank-Llama-3-8B • Text Generation • 8B • 130 downloads • 13 likes
ajibawa-2023/Scarlett-Llama-3-8B-v1.0 • Text Generation • 6 downloads • 5 likes
ajibawa-2023/Scarlett-Llama-3-8B • Text Generation • 6 downloads • 8 likes
ajibawa-2023/Code-Mistral-7B • Text Generation • 7B • 73 downloads • 15 likes
ajibawa-2023/General-Stories-Mistral-7B • Text Generation • 13 downloads • 5 likes
ajibawa-2023/Code-Jamba-v0.1 • Text Generation • 52B • 8 downloads • 7 likes