5 98 108

Shyam Sunder Kumar

theainerd

AI & ML interests

Natural Language Processing

Recent Activity

upvoted a collection 6 days ago

Qwen3

liked a model 6 days ago

unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF

liked a model 7 days ago

moonshotai/Kimi-K2-Instruct

View all activity

Organizations

upvoted a collection 6 days ago

Qwen3

Collection

Qwen's new Qwen3 models. In Unsloth Dynamic 2.0, GGUF, 4-bit and 16-bit Safetensor formats. Includes 128K Context Length variants. • 73 items • Updated about 9 hours ago • 172

liked a model 6 days ago

unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF

Text Generation • 480B • Updated 5 days ago • 54.7k • 122

liked 2 models 7 days ago

moonshotai/Kimi-K2-Instruct

Text Generation • Updated about 15 hours ago • 283k • • 1.89k

facebook/vjepa2-vitl-fpc64-256

Video Classification • 0.3B • Updated Jun 17 • 54.3k • 145

updated a collection 18 days ago

Safety & Security

Collection

10 items • Updated 18 days ago

liked a model 18 days ago

facebook/Meta-SecAlign-8B

Updated 13 days ago • 2.67k • 7

reacted to fdaudens's post with 🔥 about 1 month ago

Post

1831

This is what efficient AI looks like: Gemma 3n just dropped - a natively multimodal model that runs entirely on your device. No cloud. No API calls.

🧠 Text, image, audio, and video - handled locally.
⚡️Only needs 2B in GPU memory to run
🤯 First sub-10B model to hit 1300+ Elo
✅ Plug-and-play with Hugging Face, MLX, llama.cpp, and more.

Plus: Multilingual out of the box (140+ languages), fine-tune in a free Colab notebook.

google/gemma-3n-685065323f5984ef315c93f4

1 reply

upvoted a paper about 1 month ago

Safety at Scale: A Comprehensive Survey of Large Model Safety

Paper • 2502.05206 • Published Feb 2 • 1

liked a dataset about 2 months ago

ai4bharat/Indic-Rag-Suite

Viewer • Updated Jun 7 • 21.4M • 101 • 2

liked a model about 2 months ago

deepseek-ai/DeepSeek-R1-0528

Text Generation • 685B • Updated May 29 • 498k • • 2.33k

upvoted a collection about 2 months ago

MobileLLM

Collection

Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 40 items • Updated Jun 23 • 118

reacted to codelion's post with 🚀 about 2 months ago

Post

3430

🧠 We just implemented Andrej Karpathy's "third paradigm" for LLM learning!

System Prompt Learning (SPL) enables LLMs to automatically learn problem-solving strategies from experience, rather than relying on static prompts.

🚀 How it works:
Your LLM builds a database of effective strategies, selects the best ones for each problem, and refines them over time based on success rates.

📊 Results across math benchmarks:
Arena Hard: 29% → 37.6% (+8.6%)
AIME24: 23.33% → 30% (+6.67%)
OptILLMBench: 61% → 65% (+4%)

The best part? All strategies are human-readable and the system gets progressively better at problem types you use frequently.

✨ Key benefits:
🔄 Cumulative learning over time
📖 Transparent, inspectable strategies
🔌 Works with any OpenAI-compatible API
⚡ Simple integration: just add "spl-" prefix to your model

Built as an open-source plugin in optillm. After 500 queries, our system developed 129 strategies and refined 97 of them!

This feels like a genuine step toward AI that learns from experience while staying completely interpretable.

🔗 GitHub: https://github.com/codelion/optillm/tree/main/optillm/plugins/spl
📖 Full article: https://huggingface.co/blog/codelion/system-prompt-learning
🐦 Original Karpathy tweet: https://x.com/karpathy/status/1921368644069765486

Have you experimented with advanced system prompting? What strategies would you want your LLM to learn?