Adjay Sagar
adjaysagar
4 followers · 24 following
AI & ML interests
None yet
Recent Activity
reacted to m-ric's post · 8 days ago
STOP EVERYTHING NOW - we might finally have a radical architecture improvement over Transformers!!! 🚨

A lone scientist just proposed Tiny Recursive Model (TRM), and it is literally the most impressive model that I've seen this year.
➡️ Tiny Recursive Model is 7M parameters
➡️ On ARC-AGI, it beats flagship models like Gemini-2.5-pro

Consider how wild this is: Gemini-2.5-pro must be over 10,000x bigger and had 1,000 times as many authors (Alexia is alone on the paper).

What's this sorcery?
In short: it's a very tiny Transformer, but it loops over itself at two different frequencies, updating two latent variables: one for the proposed answer and one for the reasoning.

@AlexiaJM started from the paper Hierarchical Reasoning Model, published a few months ago, which had already shown breakthrough results on ARC-AGI for its small size (27M).

Hierarchical Reasoning Model had introduced one main feature: Deep supervision.
In their model, one part (here one layer) would run at high frequency, and another at lower frequency, running only every n steps. They had used a recurrent architecture, where these layers would repeat many times; but to make it work they had to make many approximations, including not fully backpropagating the loss through all layers.

Alexia studied what was useful and what wasn't, and cleaned up the architecture as follows:
Why use a recurrent architecture, when you can just make it a loop?
➡️ She made the network recursive, looping over itself.
Why use 2 latent variables?
➡️ She provides a crystal-clear explanation: the one that changes frequently is the reasoning, the one that changes at low frequency is the proposed answer.
➡️ She runs ablation studies to validate that 2 is indeed optimal.

This new setup is a much more elegant way to process reasoning than generating huge chains of tokens, as all flagship models currently do.
This might be the breakthrough we've been waiting for for so long!
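The two-frequency loop described in the post can be sketched in a few lines. This is a minimal toy, not Alexia's implementation: the names recursive_refine, toy_update_z, toy_update_y, n_inner and n_outer are hypothetical, hand-written arithmetic stands in for the 7M-parameter network, and training, deep supervision and backpropagation are left out entirely. It only shows the control flow: the reasoning latent z is refined several times for every single update of the proposed answer y.

```python
from typing import Callable, List, Tuple

Vector = List[float]
ZUpdate = Callable[[Vector, Vector, Vector], Vector]
YUpdate = Callable[[Vector, Vector], Vector]


def recursive_refine(
    x: Vector,
    y: Vector,
    z: Vector,
    update_z: ZUpdate,
    update_y: YUpdate,
    n_inner: int = 6,
    n_outer: int = 3,
) -> Tuple[Vector, Vector]:
    """Toy two-frequency recursion (hypothetical sketch, not the TRM code).

    x: the input/question, y: the proposed answer, z: the reasoning latent.
    """
    for _ in range(n_outer):
        # High-frequency loop: refine the reasoning latent z n_inner times,
        # looking at the input x and the current answer y.
        for _ in range(n_inner):
            z = update_z(x, y, z)
        # Low-frequency step: update the proposed answer y once, from z.
        y = update_y(y, z)
    return y, z


def toy_update_z(x: Vector, y: Vector, z: Vector) -> Vector:
    # Hypothetical stand-in for the network: move z halfway toward the
    # residual x - y (what the current answer still gets wrong).
    return [zi + 0.5 * (xi - yi - zi) for xi, yi, zi in zip(x, y, z)]


def toy_update_y(y: Vector, z: Vector) -> Vector:
    # Hypothetical stand-in: apply the correction carried by z to the answer.
    return [yi + zi for yi, zi in zip(y, z)]


if __name__ == "__main__":
    answer, reasoning = recursive_refine(
        x=[3.0, -1.0], y=[0.0, 0.0], z=[0.0, 0.0],
        update_z=toy_update_z, update_y=toy_update_y,
    )
    print(answer, reasoning)
```

In this toy the answer y converges toward the input x over the outer steps because the update rules are hard-coded that way; in the actual model both updates are learned.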
published a model · 19 days ago
adjaysagar/hinglish-tts-checkpoint-265000
reacted to merve's post with 🔥 · 2 months ago
Real-time DEtection Transformer (RT-DETR) landed in transformers 🤩 with Apache 2.0 license
models: https://huggingface.co/PekingU
demo: https://huggingface.co/spaces/merve/RT-DETR-tracking-coco
paper: https://huggingface.co/papers/2304.08069
notebook: https://github.com/merveenoyan/example_notebooks/blob/main/RT_DETR_Notebook.ipynb

YOLO models are known to be super fast for real-time computer vision, but they have the downside of being sensitive to NMS 🥲
Transformer-based models, on the other hand, are not as computationally efficient 🥲
Isn't there something in between? Enter RT-DETR!

The authors combined a CNN backbone and a multi-stage hybrid encoder (combining convolutions and attention) with a transformer decoder. In the paper, the authors also claim one can adjust speed by changing the number of decoder layers, without retraining. The authors find that the model performs better in both speed and accuracy than the previous state of the art. 🤩
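The claim that speed can be traded for accuracy by using fewer decoder layers, without retraining, can be illustrated with a toy early-exit loop. This is a hypothetical sketch, not RT-DETR's code or the transformers API: decode and make_refine_layer are made-up names, and each "layer" is simple arithmetic that nudges a box toward a fixed target instead of attending to image features. It assumes, as in DETR-style detectors, that every decoder layer is trained to produce usable predictions on its own, so running only the first k layers still returns boxes, just less refined ones.

```python
from typing import Callable, List, Sequence

Box = List[float]  # hypothetical [cx, cy, w, h] box in normalized coordinates
Layer = Callable[[Box], Box]


def make_refine_layer(target: Box, step: float = 0.5) -> Layer:
    """Hypothetical stand-in for one decoder layer: moves every box
    coordinate a fraction `step` of the way toward `target`."""
    def layer(box: Box) -> Box:
        return [b + step * (t - b) for b, t in zip(box, target)]
    return layer


def decode(queries: Sequence[Box], layers: Sequence[Layer], num_layers: int) -> List[Box]:
    """Run only the first `num_layers` layers and return the boxes they produce.

    Assumes every layer's output is a valid prediction on its own (as with
    per-layer supervision), so stopping early trades accuracy for speed.
    """
    if not 1 <= num_layers <= len(layers):
        raise ValueError("num_layers must be between 1 and len(layers)")
    boxes = [list(q) for q in queries]
    for layer in layers[:num_layers]:
        boxes = [layer(box) for box in boxes]
    return boxes


if __name__ == "__main__":
    target = [0.5, 0.5, 0.2, 0.2]
    layers = [make_refine_layer(target) for _ in range(6)]
    queries = [[0.0, 0.0, 0.0, 0.0]]
    for k in (1, 3, 6):
        # Fewer layers: cheaper inference, coarser boxes.
        print(k, decode(queries, layers, k)[0])
```

Fewer layers means fewer refinement passes and therefore less compute; in the toy, the box returned after 1 layer is further from the target than the one returned after 6.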
Organizations
Models (1)
adjaysagar/hinglish-tts-checkpoint-265000 · Updated 19 days ago
Datasets (0)
None public yet