Energy-Based Transformers are Scalable Learners and Thinkers Paper • 2507.02092 • Published 28 days ago • 57
KV Caching Explained: Optimizing Transformer Inference Efficiency Article • By not-lain • Published Jan 30 • 105
Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning Paper • 2507.14137 • Published 13 days ago • 31
NeuralOS: Towards Simulating Operating Systems via Neural Generative Models Paper • 2507.08800 • Published 20 days ago • 75
Avey 1 Research Preview Collection • 1.5B preview models trained on 100B tokens of FineWeb, plus an instruct-tuned version on smoltalk • 3 items • Updated Jun 16 • 6
World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning Paper • 2503.10480 • Published Mar 13 • 54
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning Paper • 2502.06060 • Published Feb 9 • 38
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 146
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 162
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 145
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 414