Hi @sirluk , thanks for the great post. Do you know whether the above masking technique works with some attention implementations but is incompatible with others? For example, would the above masking work with SDPA, flash_attention_2, and eager (each of these implementations is handled a bit differently in https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L666, for example)?
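For reference, here is a minimal sketch of the kind of block-diagonal, per-document causal mask the post describes; the helper name `packed_causal_mask` and the example lengths are my own, not from the post:

```python
import torch

def packed_causal_mask(seq_lens, dtype=torch.float32):
    """Additive 4D attention mask of shape (1, 1, L, L) for one row that
    packs several documents: causal within each document, fully blocked
    across document boundaries.

    seq_lens -- lengths of the packed documents, e.g. [3, 5] for L = 8.
    """
    total = sum(seq_lens)
    # Which document each token position belongs to.
    doc_ids = torch.cat([torch.full((n,), i, dtype=torch.long)
                         for i, n in enumerate(seq_lens)])
    # A query may attend to a key only if both positions are in the same
    # document and the key is not in the future (causal).
    same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
    causal = torch.tril(torch.ones(total, total, dtype=torch.bool))
    allowed = same_doc & causal
    # Additive mask: 0 where attention is allowed, large negative elsewhere.
    mask = torch.full((total, total), torch.finfo(dtype).min, dtype=dtype)
    mask[allowed] = 0.0
    return mask.unsqueeze(0).unsqueeze(0)  # (1, 1, L, L)

# Example: two documents of length 3 and 5 packed into one sequence of 8.
mask = packed_causal_mask([3, 5])
print(mask.shape)  # torch.Size([1, 1, 8, 8])
```

As far as I understand, the eager and SDPA paths can consume an additive 4D mask like this, while the flash_attention_2 kernels do not take arbitrary 4D masks, so packed sequences there are typically handled via position_ids / variable-length attention instead; happy to be corrected.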
Article: Efficient LLM Pretraining: Packed Sequences and Masked Attention (Oct 7, 2024)