abhinavv3 committed on
Commit a406486 · verified · 1 Parent(s): 329492e

Update README.md

Files changed (1)
  1. README.md +9 -0
README.md CHANGED
@@ -17,6 +17,15 @@ This model is designed for scalable training, long-context understanding, and ef
 
  ---
 
+
+ **Key Modifications from the Original Paper:**
+
+ 1) Replaced the default positional encoding with Rotary Positional Embeddings (RoPE) ,
+ 2) Altered the attention mechanism to use Grouped Query Attention ,
+ 3) Customized the DataLoader to support sharded datasets and data parallelism ,
+ 4) Implemented Mixed Precision Training along with Distributed Data Parallel (DDP) support ,
+ 5) Tweaked several training and model hyperparameters for better adaptability .
+
  ## 🔬 Key Features
 
  - ✅ **Grouped Query Attention (GQA)** — Groups query heads to share key/value heads, saving memory and speeding up attention
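
The head sharing described in the GQA bullet can be illustrated with a small sketch. It is not taken from this repository: the function names, the contiguous grouping of query heads, and the head counts are all assumptions chosen for illustration. It shows which key/value head each query head would read from, and how much smaller the key/value cache becomes when heads are shared.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Return the index of the key/value head that a query head reads from.

    Assumes contiguous grouping: the first group of query heads shares
    KV head 0, the next group shares KV head 1, and so on.
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be divisible by n_kv_heads")
    group_size = n_q_heads // n_kv_heads  # query heads per shared KV head
    return q_head // group_size


def kv_cache_elements(n_kv_heads: int, head_dim: int, seq_len: int) -> int:
    """Count cached key and value elements for one layer and one sequence."""
    return 2 * n_kv_heads * head_dim * seq_len  # factor 2: keys plus values


# Hypothetical sizes, chosen only for illustration.
N_Q_HEADS, N_KV_HEADS, HEAD_DIM, SEQ_LEN = 8, 2, 64, 1024

mapping = [kv_head_for_query_head(h, N_Q_HEADS, N_KV_HEADS) for h in range(N_Q_HEADS)]
mha_cache = kv_cache_elements(N_Q_HEADS, HEAD_DIM, SEQ_LEN)   # one KV head per query head
gqa_cache = kv_cache_elements(N_KV_HEADS, HEAD_DIM, SEQ_LEN)  # shared KV heads

print(mapping)                 # query head -> KV head
print(mha_cache // gqa_cache)  # cache size reduction factor
```

With these assumed sizes, four query heads share each key/value head, so the key/value cache shrinks by the same factor as the number of KV heads. Setting `n_kv_heads` equal to `n_q_heads` recovers standard multi-head attention, and setting it to 1 gives multi-query attention.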