Update README.md
README.md CHANGED
@@ -17,6 +17,15 @@ This model is designed for scalable training, long-context understanding, and ef

---

+
+**Key Modifications from the Original Paper:**
+
+1) Replaced the default positional encoding with Rotary Positional Embeddings (RoPE).
+2) Altered the attention mechanism to use Grouped Query Attention.
+3) Customized the DataLoader to support sharded datasets and data parallelism.
+4) Implemented Mixed Precision Training along with Distributed Data Parallel (DDP) support.
+5) Tweaked several training and model hyperparameters for better adaptability.
+

## 🔬 Key Features

- ✅ **Grouped Query Attention (GQA)** — Groups query heads to share key/value heads, saving memory and speeding up attention
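
The grouping idea in the bullet above is easy to see in code. Below is a minimal, illustrative sketch of GQA in PyTorch, assuming a decoder-style causal model: each key/value head is repeated so that a group of query heads shares it, and ordinary scaled dot-product attention runs on the aligned tensors. The function name, shapes, and causal mask are hypothetical assumptions, not this repository's code.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    # q: (batch, n_q_heads, seq_len, head_dim)
    # k, v: (batch, n_kv_heads, seq_len, head_dim), with n_q_heads % n_kv_heads == 0
    group_size = q.shape[1] // k.shape[1]
    # Repeat each K/V head so it is shared by `group_size` query heads,
    # then run standard scaled dot-product attention on the expanded tensors.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Example: 8 query heads sharing 2 key/value heads (group size 4).
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = grouped_query_attention(q, k, v)  # -> (1, 8, 16, 64)
```

In this example only the 2 key/value heads need to be cached, so the KV cache is roughly 4x smaller than with full multi-head attention, which is where the memory and speed savings come from.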
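
Modification 1 in the list above swaps the default positional encoding for RoPE. A minimal sketch of the idea, under the usual formulation, follows: each (even, odd) pair of query/key features is rotated by an angle that grows with the token's position, so relative offsets appear as phase differences in the attention scores. The helper name, base, and shapes are illustrative assumptions, not this repository's API.

```python
import torch

def apply_rope(x, base=10000.0):
    # x: (batch, n_heads, seq_len, head_dim); head_dim must be even.
    seq_len, dim = x.shape[-2], x.shape[-1]
    # One rotation frequency per feature pair, one angle per (position, pair).
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]  # (seq_len, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]       # split features into pairs
    rotated = torch.empty_like(x)
    rotated[..., 0::2] = x1 * cos - x2 * sin  # rotate each pair by its angle
    rotated[..., 1::2] = x1 * sin + x2 * cos
    return rotated

# Applied to queries and keys (not values) before the attention scores are computed.
q = torch.randn(2, 8, 16, 64)
q_rope = apply_rope(q)  # same shape: (2, 8, 16, 64)
```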