GBhaveshKumar committed on
Commit 5b0707e · verified · 1 Parent(s): e9fc0e2

Update README.md

Files changed (1):
  1. README.md +86 -1

README.md CHANGED
@@ -9,4 +9,89 @@ tags:
  - Transformers
  - AI
  - Model
- ---
+ ---
+
+ ConvoAI
+
+ # Model Card for ConvoAI
+
+ - GBhaveshKumar/ConvoAI
+
+ - Try it in Spaces: https://huggingface.co/spaces/GBhaveshKumar/ConvoAI
+
+ ## Model Details
+ - **Architecture:** Transformer decoder
+ - **Number of layers:** 32
+ - **Hidden size:** 1024
+ - **Number of attention heads:** 32
+ - **Context window:** 1024 tokens
+ - **Vocabulary size:** 50,257
+ - **Training objective:** Causal language modeling on conversational data
+ - **Tokenizer:** GPT-2 Byte-Pair Encoding
+
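+ The numbers above describe a GPT-2-style decoder. As a rough illustration only (the repository does not publish its training code), they could be expressed with the standard `transformers` `GPT2Config`; the field names below belong to that library, and the mapping is an assumption:
+
+ ```python
+ from transformers import GPT2Config
+
+ # Sketch only: a GPT-2-style configuration matching the numbers listed above.
+ # The checkpoint's own config.json is authoritative and may differ.
+ config = GPT2Config(
+     vocab_size=50257,  # GPT-2 BPE vocabulary
+     n_positions=1024,  # context window
+     n_embd=1024,       # hidden size
+     n_layer=32,        # decoder layers
+     n_head=32,         # attention heads
+ )
+ ```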
+ ### Model Description
+
+ ConvoAI is a fully custom conversational AI model trained from scratch on the DailyDialog dataset. Built on a transformer decoder architecture, it does not rely on any pretrained weights and is tailored specifically to generating human-like dialogue. It learns to produce coherent, contextually appropriate responses in multi-turn conversations.
+
+ The model uses the GPT-2 tokenizer and was trained with a causal language modeling objective on conversational data. It performs well in casual conversation and is suitable for chatbot applications, dialogue-system research, and educational purposes. Because its training scope is limited to DailyDialog and it has not undergone large-scale pretraining, its general knowledge and open-domain capabilities are limited.
+
+
+ ### Model Sources
+
+ https://huggingface.co/spaces/GBhaveshKumar/ConvoAI
+
+
+ ## Limitations
+
+ - Because the model was trained only on the DailyDialog dataset and not on large-scale corpora such as Wikipedia or Common Crawl, it lacks general world knowledge and factual accuracy on many topics.
+
+ - With a limited training sequence length, the model may struggle to maintain coherence in very long conversations or to recall information from earlier in the dialogue.
+
+ - The model does not learn from new interactions unless it is retrained. It cannot update its knowledge or remember prior conversations between sessions.
+
+ - On CPU-only systems, generation can be slow.
+
+ - The model has not been fine-tuned.
+
+
+ ## How to Get Started with the Model
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the ConvoAI checkpoint and its GPT-2 tokenizer from the Hugging Face Hub
+ model = AutoModelForCausalLM.from_pretrained("GBhaveshKumar/ConvoAI")
+ tokenizer = AutoTokenizer.from_pretrained("GBhaveshKumar/ConvoAI")
+ ```
+
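+ As a usage example, the sketch below generates one reply with the sampling settings listed under Training Hyperparameters (top-k 50, top-p 0.95, temperature 0.8). The prompt text and formatting are assumptions for illustration; the card does not specify how dialogue turns are encoded, so adjust them to match the model's training format:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained("GBhaveshKumar/ConvoAI")
+ tokenizer = AutoTokenizer.from_pretrained("GBhaveshKumar/ConvoAI")
+
+ # Illustrative prompt; the turn format used during training is not documented here.
+ prompt = "Hi! How was your day?"
+ inputs = tokenizer(prompt, return_tensors="pt")
+
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=64,
+     do_sample=True,
+     top_k=50,
+     top_p=0.95,
+     temperature=0.8,
+     pad_token_id=tokenizer.eos_token_id,  # GPT-2 BPE has no pad token by default
+ )
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```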
+
+ ### Training Data
+
+ Dataset: https://huggingface.co/datasets/roskoN/dailydialog
+
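+ For reference, the dataset can be pulled with the `datasets` library; a minimal sketch, assuming the default configuration and the split names shown on the dataset card:
+
+ ```python
+ from datasets import load_dataset
+
+ # Sketch: load the DailyDialog mirror this model was trained on.
+ # Exact split names and field layout are defined by the dataset card and may differ.
+ dialogues = load_dataset("roskoN/dailydialog", split="train")
+ print(dialogues[0])  # inspect one multi-turn dialogue
+ ```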
+
+
+ #### Training Hyperparameters
+
+ - Model type: Transformer
+ - Tokenizer: GPT-2 tokenizer
+ - Dataset: DailyDialog
+ - Number of Epochs: 15
+ - Maximum Sequence Length: 256
+ - Batch Size: 16
+ - Embedding Size (n_embd): 1024
+ - Number of Layers (n_layer): 32
+ - Number of Attention Heads (n_head): 32
+ - Sampling Strategy:
+   - Top-k: 50
+   - Top-p (nucleus sampling): 0.95
+   - Temperature: 0.8
+
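+ The card publishes hyperparameters but not the training script itself, so the snippet below is only a sketch of how the listed values could map onto a standard `transformers` Trainer setup; names such as `output_dir` are illustrative, not the author's:
+
+ ```python
+ from transformers import TrainingArguments
+
+ # Sketch only: the listed hyperparameters expressed as TrainingArguments.
+ # The original training loop is not published and may have been hand-written.
+ training_args = TrainingArguments(
+     output_dir="convoai-checkpoints",   # hypothetical path
+     num_train_epochs=15,
+     per_device_train_batch_size=16,
+ )
+
+ # The maximum sequence length (256) would be enforced at tokenization time, e.g.:
+ # tokenizer(text, truncation=True, max_length=256)
+ ```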