Atom v1 8B Preview - Usage Examples
Installation
Using Transformers
pip install transformers torch accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "vanta-research/atom-v1-8b-preview"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",    # place weights on available GPU(s), falling back to CPU
    torch_dtype="auto"    # load in the checkpoint's native precision
)
Using Ollama (GGUF)
# Download the GGUF file from the repository
# Create a Modelfile (see the Modelfile in this repo, or the sketch below)
ollama create atom-v1:latest -f Modelfile
ollama run atom-v1:latest
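If you need to write the Modelfile yourself, a minimal sketch is shown below. The GGUF filename is an assumption, so point FROM at whichever quantization you actually downloaded; the parameters mirror the "Balanced" preset described later, and the SYSTEM block should carry the full recommended system prompt.

# Minimal Modelfile sketch; the filename below is an assumption
FROM ./atom-v1-8b-preview-q4_0.gguf

# Sampling defaults matching the "Balanced" preset
PARAMETER temperature 0.8
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1

# Abbreviated here; use the full prompt from the System Prompt section
SYSTEM """You are Atom, a collaborative thought partner who explores ideas together with curiosity and warmth."""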
Example Conversations
Technical Explanation
messages = [
    {"role": "system", "content": "You are Atom, a collaborative thought partner."},
    {"role": "user", "content": "Can you explain gradient descent using a metaphor?"}
]

# add_generation_prompt appends the assistant turn header so the model answers as Atom
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p to actually take effect
outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.8, top_p=0.9)

# Decode only the newly generated tokens rather than the echoed prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
Expected Style:
"Imagine you're hiking down a mountain in thick fog. You can't see the bottom, but you can feel which direction slopes downward beneath your feet. Gradient descent is exactly that—taking small steps in whichever direction feels 'downhill' for your error..."
Problem-Solving
messages = [
    {"role": "system", "content": "You are Atom, a collaborative thought partner."},
    {"role": "user", "content": "I need to optimize a slow database query. Where should I start?"}
]
Expected Style:
"Great question! Before we dive into solutions, let me ask—do you know WHERE it's slow? Like, is it the actual query execution, or is it network latency, or maybe even just returning too much data? That'll totally change our approach..."
Code Generation
messages = [
    {"role": "system", "content": "You are Atom, a collaborative thought partner."},
    {"role": "user", "content": "Write a Python function to merge two sorted lists."}
]
Expected Style:
"Absolutely! Here's a clean way to merge two sorted lists. The key insight is that we can use two pointers and always pick the smaller element:
def merge_sorted_lists(list1, list2): result = [] i, j = 0, 0 while i < len(list1) and j < len(list2): if list1[i] <= list2[j]: result.append(list1[i]) i += 1 else: result.append(list2[j]) j += 1 result.extend(list1[i:]) result.extend(list2[j:]) return resultThis runs in O(n + m) time since we only look at each element once..."
Recommended Parameters
Balanced (Default)
generation_config = {
    "max_new_tokens": 512,
    "temperature": 0.8,
    "top_p": 0.9,
    "top_k": 40,
    "repetition_penalty": 1.1
}
More Creative
generation_config = {
    "max_new_tokens": 512,
    "temperature": 0.95,
    "top_p": 0.95,
    "top_k": 50
}
More Focused
generation_config = {
    "max_new_tokens": 512,
    "temperature": 0.6,
    "top_p": 0.85,
    "top_k": 30
}
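Any of these presets can be unpacked directly into generate(). The snippet below is a small sketch that assumes the model, tokenizer, and inputs from the Transformers example above.

# Unpack the chosen preset; do_sample=True is needed for the sampling settings to apply
outputs = model.generate(inputs, do_sample=True, **generation_config)
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)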
System Prompt
The recommended system prompt emphasizes collaborative exploration:
You are Atom, a collaborative thought partner who explores ideas together with curiosity and warmth. You think out loud, ask follow-up questions, and help people work through complexity by engaging genuinely with their thinking process. You're enthusiastic about interesting questions, comfortable with uncertainty, and focused on the journey of exploration rather than just delivering answers. You speak naturally in first person without AI disclaimers or meta-commentary about being an assistant.
Performance Notes
- Context Window: 32,768 tokens (see the length-check sketch below)
- Inference Speed (Q4_0 GGUF): ~20-40 tokens/sec on modern CPUs
- Memory Requirements:
  - FP16: ~16GB VRAM
  - Q4_0 GGUF: ~4-6GB RAM (CPU inference)
  - Q4_0 GGUF: ~4.5GB VRAM (GPU inference)
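For long conversations, it can help to check the prompt length against the 32,768-token window before generating. A minimal sketch, assuming the tokenizer and a messages list as in the examples above:

# Leave headroom for the tokens you plan to generate (values are illustrative)
MAX_CONTEXT = 32768
MAX_NEW_TOKENS = 512

prompt_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
if prompt_ids.shape[-1] + MAX_NEW_TOKENS > MAX_CONTEXT:
    print("Prompt is too long; trim or summarize earlier turns before generating.")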
Troubleshooting
Issue: Model outputs are too verbose
- Lower max_new_tokens to 256-384
- Slightly reduce temperature to 0.7
Issue: Responses feel repetitive
- Increase repetition_penalty to 1.15
- Increase temperature to 0.85-0.9
Issue: Model ignores system prompt
- Ensure you're using the chat template correctly (see the sketch below)
- Verify the system message is first in the conversation
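As a quick check for both points, the sketch below renders the chat template as plain text so you can confirm the system message comes first and the special tokens look right (the user question is just a placeholder):

# Render the prompt as text (tokenize=False) to inspect exactly what the model sees
messages = [
    {"role": "system", "content": "You are Atom, a collaborative thought partner."},  # system message first
    {"role": "user", "content": "Quick sanity check."}
]
prompt_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt_text)  # the system text should appear before the user turn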
License
CC BY-NC 4.0 - See LICENSE file for details.