Atom v1 8B Preview - Usage Examples

Installation

Using Transformers

pip install transformers torch accelerate

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "vanta-research/atom-v1-8b-preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)

Using Ollama (GGUF)

# Download the GGUF file from the repository
# Create Modelfile (see Modelfile in this repo)
ollama create atom-v1:latest -f Modelfile
ollama run atom-v1:latest
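
The repository ships a ready-to-use Modelfile. For orientation only, a minimal Modelfile for a GGUF build typically looks like the sketch below; the filename and parameter values here are illustrative assumptions, so defer to the Modelfile in this repo.

# Illustrative sketch -- not the shipped Modelfile; adjust the path to your download
FROM ./atom-v1-8b-preview-q4_0.gguf

# Sampling defaults mirroring the "Balanced" preset described later in this document
PARAMETER temperature 0.8
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1

# Recommended persona (see the System Prompt section below)
SYSTEM """You are Atom, a collaborative thought partner who explores ideas together with curiosity and warmth."""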

Example Conversations

Technical Explanation

messages = [
    {"role": "system", "content": "You are Atom, a collaborative thought partner."},
    {"role": "user", "content": "Can you explain gradient descent using a metaphor?"}
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.8, top_p=0.9, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Expected Style:

"Imagine you're hiking down a mountain in thick fog. You can't see the bottom, but you can feel which direction slopes downward beneath your feet. Gradient descent is exactly that—taking small steps in whichever direction feels 'downhill' for your error..."

Problem-Solving

messages = [
    {"role": "system", "content": "You are Atom, a collaborative thought partner."},
    {"role": "user", "content": "I need to optimize a slow database query. Where should I start?"}
]

Expected Style:

"Great question! Before we dive into solutions, let me ask—do you know WHERE it's slow? Like, is it the actual query execution, or is it network latency, or maybe even just returning too much data? That'll totally change our approach..."

Code Generation

messages = [
    {"role": "system", "content": "You are Atom, a collaborative thought partner."},
    {"role": "user", "content": "Write a Python function to merge two sorted lists."}
]

Expected Style:

"Absolutely! Here's a clean way to merge two sorted lists. The key insight is that we can use two pointers and always pick the smaller element:

def merge_sorted_lists(list1, list2):
    result = []
    i, j = 0, 0
    
    while i < len(list1) and j < len(list2):
        if list1[i] <= list2[j]:
            result.append(list1[i])
            i += 1
        else:
            result.append(list2[j])
            j += 1
    
    result.extend(list1[i:])
    result.extend(list2[j:])
    return result

This runs in O(n + m) time since we only look at each element once..."

Recommended Parameters

Balanced (Default)

generation_config = {
    "max_new_tokens": 512,
    "temperature": 0.8,
    "top_p": 0.9,
    "top_k": 40,
    "repetition_penalty": 1.1
}

More Creative

generation_config = {
    "max_new_tokens": 512,
    "temperature": 0.95,
    "top_p": 0.95,
    "top_k": 50
}

More Focused

generation_config = {
    "max_new_tokens": 512,
    "temperature": 0.6,
    "top_p": 0.85,
    "top_k": 30
}
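
Whichever preset you choose, the dictionary can be unpacked directly into generate. A minimal sketch, reusing the tokenizer, model, and messages from the examples above (do_sample=True is required for the sampling parameters to take effect):

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Unpack the chosen preset; sampling must be enabled for temperature/top_p/top_k to apply
outputs = model.generate(inputs, do_sample=True, **generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))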

System Prompt

The recommended system prompt emphasizes collaborative exploration:

You are Atom, a collaborative thought partner who explores ideas together with curiosity and warmth. You think out loud, ask follow-up questions, and help people work through complexity by engaging genuinely with their thinking process. You're enthusiastic about interesting questions, comfortable with uncertainty, and focused on the journey of exploration rather than just delivering answers. You speak naturally in first person without AI disclaimers or meta-commentary about being an assistant.
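
In code, this means placing the full prompt as the first message of every conversation, for example:

# Abridged here for space -- use the complete prompt text above
SYSTEM_PROMPT = (
    "You are Atom, a collaborative thought partner who explores ideas together "
    "with curiosity and warmth. You think out loud, ask follow-up questions, ..."
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Your question here"},
]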

Performance Notes

  • Context Window: 32,768 tokens
  • Inference Speed (Q4_0 GGUF): ~20-40 tokens/sec on modern CPUs
  • Memory Requirements:
    • FP16: ~16GB VRAM
    • Q4_0 GGUF: ~4-6GB RAM (CPU inference)
    • Q4_0 GGUF: ~4.5GB VRAM (GPU inference)

Troubleshooting

Issue: Model outputs are too verbose

  • Lower max_new_tokens to 256-384
  • Slightly reduce temperature to 0.7

Issue: Responses feel repetitive

  • Increase repetition_penalty to 1.15
  • Increase temperature to 0.85-0.9

Issue: Model ignores system prompt

  • Ensure you're using the chat template correctly (a quick check is sketched below)
  • Verify the system message is first in the conversation
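
As a quick sanity check, you can render the chat template to a string and confirm the system message appears before the first user turn. A minimal sketch using the tokenizer loaded earlier:

# Render the conversation without tokenizing to inspect the formatted prompt
rendered = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(rendered)  # the system message should be at the top, ahead of the user message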

License

CC BY-NC 4.0 - See LICENSE file for details.