smcleish/deepscaler-1.5b-8k-hard-first-run-with-shuffle-8k-400-chkpt-step-400 Text Generation • 2B • Updated about 2 hours ago
smcleish/Qwen3-Embedding-0.6B-Qwen3-4B-Instruct-2507-cs16-summary_mean-bst1024-attn-mlp-ov256 Updated about 2 hours ago
smcleish/deepscaler-1.5b-8k-easy-first-run-with-shuffle-8k-400-chkpt-step-400 Text Generation • 2B • Updated about 2 hours ago
smcleish/deepscaler-1.5b-8k-easy-first-run-with-shuffle-8k-500-chkpt-step-400 Text Generation • 2B • Updated about 2 hours ago
smcleish/Qwen3-Embedding-0.6B-Qwen3-4B-Instruct-2507-cs16-summary_mean-bst1024-attn Updated about 2 hours ago
smcleish/deepscaler-1.5b-8k-hard-first-run-with-shuffle-8k-400-chkpt-step-200 Text Generation • 2B • Updated 2 days ago • 8
smcleish/deepscaler-1.5b-8k-hard-first-run-with-shuffle-8k-500-chkpt-step-200 Text Generation • 2B • Updated 2 days ago • 6
smcleish/deepscaler-1.5b-8k-hard-first-run-with-shuffle-step500 Text Generation • 2B • Updated 2 days ago • 10
smcleish/deepscaler-1.5b-8k-easy-first-run-with-shuffle-step500 Text Generation • 2B • Updated 2 days ago • 11
smcleish/deepscaler-1.5b-8k-easy-first-run-with-shuffle-8k-400-chkpt-step-200 Text Generation • 2B • Updated 5 days ago • 13
smcleish/deepscaler-1.5b-8k-easy-first-run-with-shuffle-8k-500-chkpt-step-200 Text Generation • 2B • Updated 5 days ago • 16
smcleish/Qwen3-Embedding-0.6B-Qwen3-4B-Instruct-2507-cs16-summary_mean-bst1024-lr-1e5 Updated 7 days ago
smcleish/Qwen3-Embedding-0.6B-Qwen3-4B-Instruct-2507-cs16-summary_mean-bst1024-lr-3e6 Updated 7 days ago
smcleish/Recurrent-TinyLlama-3T-train-recurrence-4-single-phase Text Generation • 0.8B • Updated Nov 11, 2025 • 1
smcleish/Recurrent-TinyLlama-3T-train-recurrence-4-two-phase Text Generation • 0.8B • Updated Nov 11, 2025 • 2
smcleish/Recurrent-OLMo-2-0425-train-recurrence-4 Text Generation • 1B • Updated Nov 11, 2025 • 6 • 1
smcleish/Recurrent-OLMo-2-0425-train-recurrence-32 Text Generation • 1B • Updated Nov 11, 2025 • 30 • 2
smcleish/Recurrent-TinyLlama-3T-train-recurrence-32 Text Generation • 0.8B • Updated Nov 11, 2025 • 31 • 1
smcleish/Recurrent-TinyLlama-3T-train-recurrence-16 Text Generation • 0.8B • Updated Nov 11, 2025 • 1 • 1