maxine (crumb)
AI & ML interests: small models
GPT2-Linear
GPT-2 models using Linear layers instead of Conv1D layers for convenience.
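For context: the Hugging Face GPT-2 implementation stores its attention and MLP projections as Conv1D modules, whose weight is the transpose of an nn.Linear weight. A minimal sketch of the swap, assuming a recent transformers version (where Conv1D lives in transformers.pytorch_utils) and not necessarily the code used for these checkpoints:

```python
import torch.nn as nn
from transformers.pytorch_utils import Conv1D

def conv1d_to_linear(conv: Conv1D) -> nn.Linear:
    # Conv1D stores its weight as (in_features, out_features), the
    # transpose of nn.Linear's (out_features, in_features) layout.
    in_features, out_features = conv.weight.shape
    linear = nn.Linear(in_features, out_features, bias=True)
    linear.weight.data = conv.weight.data.t().contiguous()
    linear.bias.data = conv.bias.data.clone()
    return linear
```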
Cramp(ed) Models
Smaller models trained locally on my 2xA6000 Lambda Vector.
MoLora-v1
Model assets for the first Mixture-of-LoRA (MoLoRA) technique applied to Llama. https://bit.ly/48bqshl
MoLora-v2
First prototype of the second iteration of MoLoRA, applying mixture-of-experts techniques to the Llama 2 model.
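The core mixture-of-LoRA idea is to attach several low-rank adapters to a frozen base layer and let a learned router mix their updates per token. The sketch below is a hypothetical minimal version; the class, parameter names, and routing scheme are illustrative assumptions, not the MoLora code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfLoRA(nn.Module):
    """Frozen base Linear layer plus several LoRA experts whose
    low-rank updates are mixed per token by a learned router.
    Hypothetical sketch -- not the MoLora-v1/v2 implementation."""

    def __init__(self, base: nn.Linear, n_experts: int = 4, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base.requires_grad_(False)  # pretrained weights stay frozen
        self.scale = alpha / r
        # One low-rank (A, B) pair per expert; B starts at zero so the
        # wrapped layer initially behaves exactly like the base layer.
        self.A = nn.Parameter(torch.randn(n_experts, base.in_features, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, r, base.out_features))
        self.router = nn.Linear(base.in_features, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gates = F.softmax(self.router(x), dim=-1)  # per-token expert weights
        # Low-rank update from each expert: x @ A_e @ B_e.
        delta = torch.einsum("...d,edr,ero->...eo", x, self.A, self.B)
        mixed = torch.einsum("...e,...eo->...o", gates, delta)
        return self.base(x) + self.scale * mixed
```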
Shrink Llama - V1
Parts of Meta's Llama 2 models, chopped up and trained. "CoreX" means the first X layers were kept.
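This kind of layer truncation can be sketched with transformers, assuming the standard LlamaForCausalLM layout; the helper name and checkpoint below are illustrative, not the actual Shrink Llama pipeline:

```python
import torch.nn as nn
from transformers import LlamaForCausalLM

def make_core_x(model: LlamaForCausalLM, x: int) -> LlamaForCausalLM:
    # Keep only the first x decoder layers; embeddings and lm_head
    # are untouched, and the truncated model is then trained further.
    model.model.layers = model.model.layers[:x]
    model.config.num_hidden_layers = x
    return model

# e.g. a "Core4" model keeps decoder layers 0-3 of Llama-2-7B
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
core4 = make_core_x(model, 4)
```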