crumb

https://cephaloform.neocities.org/

cephaloform
aicrumb
crumb.bsky.social

Recent Activity

updated a model about 1 hour ago

crumb/qwen3.5-9B-cq3

updated a model about 3 hours ago

crumb/cq-example-adapter

published a model about 3 hours ago

crumb/cq-example-adapter

crumb's collections (6)

MoLora-v1

Model assets for the first Mixture-of-LoRA technique applied to Llama. https://bit.ly/48bqshl

crumb/llama2-7b-moe-text-exp0-4

Updated Jul 19, 2023 • 6
crumb/llama2-7b-moe-text-exp1-4

Updated Jul 19, 2023 • 10 • 2
crumb/llama2-7b-moe-text-exp2-4

Updated Jul 19, 2023 • 8
crumb/llama2-7b-moe-text-exp3-4

Updated Jul 19, 2023 • 5

GPT2-Linear

GPT-2 models that use Linear layers in place of Conv layers, for convenience.

crumbly/gpt2-linear-xl

Text Generation • Updated Jul 18, 2023 • 27 • 1
crumbly/gpt2-linear-large

Text Generation • Updated Jul 17, 2023 • 10
crumbly/gpt2-linear-medium

Text Generation • Updated Jul 17, 2023 • 11
crumbly/gpt2-linear-small

Text Generation • Updated Jul 17, 2023 • 7
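
A rough sketch of what swapping Conv layers for Linear layers can involve, assuming the Conv layers in question are the Conv1D modules of the Hugging Face GPT-2 implementation, which store their weight as (in_features, out_features) while a Linear layer stores (out_features, in_features). The helper names below are hypothetical and the example is pure Python, so it only illustrates the transpose and is not the conversion actually used for these models.

```python
# Hypothetical helpers; plain lists stand in for tensors so this runs anywhere.

def conv1d_forward(x, weight, bias):
    """GPT-2 style Conv1D: weight is laid out as (in_features, out_features)."""
    n_out = len(bias)
    return [
        sum(x[i] * weight[i][j] for i in range(len(x))) + bias[j]
        for j in range(n_out)
    ]

def linear_forward(x, weight, bias):
    """Linear layer: weight is laid out as (out_features, in_features)."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weight, bias)
    ]

def conv1d_to_linear(weight):
    """Transpose an (in, out) Conv1D weight into an (out, in) Linear weight."""
    return [list(column) for column in zip(*weight)]

# Toy values, chosen only for illustration.
x = [1.0, 2.0, 3.0]
conv_weight = [[0.5, -1.0], [0.25, 0.0], [1.0, 2.0]]  # 3 inputs, 2 outputs
bias = [0.1, -0.2]

linear_weight = conv1d_to_linear(conv_weight)
conv_out = conv1d_forward(x, conv_weight, bias)
linear_out = linear_forward(x, linear_weight, bias)
```

Both layers compute the same outputs once the weight is transposed, so tools that expect Linear modules can work on the model without special-casing Conv1D.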

Cramp(ed) Models

Smaller models trained locally on my 2xA6000 Lambda Vector.

crumbly/cramp-25m

Text Generation • Updated Feb 15, 2024 • 5 • 8
crumb/cramped-94m-8btok

Text Generation • Updated Oct 11, 2023 • 12 • 1

MoLora-v2

First prototype of the second iteration of MoLora, which applies mixture-of-experts techniques to the Llama 2 model.

crumb/test-00-switchllama-i3b-f10b-e4-init

Text Generation • Updated Sep 13, 2023 • 32
crumb/test-00-qlora-wizmlpmix-c0

Updated Sep 4, 2023 • 4
crumb/test-00-qlora-wizmlpmix-c1

Updated Sep 4, 2023 • 2
crumb/test-00-qlora-wizmlpmix-c3

Updated Sep 4, 2023 • 2

Shrink Llama - V1

Parts of Meta's Llama 2 models, chopped up and trained. "CoreX" means the first X layers were kept.

crumb/core1-base-464m-c4

Text Generation • 0.5B • Updated Sep 12, 2023 • 12
crumb/core1-base-464m-redpajama

Text Generation • Updated Sep 12, 2023 • 6
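
The "first X layers were kept" idea can be sketched as a filter over checkpoint parameter names. This is a minimal illustration, not the procedure used for these models: it assumes the Hugging Face Llama naming scheme (`model.layers.<i>.…`), the function name `keep_first_layers` is hypothetical, and keeping the embeddings, final norm, and output head is an assumption that the collection description does not confirm.

```python
import re

# Matches per-layer parameter names such as "model.layers.3.mlp.up_proj.weight".
LAYER_KEY = re.compile(r"^model\.layers\.(\d+)\.")

def keep_first_layers(state_dict, num_layers):
    """Return a copy of state_dict with only the first num_layers decoder layers.

    Parameters that do not belong to a numbered layer (embeddings, final
    norm, output head) are kept unchanged; that choice is an assumption.
    """
    kept = {}
    for name, tensor in state_dict.items():
        match = LAYER_KEY.match(name)
        if match is None or int(match.group(1)) < num_layers:
            kept[name] = tensor
    return kept

# Toy stand-in for a checkpoint: strings replace real tensors.
toy_checkpoint = {
    "model.embed_tokens.weight": "emb",
    "model.layers.0.self_attn.q_proj.weight": "l0_q",
    "model.layers.1.self_attn.q_proj.weight": "l1_q",
    "model.layers.2.self_attn.q_proj.weight": "l2_q",
    "model.norm.weight": "norm",
    "lm_head.weight": "head",
}

core1 = keep_first_layers(toy_checkpoint, 1)
```

With `num_layers=1`, only layer 0 and the non-layer parameters survive, which is the Core1 reading of the naming convention. A real model would also need its layer count in the config reduced to match before further training.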

MoAT (More Artificial Tokens)

Lets the LM learn a soft "multi-step program" for predicting future tokens, instead of learning to predict future tokens directly.

crumb/16xF-6m-init

Text Generation • Updated Oct 16, 2023 • 17
crumb/32xF-6m-init

Text Generation • Updated Oct 16, 2023 • 11
