nanochat-d14 model, created with https://github.com/karpathy/nanochat/commit/5019acc
It was pretrained on a single A100 with:

```shell
python -m scripts.base_train --depth=14 --num-iterations=-1 --target-param-data-ratio=7.0 --device-batch-size=16 --window-pattern="L"
```
Some of the notable stats of the base model are as follows:
- depth: 14
- max_seq_len: 2048
- target_param_data_ratio: 7.0
- device_batch_size: 16
- Number of parameters: 399,114,882
- Number of FLOPs per token: 1.293684e+09
- Calculated number of iterations: 2192
- Number of training tokens: 1,149,239,296
- DDP world size: 1
- Minimum validation bpb: 0.8369
- Final validation bpb: 0.8369
- Total training flops: 1.486753e+18
- Peak memory usage: 20954.99MiB
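The reported numbers above are internally consistent, which can be checked with a few lines of arithmetic. The 16-step gradient-accumulation figure below is an inference from the stats (2192 iterations over ~1.15B tokens), not something the card states directly:

```python
# Sanity-check the base-model stats (all constants copied from the list above).
flops_per_token = 1.293684e9
train_tokens = 1_149_239_296
num_iterations = 2192

# Total training FLOPs = FLOPs/token x tokens trained on.
total_flops = flops_per_token * train_tokens
print(f"{total_flops:.6e}")  # ≈ 1.486753e+18, matching the reported total

# Tokens per optimizer step. With device_batch_size=16 and max_seq_len=2048,
# 524,288 tokens/step implies 16 gradient-accumulation micro-steps (assumption).
tokens_per_step = train_tokens // num_iterations
print(tokens_per_step)  # 524288 = 16 * 2048 * 16
```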
The fine-tuned model was trained with:

```shell
python -m scripts.chat_sft --device-batch-size=16
```
And some of its stats:
- init_lr_frac: 0.8000
- warmup_ratio: 0.0000
- warmdown_ratio: 0.5000
- final_lr_frac: 0.0000
- eval_tokens: 20,971,520
- Number of iterations: 971
- Minimum validation bpb: 0.3426
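The four learning-rate knobs above describe a trapezoidal schedule: no warmup, a flat phase at 0.8 of the base LR, then a linear warmdown over the final 50% of steps to zero. A minimal sketch of the implied multiplier (an assumption about how these knobs combine, not nanochat's exact code):

```python
# Sketch of the LR multiplier implied by the SFT stats (hypothetical helper).
def lr_frac(it, num_iterations=971, init_lr_frac=0.8,
            warmup_ratio=0.0, warmdown_ratio=0.5, final_lr_frac=0.0):
    warmup = int(warmup_ratio * num_iterations)      # 0 steps here
    warmdown = int(warmdown_ratio * num_iterations)  # last ~485 steps
    if warmup and it < warmup:
        frac = (it + 1) / warmup                     # linear warmup (unused)
    elif it > num_iterations - warmdown:
        progress = (num_iterations - it) / warmdown  # 1 -> 0 over warmdown
        frac = progress + (1 - progress) * final_lr_frac
    else:
        frac = 1.0                                   # flat middle phase
    return init_lr_frac * frac

print(lr_frac(0))    # 0.8: starts at init_lr_frac immediately (no warmup)
print(lr_frac(400))  # 0.8: still in the flat phase
```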
The base model achieves a CORE score of 0.1590 (for comparison, GPT-2 scores about 0.25). The fine-tuned model achieves a ChatCORE score of 0.2404.
To incorporate this into your nanochat repo, download these files and then:
- `token_bytes.pt` and `tokenizer.pkl` go into the `~/.cache/nanochat/tokenizer` directory
- `meta_002192.json` and `model_002192.pt` go into `~/.cache/nanochat/base_checkpoints/d14/`
- `meta_000971.json` and `model_000971.pt` go into `~/.cache/nanochat/chatsft_checkpoints/d14/`
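The file placement above can be sketched as a short shell script. This is a sketch, assuming the six files were downloaded into the current working directory; adjust paths if your files are elsewhere:

```shell
# Create nanochat's cache layout and move the downloaded files into place.
CACHE="${HOME}/.cache/nanochat"
mkdir -p "$CACHE/tokenizer" "$CACHE/base_checkpoints/d14" "$CACHE/chatsft_checkpoints/d14"
for f in token_bytes.pt tokenizer.pkl; do
  if [ -f "$f" ]; then mv "$f" "$CACHE/tokenizer/"; fi
done
for f in meta_002192.json model_002192.pt; do
  if [ -f "$f" ]; then mv "$f" "$CACHE/base_checkpoints/d14/"; fi
done
for f in meta_000971.json model_000971.pt; do
  if [ -f "$f" ]; then mv "$f" "$CACHE/chatsft_checkpoints/d14/"; fi
done
```

With the checkpoints in place, the repo's chat entry points (e.g. `python -m scripts.chat_cli`, if present in your checkout) should pick up the d14 model from the cache.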