nanochat-d14 model, created with https://github.com/karpathy/nanochat/commit/5019acc
It was pretrained on a single A100 with:

```shell
python -m scripts.base_train --depth=14 --num-iterations=-1 --target-param-data-ratio=7.0 --device-batch-size=16 --window-pattern="L"
```
Some of the notable stats of the base model are as follows:
- depth: 14
- max_seq_len: 2048
- target_param_data_ratio: 7.0
- device_batch_size: 16
- Number of parameters: 399,114,882
- Number of FLOPs per token: 1.293684e+09
- Calculated number of iterations: 2192
- Number of training tokens: 1,149,239,296
- DDP world size: 1
- Minimum validation bpb: 0.8369
- Final validation bpb: 0.8369
- Total training flops: 1.486753e+18
- Peak memory usage: 20954.99MiB
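The reported numbers above are internally consistent, which can be checked with a few lines of arithmetic. The 16-step gradient-accumulation figure below is an inference from the stats (2192 iterations over ~1.15B tokens), not something the card states directly:

```python
# Sanity-check the base-model stats (all constants copied from the list above).
flops_per_token = 1.293684e9
train_tokens = 1_149_239_296
num_iterations = 2192

# Total training FLOPs = FLOPs/token x tokens trained on.
total_flops = flops_per_token * train_tokens
print(f"{total_flops:.6e}")  # ≈ 1.486753e+18, matching the reported total

# Tokens per optimizer step. With device_batch_size=16 and max_seq_len=2048,
# 524,288 tokens/step implies 16 gradient-accumulation micro-steps (assumption).
tokens_per_step = train_tokens // num_iterations
print(tokens_per_step)  # 524288 = 16 * 2048 * 16
```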
The fine-tuned model was trained with:

```shell
python -m scripts.chat_sft --device-batch-size=16
```
And some of its stats:
- init_lr_frac: 0.8000
- warmup_ratio: 0.0000
- warmdown_ratio: 0.5000
- final_lr_frac: 0.0000
- eval_tokens: 20,971,520
- Number of iterations: 971
- Minimum validation bpb: 0.3426
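The four learning-rate knobs above describe a trapezoidal schedule: no warmup, a flat phase at 0.8 of the base LR, then a linear warmdown over the final 50% of steps to zero. A minimal sketch of the implied multiplier (an assumption about how these knobs combine, not nanochat's exact code):

```python
# Sketch of the LR multiplier implied by the SFT stats (hypothetical helper).
def lr_frac(it, num_iterations=971, init_lr_frac=0.8,
            warmup_ratio=0.0, warmdown_ratio=0.5, final_lr_frac=0.0):
    warmup = int(warmup_ratio * num_iterations)      # 0 steps here
    warmdown = int(warmdown_ratio * num_iterations)  # last ~485 steps
    if warmup and it < warmup:
        frac = (it + 1) / warmup                     # linear warmup (unused)
    elif it > num_iterations - warmdown:
        progress = (num_iterations - it) / warmdown  # 1 -> 0 over warmdown
        frac = progress + (1 - progress) * final_lr_frac
    else:
        frac = 1.0                                   # flat middle phase
    return init_lr_frac * frac

print(lr_frac(0))    # 0.8: starts at init_lr_frac immediately (no warmup)
print(lr_frac(400))  # 0.8: still in the flat phase
```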
The base model achieves a CORE score of 0.1590 (for comparison, GPT-2 scores about 0.25). The fine-tuned model achieves a ChatCORE score of 0.2404.
To incorporate this into your nanochat repo, download these files and then:
- `token_bytes.pt` and `tokenizer.pkl` go into the `~/.cache/nanochat/tokenizer` directory
- `meta_002192.json` and `model_002192.pt` go into `~/.cache/nanochat/base_checkpoints/d14/`
- `meta_000971.json` and `model_000971.pt` go into `~/.cache/nanochat/chatsft_checkpoints/d14/`
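The file placement above can be sketched as a short shell script. This is a sketch, assuming the six files were downloaded into the current working directory; adjust paths if your files are elsewhere:

```shell
# Create nanochat's cache layout and move the downloaded files into place.
CACHE="${HOME}/.cache/nanochat"
mkdir -p "$CACHE/tokenizer" "$CACHE/base_checkpoints/d14" "$CACHE/chatsft_checkpoints/d14"
for f in token_bytes.pt tokenizer.pkl; do
  if [ -f "$f" ]; then mv "$f" "$CACHE/tokenizer/"; fi
done
for f in meta_002192.json model_002192.pt; do
  if [ -f "$f" ]; then mv "$f" "$CACHE/base_checkpoints/d14/"; fi
done
for f in meta_000971.json model_000971.pt; do
  if [ -f "$f" ]; then mv "$f" "$CACHE/chatsft_checkpoints/d14/"; fi
done
```

With the checkpoints in place, the repo's chat entry points (e.g. `python -m scripts.chat_cli`, if present in your checkout) should pick up the d14 model from the cache.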