Neural-Hacker/OpenMath

Text Generation


Neural-Hacker committed on Feb 8

Commit 3540770 · verified · 1 parent: bbba8b8

Update README.md

Files changed (1)

README.md +26 -15


@@ -57,21 +57,32 @@ Only the solution portion of each example was used for loss computation through
 ## Training Configuration
-Method: LoRA (full precision, bfloat16)
-Precision: bfloat16 (no 4-bit quantization)
-LoRA rank: 16
-LoRA alpha: 32
-LoRA dropout: 0.05
-Target modules: q_proj, k_proj, v_proj, o_proj
-Max sequence length: 1024
-Batch size: 2
-Gradient accumulation: 8
-Effective batch size: 16
-Learning rate: 1e-4
-Optimizer: adamw_torch
-Scheduler: cosine
-Warmup: 5 percent
-Epochs: 3
 ---

 ## Training Configuration
+## Training Configuration (MI300X Run)
+**Method:** LoRA (full precision, bfloat16)
+**Precision:** bfloat16 (no 4-bit quantization)
+**LoRA settings**
+- Rank: 16
+- Alpha: 32
+- Dropout: 0.05
+- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`
+**Data & sequence**
+- Max sequence length: 1024
+**Optimization**
+- Batch size: 2
+- Gradient accumulation: 8
+- **Effective batch size:** 16
+- Learning rate: 1e-4
+- Optimizer: `adamw_torch`
+- Scheduler: cosine
+- Warmup: 5%
+**Training**
+- Epochs: 3
 ---
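
The listed hyperparameters imply a few derived quantities that the README states only in part. The following is a minimal sketch, not the author's training script: `RunConfig`, `schedule`, and `num_devices` are hypothetical names introduced here, the single-device count is an assumption inferred from the stated effective batch size of 16, and the dataset size in the usage line is made up purely to make the arithmetic concrete.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class RunConfig:
    """Hyperparameters copied from the added lines of the diff above."""
    lora_rank: int = 16
    lora_alpha: int = 32
    lora_dropout: float = 0.05
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")
    max_seq_length: int = 1024
    per_device_batch_size: int = 2
    grad_accum_steps: int = 8
    learning_rate: float = 1e-4
    warmup_ratio: float = 0.05  # "Warmup: 5%"
    epochs: int = 3
    # ASSUMPTION: the README gives no device count. One device is the
    # value that reproduces the stated effective batch size of 16.
    num_devices: int = 1

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimizer step.
        return self.per_device_batch_size * self.grad_accum_steps * self.num_devices

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank

    def schedule(self, num_examples: int) -> tuple:
        """Return (total optimizer steps, warmup steps) for a dataset size."""
        steps_per_epoch = math.ceil(num_examples / self.effective_batch_size)
        total_steps = steps_per_epoch * self.epochs
        # Rounding is a choice made for this sketch; training frameworks
        # may round the warmup step count differently.
        warmup_steps = round(total_steps * self.warmup_ratio)
        return total_steps, warmup_steps


cfg = RunConfig()
print(cfg.effective_batch_size)  # 2 * 8 * 1
print(cfg.lora_scaling)          # 32 / 16
# HYPOTHETICAL dataset size, not taken from the README.
print(cfg.schedule(num_examples=8000))
```

With these settings the LoRA update is scaled by alpha / rank = 2, and each optimizer step sees 16 sequences of at most 1024 tokens. The step counts depend entirely on the real dataset size, which this diff does not state.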