# Test-1-4000: A 190M Parameter Narrative Engine
## Overview
Test-1-4000 is the final training checkpoint of a compact decoder-only Transformer model built on the Llama architecture and trained on the TinyStories dataset.
The project focuses on studying how narrative coherence, logical consistency, and language fluency emerge inside small-scale language models through structured training.
By step 4000, the model generates noticeably more stable and fluent narratives than earlier checkpoints, reaching a final training loss of 0.573 after nearly two full epochs of training.
## Model Highlights
| Feature | Specification |
|---|---|
| Architecture | Llama-based Decoder-only Transformer |
| Parameters | 190.55 Million |
| Context Window | 2048 Tokens |
| Final Training Step | 4000 |
| Final Training Loss | 0.573 |
| Precision | bfloat16 |
| Attention Backend | Flash Attention 2 |
| Compilation | torch.compile |
| Tokenizer | GPT-2 Tokenizer |
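The precision, attention backend, and compilation settings above can also be reproduced at load time. A minimal sketch, assuming the checkpoint is in Hugging Face format and the `flash-attn` package is installed (`attn_implementation` and `torch.compile` follow the standard transformers/PyTorch APIs):

```python
import torch
from transformers import AutoModelForCausalLM

# Load in bfloat16 with the FlashAttention-2 backend (requires flash-attn).
model = AutoModelForCausalLM.from_pretrained(
    "GODELEV/Test-1-4000",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

# Compile the forward pass, mirroring the training-time setup.
model = torch.compile(model)
```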
## Architecture
| Component | Value |
|---|---|
| Hidden Dimension | 768 |
| Layers | 12 |
| Attention Heads | 12 |
| Intermediate Size | 3072 |
| Activation Function | SwiGLU |
| Normalization | RMSNorm |
| Vocabulary Size | 50,257 |
The model uses Rotary Positional Embeddings (RoPE) to encode token positions, which supports stable long-range attention across the 2048-token context window.
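For reference, the table maps onto a Hugging Face `LlamaConfig` roughly as follows. This is a sketch; the exact config shipped with the checkpoint (e.g. `rope_theta`, weight tying) may differ:

```python
from transformers import LlamaConfig

config = LlamaConfig(
    hidden_size=768,               # hidden dimension
    num_hidden_layers=12,          # layers
    num_attention_heads=12,        # attention heads
    intermediate_size=3072,        # SwiGLU MLP width
    hidden_act="silu",             # SwiGLU gating uses SiLU in Llama-style MLPs
    max_position_embeddings=2048,  # context window
    vocab_size=50257,              # GPT-2 tokenizer vocabulary
)
```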
## Training Progression
### Phase 1: Lexical Learning (Steps 0–250)
The model learned grammar, sentence formation, and common linguistic patterns.
### Phase 2: Relational Understanding (Steps 250–1000)
The model began associating entities, actions, and environments into logically connected sequences.
### Phase 3: Narrative Coherence (Steps 1000–2000)
Narrative continuity emerged. Stories developed stable structure and conflict resolution, with fewer internal contradictions.
### Phase 4: Emergent Narrative Intelligence (Steps 2000–3000)
The model improved in emotional consistency, long-range memory, and thematic continuity across generations.
### Phase 5: Fluent Generative Stability (Steps 3000–4000)
This final phase marked a transition from structured storytelling into fluent narrative generation.
The model became substantially better at:
- maintaining tone,
- producing natural sentence flow,
- avoiding repetitive degeneration,
- preserving character consistency,
- and generating smoother transitions between events.
By this stage, generations began feeling less mechanically predicted and more organically written. Dialogue improved noticeably, pacing became more natural, and narrative structure stabilized across longer outputs.
The drop in training loss to 0.573 reflects more confident next-token prediction on the training distribution, consistent with the fluency gains observed in sampled outputs.
## Training Configuration
| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning Rate | 5e-4 |
| Scheduler | OneCycleLR |
| Weight Decay | 0.01 |
| Precision | bfloat16 |
| Effective Batch Size | ~262K tokens/step |
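As a rough sketch of how these settings combine in PyTorch (`total_steps=4000` matches the final training step; other OneCycleLR arguments such as `pct_start` are undocumented here and left at their defaults):

```python
import torch

# AdamW with the documented learning rate and weight decay.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.01)

# One-cycle schedule spanning the full 4000-step run.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=5e-4,
    total_steps=4000,
)

# ~262K tokens per optimizer step is consistent with, e.g.,
# 128 sequences x 2048 tokens (an assumption; the split between
# micro-batch size and gradient accumulation is not documented).
```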
## Dataset
The model was trained on TinyStories, a synthetic storytelling dataset designed to teach language models reasoning and narrative structure using simplified vocabulary and clean writing patterns.
This allows the model to focus on:
- causal reasoning,
- narrative flow,
- emotional continuity,
- and long-range coherence.
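For reference, the public TinyStories release can be loaded from the Hugging Face Hub as follows (a sketch assuming the `roneneldan/TinyStories` dataset; the exact split and preprocessing used for this run are not documented here):

```python
from datasets import load_dataset

# Load the public TinyStories training split.
ds = load_dataset("roneneldan/TinyStories", split="train")
print(ds[0]["text"][:200])  # first 200 characters of the first story
```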
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "GODELEV/Test-1-4000"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Once upon a time, a boy found a silver key."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
    # The GPT-2 tokenizer has no pad token by default; fall back to EOS.
    pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
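These sampling settings (temperature 0.7, top-p 0.9, a mild repetition penalty) are moderate defaults for short-story generation; raising the temperature yields more varied but less predictable stories.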
## Final Notes
Test-1-4000 demonstrates that coherent and fluent narrative behavior can emerge in compact Transformer models when training is focused on clean, structured data and long-form consistency.
Despite its relatively small size, the model exhibits:
- strong narrative fluency,
- stable story progression,
- coherent emotional structure,
- and reliable long-context generation.
The project explores how efficient language models can develop increasingly sophisticated generative behavior through progressive training refinement.
## Citation
```bibtex
@misc{test14000,
  title={Test-1-4000: A 190M Parameter Narrative Engine},
  author={GODELEV},
  year={2026}
}
```