Test-1-4000: A 190M Parameter Narrative Engine


Overview

Test-1-4000 is the final training checkpoint of a compact decoder-only Transformer model built on the Llama architecture and trained on the TinyStories dataset.

The project focuses on studying how narrative coherence, logical consistency, and language fluency emerge inside small-scale language models through structured training.

By step 4000 the model is markedly more stable and fluent than earlier checkpoints, reaching a final training loss of 0.573 after nearly two full epochs of training.


Model Highlights

| Feature | Specification |
|---|---|
| Architecture | Llama-based decoder-only Transformer |
| Parameters | 190.55M |
| Context Window | 2048 tokens |
| Final Training Step | 4000 |
| Final Training Loss | 0.573 |
| Precision | bfloat16 |
| Attention Backend | Flash Attention 2 |
| Compilation | torch.compile |
| Tokenizer | GPT-2 tokenizer |
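
Flash Attention 2 and torch.compile were used during training; the same backends can be enabled at inference time when loading the checkpoint. A minimal sketch (assumes a supported CUDA GPU and the flash-attn package; drop attn_implementation to fall back to the default backend):

import torch
from transformers import AutoModelForCausalLM

# Load with the same attention backend and compilation listed above.
# flash_attention_2 requires a supported GPU and the flash-attn package.
model = AutoModelForCausalLM.from_pretrained(
    "GODELEV/Test-1-4000",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")

model = torch.compile(model)  # optional graph compilation, as in training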

Architecture

| Component | Value |
|---|---|
| Hidden Dimension | 768 |
| Layers | 12 |
| Attention Heads | 12 |
| Intermediate Size | 3072 |
| Activation Function | SwiGLU |
| Normalization | RMSNorm |
| Vocabulary Size | 50,257 |

The model uses Rotary Position Embeddings (RoPE), which encode token positions directly in the attention computation and keep long-range token relationships stable across the 2048-token context window.
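
For reference, the table above maps directly onto a Hugging Face LlamaConfig. The sketch below is an assumption about the exact configuration; fields not listed in the table (such as rope_theta) are left at library defaults:

from transformers import LlamaConfig

# Architecture from the table expressed as a LlamaConfig (a sketch;
# unlisted fields such as rope_theta keep their defaults).
config = LlamaConfig(
    vocab_size=50257,              # GPT-2 tokenizer vocabulary
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    max_position_embeddings=2048,  # context window
    hidden_act="silu",             # the SiLU gate inside SwiGLU
)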


Training Progression

Phase 1: Lexical Learning (0 → 250)

The model learned grammar, sentence formation, and common linguistic patterns.

Phase 2: Relational Understanding (250 → 1000)

The model began associating entities, actions, and environments into logically connected sequences.

Phase 3: Narrative Coherence (1000 → 2000)

Narrative continuity emerged. Stories developed stable structure, conflict resolution, and reduced contradiction.

Phase 4: Emergent Narrative Intelligence (2000 → 3000)

The model improved in emotional consistency, long-range memory, and thematic continuity across generations.

Phase 5: Fluent Generative Stability (3000 → 4000)

This final phase marked a transition from structured storytelling into fluent narrative generation.

The model became substantially better at:

  • maintaining tone,
  • producing natural sentence flow,
  • avoiding repetitive degeneration,
  • preserving character consistency,
  • and generating smoother transitions between events.

By this stage, generations began feeling less mechanically predicted and more organically written. Dialogue improved noticeably, pacing became more natural, and narrative structure stabilized across longer outputs.

The reduction in training loss to 0.573 reflects a substantial gain in next-token predictive confidence, consistent with the fluency improvements observed in sampled outputs.


Training Configuration

| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning Rate | 5e-4 |
| Scheduler | OneCycleLR |
| Weight Decay | 0.01 |
| Precision | bfloat16 |
| Effective Batch Size | ~262K tokens/step |
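
A minimal PyTorch sketch of this optimization setup; only the optimizer, learning rate, weight decay, and step count come from this card, and the remaining OneCycleLR shape parameters are assumptions left at their defaults:

import torch

# model is assumed to be an already-constructed nn.Module.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=5e-4,       # peak learning rate from the table
    total_steps=4000,  # final training step from the card
)

# ~262K tokens/step could correspond to, e.g., 128 sequences x the full
# 2048-token context (an assumed decomposition; only the per-step token
# count is stated above).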

Dataset

The model was trained on TinyStories, a synthetic storytelling dataset designed to teach small language models reasoning and narrative structure through a simplified vocabulary and clean writing patterns.

This allows the model to focus on:

  • causal reasoning,
  • narrative flow,
  • emotional continuity,
  • and long-range coherence.
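
For experimentation, TinyStories is publicly hosted on the Hugging Face Hub; the sketch below assumes the roneneldan/TinyStories repository matches the data used here:

from datasets import load_dataset

# Load the public TinyStories dataset (assumed to be the training data).
dataset = load_dataset("roneneldan/TinyStories", split="train")
print(dataset[0]["text"][:200])  # peek at the first story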

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "GODELEV/Test-1-4000"

tokenizer = AutoTokenizer.from_pretrained(model_path)
# The GPT-2 tokenizer has no pad token; reuse EOS so generate() does not
# receive pad_token_id=None.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load in bfloat16, matching the training precision.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Once upon a time, a boy found a silver key."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Nucleus sampling with a mild repetition penalty suits short stories.
output = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
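
For interactive use, generation can also stream tokens to stdout as they are produced, using transformers' TextStreamer. A short sketch reusing the model, tokenizer, and inputs from the example above:

from transformers import TextStreamer

# Print tokens as they are generated instead of waiting for the full output.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    streamer=streamer,
)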

Final Notes

Test-1-4000 demonstrates that coherent and fluent narrative behavior can emerge in compact Transformer models when training is focused on clean, structured data and long-form consistency.

Despite its relatively small size, the model exhibits:

  • strong narrative fluency,
  • stable story progression,
  • coherent emotional structure,
  • and reliable long-context generation.

The project explores how efficient language models can develop increasingly sophisticated generative behavior through progressive training refinement.


Citation

@misc{test14000,
  title={Test-1-4000: A 190M Parameter Narrative Engine},
  author={GODELEV},
  year={2026}
}