
# Test-1-3000: A 190M Parameter Narrative Intelligence Engine



## Overview

Test-1-3000 is a compact yet remarkably capable decoder-only Transformer language model built upon the modern Llama architecture.

The project explores an important question in language model research:

How much narrative reasoning, coherence, and world understanding can emerge inside a small model when trained correctly?

Despite containing only 190.55 million parameters, Test-1-3000 demonstrates surprisingly advanced:

  • Narrative continuity
  • Character persistence
  • Long-range memory consistency
  • Emotional progression
  • Logical event sequencing
  • Contextual storytelling stability

The model was trained specifically for short-form narrative intelligence, focusing on coherent storytelling rather than broad internet-scale memorization.

Unlike many small models that generate fragmented or repetitive text, Test-1-3000 learns to maintain:

  • causal relationships,
  • stable story worlds,
  • emotional trajectories,
  • and meaningful resolutions across long contexts.

## Key Highlights

| Feature | Description |
|---|---|
| Architecture | Llama-based decoder-only Transformer |
| Parameters | 190.55 million |
| Context Length | 2048 tokens |
| Final Training Step | 3000 |
| Final Training Loss | 0.8516 |
| Attention Optimization | Flash Attention 2 |
| Compilation | torch.compile |
| Precision | bfloat16 mixed precision |
| Positional Encoding | Rotary Positional Embeddings (RoPE) |

## What Makes Test-1-3000 Special?

Most compact language models struggle with:

  • maintaining consistency,
  • remembering earlier events,
  • resolving story arcs,
  • and avoiding repetition.

Test-1-3000 was trained with a different objective philosophy:

### Narrative Intelligence First

Instead of optimizing for broad factual memorization, the model focuses on:

  • temporal continuity,
  • event causality,
  • emotional logic,
  • and narrative closure.

This creates a surprisingly stable storytelling engine capable of generating coherent multi-paragraph narratives with strong thematic flow.


## Model Architecture

Test-1-3000 follows a modern efficient Transformer design optimized for both:

  • training stability,
  • and inference throughput.

The architecture borrows heavily from the proven Llama design philosophy while remaining lightweight enough for experimentation and rapid iteration.


### Technical Specifications

| Feature | Specification |
|---|---|
| Model Type | Decoder-only Transformer |
| Hidden Dimension | 768 |
| Layers (Depth) | 12 |
| Attention Heads | 12 |
| Intermediate Size | 3072 |
| Activation Function | SwiGLU |
| Normalization | RMSNorm |
| Vocabulary Size | 50,257 |
| Tokenizer | GPT-2 tokenizer |
| Context Window | 2048 tokens |
| Precision | bfloat16 |
| Attention Backend | Flash Attention 2 |
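For readers who want to reproduce the shape of the network, here is a minimal sketch of these specifications expressed as a Hugging Face `LlamaConfig`. The exact configuration shipped with the checkpoint may differ; `rms_norm_eps` and the untied embeddings are assumptions (untied input/output embedding matrices are what bring the parameter count to roughly 190M).

```python
# Sketch only: the table above mapped onto a LlamaConfig.
# rms_norm_eps and tie_word_embeddings are assumptions, not published values.
from transformers import LlamaConfig

config = LlamaConfig(
    hidden_size=768,               # hidden dimension
    num_hidden_layers=12,          # depth
    num_attention_heads=12,        # attention heads
    intermediate_size=3072,        # SwiGLU feed-forward width
    hidden_act="silu",             # SiLU gate, i.e. a SwiGLU MLP
    vocab_size=50257,              # GPT-2 tokenizer vocabulary
    max_position_embeddings=2048,  # context window (RoPE positions)
    rms_norm_eps=1e-5,             # RMSNorm epsilon (assumed)
    tie_word_embeddings=False,     # untied embeddings: total lands near 190M
)
```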

## Positional Understanding with RoPE

Test-1-3000 uses Rotary Positional Embeddings (RoPE) to maintain precise token relationship awareness throughout long contexts.

This allows the model to:

  • track entities across paragraphs,
  • preserve story continuity,
  • maintain dialogue references,
  • and understand long-range dependencies efficiently.

For a model of this scale, the 2048-token context window provides unusually strong narrative memory.
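For intuition, here is a minimal NumPy sketch of the rotary mechanism (illustrative only, not the model's internal implementation): each even/odd pair of dimensions in a query or key vector is rotated by a position-dependent angle, which makes attention scores a function of the relative distance between tokens rather than their absolute positions.

```python
import numpy as np

def rope(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Rotate vector x (even length) by angles that grow with position."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)  # one frequency per dim pair
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos - x[1::2] * sin  # 2-D rotation of each pair
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out

# The score between rotated q and k depends only on their relative offset:
q, k = np.random.randn(64), np.random.randn(64)
near = rope(q, 10) @ rope(k, 7)      # positions 10 and 7
far = rope(q, 1010) @ rope(k, 1007)  # same offset, 1000 tokens later
assert np.isclose(near, far)         # identical score: relative encoding
```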


## The Evolution of Learning

Training Test-1-3000 revealed clear emergent phases of cognitive development.

The model did not merely memorize text patterns; it progressively developed more sophisticated representations of narrative structure and world dynamics.


### The Lexical Phase (Steps 0 → 250)

At the beginning of training, the model learned the statistical foundations of language.

It discovered:

  • common sentence structures,
  • punctuation behavior,
  • frequent vocabulary patterns,
  • and story-opening syntax.

During this phase, phrases such as:

"Once upon a time"

became strong narrative anchors.

The model began constructing basic grammatical fluency but still lacked deeper logical understanding.

#### Characteristics

  • High repetition
  • Weak memory
  • Poor event continuity
  • Basic syntax acquisition

### The Relational Phase (Steps 250 → 1000)

The model started connecting concepts together into meaningful relationships.

It learned:

  • object interactions,
  • spatial reasoning,
  • basic causality,
  • and action consistency.

For example:

  • parks imply trees and playing,
  • rain implies umbrellas or wetness,
  • sadness often precedes comfort or resolution.

The training loss rapidly decreased below 1.5, signaling major improvements in structural reasoning.

#### Emergent Behaviors

  • Scene consistency
  • Character-action alignment
  • Basic emotional logic
  • Improved descriptive continuity

### The Coherence Phase (Steps 1000 → 2000)

This phase marked the emergence of true narrative stabilization.

The model learned:

  • story pacing,
  • setup/payoff relationships,
  • conflict resolution,
  • and multi-sentence thematic continuity.

Stories no longer collapsed into unrelated fragments.

Instead, the model began maintaining:

  • stable goals,
  • emotional arcs,
  • and logical conclusions.

If a story introduced a problem:

"Lily was lonely."

the model increasingly learned to produce meaningful emotional resolutions later in the text.

#### Major Improvements

  • Long-range memory
  • Reduced contradiction
  • Better endings
  • Stronger narrative flow
  • Lower hallucination frequency

Loss at the end of this phase:

| Step | Loss |
|---|---|
| 2000 | 1.27 |

### The Emergent Narrative Intelligence Phase (Steps 2000 → 3000)

This final stage represented a major leap in generative sophistication.

Rather than simply maintaining coherence, the model began exhibiting signs of:

  • implicit world modeling,
  • narrative anticipation,
  • emotional persistence,
  • and latent planning behavior.

The model increasingly understood that stories possess:

  • momentum,
  • consequences,
  • emotional gravity,
  • and thematic closure.

Characters began behaving more consistently across long contexts, and events introduced early in a story more reliably shaped the text generated later.

The model also became significantly better at:

  • avoiding repetitive loops,
  • maintaining tone,
  • preserving narrative identity,
  • and generating cleaner transitions between scenes.

#### Emergent Capabilities

  • Multi-event causal chaining
  • Persistent emotional tone
  • Improved dialogue continuity
  • Better conflict resolution
  • Reduced topic drift
  • More natural pacing
  • Stronger thematic stability

Most importantly:

The model began generating stories that feel intentionally written rather than statistically assembled.


## Final Training Statistics

| Metric | Value |
|---|---|
| Final Step | 3000 |
| Final Loss | 0.8516 |
| Training Stability | Excellent |
| Gradient Behavior | Stable |
| Divergence Events | None observed |

## Training Configuration

### Hyperparameters

| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Betas | β₁ = 0.9, β₂ = 0.95 |
| Learning Rate | 5e-4 |
| Scheduler | OneCycleLR |
| Weight Decay | 0.01 |
| Precision | bfloat16 |
| Compilation | torch.compile |
| Attention Optimization | Flash Attention 2 |
| Effective Batch Size | ~262,144 tokens/step |
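Below is a sketch of what a PyTorch training loop matching this table could look like. It is a reconstruction under stated assumptions, not the project's published script: the batch geometry (128 sequences of 2048 tokens = 262,144 tokens per step) is inferred from the table, random token IDs stand in for real data, and enabling the listed Flash Attention 2 backend would additionally require the `flash-attn` package.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Assumed architecture; see the config sketch in the specifications section.
config = LlamaConfig(hidden_size=768, num_hidden_layers=12,
                     num_attention_heads=12, intermediate_size=3072,
                     vocab_size=50257, max_position_embeddings=2048)
model = LlamaForCausalLM(config).cuda()
model = torch.compile(model)  # graph compilation, as listed in the table

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4,
                              betas=(0.9, 0.95), weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=5e-4, total_steps=3000)  # one cycle over 3000 steps

# Assumed batch geometry: 128 sequences x 2048 tokens = 262,144 tokens/step.
for step in range(3000):
    # Random token IDs stand in for tokenized TinyStories batches.
    batch = torch.randint(0, config.vocab_size, (128, 2048), device="cuda")
    with torch.autocast("cuda", dtype=torch.bfloat16):  # bf16 mixed precision
        loss = model(input_ids=batch, labels=batch).loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad(set_to_none=True)
```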

### Dataset

**TinyStories (2M)**

Test-1-3000 was trained on the TinyStories dataset.

TinyStories is uniquely valuable because it isolates:

  • narrative structure,
  • reasoning,
  • consistency,
  • and causality

without the overwhelming informational noise of the open web.

The stories combine child-level vocabulary with professionally structured narrative composition.

This creates an ideal environment for studying emergent reasoning inside small language models.
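The card does not name an exact snapshot beyond "TinyStories (2M)". As a hedged illustration, loading the public TinyStories release from the Hub and tokenizing it with the GPT-2 tokenizer might look like this (the `roneneldan/TinyStories` dataset ID is an assumption about which release was used):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Public TinyStories release on the Hub; the exact snapshot used for
# Test-1-3000 is not specified, so this ID is an assumption.
dataset = load_dataset("roneneldan/TinyStories", split="train")
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # 50,257-token vocabulary

def tokenize(batch):
    # Truncate each story to the model's 2048-token context window.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
print(tokenized[0]["input_ids"][:10])
```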


## Training Philosophy

The project intentionally prioritizes:

  • coherence over memorization,
  • reasoning over factual retrieval,
  • and narrative intelligence over benchmark chasing.

The goal is not merely to create a chatbot.

The goal is to study:

how structured cognition emerges inside compact neural systems.


## Usage: Quick Start

Install dependencies:

```bash
pip install transformers torch accelerate
```

### Inference Example

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "GODELEV/Test-1-3000"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Prompt
prompt = "Once upon a time, Tom found a blue car."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate. The GPT-2 tokenizer has no pad token, so EOS is reused for padding.
output = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

### Recommended Generation Settings

| Parameter | Recommended |
|---|---|
| Temperature | 0.7 |
| Top-p | 0.9 |
| Repetition Penalty | 1.1 |
| Max New Tokens | 128–512 |
| Sampling | Enabled |
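For convenience, the same settings can also be applied through the `transformers` pipeline API; a minimal sketch:

```python
from transformers import pipeline

# Text-generation pipeline with the recommended sampling settings.
generator = pipeline("text-generation", model="GODELEV/Test-1-3000")

story = generator(
    "Once upon a time, Tom found a blue car.",
    max_new_tokens=256,      # within the recommended 128-512 range
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
)[0]["generated_text"]

print(story)
```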

## Observed Emergent Behaviors

During evaluation, the model demonstrated:

  • Character persistence
  • Goal-oriented progression
  • Emotional continuity
  • Environmental consistency
  • Contextual callbacks
  • Story resolution awareness

These behaviors are especially notable given the model's relatively small parameter count.


## Limitations

Although highly capable for its size, Test-1-3000 still has limitations:

  • Limited factual world knowledge
  • Occasional repetition in very long generations
  • Reduced reasoning performance outside storytelling domains
  • Less stable beyond trained narrative styles

The model is optimized specifically for:

coherent short-form storytelling.




## 📜 Citation

```bibtex
@misc{test13000,
  title={Test-1-3000: A 190M Parameter Narrative Intelligence Engine},
  author={GODELEV},
  year={2026},
  note={Compact narrative-focused language model trained on TinyStories}
}
```

## License

This project is intended for:

  • research,
  • experimentation,
  • educational use,
  • and open exploration of compact language models.

## Final Thoughts

Test-1-3000 demonstrates that meaningful narrative intelligence can emerge inside surprisingly small neural systems when training is focused, clean, and structurally optimized.

At only 190M parameters, the model exhibits behaviors often associated with significantly larger systems:

  • narrative planning,
  • emotional continuity,
  • causal consistency,
  • and coherent resolution generation.

The project serves as both:

  • a practical storytelling model,
  • and an experiment in emergent cognition within compact architectures.

> "Small models are not weak models.
> They are compressed intelligence waiting to emerge."
