
# Test-1-3000: A 190M Parameter Narrative Intelligence Engine



## Overview

Test-1-3000 is a compact yet remarkably capable decoder-only Transformer language model built upon the modern Llama architecture.

The project explores an important question in language model research:

How much narrative reasoning, coherence, and world understanding can emerge inside a small model when trained correctly?

Despite containing only 190.55 million parameters, Test-1-3000 demonstrates surprisingly advanced:

  • Narrative continuity
  • Character persistence
  • Long-range memory consistency
  • Emotional progression
  • Logical event sequencing
  • Contextual storytelling stability

The model was trained specifically for short-form narrative intelligence, focusing on coherent storytelling rather than broad internet-scale memorization.

Unlike many small models that generate fragmented or repetitive text, Test-1-3000 learns to maintain:

  • causal relationships,
  • stable story worlds,
  • emotional trajectories,
  • and meaningful resolutions across long contexts.

## Key Highlights

| Feature | Description |
|---|---|
| Architecture | Llama-based decoder-only Transformer |
| Parameters | 190.55 million |
| Context Length | 2048 tokens |
| Final Training Step | 3000 |
| Final Training Loss | 0.8516 |
| Attention Optimization | Flash Attention 2 |
| Compilation | torch.compile |
| Precision | bfloat16 mixed precision |
| Positional Encoding | Rotary Positional Embeddings (RoPE) |

## What Makes Test-1-3000 Special?

Most compact language models struggle with:

  • maintaining consistency,
  • remembering earlier events,
  • resolving story arcs,
  • and avoiding repetition.

Test-1-3000 was trained with a different objective philosophy:

### Narrative Intelligence First

Instead of optimizing for broad factual memorization, the model focuses on:

  • temporal continuity,
  • event causality,
  • emotional logic,
  • and narrative closure.

This creates a surprisingly stable storytelling engine capable of generating coherent multi-paragraph narratives with strong thematic flow.


## Model Architecture

Test-1-3000 follows a modern efficient Transformer design optimized for both:

  • training stability,
  • and inference throughput.

The architecture borrows heavily from the proven Llama design philosophy while remaining lightweight enough for experimentation and rapid iteration.


### Technical Specifications

| Feature | Specification |
|---|---|
| Model Type | Decoder-only Transformer |
| Hidden Dimension | 768 |
| Layers (Depth) | 12 |
| Attention Heads | 12 |
| Intermediate Size | 3072 |
| Activation Function | SwiGLU |
| Normalization | RMSNorm |
| Vocabulary Size | 50,257 |
| Tokenizer | GPT-2 tokenizer |
| Context Window | 2048 tokens |
| Precision | bfloat16 |
| Attention Backend | Flash Attention 2 |
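For readers who want to reproduce the shape of the network, here is a minimal sketch of these specifications expressed as a Hugging Face `LlamaConfig`. The exact configuration shipped with the checkpoint may differ; `rms_norm_eps` and the untied embeddings are assumptions (untied input/output embedding matrices are what bring the parameter count to roughly 190M).

```python
# Sketch only: the table above mapped onto a LlamaConfig.
# rms_norm_eps and tie_word_embeddings are assumptions, not published values.
from transformers import LlamaConfig

config = LlamaConfig(
    hidden_size=768,               # hidden dimension
    num_hidden_layers=12,          # depth
    num_attention_heads=12,        # attention heads
    intermediate_size=3072,        # SwiGLU feed-forward width
    hidden_act="silu",             # SiLU gate, i.e. a SwiGLU MLP
    vocab_size=50257,              # GPT-2 tokenizer vocabulary
    max_position_embeddings=2048,  # context window (RoPE positions)
    rms_norm_eps=1e-5,             # RMSNorm epsilon (assumed)
    tie_word_embeddings=False,     # untied embeddings: total lands near 190M
)
```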

## Positional Understanding with RoPE

Test-1-3000 uses Rotary Positional Embeddings (RoPE) to maintain precise token relationship awareness throughout long contexts.

This allows the model to:

  • track entities across paragraphs,
  • preserve story continuity,
  • maintain dialogue references,
  • and understand long-range dependencies efficiently.

For a model of this scale, the 2048-token context window provides unusually strong narrative memory.
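For intuition, here is a minimal NumPy sketch of the rotary mechanism (illustrative only, not the model's internal implementation): each even/odd pair of dimensions in a query or key vector is rotated by a position-dependent angle, which makes attention scores a function of the relative distance between tokens rather than their absolute positions.

```python
import numpy as np

def rope(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Rotate vector x (even length) by angles that grow with position."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)  # one frequency per dim pair
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos - x[1::2] * sin  # 2-D rotation of each pair
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out

# The score between rotated q and k depends only on their relative offset:
q, k = np.random.randn(64), np.random.randn(64)
near = rope(q, 10) @ rope(k, 7)      # positions 10 and 7
far = rope(q, 1010) @ rope(k, 1007)  # same offset, 1000 tokens later
assert np.isclose(near, far)         # identical score: relative encoding
```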


## The Evolution of Learning

Training Test-1-3000 revealed clear emergent phases of cognitive development.

The model did not merely memorize text patterns; it progressively developed more sophisticated representations of narrative structure and world dynamics.


### The Lexical Phase (Steps 0 → 250)

At the beginning of training, the model learned the statistical foundations of language.

It discovered:

  • common sentence structures,
  • punctuation behavior,
  • frequent vocabulary patterns,
  • and story-opening syntax.

During this phase, phrases such as:

"Once upon a time"

became strong narrative anchors.

The model began constructing basic grammatical fluency but still lacked deeper logical understanding.

#### Characteristics

  • High repetition
  • Weak memory
  • Poor event continuity
  • Basic syntax acquisition

### The Relational Phase (Steps 250 → 1000)

The model started connecting concepts together into meaningful relationships.

It learned:

  • object interactions,
  • spatial reasoning,
  • basic causality,
  • and action consistency.

For example:

  • parks imply trees and playing,
  • rain implies umbrellas or wetness,
  • sadness often precedes comfort or resolution.

The training loss rapidly decreased below 1.5, signaling major improvements in structural reasoning.

#### Emergent Behaviors

  • Scene consistency
  • Character-action alignment
  • Basic emotional logic
  • Improved descriptive continuity

### The Coherence Phase (Steps 1000 → 2000)

This phase marked the emergence of true narrative stabilization.

The model learned:

  • story pacing,
  • setup/payoff relationships,
  • conflict resolution,
  • and multi-sentence thematic continuity.

Stories no longer collapsed into unrelated fragments.

Instead, the model began maintaining:

  • stable goals,
  • emotional arcs,
  • and logical conclusions.

If a story introduced a problem:

"Lily was lonely."

the model increasingly learned to produce meaningful emotional resolutions later in the text.

#### Major Improvements

  • Long-range memory
  • Reduced contradiction
  • Better endings
  • Stronger narrative flow
  • Lower hallucination frequency

Loss at the end of this phase:

| Step | Loss |
|---|---|
| 2000 | 1.27 |

### The Emergent Narrative Intelligence Phase (Steps 2000 → 3000)

This final stage represented a major leap in generative sophistication.

Rather than simply maintaining coherence, the model began exhibiting signs of:

  • implicit world modeling,
  • narrative anticipation,
  • emotional persistence,
  • and latent planning behavior.

The model increasingly understood that stories possess:

  • momentum,
  • consequences,
  • emotional gravity,
  • and thematic closure.

Characters began behaving more consistently across long contexts, and events introduced early in a story more reliably shaped the text generated later.

The model also became significantly better at:

  • avoiding repetitive loops,
  • maintaining tone,
  • preserving narrative identity,
  • and generating cleaner transitions between scenes.

#### Emergent Capabilities

  • Multi-event causal chaining
  • Persistent emotional tone
  • Improved dialogue continuity
  • Better conflict resolution
  • Reduced topic drift
  • More natural pacing
  • Stronger thematic stability

Most importantly:

The model began generating stories that feel intentionally written rather than statistically assembled.


## Final Training Statistics

| Metric | Value |
|---|---|
| Final Step | 3000 |
| Final Loss | 0.8516 |
| Training Stability | Excellent |
| Gradient Behavior | Stable |
| Divergence Events | None observed |

## Training Configuration

### Hyperparameters

| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Betas | β₁ = 0.9, β₂ = 0.95 |
| Learning Rate | 5e-4 |
| Scheduler | OneCycleLR |
| Weight Decay | 0.01 |
| Precision | bfloat16 |
| Compilation | torch.compile |
| Attention Optimization | Flash Attention 2 |
| Effective Batch Size | ~262,144 tokens/step |
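Below is a sketch of what a PyTorch training loop matching this table could look like. It is a reconstruction under stated assumptions, not the project's published script: the batch geometry (128 sequences of 2048 tokens = 262,144 tokens per step) is inferred from the table, random token IDs stand in for real data, and enabling the listed Flash Attention 2 backend would additionally require the `flash-attn` package.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Assumed architecture; see the config sketch in the specifications section.
config = LlamaConfig(hidden_size=768, num_hidden_layers=12,
                     num_attention_heads=12, intermediate_size=3072,
                     vocab_size=50257, max_position_embeddings=2048)
model = LlamaForCausalLM(config).cuda()
model = torch.compile(model)  # graph compilation, as listed in the table

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4,
                              betas=(0.9, 0.95), weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=5e-4, total_steps=3000)  # one cycle over 3000 steps

# Assumed batch geometry: 128 sequences x 2048 tokens = 262,144 tokens/step.
for step in range(3000):
    # Random token IDs stand in for tokenized TinyStories batches.
    batch = torch.randint(0, config.vocab_size, (128, 2048), device="cuda")
    with torch.autocast("cuda", dtype=torch.bfloat16):  # bf16 mixed precision
        loss = model(input_ids=batch, labels=batch).loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad(set_to_none=True)
```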

### Dataset

**TinyStories (2M)**

Test-1-3000 was trained on the TinyStories dataset.

TinyStories is uniquely valuable because it isolates:

  • narrative structure,
  • reasoning,
  • consistency,
  • and causality

without the overwhelming informational noise of the open web.

The stories combine child-level vocabulary with professionally structured narrative composition.

This creates an ideal environment for studying emergent reasoning inside small language models.
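The card does not name an exact snapshot beyond "TinyStories (2M)". As a hedged illustration, loading the public TinyStories release from the Hub and tokenizing it with the GPT-2 tokenizer might look like this (the `roneneldan/TinyStories` dataset ID is an assumption about which release was used):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Public TinyStories release on the Hub; the exact snapshot used for
# Test-1-3000 is not specified, so this ID is an assumption.
dataset = load_dataset("roneneldan/TinyStories", split="train")
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # 50,257-token vocabulary

def tokenize(batch):
    # Truncate each story to the model's 2048-token context window.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
print(tokenized[0]["input_ids"][:10])
```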


## Training Philosophy

The project intentionally prioritizes:

  • coherence over memorization,
  • reasoning over factual retrieval,
  • and narrative intelligence over benchmark chasing.

The goal is not merely to create a chatbot.

The goal is to study:

how structured cognition emerges inside compact neural systems.


## Usage: Quick Start

Install dependencies:

```bash
pip install transformers torch accelerate
```

### Inference Example

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "GODELEV/Test-1-3000"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Prompt
prompt = "Once upon a time, Tom found a blue car."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate. The GPT-2 tokenizer has no pad token, so EOS is reused for padding.
output = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

### Recommended Generation Settings

| Parameter | Recommended |
|---|---|
| Temperature | 0.7 |
| Top-p | 0.9 |
| Repetition Penalty | 1.1 |
| Max New Tokens | 128–512 |
| Sampling | Enabled |
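For convenience, the same settings can also be applied through the `transformers` pipeline API; a minimal sketch:

```python
from transformers import pipeline

# Text-generation pipeline with the recommended sampling settings.
generator = pipeline("text-generation", model="GODELEV/Test-1-3000")

story = generator(
    "Once upon a time, Tom found a blue car.",
    max_new_tokens=256,      # within the recommended 128-512 range
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True,
)[0]["generated_text"]

print(story)
```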

## Observed Emergent Behaviors

During evaluation, the model demonstrated:

  • Character persistence
  • Goal-oriented progression
  • Emotional continuity
  • Environmental consistency
  • Contextual callbacks
  • Story resolution awareness

These behaviors are especially notable given the model's relatively small parameter count.


## Limitations

Although highly capable for its size, Test-1-3000 still has limitations:

  • Limited factual world knowledge
  • Occasional repetition in very long generations
  • Reduced reasoning performance outside storytelling domains
  • Less stable beyond trained narrative styles

The model is optimized specifically for:

coherent short-form storytelling.




## 📜 Citation

```bibtex
@misc{test13000,
  title={Test-1-3000: A 190M Parameter Narrative Intelligence Engine},
  author={GODELEV},
  year={2026},
  note={Compact narrative-focused language model trained on TinyStories}
}
```

## License

This project is intended for:

  • research,
  • experimentation,
  • educational use,
  • and open exploration of compact language models.

## Final Thoughts

Test-1-3000 demonstrates that meaningful narrative intelligence can emerge inside surprisingly small neural systems when training is focused, clean, and structurally optimized.

At only 190M parameters, the model exhibits behaviors often associated with significantly larger systems:

  • narrative planning,
  • emotional continuity,
  • causal consistency,
  • and coherent resolution generation.

The project serves as both:

  • a practical storytelling model,
  • and an experiment in emergent cognition within compact architectures.

> "Small models are not weak models.
> They are compressed intelligence waiting to emerge."
