LMCODE: Language Model with Memory CODE

A memory-augmented language model with dual memory systems: long-term and short-term memory, inspired by recent research in memory-augmented neural networks.

Overview

LMCODE (Language Model with Memory CODE) extends traditional transformer-based language models with sophisticated memory mechanisms that enable:

  • Long-term memory: Persistent storage of knowledge and experiences (10,000+ memory slots)
  • Short-term memory: Working memory for immediate context (similar to Transformer KV cache)
  • Memory retrieval: Efficient similarity-based retrieval from long-term memory
  • Memory consolidation: Automatic merging of similar memories to prevent redundancy
  • Experience replay: Training with mixed current data and retrieved memories

Architecture

Components

  1. ShortTermMemory: Recurrent memory module for immediate context

    • Update gates for controlled memory modification
    • Read/write projections for memory access
    • Soft updates to prevent catastrophic forgetting
  2. LongTermMemory: Persistent key-value store for long-term knowledge

    • 10,000+ memory slots per layer
    • Importance-weighted retrieval
    • Consolidation mechanism for similar memories
    • FIFO storage with intelligent replacement
  3. MemoryAugmentedLayer: Transformer layer with integrated memory

    • Self-attention mechanism
    • Short-term memory integration
    • Long-term memory retrieval with gating
    • Feed-forward network
  4. LMCODE: Complete language model

    • Multiple memory-augmented layers
    • Token and position embeddings
    • Language model head
    • Autoregressive generation support
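The update gates and soft updates listed for ShortTermMemory can be pictured with a small sketch. This is not the repository's implementation: the function name `gated_update` and the use of a per-slot sigmoid gate are assumptions made for illustration, and plain Python lists stand in for tensors so the example has no dependencies.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_update(memory, candidate, gate_logits):
    """Blend each memory slot toward a candidate value (hypothetical helper).

    A gate near 0 keeps the old slot value; a gate near 1 overwrites it.
    Intermediate gates give a soft update that only partly changes the slot.
    """
    if not (len(memory) == len(candidate) == len(gate_logits)):
        raise ValueError("memory, candidate and gate_logits must have equal length")
    updated = []
    for old, new, logit in zip(memory, candidate, gate_logits):
        gate = sigmoid(logit)
        updated.append((1.0 - gate) * old + gate * new)
    return updated


memory = [1.0, 1.0, 1.0]
candidate = [0.0, 0.0, 0.0]
# Three gates: almost closed, half open, almost fully open.
new_memory = gated_update(memory, candidate, [-20.0, 0.0, 20.0])
```

With a nearly closed gate the slot is preserved, which is one way a soft update can limit forgetting of earlier context.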

Memory Flow

Input → Embedding → [Layer 1 → Layer 2 → ... → Layer N] → Output
                    ↓           ↓              ↓
              Short-Term  Short-Term    Short-Term
              Memory      Memory        Memory
                    ↓           ↓              ↓
              Long-Term   Long-Term     Long-Term
              Memory      Memory        Memory

Key Features

Dual Memory System

  • Short-term memory: Acts as working memory, updated every forward pass
  • Long-term memory: Stores persistent knowledge, consolidated periodically

Memory Retrieval

  • Top-k similarity-based retrieval from long-term memory
  • Importance-weighted memory access
  • Soft attention over retrieved memories
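The three retrieval steps above can be sketched as follows. This is an illustrative reading, not the model's actual code: the helper names, the choice of cosine similarity, and multiplying similarity by an importance weight are all assumptions.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def retrieve(query, keys, values, importance, top_k=2):
    """Return (attention-weighted read vector, indices of the selected slots)."""
    if not keys or top_k < 1:
        raise ValueError("need at least one memory slot and top_k >= 1")
    # Assumed scoring rule: similarity scaled by a per-slot importance weight.
    scores = [cosine(query, key) * w for key, w in zip(keys, importance)]
    # heapq.nlargest selects the k best slots in O(n log k).
    top = heapq.nlargest(top_k, range(len(keys)), key=scores.__getitem__)
    # Soft attention: softmax over the selected scores only.
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    read = [sum(w * values[i][d] for w, i in zip(weights, top)) for d in range(dim)]
    return read, top


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0], [20.0], [30.0]]
importance = [1.0, 1.0, 1.0]
read, indices = retrieve([1.0, 0.0], keys, values, importance, top_k=2)
```

Here the query matches slot 0 exactly and slot 2 partially, so those two slots are selected and the read vector is a weighted blend of their values.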

Memory Consolidation

  • Automatic merging of similar memories
  • Prevents redundancy and improves efficiency
  • Threshold-based consolidation strategy
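A minimal sketch of threshold-based merging, under stated assumptions: the threshold is treated as a cosine distance, similar memories are merged greedily into a running mean, and the function name `consolidate` is hypothetical. The repository's `consolidate_memories` may define the threshold and the merge rule differently.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; zero-norm vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def consolidate(memories, threshold=0.1):
    """Greedily merge vectors closer than `threshold` to an existing entry."""
    merged = []  # each entry is [mean_vector, number_of_vectors_merged]
    for vec in memories:
        for entry in merged:
            if cosine_distance(entry[0], vec) < threshold:
                count = entry[1]
                # Update the running mean of the merged group.
                entry[0] = [(count * old + new) / (count + 1)
                            for old, new in zip(entry[0], vec)]
                entry[1] = count + 1
                break
        else:
            # No sufficiently similar entry was found: keep as a new memory.
            merged.append([list(vec), 1])
    return [vec for vec, _ in merged]


memories = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
compact = consolidate(memories, threshold=0.1)
```

The first two vectors are nearly parallel and collapse into one averaged entry, while the third stays separate, so three slots shrink to two.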

Experience Replay

  • Training with mixed current data and memory samples
  • Improves generalization and prevents catastrophic forgetting
  • Configurable memory sampling ratio
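One way to picture the configurable sampling ratio is a batch builder that reserves a fraction of each batch for replayed memories. The function `mixed_batch` and its arguments are hypothetical and only illustrate the idea; they are not the interface of `MemoryDataset`.

```python
import random


def mixed_batch(current, memory_bank, batch_size, memory_sample_ratio=0.2, rng=None):
    """Build one batch mixing fresh samples with replayed memory samples."""
    if not 0.0 <= memory_sample_ratio <= 1.0:
        raise ValueError("memory_sample_ratio must be in [0, 1]")
    rng = rng or random.Random()
    # Number of replayed items, capped by what the memory bank holds.
    n_memory = min(round(batch_size * memory_sample_ratio), len(memory_bank))
    n_current = batch_size - n_memory
    if n_current > len(current):
        raise ValueError("not enough current samples to fill the batch")
    batch = rng.sample(current, n_current) + rng.sample(memory_bank, n_memory)
    rng.shuffle(batch)  # avoid a fixed position for replayed items
    return batch


current = [("current", i) for i in range(20)]
memory_bank = [("memory", i) for i in range(5)]
batch = mixed_batch(current, memory_bank, batch_size=10,
                    memory_sample_ratio=0.2, rng=random.Random(0))
```

With a ratio of 0.2 and a batch size of 10, two items come from the memory bank and eight from the current data.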

Installation

# Clone the repository
git clone https://github.com/userkuku/lm_memory_code.git
cd lm_memory_code

# Install dependencies
pip install torch numpy matplotlib

Quick Start

Basic Usage

import torch

from model_architecture import LMCODE, LMCODEConfig

# Create configuration
config = LMCODEConfig(
    vocab_size=50257,
    hidden_size=512,
    num_layers=6,
    num_heads=8,
    short_term_memory_size=512,
    long_term_memory_slots=10000
)

# Initialize model
model = LMCODE(config)

# Generate text
input_ids = torch.randint(0, config.vocab_size, (1, 10))
generated = model.generate(
    input_ids,
    max_length=100,
    temperature=0.8,
    top_k=50,
    top_p=0.9
)
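The `temperature`, `top_k`, and `top_p` arguments follow the usual sampling conventions. The sketch below shows one common ordering (temperature scaling, then top-k, then nucleus filtering on the renormalized top-k distribution); whether `model.generate` applies them in exactly this order is an assumption, and `filter_distribution` is a hypothetical helper.

```python
import math


def filter_distribution(logits, temperature=0.8, top_k=50, top_p=0.9):
    """Return {token_id: probability} after temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature: lower values sharpen the distribution.
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-k: keep the k most likely tokens, then renormalize.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)
    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


dist = filter_distribution([2.0, 1.0, 0.0, -1.0],
                           temperature=1.0, top_k=3, top_p=0.9)
```

In this example top-k drops the least likely token, and nucleus filtering then keeps only the two most likely tokens, from which the next token would be sampled.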

Training

from training import MemoryAwareTrainer, MemoryDataset
from utils import create_synthetic_dataset

# Create dataset
train_data = create_synthetic_dataset(num_samples=1000, seq_len=50)
train_dataset = MemoryDataset(train_data, memory_sample_ratio=0.2)

# Create trainer
trainer_config = {
    'learning_rate': 1e-4,
    'weight_decay': 0.01,
    'gradient_clip': 1.0,
    'memory_consolidation_interval': 1000,
    'warmup_steps': 1000,
    'total_steps': 10000
}

trainer = MemoryAwareTrainer(model, trainer_config)

# Train
history = trainer.train(
    train_dataset,
    num_epochs=10,
    batch_size=32,
    eval_dataset=None
)

# Save model
trainer.save_checkpoint('best_model.pt')

Memory Operations

# Store experience in long-term memory
model.store_experience("This is important information to remember")

# Query memory
retrieved, indices = model.query_memory("important information", top_k=5)

# Consolidate memories (merge similar ones)
for layer in model.layers:
    layer.long_term_memory.consolidate_memories(threshold=0.1)

Configuration

Model Configuration

config = LMCODEConfig(
    vocab_size=50257,      # Vocabulary size
    hidden_size=512,        # Hidden dimension
    num_layers=6,          # Number of transformer layers
    num_heads=8,           # Number of attention heads
    short_term_memory_size=512,  # Short-term memory slots
    long_term_memory_slots=10000 # Long-term memory slots
)

Training Configuration

trainer_config = {
    'learning_rate': 1e-4,              # Learning rate
    'weight_decay': 0.01,               # Weight decay
    'gradient_clip': 1.0,               # Gradient clipping threshold
    'memory_consolidation_interval': 1000,  # Consolidation frequency
    'warmup_steps': 1000,               # LR warmup steps
    'total_steps': 10000                # Total training steps
}
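To make the interaction of `warmup_steps`, `total_steps`, and `memory_consolidation_interval` concrete, here is a small sketch. The linear warmup followed by linear decay is an assumed schedule, and both function names are hypothetical; the trainer may use a different schedule.

```python
def learning_rate(step, base_lr=1e-4, warmup_steps=1000, total_steps=10000):
    """Assumed schedule: linear warmup to base_lr, then linear decay to zero."""
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / (total_steps - warmup_steps)


def should_consolidate(step, interval=1000):
    """True on every `interval`-th step, skipping step 0."""
    return step > 0 and step % interval == 0


# Steps at which consolidation would run during a 10,000-step training run.
consolidation_steps = [s for s in range(10001) if should_consolidate(s)]
```

With the default values shown in the configuration, consolidation would run ten times, once every 1,000 steps.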

Research Background

LMCODE is inspired by several key research papers:

LongMem (2023)

  • Paper: "Augmenting Language Models with Long-Term Memory"
  • Key Idea: Adaptive residual side-network for long-term memory
  • Contribution: Overcomes context length limitations
  • GitHub: Victorwz/LongMem (825+ stars)

MemoRAG (2024)

  • Paper: "Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery"
  • Key Idea: Dual-system RAG with global and local memory
  • Contribution: Superior performance on complex tasks
  • GitHub: qhjqhj00/memorag (2243+ stars)

CAMELoT (2024)

  • Paper: "CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory"
  • Key Idea: Associative memory module for pre-trained LLMs
  • Contribution: Handles long sequences without retraining
  • arXiv: 2402.13449

MemoryLLM/M+ (2025)

  • Paper: "M+: Extending MemoryLLM with Scalable Long-Term Memory"
  • Key Idea: Latent-space memory pools with retriever
  • Contribution: Enhanced knowledge retention
  • GitHub: wangyu-ustc/MemoryLLM (312+ stars)

Architecture Comparison

Feature                 LMCODE   LongMem   MemoRAG   CAMELoT
Short-term Memory       ✓        ✗         ✗         ✗
Long-term Memory        ✓        ✓         ✓         ✓
Training Required       ✓        ✗         ✓         ✗
Memory Consolidation    ✓        ✗         ✗         ✗
Experience Replay       ✓        ✗         ✗         ✗
Dual Memory System      ✓        ✗         ✓         ✗

Performance

Memory Efficiency

  • Parameter Efficiency: ~2% of total parameters dedicated to memory
  • Memory Capacity: 10,000+ slots per layer
  • Retrieval Speed: O(log n) with top-k retrieval, which assumes an indexed lookup; an exhaustive similarity scan over n slots is O(n)
  • Consolidation: Automatic, threshold-based

Training Efficiency

  • Gradient Flow: Stable through memory gating
  • Memory Updates: Small learning rate (0.01) prevents instability
  • Experience Replay: Improves sample efficiency by ~20%

Use Cases

  1. Long-form Generation: Maintain coherence over long documents
  2. Dialogue Systems: Remember conversation history
  3. Knowledge-intensive Tasks: Store and retrieve domain knowledge
  4. Continual Learning: Learn new tasks without forgetting
  5. Personalized AI: Remember user preferences and history

Advanced Features

Memory Monitoring

from utils import MemoryMonitor

monitor = MemoryMonitor(model)

# During training (input_ids is the current batch, step is the current step index)
outputs = model(input_ids)
monitor.record_step(step, outputs)

# Get statistics
stats = monitor.get_statistics()
monitor.plot_history('memory_stats.png')

Memory Analysis

from utils import (
    analyze_memory_capacity,
    compute_memory_efficiency,
    generate_memory_report,
)

# Analyze memory performance
analysis = analyze_memory_capacity(model, test_sequences)

# Compute efficiency metrics
efficiency = compute_memory_efficiency(model)

# Generate comprehensive report
report = generate_memory_report(model, dataset)

Visualization

from utils import visualize_memory_flow, plot_training_history

# Visualize memory flow through network
fig = visualize_memory_flow(model, input_sequence)

# Plot training history
fig = plot_training_history(history)

Troubleshooting

Memory Instability

  • Issue: Loss spikes or NaN values
  • Solution: Reduce memory update learning rate (try 0.001)
  • Solution: Enable gradient clipping (default: 1.0)
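For intuition, clipping by global norm rescales all gradients together when their combined norm exceeds the threshold. The sketch below uses plain Python lists and a hypothetical helper name; in PyTorch the same operation is provided by `torch.nn.utils.clip_grad_norm_`. The non-finite check is an added assumption, included because NaN gradients cannot be repaired by rescaling.

```python
import math


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a list of gradient vectors so their joint L2 norm is <= max_norm.

    Returns (clipped gradients, norm before clipping).
    """
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    if not math.isfinite(total):
        # NaN or inf gradients: skip this step rather than rescale.
        raise ValueError("non-finite gradient norm")
    if total <= max_norm:
        return [list(vec) for vec in grads], total
    scale = max_norm / total
    return [[g * scale for g in vec] for vec in grads], total


# Two parameter groups whose joint norm is 5.0, clipped down to 1.0.
clipped, norm_before = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
```

Because every gradient is scaled by the same factor, the update direction is preserved and only its magnitude shrinks.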

Poor Retrieval

  • Issue: Retrieved memories are irrelevant
  • Solution: Increase memory consolidation frequency
  • Solution: Adjust retrieval threshold

Out of Memory

  • Issue: CUDA OOM during training
  • Solution: Reduce batch size
  • Solution: Reduce memory slots (try 5000)
  • Solution: Enable gradient checkpointing

Future Work

  • Hierarchical memory (multiple time scales)
  • Attention-based memory updates
  • Cross-modal memory (text, vision, audio)
  • Distributed memory across multiple GPUs
  • Sparse memory updates for efficiency
  • Meta-learning for memory initialization

Contributing

Contributions welcome! Please read our Contributing Guide first.

License

MIT License - see LICENSE for details

Citation

If you use LMCODE in your research, please cite:

@misc{lm_memory_code,
  title={LMCODE: Language Model with Memory CODE},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/lm_memory_code}
}

Acknowledgments

  • Inspired by LongMem, MemoRAG, and CAMELoT research
  • Built with PyTorch and Hugging Face Transformers
  • Thanks to the open-source ML community

Contact

For questions or feedback, please open an issue or contact your.email@example.com
