TECHNOLOGY EVOLUTION

A deep dive into our adaptive engine (V2) and our historical foundation (V1).

MODELMANGO V2 (TITANS)

Status: Active | Core: Titans + Nested Learning (TTT) | Focus: Real-Time Adaptation

1. INTRODUCTION: ADAPTIVE INTELLIGENCE

The evolution to V2 marks the transition from a static statistical model to an "Adaptive Intelligence" system. While traditional models (V1) attempt to apply past patterns to a constantly changing present, V2 is designed to learn in real time, treating financial markets not as repetitive series but as dynamic, evolving systems.

ModelMango V2 does not seek "Mean Reversion" (a return to the average); instead, it identifies structural regime changes the moment they happen, adapting to any global asset, from commodities to tech indices.

2. ARCHITECTURE: TITANS & NESTED LEARNING

The technological core of V2 is based on two advanced research paradigms that redefine the concept of memory in neural networks:

  • Titans Architecture (Memory as Context): Unlike classic Transformers that have a limited context window, the Titans architecture introduces a long-term neural memory. This allows the model to "remember" significant historical events (such as liquidity crises or structural rallies) and use them as context for current analysis, without the computational limits of traditional sliding windows.
  • Nested Learning (Test-Time Training - TTT): This is the fundamental innovation. The model is not "frozen" after initial training. Using the Nested Learning paradigm, the system runs a continuous internal training loop during the inference phase (live). For every new market data point received, the model calculates gradients and updates its internal weights instantly.

In summary: the model learns while it operates, adapting to the specific volatility of the asset under examination at that precise historical moment. A minimal sketch of this inner loop is shown below.
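The document does not publish this loop, but the idea can be sketched in a few lines of PyTorch. Everything below (fast_model, the self-supervised objective, inner_lr, and the assumption that the model maps a (time, features) window to the next point) is illustrative, not ModelMango's code:

    # Minimal sketch of a test-time training (TTT) inner loop.
    import torch

    def predict_with_ttt(fast_model, window, new_point, inner_lr=1e-3, steps=1):
        """Adapt the fast weights on the newest observation, then predict."""
        opt = torch.optim.SGD(fast_model.parameters(), lr=inner_lr)
        for _ in range(steps):
            opt.zero_grad()
            pred = fast_model(window)                      # forecast made from history
            loss = torch.nn.functional.huber_loss(pred, new_point)
            loss.backward()                                # gradients w.r.t. the fast weights
            opt.step()                                     # instant update at inference time
        with torch.no_grad():
            extended = torch.cat([window, new_point.unsqueeze(0)])
            return fast_model(extended)                    # prediction from the adapted model

The order of operations is the whole point: the weight update happens before the next forecast, so every prediction comes from a model that has just absorbed the latest data point.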

Scientific References:

  • Behrouz, A., Zhong, P., & Mirrokni, V. "Titans: Learning to Memorize at Test Time." arXiv:2501.00663.
  • Sun, Y., et al. "Learning to (Learn at Test Time): RNNs with Expressive Hidden States." arXiv:2407.04620.

3. THE NEURAL MODEL

The current architecture is built upon the proprietary TransformerTimeSeriesMACModel class, integrating "Meta-Learning" mechanisms directly into the data flow:

  • Continuum Memory System (CMS): The system's brain. Instead of static memory, we use a hierarchy of Functional Memory Modules. These are "stateless" modules operating on different time frequencies (short, medium, long term).
  • Meta-Weights & Fast Weights: The model does not memorize raw prices. It memorizes the synaptic weights that generate the market's transition function. During inference, the system updates its "Fast Weights" instantly to adapt to the current market regime, while "Meta-Weights" (slow weights learned during global training) ensure structural stability.
  • Instance Normalization: To make the model "absolute price agnostic" (working on both $10 and $30,000 assets), each time window is statistically normalized against its own local mean and standard deviation (Z-Score), allowing the network to focus purely on relative dynamics.
  • Attention Pooling: Rather than performing a simple average of historical data, the model uses an Attention Pooling mechanism that assigns a "relevance weight" to each memory fragment, actively filtering market noise from valid signals (this step and Instance Normalization are sketched below).
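
Both steps are standard building blocks; here is a minimal PyTorch sketch (shapes and names are assumptions, not the internals of the proprietary TransformerTimeSeriesMACModel):

    import torch

    def instance_normalize(window, eps=1e-8):
        """Z-score each window against its own local mean/std, so $10 and
        $30,000 assets produce the same relative dynamics."""
        mean = window.mean(dim=0, keepdim=True)            # window: (seq_len, features)
        std = window.std(dim=0, keepdim=True)
        return (window - mean) / (std + eps)

    class AttentionPooling(torch.nn.Module):
        """Learned weighted average over time steps, instead of a flat mean."""
        def __init__(self, d_model):
            super().__init__()
            self.score = torch.nn.Linear(d_model, 1)       # one relevance score per step

        def forward(self, h):                              # h: (seq_len, d_model)
            weights = torch.softmax(self.score(h), dim=0)  # noisy steps get low weight
            return (weights * h).sum(dim=0)                # pooled context: (d_model,)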

4. TRAINING & UNIVERSAL DATASET

V2 training was performed on high-performance NVIDIA GPUs, utilizing a robust loss function (HuberLoss) to minimize the impact of extreme outliers typical of market crashes.
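In PyTorch terms the choice looks like this (the delta value and the example tensors are assumptions; the document only names the loss):

    import torch

    criterion = torch.nn.HuberLoss(delta=1.0)      # quadratic near zero, linear in the tails
    predictions = torch.tensor([0.010, 0.020])     # e.g. normalized next-day returns
    targets = torch.tensor([0.012, -2.000])        # second entry: a crash-day outlier (error > delta)
    loss = criterion(predictions, targets)         # the outlier contributes linearly, not squared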

The data management strategy is rigorous to prevent "Data Leakage":

  • Chronological Split: We do not use random shuffling. For every asset, the dataset is rigidly cut: the oldest 80% for Training, the most recent 20% for Validation. The model never sees the future during training.
  • Extreme Clipping: During normalization, we apply 10-sigma clipping. This prevents "Black Swan" events from destroying gradients during backpropagation, keeping training stable (both rules are sketched below).
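
Both rules fit in a few lines; a NumPy sketch under assumed array layouts:

    import numpy as np

    def chronological_split(series, train_frac=0.80):
        """Oldest 80% for training, newest 20% for validation - never shuffled."""
        cut = int(len(series) * train_frac)
        return series[:cut], series[cut:]

    def normalize_and_clip(window, n_sigma=10.0, eps=1e-8):
        """Z-score a window, then clip to +/-10 sigma so a Black Swan candle
        cannot produce an exploding gradient."""
        z = (window - window.mean()) / (window.std() + eps)
        return np.clip(z, -n_sigma, n_sigma)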
The "Global Market Proxy" Dataset includes:
  • Risk & Momentum: BTC, ETH, SOL, BNB
  • Market Health (Indices): S&P 500, Dow Jones, Nasdaq 100, DAX, Nikkei 225
  • Market Movers (Big Tech): NVIDIA, Apple, Microsoft, Google, Amazon, Tesla, Meta
  • Forex & Macro: EUR/USD, USD/JPY, GBP/USD, AUD/USD, USD/CAD
  • Commodities & Energy: Gold, Silver, Crude Oil
  • Macro Drivers: VIX (Fear Index), DXY (Dollar Index), Treasury Yields (TNX)

5. PERFORMANCE V2

Current performance, measured on the global validation dataset (data never seen by the model during training), shows an exceptional level of predictive accuracy for a non-linear financial system.

  • High Price MAPE: 1.48%
  • Low Price MAPE: 1.20%
  • Close Price MAPE: 1.05%

*MAPE: Mean Absolute Percentage Error. A value of 1.05% on the Close means that, on average, the model's prediction deviates by only about 1% from the actual closing price of the next day.
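
For reference, the metric itself is the standard definition, nothing proprietary:

    import numpy as np

    def mape(actual, predicted):
        """Mean Absolute Percentage Error, in percent."""
        actual, predicted = np.asarray(actual), np.asarray(predicted)
        return 100.0 * np.mean(np.abs((actual - predicted) / actual))

    # mape([100.0, 102.0], [101.0, 101.0]) -> ~0.99 (percent)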

6. SPECIALIZED AGENTS

Instead of using a single monolith for all predictions, the V2 architecture employs specialized agents across distinct time horizons, applicable to any financial asset:

A. AGENT T+1 (Sniper)

  • Horizon: Short Term (24 Hours / Intraday).
  • Function: Extremely reactive. Designed to capture immediate volatility and impulsive breakouts. It excels at recognizing when an asset is starting a strong directional move, ignoring background noise.

B. AGENT T+2 (Ranger)

  • Horizon: Medium Term (48 Hours).
  • Function: Filters intraday noise to identify trend sustainability. It serves to confirm if a move detected by the Sniper has the characteristics to extend beyond the trading day.

7. THE STRATEGY: FUSION V2 (ORACLE)

Artificial intelligence provides probabilities, but the Oracle provides discipline. It is a deterministic decision engine that applies institutional Risk Management rules on top of the neural predictions, ensuring safety in any market (a simplified sketch of this rule stack follows the list):

  • Dynamic Confluence: Market entry is validated only if both Sniper and Ranger agents show directional agreement, drastically reducing false positives.
  • Regime Guard: A primary trend filter that prevents "counter-trend" trading in adverse market conditions. If the underlying trend is bearish, the Oracle imposes a veto on Long positions, protecting capital during market crashes.
  • Volatility Guard: Constantly monitors the asset's volatility. If it exceeds predefined safety thresholds, the system reduces exposure or suspends trading to preserve capital.
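
A minimal sketch of how these rules compose (all thresholds, field names, and the short side are assumptions; the production rules are not published):

    # Simplified sketch of the Oracle's veto logic.
    from dataclasses import dataclass

    @dataclass
    class AgentView:
        direction: int                     # +1 bullish, -1 bearish, 0 neutral

    def oracle_decision(sniper: AgentView, ranger: AgentView,
                        regime_bearish: bool, volatility: float,
                        vol_threshold: float = 0.05) -> str:
        # Volatility Guard: suspend trading above the safety threshold.
        if volatility > vol_threshold:
            return "SUSPEND"
        # Dynamic Confluence: both agents must agree on a direction.
        if sniper.direction == 0 or sniper.direction != ranger.direction:
            return "NO_ENTRY"
        # Regime Guard: veto Longs when the primary trend is bearish.
        if sniper.direction > 0 and regime_bearish:
            return "NO_ENTRY"
        return "LONG_ENTRY" if sniper.direction > 0 else "SHORT_ENTRY"

Each guard is a hard veto, which is what makes the engine deterministic: the neural agents propose, the rule stack disposes.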

8. VALIDATION: THE TIME WALL

To guarantee performance integrity on any asset, we implemented a rigorous validation protocol named Chronological Split.

The dataset is divided by an impassable "Time Wall": the model trains exclusively on the past and is validated on the future, without any data shuffling. This faithfully simulates real-world operations, eliminating the risk of "Data Leakage" (where the model accidentally sees future data) and ensuring that observed performance is replicable in production.

HISTORICAL ARCHIVE (LEGACY)

MODELMANGO V1 (DEPRECATED)

Note: This section describes the previous technology, decommissioned in Q3 2024 due to limitations in handling regime changes (static nature).

1. INTRODUCTION (LEGACY V1)

This document provides a general description of the architecture and operation of MODELMANGO V1, an advanced Artificial Intelligence (AI) system designed to analyze financial time series, generate price predictions (High, Low, Close), and provide strategic trading signals. The system combines a specialized Transformer model with adaptive memory mechanisms to capture complex market dynamics.

2. ARCHITECTURE V1

The system primarily operates through two integrated logical components:

  1. MODELMANGO PREDICTION

    Base AI Prediction Model
    • Responsible for loading historical OHLCV (Open, High, Low, Close, Volume) data for a specific asset.
    • Performs a preprocessing and feature engineering phase, transforming raw data to extract meaningful information.
    • Uses a pre-trained transformer model called TransformerTimeSeriesMACModel to generate High, Low, and Close price predictions for the next day (T+1).
  2. MODELMANGO STRATEGY

    AI Model for direct stock market operations:
    • Loads its specific configuration and parameters.
    • Uses the base HLC predictions generated by the previous model as fundamental input.
    • Loads historical OHLCV data together with the base model's past predictions (the full history of MODELMANGO PREDICTION outputs for the asset since its listing date) to create a richer feature set.
    • Performs even more advanced feature engineering.
    • Prepares data for inference, including a placeholder for day T+1 with the base predictions.
    • Loads the trained model for strategies.
    • Performs inference to obtain:
      • Adjustment delta for the entry price (relative to the predicted Low).
      • Adjustment delta for an implicit exit price (relative to the predicted Close).
      • A volatility prediction specific to the strategy model.
    • Calculates operational levels:
      • Optimized Entry Price.
      • Stop Loss Price (based on predicted volatility and potentially adaptive).
      • Take Profit Price (based on volatility, desired Risk/Reward, and potentially the exit delta).
    • Applies decision logic (based on configurable thresholds for exit delta, volatility, and R/R) to determine the final signal: LONG ENTRY or HOLD/NO ENTRY (this step is sketched after the list).
    • Returns a structured dictionary with the decision and all calculated parameters.
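
The level arithmetic at the end of this pipeline can be sketched as follows; every formula and threshold here is an illustrative assumption, not the production code:

    def strategy_signal(pred_low, pred_close, entry_delta, exit_delta,
                        pred_vol, min_rr=2.0, max_vol=0.08):
        """Turn base HLC predictions plus strategy deltas into levels and a signal."""
        entry = pred_low * (1.0 + entry_delta)           # optimized entry near predicted Low
        stop_loss = entry * (1.0 - pred_vol)             # stop scaled by predicted volatility
        exit_price = pred_close * (1.0 + exit_delta)     # implicit exit near predicted Close
        risk = entry - stop_loss
        rr = (exit_price - entry) / risk if risk > 0 else 0.0
        if exit_delta <= 0 or pred_vol > max_vol or rr < min_rr:
            return {"signal": "HOLD/NO ENTRY", "risk_reward": rr}
        take_profit = entry + min_rr * risk              # target from desired risk/reward
        return {"signal": "LONG ENTRY", "entry": entry, "stop_loss": stop_loss,
                "take_profit": take_profit, "risk_reward": rr}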

3. BASE MODEL V1

The core of the system is a modified Transformer model, specifically designed for financial time series:

  • Transformer Architecture: Leverages the self-attention mechanisms of Transformers, excellent at identifying complex relationships and long-term dependencies in sequential data.
  • Memory as Context (MAC): Implements a memory mechanism inspired by recent work to improve long-term context management and adaptation:
    • Persistent Memory: "Learnable" tokens that maintain general and stable information over time.
    • Memory Module (M): A deep MLP (Multi-Layer Perceptron) that learns to map contextual queries to relevant memory representations (u_C).
    • Online Update: The memory M is updated during inference using a mechanism based on the gradient of the loss between the memory retrieved for a key derived from the current chunk and the value associated with that chunk. This allows the model to quickly adapt to the recent dynamics of the specific asset being examined, even if it was not part of the main training. It uses momentum (eta), gradient intensity (theta), and forgetting (alpha) for stable updates (this rule is sketched after the list).
    • Long-Term Memory: A buffer that accumulates representations of past chunks, queried to retrieve additional context.
  • 1D Depthwise-Separable Convolutions: Applied to query, key, and value projections before attention and memory update. This helps capture local and spatial patterns within features efficiently.
  • Chunking: The input sequence is processed in segments ("chunks") to handle long sequences and allow the integration of the MAC mechanism at each step.
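
The online update described in the list (momentum eta, gradient intensity theta, forgetting alpha) amounts to a momentum gradient step on the retrieval loss; a simplified PyTorch sketch for a single chunk, with all implementation details assumed:

    import torch

    # momentum_state = [torch.zeros_like(p) for p in M.parameters()]
    def update_memory(M, momentum_state, key, value, eta=0.9, theta=0.1, alpha=0.01):
        """One step on ||M(key) - value||^2 with momentum and forgetting."""
        loss = torch.nn.functional.mse_loss(M(key), value)   # "surprise" of this chunk
        grads = torch.autograd.grad(loss, list(M.parameters()))
        with torch.no_grad():
            for p, g, s in zip(M.parameters(), grads, momentum_state):
                s.mul_(eta).add_(g, alpha=-theta)    # s = eta*s - theta*grad
                p.mul_(1.0 - alpha).add_(s)          # p = (1-alpha)*p + s  (forgetting)
        return loss.item()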

4. TRAINING V1

The base TransformerTimeSeriesMACModel was trained on a historical dataset of 5900 trading days of Bitcoin (BTC-USD). This long history on an asset known for its volatility and complex dynamics (it has no underlying asset in the traditional sense) allowed the model to learn fundamental patterns of financial market behavior and thus to generalize.

Training was performed on a MacBook Pro with the following specifications:

  • Processor: 2.9 GHz Quad-Core Intel Core i7
  • RAM: 16 GB 2133 MHz LPDDR3
  • Graphics: Intel HD Graphics 630 1536 MB

Although it is not a high-end GPU workstation but rather a "steam iron" (it is what I have available), this machine is sufficient to train a model of MODELMANGO's size (approximately 4.4 million parameters).

5. PERFORMANCE & GENERALIZATION V1

Despite primary training on Bitcoin and the relatively small size of the model (4,426,509 parameters), the system demonstrates remarkable performance across a wide range of global assets (around 10,000 assets including stocks, cryptocurrencies, indices, Forex), as highlighted below by the recent average MAPE (Mean Absolute Percentage Error) metrics on the base HLC predictions:

  • Average High Price MAPE: 1.26%
  • Average Low Price MAPE: 1.31%
  • Average Close Price MAPE: 1.38%

Across 41 different assets, including: AAPL.US, MSFT.US, GOOGL.US, BTC-USD.CC, ETH-USD.CC, XOM.US, EURUSD.FOREX, GSPC.INDX, etc.

(Performance is continuously updated and available at https://www.modelmango.co/performance, measured on a basket of 41 assets divided into categories.)

How is this performance possible with a "small" model trained on only one asset?

Several factors contribute to explaining this surprising generalization capability:

  1. Robust Feature Engineering: Preprocessing and feature creation transform raw prices into more abstract representations that capture behavioral dynamics and patterns (trend, momentum, mean reversion, volatility) rather than absolute price levels. These underlying dynamics are often "universal" across different financial markets, even if they manifest with different intensities and scales.
  2. Learning Fundamental Patterns: By training on 5900 days of Bitcoin, an asset that has gone through multiple market regimes (bull, bear, sideways, high/low volatility), the model had the opportunity to learn these fundamental "price action" patterns and the relationships between derived technical indicators.
  3. Power of Transformers: Even with "only" 4.4M parameters, the Transformer architecture is inherently powerful in modeling complex, non-linear dependencies within the engineered feature sequences.
  4. Adaptation via MAC: The Memory-as-Context (MAC) mechanism, particularly its "online update" during inference, plays a crucial role. It allows the model, trained on Bitcoin's general patterns, to dynamically adapt to the specifics of the asset it is analyzing. When processing a new chunk of data for, say, Apple stock (using historical prices obtained daily via the EODHD API), the memory M updates slightly to reflect its recent dynamics, improving the relevance of predictions for that specific asset. The model not only "remembers" the past but "learns how to learn" from the current context.
  5. Lower Risk of Specific Overfitting: A smaller model might be less prone to excessively "memorizing" the specific idiosyncrasies of the training dataset (Bitcoin) compared to huge models (billions of parameters). This can foster generalization, as the model is forced to focus on the most robust and transferable patterns.
  6. Focus on Relative Predictions (Strategy): The strategy model, instead, does not predict absolute prices but "adjustment deltas" and "volatility". This additional level of abstraction can make the strategy more robust against errors still present in the base model's price prediction.

In summary, MODELMANGO's ability to generalize so effectively with extremely limited resources stems from a combination of extracting universal patterns through feature engineering, the Transformer's capacity to model these features, and an adaptive memory mechanism (MAC) that allows for "on-the-fly" specialization to the current asset during inference.

6. INFERENCE V1

Inference (generating predictions and signals) is performed on an AWS EC2 server with Amazon Linux 2023 (kernel 6.1, x86_64). Thanks to the model's contained size and architectural efficiency, inference is extremely fast and computationally inexpensive, keeping the system efficient and suitable for applications requiring rapid decisions or the analysis of many assets in parallel.

7. CONCLUSION V1

MODELMANGO represents a sophisticated approach to financial time series analysis. By combining the effectiveness of Transformers with adaptive memory mechanisms and careful feature engineering, it manages to provide accurate predictions and potentially useful trading signals across a wide range of assets, despite its relatively modest computational size and primary training focused on Bitcoin. Its online adaptation capability and inference efficiency make it a powerful tool for market analysis.


Giovanni Canclini.

PS: The next chapter for MODELMANGO will aim at RNA: predicting how linear nucleotide sequences fold into the complex three-dimensional structures that determine their biological function. The challenge will be to push MODELMANGO to predict these shapes by leveraging the generalization capabilities inherent in this architecture.