0.1 – Project 3: Machine Learning Strategy#
Incorporate predictive models into your trading workflow: build features from historical data, train a model, produce trade signals, and execute them using the paper trading bot from Project 2.
Objective#
Create an ML-driven trading system that trains on historical data, generates predictions (signals), and executes via your existing paper trading infrastructure.
Focus Areas#
- Feature engineering from price/volume and other data
- Statistical learning (supervised classification or regression)
- Model evaluation and validation (train/test, time-series splits)
- Strategy optimization and risk-aware use of predictions
Technologies#
- Python – Pipeline and integration
- scikit-learn – Models, preprocessing, and evaluation
- Pandas & NumPy – Feature construction and data handling
- Your backtester and paper bot – Signal consumption and execution
Example Models#
- Logistic regression – Binary outcome (e.g., up/down next day)
- Simple classifiers – Random forest or gradient boosting for direction or regime
- Time-series features – Lags, rolling stats, volatility; no future data
Workflow#
- Generate features from historical OHLCV (and optionally other sources).
- Train model on a training window; validate on out-of-sample or walk-forward.
- Produce trade signals from model predictions (e.g., probability or class).
- Feed signals into the paper trading bot from Project 2.
Deliverable#
An ML-driven system that:
- Builds a feature set and target from historical data
- Trains at least one model (e.g., logistic regression or tree-based)
- Evaluates with appropriate metrics (accuracy, precision/recall, or P&L in backtest)
- Outputs signals (e.g., long/flat/short or position size) for the paper bot
- Runs in paper mode so you can observe live behavior
Prerequisites (with links to lessons)#
| Topic | Why you need it | Where to learn |
|---|---|---|
| Project 1 – Backtester | Data pipeline, backtest evaluation | Project 1: Backtesting Engine |
| Project 2 – Paper Bot | Execution of ML signals | Project 2: Paper Trading Bot |
| Python & Pandas | Data and feature construction | Python & Pandas |
| Statistics | Interpretation, overfitting, validation | Applied Statistics (stub) |
| Probability | Uncertainty, calibration | Quant Research |
Steps to Complete the Project#
1. Define the prediction target
   Choose a concrete target: e.g., "next-day return positive/negative" (classification) or "next-day return" (regression). Define the horizon and the holding period so they match how the paper bot will trade.
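A minimal sketch of the classification variant, assuming a daily-close series (the prices below are synthetic stand-ins for real historical data):

```python
import numpy as np
import pandas as pd

# Synthetic daily closes standing in for real historical data.
rng = np.random.default_rng(42)
close = pd.Series(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))),
    index=pd.bdate_range("2022-01-03", periods=500),
    name="close",
)

# Next-day return at time t: the quantity to predict using data up to t.
next_ret = close.pct_change().shift(-1)

# Binary target: 1 if tomorrow's return is positive, else 0.
# The final row has no "tomorrow" and is dropped.
target = (next_ret > 0).astype(int)[next_ret.notna()]
```

The `shift(-1)` is what aligns each row's target with the *following* day, which is exactly the alignment the no-look-ahead hint below is about.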
2. Build a feature set
   From OHLCV (and any other data), create features: lags, moving averages, volatility, volume measures, etc. Ensure no look-ahead: each row must use only past data. Use Pandas for alignment, following the Python & Pandas patterns.
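For illustration, a few look-ahead-safe features built with Pandas; the close and volume series are synthetic and the column names are just examples:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.bdate_range("2022-01-03", periods=300)
# Synthetic stand-ins for real close and volume series.
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300))), index=idx)
volume = pd.Series(rng.integers(1_000, 10_000, 300).astype(float), index=idx)

ret = close.pct_change()
features = pd.DataFrame({
    # Lagged returns: strictly past information for each row.
    "ret_1d": ret,
    "ret_5d": close.pct_change(5),
    # Rolling windows end at the current row, so no future data leaks in.
    "ma_ratio": close / close.rolling(20).mean(),
    "vol_20d": ret.rolling(20).std(),
    "volume_z": (volume - volume.rolling(20).mean()) / volume.rolling(20).std(),
}).dropna()  # drop warm-up rows where the windows are still incomplete
```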
3. Create train/validation/test splits
   Use time-based splits (e.g., train on the past, validate on the next period, test on the most recent). Avoid shuffling so you don't leak future information. Document the split dates.
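One way to express the split, assuming a date-indexed feature table (the frame and the 70/15/15 proportions are placeholders):

```python
import pandas as pd

# Placeholder frame standing in for your feature/target table.
idx = pd.bdate_range("2020-01-01", periods=1000)
data = pd.DataFrame({"x": range(1000)}, index=idx)

# Time-ordered split: oldest 70% train, next 15% validation, newest 15% test.
# No shuffling, so later rows never inform earlier ones.
n = len(data)
train = data.iloc[: int(n * 0.70)]
valid = data.iloc[int(n * 0.70) : int(n * 0.85)]
test = data.iloc[int(n * 0.85) :]

# Record the split boundaries, e.g. for the README.
split_dates = (train.index[-1], valid.index[-1])
```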
4. Train a first model
   Start with logistic regression or a simple tree model in scikit-learn. Train on the training set and, if needed, tune hyperparameters on the validation set. Check for overfitting by comparing train vs. validation performance.
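A minimal training sketch with scikit-learn; the features below are synthetic with a weak planted signal in the first column, standing in for your real feature matrix:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Synthetic, noisy data: only the first column carries signal.
X = rng.normal(size=(600, 4))
y = (0.5 * X[:, 0] + rng.normal(size=600) > 0).astype(int)

# Time-ordered holdout: first 400 rows train, last 200 validate.
X_tr, y_tr, X_va, y_va = X[:400], y[:400], X[400:], y[400:]

# Scaling plus L2-regularized logistic regression (C controls regularization).
model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0))
model.fit(X_tr, y_tr)

train_acc = model.score(X_tr, y_tr)
valid_acc = model.score(X_va, y_va)
# A large train/validation gap is a warning sign of overfitting.
```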
5. Evaluate properly
   Report metrics on the test set (accuracy, precision, recall, or regression metrics). Optionally backtest: turn predictions into signals and run them through your Project 1 engine to get P&L and Sharpe. Compare against a baseline (e.g., random or buy-and-hold).
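For example, classification metrics plus a majority-class baseline; the labels and predictions below are synthetic (roughly 80% correct by construction):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

rng = np.random.default_rng(7)
# Synthetic stand-ins for test-set labels and model predictions.
y_true = rng.integers(0, 2, 200)
y_pred = np.where(rng.random(200) < 0.8, y_true, 1 - y_true)

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)

# Baseline: always predict the majority class; a useful model must beat this.
majority = int(y_true.mean() >= 0.5)
baseline_acc = accuracy_score(y_true, np.full_like(y_true, majority))
```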
6. Connect to the paper bot
   Export or call your model to produce a signal (e.g., every minute or at market open). Feed that signal into the paper bot's signal interface. Ensure the bot trades only when the model says so and respects risk limits.
7. Run and monitor
   Run the full pipeline in paper mode. Log predictions, signals, and fills. After some time, compare realized P&L to backtest expectations and note any degradation (e.g., regime change, overfitting).
8. Document and iterate
   Write a README covering how to train the model, how to run the pipeline, and how to interpret results. Document one or two ideas for improvement (e.g., more features, a different model, or risk sizing).
Hints (with backlinks)#
- No future information – Features and targets must use only data available at prediction time. Align targets with a shift (e.g., next-day return). See Quant Research and Applied Statistics (stub) for proper evaluation.
- Overfitting – Keep the model simple at first. Use time-series cross-validation or a single holdout. If train performance is much better than validation/test, simplify or add regularization. See Applied Statistics (stub).
- From probability to signal – Map the model output (e.g., probability of "up") to a position: e.g., above a 0.6 threshold → long, below 0.4 → short, otherwise flat. You can later add position sizing. See Quant Research for risk and expectation.
- Backtest vs. live – Backtest the ML strategy through your Project 1 engine before going live in paper. Compare the ML strategy's metrics to your earlier non-ML backtest. See Project 1.
- Reproducibility – Set random seeds for data splitting and model training. Save the feature list and model version so you can reproduce results. See Python & Pandas and good project structure.
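As a sketch of the time-series cross-validation mentioned in the overfitting hint, scikit-learn's `TimeSeriesSplit` keeps every test fold strictly after its training fold (the feature matrix below is a placeholder):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Placeholder feature matrix; in practice, your date-ordered features.
X = np.arange(200).reshape(-1, 1)

folds = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X):
    # Each fold trains on the past and evaluates on the block right after it.
    folds.append((train_idx.max(), test_idx.min()))
```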
Goals#
By the end of this project you should be able to:
- Build a time-series-safe feature set and target
- Train and validate a simple ML model for trading
- Evaluate with backtest and basic risk metrics
- Integrate model predictions with the paper trading bot
- Understand the gap between backtest and live performance
This completes the core roadmap: backtester → paper bot → ML strategy. From here you can extend with more features, models, or deployment (e.g., Docker, cloud) as in the long-term vision in the Quant Development Roadmap.