Executive Summary
This repository documents the first stage of a broader carry-trade-oriented financial machine learning project. The current implementation focuses on SPY market data and was designed to build, test, and evaluate a complete forecasting pipeline before extending the approach to FX and carry-specific signals.
The project includes:

- data cleaning and exploratory data analysis
- feature engineering and scaling
- baseline and classical time-series benchmarks
- deep learning forecasting with the Temporal Fusion Transformer (TFT)
- classification experiments
- distribution-shift analysis
A key result of the project is that most models showed weak out-of-sample predictive power, consistent with the difficulty of forecasting returns in liquid financial markets. At the same time, the project surfaced important methodological insights, including the greater robustness of return-based features relative to price-level features and the impact of train-test distribution shift on model stability.
One notable positive result was that a simple EMA12-based benchmark outperformed several more complex classical models in the hourly setup. However, the broader set of experiments did not show a robust and repeatable forecasting edge across the full pipeline.
Overall, this repository should be understood as a practical financial forecasting and modeling foundation, developed as the first phase of a larger carry-trade research direction.
Data Availability
The original raw SPY dataset used in this project is not included in the repository because it was obtained from a paid data source and cannot be redistributed publicly.
The project expects the raw input file at:
data/SPY.txt
To reproduce the workflow, users can:

- provide their own SPY intraday dataset in a compatible format,
- obtain equivalent data from a licensed provider, or
- contact me for clarification regarding the expected structure and preprocessing assumptions.
This repository therefore focuses on the research workflow, feature engineering, modeling pipeline, and evaluation framework, while the proprietary raw market data remain excluded for licensing reasons.
This repository contains a practical financial machine learning project focused on building and evaluating an end-to-end forecasting pipeline for financial time series.
The broader long-term idea behind the project is related to carry trade research, but the current implementation uses SPY ETF data as a liquid and structured benchmark dataset to develop and validate the modeling workflow first.
This repository should therefore be understood as:
Phase 1: forecasting pipeline development, model comparison, and robustness analysis on SPY
The project covers:
- data cleaning and exploratory data analysis
- feature engineering and scaling
- benchmark and baseline models
- deep learning forecasting with Temporal Fusion Transformer (TFT)
- classification experiments
- distribution-shift analysis
- and critical evaluation of weak predictive performance in financial markets
The goal of this project was to test whether short- and medium-horizon market returns can be forecast using:
- technical indicators
- volatility features
- lagged returns
- time-based features
- classical time-series methods
- and deep learning models such as the Temporal Fusion Transformer (TFT)
A second objective was to understand why predictive performance is weak in many financial settings, including the roles of:
- noisy return series
- class imbalance
- feature design
- and train-test distribution shift
The long-term research direction is carry-trade-related forecasting and strategy design. However, instead of beginning directly with FX carry-trade data, the current project first develops the forecasting and feature-engineering pipeline on SPY.
This was intentional:
- SPY is highly liquid and easy to structure
- it allows the pipeline to be tested under realistic market conditions
- the methodology can later be transferred to FX pairs, interest-rate differentials, and carry-specific signals
So the current repository is best interpreted as:
A practical modeling foundation for a future carry-trade extension
The project uses SPY intraday market data.
Depending on the notebook, the raw minute-level data are resampled into:
- 15-minute OHLCV candles
- hourly OHLCV candles
- daily OHLCV candles
- monthly OHLCV candles
The datasets contain standard market variables:
- Open
- High
- Low
- Close
- Volume
Target variables are defined as either:
- simple returns
- or log returns
depending on the experiment.
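The two target definitions can be sketched as follows; the price values here are illustrative, not from the repository's dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical close prices; the series is illustrative only.
close = pd.Series([100.0, 101.0, 100.5, 102.0])

simple_ret = close.pct_change()   # (P_t / P_{t-1}) - 1
log_ret = np.log(close).diff()    # ln(P_t) - ln(P_{t-1})

# For small moves the two are nearly identical: log(1 + r) ~ r
```

Log returns are additive across time, which is why they are often preferred for multi-step forecasting targets.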
The notebooks are not all part of one single linear pipeline. There is one main early pipeline and then several later TFT experiments that were rebuilt independently.
Notebook: 1 data_cleaning EDA.ipynb
This is the starting point of the whole project.
What it does:
- loads raw SPY minute-level data
- combines date and time into a datetime index
- filters regular trading hours
- checks data quality and duplicates
- performs exploratory data analysis
- visualizes prices, volume, and correlations
- resamples data
- saves cleaned datasets for later use
This notebook is the base for the later workflow.
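The core cleaning steps above can be sketched with pandas. Column names, the date format, and the trading-hours window are assumptions for illustration, not the repository's exact schema:

```python
import pandas as pd

# Toy minute-level bars; 'date'/'time' columns are assumed names.
raw = pd.DataFrame({
    "date": ["2024-01-02"] * 3,
    "time": ["09:30", "09:31", "16:05"],
    "Open": [470.0, 470.5, 471.0],
    "High": [470.6, 470.9, 471.2],
    "Low": [469.8, 470.2, 470.7],
    "Close": [470.5, 470.4, 471.1],
    "Volume": [1000, 800, 500],
})

# 1) combine date and time into a DatetimeIndex
raw.index = pd.to_datetime(raw["date"] + " " + raw["time"])

# 2) keep regular trading hours and drop duplicate timestamps
clean = raw.between_time("09:30", "16:00")
clean = clean[~clean.index.duplicated(keep="first")]

# 3) resample minute bars into hourly OHLCV candles
hourly = clean.resample("1h").agg(
    {"Open": "first", "High": "max", "Low": "min",
     "Close": "last", "Volume": "sum"}
).dropna()
```

The same `resample(...).agg(...)` pattern produces the 15-minute, daily, and monthly candles by changing the rule string.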
Notebook: 2 Feautures Scaling.ipynb
This notebook builds on the cleaned data from the first notebook.
What it does:
- creates forecasting targets
- engineers technical and time-based features
- removes missing values
- splits the data chronologically into train / validation / test
- scales numerical features
- saves prepared datasets for modeling
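The split-then-scale step is the leakage-sensitive part of this pipeline: the scaler must be fitted on the training window only. A minimal sketch, with synthetic features and assumed split ratios:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic feature matrix standing in for the engineered features.
rng = np.random.default_rng(0)
df = pd.DataFrame({"feat_1": rng.normal(size=100),
                   "feat_2": rng.normal(size=100)})

# Chronological split (70 / 15 / 15 is an assumption, not the
# repository's exact choice) -- no shuffling for time series.
n = len(df)
train = df.iloc[: int(0.7 * n)]
val = df.iloc[int(0.7 * n): int(0.85 * n)]
test = df.iloc[int(0.85 * n):]

# Fit scaling statistics on the training window only, then apply
# the same transform to validation and test to avoid look-ahead.
scaler = StandardScaler().fit(train)
train_s = scaler.transform(train)
val_s = scaler.transform(val)
test_s = scaler.transform(test)
```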
Important note:
This was the original feature-engineering pipeline. Later in the project, some of its design choices were revised because this setup turned out to be more sensitive to data shift, especially when features were still strongly tied to the raw price level.
Notebook: 3 Base Lines.ipynb
This notebook uses the prepared data from the earlier pipeline and compares simpler and classical benchmark models.
Models compared include:
- Linear Regression
- Random Walk
- EMA12
- MACD Signal
- ARIMA
- SARIMA
- GARCH Mean
Key result:
The strongest benchmark result came from the EMA12-based model, which outperformed the other tested baseline and classical models in the hourly setup.
Performance of EMA12:
- MAE: 0.002434
- RMSE: 0.004085
- R²: 0.2182
- SMAPE: 67.32%
This is an important positive finding in the project: a simple trend-based benchmark captured more useful signal than several more complex classical approaches.
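One plausible reading of an EMA12-style benchmark is sketched below: use the 12-period EMA of close prices as the one-step-ahead price forecast and score the implied return. The price series is synthetic and this is not necessarily the repository's exact model definition; the SMAPE variant is also an assumption:

```python
import numpy as np
import pandas as pd

# Synthetic hourly close prices (geometric random walk).
rng = np.random.default_rng(1)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.004, 500))))

# 12-period exponential moving average of the close
ema12 = close.ewm(span=12, adjust=False).mean()

# Prediction for the return at t, using information up to t-1 only
pred = (ema12 / close - 1).shift(1)
actual = close.pct_change()

mask = pred.notna() & actual.notna()
err = actual[mask] - pred[mask]

mae = err.abs().mean()
rmse = np.sqrt((err ** 2).mean())

# Symmetric MAPE in percent (this exact variant is assumed)
smape = 100 * (err.abs() /
               ((actual[mask].abs() + pred[mask].abs()) / 2)).mean()
```

SMAPE values well above 50% for return targets, as in the table above, are common: the denominator is tiny whenever returns are near zero.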
Notebook: 4 TFT Hour with data shift.ipynb
This notebook still builds on the earlier feature-scaling pipeline.
What it does:
- trains a Temporal Fusion Transformer (TFT) on hourly log returns
- evaluates forecasting performance
- compares train and test feature distributions
- explicitly checks for distribution shift / data shift
This notebook is important because it showed that the earlier feature setup was vulnerable to train-test instability.
Main insight:
- some predictors were still too closely tied to the price level
- train and test distributions differed more strongly
- this likely contributed to weak out-of-sample performance
This notebook is therefore best understood as the transition point in the project: it revealed a methodological issue that motivated the later redesign.
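A simple way to make the train-test comparison above quantitative is a two-sample Kolmogorov-Smirnov test per feature. The features here are synthetic stand-ins, chosen so one of them is deliberately shifted:

```python
import numpy as np
from scipy.stats import ks_2samp

# Synthetic feature samples: the test-period distribution is shifted
# relative to training, mimicking a price-level-tied feature.
rng = np.random.default_rng(2)
train_feat = rng.normal(0.0, 1.0, 2000)
test_feat = rng.normal(0.8, 1.0, 2000)

# KS statistic in [0, 1]; a tiny p-value flags distribution shift
stat, p_value = ks_2samp(train_feat, test_feat)
shifted = p_value < 0.01
```

Running such a check per feature makes it easy to see which predictors (typically the price-level ones) drift between the training and test windows.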
After the TFT data-shift experiment, the later notebooks were rebuilt more independently.
Instead of continuing to rely on the old feature-scaling pipeline, the later notebooks were connected more directly to the cleaned data from the Data Cleaning and EDA notebook and used revised feature engineering.
This means:
- the first four notebooks form the original main pipeline
- the later TFT notebooks are new standalone experiments
- they were designed to reduce the earlier data-shift problem
- they rely more on return-based and more stationary features
These notebooks should be understood as separate follow-up experiments, each with its own full modeling setup.
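The feature redesign behind these follow-up experiments can be illustrated as follows: replace a raw price-level feature (e.g. a moving average) with a return-style ratio that stays roughly stationary as the price level drifts. The indicator choice here is an assumption for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic drifting price series (geometric random walk with drift).
rng = np.random.default_rng(3)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0002, 0.01, 1000))))

sma20 = close.rolling(20).mean()

price_level_feat = sma20               # drifts with the price level
return_based_feat = close / sma20 - 1  # oscillates around zero

# The ratio feature keeps a similar distribution across regimes,
# which is what reduces the train-test shift problem.
```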
Notebook: TFT Model 15 min.ipynb
Purpose:
- forecast 15-minute SPY log returns
Main result:
- weak predictive performance
- predictions tended to stay close to zero
- no clear strong visual data shift for the target distribution
Main takeaway:
- short-horizon returns are extremely noisy
- weak performance is more likely due to low signal than obvious train-test mismatch
Notebook: TFT Day.ipynb
Purpose:
- forecast daily SPY log returns
Main takeaway:
- daily forecasting pipeline implemented successfully
- but predictive power remained very weak
Notebook: TFT Day log-return.ipynb
Purpose:
- forecast the next 5 daily log returns
Main takeaway:
- similar setup and conclusion to the daily TFT experiments
- more complex modeling still did not produce a robust edge
Notebook: TFT Day return.ipynb
Purpose:
- forecast simple daily returns instead of log returns
Main takeaway:
- useful comparison against the log-return formulation
- overall predictive performance remained weak
Notebook: TFT Day with lag.ipynb
Purpose:
- use lagged daily returns and technical indicators to predict future daily returns
Main takeaway:
- adding lag features did not materially change the conclusion
- signal remained weak
Notebook: TFT Day Classification.ipynb
Purpose:
- reformulate daily prediction as a 3-class classification problem:
- short
- hold
- long
Main takeaway:
- model accuracy looked acceptable at first glance
- but the model largely collapsed to the majority class
- highlights class imbalance and weak directional predictability
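The accuracy trap described above is easy to demonstrate: with mostly "hold" labels, always predicting the majority class already scores high raw accuracy while per-class recall exposes the collapse. The labels and class proportions below are synthetic assumptions:

```python
import numpy as np

# Synthetic imbalanced labels (~70% "hold" is an assumed proportion).
rng = np.random.default_rng(4)
labels = rng.choice(["short", "hold", "long"], size=1000, p=[0.15, 0.7, 0.15])

# A degenerate "model" that always predicts the majority class
majority = max(set(labels), key=lambda c: (labels == c).sum())
preds = np.full_like(labels, majority)

accuracy = (preds == labels).mean()   # looks acceptable at first glance

# Per-class recall reveals the majority-class collapse
recalls = {c: (preds[labels == c] == c).mean()
           for c in ["short", "hold", "long"]}
balanced_acc = np.mean(list(recalls.values()))   # 1/3 for three classes
```

This is why balanced accuracy or per-class recall, not raw accuracy, is the right yardstick for the 3-class setup.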
Notebook: TFT Month.ipynb
Purpose:
- forecast monthly SPY returns
Main takeaway:
- even at a longer horizon, predictive power remained limited
- no strong evidence of a robust forecasting edge
If you want to use the repository in a structured way, the recommended order is:
- 1 data_cleaning EDA.ipynb
- 2 Feautures Scaling.ipynb
- 3 Base Lines.ipynb
- 4 TFT Hour with data shift.ipynb
This sequence shows the original pipeline and the key methodological finding about data shift.
After that, continue with the later standalone notebooks:
- TFT Model 15 min.ipynb
- TFT Day.ipynb
- TFT Day log-return.ipynb
- TFT Day return.ipynb
- TFT Day with lag.ipynb
- TFT Day Classification.ipynb
- TFT Month.ipynb
Across the project, the main findings are:
- forecasting SPY returns is difficult, especially at short horizons
- many models showed weak out-of-sample predictive power
- deep learning models often produced near-zero or negative R²
- simple baselines can outperform more complex models in some settings
- EMA12 was the strongest benchmark result in the hourly setup
- classification introduces additional problems such as class imbalance
- feature design strongly affects robustness
- price-level-based features are more vulnerable to data shift than return-based features
A central insight from the project is that the models did not generate a robust and repeatable forecasting advantage across the full pipeline.
This is consistent with the idea that liquid financial markets are difficult to predict and broadly supports the weak-form Efficient Market Hypothesis in this setup.
At the same time, the project still produced clear methodological value:
- building a realistic financial forecasting workflow
- comparing several model families
- identifying where and why models fail
- improving feature engineering after diagnosing data shift
- and documenting the difference between local positive results and robust predictive edge
This repository does not yet include a full trading strategy backtest.
That was intentional.
The current focus is on:
- data pipeline construction
- forecasting model development
- out-of-sample evaluation
- and robustness analysis
A full trading backtest was not prioritized because most models did not demonstrate a sufficiently robust predictive edge in out-of-sample testing.
In other words:
The main limitation at the current stage appears to be the weakness of the predictive signal itself, not the absence of a backtesting layer.
    .
    ├── data/
    ├── images/
    ├── modeling/
    ├── models/
    ├── notebooks/
    ├── requirements.txt
    ├── requirements_dev.txt
    └── README.md
Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip

Install the core dependencies:

pip install -r requirements.txt

Install development and notebook dependencies:

pip install -r requirements_dev.txt

This project uses two dependency files.
requirements.txt contains the core dependencies needed for modeling and analysis.
requirements_dev.txt contains additional development dependencies for:
- Jupyter notebooks
- testing
- formatting
- experiment utilities
Main project dependencies include:
- arch
- lightning
- matplotlib
- numpy
- pandas
- plotly
- pytorch-forecasting
- scikit-learn
- seaborn
- statsmodels
- ta
- torch
- tqdm
Development dependencies include:
- black
- jupyterlab
- mlflow
- parsenvy
- protobuf
- pytest
- testbook
- pipreqs
This repository is useful as a practical example of how a financial forecasting workflow can be built, tested, revised, and critically evaluated.
It shows:
- how to clean and transform market data
- how to engineer finance-specific features
- how to compare simple and complex models
- how to detect and interpret data shift
- how to handle weak results honestly
- and how a research idea can evolve in stages before becoming a strategy project
The current state of the repository is best described as:
Phase 1: practical forecasting pipeline development on SPY
It is not yet a full carry-trade implementation with FX data, rate differentials, and strategy-level portfolio backtesting.
Instead, it provides the methodological foundation for that next step.
Possible next steps include:
- extending the pipeline from SPY to FX pairs
- adding carry-related macro and rate-differential features
- building genuine carry-trade signals
- adding walk-forward validation
- optimizing hyperparameters
- integrating a strategy-level backtesting layer
- evaluating portfolio-level performance
This project was developed as a practical financial machine learning project focused on time-series forecasting, feature engineering, model comparison, robustness analysis, and realistic evaluation of financial prediction models.
The current implementation uses SPY as the first-stage benchmark dataset and forms the modeling foundation for a broader carry-trade-oriented research direction.