1.3 — Time Series and DatetimeIndex

📅 February 8, 2025

1.3 — Time Series and DatetimeIndex#

For backtesting, your DataFrame should have one row per bar (e.g. per day) and a DatetimeIndex so you can slice by date and keep all series aligned. This lesson covers how to create and use a DatetimeIndex and keep a single timezone.

Used in: Project 1 — Backtesting Engine: storing and indexing OHLCV so signals and P&L align by date.

Why a DatetimeIndex?#

Slicing by date: df.loc['2024-01-01':'2024-06-30'] without converting strings or using row numbers.
Alignment: When you add columns (e.g. indicators), pandas aligns by index, so dates stay matched.
Resampling and frequency: Later you can resample to different frequencies (e.g. daily → weekly) using the index.

Setting the index to dates#

If you have a column of dates (e.g. from a CSV or API):

# Assume df has a column 'date' (string or datetime)
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

After this, the index is a DatetimeIndex and you no longer have a date column (unless you keep a copy). Prefer set_index so all operations use the same index.

One row per date#

For a daily backtest, ensure you have exactly one row per trading day (or one per bar for intraday). Duplicate dates or missing dates can break alignment. After loading data:

Drop duplicates: df = df[~df.index.duplicated(keep='first')]
Sort by index: df = df.sort_index() so that .iloc[:i] is “past and current” and .loc slices behave as expected.

Single timezone#

Mix timezones can cause off-by-one or wrong alignment. Normalize to one timezone, e.g. US/Eastern for US equities:

# If index is timezone-naive, localize then convert
df.index = df.index.tz_localize('UTC').tz_convert('America/New_York')

# If already timezone-aware, convert
df.index = df.index.tz_convert('America/New_York')

Use one convention (e.g. exchange local time) and stick to it for both data and any timestamps you generate.

Date ranges and missing bars#

pd.date_range(start='2024-01-01', end='2024-12-31', freq='B') creates a business-day index. You can reindex your data to this to expose missing days: df = df.reindex(date_range) — then fill or drop NaN as needed.
For a simple backtest, loading data from a single source (e.g. yfinance) usually gives contiguous trading days; just ensure no duplicate indices and that the index is sorted.

Summary for backtesting#

Use one row per bar and a DatetimeIndex.
Sort the index and remove duplicate dates.
Use a single timezone for the index (and for any timestamps you add).
Select ranges with .loc['start':'end'] and “past only” with .iloc[:i+1] when iterating.

Quiz time#

Question 1

Why should you use a DatetimeIndex instead of a separate “date” column for backtest data? Show the difference with code.

Show Solution

With a DatetimeIndex, you slice by date and alignment is automatic. With a date column, you have to filter manually. Run this:

import pandas as pd

# Using DatetimeIndex: slice by date directly, add columns and they align
df = pd.DataFrame(
    {'close': [100, 101, 102]},
    index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03'])
)
df = df.rename_axis('date')
slice_h1 = df.loc['2024-01-01':'2024-01-02']
df['returns'] = df['close'].pct_change()
print('With DatetimeIndex:\n', df)

# With a date column: you'd need df[df['date'].between(...)] and alignment is less clear

Question 2

How do you remove duplicate dates from the index and keep only the first occurrence? Why does it matter for a backtest?

Show Solution

Use df[~df.index.duplicated(keep='first')]. Duplicates break alignment and can double-count when iterating. Example:

import pandas as pd

# Simulate duplicate dates (e.g. from a bad merge or multiple feeds)
df = pd.DataFrame(
    {'close': [100, 101, 102, 101.5]},
    index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-02', '2024-01-03'])  # 01-02 twice
)
print('Before:', len(df), 'rows')
df = df[~df.index.duplicated(keep='first')]
print('After:', len(df), 'rows')
print(df)

Question 3

You have a DataFrame with a timezone-naive index (e.g. from a CSV). How do you convert it to US/Eastern and why use a single timezone?

Show Solution

Localize to UTC then convert to Eastern (or convert directly if already aware). Single timezone avoids off-by-one and misaligned bars. Run this:

import pandas as pd

df = pd.DataFrame(
    {'close': [100, 101, 102]},
    index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03'])
)
# Timezone-naive -> assume UTC, then convert to US/Eastern
if df.index.tz is None:
    df.index = df.index.tz_localize('UTC')
df.index = df.index.tz_convert('America/New_York')
print(df.index)
# DatetimeIndex(['2023-12-31 19:00:00-05:00', '2024-01-01 19:00:00-05:00', ...], dtype='datetime64[ns, America/New_York]')

Next steps#

1.4 — Loading and cleaning market data covers fetching OHLCV (e.g. with yfinance), handling missing values, and building a clean DataFrame with a DatetimeIndex.