1.3 β Time Series and DatetimeIndex
1.3 β Time Series and DatetimeIndex#
For backtesting, your DataFrame should have one row per bar (e.g. per day) and a DatetimeIndex so you can slice by date and keep all series aligned. This lesson covers how to create and use a DatetimeIndex and keep a single timezone.
Used in: Project 1 β Backtesting Engine: storing and indexing OHLCV so signals and P&L align by date.
Why a DatetimeIndex?#
- Slicing by date:
df.loc['2024-01-01':'2024-06-30']without converting strings or using row numbers. - Alignment: When you add columns (e.g. indicators), pandas aligns by index, so dates stay matched.
- Resampling and frequency: Later you can resample to different frequencies (e.g. daily β weekly) using the index.
Setting the index to dates#
If you have a column of dates (e.g. from a CSV or API):
# Assume df has a column 'date' (string or datetime)
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
After this, the index is a DatetimeIndex and you no longer have a date column (unless you keep a copy). Prefer set_index so all operations use the same index.
One row per date#
For a daily backtest, ensure you have exactly one row per trading day (or one per bar for intraday). Duplicate dates or missing dates can break alignment. After loading data:
- Drop duplicates:
df = df[~df.index.duplicated(keep='first')] - Sort by index:
df = df.sort_index()so that.iloc[:i]is βpast and currentβ and.locslices behave as expected.
Single timezone#
Mix timezones can cause off-by-one or wrong alignment. Normalize to one timezone, e.g. US/Eastern for US equities:
# If index is timezone-naive, localize then convert
df.index = df.index.tz_localize('UTC').tz_convert('America/New_York')
# If already timezone-aware, convert
df.index = df.index.tz_convert('America/New_York')
Use one convention (e.g. exchange local time) and stick to it for both data and any timestamps you generate.
Date ranges and missing bars#
pd.date_range(start='2024-01-01', end='2024-12-31', freq='B')creates a business-day index. You can reindex your data to this to expose missing days:df = df.reindex(date_range)β then fill or drop NaN as needed.- For a simple backtest, loading data from a single source (e.g. yfinance) usually gives contiguous trading days; just ensure no duplicate indices and that the index is sorted.
Summary for backtesting#
- Use one row per bar and a DatetimeIndex.
- Sort the index and remove duplicate dates.
- Use a single timezone for the index (and for any timestamps you add).
- Select ranges with
.loc['start':'end']and βpast onlyβ with.iloc[:i+1]when iterating.
Quiz time#
Question 1
Why should you use a DatetimeIndex instead of a separate βdateβ column for backtest data? Show the difference with code.
Show Solution
With a DatetimeIndex, you slice by date and alignment is automatic. With a date column, you have to filter manually. Run this:
import pandas as pd
# Using DatetimeIndex: slice by date directly, add columns and they align
df = pd.DataFrame(
{'close': [100, 101, 102]},
index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03'])
)
df = df.rename_axis('date')
slice_h1 = df.loc['2024-01-01':'2024-01-02']
df['returns'] = df['close'].pct_change()
print('With DatetimeIndex:\n', df)
# With a date column: you'd need df[df['date'].between(...)] and alignment is less clear
Question 2
How do you remove duplicate dates from the index and keep only the first occurrence? Why does it matter for a backtest?
Show Solution
Use df[~df.index.duplicated(keep='first')]. Duplicates break alignment and can double-count when iterating. Example:
import pandas as pd
# Simulate duplicate dates (e.g. from a bad merge or multiple feeds)
df = pd.DataFrame(
{'close': [100, 101, 102, 101.5]},
index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-02', '2024-01-03']) # 01-02 twice
)
print('Before:', len(df), 'rows')
df = df[~df.index.duplicated(keep='first')]
print('After:', len(df), 'rows')
print(df)
Question 3
You have a DataFrame with a timezone-naive index (e.g. from a CSV). How do you convert it to US/Eastern and why use a single timezone?
Show Solution
Localize to UTC then convert to Eastern (or convert directly if already aware). Single timezone avoids off-by-one and misaligned bars. Run this:
import pandas as pd
df = pd.DataFrame(
{'close': [100, 101, 102]},
index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03'])
)
# Timezone-naive -> assume UTC, then convert to US/Eastern
if df.index.tz is None:
df.index = df.index.tz_localize('UTC')
df.index = df.index.tz_convert('America/New_York')
print(df.index)
# DatetimeIndex(['2023-12-31 19:00:00-05:00', '2024-01-01 19:00:00-05:00', ...], dtype='datetime64[ns, America/New_York]')
Next steps#
1.4 β Loading and cleaning market data covers fetching OHLCV (e.g. with yfinance), handling missing values, and building a clean DataFrame with a DatetimeIndex.