1.x — Chapter 1 summary and quiz

📅 February 8, 2025

Chapter summary

This chapter covered the pandas fundamentals you need to load and organize market data and to build a backtest on top of it.

Key concepts

Series and DataFrame: one- and two-dimensional structures with labeled indices. For backtesting, use a DataFrame with one row per bar and columns for OHLCV and indicators.
Indexing: .loc selects by label (dates), .iloc by position. In a backtest, slice only up to the current bar (for example df.iloc[:i+1]) so that no future rows are used (no look-ahead).
DatetimeIndex: set your index to dates, sort it, use one timezone, and ensure one row per bar.
Loading and cleaning: use yfinance (or similar) to get OHLCV, handle NaNs, normalize timezone, and sort.
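The concepts above can be sketched on a small hand-made frame. The OHLCV values and dates below are invented for illustration and stand in for downloaded data; the choice of UTC is also an assumption, and any single timezone would serve.

```python
import pandas as pd

# Hypothetical OHLCV bars, deliberately out of order, indexed by date strings.
raw = pd.DataFrame(
    {
        "open": [101.0, 100.0, 102.0],
        "high": [103.0, 101.5, 104.0],
        "low": [100.5, 99.0, 101.0],
        "close": [102.0, 101.0, 103.0],
        "volume": [1200, 1000, 1500],
    },
    index=["2024-01-03", "2024-01-02", "2024-01-04"],
)

bars = raw.copy()
bars.index = pd.to_datetime(bars.index).tz_localize("UTC")  # DatetimeIndex in one timezone
bars = bars.sort_index()                                     # chronological order
assert bars.index.is_unique                                  # one row per bar

by_label = bars.loc["2024-01-02":"2024-01-03"]  # label slice: both ends included
by_position = bars.iloc[:2]                     # position slice: end excluded
```

Here by_label and by_position select the same two bars, one by date label and one by position.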

What we learned

1.1 Series and DataFrame: Create and add columns; keep one row per date and align by index.
1.2 Indexing and selection: Select columns and rows by label or position; filter with boolean indexing; avoid look-ahead in backtests.
1.3 Time series and DatetimeIndex: Set and sort a DatetimeIndex; use a single timezone for reliable alignment.
1.4 Loading and cleaning market data: Download OHLCV with yfinance, drop or fill NaNs, convert timezone, and produce a clean DataFrame.

Application

These skills directly support Project 1 — Backtesting Engine: load data (1.4), store in a DataFrame with DatetimeIndex (1.1, 1.3), add indicators and signals (1.1, 1.2), and iterate without look-ahead (1.2).
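As a rough sketch of how these pieces might fit together, the example below uses invented closing prices in place of downloaded data. The column names sma3 and long, and the rule "close above its 3-bar mean", are hypothetical and are not taken from the project itself.

```python
import pandas as pd

# Hypothetical closes standing in for data loaded in 1.4.
df = pd.DataFrame(
    {"close": [100.0, 101.0, 103.0, 101.0, 105.0, 104.0]},
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

# Indicator (1.1): 3-bar rolling mean; each value uses the current bar and the two before it.
df["sma3"] = df["close"].rolling(3).mean()
# Signal (1.2): boolean comparison, aligned by index; NaN comparisons evaluate to False.
df["long"] = df["close"] > df["sma3"]

# Iterate without look-ahead (1.2): at bar i only rows 0..i are visible.
decisions = []
for i in range(len(df)):
    visible = df.iloc[: i + 1]
    decisions.append(bool(visible["long"].iloc[-1]))

print(decisions)
```

Because the rolling mean at each bar depends only on that bar and earlier ones, the decision read inside the loop matches the precomputed long column row for row.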

Quiz time

Practice by writing code that satisfies the given inputs and test cases. Try it yourself before opening the solution or hints.

Question 1

Task: Given the DataFrame below (with a default integer index), convert it to use a DatetimeIndex with the given dates. Then write an expression that slices rows from '2024-01-01' through '2024-01-02' (inclusive) and assign the result to h1. Your code must use the exact input.

Input:

import pandas as pd
df = pd.DataFrame(
    {'close': [100, 101, 102, 99]},
    index=[0, 1, 2, 3]
)
dates = ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04']

Test cases:

After your changes, df.index is a DatetimeIndex with the four dates.
h1 = df.loc['2024-01-01':'2024-01-02'] yields a DataFrame with 2 rows whose close values are 100 and 101.

Expected output: h1 has shape (2, 1) and index 2024-01-01, 2024-01-02.

Hint

Use pd.to_datetime(dates) for the index. Assign with df.index = .... Slice with .loc['start':'end'] (both ends inclusive for label slicing).

Solution

df.index = pd.to_datetime(dates)
h1 = df.loc['2024-01-01':'2024-01-02']
print(h1)
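One way to check this answer against the test cases is sketched below. It repeats the input so it runs on its own; the name h1_pos is introduced only for the comparison and is not part of the exercise.

```python
import pandas as pd

df = pd.DataFrame({'close': [100, 101, 102, 99]}, index=[0, 1, 2, 3])
dates = ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04']
df.index = pd.to_datetime(dates)

h1 = df.loc['2024-01-01':'2024-01-02']  # label slice: end label included
h1_pos = df.iloc[0:2]                   # position slice: end position excluded

assert isinstance(df.index, pd.DatetimeIndex)
assert h1.shape == (2, 1)
assert h1.equals(h1_pos)  # same two rows either way
```

The comparison highlights the difference in end-point handling: the label slice names the last row you want, while the position slice names the first row you do not want.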

Question 2

Task: Given a DataFrame df with a close column and 5 rows, write a loop over bar index i from 1 to 4 (inclusive). At each step, form the slice of df that contains only past and current data (no future rows), then compute the mean of close over that slice and store it (e.g. in a list or print it). Use the exact input below.

Input:

import pandas as pd
df = pd.DataFrame({'close': [10, 11, 12, 13, 14]})

Test cases:

At i=1, the slice has 2 rows (indices 0 and 1); mean of close is 10.5.
At i=4, the slice has 5 rows; mean of close is 12.0.
You never use rows beyond the current bar (no look-ahead).

Expected output: At each bar i, the computed mean uses only df.iloc[:i+1] (or equivalent). For i = 1, 2, 3, 4 the means are 10.5, 11.0, 11.5, and 12.0.

Hint

Use df.iloc[:i+1] to get rows from the start up to and including position i. That slice is “past and current” only.

Solution

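for i in range(1, len(df)):  # i = 1, 2, 3, 4
    # Rows 0..i only: past and current bars, no future rows.
    past_and_current = df.iloc[:i+1]
    # Mean over everything seen so far (an expanding mean, not a fixed-window SMA).
    mean_close = past_and_current['close'].mean()
    print(f"Bar {i}: mean = {mean_close}")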
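As an optional cross-check, the same numbers can be produced without a loop. The sketch below assumes pandas' Series.expanding().mean(), which at each row averages that row and all rows before it; the names loop_means and expanding_means are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'close': [10, 11, 12, 13, 14]})

# Loop version: at bar i, average rows 0..i.
loop_means = [df.iloc[: i + 1]['close'].mean() for i in range(1, len(df))]

# Vectorized version: expanding mean, skipping bar 0 to match the loop's range.
expanding_means = df['close'].expanding().mean().iloc[1:].tolist()

print(loop_means)
print(expanding_means)
```

Both lists hold the means for bars 1 through 4, and neither uses a row beyond the current bar.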

Question 3

Task: You are given two DataFrames, each with a close column containing one NaN (in different positions). Implement two cleaning approaches: (1) Option A: drop any row that has a NaN. (2) Option B: forward-fill NaNs, then drop any rows that still have a NaN. Use the exact inputs below. Apply Option A to df1 and assign the result to clean1; apply Option B to df2 and assign the result to clean2.

Input:

import pandas as pd
df1 = pd.DataFrame({'close': [100, float('nan'), 102, 99]})
df2 = pd.DataFrame({'close': [float('nan'), 101, 102, 99]})

Test cases:

Option A applied to df1: clean1 has 3 rows (the row with NaN removed).
Option B applied to df2: forward-fill cannot fill the first row, because there is no earlier value to carry forward, so it stays NaN; dropping then removes that row and leaves 3 rows. clean2 has 3 rows.

Expected output: clean1 and clean2 each have no NaN in close and the correct number of rows.

Hint

For Option A use a method that drops rows with missing values. For Option B use a method that fills forward, then drop remaining NaN.

Solution

clean1 = df1.dropna()
clean2 = df2.ffill().dropna()
print('Option A (dropna):\n', clean1)
print('Option B (ffill then dropna):\n', clean2)
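To see why the position of the NaN matters for Option B, the sketch below applies forward-fill to both inputs. The names filled1, filled2, clean_b1, and clean_b2 are introduced here for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({'close': [100, float('nan'), 102, 99]})
df2 = pd.DataFrame({'close': [float('nan'), 101, 102, 99]})

filled1 = df1.ffill()  # interior NaN takes the previous value (100)
filled2 = df2.ffill()  # leading NaN has no previous value, so it stays NaN

clean_b1 = filled1.dropna()  # nothing left to drop: 4 rows
clean_b2 = filled2.dropna()  # leading NaN row dropped: 3 rows

print(filled1)
print(filled2)
```

Forward-fill only carries earlier values forward, so it never uses future data; the cost is that a NaN at the very start of a series has to be dropped rather than filled.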