1.1 — Series and DataFrame Fundamentals

📅 February 8, 2025

1.1 — Series and DataFrame Fundamentals#

Pandas builds on two core structures: Series (one-dimensional) and DataFrame (two-dimensional). For backtesting and market data, you will almost always work with DataFrames where each row is a timestamp and each column is a price series, indicator, or signal. This lesson gives you what you need to create and manipulate these structures.

Used in: Project 1 — Backtesting Engine: storing OHLCV data and indicator columns in a DataFrame.

Series#

A Series is a one-dimensional labeled array. Labels form the index; the values can be any type (often numeric).

import pandas as pd

# Series from a list (default integer index)
prices = pd.Series([100.0, 101.5, 99.2, 102.1])
print(prices)
# 0    100.0
# 1    101.5
# 2     99.2
# 3    102.1

# Series with explicit index (e.g. dates or labels)
prices = pd.Series([100.0, 101.5, 99.2], index=['Mon', 'Tue', 'Wed'])
print(prices)

You can do element-wise math and use .mean(), .std(), .sum() etc. Series align by index when you combine them—important when we add indicator columns to a price Series.

DataFrame#

A DataFrame is a table: rows and columns, each column a Series sharing the same index. For market data, think one row per date, one column per series (Open, High, Low, Close, Volume, then your indicators).

# DataFrame from a dict of lists (each key = column name)
df = pd.DataFrame({
    'open':   [100, 101, 99],
    'high':   [102, 103, 100],
    'low':    [98, 100, 97],
    'close':  [101, 102, 99],
})
#    open  high  low  close
# 0   100   102   98    101
# 1   101   103  100    102
# 2    99   100   97     99

df

Rows: default integer index 0, 1, 2. For backtesting you will replace this with a DatetimeIndex (see 1.3 — Time series and DatetimeIndex).
Columns: access with df['close'] or df.close (returns a Series). Add a new column by assignment: df['sma_10'] = ....

OHLC and OHLCV: why we care#

OHLC stands for Open, High, Low, Close—the four standard prices for a single bar (e.g. one day or one hour):

Open: price at the start of the period
High: highest price during the period
Low: lowest price during the period
Close: price at the end of the period

OHLCV adds Volume (number of shares or contracts traded). In pandas we store these as columns: one row per bar, columns open, high, low, close, and optionally volume.

Why it matters: almost all market data and backtesting assumes this shape. Exchanges and data vendors provide OHLC(V); indicators (e.g. moving averages) usually use close; strategies often compare high/low to open/close. Keeping this layout keeps your code compatible with the rest of the quant ecosystem.

SMA: what it is and why we care#

SMA means Simple Moving Average: the average of the last (n) values. For example, a 10-period SMA of close is the mean of the last 10 closing prices.

What it does: smooths out noise so you can see trend. Short SMAs (e.g. 10) react quickly; long SMAs (e.g. 50) react slowly.
Why we care in quant: many strategies use SMAs for signals (e.g. “buy when price crosses above the 50-day SMA”) or as a baseline for volatility and momentum. In pandas you compute it with a rolling window—see below.

In general pandas terms, “SMA” is just one use of a rolling mean; the same rolling machinery is used for rolling standard deviation, min, max, etc.

Adding and overwriting columns#

You add indicator or signal columns by assigning to a new column name. Pandas aligns by index, so if you assign a Series with a matching index, values line up by date.

Three functions show up constantly for time-series and backtesting. It’s worth knowing what each does and when to use it.

`.shift(periods)` — lag (or lead) values#

What it does: moves values forward or backward along the index. s.shift(1) puts the previous row’s value in the current row; the first row gets NaN (no previous value).
Why we care (quant): “previous close,” “yesterday’s signal,” or any lagged feature. Essential to avoid look-ahead: you never use future data at the current bar.
When to use (general): any time you need “value from N rows ago” (e.g. compare to last week, last month). Positive periods = lag (past); negative = lead (future—use only for targets, not features).

df['close_lag1'] = df['close'].shift(1)   # previous bar's close

`.rolling(window)` — rolling windows#

What it does: groups consecutive rows into a window of size window and lets you apply an aggregation (.mean(), .std(), .min(), etc.). The first window - 1 rows are NaN (not enough history yet).
Why we care (quant): SMAs (rolling(n).mean()), rolling volatility (rolling(n).std()), rolling high/low, and many indicators are “something over the last N bars.”
When to use (general): any “over the last N rows” calculation—smoothing, local trends, or running statistics.

df['sma_10'] = df['close'].rolling(10).mean()

`.pct_change(periods=1)` — percent change (returns)#

What it does: computes (current − previous) / previous, i.e. the proportional change from the previous row. With default periods=1, the first row is NaN.
Why we care (quant): daily (or bar) returns are the standard for performance, risk, and many signals. Returns are usually in decimal form (0.01 = 1%).
When to use (general): any “how much did this change compared to last time?” metric—prices, volumes, or other series.

df['returns'] = df['close'].pct_change()

Putting them together:

df['close_lag1'] = df['close'].shift(1)   # previous day's close
df['returns']    = df['close'].pct_change() # simple daily returns
df['sma_2']      = df['close'].rolling(2).mean()  # 2-bar SMA (for illustration)

This pattern—one row per date, new columns for indicators and signals—is exactly what you use in a backtest: load OHLCV, then add columns for moving averages, signals, etc.

Key takeaways for backtesting#

Use a DataFrame for your main data: one row per bar (e.g. per day), columns for Open, High, Low, Close, Volume, and any indicators or signals.
Use a Series for a single series (e.g. one symbol’s close, or one indicator) when you need to pass it to a function or compute on it.
Add columns with assignment (df['new_col'] = ...). Keep the same index (dates) so everything stays aligned.

Quiz time#

Practice by writing code that satisfies the given inputs and test cases. Try it yourself before opening the solution or hints.

Question 1

Task: Given a list of closing prices, create a Series with labels 'Mon', 'Tue', 'Wed', 'Thu', then create a DataFrame with columns open, high, low, close for the same four days, using the same index. Add a column that is the mean of the close Series (single number broadcast). Your code should use the exact inputs below.

Input:

close_prices = [101.0, 102.0, 99.0, 100.0]
index = ['Mon', 'Tue', 'Wed', 'Thu']
# open, high, low for each day (you can define these as lists)
open_  = [100, 101, 102, 99]
high   = [102, 103, 104, 101]
low    = [98, 100, 97, 100]

Test cases:

The Series has index ['Mon','Tue','Wed','Thu'] and values [101, 102, 99, 100].
The DataFrame has columns open, high, low, close and the same index.
A new column close_mean equals the scalar mean of close (e.g. 100.5) in every row.

Expected output (shape): A DataFrame with 4 rows, 5 columns (open, high, low, close, close_mean).

Hint

Use pd.Series(close_prices, index=index). Build the DataFrame from a dict of lists and pass index=.... For close_mean, assign the scalar df['close'].mean().

Show Solution

import pandas as pd

close_prices = [101.0, 102.0, 99.0, 100.0]
index = ['Mon', 'Tue', 'Wed', 'Thu']
open_  = [100, 101, 102, 99]
high   = [102, 103, 104, 101]
low    = [98, 100, 97, 100]

s = pd.Series(close_prices, index=index)
df = pd.DataFrame({'open': open_, 'high': high, 'low': low, 'close': close_prices}, index=index)
df['close_mean'] = df['close'].mean()
print(df)

Question 2

Task: Given a DataFrame with a close column and a datetime index, add a column close_lag1 that contains the previous bar’s close. First row should have NaN. Use the exact input below.

Input:

import pandas as pd
df = pd.DataFrame(
    {'close': [100, 101, 102, 99, 100]},
    index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'])
)

Test cases:

close_lag1 on the first date is NaN.
On 2024-01-02, close_lag1 is 100.
On 2024-01-05, close_lag1 is 99.

Expected output: Same DataFrame with an extra column close_lag1; values aligned by date.

Hint

Use a method that shifts values down by one row so the “previous” value appears in the current row. No loops needed.

Show Solution

df['close_lag1'] = df['close'].shift(1)
print(df)
# close  close_lag1
# 2024-01-01   100         NaN
# 2024-01-02   101       100.0
# 2024-01-03   102       101.0
# 2024-01-04    99       102.0
# 2024-01-05   100        99.0

Question 3

Task: Using the same df as in Question 2 (with close and datetime index), add two columns: returns (percent change of close) and sma_2 (2-period simple moving average of close). Match the test cases below.

Input: The df from Question 2 after adding close_lag1 (or start fresh with the same close and index).

Test cases:

First row: returns is NaN, sma_2 is NaN.
Second row: returns ≈ (101−100)/100 = 0.01; sma_2 = (100+101)/2 = 100.5.
Third row: sma_2 = (101+102)/2 = 101.5.

Expected output: DataFrame with columns close, close_lag1 (optional), returns, sma_2.

Hint

Percent change from previous row: use the method that computes (current − previous) / previous. For a 2-period average over close, use a rolling window of size 2 and take the mean.

Show Solution

df['returns'] = df['close'].pct_change()
df['sma_2'] = df['close'].rolling(2).mean()
print(df)

Next steps#

In 1.2 — Indexing and selection we cover how to select rows by date, slice ranges, and select columns. In 1.3 — Time series and DatetimeIndex we set up a proper date index for backtesting.

1.1 — Series and DataFrame Fundamentals#

Series#

DataFrame#

OHLC and OHLCV: why we care#

SMA: what it is and why we care#

Adding and overwriting columns#

.shift(periods) — lag (or lead) values#

.rolling(window) — rolling windows#

.pct_change(periods=1) — percent change (returns)#

Key takeaways for backtesting#

Quiz time#

Next steps#

`.shift(periods)` — lag (or lead) values#

`.rolling(window)` — rolling windows#

`.pct_change(periods=1)` — percent change (returns)#