1.1 — Series and DataFrame Fundamentals

1.1 — Series and DataFrame Fundamentals#

Pandas builds on two core structures: Series (one-dimensional) and DataFrame (two-dimensional). For backtesting and market data, you will almost always work with DataFrames where each row is a timestamp and each column is a price series, indicator, or signal. This lesson gives you what you need to create and manipulate these structures.

Used in: Project 1 — Backtesting Engine: storing OHLCV data and indicator columns in a DataFrame.

Series#

A Series is a one-dimensional labeled array. Labels form the index; the values can be any type (often numeric).

import pandas as pd

# Series from a list (default integer index)
prices = pd.Series([100.0, 101.5, 99.2, 102.1])
print(prices)
# 0    100.0
# 1    101.5
# 2     99.2
# 3    102.1

# Series with explicit index (e.g. dates or labels)
prices = pd.Series([100.0, 101.5, 99.2], index=['Mon', 'Tue', 'Wed'])
print(prices)

You can do element-wise math and use .mean(), .std(), .sum() etc. Series align by index when you combine them—important when we add indicator columns to a price Series.

DataFrame#

A DataFrame is a table: rows and columns, each column a Series sharing the same index. For market data, think one row per date, one column per series (Open, High, Low, Close, Volume, then your indicators).

# DataFrame from a dict of lists (each key = column name)
df = pd.DataFrame({
    'open':   [100, 101, 99],
    'high':   [102, 103, 100],
    'low':    [98, 100, 97],
    'close':  [101, 102, 99],
})
#    open  high  low  close
# 0   100   102   98    101
# 1   101   103  100    102
# 2    99   100   97     99

df

OHLC and OHLCV: why we care#

OHLC stands for Open, High, Low, Close—the four standard prices for a single bar (e.g. one day or one hour):

OHLCV adds Volume (number of shares or contracts traded). In pandas we store these as columns: one row per bar, columns open, high, low, close, and optionally volume.

Why it matters: almost all market data and backtesting assumes this shape. Exchanges and data vendors provide OHLC(V); indicators (e.g. moving averages) usually use close; strategies often compare high/low to open/close. Keeping this layout keeps your code compatible with the rest of the quant ecosystem.

SMA: what it is and why we care#

SMA means Simple Moving Average: the average of the last (n) values. For example, a 10-period SMA of close is the mean of the last 10 closing prices.

In general pandas terms, “SMA” is just one use of a rolling mean; the same rolling machinery is used for rolling standard deviation, min, max, etc.

Adding and overwriting columns#

You add indicator or signal columns by assigning to a new column name. Pandas aligns by index, so if you assign a Series with a matching index, values line up by date.

Three functions show up constantly for time-series and backtesting. It’s worth knowing what each does and when to use it.

.shift(periods) — lag (or lead) values#

df['close_lag1'] = df['close'].shift(1)   # previous bar's close

.rolling(window) — rolling windows#

df['sma_10'] = df['close'].rolling(10).mean()

.pct_change(periods=1) — percent change (returns)#

df['returns'] = df['close'].pct_change()

Putting them together:

df['close_lag1'] = df['close'].shift(1)   # previous day's close
df['returns']    = df['close'].pct_change() # simple daily returns
df['sma_2']      = df['close'].rolling(2).mean()  # 2-bar SMA (for illustration)

This pattern—one row per date, new columns for indicators and signals—is exactly what you use in a backtest: load OHLCV, then add columns for moving averages, signals, etc.

Key takeaways for backtesting#

Quiz time#

Practice by writing code that satisfies the given inputs and test cases. Try it yourself before opening the solution or hints.


Question 1

Task: Given a list of closing prices, create a Series with labels 'Mon', 'Tue', 'Wed', 'Thu', then create a DataFrame with columns open, high, low, close for the same four days, using the same index. Add a column that is the mean of the close Series (single number broadcast). Your code should use the exact inputs below.

Input:

close_prices = [101.0, 102.0, 99.0, 100.0]
index = ['Mon', 'Tue', 'Wed', 'Thu']
# open, high, low for each day (you can define these as lists)
open_  = [100, 101, 102, 99]
high   = [102, 103, 104, 101]
low    = [98, 100, 97, 100]

Test cases:

  1. The Series has index ['Mon','Tue','Wed','Thu'] and values [101, 102, 99, 100].
  2. The DataFrame has columns open, high, low, close and the same index.
  3. A new column close_mean equals the scalar mean of close (e.g. 100.5) in every row.

Expected output (shape): A DataFrame with 4 rows, 5 columns (open, high, low, close, close_mean).

Hint

Use pd.Series(close_prices, index=index). Build the DataFrame from a dict of lists and pass index=.... For close_mean, assign the scalar df['close'].mean().

Show Solution
import pandas as pd

close_prices = [101.0, 102.0, 99.0, 100.0]
index = ['Mon', 'Tue', 'Wed', 'Thu']
open_  = [100, 101, 102, 99]
high   = [102, 103, 104, 101]
low    = [98, 100, 97, 100]

s = pd.Series(close_prices, index=index)
df = pd.DataFrame({'open': open_, 'high': high, 'low': low, 'close': close_prices}, index=index)
df['close_mean'] = df['close'].mean()
print(df)

Question 2

Task: Given a DataFrame with a close column and a datetime index, add a column close_lag1 that contains the previous bar’s close. First row should have NaN. Use the exact input below.

Input:

import pandas as pd
df = pd.DataFrame(
    {'close': [100, 101, 102, 99, 100]},
    index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'])
)

Test cases:

  1. close_lag1 on the first date is NaN.
  2. On 2024-01-02, close_lag1 is 100.
  3. On 2024-01-05, close_lag1 is 99.

Expected output: Same DataFrame with an extra column close_lag1; values aligned by date.

Hint

Use a method that shifts values down by one row so the “previous” value appears in the current row. No loops needed.

Show Solution
df['close_lag1'] = df['close'].shift(1)
print(df)
# close  close_lag1
# 2024-01-01   100         NaN
# 2024-01-02   101       100.0
# 2024-01-03   102       101.0
# 2024-01-04    99       102.0
# 2024-01-05   100        99.0

Question 3

Task: Using the same df as in Question 2 (with close and datetime index), add two columns: returns (percent change of close) and sma_2 (2-period simple moving average of close). Match the test cases below.

Input: The df from Question 2 after adding close_lag1 (or start fresh with the same close and index).

Test cases:

  1. First row: returns is NaN, sma_2 is NaN.
  2. Second row: returns ≈ (101−100)/100 = 0.01; sma_2 = (100+101)/2 = 100.5.
  3. Third row: sma_2 = (101+102)/2 = 101.5.

Expected output: DataFrame with columns close, close_lag1 (optional), returns, sma_2.

Hint

Percent change from previous row: use the method that computes (current − previous) / previous. For a 2-period average over close, use a rolling window of size 2 and take the mean.

Show Solution
df['returns'] = df['close'].pct_change()
df['sma_2'] = df['close'].rolling(2).mean()
print(df)

Next steps#

In 1.2 — Indexing and selection we cover how to select rows by date, slice ranges, and select columns. In 1.3 — Time series and DatetimeIndex we set up a proper date index for backtesting.