1.2 — Indexing and Selection
1.2 — Indexing and Selection#
To implement a backtester you need to select specific rows (e.g. by date or date range) and columns (e.g. close, or your signal column). Pandas provides label-based indexing (.loc), position-based indexing (.iloc), and boolean indexing. Use these to walk through time and read only the data you need.
Used in: Project 1 — Backtesting Engine: iterating over dates, selecting current and past data without look-ahead.
Selecting columns#
- Single column → Series:
df['close']ordf.close - Multiple columns → DataFrame:
df[['open', 'high', 'low', 'close']] - New column (add or overwrite):
df['sma'] = df['close'].rolling(10).mean()
Selecting rows: .loc and .iloc#
.loc[index]— access by label (e.g. a date). Use this when your index is dates:df.loc['2024-01-15'], or a slicedf.loc['2024-01-01':'2024-01-31']..iloc[i]— access by integer position (0-based).df.iloc[0]is the first row,df.iloc[-1]the last. Slices:df.iloc[10:20]gives rows 10–19.
For a backtest you often loop by date (using .loc and the DatetimeIndex) or by integer position (.iloc) so you can use df.iloc[:i] for “all data up to (but not including) current bar” and avoid look-ahead.
Boolean indexing (filtering rows)#
Select rows where a condition is true:
# Rows where close > 100
df[df['close'] > 100]
# Rows where short_ma > long_ma (e.g. for crossover)
df[df['sma_short'] > df['sma_long']]
You can combine conditions with & (and), | (or), and parentheses. Result is a DataFrame of matching rows.
Slicing with a DatetimeIndex#
Once your index is dates (see 1.3 — Time series and DatetimeIndex):
- Single date:
df.loc['2024-06-15'] - Range (inclusive on both ends):
df.loc['2024-01-01':'2024-12-31'] - “All rows up to date X”: use
.locslice or get the position and usedf.iloc[:i+1]for the firsti+1rows.
Avoiding look-ahead in backtests#
When you compute indicators or signals at time , use only data from times . In practice:
- Compute indicators with methods that use only past data (e.g.
.rolling(n).mean()uses the current and previous bars). - When iterating, for each bar use
df.iloc[:i+1]or the equivalent date slice so you never pass future rows to your logic.
Quiz time#
Question 1
What is the difference between .loc and .iloc? When would you use each in a backtest?
Show Solution
.loc = by label (e.g. date). .iloc = by integer position. In a backtest, use .loc for date ranges; use .iloc for “first N rows” so you never use future data. Run this:
import pandas as pd
df = pd.DataFrame(
{'close': [98, 101, 102, 99, 100]},
index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'])
)
# .loc: by label (date)
row = df.loc['2024-01-03']
slice_by_date = df.loc['2024-01-02':'2024-01-04']
print('By date slice:\n', slice_by_date)
# .iloc: by position (for backtest loop: "past and current only")
i = 2 # current bar
past_and_current = df.iloc[:i+1]
print('Past and current (i=2):\n', past_and_current)
Question 2
How would you select all rows where close is greater than 100 and volume is greater than 1 million?
Show Solution
Use boolean indexing with & and parentheses. Full example:
import pandas as pd
df = pd.DataFrame({
'close': [99, 101, 102, 98, 103],
'volume': [2e6, 1.5e6, 0.8e6, 1.1e6, 1.2e6],
})
# Correct: combine conditions with & and parentheses
subset = df[(df['close'] > 100) & (df['volume'] > 1_000_000)]
print(subset)
# close volume
# 1 101 1500000.0
# 4 103 1200000.0 (row 2 has volume 0.8e6, so excluded when volume > 1e6)
# Wrong (operator precedence): df['close'] > 100 & df['volume'] > 1_000_000
Question 3
Why is df.iloc[:i+1] (and not df.iloc[:i] or df.iloc[:i+2]) the right slice for “all data up to and including the current bar” at iteration i? Demonstrate with a small loop.
Show Solution
:i+1 gives rows 0..i (includes current bar). :i excludes current bar; :i+2 includes the next bar (look-ahead). Run this:
import pandas as pd
df = pd.DataFrame({'close': [10, 11, 12, 13, 14]})
for i in range(2, len(df)):
# Correct: data up to and including bar i
past_and_current = df.iloc[:i+1]
# Wrong: df.iloc[:i] -> excludes row i
# Wrong: df.iloc[:i+2] -> includes row i+1 (future)
print(f"Bar i={i}, rows used (indices 0..{i}): {past_and_current['close'].tolist()}")
# Bar i=2, rows used (indices 0..2): [10, 11, 12]
# Bar i=3, rows used (indices 0..3): [10, 11, 12, 13]
# Bar i=4, rows used (indices 0..4): [10, 11, 12, 13, 14]
Next steps#
1.3 — Time series and DatetimeIndex shows how to set a DatetimeIndex and align data to a single timezone—required for reliable backtests.