0.3 — Python and NumPy Setup for Probability Work

📅 January 15, 2025

This lesson covers setting up your Python environment with the essential libraries for probability and statistical work in quantitative finance.

Prerequisites

You’ll need Python 3.8 or higher. Check your version:

python --version
# or
python3 --version
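
The same check can be made from inside Python, which helps when several interpreters are installed. This is a minimal sketch; the name MIN_VERSION is only illustrative and not part of any library:

```python
import sys

# Minimum version assumed by this lesson (Python 3.8)
MIN_VERSION = (3, 8)

# sys.version_info compares like a tuple: (major, minor, micro, ...)
is_supported = sys.version_info >= MIN_VERSION

print(f"Running Python {sys.version_info.major}.{sys.version_info.minor}")
print(f"Meets the 3.8 minimum: {is_supported}")
```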

Core Libraries

For probability work, we’ll use:

NumPy: Numerical computing, arrays, and basic probability functions - NumPy Documentation
SciPy: Advanced statistical functions and distributions - SciPy Documentation
Matplotlib: Visualization and plotting - Matplotlib Documentation
pandas: Data manipulation (for time series work) - pandas Documentation
Jupyter: Interactive notebooks (recommended) - Jupyter Documentation

Installation

Using pip (Recommended)

Install all essential packages at once:

pip install numpy scipy matplotlib pandas jupyter

Or, if pip on your system is tied to a different Python installation, use pip3:

pip3 install numpy scipy matplotlib pandas jupyter

Using conda

If you’re using Anaconda or Miniconda:

conda install numpy scipy matplotlib pandas jupyter

Conda suits data science work because it resolves non-Python dependencies (such as compiled numerical libraries) together with the Python packages.

Verifying Installation

Create a test script test_setup.py:

import numpy as np
import scipy
import matplotlib
import pandas as pd

print(f"NumPy version: {np.__version__}")
print(f"SciPy version: {scipy.__version__}")
print(f"Matplotlib version: {matplotlib.__version__}")
print(f"Pandas version: {pd.__version__}")

# Test NumPy
arr = np.array([1, 2, 3, 4, 5])
print(f"\nNumPy array: {arr}")
print(f"Mean: {np.mean(arr)}")

# Test SciPy
from scipy import stats
print(f"\nSciPy stats available: {hasattr(stats, 'norm')}")

print("\n✅ All libraries installed successfully!")

Run it:

python test_setup.py
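
If the script stops at the first failed import, it will not tell you which other packages are also missing. The following sketch, which is an alternative and not part of the lesson's test script, checks every package without importing it. The list name required is just an illustration:

```python
import importlib.util

# Import names of the libraries this lesson installs
required = ["numpy", "scipy", "matplotlib", "pandas"]

# find_spec returns None when a top-level package cannot be located
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print(f"Missing packages: {', '.join(missing)}")
else:
    print("All required packages were found.")
```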

NumPy Basics for Probability

NumPy is our primary tool for probability work. Here’s a quick overview:

Creating Arrays

import numpy as np

# From a list
arr = np.array([1, 2, 3, 4, 5])

# Random numbers (uniform distribution)
random_uniform = np.random.uniform(0, 1, size=10)

# Random numbers (normal distribution)
random_normal = np.random.normal(0, 1, size=10)  # mean=0, std=1

# Random integers
random_ints = np.random.randint(1, 7, size=10)  # dice rolls: integers 1..6 (upper bound 7 is excluded)
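
The np.random.* functions above belong to NumPy's legacy interface. NumPy's documentation recommends the Generator interface for new code, so here is a sketch of the same draws written that way. The seed 42 and the variable names are arbitrary choices for illustration:

```python
import numpy as np

# A Generator with a fixed seed; 42 is an arbitrary choice
rng = np.random.default_rng(42)

uniform_draws = rng.uniform(0, 1, size=10)  # uniform on [0, 1)
normal_draws = rng.normal(0, 1, size=10)    # mean=0, std=1
dice_rolls = rng.integers(1, 7, size=10)    # integers 1..6 (upper bound excluded)

print(dice_rolls)
```

Both interfaces draw from the same distributions, but they produce different number streams for the same seed, so results are not interchangeable between them.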

Statistical Functions

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Basic statistics
mean = np.mean(data)
median = np.median(data)
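std = np.std(data)       # population std (ddof=0 by default); pass ddof=1 for the sample std
variance = np.var(data)  # population variance (ddof=0 by default)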

# Percentiles
percentile_25 = np.percentile(data, 25)
percentile_75 = np.percentile(data, 75)
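
np.std and np.var divide by n unless told otherwise, while sample statistics usually divide by n - 1. A short sketch of the difference on the same data, using the ddof parameter:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Default ddof=0: divide by n (population formula)
pop_var = np.var(data)

# ddof=1: divide by n - 1 (unbiased sample variance)
sample_var = np.var(data, ddof=1)
sample_std = np.std(data, ddof=1)

print(f"Population variance: {pop_var:.4f}")    # 8.2500
print(f"Sample variance:     {sample_var:.4f}")  # 9.1667
```

Note that pandas defaults to ddof=1 in its std and var methods, so the two libraries give different answers on the same data unless ddof is set explicitly.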

Random Number Generation

# Set seed for reproducibility
np.random.seed(42)

# Uniform distribution [0, 1)
uniform = np.random.rand(10)

# Normal distribution
normal = np.random.randn(10)  # mean=0, std=1
normal_custom = np.random.normal(5, 2, size=10)  # mean=5, std=2

# Discrete uniform (dice)
dice = np.random.randint(1, 7, size=10)  # integers 1..6 (upper bound 7 is excluded)
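
To show what these generators are for, here is a small Monte Carlo sketch that estimates the probability of two dice summing to 7 and compares it with the exact value 6/36. The trial count and seed are arbitrary assumptions:

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, for repeatable output

n_trials = 100_000

# Roll two dice per trial; randint excludes the upper bound, so 7 gives 1..6
rolls = np.random.randint(1, 7, size=(n_trials, 2))
totals = rolls.sum(axis=1)

# Fraction of trials where the two dice sum to 7
estimate = np.mean(totals == 7)
exact = 6 / 36

print(f"Simulated: {estimate:.4f}, exact: {exact:.4f}")
```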

SciPy for Probability Distributions

SciPy provides probability distributions and statistical functions:

from scipy import stats

# Normal distribution
norm_dist = stats.norm(loc=0, scale=1)  # mean=0, std=1

# Probability density function (PDF)
pdf_value = norm_dist.pdf(0)  # density at x=0, about 0.3989

# Cumulative distribution function (CDF)
cdf_value = norm_dist.cdf(1.96)  # P(X <= 1.96), about 0.975

# Percent point function (inverse CDF)
ppf_value = norm_dist.ppf(0.95)  # 95th percentile, about 1.645

# Random samples
samples = norm_dist.rvs(size=1000)  # 1000 random samples
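
The methods above are related to one another, and checking those relationships is a quick way to build intuition. A short sketch using the same standard normal distribution:

```python
from scipy import stats

norm_dist = stats.norm(loc=0, scale=1)

# ppf undoes cdf: ppf(cdf(x)) returns x
round_trip = norm_dist.ppf(norm_dist.cdf(1.96))

# Probability of landing within 1.96 standard deviations of the mean
central_prob = norm_dist.cdf(1.96) - norm_dist.cdf(-1.96)

# Survival function: sf(x) = 1 - cdf(x), i.e. P(X > x)
upper_tail = norm_dist.sf(1.96)

print(f"ppf(cdf(1.96)) = {round_trip:.4f}")           # 1.9600
print(f"P(-1.96 <= X <= 1.96) = {central_prob:.4f}")  # 0.9500
print(f"P(X > 1.96) = {upper_tail:.4f}")              # 0.0250
```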

Development Environment

Option 1: Jupyter Notebooks (Recommended)

Jupyter is excellent for interactive probability work:

# Start Jupyter
jupyter notebook

# Or use JupyterLab (enhanced version)
jupyter lab

Advantages:

Interactive code execution
Mix code, text, and visualizations
Great for experimentation
Easy to share and document

Option 2: VS Code

VS Code with the Python extension provides:

Full IDE features
Integrated terminal
Jupyter notebook support
Debugging capabilities

Option 3: PyCharm

PyCharm is a dedicated Python IDE with good support for scientific computing.

Virtual Environments (Recommended)

Using a virtual environment keeps your project dependencies isolated:

# Create virtual environment
python -m venv probability_env

# Activate (Windows)
probability_env\Scripts\activate

# Activate (macOS/Linux)
source probability_env/bin/activate

# Install packages
pip install numpy scipy matplotlib pandas jupyter

# Deactivate when done
deactivate
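
To confirm from Python that the environment is active, you can compare sys.prefix with sys.base_prefix. This is a minimal sketch that applies to environments created with venv; conda environments are not detected this way:

```python
import sys

# Inside a venv, sys.prefix points at the environment, while
# sys.base_prefix points at the Python installation it was created from
in_venv = sys.prefix != sys.base_prefix

print(f"Environment prefix: {sys.prefix}")
print(f"Running inside a virtual environment: {in_venv}")
```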

Quick Start Example

Here’s a simple example combining everything:

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Generate random data from normal distribution
np.random.seed(42)
data = np.random.normal(100, 15, size=1000)  # mean=100, std=15

# Calculate statistics
mean = np.mean(data)
std = np.std(data)
print(f"Mean: {mean:.2f}, Std: {std:.2f}")

# Create a normal distribution object fitted to the sample mean and std
norm_dist = stats.norm(loc=mean, scale=std)

# Plot histogram and the fitted normal PDF
plt.hist(data, bins=30, density=True, alpha=0.7, label='Data')
x = np.linspace(data.min(), data.max(), 100)
plt.plot(x, norm_dist.pdf(x), 'r-', label='Fitted normal PDF')
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Normal Distribution Example')
plt.legend()
plt.show()
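
A numeric check can complement the plot. The sketch below regenerates the same data and compares the fraction of samples within one standard deviation of the mean against the theoretical value for a normal distribution, about 0.6827. The variable names are illustrative:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
data = np.random.normal(100, 15, size=1000)  # same setup as the example above

mean = np.mean(data)
std = np.std(data)

# Empirical fraction of samples within one standard deviation of the mean
within_one_std = np.mean(np.abs(data - mean) <= std)

# Theoretical value for any normal distribution (about 0.6827)
theoretical = stats.norm.cdf(1) - stats.norm.cdf(-1)

print(f"Empirical: {within_one_std:.3f}, theoretical: {theoretical:.3f}")
```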

Common Issues and Solutions

Issue: ModuleNotFoundError: No module named 'numpy'

Solution: Install NumPy with pip install numpy, and check that the interpreter running your code is the one you installed it into.
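
One way to see which interpreter is running, and where it looks for packages, is to print sys.executable and sys.path. A minimal sketch:

```python
import sys

# Full path of the interpreter executing this code
print(sys.executable)

# Directories searched for imports; installed packages must live in one of them
for path in sys.path:
    print(path)
```

If the printed interpreter path is not the one you expected, run pip through that interpreter (python -m pip install numpy) so the package lands in the right place.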

Issue: Import errors in Jupyter

Solution: Install the packages into the same environment that the Jupyter kernel runs in.
Try, from that environment: python -m pip install numpy scipy matplotlib pandas
Or, from a notebook cell: %pip install numpy scipy matplotlib pandas

Issue: Permission errors

Solution: Use pip install --user numpy or use a virtual environment

Issue: Version conflicts

Solution: Use a virtual environment to isolate dependencies

What’s Next?

Now that you have Python and the essential libraries set up, we’re ready to dive into probability theory! In Chapter 1, we’ll start with basic probability concepts: sample spaces, events, and probability axioms.