# 0.3 — Python and NumPy Setup for Probability Work
This lesson covers setting up your Python environment with the essential libraries for probability and statistical work in quantitative finance.
## Prerequisites
You’ll need Python 3.8 or higher. Check your version:

```bash
python --version
# or
python3 --version
```
## Core Libraries
For probability work, we’ll use:

- **NumPy**: numerical computing, arrays, and basic probability functions (see the NumPy Documentation)
- **SciPy**: advanced statistical functions and distributions (see the SciPy Documentation)
- **Matplotlib**: visualization and plotting (see the Matplotlib Documentation)
- **pandas**: data manipulation, for time series work (see the pandas Documentation)
- **Jupyter**: interactive notebooks, recommended (see the Jupyter Documentation)
## Installation

### Using pip (Recommended)

Install all essential packages at once:

```bash
pip install numpy scipy matplotlib pandas jupyter
```

Or if using Python 3 specifically:

```bash
pip3 install numpy scipy matplotlib pandas jupyter
```
### Using conda

If you’re using Anaconda or Miniconda:

```bash
conda install numpy scipy matplotlib pandas jupyter
```

Conda is a good fit for data science work because it handles binary dependencies well.
## Verifying Installation

Create a test script `test_setup.py`:

```python
import numpy as np
import scipy
import matplotlib
import pandas as pd

print(f"NumPy version: {np.__version__}")
print(f"SciPy version: {scipy.__version__}")
print(f"Matplotlib version: {matplotlib.__version__}")
print(f"Pandas version: {pd.__version__}")

# Test NumPy
arr = np.array([1, 2, 3, 4, 5])
print(f"\nNumPy array: {arr}")
print(f"Mean: {np.mean(arr)}")

# Test SciPy
from scipy import stats
print(f"\nSciPy stats available: {hasattr(stats, 'norm')}")

print("\n✅ All libraries installed successfully!")
```

Run it:

```bash
python test_setup.py
```
## NumPy Basics for Probability
NumPy is our primary tool for probability work. Here’s a quick overview:
### Creating Arrays

```python
import numpy as np

# From a list
arr = np.array([1, 2, 3, 4, 5])

# Random numbers (uniform distribution on [0, 1))
random_uniform = np.random.uniform(0, 1, size=10)

# Random numbers (normal distribution)
random_normal = np.random.normal(0, 1, size=10)  # mean=0, std=1

# Random integers (dice rolls: 1..6, upper bound exclusive)
random_ints = np.random.randint(1, 7, size=10)
```
### Statistical Functions

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Basic statistics
mean = np.mean(data)
median = np.median(data)
std = np.std(data)
variance = np.var(data)

# Percentiles
percentile_25 = np.percentile(data, 25)
percentile_75 = np.percentile(data, 75)
```
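One detail worth flagging: `np.std` and `np.var` divide by `n` by default (the population formulas). When your data are a sample and you want the unbiased estimator, pass `ddof=1`:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Population standard deviation: divides by n (NumPy's default, ddof=0)
pop_std = np.std(data)

# Sample standard deviation: divides by n - 1 (Bessel's correction)
sample_std = np.std(data, ddof=1)

print(f"Population std: {pop_std:.4f}")  # 2.8723
print(f"Sample std: {sample_std:.4f}")   # 3.0277
```

The distinction matters for small samples; for the large simulated datasets we’ll generate later, the two values are nearly identical.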
### Random Number Generation

```python
import numpy as np

# Set seed for reproducibility
np.random.seed(42)

# Uniform distribution on [0, 1)
uniform = np.random.rand(10)

# Normal distribution
normal = np.random.randn(10)                     # mean=0, std=1
normal_custom = np.random.normal(5, 2, size=10)  # mean=5, std=2

# Discrete uniform (dice)
dice = np.random.randint(1, 7, size=10)
```
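Note that `np.random.seed` and the module-level functions above use NumPy’s legacy global random state. Since NumPy 1.17, the NumPy documentation recommends a local `Generator` instead, which keeps reproducibility contained to one object. A minimal sketch of the same draws with the newer API:

```python
import numpy as np

# A seeded Generator: reproducible, and independent of the global state
rng = np.random.default_rng(42)

uniform = rng.uniform(0, 1, size=10)  # uniform on [0, 1)
normal = rng.normal(5, 2, size=10)    # mean=5, std=2
dice = rng.integers(1, 7, size=10)    # dice rolls: 1..6, upper bound exclusive

print(dice)
```

Either style works for this course; just avoid mixing the two in one script, since they draw from different streams.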
## SciPy for Probability Distributions

SciPy provides probability distributions and statistical functions:

```python
from scipy import stats

# Normal distribution with mean=0, std=1
norm_dist = stats.norm(loc=0, scale=1)

# Probability density function (PDF)
pdf_value = norm_dist.pdf(0)      # density at x=0

# Cumulative distribution function (CDF)
cdf_value = norm_dist.cdf(1.96)   # P(X <= 1.96)

# Percent point function (inverse CDF)
ppf_value = norm_dist.ppf(0.95)   # 95th percentile

# Random samples
samples = norm_dist.rvs(size=1000)  # 1000 random samples
```
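As a quick sanity check on these methods: `cdf` and `ppf` are inverses of each other, and differences of CDF values give interval probabilities. For a standard normal, about 95% of the mass lies within ±1.96 standard deviations:

```python
from scipy import stats

norm_dist = stats.norm(loc=0, scale=1)

# P(-1.96 <= X <= 1.96) as a difference of CDF values
central_prob = norm_dist.cdf(1.96) - norm_dist.cdf(-1.96)
print(f"P(|X| <= 1.96) = {central_prob:.4f}")  # ~0.9500

# ppf undoes cdf
x = norm_dist.ppf(norm_dist.cdf(1.5))
print(f"ppf(cdf(1.5)) = {x:.4f}")
```

This ±1.96 interval is exactly where the familiar 95% confidence interval comes from, and we’ll lean on it repeatedly later.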
## Development Environment

### Option 1: Jupyter Notebooks (Recommended)

Jupyter is excellent for interactive probability work:

```bash
# Start Jupyter
jupyter notebook

# Or use JupyterLab (enhanced version)
jupyter lab
```
Advantages:
- Interactive code execution
- Mix code, text, and visualizations
- Great for experimentation
- Easy to share and document
### Option 2: VS Code

VS Code with the Python extension provides:
- Full IDE features
- Integrated terminal
- Jupyter notebook support
- Debugging capabilities
### Option 3: PyCharm

A professional Python IDE with excellent scientific computing support.
## Virtual Environments (Recommended)

Using a virtual environment keeps your project dependencies isolated:

```bash
# Create virtual environment
python -m venv probability_env

# Activate (Windows)
probability_env\Scripts\activate

# Activate (macOS/Linux)
source probability_env/bin/activate

# Install packages
pip install numpy scipy matplotlib pandas jupyter

# Deactivate when done
deactivate
```
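Once the environment is set up, a common convention (not required for this course) is to record the installed versions in a `requirements.txt` file so the setup can be reproduced on another machine:

```bash
# Save the exact versions installed in the active environment
pip freeze > requirements.txt

# Later, recreate the same environment from the file
pip install -r requirements.txt
```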
## Quick Start Example

Here’s a simple example combining everything:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Generate random data from a normal distribution
np.random.seed(42)
data = np.random.normal(100, 15, size=1000)  # mean=100, std=15

# Calculate statistics
mean = np.mean(data)
std = np.std(data)
print(f"Mean: {mean:.2f}, Std: {std:.2f}")

# Create a normal distribution object with the sample statistics
norm_dist = stats.norm(loc=mean, scale=std)

# Plot histogram and theoretical PDF
plt.hist(data, bins=30, density=True, alpha=0.7, label='Data')
x = np.linspace(data.min(), data.max(), 100)
plt.plot(x, norm_dist.pdf(x), 'r-', label='Theoretical PDF')
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Normal Distribution Example')
plt.legend()
plt.show()
```
## Common Issues and Solutions

**Issue:** `ModuleNotFoundError: No module named 'numpy'`

- Solution: install NumPy with `pip install numpy`
- Check that you’re using the correct Python interpreter

**Issue:** Import errors in Jupyter

- Solution: make sure you installed packages in the same environment where Jupyter is running
- Try: `python -m pip install numpy scipy matplotlib pandas`

**Issue:** Permission errors

- Solution: use `pip install --user numpy`, or use a virtual environment

**Issue:** Version conflicts

- Solution: use a virtual environment to isolate dependencies
## What’s Next?
Now that you have Python and the essential libraries set up, we’re ready to dive into probability theory! In Chapter 1, we’ll start with basic probability concepts: sample spaces, events, and probability axioms.