📊 Core Topics in Probability & Statistics
Explore the fundamental concepts that form the foundation of modern data science and statistical analysis.
📖 What is Probability Theory? The Mathematics of Uncertainty
Probability theory provides the mathematical framework for quantifying uncertainty. It is the foundation of statistics, machine learning, risk assessment, and decision-making under uncertainty. From predicting weather patterns to evaluating investment risks, probability theory enables us to make informed decisions in the face of incomplete information.
The Axioms of Probability
Modern probability theory rests on three fundamental axioms established by Andrey Kolmogorov in the 1930s:
- Non-negativity: P(E) ≥ 0 for any event E
- Unit measure: P(Ω) = 1 for the entire sample space
- Countable additivity: For mutually exclusive events, P(∪Eᵢ) = Σ P(Eᵢ)
Conditional Probability and Independence
Conditional probability P(A|B) measures the probability of event A occurring given that B has occurred. Two events are independent if P(A∩B) = P(A)P(B), meaning the occurrence of one provides no information about the other. Bayes' theorem provides a powerful way to update probabilities based on new evidence:
P(A|B) = P(B|A) · P(A) / P(B)
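Bayes' theorem is easy to check numerically. The sketch below applies it to a diagnostic-testing scenario; the prevalence, sensitivity, and false-positive rate are hypothetical numbers chosen for illustration:

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false-positive rate.
def bayes_posterior(prior, sensitivity, false_positive_rate):
    """P(hypothesis | positive evidence) via Bayes' theorem."""
    p_evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_evidence

posterior = bayes_posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
# A positive result raises P(disease) from 1% to about 16% -- not to 95%.
```

The surprisingly low posterior is the classic base-rate effect: when the prior is small, even an accurate test produces mostly false positives.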
📈 Random Variables and Probability Distributions
Discrete Random Variables
A discrete random variable takes on countable values. Key distributions include:
- Bernoulli: Single trial with success probability p (coin flip)
- Binomial: Number of successes in n independent trials
- Poisson: Number of events in fixed interval (rare events, queue lengths)
- Geometric: Number of trials until first success
- Negative Binomial: Number of trials until r successes
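A quick sketch of these discrete distributions using scipy.stats (mentioned in the resources below); the scenarios are illustrative:

```python
from scipy.stats import binom, geom, poisson

# Binomial: P(exactly 5 heads in 10 fair coin flips)
p_five_heads = binom.pmf(5, n=10, p=0.5)      # 252/1024 ~ 0.246

# Poisson: P(zero events in an interval that averages 2 events)
p_zero_events = poisson.pmf(0, mu=2)          # e**-2 ~ 0.135

# Geometric: P(first success occurs on trial 3) with p = 0.5
p_first_on_3 = geom.pmf(3, p=0.5)             # (1-p)**2 * p = 0.125
```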
Continuous Random Variables
Continuous random variables take values over an interval. The probability density function (PDF) gives the relative likelihood, and the cumulative distribution function (CDF) gives P(X ≤ x).
- Uniform: Constant probability over [a,b]
- Normal (Gaussian): The bell curve—most important distribution in statistics
- Exponential: Waiting times, memoryless property
- Gamma: Generalization of exponential, waiting times for multiple events
- Beta: Probabilities of probabilities, conjugate prior for binomial
- Chi-square: Sum of squared standard normals, used in hypothesis testing
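Two of these properties can be verified directly with scipy.stats: the normal "68% rule" and the exponential's memoryless property (the rate 0.5 and the times s = 2, t = 3 are arbitrary choices):

```python
from scipy.stats import expon, norm

# Normal: the familiar 68% rule -- P(-1 < Z < 1) for a standard normal
p_within_1sd = norm.cdf(1) - norm.cdf(-1)     # ~0.6827

# Exponential (rate 0.5, so scale = 1/rate = 2): memoryless property,
# P(X > s + t | X > s) equals P(X > t) for any s, t >= 0.
lhs = expon.sf(2.0 + 3.0, scale=2.0) / expon.sf(2.0, scale=2.0)
rhs = expon.sf(3.0, scale=2.0)
```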
📉 Descriptive Statistics: Summarizing Data
Measures of Central Tendency
- Mean (μ or x̄): The arithmetic average—sensitive to outliers
- Median: The middle value—robust to outliers
- Mode: The most frequent value—useful for categorical data
Measures of Dispersion
- Variance (σ²): Average squared deviation from the mean
- Standard Deviation (σ): Square root of variance—same units as data
- Interquartile Range (IQR): Range between first and third quartiles
- Range: Maximum minus minimum
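All of these summaries are available in Python's standard-library statistics module; the small sample below is hypothetical:

```python
import statistics as st

data = [2, 4, 4, 4, 5, 5, 7, 9]          # small hypothetical sample

mean   = st.mean(data)                   # 5.0 -- pulled toward the large value 9
median = st.median(data)                 # 4.5 -- robust to that value
mode   = st.mode(data)                   # 4, the most frequent value

variance = st.pvariance(data)            # population variance: 4.0
std_dev  = st.pstdev(data)               # 2.0, same units as the data

q1, q2, q3 = st.quantiles(data, n=4)     # quartiles
iqr = q3 - q1                            # interquartile range
data_range = max(data) - min(data)       # 9 - 2 = 7
```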
Inferential Statistics: From Samples to Populations
Inferential statistics allows us to draw conclusions about populations based on samples. Key concepts include:
- Sampling Distribution: Distribution of a statistic (like sample mean) across repeated samples
- Standard Error: Standard deviation of the sampling distribution
- Confidence Intervals: Range of plausible values for a population parameter
- Margin of Error: Half-width of confidence interval
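These pieces fit together as in the sketch below. The measurements are hypothetical, and the normal critical value 1.96 is used for simplicity; for a sample this small, a t critical value would be more exact:

```python
import math
import statistics as st

sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]  # hypothetical data
n = len(sample)

mean = st.mean(sample)
std_error = st.stdev(sample) / math.sqrt(n)   # SD of the sampling distribution

z = 1.96                                      # ~95% normal critical value
margin_of_error = z * std_error               # half-width of the interval
ci_low, ci_high = mean - margin_of_error, mean + margin_of_error
```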
✅ Hypothesis Testing: Making Data-Driven Decisions
The Framework
Hypothesis testing provides a structured approach to decision-making under uncertainty:
- Null Hypothesis (H₀): The status quo or no effect claim
- Alternative Hypothesis (H₁): The research hypothesis or effect we seek evidence for
- Test Statistic: Calculated from sample data
- p-value: Probability of observing results at least as extreme as those obtained, assuming H₀ is true
- Significance Level (α): Threshold for rejecting H₀ (typically 0.05)
Common Statistical Tests
- t-test: Compare means between one or two groups
- ANOVA: Compare means across multiple groups
- Chi-square Test: Test independence between categorical variables
- F-test: Compare variances or test regression significance
- Z-test: Test proportions or means with known variance
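As an example of the most common of these, a two-sample t-test with scipy.stats (the two groups are hypothetical and constructed so the difference in means is clear):

```python
from scipy import stats

control   = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.1, 5.0]   # hypothetical group A
treatment = [5.9, 6.1, 5.8, 6.0, 6.2, 5.9, 6.1, 6.0]   # hypothetical group B

# Two-sample t-test: H0 says the two population means are equal.
t_stat, p_value = stats.ttest_ind(control, treatment)
reject_h0 = p_value < 0.05   # here the difference is clearly significant
```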
- Type I Error: Rejecting H₀ when it is true (false positive)
- Type II Error: Failing to reject H₀ when it is false (false negative)
- Power = 1 - P(Type II Error): Probability of correctly detecting an effect
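The meaning of α can be checked by simulation: when H₀ is true, p-values are roughly uniform, so the long-run rejection rate should land near the significance level. A minimal Monte Carlo sketch (trial counts and sample sizes are arbitrary choices):

```python
import random

from scipy import stats

# When H0 is true, the Type I error rate should be close to alpha.
rng = random.Random(0)
alpha, trials, rejections = 0.05, 2000, 0
for _ in range(trials):
    a = [rng.gauss(0, 1) for _ in range(30)]
    b = [rng.gauss(0, 1) for _ in range(30)]   # same distribution: H0 holds
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1

type_i_rate = rejections / trials              # should be near 0.05
```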
📈 Regression Analysis: Modeling Relationships
Linear Regression
Linear regression models the relationship between a dependent variable Y and one or more independent variables X. The simple linear regression model is:
Y = β₀ + β₁X + ε
The coefficients β₀ (intercept) and β₁ (slope) are estimated by minimizing the sum of squared residuals (ordinary least squares).
Key Regression Concepts
- R²: Proportion of variance explained by the model (0 to 1)
- Adjusted R²: Penalizes adding unnecessary predictors
- Residual Analysis: Checks the model assumptions (linearity, normality, homoscedasticity)
- Multicollinearity: High correlation among predictors
- Interaction Terms: Effect of one variable depends on another
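Ordinary least squares and R² can both be computed directly from their definitions. A minimal sketch on a small hypothetical dataset (in practice one would use statsmodels or R):

```python
# Closed-form OLS for simple linear regression:
#   beta1 = Sxy / Sxx,  beta0 = ybar - beta1 * xbar
def ols_fit(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx              # slope minimizing squared residuals
    beta0 = ybar - beta1 * xbar    # intercept
    return beta0, beta1

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.1, 8.0, 9.9]      # hypothetical data, roughly y = 2x

b0, b1 = ols_fit(x, y)

# R^2 = 1 - SS_residual / SS_total: proportion of variance explained
ybar = sum(y) / len(y)
ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - ybar) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot    # close to 1 for this nearly-linear data
```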
Beyond Linear Regression
- Logistic Regression: Binary outcomes (yes/no, success/failure)
- Poisson Regression: Count data (number of events)
- Ridge/Lasso Regression: Regularization to prevent overfitting
- Time Series Analysis: ARIMA, SARIMA for temporal data
🔮 Bayesian Statistics: Updating Beliefs with Data
The Bayesian Paradigm
Unlike frequentist statistics, which treats parameters as fixed unknowns, Bayesian statistics treats parameters as random variables with probability distributions representing our uncertainty.
- Prior Distribution: Initial beliefs before seeing data
- Likelihood: Probability of data given parameters
- Posterior Distribution: Updated beliefs after incorporating data
Conjugate Priors
Conjugate priors are mathematically convenient because the posterior distribution has the same form as the prior. Examples include:
- Beta prior for binomial likelihood → Beta posterior
- Normal prior for normal likelihood → Normal posterior
- Gamma prior for Poisson likelihood → Gamma posterior
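The first of these conjugate pairs reduces Bayesian updating to simple arithmetic: a Beta(a, b) prior combined with k successes in n Bernoulli trials yields a Beta(a + k, b + n − k) posterior. A sketch with illustrative numbers:

```python
# Beta-Binomial conjugacy: Beta(a, b) prior + k successes in n trials
# gives a Beta(a + k, b + n - k) posterior -- no integration required.
def beta_binomial_update(a, b, successes, trials):
    return a + successes, b + (trials - successes)

# Hypothetical example: Beta(2, 2) prior, then observe 7 successes in 10 trials.
a_post, b_post = beta_binomial_update(a=2, b=2, successes=7, trials=10)
posterior_mean = a_post / (a_post + b_post)    # (2+7)/(2+2+10) = 9/14 ~ 0.643
```

Note how the posterior mean sits between the prior mean (0.5) and the observed frequency (0.7), weighted by how much data was seen.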
Markov Chain Monte Carlo (MCMC)
For complex models, we use computational methods like MCMC to sample from the posterior distribution. This enables Bayesian inference in high-dimensional spaces and has revolutionized applied statistics.
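The core idea fits in a few lines. Below is a minimal random-walk Metropolis sampler (the simplest MCMC algorithm), targeting a standard normal for easy verification; real applications would use a library such as PyMC and a nontrivial posterior:

```python
import math
import random

def log_target(x):
    # Log-density of the target, up to an additive constant: standard normal here.
    return -0.5 * x * x

def metropolis(n_samples, step=1.0, seed=42):
    """Random-walk Metropolis: propose a local move, accept or reject."""
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, target(proposal) / target(current))
        if math.log(rng.random()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)       # rejected moves repeat the current state
    return samples

draws = metropolis(20000)
sample_mean = sum(draws) / len(draws)                  # near 0
sample_var  = sum(d * d for d in draws) / len(draws)   # near 1
```

Because the chain only needs the target density up to a constant, the same loop works even when the posterior's normalizing integral is intractable.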
📚 How to Master Probability and Statistics
Recommended Approach
- Start with Probability: Master the fundamentals of probability before moving to statistics. Understanding random variables and distributions is essential.
- Visualize Everything: Draw distributions, plot data, and use visualization to build intuition. Tools like R, Python (matplotlib/seaborn), and Jupyter notebooks are invaluable.
- Work Through Examples: Probability and statistics come alive through real-world examples. Calculate probabilities for games, analyze datasets, and run simulations.
- Practice Hypothesis Testing: Learn to state hypotheses, choose appropriate tests, interpret p-values, and communicate results clearly.
- Code It: Implement statistical methods in Python or R. The act of coding deepens understanding of the mathematics.
Recommended Resources
- Textbooks: Ross's A First Course in Probability, Wackerly's Mathematical Statistics, Gelman's Bayesian Data Analysis, Hastie's Elements of Statistical Learning
- Online Courses: Stanford's Statistical Learning (Hastie/Tibshirani), MIT 18.650 Statistics for Applications, Coursera's Bayesian Statistics specialization
- Software: R (tidyverse, ggplot2), Python (scipy, statsmodels, PyMC3), JASP for GUI-based analysis