Bayesian Statistics - Michael's Notes

![[Pasted image 20240710005946.png]]

# Bayesian Statistics

**Bayesian statistics** is a [[Probability Theory|probabilistic]] framework that blends *prior beliefs* with *observed data* to update and refine our understanding of uncertainty.

Bayes’ Theorem, also known as Bayes’ Rule, is a simple equation for calculating a conditional probability.

> [!abstract] **Definition:** Bayes’ Theorem
> $P(A | B) = \frac{P(B|A) \cdot P(A)}{P(B)} \text{, where } P(B) \ne 0$
>
> $P(A)$ is the *prior probability*. It represents what we understand about $A$ before new evidence is taken into account.
>
> $P(B)$ is the *marginal probability* of the evidence $B$. It acts as a normalizing constant.
>
> $P(A|B)$ is the *posterior probability*. It represents the probability of event $A$ occurring given that event $B$ is true.
>
> $P(B|A)$ is called the *likelihood*. It represents the probability of event $B$ occurring given that event $A$ is true.

## The Bayesian Framework

![[Pasted image 20240709090337.png|450]]

Bayes’ Theorem provides the mathematical framework for the modeling used by Bayesian statistics. Let $P(H)$ be the probability of some hypothesis $H$ and $P(e)$ be the probability of the evidence $e$. Then:

$P(H|e) = \frac{P(e|H) \cdot P(H)}{P(e)} = \frac{P(e|H)P(H)}{P(e|H)P(H)+P(e|\neg H)P(\neg H)}$

- $P(H | e)$ is the **posterior probability** of the parameters $\theta$ (the hypothesis $H$) given the evidence $e$.
	- The probability distribution of the parameters given the evidence, $e$, and a model/hypothesis, $H$.
	- Sometimes referred to as the “joint a-posteriori probability distribution”.
- $P(e|H)$ is the **likelihood** of the data given the parameters.
- $P(H)$ is the **prior** probability of the parameters.
- $P(e)$ is the marginal likelihood or **evidence/data** (a normalizing constant).

### Bayesian Learning Methods

**Features of Bayesian learning methods include:**
- *Flexibility:* Each observed training example can incrementally decrease or increase the estimated probability that a hypothesis is correct, rather than completely eliminating the hypothesis if it is found to be inconsistent with a single example.
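
The incremental updating described under *Flexibility* can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `bayes_update`, the coin scenario, and all the numbers are assumptions made up for the example. It applies $P(H|e) = \frac{P(e|H)P(H)}{P(e|H)P(H)+P(e|\neg H)P(\neg H)}$ once per observation, feeding each posterior back in as the next prior.

```python
def bayes_update(prior_h, lik_e_given_h, lik_e_given_not_h):
    """Return P(H | e), expanding P(e) with the law of total probability."""
    evidence = lik_e_given_h * prior_h + lik_e_given_not_h * (1.0 - prior_h)
    if evidence == 0:
        raise ValueError("P(e) must be non-zero")
    return lik_e_given_h * prior_h / evidence


# Hypothetical setup: H = "the coin is biased towards heads".
# Assumed likelihoods: P(heads | H) = 0.8, P(heads | not H) = 0.5.
P_HEADS_H, P_HEADS_NOT_H = 0.8, 0.5

belief = 0.1          # assumed prior P(H)
history = [belief]    # belief after 0, 1, 2, ... observations

for outcome in ["heads", "heads", "tails", "heads"]:
    if outcome == "heads":
        belief = bayes_update(belief, P_HEADS_H, P_HEADS_NOT_H)
    else:
        belief = bayes_update(belief, 1 - P_HEADS_H, 1 - P_HEADS_NOT_H)
    history.append(belief)

for step, value in enumerate(history):
    print(f"after {step} observations: P(H | e) = {value:.4f}")
```

Each head nudges the belief in $H$ up and the single tail nudges it down, but no single observation drives the probability to zero.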
### Conjugate Priors

## Bayesian Inference

### Maximum A Posteriori (MAP) Hypothesis
> See also:
> - [[Expectation Maximization (EM)]]

In simple terms, the **maximum a posteriori (MAP)** is the mode of the computed posterior.

*a priori*
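
To make the MAP idea concrete, here is a small sketch for a discrete hypothesis space, where "the mode of the posterior" is simply the hypothesis with the largest posterior probability. The helper `map_hypothesis`, the three candidate coins, and the priors are all hypothetical values chosen for illustration.

```python
def map_hypothesis(priors, likelihoods):
    """Return (MAP hypothesis, posterior) for a discrete set of hypotheses."""
    # Numerator of Bayes' Theorem for each hypothesis: P(e | H) * P(H)
    unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
    evidence = sum(unnormalized.values())  # P(e), the normalizing constant
    if evidence == 0:
        raise ValueError("P(e) must be non-zero")
    posterior = {h: v / evidence for h, v in unnormalized.items()}
    # The MAP hypothesis is the mode of the posterior
    best = max(posterior, key=posterior.get)
    return best, posterior


# Assumed priors over three candidate coins and their P(heads)
priors = {"fair": 0.6, "heads_biased": 0.3, "tails_biased": 0.1}
p_heads = {"fair": 0.5, "heads_biased": 0.8, "tails_biased": 0.2}

# Assumed evidence: 3 heads and 1 tail, treated as independent flips
heads, tails = 3, 1
likelihoods = {h: p ** heads * (1 - p) ** tails for h, p in p_heads.items()}

best, posterior = map_hypothesis(priors, likelihoods)
print("MAP hypothesis:", best)
for h, prob in posterior.items():
    print(f"  P({h} | e) = {prob:.4f}")
```

With these made-up numbers, the `heads_biased` coin has the highest likelihood, yet the stronger prior on `fair` makes `fair` the MAP hypothesis, which shows how the prior shapes the result.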