Bayesian Networks and Inference
Bayesian Networks
Today:
- Bayesian Networks
- How do we perform exact inference on Bayesian Networks?
- How do we reason about independence in Bayesian Networks?
Review
Independence
\(P(X, Y) = P(X)\, P(Y)\)
Conditional Independence
\(P(X, Y \mid Z) = P(X \mid Z) \, P(Y \mid Z)\)
2022 Quiz 1
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9454478/pasted-from-clipboard.png)
Bayesian Network
Bayesian Network: Directed Acyclic Graph (DAG) that represents a joint probability distribution
- Node: a random variable
- Edges encode: conditional dependence (each variable depends directly on its parents)
\[P(X_{1:n}) = \prod_{i=1}^n P(X_i \mid \text{pa}(X_i))\]
Binary Random Variables \(X_1\), \(X_2\), \(X_3\)
How many independent parameters to specify joint distribution?
7
For \(n\) binary R.V.s, \(2^n-1\) independent parameters specify the joint distribution.
In general \[\left(\prod_{i=1}^n |\text{support}(X_i)|\right) - 1\]
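As a quick check, this counting formula can be computed directly. A minimal sketch; the support sizes below are just examples:

```python
from math import prod

def joint_param_count(support_sizes):
    """Independent parameters for a full joint over discrete RVs:
    (product of support sizes) - 1, since the probabilities must sum to 1."""
    return prod(support_sizes) - 1

print(joint_param_count([2, 2, 2]))  # three binary RVs -> 7
```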
Full Story: "Causality: Models, Reasoning and Inference" by Judea Pearl
Next Year: emphasize structure and params
Counting Parameters
For discrete R.V.s:
\[\text{dim}(\theta_X) = \left(|\text{support}(X)|-1\right)\prod_{Y \in Pa(X)} |\text{support}(Y)|\]
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9454664/pasted-from-clipboard.png)
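The per-node formula can be sketched the same way. The example node (binary \(E\) with binary parents \(B\) and \(S\)) follows the satellite network on the slide; the support sizes are the only inputs:

```python
def cpt_param_count(support_x, parent_supports):
    """dim(theta_X) = (|support(X)| - 1) * prod_{Y in Pa(X)} |support(Y)|:
    one (|support(X)| - 1)-parameter distribution per joint parent assignment."""
    n = support_x - 1
    for s in parent_supports:
        n *= s
    return n

# Binary E with binary parents B and S: one parameter per (B, S) assignment
print(cpt_param_count(2, [2, 2]))  # -> 4
```

Summing this over all nodes of a network gives far fewer parameters than the \(2^n - 1\) needed for the unstructured joint.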
Inference
Inputs
- Bayesian network structure
- Bayesian network parameters
- Values of evidence variables
Outputs
- Posterior distribution of query variables
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9454664/pasted-from-clipboard.png)
Given that you have detected a trajectory deviation and the battery has not failed, what is the probability of a solar panel failure?
\(P(S=1 \mid D=1, B=0)\)
Exact
Approximate
Exact Inference
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9454664/pasted-from-clipboard.png)
\[P(S{=}1 \mid D{=}1, B{=}0)\]
\[= \frac{P(S{=}1, D{=}1, B{=}0)}{P(D{=}1, B{=}0)}\]
\[P(S{=}1, D{=}1, B{=}0) = \sum_{e, c}P(B{=}0, S{=}1, E{=}e, D{=}1, C{=}c)\]
\[P(B{=}0, S{=}1, E{=}e, D{=}1, C{=}c)\]
\[= P(B{=}0)\,P(S{=}1)\,P(E{=}e\mid B{=}0, S{=}1)\,P(D{=}1\mid E{=}e)\,P(C{=}c\mid E{=}e)\]
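The query above can be computed by brute-force enumeration. A minimal sketch: the network structure (\(B \rightarrow E \leftarrow S\), \(E \rightarrow D\), \(E \rightarrow C\)) follows the slide, but the CPT numbers below are hypothetical stand-ins for the values in the slide's tables:

```python
import itertools

# Hypothetical CPTs for the satellite network B -> E <- S, E -> D, E -> C;
# the true numbers are in the slide's tables and are not reproduced here.
P_B1, P_S1 = 0.01, 0.02                                       # P(B=1), P(S=1)
P_E1 = {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.9}  # P(E=1 | B, S)
P_D1 = {0: 0.1, 1: 0.96}                                      # P(D=1 | E)
P_C1 = {0: 0.02, 1: 0.95}                                     # P(C=1 | E)

def joint(b, s, e, d, c):
    """P(B, S, E, D, C) via the network's factorization (each node given its parents)."""
    p = (P_B1 if b else 1 - P_B1) * (P_S1 if s else 1 - P_S1)
    p *= P_E1[(b, s)] if e else 1 - P_E1[(b, s)]
    p *= P_D1[e] if d else 1 - P_D1[e]
    p *= P_C1[e] if c else 1 - P_C1[e]
    return p

# Sum the hidden variables (E, C) out of the numerator, and (S, E, C)
# out of the normalizing constant
num = sum(joint(0, 1, e, 1, c) for e, c in itertools.product([0, 1], repeat=2))
den = sum(joint(0, s, e, 1, c) for s, e, c in itertools.product([0, 1], repeat=3))
print(num / den)  # P(S=1 | D=1, B=0), about 0.07 with these stand-in numbers
```

Note that the posterior comes out well above the prior \(P(S{=}1) = 0.02\): observing a deviation with a healthy battery shifts blame toward the solar panel.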
Exact Inference
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9454664/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9463883/pasted-from-clipboard.png)
Product
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9463887/pasted-from-clipboard.png)
Condition
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9463892/pasted-from-clipboard.png)
Marginalize
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9463897/pasted-from-clipboard.png)
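The three factor operations shown above (product, condition, marginalize) can be sketched for binary variables. The `Factor` representation and the numeric values below are illustrative choices, not taken from the slides:

```python
from itertools import product as cartesian

class Factor:
    """A table mapping assignments of binary variables to nonnegative values."""
    def __init__(self, variables, table):
        self.variables = list(variables)   # variable names, in key order
        self.table = dict(table)           # {(x1, x2, ...): value}

def factor_product(f, g):
    """Pointwise product over the union of the two scopes."""
    variables = f.variables + [v for v in g.variables if v not in f.variables]
    table = {}
    for assignment in cartesian([0, 1], repeat=len(variables)):
        a = dict(zip(variables, assignment))
        table[assignment] = (f.table[tuple(a[v] for v in f.variables)]
                             * g.table[tuple(a[v] for v in g.variables)])
    return Factor(variables, table)

def condition(f, var, value):
    """Fix an evidence variable to its observed value and drop it from the scope."""
    i = f.variables.index(var)
    return Factor(f.variables[:i] + f.variables[i+1:],
                  {k[:i] + k[i+1:]: v for k, v in f.table.items() if k[i] == value})

def marginalize(f, var):
    """Sum a variable out of the factor."""
    i = f.variables.index(var)
    table = {}
    for k, v in f.table.items():
        key = k[:i] + k[i+1:]
        table[key] = table.get(key, 0.0) + v
    return Factor(f.variables[:i] + f.variables[i+1:], table)

# Hypothetical factors: phi_B(B) = P(B), phi_E(E, B) = P(E | B)
phi_B = Factor(["B"], {(0,): 0.99, (1,): 0.01})
phi_E = Factor(["E", "B"], {(0, 0): 0.95, (1, 0): 0.05, (0, 1): 0.2, (1, 1): 0.8})

# Eliminating B = product, then marginalize: yields a new factor over E alone
phi = marginalize(factor_product(phi_E, phi_B), "B")
print(phi.table)  # P(E): values are approximately 0.9425 and 0.0575
```

The last two lines are exactly one step of variable elimination: multiply all factors mentioning a variable, then sum it out.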
Exact Inference: Variable Elimination
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9454664/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9463915/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9463916/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9463919/pasted-from-clipboard.png)
Start with one factor per conditional probability table
Eliminate \(D\) and \(C\) (evidence) to get \(\phi_6(E)\) and \(\phi_7(E)\)
Eliminate \(E\)
Eliminate \(S\)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9464065/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9464066/pasted-from-clipboard.png)
vs
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9464070/pasted-from-clipboard.png)
Choosing the optimal elimination order is NP-hard
Next Year: Skip
Approximate Inference
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9454664/pasted-from-clipboard.png)
Approximate Inference: Direct Sampling
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9464133/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9464136/pasted-from-clipboard.png)
Analogous to unweighted particle filtering
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9454664/pasted-from-clipboard.png)
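Direct sampling for the running query can be sketched as follows: draw full joint samples, discard those inconsistent with the evidence, and average the rest. The CPT numbers are hypothetical stand-ins for the slide's tables:

```python
import random

# Hypothetical CPTs for the satellite network B -> E <- S, E -> D, E -> C
P_B1, P_S1 = 0.01, 0.02                                       # P(B=1), P(S=1)
P_E1 = {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.9}  # P(E=1 | B, S)
P_D1 = {0: 0.1, 1: 0.96}                                      # P(D=1 | E)
P_C1 = {0: 0.02, 1: 0.95}                                     # P(C=1 | E)

def sample_joint(rng):
    """Draw one joint sample in topological order (parents first)."""
    b = int(rng.random() < P_B1)
    s = int(rng.random() < P_S1)
    e = int(rng.random() < P_E1[(b, s)])
    d = int(rng.random() < P_D1[e])
    c = int(rng.random() < P_C1[e])
    return b, s, e, d, c

def direct_estimate(n=200_000, seed=1):
    """Estimate P(S=1 | D=1, B=0) by rejecting samples that contradict the evidence."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n):
        b, s, e, d, c = sample_joint(rng)
        if d == 1 and b == 0:          # keep only evidence-consistent samples
            kept += 1
            hits += s
    return hits / kept

print(direct_estimate())  # close to the exact posterior (~0.07 with these numbers)
```

The drawback is visible in the code: most samples are thrown away when the evidence is unlikely.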
Approximate Inference: Weighted Sampling
Analogous to weighted particle filtering
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9464155/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9464160/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9454664/pasted-from-clipboard.png)
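Likelihood-weighted sampling avoids rejection: clamp the evidence variables, sample only the free variables, and weight each sample by the likelihood of the evidence given its parents. A sketch with the same hypothetical CPTs (\(C\) is neither evidence nor query, so it can be left unsampled):

```python
import random

# Hypothetical CPTs for the satellite network B -> E <- S, E -> D
P_B1, P_S1 = 0.01, 0.02                                       # P(B=1), P(S=1)
P_E1 = {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.9}  # P(E=1 | B, S)
P_D1 = {0: 0.1, 1: 0.96}                                      # P(D=1 | E)

def weighted_estimate(n=100_000, seed=1):
    """Likelihood weighting for P(S=1 | D=1, B=0): clamp the evidence,
    sample the remaining variables, weight by the evidence likelihood."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        b = 0                                   # evidence: battery has not failed
        s = int(rng.random() < P_S1)
        e = int(rng.random() < P_E1[(b, s)])
        w = (1 - P_B1) * P_D1[e]                # weight = P(B=0) * P(D=1 | E=e)
        num += w * s
        den += w
    return num / den

print(weighted_estimate())  # close to the exact posterior (~0.07 with these numbers)
```

Every sample contributes, which is why this is the weighted analogue of particle filtering.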
Approximate Inference: Gibbs Sampling
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9464172/pasted-from-clipboard.png)
Markov Chain Monte Carlo (MCMC)
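Gibbs sampling resamples one free variable at a time from its full conditional, holding the evidence fixed; the resulting Markov chain has the posterior as its stationary distribution. A sketch for the same query, again with hypothetical stand-in CPTs:

```python
import random

# Hypothetical CPTs for the satellite network B -> E <- S, E -> D, E -> C
P_S1 = 0.02                                                   # P(S=1)
P_E1 = {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.9}  # P(E=1 | B, S)
P_D1 = {0: 0.1, 1: 0.96}                                      # P(D=1 | E)
P_C1 = {0: 0.02, 1: 0.95}                                     # P(C=1 | E)

def gibbs_estimate(n=50_000, burn=2_000, seed=1):
    """Gibbs sampling for P(S=1 | D=1, B=0): clamp the evidence and
    repeatedly resample each free variable from its full conditional."""
    rng = random.Random(seed)
    b, d = 0, 1          # evidence, never resampled
    s, e, c = 0, 0, 0    # arbitrary start for the free variables
    hits = 0
    for t in range(burn + n):
        # S | rest  is proportional to  P(S) P(E=e | B=b, S)
        p1 = P_S1 * (P_E1[(b, 1)] if e else 1 - P_E1[(b, 1)])
        p0 = (1 - P_S1) * (P_E1[(b, 0)] if e else 1 - P_E1[(b, 0)])
        s = int(rng.random() < p1 / (p0 + p1))
        # E | rest  is proportional to  P(E | B=b, S=s) P(D=1 | E) P(C=c | E)
        q1 = P_E1[(b, s)] * P_D1[1] * (P_C1[1] if c else 1 - P_C1[1])
        q0 = (1 - P_E1[(b, s)]) * P_D1[0] * (P_C1[0] if c else 1 - P_C1[0])
        e = int(rng.random() < q1 / (q0 + q1))
        # C | rest  =  P(C | E=e), since C has no children
        c = int(rng.random() < P_C1[e])
        if t >= burn:
            hits += s
    return hits / n

print(gibbs_estimate())  # close to the exact posterior (~0.07 with these numbers)
```

Each full conditional only involves a node's CPT and the CPTs of its children, which is what makes Gibbs steps cheap even in large networks.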
Break
What does conditional independence mean?
All of \(X\)'s influence on \(Y\) comes through \(Z\)
\(X \perp Y \mid Z\)
\(\implies\)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9463389/pasted-from-clipboard.png)
\(A \perp C \mid B\) ?
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9463394/pasted-from-clipboard.png)
Mediator
Yes
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9463398/pasted-from-clipboard.png)
\(B \perp C \mid A\) ?
Confounder
Yes
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9463399/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9463403/pasted-from-clipboard.png)
\(B \perp C \mid A\) ?
Collider
Inconclusive
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9463409/pasted-from-clipboard.png)
https://kunalmenda.com/2019/02/21/causation-and-correlation/
\(P(X \mid Z) = P(X \mid Y, Z)\)
More Complex Example
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9454521/pasted-from-clipboard.png)
\((B\perp D \mid A)\) ?
\((B\perp D \mid E)\) ?
Yes!
No
Why is this relevant?
d-Separation
Let \(\mathcal{C}\) be a set of random variables.
A path between \(A\) and \(B\) is d-separated* by \(\mathcal{C}\) if any of the following are true:
- The path contains a chain \(X \rightarrow Y \rightarrow Z\) such that \(Y \in \mathcal{C}\)
- The path contains a fork \(X \leftarrow Y \rightarrow Z\) such that \(Y \in \mathcal{C}\)
- The path contains an inverted fork (v-structure) \(X \rightarrow Y \leftarrow Z\) such that \(Y \notin \mathcal{C}\) and no descendant of \(Y\) is in \(\mathcal{C}\)
We say that \(A\) and \(B\) are d-separated by \(\mathcal{C}\) if all paths between \(A\) and \(B\) are d-separated by \(\mathcal{C}\).
If \(A\) and \(B\) are d-separated by \(\mathcal{C}\) then \(A \perp B \mid \mathcal{C}\)
*short for "directionally separated"
Proving Conditional Independence
- The path contains a chain \(X \rightarrow Y \rightarrow Z\) such that \(Y \in \mathcal{C}\)
- The path contains a fork \(X \leftarrow Y \rightarrow Z\) such that \(Y \in \mathcal{C}\)
- The path contains an inverted fork (v-structure) \(X \rightarrow Y \leftarrow Z\) such that \(Y \notin \mathcal{C}\) and no descendant of \(Y\) is in \(\mathcal{C}\).
- Enumerate all simple (non-cyclic) paths between the nodes in question
- Check every path for d-separation
- If all paths are d-separated, the variables are conditionally independent
Example: \((B \perp D \mid C, E)\) ?
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9454521/pasted-from-clipboard.png)
Exercise
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9454664/pasted-from-clipboard.png)
\(D \perp C \mid B\) ?
\(D \perp C \mid E\) ?
- The path contains a chain \(X \rightarrow Y \rightarrow Z\) such that \(Y \in \mathcal{C}\)
- The path contains a fork \(X \leftarrow Y \rightarrow Z\) such that \(Y \in \mathcal{C}\)
- The path contains an inverted fork (v-structure) \(X \rightarrow Y \leftarrow Z\) such that \(Y \notin \mathcal{C}\) and no descendant of \(Y\) is in \(\mathcal{C}\).
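This procedure can be sketched in code and used to check the two queries above. The graph encoding (`node -> list of children`) and the brute-force path enumeration are implementation choices, not from the slides:

```python
def d_separated(G, a, b, C):
    """G maps each node to its children. Returns True iff every undirected
    simple path between a and b is blocked by the conditioning set C."""
    C = set(C)
    nodes = set(G) | {c for cs in G.values() for c in cs}
    adj = {n: set() for n in nodes}            # undirected adjacency
    for u, children in G.items():
        for v in children:
            adj[u].add(v)
            adj[v].add(u)

    def descendants(y):
        out, stack = set(), [y]
        while stack:
            for child in G.get(stack.pop(), []):
                if child not in out:
                    out.add(child)
                    stack.append(child)
        return out

    def simple_paths(u, path):
        if u == b:
            yield path
        else:
            for v in adj[u]:
                if v not in path:
                    yield from simple_paths(v, path + [v])

    def blocked(path):
        for x, y, z in zip(path, path[1:], path[2:]):
            collider = y in G.get(x, ()) and y in G.get(z, ())
            if collider and y not in C and not (descendants(y) & C):
                return True   # blocked v-structure
            if not collider and y in C:
                return True   # blocked chain or fork
        return False

    return all(blocked(p) for p in simple_paths(a, [a]))

# Satellite network: B -> E <- S, E -> D, E -> C
G = {"B": ["E"], "S": ["E"], "E": ["D", "C"]}
print(d_separated(G, "D", "C", {"E"}))  # True: the fork at E is blocked
print(d_separated(G, "D", "C", {"B"}))  # False: the path D - E - C is open
```

Enumerating all simple paths is exponential in the worst case; it is fine for checking small exercises, while libraries use linear-time reachability algorithms instead.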
Sampling from a Bayesian Network
Given a Bayesian network, how do we sample from the joint distribution it defines?
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9454664/pasted-from-clipboard.png)
- Topological sort (if there is an edge \(A \rightarrow B\), then \(A\) comes before \(B\))
- Sample from conditional distributions in order of the topological sort
Analogous to Simulating a (PO)MDP
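The two steps can be sketched as follows; the satellite structure matches the slide, and the CPT numbers are hypothetical:

```python
import random

def topological_order(children):
    """Kahn's algorithm: every parent appears before its children."""
    indeg = {}
    for n in children:
        indeg.setdefault(n, 0)
    for cs in children.values():
        for c in cs:
            indeg[c] = indeg.get(c, 0) + 1
    order = []
    frontier = [n for n, d in indeg.items() if d == 0]
    while frontier:
        n = frontier.pop()
        order.append(n)
        for c in children.get(n, []):
            indeg[c] -= 1
            if indeg[c] == 0:
                frontier.append(c)
    return order

# Satellite network: B -> E <- S, E -> D, E -> C; cpt gives P(X=1 | parents)
children = {"B": ["E"], "S": ["E"], "E": ["D", "C"]}
parents = {"B": [], "S": [], "E": ["B", "S"], "D": ["E"], "C": ["E"]}
cpt = {"B": {(): 0.01}, "S": {(): 0.02},
       "E": {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.9},
       "D": {(0,): 0.1, (1,): 0.96}, "C": {(0,): 0.02, (1,): 0.95}}

def sample(rng):
    """One joint sample: visit nodes parents-first, draw each given its parents."""
    x = {}
    for node in topological_order(children):
        key = tuple(x[p] for p in parents[node])
        x[node] = int(rng.random() < cpt[node][key])
    return x

print(sample(random.Random(0)))  # e.g. {'S': 0, 'B': 0, 'E': 0, 'D': 0, 'C': 0}-style dict
```

Visiting parents first is exactly what makes each conditional draw well-defined, just as a (PO)MDP simulator draws the next state conditioned on already-sampled history.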
Recap
025 Bayesian Networks
By Zachary Sunberg