Bayesian Networks and Inference

Bayesian Networks

Today:

  • Bayesian Networks
  • How do we perform exact inference on Bayesian Networks?
  • How do we reason about independence in Bayesian Networks?

Review

Independence

\(P(X, Y) = P(X)\, P(Y)\)

Conditional Independence

\(P(X, Y \mid Z) = P(X \mid Z) \, P(Y \mid Z)\)

 2022 Quiz 1

Bayesian Network

Bayesian Network: Directed Acyclic Graph (DAG) that represents a joint probability distribution

  • Node: Random Variable
  • Edges encode:

\[P(X_{1:n}) = \prod_{i=1}^n P(X_i \mid \text{pa}(X_i))\]

Binary Random Variables \(X_1\), \(X_2\), \(X_3\)

How many independent parameters to specify joint distribution?

7

For \(n\) binary R.V.s, \(2^n-1\) independent parameters specify the joint distribution.

In general \[\prod_{i=1}^n |\text{support}(X_i)| - 1\]

Full Story: "Causality: Models, Reasoning and Inference" by Judea Pearl

Next Year: emphasize structure and params

Counting Parameters

For discrete R.V.s:

\[\text{dim}(\theta_X) = \left(|\text{support}(X)|-1\right)\prod_{Y \in \text{pa}(X)} |\text{support}(Y)|\]
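
As a quick check of the counting formula, here is a minimal sketch in Python. It assumes the five-variable structure implied by the factorization used in the exact inference example (B and S are parents of E; E is the parent of D and C) and assumes every variable is binary. The names `parents`, `support_size`, and `num_parameters` are illustrative only.

```python
from math import prod

# Assumed structure: B -> E <- S, E -> D, E -> C (all variables binary).
parents = {"B": [], "S": [], "E": ["B", "S"], "D": ["E"], "C": ["E"]}
support_size = {name: 2 for name in parents}

def num_parameters(var):
    """Independent parameters needed for P(var | pa(var))."""
    return (support_size[var] - 1) * prod(support_size[p] for p in parents[var])

# Parameters needed by the Bayesian network factorization.
bn_params = sum(num_parameters(v) for v in parents)

# Parameters needed by an unstructured joint table over the same variables.
joint_params = prod(support_size.values()) - 1

print(bn_params, joint_params)  # 10 31
```

Under these assumptions the network needs 10 parameters, while the full joint table needs 31.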

Inference

Inputs

  • Bayesian network structure
  • Bayesian network parameters
  • Values of evidence variables

Outputs

  • Posterior distribution of query variables

Given that you have detected a trajectory deviation and the battery has not failed, what is the probability of a solar panel failure?

\(P(S=1 \mid D=1, B=0)\)

Exact

Approximate

Exact Inference

\[P(S{=}1 \mid D{=}1, B{=}0)\]

\[= \frac{P(S{=}1, D{=}1, B{=}0)}{P(D{=}1, B{=}0)}\]

\[P(S{=}1, D{=}1, B{=}0) = \sum_{e, c}P(B{=}0, S{=}1, E{=}e, D{=}1, C{=}c)\]

\[P(B{=}0, S{=}1, E, D{=}1, C)\]

\[= P(B{=}0)\,P(S{=}1)\,P(E\mid B{=}0, S{=}1)\,P(D{=}1\mid E)\,P(C\mid E)\]
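
The sum above can be carried out by brute-force enumeration. The sketch below is one way to do it in Python. All conditional probability values are hypothetical numbers chosen only for illustration, and the helper names (`bern`, `joint`, `posterior_s_given`) are not from the slides.

```python
from itertools import product

# Hypothetical parameters: probability that each variable equals 1.
P_B1 = 0.01
P_S1 = 0.02
P_E1 = {(0, 0): 0.05, (0, 1): 0.90, (1, 0): 0.95, (1, 1): 0.99}  # key: (b, s)
P_D1 = {0: 0.04, 1: 0.96}  # key: e
P_C1 = {0: 0.02, 1: 0.98}  # key: e

def bern(p1, value):
    """Probability of `value` for a binary variable with P(1) = p1."""
    return p1 if value == 1 else 1.0 - p1

def joint(b, s, e, d, c):
    """P(B=b, S=s, E=e, D=d, C=c) from the network factorization."""
    return (bern(P_B1, b) * bern(P_S1, s) * bern(P_E1[(b, s)], e)
            * bern(P_D1[e], d) * bern(P_C1[e], c))

def posterior_s_given(d, b):
    """P(S | D=d, B=b): sum the joint over the hidden variables E and C."""
    unnormalized = {
        s: sum(joint(b, s, e, d, c) for e, c in product((0, 1), repeat=2))
        for s in (0, 1)
    }
    z = sum(unnormalized.values())  # this is P(D=d, B=b)
    return {s: p / z for s, p in unnormalized.items()}

post = posterior_s_given(d=1, b=0)
print(post[1])  # P(S=1 | D=1, B=0) under the hypothetical parameters
```

Enumeration touches every assignment of the hidden variables, so its cost grows exponentially with their number.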

Exact Inference

Product

Condition

Marginalize
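
The three factor operations can be sketched on small lookup tables. This is a minimal illustration restricted to binary variables; the `Factor` class, the function names, and the numbers are assumptions made for the example, not the course's implementation.

```python
from itertools import product

class Factor:
    """A table over binary variables: maps each assignment tuple to a number."""
    def __init__(self, variables, table):
        self.variables = tuple(variables)
        self.table = dict(table)

def factor_product(f, g):
    """Multiply two factors; the result ranges over the union of their variables."""
    variables = f.variables + tuple(v for v in g.variables if v not in f.variables)
    table = {}
    for values in product((0, 1), repeat=len(variables)):
        a = dict(zip(variables, values))
        table[values] = (f.table[tuple(a[v] for v in f.variables)]
                         * g.table[tuple(a[v] for v in g.variables)])
    return Factor(variables, table)

def condition(f, var, value):
    """Keep only the rows where var == value, then drop var from the table."""
    i = f.variables.index(var)
    table = {k[:i] + k[i + 1:]: p for k, p in f.table.items() if k[i] == value}
    return Factor(f.variables[:i] + f.variables[i + 1:], table)

def marginalize(f, var):
    """Sum var out of the factor."""
    i = f.variables.index(var)
    table = {}
    for k, p in f.table.items():
        key = k[:i] + k[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return Factor(f.variables[:i] + f.variables[i + 1:], table)

# Hypothetical two-variable example: P(S) and P(E | S), keyed as (e, s).
phi_s = Factor(["S"], {(0,): 0.8, (1,): 0.2})
phi_e = Factor(["E", "S"], {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.9})

joint_se = factor_product(phi_s, phi_e)   # P(S, E)
p_e = marginalize(joint_se, "S")          # P(E)
given_e1 = condition(joint_se, "E", 1)    # unnormalized P(S, E=1)
```

Variable elimination repeatedly applies these same operations, conditioning on evidence and then multiplying and summing out one hidden variable at a time.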

Exact Inference: Variable Elimination

Start with

Eliminate \(D\) and \(C\) (evidence) to get \(\phi_6(E)\) and \(\phi_7(E)\)

Eliminate \(E\)

Eliminate \(S\)

vs

Choosing optimal order is NP-hard

Next Year: Skip

Approximate Inference

Approximate Inference: Direct Sampling

Analogous to unweighted particle filtering
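
A minimal sketch of direct sampling for the query \(P(S{=}1 \mid D{=}1, B{=}0)\): draw samples from the joint, discard those inconsistent with the evidence, and count. The parameter values are hypothetical and the function names are illustrative.

```python
import random

# Hypothetical parameters: probability that each variable equals 1.
P_B1 = 0.01
P_S1 = 0.02
P_E1 = {(0, 0): 0.05, (0, 1): 0.90, (1, 0): 0.95, (1, 1): 0.99}  # key: (b, s)
P_D1 = {0: 0.04, 1: 0.96}  # key: e
P_C1 = {0: 0.02, 1: 0.98}  # key: e

def sample_joint(rng):
    """Draw one sample of all variables, parents before children."""
    b = int(rng.random() < P_B1)
    s = int(rng.random() < P_S1)
    e = int(rng.random() < P_E1[(b, s)])
    d = int(rng.random() < P_D1[e])
    c = int(rng.random() < P_C1[e])
    return {"B": b, "S": s, "E": e, "D": d, "C": c}

def direct_sampling_estimate(n, rng):
    """Estimate P(S=1 | D=1, B=0) from samples that match the evidence."""
    consistent = 0
    hits = 0
    for _ in range(n):
        x = sample_joint(rng)
        if x["D"] == 1 and x["B"] == 0:
            consistent += 1
            hits += x["S"]
    return hits / consistent if consistent else float("nan")

estimate = direct_sampling_estimate(100_000, random.Random(0))
print(estimate)
```

Only samples that agree with the evidence contribute, so the method wastes most of its samples when the evidence is unlikely.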

Approximate Inference: Weighted Sampling

Analogous to weighted particle filtering
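
A minimal sketch of weighted sampling (likelihood weighting) for the same query: evidence variables are fixed to their observed values instead of being sampled, and each sample is weighted by the probability of that evidence given its parents. Parameter values are hypothetical and the function names are illustrative.

```python
import random

# Hypothetical parameters: probability that each variable equals 1.
P_B1 = 0.01
P_S1 = 0.02
P_E1 = {(0, 0): 0.05, (0, 1): 0.90, (1, 0): 0.95, (1, 1): 0.99}  # key: (b, s)
P_D1 = {0: 0.04, 1: 0.96}  # key: e

def weighted_sample(rng):
    """Return (s, weight) with evidence B=0 and D=1 held fixed."""
    weight = 1.0
    b = 0
    weight *= 1.0 - P_B1          # likelihood of the evidence B=0
    s = int(rng.random() < P_S1)  # hidden: sample from P(S)
    e = int(rng.random() < P_E1[(b, s)])  # hidden: sample from P(E | B, S)
    weight *= P_D1[e]             # likelihood of the evidence D=1 given E
    # C is a hidden leaf; it affects neither the weight nor the query.
    return s, weight

def weighted_estimate(n, rng):
    """Estimate P(S=1 | D=1, B=0) as a ratio of weighted counts."""
    numerator = 0.0
    denominator = 0.0
    for _ in range(n):
        s, w = weighted_sample(rng)
        numerator += w * s
        denominator += w
    return numerator / denominator

estimate = weighted_estimate(100_000, random.Random(0))
print(estimate)
```

Every sample is used, but samples with small weights contribute little, which is the same effect seen in weighted particle filtering.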

Approximate Inference: Gibbs Sampling

Markov Chain Monte Carlo (MCMC)
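
A minimal sketch of Gibbs sampling for \(P(S{=}1 \mid D{=}1, B{=}0)\): hold the evidence fixed and resample each hidden variable in turn from its distribution given all the others, which is proportional to the joint. The parameters, burn-in length, and function names are assumptions for illustration.

```python
import random

# Hypothetical parameters: probability that each variable equals 1.
P_B1 = 0.01
P_S1 = 0.02
P_E1 = {(0, 0): 0.05, (0, 1): 0.90, (1, 0): 0.95, (1, 1): 0.99}  # key: (b, s)
P_D1 = {0: 0.04, 1: 0.96}  # key: e
P_C1 = {0: 0.02, 1: 0.98}  # key: e

def bern(p1, value):
    """Probability of `value` for a binary variable with P(1) = p1."""
    return p1 if value == 1 else 1.0 - p1

def joint(x):
    """Joint probability of a full assignment x (dict of variable -> 0/1)."""
    return (bern(P_B1, x["B"]) * bern(P_S1, x["S"])
            * bern(P_E1[(x["B"], x["S"])], x["E"])
            * bern(P_D1[x["E"]], x["D"]) * bern(P_C1[x["E"]], x["C"]))

def gibbs_estimate(n_sweeps, burn_in, rng):
    """Estimate P(S=1 | D=1, B=0) from a Gibbs chain over S, E, and C."""
    x = {"B": 0, "S": 0, "E": 0, "D": 1, "C": 0}  # evidence fixed, rest arbitrary
    hits = 0
    for t in range(burn_in + n_sweeps):
        for var in ("S", "E", "C"):
            x[var] = 1
            p1 = joint(x)
            x[var] = 0
            p0 = joint(x)
            # Conditional of var given everything else is proportional to the joint.
            x[var] = int(rng.random() < p1 / (p0 + p1))
        if t >= burn_in:
            hits += x["S"]
    return hits / n_sweeps

estimate = gibbs_estimate(30_000, 1_000, random.Random(0))
print(estimate)
```

Successive samples are correlated, so more sweeps are needed than the sample count alone suggests.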

Break

What does conditional independence mean?

All of \(X\)'s influence on \(Y\) comes through \(Z\)

\(X \perp Y \mid Z \implies P(X \mid Z) = P(X \mid Y, Z)\)

  • Mediator: \(A \perp C \mid B\)? Yes
  • Confounder: \(B \perp C \mid A\)? Yes
  • Collider: \(B \perp C \mid A\)? Inconclusive

https://kunalmenda.com/2019/02/21/causation-and-correlation/

More Complex Example

\((B\perp D \mid A)\) ? Yes!

\((B\perp D \mid E)\) ? No

Why is this relevant?

d-Separation

Let \(\mathcal{C}\) be a set of random variables.

A path between \(A\) and \(B\) is d-separated* by \(\mathcal{C}\) if any of the following are true:

  1. The path contains a chain \(X \rightarrow Y \rightarrow Z\) such that \(Y \in \mathcal{C}\)
  2. The path contains a fork \(X \leftarrow Y \rightarrow Z\) such that \(Y \in \mathcal{C}\)
  3. The path contains an inverted fork (v-structure) \(X \rightarrow Y \leftarrow Z\) such that \(Y \notin \mathcal{C}\) and no descendant of \(Y\) is in \(\mathcal{C}\).

We say that \(A\) and \(B\) are d-separated by \(\mathcal{C}\) if all paths between \(A\) and \(B\) are d-separated by \(\mathcal{C}\).

If \(A\) and \(B\) are d-separated by \(\mathcal{C}\), then \(A \perp B \mid \mathcal{C}\)

*short for "directionally separated"
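
The definition can be checked mechanically by enumerating paths and testing each consecutive triple. The sketch below does this in Python on an assumed example graph (B → E ← S, E → D, E → C). The function names are illustrative, and path enumeration is only practical for small graphs.

```python
def descendants(edges, node):
    """All nodes reachable from `node` by following directed edges."""
    children = {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def simple_paths(edges, start, goal):
    """Yield every path from start to goal that visits no node twice,
    ignoring edge direction."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    def extend(path):
        if path[-1] == goal:
            yield path
            return
        for n in neighbors.get(path[-1], ()):
            if n not in path:
                yield from extend(path + [n])

    yield from extend([start])

def path_is_blocked(edges, path, given):
    """True if some consecutive triple on the path satisfies rule 1, 2, or 3."""
    for x, y, z in zip(path, path[1:], path[2:]):
        if (x, y) in edges and (z, y) in edges:  # inverted fork x -> y <- z
            if y not in given and not (descendants(edges, y) & given):
                return True
        elif y in given:  # chain or fork through a conditioned node
            return True
    return False

def d_separated(edges, a, b, given):
    """True if every path between a and b is blocked by the set `given`."""
    return all(path_is_blocked(edges, p, given) for p in simple_paths(edges, a, b))

# Assumed example graph.
edges = {("B", "E"), ("S", "E"), ("E", "D"), ("E", "C")}
print(d_separated(edges, "D", "C", {"E"}))  # fork through E, E given
print(d_separated(edges, "B", "S", {"D"}))  # collider E with a given descendant
```

In this example the first query is d-separated and the second is not, since conditioning on a descendant of a collider opens the path.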

Proving Conditional Independence

  1. Enumerate all (non-cyclic) paths between nodes in question
  2. Check all paths for d-separation
  3. If all paths are d-separated, then the variables are conditionally independent

Example: \((B \perp D \mid C, E)\) ?

Exercise

\(D \perp C \mid B\) ?

\(D \perp C \mid E\) ?

Sampling from a Bayesian Network

Given a Bayesian network, how do we sample from the joint distribution it defines?

  1. Topological Sort (If there is an edge \(A \rightarrow B\), then \(A\) comes before \(B\))
  2. Sample from conditional distributions in order of the topological sort
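
The two steps can be sketched with Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+), which accepts a mapping from each node to its predecessors. The network structure and all probabilities below are hypothetical, as are the names `parents`, `P1`, and `sample_network`.

```python
import random
from graphlib import TopologicalSorter

# Assumed structure: each variable maps to its list of parents.
parents = {"B": [], "S": [], "E": ["B", "S"], "D": ["E"], "C": ["E"]}

# Hypothetical parameters: P(X=1 | parent values), keyed by a tuple of
# parent values in the order listed in `parents`.
P1 = {
    "B": {(): 0.01},
    "S": {(): 0.02},
    "E": {(0, 0): 0.05, (0, 1): 0.90, (1, 0): 0.95, (1, 1): 0.99},
    "D": {(0,): 0.04, (1,): 0.96},
    "C": {(0,): 0.02, (1,): 0.98},
}

# Step 1: topological sort, so every parent comes before its children.
order = list(TopologicalSorter(parents).static_order())

def sample_network(rng):
    """Step 2: sample each variable given its already-sampled parents."""
    x = {}
    for var in order:
        pa_values = tuple(x[p] for p in parents[var])
        x[var] = int(rng.random() < P1[var][pa_values])
    return x

sample = sample_network(random.Random(0))
print(order, sample)
```

Because parents are always sampled first, each conditional distribution can be looked up directly when its variable is reached.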

Analogous to Simulating a (PO)MDP

Recap

025 Bayesian Networks

By Zachary Sunberg
