Stochastic Processes and Simple Decisions

Review

Markov Blanket

  • The Markov blanket of \(X\) is the minimal set of nodes that, if their values were known, would make \(X\) conditionally independent of all other nodes.
  • Parents, children, and the other parents of its children.

Guiding Question

  • What does "Markov" mean in "Markov Decision Process"?

Stochastic Process

  • A stochastic process is a collection of random variables (R.V.s) indexed by time.
  • \(\{x_1, x_2, x_3, \ldots\}\)
  • \(\{x_t\}_{t=1}^\infty\) or just \(\{x_t\}\)

Example:

\(x_0 = 0\)

\(x_{t+1} = x_t + v_t\)

\(v_t \sim \mathcal{U}(\{0,1\})\) (i.i.d.)

Shorthand: \(x' = x + v\)

In a stationary stochastic process (the only kind considered in this class), this relationship does not change with time.

Stochastic Process

Joint

x_0   x_1   x_2   P(x_0, x_1, x_2)
0     0     0     0.25
0     0     1     0.25
0     1     1     0.25
0     1     2     0.25

\(x_0 = 0\)

\(x_{t+1} = x_t + v_t\)

\(v_t \sim \mathcal{U}(\{0,1\})\) (i.i.d.)

\[P(x_{1:n}) = \prod_{t=1}^n P(x_t \mid \text{pa}(x_t))\]

For this particular process,

\[P(x_{1:n}) = \prod_{t=1}^n P(x_t \mid x_{t-1})\]
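As a concrete illustration (a minimal Python sketch, not from the course notebook; the helper names are made up), the probability of a trajectory of this walk is a product of one-step transition probabilities:

```python
def transition_prob(x_next, x):
    """P(x_{t+1} = x_next | x_t = x) for the walk with v_t ~ U({0, 1})."""
    return 0.5 if x_next in (x, x + 1) else 0.0

def trajectory_probability(xs):
    """P(x_{1:n} | x_0) as the product of one-step transition probabilities."""
    p = 1.0
    for x_prev, x_next in zip(xs, xs[1:]):
        p *= transition_prob(x_next, x_prev)
    return p

# Each row of the joint table above, e.g. P(x_1 = 0, x_2 = 1 | x_0 = 0):
print(trajectory_probability([0, 0, 1]))  # 0.5 * 0.5 = 0.25
```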

Marginal

For this particular process, since \(\text{pa}(x_t) = x_{t-1}\), if \(P(x_{t-1})\) is known,

\[P(x_t) = \sum_{k \in x_{t-1}} P\left(x_t \mid x_{t-1}=k\right) P(x_{t-1} = k)\]

\[= 0.5 \, P(x_{t-1} = x_t - 1) + 0.5 \, P(x_{t-1} = x_t)\]
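This recursion can be carried out numerically. A minimal sketch (assuming marginals stored as value-to-probability dicts; names are made up):

```python
from collections import defaultdict

def propagate(marginal):
    """One step of P(x_t) = sum_k P(x_t | x_{t-1}=k) P(x_{t-1}=k)."""
    new = defaultdict(float)
    for k, p in marginal.items():
        new[k] += 0.5 * p      # v_t = 0
        new[k + 1] += 0.5 * p  # v_t = 1
    return dict(new)

marginal = {0: 1.0}  # x_0 = 0 with certainty
for _ in range(3):
    marginal = propagate(marginal)
print(marginal)  # binomial: {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```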

Stochastic Process

Expectation

\[E[x_t] = \sum_{x \in x_t} x P(x_t = x)\]

For this particular process, \(x_t = \sum_{i=1}^t v_i\), so

\[E[x_t] = E\left[\sum_{i=1}^t v_i\right] = \sum_{i=1}^t E[v_i] = 0.5\, t\]

 

Expectation of a function (such as reward)

\[E[f(x_t)] = \sum_{x \in x_t} f(x) P(x_t = x)\]
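Reusing the dict-valued marginal from the sketch above, \(E[f(x_t)]\) is just a weighted sum (again an illustrative sketch, not the course's code):

```python
def expectation(f, marginal):
    """E[f(x_t)] = sum_x f(x) P(x_t = x) for a dict-valued marginal."""
    return sum(f(x) * p for x, p in marginal.items())

marginal_t3 = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}  # P(x_3) from above
print(expectation(lambda x: x, marginal_t3))     # E[x_3] = 1.5 = 0.5 * 3
print(expectation(lambda x: x**2, marginal_t3))  # E[x_3^2] = 3.0
```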

Simulating a Stochastic Process

030-Stochastic-Processes.ipynb
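The notebook itself is not reproduced here; a minimal Python sketch of what simulating this process might look like (the actual notebook may differ):

```python
import random

def simulate(n_steps, rng=random):
    """Sample one trajectory x_0, ..., x_n of the random walk."""
    x = 0  # x_0 = 0
    xs = [x]
    for _ in range(n_steps):
        x += rng.choice([0, 1])  # v_t ~ U({0, 1}), i.i.d.
        xs.append(x)
    return xs

# Monte Carlo estimate of E[x_10]; should be close to 0.5 * 10 = 5.
samples = [simulate(10)[-1] for _ in range(10_000)]
print(sum(samples) / len(samples))
```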

Markov Process

  • A stochastic process \(\{s_t\}\) is Markov if \[P(s_{t+1} \mid s_{t}, s_{t-1}, \ldots, s_0) = P(s_{t+1} \mid s_{t})\]
  • \(s_t\) is called the "state" of the process

Break

Suppose you want to create a Markov process model that describes how many new COVID cases will start on a particular day. What information should be in the state of the model?

Assume:

  • The population mixes thoroughly (i.e. there are no geographic considerations).
  • COVID patients may be contagious up to 14 days after they contract the disease.
  • The number of people infected by each person on day \(d\) of their illness is roughly \(\mathcal{N}(\mu_d, \sigma^2)\).

Hidden Markov Model

(Often you can't measure the whole state)

Bayesian Networks

A Bayesian Network is a directed acyclic graph (DAG) that encodes probabilistic relationships between R.V.s

  • Nodes: R.V.s
  • Edges: Direct probabilistic relationships

Concretely:

\(P(x_{1:n}) = \prod_i P(x_i \mid \text{pa}(x_i))\)

\(P(A, B, C) = P(A)\, P(B \mid A)\, P(C \mid A)\)
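A small sketch of this factorization for binary \(A\), \(B\), \(C\) (the tables and numbers are made up, purely illustrative):

```python
# P(A), P(B | A), P(C | A) as tables over binary variables (values 0/1).
P_A = {0: 0.4, 1: 0.6}
P_B_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}  # P_B_given_A[a][b]
P_C_given_A = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.2, 1: 0.8}}  # P_C_given_A[a][c]

def joint(a, b, c):
    """P(A=a, B=b, C=c) = P(A=a) P(B=b | A=a) P(C=c | A=a)."""
    return P_A[a] * P_B_given_A[a][b] * P_C_given_A[a][c]

# The joint sums to 1 over all assignments:
print(sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)))  # ~1.0
```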

(Diagrams: a Markov process, a hidden Markov model, and one step of a dynamic Bayesian network)

Simple Decisions

von Neumann-Morgenstern Axioms

A lottery assigns probabilities to outcomes:

  • Outcomes: \(S_1, \ldots, S_n\)
  • Probabilities: \(p_1, \ldots, p_n\)
  • Lottery notation: \([S_1: p_1; \ldots; S_n: p_n]\)

The axioms constrain rational preferences over lotteries:

  • Completeness: Exactly one holds: \(A\succ B\), \(B \succ A\), or \(A \sim B\)
  • Transitivity: If \(A \succeq B\) and \(B \succeq C\), then \(A \succeq C\)
  • Continuity: If \(A\succeq C \succeq B\), then there exists a probability \(p\) such that
    \([A:p; B:1-p] \sim C\)
  • Independence: If \(A \succ B\), then for any \(C\) and probability \(p\),
    \([A:p; C:1-p] \succeq [B:p; C:1-p]\)

These axioms imply the existence of a utility function \(U\) with the properties:

  • \(U(A) > U(B)\) iff \(A \succ B\)
  • \(U(A) = U(B)\) iff \(A \sim B\)
  • \(U([S_1: p_1; \ldots; S_n: p_n]) = \sum_{i=1}^n p_i \, U(S_i)\)
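The last property is easy to compute directly. A small sketch with hypothetical outcomes and utilities (all names and numbers made up):

```python
def expected_utility(lottery, U):
    """U([S_1: p_1; ...; S_n: p_n]) = sum_i p_i U(S_i)."""
    return sum(p * U(s) for s, p in lottery)

# Hypothetical outcomes and utilities, just to exercise the formula:
U = {"win": 1.0, "draw": 0.5, "lose": 0.0}
lottery = [("win", 0.2), ("draw", 0.5), ("lose", 0.3)]
print(expected_utility(lottery, U.get))  # 0.2*1.0 + 0.5*0.5 + 0.3*0.0 = 0.45
```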

Decision Networks and MDPs

  • Decision Networks
  • Value of Information
  • Markov Decision Process

Decision Network

  • Chance node: a random variable (oval)
  • Decision node: a variable chosen by the decision maker (square)
  • Utility node: the quantity to be maximized (diamond)

MDP Dynamic Decision Network

MDP Optimization problem

\[\text{maximize} \quad \text{E}\left[\sum_{t=1}^\infty r_t\right]\]

Not well formulated! The expected sum may be infinite.

Finite MDP Objectives

  1. Finite time: \[\text{E} \left[ \sum_{t=0}^T r_t \right]\]

  2. Average reward: \[\lim_{n \rightarrow \infty} \frac{1}{n}\, \text{E} \left[\sum_{t=0}^n r_t \right]\]

  3. Discounting: \[\text{E} \left[\sum_{t=0}^\infty \gamma^t r_t\right]\] with discount \(\gamma \in [0, 1)\), typically 0.9, 0.95, or 0.99

  4. Terminal states: infinite time, but a terminal state (no reward, no leaving) is always reached with probability 1

Discounting always yields a finite objective: if \(\underline{r} \leq r_t \leq \bar{r}\),

then \[\frac{\underline{r}}{1-\gamma} \leq \sum_{t=0}^\infty \gamma^t r_t \leq \frac{\bar{r}}{1-\gamma} \]
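A quick numerical check of this bound (an illustrative sketch with made-up rewards; with \(r_t = 1\), both bounds equal \(1/(1-\gamma)\)):

```python
def discounted_return(rewards, gamma):
    """sum_t gamma^t * r_t for a finite reward sequence."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

gamma = 0.9
rewards = [1.0] * 1000  # r_t = 1, so r_lo = r_hi = 1
print(discounted_return(rewards, gamma))  # ~10.0 = 1 / (1 - gamma)
```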

Guiding Question

  • What does "Markov" mean in "Markov Decision Process"?

030 Stochastic Processes

By Zachary Sunberg
