Stochastic Processes and Simple Decisions

Review

Markov Blanket

  • The Markov blanket of \(X\) is the minimal set of nodes that, if their values were known, would make \(X\) conditionally independent of all other nodes.
  • Parents, children, and the other parents of its children.

Guiding Question

  • What does "Markov" mean in "Markov Decision Process"?
  • How do we find an optimal action based on maximizing expected utility?

Stochastic Process

  • A stochastic process is a collection of R.V.s indexed by time.
  • \(\{X_0, X_1, X_2, \ldots\}\) or \(\{X_t\}_{t=0}^\infty\) or just \(\{X_t\}\)

Example: Positive, Uniform Random Walk

\(X_0 = 0\)

\(X_{t+1} = X_t + V_t\)

\(V_t \sim \text{Bernoulli}(0.5)\) (i.i.d.)

In a stationary stochastic process (all processes in this class are stationary), this relationship does not change with time:

\(P(X_{t+1} \mid X_{0:t}) = \begin{cases} 0.5 & \text{if } X_{t+1} = X_t\\ 0.5 & \text{if } X_{t+1} = X_t + 1 \\ 0 & \text{otherwise}\end{cases}\)

Bayes Net

Trajectories

Stochastic Process

  • A stochastic process is a collection of R.V.s indexed by time.
  • \(\{x_0, x_1, x_2, \ldots\}\) or \(\{x_t\}_{t=0}^\infty\) or just \(\{x_t\}\)

Example: Positive, Uniform Random Walk

\(x_0 = 0\)

\(x_{t+1} = x_t + v_t\)

\(v_t \sim \mathcal{U}(\{0,1\})\) (i.i.d.)

Shorthand: \(x' = x + v\)

In a stationary stochastic process (all processes in this class are stationary), this relationship does not change with time:

\(P(x' \mid x) = \text{SparseCat}([x, x + 1], [0.5, 0.5])\)
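A minimal sketch of this transition model in Julia, assuming the `SparseCat` distribution exported by POMDPTools.jl (POMDPModelTools.jl in older versions); plain `rand(0:1)` would work just as well:

```julia
# Sketch of the transition model P(x' | x), assuming SparseCat from POMDPTools.jl.
using POMDPTools: SparseCat

# Either stay at x or move to x + 1, each with probability 0.5.
transition(x) = SparseCat([x, x + 1], [0.5, 0.5])

x  = 0
x′ = rand(transition(x))   # sample one step of the walk
```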

Bayes Net

Dynamic Bayes Net (DBN)


Stochastic Process

Joint

x_0   x_1   x_2   P(x_0, x_1, x_2)
 0     0     0          0.25
 0     0     1          0.25
 0     1     1          0.25
 0     1     2          0.25

\(x_0 = 0\)

\(x_{t+1} = x_t + v_t\)

\(v_t \sim \mathcal{U}(\{0,1\})\) (i.i.d.)

\[P(x_{1:n}) = \prod_{t=1}^n P(x_t \mid \text{pa}(x_t))\]

For this particular process,

\[P(x_{1:n}) = \prod_{t=1}^n P(x_t \mid x_{t-1})\]
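A quick check of this factorization for one row of the table above, in plain Julia (`p_step` is just an illustrative helper name):

```julia
# P(x_t | x_{t-1}) for the random walk: 0.5 if x_t ∈ {x_{t-1}, x_{t-1} + 1}, else 0.
p_step(xprev, x) = x in (xprev, xprev + 1) ? 0.5 : 0.0

# Joint probability of the trajectory x_0 = 0, x_1 = 0, x_2 = 1 (a row of the table).
xs = [0, 0, 1]
P  = prod(p_step(xs[t - 1], xs[t]) for t in 2:length(xs))   # 0.5 * 0.5 = 0.25
```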

Marginal

For this particular process, since \(\text{pa}(x_t) = x_{t-1}\), if \(P(x_{t-1})\) is known,

\[P(x_t) = \sum_{k \in x_{t-1}} P\left(x_t \mid x_{t-1}=k\right) P(x_{t-1} = k) = 0.5 \, P(x_{t-1} = x_t - 1) + 0.5 \, P(x_{t-1} = x_t)\]
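As a sanity check, this marginal can be propagated forward numerically from \(P(x_0 = 0) = 1\); a small sketch in plain Julia (`marginal` is an illustrative helper name):

```julia
# Sketch: propagate P(x_t) forward from P(x_0 = 0) = 1 using the sum above.
function marginal(t)
    p = Dict(0 => 1.0)                                       # P(x_0 = 0) = 1
    for _ in 1:t
        p′ = Dict{Int,Float64}()
        for (x, px) in p
            p′[x]     = get(p′, x, 0.0)     + 0.5 * px       # stay at x
            p′[x + 1] = get(p′, x + 1, 0.0) + 0.5 * px       # move to x + 1
        end
        p = p′
    end
    return p
end

marginal(2)   # P(x_2): 0 => 0.25, 1 => 0.5, 2 => 0.25
```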

Stochastic Process

Expectation

\[E[x_t] = \sum_{x \in x_t} x P(x_t = x)\]

For this particular process, \(x_t = \sum_{i=1}^t v_i\), so

\[E[x_t] = E\left[\sum_{i=1}^t v_i\right] = \sum_{i=1}^t E[v_i] = 0.5\, t\]

 

Expectation of a function (such as reward)

\[E[f(x_t)] = \sum_{x \in x_t} f(x) P(x_t = x)\]
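These expectations can also be estimated by sampling; a Monte Carlo sketch in plain Julia (names and the example \(f\) are illustrative), compared against the analytical mean \(0.5\,t\):

```julia
# Sketch: Monte Carlo estimates of E[x_t] and E[f(x_t)] for the random walk.
function simulate_xt(t)
    x = 0
    for _ in 1:t
        x += rand(0:1)        # v ~ Uniform({0, 1})
    end
    return x
end

t, n    = 10, 100_000
samples = [simulate_xt(t) for _ in 1:n]
mean_xt = sum(samples) / n               # ≈ 0.5 t = 5.0
mean_f  = sum(x -> x^2, samples) / n     # E[f(x_t)] for an example f(x) = x²
```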

Causal Stochastic Processes

In a causal stochastic process, \(x_t\) may depend on any \(x_\tau\) with \(\tau <t\).

In general, a stochastic process may have edges between variables at any two times in its Bayesian network.


Simulating a Stochastic Process

030-Stochastic-Processes.ipynb
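In the spirit of that notebook (its contents are not reproduced here), a few lines of plain Julia suffice to roll out sample trajectories of the walk:

```julia
# Sketch: simulate one trajectory x_0, x_1, ..., x_T of the random walk.
function rollout(T)
    xs = [0]                                 # x_0 = 0
    for _ in 1:T
        push!(xs, xs[end] + rand(0:1))       # x_{t+1} = x_t + v_t,  v_t ~ U({0,1})
    end
    return xs
end

trajectories = [rollout(20) for _ in 1:5]    # a handful of sample paths
```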

A More Complex Example

Markov Process

  • A stochastic process \(\{S_t\}\) is Markov if \[P(S_{t+1} \mid S_{0:t}) = P(S_{t+1} \mid S_{t})\] \[S_{t+1}\, \bot \, S_{t-\tau} \mid \, S_t \quad \forall \tau \in 1:t\]
  • \(S_t\) is called the "state" of the process

Hidden Markov Model

(Often you can't measure the whole state)

Bayesian Networks

A Bayesian Network is a directed acyclic graph (DAG) that encodes probabilistic relationships between R.V.s

  • Nodes: R.V.s
  • Edges: Direct probabilistic relationships

Concretely:

\(P(x_{1:n}) = \prod_i P(x_i \mid \text{pa}(x_i))\)

\(P(A, B, C) = P(A)\, P(B \mid A)\, P(C \mid A)\)
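A tiny numeric check of this factorization, with made-up conditional tables (all numbers are hypothetical):

```julia
# Hypothetical binary tables for P(A), P(B | A), and P(C | A).
P_A = Dict(true => 0.3, false => 0.7)
P_B = Dict((true, true) => 0.9, (false, true) => 0.1,      # keyed (b, a)
           (true, false) => 0.2, (false, false) => 0.8)
P_C = Dict((true, true) => 0.5, (false, true) => 0.5,      # keyed (c, a)
           (true, false) => 0.1, (false, false) => 0.9)

# P(A, B, C) = P(A) P(B | A) P(C | A)
joint(a, b, c) = P_A[a] * P_B[(b, a)] * P_C[(c, a)]

# The joint sums to 1 over all eight assignments.
total = sum(joint(a, b, c) for a in (true, false), b in (true, false), c in (true, false))
```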

Markov Process

Hidden Markov Model

Dynamic Bayesian Network

(One step)

Dynamic Bayesian Networks

Break

Suppose you want to create a Markov process model that describes how many new COVID cases will start in a particular week. What information should be in the state of the model?

Assume:

  • The population mixes thoroughly (i.e. there are no geographic considerations).
  • COVID patients may be contagious up to 2 weeks after they contract the disease.
  • Researchers have determined a probabilistic model for the number of new cases given the number of people in the first week of the disease and the number of people in the second week of the disease.

Simple Decisions

von Neumann-Morgenstern Axioms

Lottery: \([S_1: p_1; \ldots; S_n: p_n]\)

  • Outcomes: \(S_1, \ldots, S_n\)
  • Probabilities: \(p_1, \ldots, p_n\)

Axioms:

  • Completeness: Exactly one holds: \(A\succ B\), \(B \succ A\), \(A \sim B\)
  • Transitivity: If \(A \succeq B\) and \(B \succeq C\), then \(A \succeq C\)
  • Continuity: If \(A\succeq C \succeq B\), then there exists a probability \(p\) such that
    \([A:p; B:1-p] \sim C\)
  • Independence: If \(A \succ B\), then for any \(C\) and probability \(p\),
    \([A:p; C:1-p] \succeq [B:p; C:1-p]\)

These constraints imply a utility function \(U\) with the properties:

  • \(U(A) > U(B)\) iff \(A \succ B\)
  • \(U(A) = U(B)\) iff \(A \sim B\)
  • \(U([S_1: p_1; \ldots; S_n: p_n]) = \sum_{i=1}^n p_i \, U(S_i)\)
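The last property can be evaluated directly; a small sketch (the function name and utility values below are illustrative only, not course API):

```julia
# Sketch: expected utility of a lottery [S₁: p₁; …; Sₙ: pₙ].
expected_utility(U, outcomes, probs) = sum(p * U[s] for (s, p) in zip(outcomes, probs))

U = Dict("win" => 100.0, "draw" => 20.0, "lose" => 0.0)          # hypothetical utilities
expected_utility(U, ["win", "draw", "lose"], [0.2, 0.5, 0.3])    # 0.2*100 + 0.5*20 + 0.3*0 = 30.0
```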

Decision Networks

Value of Information

Decision Networks and MDPs

Decision Network

Chance node

Decision node

Utility node

MDP Dynamic Decision Network

MDP Optimization problem

\[\text{maximize} \quad \text{E}\left[\sum_{t=1}^\infty r_t\right]\]

Not well formulated! The expected sum may be infinite.

Markov Decision Process

Finite MDP Objectives

  1. Finite time

     \[\text{E} \left[ \sum_{t=0}^T r_t \right]\]

  2. Average reward

     \[\lim_{n \rightarrow \infty} \frac{1}{n} \, \text{E} \left[\sum_{t=0}^n r_t \right]\]

  3. Discounting

     \[\text{E} \left[\sum_{t=0}^\infty \gamma^t r_t\right]\]

     discount \(\gamma \in [0, 1)\), typically 0.9, 0.95, or 0.99

     if \(\underline{r} \leq r_t \leq \bar{r}\), then \[\frac{\underline{r}}{1-\gamma} \leq \sum_{t=0}^\infty \gamma^t r_t \leq \frac{\bar{r}}{1-\gamma}\]

  4. Terminal States

     Infinite time, but a terminal state (no reward, no leaving) is always reached with probability 1.
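A quick numerical illustration of the discounted objective and this bound (the reward values, \(\gamma\), and horizon below are arbitrary; the infinite sum is truncated, which is negligible here since \(\gamma^{1000} \approx 0\)):

```julia
# Sketch: discounted return of a long reward sequence and the bound above.
γ  = 0.95
rs = 2 .* rand(1000) .- 1                        # r_t ∈ [-1, 1], so r̲ = -1, r̄ = 1

discounted = sum(γ^t * rs[t + 1] for t in 0:length(rs) - 1)

lower, upper = -1 / (1 - γ), 1 / (1 - γ)         # r̲/(1-γ) ≤ return ≤ r̄/(1-γ)
lower ≤ discounted ≤ upper                       # true
```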

Maximizing Expected Utility

Value of Information

Guiding Question

  • What does "Markov" mean in "Markov Decision Process"?