Stochastic Processes and Simple Decisions
Review
Markov Blanket
- The Markov blanket of \(X\) is the minimal set of nodes that, if their values were known, would make \(X\) conditionally independent of all other nodes.
- Parents, children, and the other parents of its children.
Guiding Question
- What does "Markov" mean in "Markov Decision Process"?
Stochastic Process
- A stochastic process is a collection of R.V.s indexed by time.
- \(\{x_1, x_2, x_3, \ldots\}\)
- \(\{x_t\}_{t=1}^\infty\) or just \(\{x_t\}\)
Example:
\(x_0 = 0\)
\(x_{t+1} = x_t + v_t\)
\(v_t \sim \mathcal{U}(\{0,1\})\) (i.i.d.)
Shorthand: \(x' = x + v\)
In a stationary stochastic process (the only kind considered in this class), this relationship does not change with time.
Stochastic Process
Joint
\(x_0\) | \(x_1\) | \(x_2\) | \(P(x_1, x_2)\) |
---|---|---|---|
0 | 0 | 0 | 0.25 |
0 | 0 | 1 | 0.25 |
0 | 1 | 1 | 0.25 |
0 | 1 | 2 | 0.25 |
\(x_0 = 0\)
\(x_{t+1} = x_t + v_t\)
\(v_t \sim \mathcal{U}(\{0,1\})\) (i.i.d.)
\[P(x_{1:n}) = \prod_{t=1}^n P(x_t \mid \text{pa}(x_t))\]
For this particular process,
\[P(x_{1:n}) = \prod_{t=1}^n P(x_t \mid x_{t-1})\]
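As a check against the table above, one entry of the joint follows directly from this factorization:
\[P(x_1 = 0, x_2 = 1) = P(x_1 = 0 \mid x_0 = 0)\, P(x_2 = 1 \mid x_1 = 0) = 0.5 \times 0.5 = 0.25\]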
Marginal
\[P(x_t) = \sum_{k} P(x_t \mid x_{t-1} = k)\, P(x_{t-1} = k)\]
For this particular process, since \(\text{pa}(x_t) = \{x_{t-1}\}\), if \(P(x_{t-1})\) is known,
\[P(x_t) = 0.5 \, P(x_{t-1} = x_t - 1) + 0.5 \, P(x_{t-1} = x_t)\]
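A minimal Python sketch of this recursion (the function name and dictionary representation here are illustrative choices, not from the course notebook):

```python
# Exact marginal P(x_t) for the walk x_{t+1} = x_t + v_t, v_t ~ U({0,1}),
# using P(x_t) = 0.5 P(x_{t-1} = x_t - 1) + 0.5 P(x_{t-1} = x_t).
def marginal(t):
    dist = {0: 1.0}                  # P(x_0 = 0) = 1
    for _ in range(t):
        new_dist = {}
        for x, p in dist.items():
            new_dist[x]     = new_dist.get(x, 0.0) + 0.5 * p      # v = 0
            new_dist[x + 1] = new_dist.get(x + 1, 0.0) + 0.5 * p  # v = 1
        dist = new_dist
    return dist

print(marginal(2))   # {0: 0.25, 1: 0.5, 2: 0.25}, matching the table above
```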
Stochastic Process
Expectation
\[E[x_t] = \sum_{x \in x_t} x P(x_t = x)\]
For this particular process, \(x_t = \sum_{i=0}^{t-1} v_i\), so
\[E[x_t] = E\left[\sum_{i=0}^{t-1} v_i\right] = \sum_{i=0}^{t-1} E[v_i] = 0.5\, t\]
Expectation of a function (such as reward)
\[E[f(x_t)] = \sum_{x \in x_t} f(x) P(x_t = x)\]
Simulating a Stochastic Process
030-Stochastic-Processes.ipynb
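The notebook itself is not reproduced here; a minimal Python sketch of such a simulation, under the same process definition, might look like this (sample counts and function names are illustrative):

```python
import random

# Simulate one trajectory of x_{t+1} = x_t + v_t with v_t ~ U({0, 1}).
def simulate(T):
    x = 0
    traj = [x]
    for _ in range(T):
        x += random.choice([0, 1])   # v_t
        traj.append(x)
    return traj

# Monte Carlo estimate of E[x_T]; analytically E[x_T] = 0.5 T.
T, n = 10, 10_000
estimate = sum(simulate(T)[-1] for _ in range(n)) / n
print(estimate)   # close to 5.0
```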
Markov Process
- A stochastic process \(\{s_t\}\) is Markov if \[P(s_{t+1} \mid s_{t}, s_{t-1}, \ldots, s_0) = P(s_{t+1} \mid s_{t})\]
- \(s_t\) is called the "state" of the process
Break
Suppose you want to create a Markov process model that describes how many new COVID cases will start on a particular day. What information should be in the state of the model?
Assume:
- The population mixes thoroughly (i.e. there are no geographic considerations).
- COVID patients may be contagious up to 14 days after they contract the disease.
- The number of people infected by each person on day \(d\) of their illness is roughly \(\mathcal{N}(\mu_d, \sigma^2)\).
Hidden Markov Model
(Often you can't measure the whole state)
Bayesian Networks
A Bayesian Network is a directed acyclic graph (DAG) that encodes probabilistic relationships between R.V.s
- Nodes: R.V.s
- Edges: Direct probabilistic relationships
Concretely:
\(P(x_{1:n}) = \prod_i P(x_i \mid pa(x_i))\)
\(P(A, B, C) = P(A)\, P(B \mid A)\, P(C \mid A)\) (for the network with edges \(A \to B\) and \(A \to C\))
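As an illustrative sketch (the probability values below are made up for this example), the factorization turns three small conditional tables into the full joint:

```python
# Hypothetical CPTs for binary A, B, C with edges A -> B and A -> C.
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}   # P_B_given_A[a][b]
P_C_given_A = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.5, 1: 0.5}}   # P_C_given_A[a][c]

# Joint from the Bayesian network factorization P(A, B, C) = P(A) P(B|A) P(C|A).
joint = {(a, b, c): P_A[a] * P_B_given_A[a][b] * P_C_given_A[a][c]
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}
print(sum(joint.values()))   # ~1.0 (sanity check)
```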
Markov Process
Hidden Markov Model
Dynamic Bayesian Network (one step)
Simple Decisions
- Outcomes: \(S_1, \ldots, S_n\)
- Probabilities: \(p_1, \ldots, p_n\)
von Neumann-Morgenstern Axioms
Lottery
\([S_1: p_1; \ldots; S_n: p_n]\)
- Completeness: Exactly one holds: \(A \succ B\), \(B \succ A\), or \(A \sim B\)
- Transitivity: If \(A \succeq B\) and \(B \succeq C\), then \(A \succeq C\)
- Continuity: If \(A \succeq C \succeq B\), then there exists a probability \(p\) such that \([A: p;\ B: 1-p] \sim C\)
- Independence: If \(A \succ B\), then for any \(C\) and probability \(p\), \([A: p;\ C: 1-p] \succeq [B: p;\ C: 1-p]\)
These axioms imply the existence of a utility function \(U\) with the properties:
- \(U(A) > U(B)\) iff \(A \succ B\)
- \(U(A) = U(B)\) iff \(A \sim B\)
- \(U([S_1: p_1; \ldots; S_n: p_n]) = \sum_{i=1}^n p_i \, U(S_i)\)
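A tiny Python sketch of the last property, with invented outcome utilities (the names and numbers are only for illustration):

```python
# A lottery [S_1: p_1; ...; S_n: p_n] as a list of (probability, outcome) pairs.
U = {"win": 1.0, "draw": 0.4, "lose": 0.0}           # hypothetical outcome utilities
lottery = [(0.2, "win"), (0.5, "draw"), (0.3, "lose")]

# U(lottery) = sum_i p_i U(S_i)
utility = sum(p * U[s] for p, s in lottery)
print(utility)   # 0.4
```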
Decision Networks
Value of Information
Markov Decision Process
Decision Networks and MDPs
Decision Network
- Chance nodes
- Decision nodes
- Utility nodes
MDP Dynamic Decision Network
MDP Optimization problem
\[\text{maximize} \quad \text{E}\left[\sum_{t=1}^\infty r_t\right]\]
Not well formulated! The expected sum may be infinite.
Finite MDP Objectives
- Finite time: \(\text{E} \left[ \sum_{t=0}^T r_t \right]\)
- Average reward: \(\lim_{n \rightarrow \infty} \frac{1}{n} \, \text{E} \left[\sum_{t=0}^n r_t \right]\)
- Discounting: \(\text{E} \left[\sum_{t=0}^\infty \gamma^t r_t\right]\), with discount \(\gamma \in [0, 1)\), typically 0.9, 0.95, or 0.99
- Terminal states: infinite time, but a terminal state (no reward, no leaving) is always reached with probability 1.
if \(\underline{r} \leq r_t \leq \bar{r}\)
then \[\frac{\underline{r}}{1-\gamma} \leq \sum_{t=0}^\infty \gamma^t r_t \leq \frac{\bar{r}}{1-\gamma} \]
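The upper bound follows from the geometric series (the lower bound is analogous):
\[\sum_{t=0}^\infty \gamma^t r_t \leq \bar{r} \sum_{t=0}^\infty \gamma^t = \frac{\bar{r}}{1-\gamma}\]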
Guiding Question
- What does "Markov" mean in "Markov Decision Process"?
030 Stochastic Processes
By Zachary Sunberg