define short notation for tex: $\newcommand{\Embb}{\mathbb{E}}$ $\newcommand{\Fmbb}{\mathbb{F}}$ $\newcommand{\Gmbb}{\mathbb{G}}$

$\newcommand{\Acal}{\mathcal{A}}$ $\newcommand{\Fcal}{\mathcal{F}}$ $\newcommand{\Gcal}{\mathcal{G}}$

$\newcommand{\given}{\:\vert\:}$

The goal is to find a policy for ordering a single product over $T+1$ periods ($t=0,1,\cdots,T$) to minimize the expected total costs, which consists of ordering costs, holding costs, and backordering costs. The inventory level is $x_t$, the demand $d_t$ is uncertain, and the ordered amount is $a_t$. The inventory level evolves according to

$$ x_{t+1} = x_t + a_t - d_t,\quad x_0 = 0 $$

Cost functions are linear: order cost is $k_t a_t$, holding cost is $h_t[x_{t+1}]^+$, and backordering cost is $-p_t [x_{t+1}]^-$. The objective is

\begin{align*} \inf_{a_0,\cdots,a_T}\;\; & \mathbb{E}\left[\sum_{t=0}^T k_t a_t + \max\{h_t x_{t+1}, -p_t x_{t+1}\}\right] \\ \text{s.t.}\;\; & x_{t+1} = x_t + a_t - d_t \\ & x_0 = 0 \\ & a_t \geq 0, \quad \forall\, t \\ \end{align*}

In the framework of information relaxation BS2022, we can define a sequence of sets of information $\Fmbb=(\Fcal_0,\cdots,\Fcal_T)$, which is called filtration in the oringinal paper, and $\Fcal_t$ is the set of available information at period $t$ such that $\Fcal_t \subseteq \Fcal_{t+1} \subseteq \Fcal$.

A policy denoted as $\alpha_\Fmbb \in \Acal_\Fmbb$ indicates under the available information $\Fcal_t$ we will select action $a_t \in \Acal_t$. If we define the cumulative cost as

$$r(\alpha) = \sum_{t=0}^T r_t(a_t, d_t) = \sum_{t=0}^T k_t a_t + \max\{h_t(x_t + a_t - d_t), -p_t(x_t + a_t - d_t)\}$$

The objective can simply be

$$\inf_{\alpha_\Fmbb \in \Acal_\Fmbb} \Embb[r(\alpha_\Fmbb)]$$

Confer Example numerical test in Section 6.3 in BS2022, we know the cost structures, the demand distribution, we can directly solve a DP problem with backward induction for $T=4$.

We reformulation the objective function as DP

\begin{equation} V_t(x_t) = \min_{a_t \geq 0} k_t a_t + \Embb[\max\{h_t x_{t+1}, -p_t x_{t+1}\} + V_{t+1}(x_{t+1})] \label{} \end{equation}

where $x_0=0$ and $V_5(\cdot)=0$.

Notice the complexity is $|S| \times |A| \times T \times |D|$

The central idea of information relaxation is to use something not available at the current period with a penalty. Let's consider a perfect information relaxation $\Gmbb=(\Fcal,\cdots, \Fcal)$. That is, we know all the realized demand before making any order decisions. Under this new set of information, we can have $\Acal_\Gmbb = \Acal$.

According to BS2022, Theorem 3.1, with a dual feasible penalty $\pi$, we can have a lower bound (performance bound) for the inventory management problem

$$\Embb[\inf_{\alpha \in \Acal_\Gmbb} r(\alpha) - \pi(\alpha)]$$

We can select $\pi=0$, which provides a looser performance bound, and in most cases such a policy is called (hindsight) clairvoyant policy. The above problem can be reformulated as a dynamic lot-sizing problem and solved as a shortest path problem. For simplicity, define $f_t(x):= \max\{h_t x, -p_t x\}$, and then

$$\inf_{a_0,\cdots,a_T \in \Acal} \sum_{t=0}^T k_t a_t + f_t(x_{t+1})$$

The finite horizon problem can easily be solved by drawing a graph, e.g., see this paper. In the case of constant costs, the optimal decision is $a_t = \Embb[d_t]$ and hence th optimal value equals to $$\sum_{t=0}^T k_t \Embb[d_t]$$ and in the case of time-varying costs, one has to find the equivalent ordering cost using the graph, and the optimal value is $$\sum_{t=0}^T \hat{k}_t \Embb[d_t]$$

For example, if $k_{t-1} + h_{t-1} < k_t < k_{t+1} - p_t$, then $\hat{k}_t = k_{t-1} + h_{t-1}$. In this case, $a_t = 0$.


The Proposition 3.1 in BS2022 says that a dual feasible penalty function can be in the following form

$$\pi_t(a_t) = w_t(a_t) - \Embb[w_t(a_t) \given \Fcal_t]$$

where $w_t(\cdot)$ is a generating function that depends only on the set of actions until the current period $t$.

Take the generating function to be the period cost, that is,

$$\pi_t(a_t) = r_t(a_t) - \Embb[r_t(a) \given \Fcal_t]$$

The performance bound is equivalent to

$$\Embb[\inf_{\alpha \in \Acal_\Gmbb} \Embb[r(\alpha)\given \Fcal_t]]$$

where $\Fcal_t = (d_0, \cdots, d_{t-1})$. More explicitly, we reformulate the inner problem as

$$ \inf_{a_0\cdots a_T \in \Acal} \sum_{t=0}^T \left\{k_t a_t + \Embb[f_t(x_{t+1}) \given \Fcal_t]\right\} $$

This is still a dynamic lot-sizing problem with a smoothed function

$$\hat{f}_t(x_{t+1}) = \Embb[f_t(x_{t+1}) \given \Fcal_t]$$

which "takes expectation over the uncertain demand $d_t$ given a particular earlier realized demand sequence $(d_0,\cdots, d_{t-1})$."