%%javascript
// enable equation numbering (the cell magic must be the first line of the cell)
MathJax.Hub.Config({
    TeX: { equationNumbers: { autoNumber: "AMS" } }
});
Define short notation for TeX: $\newcommand{\Embb}{\mathbb{E}}$ $\newcommand{\Fmbb}{\mathbb{F}}$ $\newcommand{\Gmbb}{\mathbb{G}}$
$\newcommand{\Acal}{\mathcal{A}}$ $\newcommand{\Fcal}{\mathcal{F}}$ $\newcommand{\Gcal}{\mathcal{G}}$
$\newcommand{\given}{\:\vert\:}$
The goal is to find a policy for ordering a single product over $T+1$ periods ($t=0,1,\cdots,T$) to minimize the expected total cost, which consists of ordering costs, holding costs, and backordering costs. The inventory level is $x_t$, the demand $d_t$ is uncertain, and the order quantity is $a_t$. The inventory level evolves according to
$$ x_{t+1} = x_t + a_t - d_t,\quad x_0 = 0 $$Cost functions are linear: order cost is $k_t a_t$, holding cost is $h_t[x_{t+1}]^+$, and backordering cost is $-p_t [x_{t+1}]^-$. The objective is
\begin{align*} \inf_{a_0,\cdots,a_T}\;\; & \mathbb{E}\left[\sum_{t=0}^T k_t a_t + \max\{h_t x_{t+1}, -p_t x_{t+1}\}\right] \\ \text{s.t.}\;\; & x_{t+1} = x_t + a_t - d_t \\ & x_0 = 0 \\ & a_t \geq 0, \quad \forall\, t \\ \end{align*}In the information relaxation framework of BS2022, we can define a sequence of information sets $\Fmbb=(\Fcal_0,\cdots,\Fcal_T)$, called a filtration in the original paper, where $\Fcal_t$ is the information available at period $t$ and $\Fcal_t \subseteq \Fcal_{t+1} \subseteq \Fcal$.
A policy $\alpha_\Fmbb \in \Acal_\Fmbb$ selects, given the information available in $\Fcal_t$, an action $a_t \in \Acal_t$. If we define the cumulative cost as
$$r(\alpha) = \sum_{t=0}^T r_t(a_t, d_t) = \sum_{t=0}^T k_t a_t + \max\{h_t(x_t + a_t - d_t), -p_t(x_t + a_t - d_t)\}$$then the objective is simply
$$\inf_{\alpha_\Fmbb \in \Acal_\Fmbb} \Embb[r(\alpha_\Fmbb)]$$Following the numerical example in Section 6.3 of BS2022, the cost structure and the demand distribution are known, so we can solve the problem directly as a DP with backward induction for $T=4$.
We reformulate the objective as a dynamic program:
\begin{equation} V_t(x_t) = \min_{a_t \geq 0} k_t a_t + \Embb[\max\{h_t x_{t+1}, -p_t x_{t+1}\} + V_{t+1}(x_{t+1})] \end{equation}where $x_{t+1} = x_t + a_t - d_t$, $x_0=0$, and the terminal condition is $V_{T+1}(\cdot) = V_5(\cdot) = 0$.
Note that the computational complexity of backward induction is $O(|S| \times |A| \times |D| \times T)$, where $|S|$, $|A|$, and $|D|$ are the sizes of the state, action, and (truncated) demand grids.
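For the grid sizes used in the code below (200 inventory levels, 100 actions, demand support truncated at 70, and 5 periods), a rough back-of-the-envelope count of the inner evaluations is:
# rough count of one-step cost evaluations for the grids used below
n_states, n_actions, n_demand, n_periods = 200, 100, 70, 5
print(n_states * n_actions * n_demand * n_periods)  # 7,000,000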
import numpy as np
from scipy.stats import poisson
def opt_DP(t, param, distri=poisson,
           srange=(-100, 100, 1), arange=(0, 100, 1), disp=False):
    """
    Return a table mapping inventory level to the minimized expected total cost
    Inputs:
        t      ~ period index, integer
        param  ~ parameters, dict
        distri ~ distribution, see scipy.stats
        srange ~ value range of inventory level, (min, max, step)
        arange ~ value range for action, (min, max, step)
    Output:
        table  ~ hash table (dict),
                 key is the inventory level (integer),
                 value is a dict with the min expected total cost 'V'
                 and the minimizing action 'a'
    """
    # parse parameters
    T = param['T']
    k = param['k'][t]
    h = param['h'][t]
    p = param['p'][t]
    Ed = param['Ed'][t]
    if t == T:
        # there is no next value function
        nV = None
    else:
        # fetch next period value function
        nV = opt_DP(t+1, param, distri, srange, arange)
    # solve the min problem at the current period
    table = dict()
    for x in range(*srange):
        # action-indexed current value
        Va = inner_problem(x, arange, distri, Ed, k, h, p, nV)
        a, Vmin = min(Va.items(), key=lambda kv: kv[1])
        table[x] = {'V': Vmin, 'a': a}
    # display progress
    if disp: print(f"Period {t} is finished.")
    return table
def inner_problem(x, arange, distri, mu, k, h, p, tbl):
    """
    Return an action-indexed table of expected costs when the current inventory is x
    Inputs:
        x      ~ current inventory level
        arange ~ set of available actions
        distri ~ distribution function
        mu     ~ parameter of the distribution
        k      ~ unit order cost
        h      ~ unit holding cost
        p      ~ unit backorder cost
        tbl    ~ value function of the next period
    Output:
        value  ~ hash table, key is the action, value is the expected cost-to-go
    """
    value = dict()
    for a in range(*arange):
        value[a] = k * a
        # calculate the expectation over the uncertain demand
        # todo: make the truncation of the demand support a parameter
        for d in range(70):
            pr = distri.pmf(d, mu)
            nx = x + a - d
            f = max(h*nx, -p*nx)
            nV = {'V': 0, 'a': 0} if tbl is None else tbl.get(nx, {'V': np.inf, 'a': np.inf})
            value[a] = value[a] + pr * (f + nV['V'])
    return value
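As a quick standalone check of inner_problem (a sketch using the period-0 cost parameters from param1 below, k=2, h=1, p=9, Poisson mean 40, and no continuation value), the single-period problem reduces to a newsvendor-style minimization:
# single-period sanity check: tbl=None means no next-period value function
Va = inner_problem(0, (0, 100, 1), poisson, 40, 2, 1, 9, None)
a_star, v_star = min(Va.items(), key=lambda kv: kv[1])
print(a_star, round(v_star, 2))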
# time-varying demand and constant costs
param1 = {
'T': 4,
'k': [2, 2, 2, 2, 2],
'h': [1, 1, 1, 1, -1],
'p': [9, 9, 9, 9, 11],
'Ed': [40, 40, 40, 2, 2]
}
V = opt_DP(0, param1)
print(f"The optimal value is: {V[0]['V']:.2f}")
The optimal value is: 293.93
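The table returned by opt_DP also stores the minimizing action, so the optimal period-0 order quantity at the initial inventory level $x_0 = 0$ can be read off directly:
# optimal first-period order quantity at x0 = 0
print(V[0]['a'])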
# constant demand and time-varying costs
param2 = {
'T': 4,
'k': [7, 8, 3, 4, 1.5],
'h': [1, 1, 1, 1, -1],
'p': [9, 9, 9, 9, 11],
'Ed': [30, 30, 30, 30, 30]
}
V = opt_DP(0, param2)
print(f"The optimal value is: {V[0]['V']:.2f}")
The optimal value is: 752.45
The central idea of information relaxation is to allow decisions to use information that is not yet available at the current period, while charging a penalty for doing so. Let's consider the perfect information relaxation $\Gmbb=(\Fcal,\cdots, \Fcal)$. That is, we know all the realized demands before making any order decisions. Under this relaxed information set, we have $\Acal_\Gmbb = \Acal$.
According to BS2022, Theorem 3.1, with a dual feasible penalty $\pi$, we can have a lower bound (performance bound) for the inventory management problem
$$\Embb\left[\inf_{\alpha \in \Acal_\Gmbb} r(\alpha) - \pi(\alpha)\right]$$We can select $\pi=0$, which gives a looser performance bound; the resulting inner problem corresponds to what is usually called the (hindsight) clairvoyant policy. The inner problem can be reformulated as a dynamic lot-sizing problem and solved as a shortest-path problem. For simplicity, define $f_t(x):= \max\{h_t x, -p_t x\}$, so the inner problem becomes
$$\inf_{a_0,\cdots,a_T \in \Acal} \sum_{t=0}^T k_t a_t + f_t(x_{t+1})$$The finite-horizon problem can easily be solved by drawing a graph, e.g., see this paper. In the case of constant costs, the optimal decision on each realized demand path is just-in-time ordering, $a_t = d_t$, and hence the expected optimal value equals $$\sum_{t=0}^T k_t \Embb[d_t]$$ In the case of time-varying costs, one has to find the equivalent ordering cost using the graph, and the expected optimal value is $$\sum_{t=0}^T \hat{k}_t \Embb[d_t]$$
For example, if $k_{t-1} + h_{t-1} < k_t < k_{t+1} - p_t$, then $\hat{k}_t = k_{t-1} + h_{t-1}$. In this case, $a_t = 0$.
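As a quick numerical illustration of the constant-cost formula (a sketch using param1 defined above, where $k_t = 2$ for all $t$):
# zero-penalty (clairvoyant) lower bound under constant ordering costs:
# sum_t k_t * E[d_t] = 2 * (40 + 40 + 40 + 2 + 2) = 248
bound = sum(k * Ed for k, Ed in zip(param1['k'], param1['Ed']))
print(bound)  # 248, looser than the DP optimum 293.93 computed above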
Proposition 3.1 in BS2022 says that a dual feasible penalty function can take the following form
$$\pi_t(a_t) = w_t(a_t) - \Embb[w_t(a_t) \given \Fcal_t]$$where $w_t(\cdot)$ is a generating function that depends only on the actions up to the current period $t$.
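This form is dual feasible because the penalty has zero conditional expectation under the original filtration: for any non-anticipative policy (so that $a_t$ is determined by $\Fcal_t$), the tower property gives
$$\Embb[\pi_t(a_t) \given \Fcal_t] = \Embb[w_t(a_t) \given \Fcal_t] - \Embb\big[\Embb[w_t(a_t) \given \Fcal_t] \given \Fcal_t\big] = 0,$$so the expected penalty charged to any feasible policy is zero and the lower bound remains valid.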
Take the generating function to be the period cost, that is,
$$\pi_t(a_t) = r_t(a_t) - \Embb[r_t(a_t) \given \Fcal_t]$$The performance bound is then equivalent to
$$\Embb\left[\inf_{\alpha \in \Acal_\Gmbb} \sum_{t=0}^T \Embb[r_t(a_t)\given \Fcal_t]\right]$$where $\Fcal_t$ is generated by $(d_0, \cdots, d_{t-1})$. More explicitly, we can reformulate the inner problem as
$$ \inf_{a_0,\cdots,a_T \in \Acal} \sum_{t=0}^T \left\{k_t a_t + \Embb[f_t(x_{t+1}) \given \Fcal_t]\right\} $$This is still a dynamic lot-sizing problem, now with the smoothed cost function
$$\hat{f}_t(x_{t+1}) = \Embb[f_t(x_{t+1}) \given \Fcal_t]$$which "takes expectation over the uncertain demand $d_t$ given a particular earlier realized demand sequence $(d_0,\cdots, d_{t-1})$."
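Concretely, with the demand support truncated at a finite value (as in the code below), the smoothed cost is evaluated as
$$\hat{f}_t(x_{t+1}) = \sum_{d \geq 0} \Pr(d_t = d)\, \max\{h_t (x_t + a_t - d),\; -p_t (x_t + a_t - d)\},$$which is exactly the term pr @ np.max(f, axis=1) computed in inner_problem_2 below.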
def inner_problem_2(dt, x, arange, distri, mu, k, h, p, tbl):
    """
    Return the (action, value) pair minimizing the smoothed cost
    when the current inventory is x and the realized demand is dt
    Inputs:
        dt     ~ realized demand in the current period
        x      ~ current inventory level
        arange ~ set of available actions
        distri ~ distribution function
        mu     ~ parameter of the distribution
        k      ~ unit order cost
        h      ~ unit holding cost
        p      ~ unit backorder cost
        tbl    ~ value function of the next period
    Output:
        (action, value) ~ the minimizing action and its value
    """
    value = dict()
    for a in range(*arange):
        # smoothed period cost: expectation over the demand support truncated at 100
        d = np.arange(100)
        pr = distri.pmf(d, mu)
        f = np.zeros((100, 2))
        f[:, 0] = h * (x + a - d)
        f[:, 1] = -p * (x + a - d)
        # the state transition uses the realized demand dt
        nx = x + a - dt
        nV = {'V': 0, 'a': 0} if tbl is None else tbl.get(nx, {'V': np.inf, 'a': np.inf})
        value[a] = k * a + pr @ np.max(f, axis=1) + nV['V']
    return min(value.items(), key=lambda kv: kv[1])
def smooth_DP(t, demands, param, distri=poisson,
              srange=(-100, 100, 1), arange=(0, 100, 1), disp=False):
    """
    Solve the perfect-information inner problem with the smoothed period cost
    for one realized demand path, by backward induction from period t.
    """
    # parse parameters
    T = param['T']
    k = param['k'][t]
    h = param['h'][t]
    p = param['p'][t]
    Ed = param['Ed'][t]
    dt = demands[t]
    if t == T:
        # there is no next value function
        nV = None
    else:
        # fetch next period value function
        nV = smooth_DP(
            t+1, demands, param, distri=distri,
            srange=srange, arange=arange
        )
    # solve the min problem at the current period
    table = dict()
    for x in range(*srange):
        # action-indexed current value
        a, Vmin = inner_problem_2(
            dt, x, arange, distri, Ed, k, h, p, nV
        )
        table[x] = {'V': Vmin, 'a': a}
    # display progress
    if disp: print(f"Period {t} is finished.")
    return table
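For a single demand path, smooth_DP returns the same kind of table as opt_DP. The demand values below are a hypothetical sample, used only to show the call signature:
# hypothetical demand path (one sample per period), for illustration only
sample_demands = np.array([38, 42, 41, 1, 3])
tbl = smooth_DP(0, sample_demands, param1)
print(tbl[0]['V'], tbl[0]['a'])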
def Monte_Carlo(repeats, param, distri):
    """
    Estimate the performance bound by averaging the smoothed inner-problem
    values over `repeats` sampled demand paths.
    """
    T = param['T'] + 1
    demands = np.zeros((repeats, T))
    for t in range(T):
        mu = param['Ed'][t]
        demands[:, t] = distri.rvs(mu, size=repeats)
    costs = np.zeros(repeats)
    for i in range(repeats):
        tbl = smooth_DP(0, demands[i, :], param)
        costs[i] = tbl[0]['V']
    return costs
param1 = {
'T': 4,
'k': [2, 2, 2, 2, 2],
'h': [1, 1, 1, 1, -1],
'p': [9, 9, 9, 9, 11],
'Ed': [40, 40, 40, 2, 2]
}
costs = Monte_Carlo(100, param1, poisson)
costs.mean()
293.2754128462732
costs.std()
18.45823672806469
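Since this Monte Carlo average is itself an estimate of the performance bound, it comes with sampling error. A minimal sketch of the standard error and an approximate 95% confidence interval for the 100 samples above:
# standard error of the Monte Carlo estimate of the bound
se = costs.std(ddof=1) / np.sqrt(len(costs))
print(f"{costs.mean():.2f} +/- {1.96 * se:.2f}")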