Behzad Samadi

  • An observation is defined as $y = h(x) + w$, where $x\in\mathbb{R}^n$ and $y\in\mathbb{R}^m$ denote the unknown vector and the measurement vector, $h:\mathbb{R}^n\rightarrow\mathbb{R}^m$ is a function of $x$, and $w$ is the observation noise with probability density $f_w(w)$.
  • It is assumed that, before the observation, $x$ is a random variable with a priori probability density $f_x(x)$.
  • The goal is to compute the "best" estimate of $x$ using the observation.

Optimal Estimation

  • The optimal estimate $\hat x$ is defined based on a cost function $J$: $\hat x_{opt}=\underset{\hat x}{\arg\min}\, E[J(x-\hat x)]$
  • Some typical cost functions:
    • Minimum Mean Square Error ($\hat x_{MMSE}$): $J(x-\hat x)=(x-\hat x)^\text{T} W(x-\hat x),\quad W>0$
    • Absolute Value ($\hat x_{ABS}$): $J(x-\hat x)=|x-\hat x|$
    • Maximum a Posteriori ($\hat x_{MAP}$): $J(x-\hat x)=\begin{cases} 0 & \text{if }|x-\hat x|\leq \epsilon \\ 1 & \text{if }|x-\hat x| > \epsilon\end{cases},\quad \epsilon\rightarrow 0$
  • It can be shown that $\hat x_{MMSE}(y)=E[x|y]$, that $\hat x_{ABS}(y)$ is the posterior median, i.e. $\int_{-\infty}^{\hat x_{ABS}(y)}f_{x|y}(x|y)\,dx=\int_{\hat x_{ABS}(y)}^{+\infty}f_{x|y}(x|y)\,dx$, and that $\hat x_{MAP}=\arg\max_{x} f_{x|y}(x|y)$.
  • If the a posteriori density $f_{x|y}(x|y)$ has only one maximum and is symmetric with respect to $E[x|y]$, then all of the above estimates are equal to $E[x|y]$ (see the numerical sketch after this list).
  • In fact, assuming these conditions for $f_{x|y}(x|y)$, $E[x|y]$ is the optimal estimate for any cost function $J$ with $J(0)=0$ that is nondecreasing with distance (Sherman's theorem).
  • Maximum Likelihood Estimation: $\hat x_{ML}$ is the value of $x$ that maximizes the likelihood of the observation $y$: $\hat x_{ML}(y)=\underset{x}{\arg\max}\, f_{y|x}(y|x)$
  • It can be shown that $\hat x_{ML}=\hat x_{MAP}$ if there is no a priori information about $x$.
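
The following minimal sketch (Python/NumPy, with illustrative values that are not part of the original text) evaluates a Gaussian posterior on a grid and confirms that its mean (MMSE), median (ABS) and mode (MAP) coincide:

```python
import numpy as np

# Illustrative scalar setting (assumed values): prior x ~ N(m_x, p_x),
# observation y = x + w with w ~ N(0, r), and one particular measurement y.
m_x, p_x, r = 1.0, 4.0, 1.0
y = 2.5

# Evaluate the posterior f_{x|y}(x|y), proportional to f_{y|x}(y|x) f_x(x), on a grid
xs = np.linspace(-10.0, 10.0, 200001)
dx = xs[1] - xs[0]
log_post = -0.5 * (y - xs) ** 2 / r - 0.5 * (xs - m_x) ** 2 / p_x
post = np.exp(log_post - log_post.max())
post /= post.sum() * dx                                  # numerical normalization

x_mmse = np.sum(xs * post) * dx                          # posterior mean   -> MMSE
x_abs = xs[np.searchsorted(np.cumsum(post) * dx, 0.5)]   # posterior median -> ABS
x_map = xs[np.argmax(post)]                              # posterior mode   -> MAP

# All three agree with the closed form (p_x*y + r*m_x)/(p_x + r) = 2.2
print(x_mmse, x_abs, x_map)
```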

Linear Gaussian Observation

  • Consider the following observation: $y=Ax+Bw$, where $w\sim\mathcal{N}(0,I_r)$ is a Gaussian random vector and the matrices $A_{m\times n}$ and $B_{m\times r}$ are known.
  • In this observation, $x$ is estimable only if $A$ has full column rank; otherwise the problem has infinitely many solutions.
  • If $BB^\text{T}$ is invertible, then: $$f_{y|x}(y|x)=\frac{1}{(2\pi)^{m/2}|BB^\text{T}|^{1/2}}\exp\left(-\frac{1}{2}(y-Ax)^\text{T}(BB^\text{T})^{-1}(y-Ax)\right)$$
  • The maximum likelihood estimate can be computed as: $$\hat x_{ML}=\arg\min_x\,(y-Ax)^\text{T}(BB^\text{T})^{-1}(y-Ax)=(A^\text{T}(BB^\text{T})^{-1}A)^{-1}A^\text{T}(BB^\text{T})^{-1}y$$
  • It is very interesting that $\hat x_{ML}$ is the Weighted Least Squares (WLS) solution of the equation $y=Ax$ with the weight matrix $W=(BB^\text{T})^{-1}$, i.e. $\hat x_{WLS}=\arg\min_x\,(y-Ax)^\text{T} W(y-Ax)$.
  • $\hat x_{ML}$ is an unbiased estimate: $$b(x)=E[x-\hat x_{ML}\mid x]=E\left[x-(A^\text{T}(BB^\text{T})^{-1}A)^{-1}A^\text{T}(BB^\text{T})^{-1}y\mid x\right]=0$$
  • The covariance of the estimation error is: $$P_e(x)=E[(x-\hat x_{ML})(x-\hat x_{ML})^\text{T}\mid x]=(A^\text{T}(BB^\text{T})^{-1}A)^{-1}$$
  • $\hat x_{ML}$ is efficient in the sense of the Cramér-Rao bound.
  • Example: Consider the following linear Gaussian observation: $y=ax+w$, where $a$ is a nonzero real number and $w\sim\mathcal{N}(0,r)$ is the observation noise.
  • Maximum a Posteriori Estimation: To compute $\hat x_{MAP}$, it is assumed that the a priori density of $x$ is Gaussian with mean $m_x$ and variance $p_x$: $x\sim\mathcal{N}(m_x,p_x)$
  • The conditions of Sherman's theorem are satisfied, and therefore: $$\hat x_{MAP}=E[x|y]=m_x+\frac{p_{xy}}{p_y}(y-m_y)=m_x+\frac{ap_x}{a^2p_x+r}(y-am_x)=\frac{ap_xy+m_xr}{a^2p_x+r}$$
  • Estimation bias: $$b_{MAP}=E[x-\hat x_{MAP}]=m_x-\frac{ap_xE[y]+rm_x}{a^2p_x+r}=m_x-\frac{a^2p_xm_x+rm_x}{a^2p_x+r}=0$$
  • Estimation error covariance: $$p_{MAP}=E[(x-\hat x_{MAP})^2]=p_x-\frac{a^2p_x^2}{a^2p_x+r}=\frac{p_xr}{a^2p_x+r}$$
  • Maximum Likelihood Estimation: For this example, we have: $$f_{y|x}(y|x)=f_w(y-ax)=\frac{1}{\sqrt{2\pi r}}\exp\left(-\frac{(y-ax)^2}{2r}\right)$$
  • With this information: $\hat x_{ML}=\arg\max_x f_{y|x}(y|x)=\frac{y}{a}$
  • Estimation bias: $b_{ML}=E[x-\hat x_{ML}\mid x]=x-\frac{ax}{a}=0$
  • Estimation error covariance: $p_{ML}=E[(x-\hat x_{ML})^2\mid x]=E\left[\left(x-\frac{ax+w}{a}\right)^2\right]=\frac{r}{a^2}$
  • Comparing $\hat x_{MAP}$ and $\hat x_{ML}$, we have: $\lim_{p_x\rightarrow +\infty}\hat x_{MAP}=\hat x_{ML}$. This means that if there is no a priori information about $x$, the two estimates are equal.
  • For the error covariances, we have: $\frac{1}{p_{MAP}}=\frac{1}{p_{ML}}+\frac{1}{p_x}$
  • In other words, the information after the observation is the sum of the information in the observation and the information before the observation.
  • Likewise, for the error covariance: $\lim_{p_x\rightarrow +\infty} p_{MAP}=p_{ML}$
  • It is possible to include a priori information in maximum likelihood estimation.
  • The a priori distribution of $x$, $\mathcal{N}(m_x,p_x)$, can be rewritten as the following observation: $m_x=x+v$, where $v\sim\mathcal{N}(0,p_x)$ is the observation noise.
  • Combined observation: $z=Ax+u$, where: $$z=\begin{bmatrix} m_x\\ y\end{bmatrix},\quad A=\begin{bmatrix} 1\\ a\end{bmatrix},\quad u=\begin{bmatrix} v\\ w\end{bmatrix}$$
  • The assumption is that $v$ and $w$ are independent. Therefore: $u\sim\mathcal{N}\left(0,\begin{bmatrix} p_x& 0\\ 0 & r\end{bmatrix}\right)$
  • Maximum likelihood estimate: $$\hat x_{MLp}(z)=\arg\max_x f_{z|x}(z|x)=\arg\min_x \left(\frac{(m_x-x)^2}{p_x}+\frac{(y-ax)^2}{r}\right)=\frac{ap_xy+m_xr}{a^2p_x+r}=\hat x_{MAP}$$
  • $\hat x_{MLp}$ is unbiased and has the same error covariance as $\hat x_{MAP}$.
  • Therefore $\hat x_{MLp}$ and $\hat x_{MAP}$ are equivalent, as verified numerically in the sketch below.
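
The sketch below (Python/NumPy, with assumed values for $a$, $r$, $m_x$ and $p_x$) evaluates the closed-form $\hat x_{MAP}$ and $\hat x_{ML}$, checks that the stacked WLS estimate $\hat x_{MLp}$ reproduces $\hat x_{MAP}$, and verifies the bias, error covariance and information relation by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed parameters for the scalar example y = a*x + w
a, r = 2.0, 0.5          # observation gain and noise variance
m_x, p_x = 1.0, 4.0      # prior mean and variance of x

# Monte Carlo: draw x from the prior, then generate the observation y
N = 200_000
x = m_x + np.sqrt(p_x) * rng.standard_normal(N)
y = a * x + np.sqrt(r) * rng.standard_normal(N)

# Closed-form estimates derived above
x_map = (a * p_x * y + m_x * r) / (a**2 * p_x + r)
x_ml = y / a

# Prior treated as an extra observation: z = [m_x, y]^T = [1, a]^T x + u,
# solved by WLS with weight diag(1/p_x, 1/r); this is the MLp estimate
A = np.array([1.0, a])
W_inv = np.array([1.0 / p_x, 1.0 / r])
x_mlp = ((A * W_inv) @ np.vstack([np.full(N, m_x), y])) / ((A * W_inv) @ A)

# Analytical error covariances
p_map = p_x * r / (a**2 * p_x + r)
p_ml = r / a**2

print(np.allclose(x_mlp, x_map))                 # MLp coincides with MAP
print(np.mean(x - x_map), np.mean(x - x_ml))     # both biases are ~0
print(np.var(x - x_map), p_map)                  # empirical vs analytical p_MAP
print(np.var(x - x_ml), p_ml)                    # empirical vs analytical p_ML
print(1 / p_map, 1 / p_ml + 1 / p_x)             # information adds up
```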

Standard Kalman Filter

  • Consider the following linear system: $$\begin{cases}x(k+1)=A(k)x(k)+w(k) \\ y(k)=C(k)x(k)+v(k)\end{cases}$$ where $x(k)\in\mathbb{R}^n$ and $y(k)\in\mathbb{R}^m$ denote the state vector and the measurement vector at time $t_k$.
  • $w(k)\sim\mathcal{N}(0,Q(k))$ and $v(k)\sim\mathcal{N}(0,R(k))$ are independent Gaussian white noise processes, and $R(k)$ is invertible.
  • It is assumed that there is an a priori estimate of $x$, denoted by $\hat x^-(k)$, which is unbiased with a Gaussian estimation error that is independent of $w$ and $v$: $e^-(k)\sim\mathcal{N}(0,P^-(k))$, where $P^-(k)$ is invertible.
  • The Kalman filter is a recursive algorithm for computing the state estimate.
  • Output Measurement: The information in $\hat x^-(k)$ and $y(k)$ can be written as the following observation: $$\begin{bmatrix}\hat x^-(k)\\ y(k) \end{bmatrix}=\begin{bmatrix} I\\ C(k)\end{bmatrix} x(k)+\begin{bmatrix} e^-(k)\\ v(k)\end{bmatrix}$$ Considering the independence of $e^-(k)$ and $v(k)$, we have: $$\begin{bmatrix} e^-(k)\\ v(k) \end{bmatrix}\sim\mathcal{N}\left(0,\begin{bmatrix} P^-(k) & 0\\ 0& R(k) \end{bmatrix}\right)$$
  • The estimate follows from the Weighted Least Squares (WLS) solution of this observation together with the matrix inversion lemma: $(A+BD^{-1}C)^{-1}=A^{-1}-A^{-1}B(D+CA^{-1}B)^{-1}CA^{-1}$
  • Defining: $K(k)=P^-(k)C^\text{T}(k)\left[C(k)P^-(k)C^\text{T}(k)+R(k)\right]^{-1}$
  • We have: $\hat x(k)=\hat x^-(k)+K(k)\left(y(k)-C(k)\hat x^-(k)\right)$
  • The state estimate is the sum of the a priori estimate and the Kalman gain times the output prediction error, since the predicted output is $\hat y^-(k)=C(k)\hat x^-(k)$.
  • $K(k)$ is the Kalman filter gain.
  • Estimation error covariance: $P(k)=(I-K(k)C(k))P^-(k)$
  • Information: $\hat x(k)=x(k)+e(k)$, where $e(k)\sim\mathcal{N}(0,P(k))$
  • State Update: To complete the recursive algorithm, we need to compute $\hat x^-(k+1)$ and $P^-(k+1)$.
  • Information: $$\hat x(k)=x(k)+e(k),\qquad 0=\begin{bmatrix} -I& A(k)\end{bmatrix}\begin{bmatrix} x(k+1)\\ x(k) \end{bmatrix}+w(k)$$
  • By removing $x(k)$ from the above observation, we have: $A(k)\hat x(k)=x(k+1)+A(k)e(k)-w(k)$
  • It is easy to see that: $\hat x^-(k+1)=A(k)\hat x(k)$
  • Estimation error: $e^-(k+1)=A(k)e(k)-w(k)$
  • Estimation error covariance: $P^-(k+1)=A(k)P(k)A^\text{T}(k)+Q(k)$. These measurement and time update formulas are checked numerically in the sketch below.
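
As a sanity check (a sketch with randomly generated matrices of assumed dimensions, not part of the original text), the following code verifies numerically that the WLS solution of the stacked observation equals the measurement update derived above, that $(I-K(k)C(k))P^-(k)$ matches the information-form covariance, and then applies the time update:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2                        # assumed state and measurement dimensions


def random_spd(k):
    """Random symmetric positive definite matrix."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)


P_minus = random_spd(n)            # a priori error covariance P^-(k)
R = random_spd(m)                  # measurement noise covariance R(k)
C = rng.standard_normal((m, n))    # output matrix C(k)
x_minus = rng.standard_normal(n)   # a priori estimate \hat x^-(k)
y = rng.standard_normal(m)         # measurement y(k)

# Kalman measurement update as derived above
K = P_minus @ C.T @ np.linalg.inv(C @ P_minus @ C.T + R)
x_hat = x_minus + K @ (y - C @ x_minus)
P = (np.eye(n) - K @ C) @ P_minus

# The same update obtained as WLS on the stacked observation
# [x^-; y] = [I; C] x + [e^-; v] with error covariance blkdiag(P^-, R)
H = np.vstack([np.eye(n), C])
W_inv = np.linalg.inv(np.block([[P_minus, np.zeros((n, m))],
                                [np.zeros((m, n)), R]]))
P_wls = np.linalg.inv(H.T @ W_inv @ H)
x_wls = P_wls @ H.T @ W_inv @ np.concatenate([x_minus, y])

print(np.allclose(x_hat, x_wls))   # identical estimates
print(np.allclose(P, P_wls))       # (I - K C) P^- equals the information form

# Time update for some A(k), Q(k)
A = rng.standard_normal((n, n))
Q = random_spd(n)
x_minus_next = A @ x_hat
P_minus_next = A @ P @ A.T + Q
```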

Summary:

  • Initial conditions: $\hat x^-(k)$ and its error covariance $P^-(k)$.
  • Gain calculation: $K(k)=P^-(k)C^\text{T}(k)\left[C(k)P^-(k)C^\text{T}(k)+R(k)\right]^{-1}$
  • Measurement update: $$\hat x(k)=\hat x^-(k)+K(k)\left(y(k)-C(k)\hat x^-(k)\right),\qquad P(k)=(I-K(k)C(k))P^-(k)$$
  • Time update: $$\hat x^-(k+1)=A(k)\hat x(k),\qquad P^-(k+1)=A(k)P(k)A^\text{T}(k)+Q(k)$$
  • Go back to the gain calculation and continue the loop for $k+1$; a complete sketch of this recursion follows this list.
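
The loop below is a minimal sketch of this recursion on an assumed constant-velocity toy model (all matrices, noise levels and initial conditions are illustrative, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed model: constant-velocity target observed through its position
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # A(k), time invariant here
C = np.array([[1.0, 0.0]])              # C(k)
Q = 1e-3 * np.eye(2)                    # process noise covariance Q(k)
R = np.array([[0.05]])                  # measurement noise covariance R(k)

# Simulate the true system
steps = 200
x_true = np.zeros((steps, 2))
y = np.zeros((steps, 1))
x_true[0] = [0.0, 1.0]
for k in range(steps):
    if k > 0:
        x_true[k] = A @ x_true[k - 1] + rng.multivariate_normal(np.zeros(2), Q)
    y[k] = C @ x_true[k] + rng.multivariate_normal(np.zeros(1), R)

# Standard Kalman filter loop, following the summary above
x_minus = np.zeros(2)                   # initial \hat x^-(0)
P_minus = 10.0 * np.eye(2)              # initial P^-(0)
x_hat = np.zeros((steps, 2))
for k in range(steps):
    # Gain calculation
    K = P_minus @ C.T @ np.linalg.inv(C @ P_minus @ C.T + R)
    # Measurement update
    x_hat[k] = x_minus + K @ (y[k] - C @ x_minus)
    P = (np.eye(2) - K @ C) @ P_minus
    # Time update
    x_minus = A @ x_hat[k]
    P_minus = A @ P @ A.T + Q

print(np.mean((x_hat[-50:] - x_true[-50:]) ** 2, axis=0))  # steady-state error
```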

Remarks:

  • Estimation residue: $\gamma(k)=y(k)-C(k)\hat x^-(k)$
  • Residue covariance: $P_\gamma(k)=C(k)P^-(k)C^\text{T}(k)+R(k)$
  • The residue signal is used to monitor the performance of the Kalman filter (see the monitoring sketch after this list).
  • Modeling errors, round-off errors, disturbances, correlation between the process and measurement noise, and other factors might cause a biased and colored residue.
  • The residue signal can be used in Fault Detection and Isolation (FDI).
  • The standard Kalman filter is not numerically robust because it involves matrix inversion. For example, the computed error covariance matrix might lose positive definiteness because of numerical errors.
  • There are different implementations of Kalman filter to improve the standard Kalman filter in the following aspects:
    • Computational efficiency
    • Dealing with disturbance or unknown inputs
    • Dealing with singular systems (difference-algebraic equations)
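
The monitoring sketch referenced above (an assumed scalar random-walk model, illustrative values only) runs the filter, forms the normalized residue $\gamma(k)/\sqrt{P_\gamma(k)}$, and checks that it is approximately zero-mean, unit-variance and white:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed scalar random-walk model: x(k+1) = a x(k) + w, y(k) = c x(k) + v
a, c, q, r = 1.0, 1.0, 0.01, 0.25
steps = 5000
x, y = 0.0, np.zeros(steps)
for k in range(steps):
    x = a * x + np.sqrt(q) * rng.standard_normal()
    y[k] = c * x + np.sqrt(r) * rng.standard_normal()

# Scalar Kalman filter with residue monitoring
x_minus, p_minus = 0.0, 1.0
nu = np.zeros(steps)                       # normalized residues
for k in range(steps):
    p_gamma = c * p_minus * c + r          # residue covariance P_gamma(k)
    gamma = y[k] - c * x_minus             # estimation residue gamma(k)
    nu[k] = gamma / np.sqrt(p_gamma)
    K = p_minus * c / p_gamma              # Kalman gain
    x_hat = x_minus + K * gamma            # measurement update
    p = (1 - K * c) * p_minus
    x_minus, p_minus = a * x_hat, a * p * a + q   # time update

# For a well-tuned filter the normalized residue is approximately zero-mean,
# unit-variance white noise; a large bias or autocorrelation flags a problem.
print(nu.mean(), nu.var(), np.corrcoef(nu[:-1], nu[1:])[0, 1])
```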