DFFS and DIFS
PCA
1 Introduction
In this lecture, we look at how to use eigenspaces defined by the principal components computed with PCA:
Distance-From-Feature-Space (DFFS)
Distance-In-Feature-Space (DIFS)
which are both defined in Probabilistic visual learning for object representation by B. Moghaddam and A. Pentland (PAMI 1997).
2 Representation with Principal Components
Having applied PCA to a dataset \lbrace \mathbf{x}_n\rbrace_{n=1,\cdots,N} with \mathbf{x}_n\in\mathbb{R}^D\ \forall n, we note \overline{\mathbf{x}}\in\mathbb{R}^D the mean vector and \mathrm{U}=[\mathbf{u}_1,\mathbf{u}_2,\cdots,\mathbf{u}_D] the D eigenvectors (principal components) of the covariance matrix \mathrm{S}, sorted such that:
\lambda_1\geq\lambda_2\geq\cdots\geq\lambda_D\geq 0
with \lambda_n the eigenvalue associated with eigenvector \mathbf{u}_n, \forall n.
Because \mathbf{u}_i^T\mathbf{u}_j=0\ \forall i\neq j (orthogonality) and \mathbf{u}_n^T\mathbf{u}_n=1\ \forall n (unit norm), the D eigenvectors form an orthonormal basis of the space \mathbb{R}^D.
Any vector \mathbf{x}\in\mathbb{R}^D can be reconstructed in that basis of eigenvectors:
\mathbf{x}=\overline{\mathbf{x}}+\sum_{i=1}^D \alpha_i\ \mathbf{u}_i
or, with the notation \tilde{\mathbf{x}}=\mathbf{x}-\overline{\mathbf{x}},
\tilde{\mathbf{x}}=\sum_{i=1}^D \alpha_i\ \mathbf{u}_i
with the coordinates \alpha_i=(\mathbf{x}-\overline{\mathbf{x}})^T\mathbf{u}_i, \forall i=1,\cdots,D.
The vector of coordinates \pmb{\alpha}=[\alpha_1,\cdots,\alpha_D]^T represents \mathbf{x} in a space centered on the mean \overline{\mathbf{x}} with the orthonormal basis defined by the eigenvectors \lbrace\mathbf{u}_i\rbrace_{i=1,\cdots,D}:
\mathbf{x}=\left\lbrack \begin{array}{c} x_1\\ x_2\\ \vdots\\ x_D \end{array}\right\rbrack = \left\lbrack \begin{array}{c} \overline{x}_1\\ \overline{x}_2\\ \vdots\\ \overline{x}_D \end{array}\right\rbrack + \sum_{i=1}^D \alpha_i\ \mathbf{u}_i \quad\text{or}\quad \mathbf{x}= \underbrace{\left\lbrack \begin{array}{c} \overline{x}_1\\ \overline{x}_2\\ \vdots\\ \overline{x}_D \end{array}\right\rbrack}_{\overline{\mathbf{x}}} + \alpha_1 \underbrace{\left\lbrack \begin{array}{c} u_{1,1}\\ u_{1,2}\\ \vdots\\ u_{1,D} \end{array}\right\rbrack}_{\mathbf{u}_1} + \alpha_2 \underbrace{\left\lbrack \begin{array}{c} u_{2,1}\\ u_{2,2}\\ \vdots\\ u_{2,D} \end{array}\right\rbrack}_{\mathbf{u}_2} + \cdots + \alpha_D \underbrace{\left\lbrack \begin{array}{c} u_{D,1}\\ u_{D,2}\\ \vdots\\ u_{D,D} \end{array}\right\rbrack}_{\mathbf{u}_D}
This representation with \pmb{\alpha} is often more useful than the original representation of \mathbf{x} in the space centered on the origin \pmb{0}_D with the standard basis:
\mathbf{x}=\left\lbrack \begin{array}{c} x_1\\ x_2\\ \vdots\\ x_D \end{array}\right\rbrack = \left\lbrack \begin{array}{c} 0\\ 0\\ \vdots\\ 0 \end{array}\right\rbrack + \sum_{i=1}^D x_i\ \mathbf{e}_i = \underbrace{\left\lbrack \begin{array}{c} 0\\ 0\\ \vdots\\ 0 \end{array}\right\rbrack}_{\pmb{0}_D} + x_1 \underbrace{\left\lbrack \begin{array}{c} 1\\ 0\\ \vdots\\ 0 \end{array}\right\rbrack}_{\mathbf{e}_1} + x_2 \underbrace{\left\lbrack \begin{array}{c} 0\\ 1\\ \vdots\\ 0 \end{array}\right\rbrack}_{\mathbf{e}_2} + \cdots + x_D \underbrace{\left\lbrack \begin{array}{c} 0\\ 0\\ \vdots\\ 1 \end{array}\right\rbrack}_{\mathbf{e}_D}
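As an illustration (not from the original lecture), here is a minimal NumPy sketch of how the mean, sorted eigenpairs and coordinates \pmb{\alpha} could be computed; the names X (an N x D data matrix, one observation per row) and x are assumptions.

```python
# Illustrative sketch: principal-component coordinates of a vector x.
import numpy as np

def pca_coordinates(X, x):
    """Mean, sorted eigenvalues/eigenvectors of the covariance of X, and coordinates alpha of x."""
    x_bar = X.mean(axis=0)              # sample mean, shape (D,)
    S = np.cov(X, rowvar=False)         # covariance matrix, shape (D, D)
    eigvals, U = np.linalg.eigh(S)      # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1]   # sort eigenvalues in decreasing order
    eigvals, U = eigvals[order], U[:, order]
    alpha = U.T @ (x - x_bar)           # alpha_i = (x - x_bar)^T u_i
    return x_bar, eigvals, U, alpha
```

The reconstruction x_bar + U @ alpha then recovers x up to numerical precision.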
3 PCA Whitening
PCA whitening corresponds to normalising (dividing) the coordinates \alpha_i by their respective standard deviation (the square root of the corresponding eigenvalue):
\alpha_i^{W}=\frac{\alpha_i}{\sqrt{\lambda_i}}
See Whitening transformation on Wikipedia for additional information.
Whitening (like batch normalisation) is a standard operation in neural networks (e.g. see Decorrelated Batch Normalization, CVPR 2018).
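A possible sketch of this normalisation, reusing the alpha and eigvals arrays from the sketch above (assumed names):

```python
import numpy as np

def whiten(alpha, eigvals, eps=1e-12):
    """PCA whitening: divide each coordinate alpha_i by sqrt(lambda_i)."""
    return alpha / np.sqrt(eigvals + eps)   # eps guards against (near-)zero eigenvalues
```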
4 DFFS and DIFS
A subspace F is created using the first d principal components associated with the d highest eigenvalues, with 0\leq d\leq D. The remaining subspace, noted F^{\perp}, is described by the (D-d) remaining eigenvectors.
\mathbb{R}^D=F\oplus F^{\perp} \quad\text{and}\quad F\cap F^{\perp}=\lbrace\mathbf{0}\rbrace
Any vector \tilde{\mathbf{x}}\in\mathbb{R}^D centered on the mean \overline{\mathbf{x}} can be decomposed into a vector in F and another vector in F^{\perp}:
\tilde{\mathbf{x}}=\underbrace{\sum_{i=1}^d \alpha_i\ \mathbf{u}_i}_{\tilde{\mathbf{x}}_{F}\in F}+\underbrace{\sum_{i=d+1}^D \alpha_i\ \mathbf{u}_i}_{\tilde{\mathbf{x}}_{F^{\perp}}\in F^{\perp}}
Question: Show that \|\tilde{\mathbf{x}}\|^2=\|\tilde{\mathbf{x}}_{F}\|^2+\|\tilde{\mathbf{x}}_{F^{\perp}}\|^2=\sum_{i=1}^D \alpha_i^2.
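As a numerical illustration (not part of the lecture), the decomposition could be computed as follows, reusing the assumed names x_bar and U from the earlier sketch:

```python
import numpy as np

def decompose(x, x_bar, U, d):
    """Split x - x_bar into its component in F (first d eigenvectors) and in F_perp."""
    x_tilde = x - x_bar
    alpha = U.T @ x_tilde
    x_F = U[:, :d] @ alpha[:d]        # component in F
    x_F_perp = U[:, d:] @ alpha[d:]   # component in F_perp
    # By orthonormality: x_tilde = x_F + x_F_perp and
    # ||x_tilde||^2 = ||x_F||^2 + ||x_F_perp||^2 = sum_i alpha_i^2
    return x_F, x_F_perp
```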
We define two distances as measures of similarity between any vector \mathbf{x}\in \mathbb{R}^D and a PCA-learned eigenspace F.
4.1 Distance-In-Feature-Space (DIFS)
In the feature space F, the Mahalanobis distance is often used to define the Distance-In-Feature-Space (DIFS): DIFS(\mathbf{x})=\sum_{i=1}^d \frac{\alpha_i^2}{\lambda_i}
4.2 Distance-From-Feature-Space (DFFS)
In F^{\perp}, the Distance-From-Feature-Space (DFFS) is defined as: DFFS(\mathbf{x})=\|\tilde{\mathbf{x}}_{F^{\perp}}\|^2=\sum_{i=d+1}^D \alpha_i^2=\|\tilde{\mathbf{x}}\|^2- \|\tilde{\mathbf{x}}_{F}\|^2
In practice, only the first d coordinates \lbrace \alpha_i\rbrace_{i=1,\cdots,d} are computed; these are enough to compute the DIFS and \|\tilde{\mathbf{x}}_{F}\|^2, and hence the DFFS.
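A minimal sketch of this computation (assuming the x_bar, eigvals, U arrays from the earlier sketch):

```python
import numpy as np

def difs_dffs(x, x_bar, eigvals, U, d):
    """DIFS and DFFS of x, using only the first d coordinates."""
    x_tilde = x - x_bar
    alpha_d = U[:, :d].T @ x_tilde                         # alpha_1, ..., alpha_d
    difs = float(np.sum(alpha_d**2 / eigvals[:d]))         # Mahalanobis distance in F
    dffs = float(np.sum(x_tilde**2) - np.sum(alpha_d**2))  # ||x_tilde||^2 - ||x_tilde_F||^2
    return difs, dffs
```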
4.3 Choosing d=\dim(F)
The dimension d of the space F can be chosen using the percentage of explained variance: \text{Explained variance}(d)=100\times\frac{\sum_{i=1}^d \lambda_{i}}{\sum_{i=1}^D \lambda_{i}} For instance, F can be set (and d chosen) so that 90% of the variance is explained.
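For example, a small sketch selecting the smallest d that reaches a target percentage of explained variance (the 90% threshold is an assumption):

```python
import numpy as np

def choose_d(eigvals, target_percent=90.0):
    """Smallest d whose cumulative explained variance reaches target_percent."""
    explained = 100.0 * np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(explained, target_percent) + 1)
```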
5 Relation to Multivariate Normal Distribution
The Multivariate Normal Distribution is a probability density function that can be used to describe the distribution of a random vector \mathbf{x} in \mathbb{R}^D.
Only 2 parameters are needed to define the Multivariate Normal:
\pmb{\mu} its mean
\Sigma its covariance matrix
Given a set of observations (dataset \lbrace \mathbf{x}_n\rbrace_{n=1,\cdots,N}) of the random vector \mathbf{x}:
\pmb{\mu} can be estimated by its sample mean \overline{\mathbf{x}},
\Sigma can be estimated by the covariance matrix \mathrm{S} computed with the dataset.
p(\mathbf{x})=\frac{1}{(2\pi)^{D/2} \det(\mathrm{S})^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x}-\overline{\mathbf{x}})^T\mathrm{S}^{-1} (\mathbf{x}-\overline{\mathbf{x}}) \right)
Since \mathrm{S}=\mathrm{U}\Lambda\mathrm{U}^T with \mathrm{U}=[\mathbf{u}_1,\cdots,\mathbf{u}_D] (eigenvectors as columns) and \Lambda the diagonal matrix with the eigenvalues:
\Lambda= \left\lbrack \begin{array}{cccc} \lambda_1&0&\cdots&0\\ 0&\lambda_2&&\vdots\\ \vdots&&\ddots&\\ 0&\cdots&&\lambda_D\\ \end{array} \right\rbrack
and since \mathrm{U} is an orthonormal matrix (\mathrm{U}^{-1}=\mathrm{U}^T), we have \mathrm{S}^{-1}=\mathrm{U}\Lambda^{-1}\mathrm{U}^T, so:
\begin{array}{lcl} (\mathbf{x}-\overline{\mathbf{x}})^T\mathrm{S}^{-1} (\mathbf{x}-\overline{\mathbf{x}}) &=&(\mathbf{x}-\overline{\mathbf{x}})^T\mathrm{U}\Lambda^{-1}\mathrm{U}^T (\mathbf{x}-\overline{\mathbf{x}})\\ &=& \lbrack\mathrm{U}^T(\mathbf{x}-\overline{\mathbf{x}})\rbrack^T\Lambda^{-1}\lbrack\mathrm{U}^T (\mathbf{x}-\overline{\mathbf{x}})\rbrack\\ &=&\pmb{\alpha}^T\Lambda^{-1}\pmb{\alpha}\\ &=&\sum_{i=1}^D \frac{\alpha_i^2}{\lambda_i}\\ &\simeq& \underbrace{\sum_{i=1}^d \frac{\alpha_i^2}{\lambda_i}}_{DIFS} +\frac{1}{\rho}\ \underbrace{\sum_{i=d+1}^D \alpha_i^2}_{DFFS}\\ \end{array}
The term (\mathbf{x}-\overline{\mathbf{x}})^T\mathrm{S}^{-1} (\mathbf{x}-\overline{\mathbf{x}}) is the Mahalanobis distance in the space \mathbb{R}^D and it can be approximated as a weighted combination of the DIFS and the DFFS. The parameter \rho can be set to the average of the eigenvalues that were left out:
\rho=\frac{1}{D-d} \sum_{i=d+1}^D \lambda_i
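A possible sketch of this approximation, reusing the difs_dffs helper sketched in section 4 (assumed names):

```python
import numpy as np

def approx_mahalanobis(x, x_bar, eigvals, U, d):
    """Approximate the full Mahalanobis distance as DIFS + DFFS / rho."""
    difs, dffs = difs_dffs(x, x_bar, eigvals, U, d)  # helper sketched in section 4
    rho = float(np.mean(eigvals[d:]))                # average of the left-out eigenvalues
    return difs + dffs / rho
```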
See the Wikipedia example of a Multivariate Normal Distribution in \mathbb{R}^2.