
DFFS and DIFS

PCA

1 Introduction

In this lecture, we look at how to use eigenspaces defined by the principal components computed with PCA:

  • Difference-From-Feature-Space (DFFS)

  • Distance-In-Feature-Space (DIFS)

which are both defined in "Probabilistic visual learning for object representation" by B. Moghaddam and A. Pentland (PAMI 1997).

2 Representation with Principal Components

Having applied PCA to a dataset \lbrace \mathbf{x}_n\rbrace_{n=1,\cdots,N} with \mathbf{x}_n\in\mathbb{R}^D\ \forall n, we note \overline{\mathbf{x}}\in\mathbb{R}^D the mean vector and \mathrm{U}=[\mathbf{u}_1,\mathbf{u}_2,\cdots,\mathbf{u}_D] the D eigenvectors (principal components) of the covariance matrix \mathrm{S}, sorted such that:
\lambda_1\geq\lambda_2\geq\cdots\geq\lambda_D\geq 0
with \lambda_n the eigenvalue associated with eigenvector \mathbf{u}_n, \forall n.

Because \mathbf{u}_i^T\mathbf{u}_j=0\ \forall i\neq j (orthogonality) and \mathbf{u}_n^T\mathbf{u}_n=1\ \forall n (unit norm), the D eigenvectors form an orthonormal basis of the space \mathbb{R}^D.

Any vector \mathbf{x}\in\mathbb{R}^D can be reconstructed in that basis of eigenvectors: \mathbf{x}=\overline{\mathbf{x}}+\sum_{i=1}^D \alpha_i\ \mathbf{u}_i or, with the notation \tilde{\mathbf{x}}=\mathbf{x}-\overline{\mathbf{x}}, \tilde{\mathbf{x}}=\sum_{i=1}^D \alpha_i\ \mathbf{u}_i with the coordinates \alpha_i=(\mathbf{x}-\overline{\mathbf{x}})^T\mathbf{u}_i,\ i=1,\cdots,D.

The vector of coordinates \pmb{\alpha}=[\alpha_1,\cdots,\alpha_D]^T represents \mathbf{x} in a space centered on the mean \overline{\mathbf{x}} and with the orthonormal basis defined by the eigenvectors \lbrace\mathbf{u}_i\rbrace_{i=1,\cdots,D}:

\mathbf{x}=\left\lbrack \begin{array}{c} x_1\\ x_2\\ \vdots\\ x_D \end{array}\right\rbrack =\left\lbrack \begin{array}{c} \overline{x}_1\\ \overline{x}_2\\ \vdots\\ \overline{x}_D \end{array}\right\rbrack +\sum_{i=1}^D \alpha_i\ \mathbf{u}_i \quad\text{or}\quad \mathbf{x}=\underbrace{\left\lbrack \begin{array}{c} \overline{x}_1\\ \overline{x}_2\\ \vdots\\ \overline{x}_D \end{array}\right\rbrack}_{\overline{\mathbf{x}}} +\alpha_1 \underbrace{\left\lbrack \begin{array}{c} u_{1,1}\\ u_{1,2}\\ \vdots\\ u_{1,D} \end{array}\right\rbrack}_{\mathbf{u}_1} +\alpha_2 \underbrace{\left\lbrack \begin{array}{c} u_{2,1}\\ u_{2,2}\\ \vdots\\ u_{2,D} \end{array}\right\rbrack}_{\mathbf{u}_2} +\ \cdots\ +\alpha_D \underbrace{\left\lbrack \begin{array}{c} u_{D,1}\\ u_{D,2}\\ \vdots\\ u_{D,D} \end{array}\right\rbrack}_{\mathbf{u}_D}

This representation with \pmb{\alpha} is often more useful than the original representation of \mathbf{x} in the space centered on the origin \mathbf{0}_D and with the standard basis:
\mathbf{x}=\left\lbrack \begin{array}{c} x_1\\ x_2\\ \vdots\\ x_D \end{array}\right\rbrack =\left\lbrack \begin{array}{c} 0\\ 0\\ \vdots\\ 0 \end{array}\right\rbrack +\sum_{i=1}^D x_i\ \mathbf{e}_i =\underbrace{\left\lbrack \begin{array}{c} 0\\ 0\\ \vdots\\ 0 \end{array}\right\rbrack}_{\mathbf{0}_D} +x_1 \underbrace{\left\lbrack \begin{array}{c} 1\\ 0\\ \vdots\\ 0 \end{array}\right\rbrack}_{\mathbf{e}_1} +x_2 \underbrace{\left\lbrack \begin{array}{c} 0\\ 1\\ \vdots\\ 0 \end{array}\right\rbrack}_{\mathbf{e}_2} +\ \cdots\ +x_D \underbrace{\left\lbrack \begin{array}{c} 0\\ 0\\ \vdots\\ 1 \end{array}\right\rbrack}_{\mathbf{e}_D}
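As an illustration, here is a minimal NumPy sketch (my own, not part of the lecture) of computing the mean, the eigenvalues/eigenvectors of the covariance matrix sorted in decreasing order, and the coordinates \pmb{\alpha} of a vector \mathbf{x}. The function name pca_coordinates and the random data are only for illustration.

```python
# Sketch: PCA representation of a vector x given a data matrix X of shape (N, D).
import numpy as np

def pca_coordinates(X, x):
    """Return the mean, sorted eigenvalues, eigenvectors and the coordinates alpha of x."""
    x_bar = X.mean(axis=0)                    # sample mean (D,)
    S = np.cov(X, rowvar=False)               # covariance matrix (D, D)
    eigvals, U = np.linalg.eigh(S)            # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # sort by decreasing eigenvalue
    eigvals, U = eigvals[order], U[:, order]  # columns of U are u_1, ..., u_D
    alpha = U.T @ (x - x_bar)                 # alpha_i = u_i^T (x - x_bar)
    return x_bar, eigvals, U, alpha

# Usage with random data (illustration only):
X = np.random.randn(100, 5)
x = np.random.randn(5)
x_bar, lambdas, U, alpha = pca_coordinates(X, x)
x_reconstructed = x_bar + U @ alpha           # recovers x up to numerical error
```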

3 PCA Whitening

PCA Whitening corresponds to normalising (dividing) each coordinate \alpha_i by its standard deviation (the square root of its corresponding eigenvalue \lambda_i):

\alpha_i^W=\frac{\alpha_i}{\sqrt{\lambda_i}}

See Whitening transformation on Wikipedia for additional information.

Whitening (like the related batch normalisation) is a standard operation computed in Neural Networks (e.g. see Decorrelated Batch Normalization, CVPR 2018).
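A one-line sketch of the whitening step, assuming the coordinates alpha and eigenvalues lambdas come from a routine such as the one sketched in Section 2 (the eps guard is my addition to avoid division by zero):

```python
import numpy as np

def whiten(alpha, lambdas, eps=1e-12):
    """alpha_i^W = alpha_i / sqrt(lambda_i), applied to all coordinates at once."""
    return alpha / np.sqrt(lambdas + eps)
```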

4 DFFS and DIFS

A subspace F is created using the first d principal components associated with the d highest eigenvalues, with 0\leq d\leq D. The remaining subspace, noted F^{\perp}, is described by the (D-d) remaining eigenvectors.

\mathbb{R}^D=F\oplus F^{\perp} and F\cap F^{\perp}=\lbrace \mathbf{0}\rbrace. Any vector \tilde{\mathbf{x}}\in\mathbb{R}^D centered on the mean \overline{\mathbf{x}} can be decomposed into a vector in F and another vector in F^{\perp}:
\tilde{\mathbf{x}}=\underbrace{\sum_{i=1}^d \alpha_i\ \mathbf{u}_i}_{\tilde{\mathbf{x}}_{F}\in F}+\underbrace{\sum_{i=d+1}^D \alpha_i\ \mathbf{u}_i}_{\tilde{\mathbf{x}}_{F^{\perp}}\in F^{\perp}}


We define two distances as measures of similarity between any vector \mathbf{x}\in \mathbb{R}^D and a PCA-learned eigenspace F.

4.1 Distance-In-Feature-Space (DIFS)

In the feature space F, the Mahalanobis distance is often used to define the Distance-In-Feature-Space (DIFS): DIFS(\mathbf{x})=\sum_{i=1}^d \frac{\alpha_i^2}{\lambda_i}

4.2 Difference-From-Feature-Space (DFFS)

In F^{\perp}, the Distance From Feature Space (DFFS) is defined as: DFFS(\mathbf{x})=\|\tilde{\mathbf{x}}_{F^{\perp}}\|^2=\sum_{i=d+1}^D \alpha_i^2=\|\tilde{\mathbf{x}}\|^2- \|\mathbf{\tilde{x}}_{F}\|^2

In practice only the first d coordinates \lbrace \alpha_i\rbrace_{i=1,\cdots,d} are computed; these are used to compute DIFS and \|\tilde{\mathbf{x}}_{F}\|^2, and DFFS is then obtained as \|\tilde{\mathbf{x}}\|^2-\|\tilde{\mathbf{x}}_{F}\|^2.
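A possible implementation sketch (mine, not the paper's code) of DIFS and DFFS that only computes the first d coordinates, as described above:

```python
import numpy as np

def difs_dffs(x, x_bar, U, lambdas, d):
    """DIFS and DFFS of x w.r.t. the eigenspace spanned by the first d eigenvectors."""
    x_tilde = x - x_bar
    alpha_d = U[:, :d].T @ x_tilde                  # first d coordinates only
    difs = np.sum(alpha_d**2 / lambdas[:d])         # Mahalanobis distance in F
    dffs = np.sum(x_tilde**2) - np.sum(alpha_d**2)  # ||x_tilde||^2 - ||x_tilde_F||^2
    return difs, dffs
```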

4.3 Choosing d=\dim(F)

The dimension d of the space F can be chosen using the percentage of explained variance: \text{Explained variance}(d)=100\times\frac{\sum_{i=1}^d \lambda_{i}}{\sum_{i=1}^D \lambda_{i}} For instance, d can be chosen (and F set) so that 90% of the variance is explained.
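A small sketch (assuming the eigenvalues are sorted in decreasing order, as above) of picking the smallest d that reaches a target percentage of explained variance:

```python
import numpy as np

def choose_d(lambdas, target=90.0):
    """Smallest d such that the first d eigenvalues explain at least `target` % of variance."""
    explained = 100.0 * np.cumsum(lambdas) / np.sum(lambdas)
    return int(np.searchsorted(explained, target) + 1)
```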

5 Relation to Multivariate Normal Distribution

The Multivariate Normal Distribution is a probability density function that can be used to describe the distribution of a random vector \mathbf{x} in \mathbb{R}^D.

Only 2 parameters are needed to compute the Multivariate Normal:

  • \pmb{\mu} its mean

  • \Sigma its covariance matrix

Having a set of observations (dataset \lbrace \mathbf{x}_n\rbrace_{n=1,\cdots,N}) for random vector \mathbf{x}, then:

  • \pmb{\mu} can be estimated by its sample mean \overline{\mathbf{x}},

  • \Sigma can be estimated by the covariance matrix \mathrm{S} computed with the dataset.

p(\mathbf{x})=\frac{1}{(\sqrt{2\pi})^D \det(\mathrm{S})^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x}-\overline{\mathbf{x}})^T\mathrm{S}^{-1} (\mathbf{x}-\overline{\mathbf{x}}) \right)

Since \mathrm{S}=\mathrm{U}\Lambda\mathrm{U}^T with \mathrm{U}=[\mathbf{u}_1,\cdots,\mathbf{u}_D] (eigenvectors as columns) and \Lambda the diagonal matrix with the eigenvalues:

\Lambda= \left\lbrack \begin{array}{cccc} \lambda_1&0&0&0\\ 0&\lambda_2&0&0\\ \vdots&&\ddots&\\ 0&0&&\lambda_D\\ \end{array} \right\rbrack

we have (using the fact that \mathrm{U} is an orthonormal matrix, so \mathrm{U}^{-1}=\mathrm{U}^T):

\mathrm{S}^{-1}=\mathrm{U}\Lambda^{-1}\mathrm{U}^T

so

\begin{array}{lcl} (\mathbf{x}-\overline{\mathbf{x}})^T\mathrm{S}^{-1} (\mathbf{x}-\overline{\mathbf{x}}) &=&(\mathbf{x}-\overline{\mathbf{x}})^T\mathrm{U}\Lambda^{-1}\mathrm{U}^T (\mathbf{x}-\overline{\mathbf{x}})\\ &=& \lbrack\mathrm{U}^T(\mathbf{x}-\overline{\mathbf{x}})\rbrack^T\Lambda^{-1}\lbrack\mathrm{U}^T (\mathbf{x}-\overline{\mathbf{x}})\rbrack\\ &=&\pmb{\alpha}^T\Lambda^{-1}\pmb{\alpha}\\ &=&\sum_{i=1}^D \frac{\alpha_i^2}{\lambda_i}\\ &\simeq& \underbrace{\sum_{i=1}^d \frac{\alpha_i^2}{\lambda_i}}_{DIFS} +\frac{1}{\rho}\ \underbrace{\sum_{i=d+1}^D \alpha_i^2}_{DFFS}\\ \end{array}

The term (\mathbf{x}-\overline{\mathbf{x}})^T\mathrm{S}^{-1} (\mathbf{x}-\overline{\mathbf{x}}) is the Mahalanobis distance in the space \mathbb{R}^D and it can be approximated as a weighted sum of the DIFS and DFFS. The parameter \rho can be chosen as the average of the eigenvalues that were left out: \rho=\frac{1}{D-d} \sum_{i=d+1}^D \lambda_i
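Putting the pieces together, a sketch (my own, following the approximation above) of the approximate Mahalanobis distance DIFS(\mathbf{x}) + DFFS(\mathbf{x})/\rho, with \rho the average of the left-out eigenvalues:

```python
import numpy as np

def approx_mahalanobis(x, x_bar, U, lambdas, d):
    """Approximate (x - x_bar)^T S^{-1} (x - x_bar) using DIFS + DFFS / rho."""
    x_tilde = x - x_bar
    alpha_d = U[:, :d].T @ x_tilde                  # first d coordinates only
    difs = np.sum(alpha_d**2 / lambdas[:d])
    dffs = np.sum(x_tilde**2) - np.sum(alpha_d**2)
    rho = np.mean(lambdas[d:])                      # average of the (D - d) left-out eigenvalues
    return difs + dffs / rho
```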

See the Wikipedia example of a Multivariate Normal Distribution in \mathbb{R}^2.