Definition probability density function
Definition Kernel Density Estimate
Mean Shift Algorithm
Application to image segmentation
Exercises
A Probability Density Function (PDF) is a non-negative function \[p:\mathbb{R}^d \rightarrow \mathbb{R}^+ \quad \text{such that}\ p(\mathbf{x})\geq 0, \forall \mathbf{x}\in \mathbb{R}^d\] with the constraint that \(p\) integrates to 1: \[\int_{\mathbb{R}^d} p(\mathbf{x}) \ d\mathbf{x} =1\]
We may not know the PDF \(p(\mathbf{x})\), but it can be estimated from a set of samples or observations \(\lbrace \mathbf{x}^{(i)}\rbrace_{i=1,\cdots,n}\) drawn independently from \(p\), e.g. with:
a Kernel Density Estimate (KDE) that uses a kernel \(K\) with a bandwidth \(h\geq 0\) such that \[ \mathrm{kde}(\mathbf{x})=\frac{1}{nh^d} \sum_{i=1}^n K\left(\frac{\mathbf{x}-\mathbf{x}^{(i)}}{h}\right) \] with the dimension \(d=\dim(\mathbf{x})\).
the Empirical Probability Density Function that uses the Dirac kernel \(\delta\) (bandwidth \(h=0\)): \[ p_e(\mathbf{x})=\frac{1}{n} \sum_{i=1}^n \delta(\mathbf{x}-\mathbf{x}^{(i)}) \]
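As an illustration, the KDE with the multivariate normal kernel takes a few lines of NumPy (a minimal sketch; the function names, sample size and bandwidth are our choices):

```python
import numpy as np

def gaussian_kernel(u):
    """Multivariate normal kernel K_N(u) = (2*pi)^(-d/2) * exp(-||u||^2 / 2)."""
    d = u.shape[-1]
    return (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * np.sum(u * u, axis=-1))

def kde(x, samples, h):
    """Kernel density estimate at x: 1/(n h^d) * sum_i K((x - x_i) / h)."""
    n, d = samples.shape
    return gaussian_kernel((x - samples) / h).sum() / (n * h ** d)

# Sanity check in 1-D: the estimate is non-negative and integrates to ~1.
samples = np.random.default_rng(0).normal(size=(200, 1))
xs = np.linspace(-6.0, 6.0, 601)
vals = np.array([kde(np.array([x]), samples, h=0.5) for x in xs])
print(vals.sum() * (xs[1] - xs[0]))  # close to 1
```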
PDFs are used to compute probabilities, e.g. the probability that a random variable \(x\in\mathbb{R}\) (\(d=1\)) lies between \(a\) and \(b\) is computed by integration: \[ \mathbb{P}( a<x<b) = \int_a^b p(x) \ dx \quad \left(\text{Note:} \ 0\leq\mathbb{P}( a<x<b) \leq 1 \right) \]
When the PDF \(p\) is not known, an estimate from the data can be used, e.g. using KDE:
\[
\mathbb{P}( a<x<b) \simeq \int_a^b \mathrm{kde}(x) \ dx = \frac{1}{nh} \sum_{i=1}^n \int_a^b K\left(\frac{x-x^{(i)}}{h}\right)\ dx
\] When choosing \(K\) as the Dirac kernel:
\[
\int_a^b \delta\left(x-x^{(i)}\right)\ dx =
\left\lbrace \begin{array}{l}
1 \quad \text{if $a\leq x^{(i)}\leq b$}\\
0 \quad \text{otherwise}\\
\end{array}\right.
\]
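With the Dirac kernel the estimate therefore reduces to counting: \(\mathbb{P}(a<x<b)\) is approximated by the fraction of samples falling in \([a,b]\). A quick numerical check (the distribution, sample size and interval are our choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)      # n samples drawn from a standard normal
a, b = -1.0, 1.0

# Each Dirac term integrates to 1 iff a <= x_i <= b, so the sum is a count.
p_hat = np.mean((x >= a) & (x <= b))
print(p_hat)  # close to 0.6827, the true probability for a standard normal
```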
The kernel \(K:\mathbb{R}^d\rightarrow \mathbb{R}^{+}\) acts as an elementary PDF, and the KDE is an average of these elementary PDFs centred on the data points \[ \mathrm{kde}(\mathbf{x})=\frac{1}{nh^d} \sum_{i=1}^n K\left(\frac{\mathbf{x}-\mathbf{x}^{(i)}}{h}\right) \] For radially symmetric kernels, \(K\) can be expressed through its profile \(k\) (with normalisation constant \(c_{k,d}\)): \[ \mathrm{kde}(\mathbf{x})=\frac{c_{k,d}}{nh^d} \sum_{i=1}^n k\left(\left\|\frac{\mathbf{x}-\mathbf{x}^{(i)}}{h}\right\|^2\right) \]
Example: the profile \(k(x)=\exp\left( \frac{-x}{2}\right), \ x \geq 0\) is associated with the multivariate normal kernel: \[ K_N(\mathbf{x})=(2\pi)^{-d/2} \exp\left(-\frac{1}{2} \|\mathbf{x} \|^2 \right) \]
Differentiating the KDE with respect to \(\mathbf{x}\) gives: \[ \nabla \mathrm{kde}(\mathbf{x})= \frac{2 c_{k,d}}{nh^{d+2}} \sum_{i=1}^n (\mathbf{x}-\mathbf{x}^{(i)})\ k'\left(\left\|\frac{\mathbf{x}-\mathbf{x}^{(i)}}{h}\right\|^2\right) \]
\[ \nabla \mathrm{kde}(\mathbf{x})= \frac{2 c_{k,d}}{nh^{d+2}} \left\lbrack \sum_{i=1}^n -k'\left(\left\|\frac{\mathbf{x}-\mathbf{x}^{(i)}}{h}\right\|^2 \right) \right\rbrack \ \underbrace{\left\lbrack \frac{\sum_{i=1}^n \mathbf{x}^{(i)} \ k'\left(\left\|\frac{\mathbf{x}-\mathbf{x}^{(i)}}{h}\right\|^2\right)}{\sum_{i=1}^n k'\left(\left\|\frac{\mathbf{x}-\mathbf{x}^{(i)}}{h}\right\|^2 \right)} -\mathbf{x}\right\rbrack}_{m(\mathbf{x})} \] The mean-shift vector \(m(\mathbf{x})\) is the difference between a weighted mean of the data in the neighbourhood of \(\mathbf{x}\) and \(\mathbf{x}\) itself; since \(-k'\geq 0\) for decreasing profiles, \(m(\mathbf{x})\) points in the direction of the gradient.
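With the normal profile \(k(x)=\exp(-x/2)\), we have \(-k'(x)=\frac{1}{2}\exp(-x/2)\); the constant \(\frac{1}{2}\) cancels in the ratio, so \(m(\mathbf{x})\) uses plain Gaussian weights. A minimal sketch (the function name is ours):

```python
import numpy as np

def mean_shift_vector(x, samples, h):
    """m(x) = weighted mean of the samples around x, minus x.

    Weights are -k'(||(x - x_i)/h||^2); for the normal profile
    k(u) = exp(-u/2) this gives Gaussian weights (constants cancel).
    """
    w = np.exp(-0.5 * np.sum(((x - samples) / h) ** 2, axis=1))
    return (w[:, None] * samples).sum(axis=0) / w.sum() - x

# At the centre of a symmetric cloud the shift vanishes;
# away from the data it points back towards the cloud.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(mean_shift_vector(np.array([0.5, 0.5]), pts, h=1.0))  # ~ [0, 0]
print(mean_shift_vector(np.array([3.0, 3.0]), pts, h=1.0))  # negative components
```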
For each data point \(\mathbf{x}^{(k)}\in\lbrace \mathbf{x}^{(k)}\rbrace_{k=1,\cdots,n}\)
Initialisation: iteration \(t=0\), copy data point \(k\): \(\mathbf{x}^{(k)}_t =\mathbf{x}^{(k)}\)
Compute the mean shift \(m(\mathbf{x}^{(k)}_{t})\)
Push/Update the data point \[ \mathbf{x}^{(k)}_{t+1} \leftarrow \mathbf{x}^{(k)}_t+m(\mathbf{x}^{(k)}_t) \]
Repeat until convergence: \(\nabla \mathrm{kde}(\mathbf{x}^{(k)}_{\infty})=0\), detected in practice when \(\mathbf{x}^{(k)}_{t+1}\simeq\mathbf{x}^{(k)}_{t}\)
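The loop above can be sketched in NumPy for the normal profile (the tolerance and iteration cap are our choices):

```python
import numpy as np

def mean_shift(samples, h, tol=1e-6, max_iter=500):
    """Move a copy of every data point uphill until the shift is ~0."""
    modes = samples.astype(float).copy()
    for k in range(len(samples)):
        x = modes[k]                       # initialisation: x_0 = x^(k)
        for _ in range(max_iter):
            w = np.exp(-0.5 * np.sum(((x - samples) / h) ** 2, axis=1))
            m = (w[:, None] * samples).sum(axis=0) / w.sum() - x
            x = x + m                      # x_{t+1} = x_t + m(x_t)
            if np.linalg.norm(m) < tol:    # convergence: x_{t+1} ~ x_t
                break
        modes[k] = x
    return modes

# Two well-separated blobs: every point converges to its blob's mode.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(5.0, 0.3, (30, 2))])
modes = mean_shift(pts, h=1.0)
```

Points whose trajectories end at the same mode are then grouped into one cluster.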
Example of a dataset \(\mathbf{x}^{(i)} \in \mathbb{R}^2, \forall i=1,\cdots,n\)
source: https://github.com/mattnedrich/MeanShift_py
Kernel Density Estimate (KDE) using bandwidth \(h=2\)
source: https://github.com/mattnedrich/MeanShift_py
The mean shift algorithm is a gradient ascent algorithm: it moves every datum uphill until it reaches a local maximum of the KDE. Kernel Density Estimate with bandwidth \(h=2\)
source: https://github.com/mattnedrich/MeanShift_py
The mean shift algorithm is a gradient ascent algorithm: it moves every datum uphill until it reaches a local maximum of the KDE. Kernel Density Estimate with bandwidth \(h=0.8\)
source: https://github.com/mattnedrich/MeanShift_py
The goal of image segmentation algorithms is to group pixels that have similar characteristics (e.g. colour statistics).
The Mean Shift algorithm can be applied to the pixels of an image.
Each pixel's colour value is replaced by the value of the nearest local maximum of the PDF.
Pixels/data reaching the same local maximum are grouped together into a cluster.
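A toy illustration on a synthetic two-colour image, clustering pixels in RGB space (\(d=3\)); the image, bandwidth and rounding threshold are our choices:

```python
import numpy as np

# Tiny synthetic 4x4 "image": left half reddish, right half bluish, values in [0, 1].
img = np.zeros((4, 4, 3))
img[:, :2] = [0.9, 0.1, 0.1]
img[:, 2:] = [0.1, 0.1, 0.9]
pixels = img.reshape(-1, 3)              # one colour vector per pixel (d = 3)

h = 0.3
modes = pixels.copy()
for k in range(len(pixels)):
    x = modes[k]
    for _ in range(100):
        w = np.exp(-0.5 * np.sum(((x - pixels) / h) ** 2, axis=1))
        m = (w[:, None] * pixels).sum(axis=0) / w.sum() - x
        x = x + m
        if np.linalg.norm(m) < 1e-8:
            break
    modes[k] = x

# Pixels whose modes coincide (up to rounding) belong to the same segment.
labels = np.unique(np.round(modes, 3), axis=0, return_inverse=True)[1].reshape(-1)
print(labels.reshape(4, 4))              # two segments: left half vs right half
```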
source: https://github.com/mattnedrich/MeanShift_py
What parameter controls the result of the Mean Shift algorithm?
Does the value of the bandwidth affect the number of clusters found?
Would choosing different colour systems for encoding pixels (e.g. Lab, RGB) produce the same image segmentation using Mean Shift algorithm?
What is the dimension \(d\) when applying Mean Shift segmentation to colour pixels? Explain.
What additional information can be added to the colour information of pixels when applying Mean Shift? What would be the dimension \(d\) in that case?
What other technique(s) (than KDE) do you know for estimating the PDF from a dataset?
Why would histograms not be suitable as PDF estimates for defining the Mean Shift algorithm?
Richard Szeliski, Computer Vision: Algorithms and Applications, 2nd Edition, 2021, Section 7.5.2, https://szeliski.org/Book/
D. Comaniciu and P. Meer, "Mean shift: a robust approach toward feature space analysis", IEEE Trans. Pattern Analysis and Machine Intelligence, 2002, https://doi.org/10.1109/34.1000236