Introduction

  • Definition of a probability density function

  • Definition of the Kernel Density Estimate

  • Mean Shift Algorithm

  • Application to image segmentation

  • Exercises

PDF Definition

A Probability Density Function (or PDF) is a non-negative function \[p:\mathbb{R}^d \rightarrow \mathbb{R}^+ \quad \text{such that}\ p(\mathbf{x})\geq 0, \forall \mathbf{x}\in \mathbb{R}^d\] with the constraint that \(p\) integrates to 1: \[\int_{\mathbb{R}^d} p(\mathbf{x}) \ d\mathbf{x} =1\]
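As a quick numerical check, here is a minimal sketch (assuming NumPy, and taking the 1-d standard normal density as an assumed example of \(p\)) that verifies both properties:

    import numpy as np

    # Example p: the 1-d standard normal density (an assumed choice)
    def p(x):
        return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

    x = np.linspace(-10.0, 10.0, 100_001)  # wide grid standing in for R
    dx = x[1] - x[0]
    print(np.all(p(x) >= 0.0))             # non-negativity: True
    print(np.sum(p(x)) * dx)               # Riemann sum of the integral, ~1.0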

PDF estimates

We may not know the PDF \(p(\mathbf{x})\), but it can be estimated from a set of samples (observations) \(\lbrace \mathbf{x}^{(i)}\rbrace_{i=1,\cdots,n}\) drawn independently from \(p\), e.g. with one of the following (a small implementation sketch follows the list):

  • a Kernel Density Estimate (KDE) that uses a kernel \(K\) with a bandwidth \(h\geq 0\) such that \[ \mathrm{kde}(\mathbf{x})=\frac{1}{nh^d} \sum_{i=1}^n K\left(\frac{\mathbf{x}-\mathbf{x}^{(i)}}{h}\right) \] with the dimension \(d=\dim(\mathbf{x})\).

  • the Empirical Probability Density Function that uses the Dirac kernel \(\delta\) (bandwidth \(h=0\)): \[ p_e(\mathbf{x})=\frac{1}{n} \sum_{i=1}^n \delta(\mathbf{x}-\mathbf{x}^{(i)}) \]
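A minimal implementation of the KDE, assuming NumPy; the helper names (gaussian_kernel, kde) are illustrative, not from a library:

    import numpy as np

    def gaussian_kernel(u):
        # multivariate normal kernel K_N(u) = (2*pi)^(-d/2) exp(-||u||^2 / 2)
        d = u.shape[-1]
        return (2.0 * np.pi) ** (-d / 2) * np.exp(-0.5 * np.sum(u**2, axis=-1))

    def kde(x, samples, h, K=gaussian_kernel):
        # KDE at a single point x, given samples of shape (n, d)
        n, d = samples.shape
        return K((x - samples) / h).sum() / (n * h**d)

    # usage: 200 samples drawn from a 2-d standard normal
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    print(kde(np.zeros(2), X, h=0.5))      # density estimate near the mode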

PDF estimates (case d=1)

PDFs are used to compute probabilities, e.g. the probability that a random variable \(x\in\mathbb{R}\) (\(d=1\)) lies between \(a\) and \(b\) is computed by integration: \[ \mathbb{P}( a<x<b) = \int_a^b p(x) \ dx \quad \left(\text{Note:} \ 0\leq\mathbb{P}( a<x<b) \leq 1 \right) \]

When the PDF \(p\) is not known, an estimate from the data can be used, e.g. using KDE:

\[ \mathbb{P}( a<x<b) \simeq \int_a^b \mathrm{kde}(x) \ dx = \frac{1}{nh} \sum_{i=1}^n \int_a^b K\left(\frac{x-x^{(i)}}{h}\right)\ dx \] When choosing \(K\) as the Dirac kernel, each integral reduces to an indicator:
\[ \int_a^b \delta\left(x-x^{(i)}\right)\ dx = \left\lbrace \begin{array}{l} 1 \quad \text{if $a\leq x^{(i)}\leq b$}\\ 0 \quad \text{otherwise}\\ \end{array}\right. \] so the estimate of \(\mathbb{P}(a<x<b)\) is simply the fraction of samples falling in \([a,b]\).
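A minimal numerical illustration of this counting estimate (assuming NumPy; drawing the samples from a standard normal is an assumed example):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=10_000)            # samples from an (unknown) p
    a, b = -1.0, 1.0
    print(np.mean((x >= a) & (x <= b)))    # ~0.683 for a standard normal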


KDE

The kernel \(K:\mathbb{R}^d\rightarrow \mathbb{R}^{+}\) acts as an elementary PDF, and the KDE is a sum of these elementary PDFs centred on the data points \[ \mathrm{kde}(\mathbf{x})=\frac{1}{nh^d} \sum_{i=1}^n K\left(\frac{\mathbf{x}-\mathbf{x}^{(i)}}{h}\right) \] For radially symmetric kernels, \(K\) can be rewritten using its profile \(k\) and a normalisation constant \(c_{k,d}\): \[ \mathrm{kde}(\mathbf{x})=\frac{c_{k,d}}{nh^d} \sum_{i=1}^n k\left(\left\|\frac{\mathbf{x}-\mathbf{x}^{(i)}}{h}\right\|^2\right) \]

Example: the profile \(k(x)=\exp\left( \frac{-x}{2}\right), \ x \geq 0\) is associated with the multivariate normal kernel: \[ K_N(\mathbf{x})=(2\pi)^{-d/2} \exp\left(-\frac{1}{2} \|\mathbf{x} \|^2 \right) \]
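Here \(c_{k,d}=(2\pi)^{-d/2}\) normalises the profile. A quick numerical check of the correspondence (assuming NumPy):

    import numpy as np

    d = 3
    c_kd = (2.0 * np.pi) ** (-d / 2)       # normalisation constant c_{k,d}

    def k(s):
        return np.exp(-0.5 * s)            # profile k(s) = exp(-s/2)

    def K_N(x):
        return (2.0 * np.pi) ** (-d / 2) * np.exp(-0.5 * np.dot(x, x))

    x = np.array([0.3, -1.2, 0.7])
    print(np.isclose(K_N(x), c_kd * k(np.dot(x, x))))   # True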

Derivative of the KDE

\[ \nabla \mathrm{kde}(\mathbf{x})= \frac{2 c_{k,d}}{nh^{d+2}} \sum_{i=1}^n (\mathbf{x}-\mathbf{x}^{(i)})\ k'\left(\left\|\frac{\mathbf{x}-\mathbf{x}^{(i)}}{h}\right\|^2\right) \]

Factoring out the sum of the derivatives, the gradient can be rewritten as \[ \nabla \mathrm{kde}(\mathbf{x})= \frac{2 c_{k,d}}{nh^{d+2}} \left\lbrack \sum_{i=1}^n -k'\left(\left\|\frac{\mathbf{x}-\mathbf{x}^{(i)}}{h}\right\|^2 \right) \right\rbrack \ \underbrace{\left\lbrack \frac{\sum_{i=1}^n \mathbf{x}^{(i)} \ k'\left(\left\|\frac{\mathbf{x}-\mathbf{x}^{(i)}}{h}\right\|^2\right)}{\sum_{i=1}^n k'\left(\left\|\frac{\mathbf{x}-\mathbf{x}^{(i)}}{h}\right\|^2 \right)} -\mathbf{x}\right\rbrack}_{m(\mathbf{x})} \] The mean-shift vector \(m(\mathbf{x})\) is the difference between a weighted mean of the samples in the neighbourhood of \(\mathbf{x}\) and \(\mathbf{x}\) itself, so it points towards denser regions.
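A direct implementation of \(m(\mathbf{x})\), assuming NumPy and the normal profile above (the helper name mean_shift_vector is illustrative):

    import numpy as np

    def mean_shift_vector(x, samples, h):
        # m(x) for the normal profile k(s) = exp(-s/2), where -k'(s) = 0.5*exp(-s/2);
        # the factor 0.5 cancels between numerator and denominator
        s = np.sum(((x - samples) / h) ** 2, axis=1)  # ||(x - x_i)/h||^2
        w = np.exp(-0.5 * s)                          # proportional to -k'(s)
        return (w[:, None] * samples).sum(axis=0) / w.sum() - x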

Mean Shift Algorithm

For each data point \(\mathbf{x}^{(k)}\in\lbrace \mathbf{x}^{(k)}\rbrace_{k=1,\cdots,n}\)

  • Initialisation: iteration \(t=0\), copy data point \(k\): \(\mathbf{x}^{(k)}_t =\mathbf{x}^{(k)}\)

    1. Compute the mean-shift vector \(m(\mathbf{x}^{(k)}_{t})\)

    2. Push/update the data point \[ \mathbf{x}^{(k)}_{t+1} \leftarrow \mathbf{x}^{(k)}_t+m(\mathbf{x}^{(k)}_t) \]

  • Repeat until convergence (\(\nabla \mathrm{kde}(\mathbf{x}^{(k)}_{\infty})=0\)), detected in practice when \(\mathbf{x}^{(k)}_{t+1}\simeq\mathbf{x}^{(k)}_{t}\); a compact implementation sketch follows
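A minimal sketch of the full loop, reusing the mean_shift_vector helper defined above (the tolerance and iteration cap are assumed choices):

    import numpy as np

    def mean_shift(samples, h, tol=1e-5, max_iter=500):
        # run mean shift from every data point; returns the converged modes
        modes = samples.astype(float)
        for k in range(len(modes)):
            x = modes[k]
            for _ in range(max_iter):
                m = mean_shift_vector(x, samples, h)
                x = x + m
                if np.linalg.norm(m) < tol:           # x_{t+1} ~ x_t
                    break
            modes[k] = x
        return modes

Points whose modes coincide (up to a small tolerance) are then grouped into one cluster.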

Example in 2D

The mean shift algorithm is a gradient ascent procedure: it moves every datum uphill until it reaches a local maximum of the KDE.

Using a Kernel Density Estimate with bandwidth \(h=2\)

source: https://github.com/mattnedrich/MeanShift_py

Example in 2D

With a smaller bandwidth the KDE is less smooth and has more local maxima, so the algorithm typically finds more clusters.

Using Kernel Density Estimate with bandwidth \(h=0.8\)


Application to Image Segmentation

The goal of image segmentation algorithms is to group pixels that have similar characteristics (e.g. colour statistics).

The Mean Shift algorithm can be applied to the pixels of an image.

Each pixel has its colour values replaced by those of the local maximum of the PDF it converges to.

Pixels/Data reaching the same local maximum are grouped together into a cluster.
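A possible sketch using scikit-learn's MeanShift (note: scikit-learn uses a flat kernel rather than the normal profile above) and Pillow for image I/O; the file names and the bandwidth value are assumptions:

    import numpy as np
    from PIL import Image                  # assumes Pillow is installed
    from sklearn.cluster import MeanShift  # assumes scikit-learn is installed

    # treat each pixel as a point in colour space (d = 3)
    img = np.asarray(Image.open("input.png").convert("RGB"), dtype=float)
    rows, cols, _ = img.shape
    X = img.reshape(-1, 3)

    ms = MeanShift(bandwidth=20.0, bin_seeding=True)  # bandwidth in colour units
    labels = ms.fit_predict(X)

    # replace each pixel's colour by the mode (cluster centre) it converged to
    segmented = ms.cluster_centers_[labels].reshape(rows, cols, 3)
    Image.fromarray(segmented.astype(np.uint8)).save("segmented.png")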


Exercises

  1. What parameter controls the result of the Mean Shift algorithm?

  2. Does the value of the bandwidth affect the number of clusters found?

  3. Would choosing different colour systems for encoding pixels (e.g. Lab, RGB) produce the same image segmentation using Mean Shift algorithm?

  4. What is the dimension \(d\) when applying Mean Shift segmentation to colour pixels? Explain.

  5. What additional information can be added to the colour information of pixels when applying Mean Shift? What would be the dimension \(d\) in that case?

  6. What other technique(s) than KDE do you know for estimating the PDF from a dataset?

  7. Why would histograms not be suitable as PDF estimates for defining the Mean Shift algorithm?

Further readings

Section 7.5.2 in: Richard Szeliski, Computer Vision: Algorithms and Applications, 2nd Edition, 2021. https://szeliski.org/Book/

D. Comaniciu and P. Meer, "Mean shift: a robust approach toward feature space analysis", IEEE Trans. Pattern Analysis and Machine Intelligence, 2002. https://doi.org/10.1109/34.1000236