class: center, middle, inverse, title-slide

.title[
# Computer Vision
]
.subtitle[
## with Machine Learning
]
.author[
### Prof. Rozenn Dahyot
]
.institute[
###
]

---
<!-- https://pkg.garrickadenbuie.com/xaringanthemer/articles/xaringanthemer.html -->

## Introduction

- Human visual perception
- Questions to ask before deploying ML for solving a problem
- Examples of bad CV applications
- ML needs training data, but:
  - datasets can be biased
  - data may be copyrighted
  - data may be private
- Guidelines for creating datasets
- Data labelling tools
- Not Enough Data? Data Augmentation!
- Supervised Machine Learning:
  - Training and validating a machine
  - Testing and using a machine
  - Machine evaluation

---

## Vision Is Our Dominant Sense

.left-column[
![eye](data:image/png;base64,#images/girl-g1f9ef4bfd_1920.jpg)
]
.right-column[
- ```"Research estimates that eighty to eighty-five percent of our perception, learning, cognition, and activities are mediated through vision."```
  - from [Vision Is Our Dominant Sense](https://www.brainline.org/article/vision-our-dominant-sense), T. Politzer, brainline.org
- "An image is worth a thousand words"
- ```"How fake news can exploit pictures to make people believe lies"```
  - from https://www.abc.net.au/news/2018-11-22/fake-news-image-information-believe-anu/10517346
]

---

## Knowing your task & your data

- *"What question(s) am I trying to answer? Do I think the data collected can answer that question?"*
- *"What is the best way to phrase my question(s) as a machine learning problem?"* Is ML needed to solve this problem?
- *"Have I collected enough data to represent the problem I want to solve?"*
- *"What features of the data did I extract, and will these enable the right predictions?"*
- *"How will I measure success in my application?"*
- *"How will the machine learning solution interact with other parts of my research or business product?"*
- Who can be hurt by the ML solution? What is the carbon footprint of the ML solution?

.footnote[From *Introduction to Machine Learning with Python: a guide for data scientists*, A. C. Müller & S. Guido]

---

## On using computer vision

.left-column[
![ethics](data:image/png;base64,#images/ethics-g209289325_1920.jpg)
]
.right-column[
Computer vision & pseudoscience

```"Facial emotional recognition systems operate on the premise that it is possible to automatically and systematically infer the emotional state of human beings from their facial expressions, which lacks a solid scientific basis."```

```"Given these concerns, the use of emotion recognition systems by public authorities, for instance for singling out individuals for police stops or arrests or to assess the veracity of statements during interrogations, risks undermining human rights, such as the rights to privacy, to liberty and to a fair trial."```
]

.footnote[
[From *The right to privacy in the digital age* - Report of the United Nations High Commissioner for Human Rights](https://www.ohchr.org/EN/HRBodies/HRC/RegularSessions/Session48/Documents/A_HRC_48_31_AdvanceEditedVersion.docx), Sept. 2021
]
---

## Discriminatory Data

.left-column[
![ethics](data:image/png;base64,#images/ethics-g209289325_1920.jpg)
]
.right-column[
```"It found that the biased datasets relied on by AI systems can lead to discriminatory decisions, which are acute risks for already marginalized groups."```

<img src="data:image/png;base64,#images/play-figures-g26cb9cb57_1280.jpg" width="50%" style="display: block; margin: auto;" />
]

.footnote[
From *Urgent action needed over artificial intelligence risks to human rights*, https://news.un.org/en/story/2021/09/1099972
]

---

## Data: Copyrights & Privacy

.left-column[
![ethics](data:image/png;base64,#images/ethics-g209289325_1920.jpg)
]
.right-column[
<img src="data:image/png;base64,#images/barbie-g16d3a743a_1280.jpg" width="50%" style="display: block; margin: auto;" />

- privacy
- copyrights: ```"The image was also used despite it being protected by copyright. Permission was never sought from (...) Lena herself"```
]

.footnote[From https://pursuit.unimelb.edu.au/articles/it-s-time-to-retire-lena-from-computer-science]

---

## Going forward with data

.left-column[
```"Documentation to facilitate communication between dataset creators and consumers."```
]
.right-column[
<iframe title="vimeo-player" src="https://player.vimeo.com/video/639588440?h=a28bc8a741" width="640" height="360" frameborder="0" allowfullscreen></iframe>
]

.footnote[
*Datasheets for datasets*, T. Gebru et al., Communications of the ACM, December 2021. https://doi.org/10.1145/3458723
]

---

## Data Labelling Tools

.pull-left[
To create a labelled dataset suitable for training ML: taxonomy definition + *ground truth(s)* labelling.

Example software:

- https://github.com/heartexlabs/label-studio
- https://github.com/tzutalin/labelImg
- https://github.com/wkentaro/labelme
- https://github.com/opencv/cvat
- https://roboflow.com/
]
.pull-right[
![label-studio](data:image/png;base64,#https://raw.githubusercontent.com/heartexlabs/label-studio/master/images/annotation_examples.gif)
]

---

## Data Augmentation

.right-column[
![](data:image/png;base64,#https://media.springernature.com/lw685/springer-static/image/art%3A10.1186%2Fs40537-019-0197-0/MediaObjects/40537_2019_197_Fig2_HTML.png?as=webp)
]
.left-column[
From *A survey on Image Data Augmentation for Deep Learning*, C. Shorten & T. M. Khoshgoftaar, J Big Data 6, 60 (2019). [DOI:10.1186/s40537-019-0197-0](https://doi.org/10.1186/s40537-019-0197-0)

A code sketch follows the dataset slides.
]

---

## Data Augmentation

.center[
Using Game Engines: https://microsoft.github.io/AirSim/

```"Our goal is to develop AirSim as a platform for AI research to experiment with deep learning, computer vision and reinforcement learning algorithms for autonomous vehicles."```

![Airsim](data:image/png;base64,#https://microsoft.github.io/AirSim/images/AirSimDroneManual.gif)
]

---

### Dataset example

.center[
![Supervised Machine Learning](data:image/png;base64,#images/CelebA.svg)
]

.footnote[https://github.com/switchablenorms/CelebAMask-HQ]

---

### Dataset split

.center[
![Supervised Machine Learning](data:image/png;base64,#images/Datasets.svg)
]

Labels `\(\lbrace l_k \rbrace\)` are encoded as suitable machine outputs `\(\lbrace y_k \rbrace\)`.

- **Training set**: for tuning the parameters `\(\theta\)` of the machine M
- **Validation set**: for tuning the hyperparameters (e.g. learning rate, number of layers, etc.)
- **Test set**: to compete against other machines!

A minimal splitting sketch is shown on the next slide.

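---

### Dataset split: code sketch

A minimal sketch of carving a labelled dataset into training, validation and test sets with scikit-learn (the arrays and the 70/15/15 ratio are illustrative assumptions, not values from these slides):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labelled dataset: N images, each with a label l_k
# already encoded as a machine output y_k (here, a class index).
images = np.random.rand(1000, 28, 28, 1)
labels = np.random.randint(0, 10, size=1000)

# Hold out the test set first, then split the remainder into
# training and validation sets (roughly 70% / 15% / 15%).
x_trainval, x_test, y_trainval, y_test = train_test_split(
    images, labels, test_size=0.15, stratify=labels, random_state=0)
x_train, x_val, y_train, y_val = train_test_split(
    x_trainval, y_trainval, test_size=0.15 / 0.85,
    stratify=y_trainval, random_state=0)
```

Training data tune the parameters `\(\theta\)`, validation data guide the hyperparameters, and the test set is only touched for the final comparison.
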
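---

## Data Augmentation: code sketch

A minimal sketch of label-preserving augmentations of the kind surveyed by Shorten & Khoshgoftaar, written here with Keras preprocessing layers (the chosen layers and ranges are illustrative assumptions, not taken from the survey):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Random geometric and photometric transforms, applied on the fly
# during training; each transform leaves the class label unchanged.
augmenter = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),   # rotate by up to +/- 10% of a full turn
    layers.RandomZoom(0.1),
    layers.RandomContrast(0.2),
])

# Placeholder batch of 8 RGB images.
images = tf.random.uniform((8, 224, 224, 3))
augmented = augmenter(images, training=True)  # active only in training mode
```

Since every augmented copy inherits the original label, the training set grows without any extra labelling effort.
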
---

## Machine Evaluation: performance

.pull-left[
- **Top-1 accuracy**: the percentage of test samples for which the machine's top prediction matches the true label
- **Top-5 accuracy**: the percentage of test samples for which the true label is among the machine's top 5 predictions
- **Number of parameters** `\(\equiv\dim(\theta)\)` in machine `\(f_{\theta}\)`
- **FLOPs**: the number of floating-point operations needed for one prediction (FLOPS, floating point operations per second, measures hardware speed instead)

A top-k accuracy sketch is given at the end of this deck.
]
.pull-right[
![metrics table](data:image/png;base64,#images/metricsMatej.png)
]

.footnote[
Example from *Harmonic Convolutional Networks based on Discrete Cosine Transform*, M. Ulicny et al. (2022) <a href="http://doi.org/10.1016/j.patcog.2022.108707" target="_blank">DOI:10.1016/j.patcog.2022.108707</a>
]

---

## Machine Evaluation: Carbon footprint

.center[
Example of an ML carbon footprint calculator: https://mlco2.github.io/impact/

![](data:image/png;base64,#https://spectrum.ieee.org/media-library/a-chart-showing-computations-billions-of-floating-point-operations.png?id=27527446&width=450)
]

.footnote[From *Deep Learning’s Diminishing Returns: The cost of improvement is becoming unsustainable*, https://spectrum.ieee.org/deep-learning-computational-cost]

---

### Machine Evaluation: environmental impact

```"AI uses huge amounts of electricity and water to work..."```

.center[
<img src="data:image/png;base64,#images/flower-gec6773603_1280.jpg" width="50%" style="display: block; margin: auto;" />
]

.footnote[From https://www.theguardian.com/technology/2023/aug/01/techscape-environment-cost-ai-artificial-intelligence]

---

## Supervised Machine Learning

.center[
![Supervised Machine Learning](data:image/png;base64,#images/Machine.drawio.svg)
]

---

## Training a machine M

.pull-left[
Example of training a convolutional neural network for classification (MNIST convnet); a condensed code sketch is given at the end of the deck.

<iframe width="560" height="315" src="https://www.youtube.com/embed/inN8seMm7UI" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
]
.pull-right[
![](data:image/png;base64,#images/mnist.png)
]

.footnote[
https://keras.io/examples/vision/mnist_convnet/
]

---

## Using the machine M

.pull-left[
Example application: depth from a single image.

MiDaS models compute relative depth from a single image: https://pytorch.org/hub/intelisl_midas_v2/

A loading sketch is given at the end of the deck.
]
.pull-right[
![Midas](data:image/png;base64,#https://pytorch.org/assets/images/midas_samples.png)
]

---

## Using the machine M

.pull-left[
<img src="data:image/png;base64,#https://github.com/vt-vl-lab/3d-photo-inpainting/blob/de0446740a3726f3de76c32e78b43bd985d987f9/image/moon.jpg?raw=true" width="280" style="display: block; margin: auto;" />

Example: 3D photography
]
.pull-right[
<video width="320" height="240" controls>
<source src="data:image/png;base64,#https://github.com/vt-vl-lab/3d-photo-inpainting/blob/de0446740a3726f3de76c32e78b43bd985d987f9/video/moon_zoom-in.mp4?raw=true" type="video/mp4">
</video>

Output [video](https://github.com/vt-vl-lab/3d-photo-inpainting/blob/de0446740a3726f3de76c32e78b43bd985d987f9/video/moon_zoom-in.mp4?raw=true) from input image (left)
]

.footnote[
[*3D Photography using Context-aware Layered Depth Inpainting*](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shih_3D_Photography_Using_Context-Aware_Layered_Depth_Inpainting_CVPR_2020_paper.pdf), M.-L. Shih et al., IEEE Computer Vision and Pattern Recognition (CVPR) 2020. https://github.com/vt-vl-lab/3d-photo-inpainting
]
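
---

### Machine Evaluation: top-k accuracy sketch

A NumPy sketch of the top-1 and top-5 accuracy definitions from the performance slide (the scores and labels below are made-up placeholders):

```python
import numpy as np

def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores.

    scores: (n_samples, n_classes) machine outputs
    labels: (n_samples,) true class indices
    """
    top_k = np.argsort(scores, axis=1)[:, -k:]      # k best classes per sample
    hits = (top_k == labels[:, None]).any(axis=1)   # true label among them?
    return hits.mean()

scores = np.random.rand(4, 6)          # 4 samples, 6 classes (placeholder)
labels = np.array([0, 3, 5, 1])
print(top_k_accuracy(scores, labels, k=1))  # top-1 accuracy
print(top_k_accuracy(scores, labels, k=5))  # top-5 accuracy
```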
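
---

### Training a machine M: code sketch

A condensed sketch of the Keras MNIST convnet referenced earlier (https://keras.io/examples/vision/mnist_convnet/); sparse labels and a short run are simplifications for the slide, not the tutorial's exact settings:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Load MNIST, scale pixels to [0, 1], add a channel axis.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = np.expand_dims(x_train.astype("float32") / 255, -1)
x_test = np.expand_dims(x_test.astype("float32") / 255, -1)

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam", metrics=["accuracy"])

# 10% of the training data is held out as a validation set.
model.fit(x_train, y_train, batch_size=128, epochs=2, validation_split=0.1)
print(model.evaluate(x_test, y_test))  # [test loss, test accuracy]
```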
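
---

### Using the machine M: loading MiDaS

A sketch of loading a pre-trained MiDaS model via PyTorch Hub, closely following the usage example at https://pytorch.org/hub/intelisl_midas_v2/ (the input file name is a placeholder):

```python
import cv2
import torch

# Load a small pre-trained MiDaS model and its matching input transform.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

# Read an image (placeholder path) and convert BGR -> RGB.
img = cv2.cvtColor(cv2.imread("input.jpg"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img))        # relative depth prediction
    depth = torch.nn.functional.interpolate(  # resize to the input resolution
        prediction.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze().numpy()
```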