AI & ML

for statisticians in Academia

Rozenn Dahyot

Young-ISA Meeting Irish Statistical Association

2025-12-01

Context

timeline
    1960-1990: Intel (1968)
        : Microsoft (1975)
        : Apple (1976)
    1991-2005: Nvidia (1993)
        : Amazon (1994)
        : Google (1998)
        : Meta (2004)
    Languages : C (1972-)
             : MATLAB (1984-)
             : Python (1991-)
             : R (1993-)
    Libraries : OpenCV (2000-)
             : scikit-learn (2007-)
             : TensorFlow (2015-)
             : PyTorch (2016-)

Academic life

mindmap
    (Academic Tasks)
        Research
            ((Writing))
                Publications
                Proposals
                [Evaluation: e.g. reviews]
            ))Coding((
            )Reading & Thinking(
            (Open Science IT)
        Teaching
            ((Writing))
                Lecture notes
                Exams
            )Reading & Thinking(
            ))Coding((
            [Evaluation: e.g. corrections]

        Admin
            Recruitment 
                [Evaluation CVs] 
            Event organisation
            ))Coding((
            )Reading & Thinking(
        

Teaching

timeline 
    2005-2007 (TCD): Maths (CS-Y1) 
        : Numerical methods (CS-Y3)
        : Advanced Mathematical (CS-MSc-Y5)
        : Quantitative Analysis (ST-Y2)
    2008-2017 (TCD): Forecasting (ST-Y3+4)
        : Linear Statistical Methods (ST-Y3+4)
        : Stochastic Processes in Space and Time (ST-Y3+4)

    2018-2020 (TCD): Computer Vision (CS-MSc-Y5)
                   : Forecasting (ST-Y3+4)
    2021-Now (MU): Computer Vision (CS-Y4)
                 : Deep Learning  (CS+EE-MSc-Y5) 
                 

Example of automation

Admin task: writing the annual research report for the CS Department at MU, using data from openalex.org.

timeline 
    title Data Source 
    Not Open Access: Scopus (Elsevier, 2004)
    Open Access: OpenAlex (2022)
        : ORCID (2012)

library(openalexR)
# Harvest the works (and their DOIs) published by MU CS staff
# in a given period:
startdate <- "2024-10-01"
enddate <- "2025-09-30"
# (...)
# Staff identified by their ORCID iD.
# seq_len() avoids running the loop when the count is 0.
for (i in seq_len(NbCSorcid)) {
  MUCS <- oa_fetch(
    entity = "works",
    author.orcid = canonical_orcids[i],  # ORCID iD
    from_publication_date = startdate,
    to_publication_date = enddate,
    options = list(sort = "cited_by_count:desc")  # most cited first
  )
  # (...)
}
# Staff identified by their OpenAlex author ID.
for (i in seq_len(NbCSOA)) {
  MUCS <- oa_fetch(
    entity = "works",
    author.id = OA[i],  # OpenAlex ID
    from_publication_date = startdate,
    to_publication_date = enddate,
    options = list(sort = "cited_by_count:desc")  # most cited first
  )
  # (...)
}

Automation of Research

timeline 
    title AI tools
    2009-Now: Grammarly (2009-)
            : ChatGPT (OpenAI, 2022-)    
            : Copilot (Microsoft, 2023-)
 

  • 📰 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (2024)

  • 📰 Towards an AI co-scientist (2025)

  • 🤖 Agents4Science 2025 is “the 1st open conference where AI serves as both primary authors and reviewers of research papers”

Automation of Research

🤮

  • 📰 More than 10,000 research papers were retracted in 2023 — a new record, Nature 2023
  • 📰 Major AI conference flooded with peer reviews written fully by AI, Nature 2025

“around 21% of the ICLR peer reviews were fully AI-generated, and more than half contained signs of AI use”

“Researchers have been sneaking secret messages into their papers in an effort to trick artificial intelligence (AI) tools into giving them a positive peer-review report.”

Automation of Research

👮‍♀️ 👮 The Research Integrity Risk Index is a bibliometric risk indicator for assessing research integrity vulnerabilities in academic institutions worldwide.

Current research

… with humans 😉

  • Fast Moving Object Tracking in videos, led by Senem Aktas (MU PhD candidate)

Upcoming collaborations:

  • RI FPP on the development of machine learning tools to analyse large astronomical datasets from the Mauve satellite and MUSE instrument, led by Emma Whelan

  • Deep Learning Spectroscopy, led by Tim McNamara and Bryan Hennelly

Object Geolocation

https://roznn.github.io/Seminars/IEEEWISP2025.html

Object geolocation is performed by analysing images recorded together with their GPS location and the camera's direction (from an IMU). The aim is to provide a list of unique GPS coordinates, one for each instance of an object class of interest (e.g. telegraph poles).

Our pipeline is composed of several DNNs (for object segmentation and depth/distance inference) and a Markov random field, which makes the final decision from a set of multi-view images covering an area.
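
To make the geometry concrete, here is a minimal sketch in R of how a single detection could be turned into a candidate GPS coordinate: starting from the camera position, move the estimated distance along the camera bearing. This is not the pipeline described above; the function `destination_point`, the spherical-Earth approximation, and the camera pose and distance values are assumptions chosen purely for illustration.

```r
# Hypothetical helper (not part of the pipeline described in the slides):
# great-circle destination point on a spherical Earth.
destination_point <- function(lat_deg, lon_deg, bearing_deg, distance_m,
                              earth_radius_m = 6371000) {
  to_rad <- pi / 180
  lat1 <- lat_deg * to_rad
  lon1 <- lon_deg * to_rad
  theta <- bearing_deg * to_rad          # bearing, clockwise from north
  delta <- distance_m / earth_radius_m   # angular distance travelled

  lat2 <- asin(sin(lat1) * cos(delta) +
               cos(lat1) * sin(delta) * cos(theta))
  lon2 <- lon1 + atan2(sin(theta) * sin(delta) * cos(lat1),
                       cos(delta) - sin(lat1) * sin(lat2))

  c(lat = lat2 / to_rad, lon = lon2 / to_rad)
}

# Made-up example: a camera facing due east, with an object
# estimated to be 25 m away.
obj <- destination_point(53.3815, -6.5910, bearing_deg = 90, distance_m = 25)
print(round(obj, 6))
```

Each image containing the object would give one such noisy candidate; merging the candidates from several views into a list of unique objects is the role played by the Markov random field in the pipeline.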