The course will take place as a live web course.


This two-day short course focuses on drone vision, imaging, surveillance and cinematography, while providing an overview and in-depth presentation of the various computer vision and deep learning problems encountered in autonomous systems perception. It consists of two parts (A and B), each including 8 one-hour lectures and related material (slide PDFs, lecture videos, understanding questionnaires).

Part A lectures (8 hours) provide an in-depth presentation of drone systems, mission planning/control and imaging. First, an introduction to multiple drone systems is presented. Then, drone mission planning and control is overviewed, complemented by a lecture on drone mission simulations. After reviewing image acquisition, camera geometry (mapping the 3D world onto a 2D image plane) and camera calibration, stereo and multi-view imaging systems are presented for recovering 3D world geometry from 2D images. This is complemented by Structure from Motion (SfM) and Simultaneous Localization and Mapping (SLAM) for vehicle and/or target localization, as well as visual object tracking and 3D localization. Finally, drone communications are overviewed, focusing on drone-to-ground multiple-drone LTE communications, notably multiple-source video compression and streaming.

Part B lectures (8 hours) first provide an in-depth presentation of drone computational cinematography, which is useful in many applications besides media production. Then, an introduction to neural networks gives a rigorous formulation of the optimization problems underlying their training, starting with the Perceptron. It continues with Multilayer Perceptron training through Backpropagation, presenting related problems such as over-/under-fitting and generalization. Deep neural networks, notably Convolutional NNs, form the core of this domain nowadays, and they are overviewed in great detail. Their application to object detection is presented next, as it is a very important issue, complemented with a presentation of deep semantic image segmentation. As embedded computing is equally important, CVML software development tools and their use in drone imaging are overviewed. This part concludes with an extremely important drone imaging application, namely UAV infrastructure inspection.

You can also self-assess your CVML knowledge by filling in the questionnaires (one per lecture), and you will be provided with programming exercises to improve your CV programming skills.


You may want to self-assess your knowledge of Computer Vision, Machine Learning and Autonomous Systems topics by trying the assessment exercises before and after studying this course.

Each questionnaire takes only 15 minutes. You can do this double self-assessment (before/after study) for free, using the sample lecture study material provided below.


The course will take place as a live web course on 18-19 November 2020.


You can click on the lecture title to view its description.

Time*/date 18/11/2020 19/11/2020
08:00 – 09:00 Registration Registration
09:00 – 10:00 Introduction to multiple drone systems Drone cinematography
10:00 – 11:00 Drone mission planning and control Introduction to neural networks, Perceptron
11:00 – 11:30 Coffee break Coffee break
11:30 – 12:30 Image acquisition, camera geometry Multilayer perceptron. Backpropagation
12:30 – 13:30 Stereo and Multiview imaging Deep neural networks. Convolutional NNs 
13:30 – 14:30 Lunch break Lunch break
14:30 – 15:30 Localization and mapping Deep learning for object/target detection
15:30 – 16:30 Object tracking and 3D localization Deep Semantic Image Segmentation
16:30 – 17:00 Coffee break Coffee break
17:00 – 18:00 Drone communications CVML software development tools
18:00 – 19:00 Drone mission simulations UAV infrastructure inspection

*Eastern European Time (EET)

**This programme is indicative and may be modified without prior notice by announcing (hopefully small) changes in lectures/lecturers.

***Each topic will include a 45-minute lecture and a 15-minute break.


You can improve your programming knowledge of Computer Vision, Machine Learning and Image/Video Processing through programming exercises using OpenCV, PyTorch and CUDA on the following topics:

  1. Introduction to OpenCV Programming
  2. CNN image classification
  3. PyTorch for deep object detection
  4. OpenCV programming for object tracking
  5. CUDA programming of 2D convolution algorithms

You will be provided with the programming exercise solutions to check your progress.
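As a flavor of exercise 5, the sketch below is a pure-Python CPU reference implementation of 2D convolution (the function name and toy image are illustrative, not part of the course material); a CUDA version would parallelize the two outer loops across GPU threads:

```python
def conv2d(image, kernel):
    """Valid 2D convolution of a 2D list `image` with `kernel` (CPU reference)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # flip the kernel for true convolution (vs. correlation)
                    acc += image[y + i][x + j] * kernel[kh - 1 - i][kw - 1 - j]
            out[y][x] = acc
    return out

# 3x3 box blur of a 4x4 intensity ramp
img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
box = [[1.0 / 9.0] * 3 for _ in range(3)]
blurred = conv2d(img, box)  # → [[5.0, 6.0], [9.0, 10.0]]
```

A correct CPU reference like this is typically kept alongside the CUDA kernel to validate the GPU output.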

More information can be found in:


Prof. Ioannis Pitas (IEEE Fellow, IEEE Distinguished Lecturer, EURASIP Fellow) received the Diploma and Ph.D. degrees in Electrical Engineering, both from the Aristotle University of Thessaloniki, Greece. Since 1994, he has been a Professor at the Department of Informatics of the same university. His current interests are in the areas of image/video processing, machine learning, computer vision, intelligent digital media, human-centered interfaces, affective computing, 3D imaging, and biomedical imaging. He has published over 860 papers, contributed to 44 books in his areas of interest, and edited or (co-)authored another 11 books. He has also been a member of the program committees of many scientific conferences and workshops. In the past he served as Associate Editor or Co-Editor of 9 international journals and General or Technical Chair of 4 international conferences. He has participated in 69 R&D projects, primarily funded by the European Union, and is/was principal investigator/researcher in 41 such projects.

He has 31600+ citations to his work and h-index 85+ (Google Scholar).

Prof. Pitas led the big European H2020 R&D project MULTIDRONE and is principal investigator (AUTH) in the H2020 projects Aerial Core and AI4Media. He is chair of the Autonomous Systems initiative.

Professor Pitas will deliver 16 lectures on deep learning and computer vision.

Educational record of Prof. I. Pitas: He was a Visiting/Adjunct/Honorary Professor/Researcher and lectured at several universities: University of Toronto (Canada), University of British Columbia (Canada), EPFL (Switzerland), Chinese Academy of Sciences (China), University of Bristol (UK), Tampere University of Technology (Finland), Yonsei University (Korea), Erlangen-Nurnberg University (Germany), National University of Malaysia, and Henan University (China). He has delivered 90 invited/keynote lectures at prestigious international conferences and top universities worldwide. He has run 17 short courses and tutorials on Autonomous Systems, Computer Vision and Machine Learning, most of them in the past 3 years, in many countries, e.g., USA, UK, Italy, Finland, Greece, Australia, New Zealand, Korea, Taiwan, Sri Lanka, and Bhutan.



Early registration (till 13/11/2020):

• Standard: 300 Euros

• Undergraduate/MSc/PhD student*: 200 Euros

Later or on-site registration (after 13/11/2020):

• Standard: 350 Euros

• Undergraduate/MSc/PhD student*: 250 Euros

*Proof of student status should be provided upon registration.


After the completion of your payment, please fill in the form below:

Complete your registration



Lectures will be in English. PDF slides will be available to course attendees.

A certificate of attendance will be provided.

***The short course on «Deep Learning and Computer Vision for Autonomous Systems» will take place as a live web course on 18-19 November 2020. Lectures will be prerecorded to assist attendees who experience problems due to time differences. Remote participation will be available via teleconferencing.***

Cancellation policy:

  • 70% refund for cancellation up to 15/10/2020
  • 50% refund for cancellation up to 6/11/2020
  • 0% refund afterwards


18/11/2020 – Part A (first day, 8 lectures):

1. Introduction to multiple drone systems

Abstract: This lecture will provide the general context for this new and emerging topic, presenting the aims of multiple drone systems and focusing on their sensing and perception. Drone mission formalization, planning and control will be overviewed. Then, multiple drone communication issues will be presented, notably drone-to-ground communication and multisource video streaming. In drone vision, the challenges (especially from an image/video analysis and computer vision point of view), the important issues to be tackled, and the limitations imposed by drone hardware, regulations and safety considerations will be presented. A multiple drone platform will be detailed during the second part of the lecture, beginning with a platform hardware overview, issues and requirements, and proceeding to discuss safety and privacy protection issues. Finally, platform integration will be the closing topic of the lecture, elaborating on drone mission planning, object detection and tracking, UAV-based cinematography, target pose estimation, privacy protection, ethical and regulatory issues, potential landing site detection, crowd detection, semantic map annotation and simulations. Two drone use cases will be overviewed: a) multiple drones in media production and b) drone-based infrastructure surveillance, notably of electrical installations.

2. Drone mission planning and control

Abstract: In this lecture, the audiovisual shooting mission is first formally defined. The introduced audiovisual shooting definitions are encoded in mission planning commands, i.e., a navigation and shooting action vocabulary, and their corresponding parameters. The drone mission commands, as well as the hardware/software architecture required for manual/autonomous mission execution, are described. The software infrastructure includes the planning modules, which assign, monitor and schedule different behaviours/tasks to the drone swarm team according to director and environmental requirements, and the control modules, which execute the planned mission by translating high-level commands into desired drone+camera configurations, producing commands for the autopilot, camera and gimbal of each drone in the swarm.

3. Image acquisition, camera geometry

Abstract: After a brief introduction to image acquisition and light reflection, the building blocks of modern cameras will be surveyed, along with geometric camera modeling. Several camera models, like pinhole and weak-perspective camera model, will subsequently be presented, with the most commonly used camera calibration techniques closing the lecture.

Pinhole camera model
Calibration pattern

Sample Lecture material: Download
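To make the pinhole mapping concrete, here is a minimal sketch: a 3D point (X, Y, Z) in the camera frame projects to pixel coordinates u = fx·X/Z + cx, v = fy·Y/Z + cy. The intrinsic parameter values below are illustrative, not taken from the lecture:

```python
def project_pinhole(point_3d, fx, fy, cx, cy):
    """Project a 3D camera-frame point onto the image plane (pinhole model)."""
    X, Y, Z = point_3d
    assert Z > 0, "point must lie in front of the camera"
    u = fx * X / Z + cx  # horizontal pixel coordinate
    v = fy * Y / Z + cy  # vertical pixel coordinate
    return u, v

# Illustrative intrinsics: 800-pixel focal length, principal point at image centre
u, v = project_pinhole((0.5, -0.25, 2.0), fx=800.0, fy=800.0, cx=640.0, cy=360.0)
# → (840.0, 260.0)
```

Camera calibration estimates exactly these intrinsics (fx, fy, cx, cy), plus lens distortion, from images of a known pattern.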

4. Stereo and Multiview imaging

Abstract: The workings of stereoscopic and multiview imaging will be explored in depth, focusing mainly on stereoscopic vision, geometry and camera technologies. Subsequently, the main methods of 3D scene reconstruction from stereoscopic video will be described, along with the basics of multiview imaging.


Stereo vision
Multiview vision

Sample Lecture material: Download
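The core of stereoscopic 3D reconstruction can be sketched in one formula: for a rectified stereo pair, depth Z = f·B/d, where f is the focal length in pixels, B the camera baseline and d the disparity. The numbers below are illustrative:

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth (in metres) of a point from its stereo disparity, for a rectified
    camera pair: Z = f * B / d."""
    assert disparity_px > 0, "zero disparity corresponds to a point at infinity"
    return focal_px * baseline_m / disparity_px

# e.g. 700-pixel focal length, 12 cm baseline, 20-pixel disparity
z = depth_from_disparity(20.0, focal_px=700.0, baseline_m=0.12)  # → 4.2 m
```

Note the inverse relationship: small disparities (distant points) make depth estimates very sensitive to matching errors, which is why baseline and resolution matter in stereo rig design.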

5. Localization and Mapping

Abstract: The lecture covers the essential knowledge of how robots/drones obtain the 2D and/or 3D maps they need, taking measurements with appropriate sensors to perceive their environment. Semantic mapping covers how to add semantic annotations, such as POIs, roads and landing sites, to these maps. Localization finds the 3D drone or target location based on sensor data, specifically using Simultaneous Localization And Mapping (SLAM). Finally, drone localization fusion improves localization and mapping accuracy by exploiting the synergies between different sensors.


3D Thessaloniki map.

Sample Lecture material: Download

6. Object tracking and 3D localization

Abstract: Target tracking is a crucial component of many vision systems. Many approaches regarding person/object detection and tracking in videos have been proposed. In this lecture, video tracking methods using correlation filters or convolutional neural networks are presented, focusing on video trackers that are capable of achieving real-time performance for long-term tracking on embedded computing platforms.

Sample Lecture material: Download
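As a baseline intuition for tracking-by-matching, the sketch below locates a template in a frame by exhaustive sum-of-squared-differences search; this is the brute-force search that correlation-filter trackers accelerate by working in the frequency domain. Names and the toy frame are illustrative:

```python
def ssd_track(frame, template):
    """Locate `template` in `frame` (2D lists) by exhaustive
    sum-of-squared-differences search; returns the best (row, col)."""
    fh, fw = len(frame), len(frame[0])
    th, tw = len(template), len(template[0])
    best, best_pos = float("inf"), (0, 0)
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            ssd = sum((frame[y + i][x + j] - template[i][j]) ** 2
                      for i in range(th) for j in range(tw))
            if ssd < best:
                best, best_pos = ssd, (y, x)
    return best_pos

frame = [[0] * 6 for _ in range(6)]
frame[3][2], frame[3][3], frame[4][2], frame[4][3] = 9, 9, 9, 9  # bright 2x2 target
pos = ssd_track(frame, [[9, 9], [9, 9]])  # → (3, 2)
```

Real-time trackers avoid this O(frame × template) cost per frame, which is what makes them feasible on embedded platforms.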

7. Drone communications

Abstract: Drone swarms should communicate with a ground station and with each other. As distances are long, a combination of LTE and WiFi technologies is important. Communication issues will be reviewed, primarily ones arising from LTE/5G/IoT systems, notably latency and synchronization, Quality of Service (QoS) and security. As video is one of the heaviest information types to be communicated to the ground station, video compression and streaming issues will be presented. As multiple video sources are employed, multisource video synchronization will be overviewed as well.

8. Drone mission simulations

Abstract: Drone mission simulations are mainly needed for three reasons. The first is to test the control and behavior of the overall drone system (drones, ground/supervision station); in this case, Gazebo was chosen as the most appropriate environment.

The second reason is to use simulations to characterize the optimal drone parameters for specific scenarios and shot types in terms of viewing experience. A third reason for carrying out drone simulations is to generate large-scale UAV training and test data. Machine learning algorithms need large amounts of quality data to be trained efficiently, and gathering and annotating that sheer amount of data is a time-consuming and error-prone task that limits both scale and quality. Synthetic data generation has therefore become increasingly popular, due to fast generation and automatic annotation. For the last two cases, Unreal Engine 4 and AirSim are the tools of choice because of their high-level graphics capabilities. Examples of using the Gazebo, Unreal Engine 4 and AirSim software tools and their capabilities will be presented in this lecture.

19/11/2020 – Part B (second day, 8 lectures):

1. Drone cinematography

Abstract: The main building blocks of drone cinematography will be surveyed, especially focusing on UAV shot types (framing and camera motion types). Additionally, the state of the art in autonomous capture of cinematic UAV footage will be described, with an emphasis on relevant algorithms, commercial products and tools of the trade. Drones have already made their way into media production, be it cinema movies (Spectre, Captain America: Civil War, etc.) or TV content (e.g., documentaries and news coverage), and they have done so for a good reason. Indeed, the versatility provided by camera-carrying drones is expected to revolutionize aerial shooting, allowing faster and more flexible camera positioning and movements (including low-altitude ones or shots close to the subject) than those provided by helicopters, while at the same time reducing cost and increasing safety and ease of operation. Drones are expected to enable film-makers and TV crews to develop a new cinematographic language, especially in combination with techniques that enable automated and intelligent shooting (a topic that has just started to emerge). This part of the tutorial will review characteristic cases of drone usage in cinematography, provide a taxonomy of existing drone cinematography static & dynamic shot types and shot sequencing, delve into the new horizons that open up for the creation of new visual effects and shot types, and discuss technical/research issues and challenges, as well as issues related to the viewer's experience and perceived quality. It will also review recent approaches to automatic drone cinematography and to the "virtual" planning of drone shots. The opportunities and challenges stemming from the use of multiple drones will also be discussed.

Lecture material: Download

2. Introduction to neural networks, Perceptron

Abstract: This lecture will cover the basic concepts of Artificial Neural Networks (ANNs): Biological neural models, Perceptron, Activation functions, Loss types, Steepest Gradient Descent, On-line Perceptron training, Batch Perceptron training.

Lecture material: Download
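The on-line Perceptron training rule covered here can be sketched in a few lines: on each misclassified sample x with label y ∈ {−1, +1}, update w ← w + lr·y·x. The toy dataset below is illustrative:

```python
def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """On-line Perceptron training: w <- w + lr * y * x on each misclassified
    sample. Inputs are augmented with a constant 1 so the bias is a weight."""
    w = [0.0] * (len(samples[0]) + 1)
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            xa = list(x) + [1.0]  # augmented input
            pred = 1 if sum(wi * xi for wi, xi in zip(w, xa)) >= 0 else -1
            if pred != y:
                w = [wi + lr * y * xi for wi, xi in zip(w, xa)]
                errors += 1
        if errors == 0:  # converged: every sample correctly classified
            break
    return w

# Linearly separable toy set: class +1 lies above the other two points
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5), (1.0, 1.0)]
y = [-1, -1, 1, 1]
w = train_perceptron(X, y)
```

For linearly separable data, the Perceptron convergence theorem guarantees this loop terminates after finitely many updates; the lecture formalizes why.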

3. Multilayer perceptron. Backpropagation

Abstract:  This lecture will cover the basic concepts of Multi-Layer Perceptron (MLP), Training MLP neural networks, Activation functions, Loss types, Gradient descent, Error Backpropagation, Stochastic Gradient Descent, Adaptive Learning Rate Algorithms, Regularization, Evaluation, Generalization.

Sample Lecture material: Download
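The chain rule behind backpropagation can be shown on the smallest possible network: one sigmoid hidden unit feeding one linear output, trained by gradient descent on a squared loss. This is a didactic sketch; the weights, learning rate and training pair are illustrative:

```python
import math

def forward(x, w1, w2):
    """Tiny two-layer network: h = sigmoid(w1 * x), y_hat = w2 * h."""
    h = 1.0 / (1.0 + math.exp(-w1 * x))
    return h, w2 * h

def backprop_step(x, y, w1, w2, lr=0.1):
    """One SGD step on the squared loss L = (y_hat - y)^2 / 2,
    with gradients obtained by the chain rule (backpropagation)."""
    h, y_hat = forward(x, w1, w2)
    d_yhat = y_hat - y               # dL/dy_hat
    d_w2 = d_yhat * h                # dL/dw2
    d_h = d_yhat * w2                # dL/dh, propagated back through w2
    d_w1 = d_h * h * (1.0 - h) * x   # sigmoid'(z) = h * (1 - h)
    return w1 - lr * d_w1, w2 - lr * d_w2

# Fit the single training pair (x, y) = (1.0, 0.5)
w1, w2 = 0.3, -0.2
losses = []
for _ in range(50):
    _, y_hat = forward(1.0, w1, w2)
    losses.append(0.5 * (y_hat - 0.5) ** 2)
    w1, w2 = backprop_step(1.0, 0.5, w1, w2)
```

MLP training is this same pattern applied layer by layer, with stochastic gradient descent over many samples instead of one.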

4. Deep neural networks. Convolutional NNs

Abstract: From multilayer perceptrons to deep architectures. Fully connected layers. Convolutional layers. Tensors and mathematical formulations. Pooling. Training convolutional NNs. Initialization. Data augmentation. Batch Normalization. Dropout. Deployment on embedded systems. Lightweight deep learning.

Sample Lecture material: Download
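One of the CNN building blocks listed above, pooling, is simple enough to sketch directly; the feature map values below are illustrative:

```python
def max_pool2d(fmap, size=2, stride=2):
    """Max pooling over a 2D feature map (pure-Python sketch): each output
    cell keeps the maximum of a size x size window, stepping by `stride`."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for y in range(0, h - size + 1, stride):
        row = []
        for x in range(0, w - size + 1, stride):
            row.append(max(fmap[y + i][x + j]
                           for i in range(size) for j in range(size)))
        out.append(row)
    return out

fmap = [[1, 3, 2, 0],
        [4, 6, 1, 5],
        [7, 2, 9, 8],
        [0, 1, 3, 4]]
pooled = max_pool2d(fmap)  # → [[6, 5], [7, 9]]
```

Pooling halves each spatial dimension here, which is how CNNs cheaply build translation tolerance and reduce computation in deeper layers.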

5. Deep learning for object/target detection

Abstract: Recently, Convolutional Neural Networks (CNNs) have been used for object/target (e.g., car, pedestrian, road sign) detection with great results. However, using such CNN models on embedded processors for real-time processing is hindered by hardware constraints. To this end, various architectures and settings will be examined in order to facilitate and accelerate the use of CNN-based object detectors on embedded systems with limited computational capabilities. The following target detection topics will be presented: object detection as a search and classification task; detection as a classification and regression task; modern architectures for target detection (e.g., RCNN, Faster R-CNN, YOLO, SSD); lightweight architectures; data augmentation; deployment; evaluation and benchmarking.

Sample Lecture material: Download
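The evaluation and benchmarking mentioned above rests on one metric, Intersection over Union (IoU) between a predicted and a ground-truth box; a minimal sketch (box coordinates are illustrative):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2),
    the standard overlap criterion for scoring detections."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

score = iou((0, 0, 10, 10), (5, 0, 15, 10))  # → 1/3 (intersection 50, union 150)
```

A detection is typically counted as a true positive when its IoU with a ground-truth box exceeds a threshold such as 0.5.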

6. Deep Semantic Image Segmentation

Abstract: Semantic image segmentation is a very important computer vision task with several applications in autonomous systems perception, robotic vision and medical imaging. Recent semantic image segmentation methods rely on deep neural networks and aim to assign a specific class label to each pixel of the input image. This lecture overviews the topic and addresses some of the semantic image segmentation challenges, notably: Deep semantic Image Segmentation architectures. Skip connections. U-nets. BiSeNet. Semantic image segmentation performance, computational complexity and generalization. 

Sample Lecture material: Download
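Segmentation quality is usually reported as per-class IoU over the pixel label maps; a minimal sketch (the label maps below are illustrative, flattened to 1D lists for brevity):

```python
def segmentation_iou(pred, gt, num_classes):
    """Per-class IoU between predicted and ground-truth pixel label maps,
    given here as flattened lists of class indices."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        ious.append(inter / union if union else float("nan"))
    return ious

gt   = [0, 0, 1, 1, 1, 0]
pred = [0, 1, 1, 1, 0, 0]
per_class = segmentation_iou(pred, gt, num_classes=2)  # → [0.5, 0.5]
```

Averaging these values gives the mean IoU (mIoU) commonly quoted for architectures such as U-Net and BiSeNet.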

7. CVML software development tools

Abstract: In this lecture, various GPU and multicore CPU architectures will be reviewed, notably those used in GPU cards and in embedded boards like the NVIDIA TX1, TX2 and Xavier. The principles of parallelizing various algorithms on GPU and multicore CPU architectures are reviewed. Subsequently, the essentials of GPU programming are presented. Finally, special attention is paid to: a) fast and parallel linear algebra operations (e.g., using cuBLAS) and b) FFT-based convolution algorithms, as both are of particular importance in deep machine learning (CNNs) and in real-time computer vision.

Sample Lecture material: Download
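The idea behind FFT-based convolution is the convolution theorem: a circular convolution equals the inverse transform of the element-wise product of the two transforms. The sketch below verifies this with a naive O(n²) DFT on a tiny signal (FFT libraries, e.g. cuFFT on the GPU, compute the same transform in O(n log n)); the signals are illustrative:

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(n^2), for illustration only)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def idft(X):
    """Inverse DFT, with the 1/n normalization."""
    n = len(X)
    return [sum(X[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n)) / n
            for k in range(n)]

def circular_conv_dft(a, b):
    """Circular convolution via the convolution theorem:
    conv = IDFT(DFT(a) * DFT(b))."""
    A, B = dft(a), dft(b)
    return [c.real for c in idft([x * y for x, y in zip(A, B)])]

def circular_conv_direct(a, b):
    """Direct circular convolution, for comparison."""
    n = len(a)
    return [sum(a[k] * b[(m - k) % n] for k in range(n)) for m in range(n)]

a, b = [1.0, 2.0, 3.0, 4.0], [0.5, 0.0, -0.5, 0.0]
fast, slow = circular_conv_dft(a, b), circular_conv_direct(a, b)
# both → [-1.0, -1.0, 1.0, 1.0] (up to floating-point error)
```

For the large convolutions in CNN layers, the transform-multiply-inverse route is what makes FFT-based GPU implementations competitive with direct ones.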

8. UAV infrastructure inspection

Abstract: Infrastructure inspection is one of the most important drone mission tasks. In most cases, long-range and/or very accurate local inspection of the infrastructure is needed, e.g., for bridges, road infrastructure or electrical installations. Various inspection modes are overviewed, e.g., visual inspection, thermography and 3D imaging (LIDAR). Furthermore, infrastructure maintenance activities based on aerial manipulation involving force interactions are presented. Finally, aerial co-workers that safely and efficiently help human workers in inspection and maintenance are overviewed. To this end, human-centered computing, e.g., human pose estimation and action recognition, is needed.




Any practicing engineer, scientist or student with some knowledge of computer vision and/or machine learning, notably CS, CSE, ECE and EE students, graduates or industry professionals with a relevant background.





2020 (held as a web course due to Covid-19 circumstances)

Countries: Belgium, Ireland, Greece, Finland, China, Italy, France, Croatia, Spain, United Kingdom

Registrants comments:

  • «Very interesting lecture topics regarding the autonomous systems perception.»
  • «The overall course was quite interesting and fulfilling in terms of the context promised.»
  • «The lectures were very appealing and satisfactorily delivered.»



Participants: 39

Countries: United Kingdom, Scotland, Germany, Italy, Norway, Slovakia, Spain, Croatia, Czech Republic, Greece

Registrants comments: 

  • «Very adequate information about the topics of DL, CV and autonomous systems.»
  • «Very good coverage of autonomous systems vision perception.»
  • «Course’s content was greatly explanatory with many application examples.»
  • «Very well structured course, knowledgeable lecturers.»



1.) C. Regazzoni, I. Pitas, "Perspectives in Autonomous Systems research", Signal Processing Magazine, September 2019

2.) "3D Shape Reconstruction from 2D Images"

3.) I. Pitas, "3D imaging science and technologies", Amazon CreateSpace preprint, 2019

4.) R. Fan, U. Ozgunalp, B. Hosking, M. Liu, I. Pitas, "Pothole Detection Based on Disparity Transformation and Road Surface Modeling", IEEE Transactions on Image Processing, 2019 (accepted for publication)

5.) R. Fan, X. Ai, N. Dahnoun, "Road Surface 3D Reconstruction Based on Dense Subpixel Disparity Map Estimation", IEEE Transactions on Image Processing, vol. 27, no. 6, June 2018

6.) U. Ozgunalp, R. Fan, X. Ai, N. Dahnoun, "Multiple Lane Detection Algorithm Based on Novel Dense Vanishing Point Estimation", IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 3, March 2017

7.) R. Fan, J. Jiao, J. Pan, H. Huang, S. Shen, M. Liu, "Real-Time Dense Stereo Embedded in A UAV for Road Inspection", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

8.) V. Mygdalis, A. Iosifidis, A. Tefas, I. Pitas, "Semi-Supervised Subclass Support Vector Data Description for image and video classification", Neurocomputing, 2017

9.) N. Tsapanos, A. Tefas, N. Nikolaidis, I. Pitas, "Neurons With Paraboloid Decision Boundaries for Improved Neural Network Classification Performance", IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 14 June 2018, pp. 1-11

10.) P. Nousi, E. Patsiouras, A. Tefas, I. Pitas, "Convolutional Neural Networks for Visual Information Analysis with Limited Computing Resources", 2018 IEEE International Conference on Image Processing (ICIP), Athens, Greece, October 7-10, 2018

11.) V. Mygdalis, A. Tefas, I. Pitas, "Learning Multi-graph regularization for SVM classification", 2018 IEEE International Conference on Image Processing (ICIP), Athens, Greece, October 7-10, 2018

12.) P. Chriskos, R. Zhelev, V. Mygdalis, I. Pitas, "Quality Preserving Face De-Identification Against Deep CNNs", 2018 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Aalborg, Denmark, September 2018

13.) P. Chriskos, O. Zoidi, A. Tefas, I. Pitas, "De-identifying facial images using singular value decomposition and projections", Multimedia Tools and Applications, 2016

14.) P. Nousi, A. Tefas, I. Pitas, "Deep Convolutional Feature Histograms for Visual Object Tracking", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019

15.) V. Mygdalis, A. Tefas, I. Pitas, "Exploiting multiplex data relationships in Support Vector Machines", Pattern Recognition, vol. 85, pp. 70-77, 2019



•Prof. Ioannis Pitas:

•Department of Computer Science, Aristotle University of Thessaloniki (AUTH):

Laboratory of Artificial Intelligence and  Information Analysis: