Transformers In Computer Vision – English Version

Published 1/2023
MP4 | Video: h264, 1280×720 | Audio: AAC, 44.1 kHz
Language: English | Size: 3.21 GB | Duration: 5h 31m

What you’ll learn
What are transformer networks?
State of the Art architectures for CV Apps like Image Classification, Semantic Segmentation, Object Detection and Video Processing
Practical application of SoTA architectures like ViT, DETR, SWIN in Huggingface vision transformers
Attention mechanisms as a general Deep Learning idea
Inductive Bias and the landscape of DL models in terms of modeling assumptions
Transformers application in NLP and Machine Translation
Transformers in Computer Vision
Different types of attention in Computer Vision

Requirements
Practical Machine Learning course
Practical Computer Vision course (ConvNets)
Introduction to NLP course

Description
Transformer networks are the current trend in deep learning. Transformer models have taken the world of NLP by storm since 2017, and they have since become the mainstream model in almost all NLP tasks. Transformers in CV still lag behind, but they have started to take over since 2020.

We will start by introducing attention and transformer networks. Since transformers were first introduced in NLP, they are easier to describe with some NLP examples first. From there, we will understand the pros and cons of this architecture. We will also discuss the importance of unsupervised or semi-supervised pre-training for transformer architectures, briefly covering Large Scale Language Models (LLMs) like BERT and GPT.

This will pave the way to introduce transformers in CV. Here we will try to extend the attention idea into the 2D spatial domain of the image. We will discuss how convolution can be generalized using self-attention within the encoder-decoder meta-architecture. We will see how this generic architecture is almost the same for images as for text and NLP, which makes transformers a generic function approximator. We will discuss channel and spatial attention and local vs. global attention, among other topics.

In the next three modules, we will discuss the specific networks that solve the big problems in CV: classification, object detection and segmentation. We will discuss the Vision Transformer (ViT) from Google, the Shifted Window Transformer (SWIN) from Microsoft, the Detection Transformer (DETR) from Facebook research, the Segmentation Transformer (SETR) and many others. Then we will discuss the application of transformers in video processing, through Spatio-Temporal Transformers applied to Moving Object Detection, along with a Multi-Task Learning setup.

Finally, we will show how those pre-trained architectures can be easily applied in practice using the Pipeline interface of the Huggingface library.
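To make the idea of extending attention to the 2D image domain more concrete, here is a minimal sketch in plain Python. It is not taken from the course material: the helper names (`image_to_patches`, `self_attention`), the toy 4×4 image and the use of identity query/key/value projections are all assumptions made for brevity. A real ViT learns separate linear projections and adds positional embeddings.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def image_to_patches(image, patch):
    """Split an H x W grid into flattened, non-overlapping patch x patch tokens.

    Assumes H and W are multiples of `patch` (hypothetical helper).
    """
    h, w = len(image), len(image[0])
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append(
                [image[top + i][left + j] for i in range(patch) for j in range(patch)]
            )
    return tokens


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections), so no learned weights are involved.
    """
    d = len(tokens[0])
    outputs, all_weights = [], []
    for q in tokens:
        # Similarity of this query with every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of all value vectors (global attention).
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)]
        )
        all_weights.append(weights)
    return outputs, all_weights


# Toy 4x4 single-channel "image" split into four 2x2 patches.
image = [[(r * 4 + c) / 16 for c in range(4)] for r in range(4)]
tokens = image_to_patches(image, 2)
attended, weights = self_attention(tokens)
```

Every patch attends to every other patch here, which corresponds to global attention. Restricting the inner loop to neighbouring patches would give a local variant.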

Overview
Section 1: Introduction

Lecture 1 Introduction

Section 2: Overview of Transformer Networks

Lecture 2 The Rise of Transformers

Lecture 3 Inductive Bias in Deep Neural Network Models

Lecture 4 Attention is a General DL idea

Lecture 5 Attention in NLP

Lecture 6 Attention is ALL you need

Lecture 7 Self Attention Mechanisms

Lecture 8 Self Attention Matrix Equations

Lecture 9 Multihead Attention

Lecture 10 Encoder-Decoder Attention

Lecture 11 Transformers Pros and Cons

Lecture 12 Unsupervised Pre-training

Section 3: Transformers in Computer Vision

Lecture 13 Module roadmap

Lecture 14 Encoder-Decoder Design Pattern

Lecture 15 Convolutional Encoders

Lecture 16 Self Attention vs. Convolution

Lecture 17 Spatial vs. Channel vs. Temporal Attention

Lecture 18 Generalization of self attention equations

Lecture 19 Local vs. Global Attention

Lecture 20 Pros and Cons of Attention in CV

Section 4: Transformers in Image Classification

Lecture 21 Transformers in image classification

Lecture 22 Vision Transformers (ViT and DeiT)

Lecture 23 Shifted Window Transformers (SWIN)

Section 5: Transformers in Object Detection

Lecture 24 Transformers in Object detection

Lecture 25 Object Detection methods review

Lecture 26 Object Detection with ConvNet – YOLO

Lecture 27 DEtection TRansformers (DETR)

Lecture 28 DETR vs. YOLOv5 use case

Section 6: Transformers in Semantic Segmentation

Lecture 29 Module roadmap

Lecture 30 Image Segmentation using ConvNets

Lecture 31 Image Segmentation using Transformers

Section 7: Spatio-Temporal Transformers

Lecture 32 Spatio-Temporal Transformers – Moving Object Detection and Multi-task Learning

Section 8: Huggingface Vision Transformers

Lecture 33 Module roadmap

Lecture 34 Huggingface Pipeline overview

Lecture 35 Huggingface vision transformers

Lecture 36 Huggingface Demo using Gradio

Section 9: Conclusion

Lecture 37 Course conclusion

Section 10: Material

Lecture 38 Slides

Who this course is for
Intermediate to Advanced CV Engineers
Intermediate to Advanced CV Researchers
