We will delve deeper into the technical details of each YOLO algorithm in the following sections. So, let’s embark on a journey to explore the world of YOLO and find the algorithm that best suits your needs!
May 2023
YOLO-NAS
YOLO-NAS is a new object detection model developed by researchers at Deci with the help of automated neural architecture search. YOLO-NAS's architecture employs quantization-aware blocks and selective quantization for optimized performance: quantization is skipped adaptively in specific layers based on the balance between latency/throughput improvement and accuracy loss. The YOLO-NAS architecture and pre-trained weights define a new frontier in low-latency inference and an excellent starting point for fine-tuning on downstream tasks. GitHub:- https://github.com/Deci-AI/super-gradients
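The idea of skipping quantization in the most sensitive layers can be pictured with a small sketch. Everything below is invented for illustration (the layer names, latency gains, accuracy drops, and the greedy selection rule are assumptions, not Deci's actual algorithm):

```python
# Toy sketch of selective quantization: keep the layers whose quantization
# hurts accuracy the most in full precision, quantize the rest.
# All numbers below are invented for illustration.

layers = [
    # (name, latency gain in ms if quantized, accuracy drop in mAP if quantized)
    ("stem",   0.10, 0.02),
    ("stage2", 0.40, 0.05),
    ("stage3", 0.60, 0.30),   # sensitive layer: big accuracy hit
    ("stage4", 0.80, 0.10),
    ("head",   0.30, 0.45),   # sensitive layer: big accuracy hit
]

MAX_ACCURACY_DROP = 0.25      # total mAP loss we are willing to accept

# Greedy rule (an assumption, not the real algorithm): quantize layers in
# order of least accuracy lost per unit of latency gained until the budget
# is exhausted; everything else stays in full precision.
plan = {}
budget = MAX_ACCURACY_DROP
for name, gain, drop in sorted(layers, key=lambda l: l[2] / l[1]):
    if drop <= budget:
        plan[name] = "int8"
        budget -= drop
    else:
        plan[name] = "fp16/fp32 (quantization skipped)"

for name, mode in plan.items():
    print(f"{name:8s} -> {mode}")
```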
January 2023
YOLOv8
YOLOv8 is the latest iteration of the YOLO family of models from Ultralytics. YOLO stands for You Only Look Once, and the series is so named because these models predict every object present in an image in a single forward pass. The main distinction introduced by the YOLO models was the framing of the task: the authors of the original paper reframed object detection as a regression problem (predicting bounding box coordinates directly) instead of a classification problem. GitHub:- https://github.com/ultralytics/ultralytics
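Since the point of the one-pass formulation is that boxes come straight out of a single forward pass, here is a minimal usage sketch with the Ultralytics package (the weight file name and image path are placeholders; check the repository for the current API):

```python
# pip install ultralytics
from ultralytics import YOLO

# Load a pretrained YOLOv8 model (the "n" nano variant is just an example).
model = YOLO("yolov8n.pt")

# One forward pass returns boxes, confidences, and class ids for the image.
results = model("image.jpg")  # replace with your own image path

for r in results:
    for box in r.boxes:
        print(box.xyxy, box.conf, box.cls)  # coordinates, confidence, class id
```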
November 2022
DAMO-YOLO
DAMO-YOLO was published by the Alibaba Group. Drawing on current techniques, DAMO-YOLO incorporates neural architecture search (NAS), a large neck, a small head, AlignedOTA label assignment, and knowledge distillation. GitHub:- https://github.com/tinyvision/DAMO-YOLO
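Of these ingredients, knowledge distillation is the easiest to show in isolation: a large teacher's soft predictions supervise a smaller student. The sketch below is a generic temperature-scaled distillation loss in PyTorch, not DAMO-YOLO's actual implementation:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic KL-divergence distillation loss (not DAMO-YOLO's exact recipe)."""
    # Soften both distributions with a temperature, then match them.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Toy usage with random logits for 8 predictions over 80 COCO-like classes.
student = torch.randn(8, 80, requires_grad=True)
teacher = torch.randn(8, 80)
loss = distillation_loss(student, teacher)
loss.backward()
print(loss.item())
```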
July 2022
YOLOv7
At the time of its release, YOLOv7 was presented as the fastest and most accurate real-time object detection model for computer vision tasks. The official paper, “YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,” was released in July 2022 by Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao, and became immensely popular within days. The source code was released as open source under the GPL-3.0 license, a free copyleft license, and can be found in the official YOLOv7 GitHub repository, which gathered over 4.3k stars in its first month. The paper also comes with a complete appendix. GitHub:- https://github.com/WongKinYiu/yolov7
June 2022
YOLOv6
MT-YOLOv6 was inspired by the original one-stage YOLO architecture and was thus (bravely) named YOLOv6 by its authors. Though it delivers outstanding results, it is important to note that MT-YOLOv6 is not part of the official YOLO series. YOLOv6 is a single-stage object detection framework dedicated to industrial applications, with a hardware-friendly, efficient design and high performance. Its authors report that it outperforms YOLOv5 in detection accuracy and inference speed, making it one of the strongest open-source YOLO-style architectures for production applications. GitHub:- https://github.com/meituan/YOLOv6
March 2022
PP-YOLOE
PP-YOLOE added improvements upon PP-YOLOv2, achieving 51.4% AP at 78.1 FPS on an NVIDIA V100. The main changes of PP-YOLOE with respect to PP-YOLOv2 are an anchor-free design, a new backbone and neck, Task Alignment Learning (TAL), an Efficient Task-aligned Head (ET-head), Varifocal Loss (VFL), and Distribution Focal Loss (DFL). GitHub:- https://github.com/PaddlePaddle/PaddleDetection
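As a concrete example of one of these pieces, the sketch below implements the Varifocal Loss as described in the VarifocalNet paper: an IoU-aware classification loss that down-weights negatives focal-style while weighting positives by their target quality score. Treat it as an illustration rather than PP-YOLOE's exact code:

```python
import torch
import torch.nn.functional as F

def varifocal_loss(pred_logits, target_score, alpha=0.75, gamma=2.0):
    """Varifocal Loss sketch. target_score is the IoU-aware quality target:
    0 for negatives, IoU with the ground truth for positives."""
    pred = pred_logits.sigmoid()
    # Positives are weighted by their quality score; negatives get a focal-style
    # down-weighting alpha * p^gamma so easy background does not dominate.
    weight = torch.where(target_score > 0,
                         target_score,
                         alpha * pred.detach().pow(gamma))
    bce = F.binary_cross_entropy_with_logits(pred_logits, target_score,
                                             reduction="none")
    return (weight * bce).sum()

# Toy usage: 5 predictions for one class, two positives with IoU targets.
logits = torch.randn(5, requires_grad=True)
targets = torch.tensor([0.0, 0.9, 0.0, 0.6, 0.0])
print(varifocal_loss(logits, targets))
```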
July 2021
YOLOX
YOLOX was published on ArXiv in July 2021 by Megvii Technology. Developed in PyTorch and using Ultralytics' YOLOv3 as a starting point, it has five principal changes: an anchor-free architecture, multiple positives, a decoupled head, advanced label assignment, and strong augmentations. It achieved state-of-the-art results in 2021 with an optimal balance between speed and accuracy: 50.1% AP at 68.9 FPS on a Tesla V100. GitHub:- https://github.com/Megvii-BaseDetection/YOLOX
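The decoupled head is the easiest of these changes to show: classification and box regression get separate branches instead of sharing one convolution. Below is a simplified PyTorch sketch; the channel counts and layer depths are arbitrary, not YOLOX's exact configuration:

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Simplified decoupled detection head: separate cls and reg branches."""
    def __init__(self, in_channels=256, num_classes=80):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, in_channels, 1)
        # Classification branch
        self.cls_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_channels, num_classes, 1),
        )
        # Regression branch: 4 box offsets plus 1 objectness score
        self.reg_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.SiLU(),
        )
        self.reg_pred = nn.Conv2d(in_channels, 4, 1)
        self.obj_pred = nn.Conv2d(in_channels, 1, 1)

    def forward(self, feat):
        feat = self.stem(feat)
        cls_out = self.cls_branch(feat)            # (N, num_classes, H, W)
        reg_feat = self.reg_branch(feat)
        return cls_out, self.reg_pred(reg_feat), self.obj_pred(reg_feat)

# Toy usage on one feature-pyramid level.
head = DecoupledHead()
cls, box, obj = head(torch.randn(1, 256, 20, 20))
print(cls.shape, box.shape, obj.shape)
```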
May 2021
YOLOR
YOLOR stands for You Only Learn One Representation. In this paper, the authors took a different approach: they developed a multi-task learning method that aims to create a single model for various tasks (e.g., classification, detection, pose estimation) by learning a general representation and using sub-networks to create task-specific representations. With the insight that the traditional joint learning method often leads to suboptimal feature generation, YOLOR aims to overcome this by encoding the implicit knowledge of neural networks so it can be applied to multiple tasks, similar to how humans use past experience to approach new problems. GitHub:- https://github.com/WongKinYiu/yolor
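To make the "one general representation, many task-specific sub-networks" idea concrete, here is a generic multi-task sketch in PyTorch: a shared backbone feeds small heads for classification and box regression. It illustrates the pattern only, not YOLOR's implicit/explicit knowledge mechanism:

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Shared representation with task-specific heads (generic illustration)."""
    def __init__(self, num_classes=80):
        super().__init__()
        # Shared backbone producing a general feature representation.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Task-specific sub-networks reuse the same shared features.
        self.cls_head = nn.Linear(64, num_classes)   # classification task
        self.box_head = nn.Linear(64, 4)             # crude box-regression task

    def forward(self, x):
        shared = self.backbone(x)
        return self.cls_head(shared), self.box_head(shared)

model = MultiTaskNet()
logits, boxes = model(torch.randn(2, 3, 224, 224))
print(logits.shape, boxes.shape)   # torch.Size([2, 80]) torch.Size([2, 4])
```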
April 2021
PP-YOLOv2
PP-YOLOv2 added several refinements to PP-YOLO that increased performance from 45.9% AP to 49.5% AP at 69 FPS on an NVIDIA V100. The changes of PP-YOLOv2 with respect to PP-YOLO are: the backbone was changed from ResNet50 to ResNet101; a Path Aggregation Network (PAN) replaced the FPN, similar to YOLOv4; the Mish activation function was adopted, though unlike YOLOv4 and YOLOv5 it was applied only in the detection neck, keeping the ReLU-based backbone unchanged; larger input sizes were used to improve performance on small objects; and the IoU-aware branch was modified. GitHub:- https://github.com/PaddlePaddle/PaddleDetection
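Since the Mish activation comes up here (and in YOLOv4/YOLOv5), a minimal definition is shown below; PyTorch also ships a built-in nn.Mish, so this sketch is only for illustration:

```python
import torch
import torch.nn.functional as F

def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * torch.tanh(F.softplus(x))

x = torch.linspace(-3, 3, 7)
print(mish(x))                 # smooth, non-monotonic near zero
print(torch.nn.Mish()(x))      # built-in equivalent for comparison
```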
November 2020
Scaled-YOLOv4
Later in the same year as YOLOv4, the same authors presented Scaled-YOLOv4, which was subsequently published at CVPR 2021. Unlike YOLOv4, Scaled-YOLOv4 was developed in PyTorch instead of Darknet. The main novelty was the introduction of scaling-up and scaling-down techniques. Scaling up means producing a model that increases accuracy at the expense of speed; scaling down, on the other hand, means producing a model that increases speed at the expense of accuracy. In addition, scaled-down models need less computing power and can run on embedded systems. GitHub:- https://github.com/WongKinYiu/ScaledYOLOv4
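A rough way to picture model scaling is through width and depth multipliers: scaling up multiplies channel counts and block repeats, scaling down shrinks them. The sketch below uses invented base numbers and rounding rules purely for illustration; it is not Scaled-YOLOv4's actual compound-scaling formula:

```python
import math

# Invented base configuration: (output channels, repeated blocks) per stage.
BASE_STAGES = [(64, 1), (128, 3), (256, 9), (512, 9), (1024, 3)]

def scale_model(width_mult, depth_mult):
    """Scale channels by width_mult and block repeats by depth_mult (toy rule)."""
    stages = []
    for channels, repeats in BASE_STAGES:
        c = max(8, int(round(channels * width_mult / 8)) * 8)  # keep multiples of 8
        r = max(1, int(math.ceil(repeats * depth_mult)))
        stages.append((c, r))
    return stages

# Scaling down -> smaller, faster model; scaling up -> larger, more accurate one.
print("small:", scale_model(width_mult=0.50, depth_mult=0.33))
print("large:", scale_model(width_mult=1.25, depth_mult=1.33))
```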
July 2020
PP-YOLO
PP-YOLO models have been developed in parallel with the YOLO models we have described. However, we decided to group them in a single section because they began with YOLOv3 and each release gradually improved upon the previous PP-YOLO version. Nevertheless, these models have been influential in the evolution of YOLO. PP-YOLO, like YOLOv4 and YOLOv5, was based on YOLOv3. It was published on ArXiv by researchers from Baidu Inc., who used the PaddlePaddle deep learning platform, hence the PP prefix. Following the trend that started with YOLOv4, PP-YOLO added ten existing tricks to improve the detector's accuracy while keeping the speed unchanged. GitHub:- https://github.com/PaddlePaddle/PaddleDetection
June 2020
YOLOv5
YOLOv5 differs from all prior releases in that it is a PyTorch implementation rather than a fork of the original Darknet. Like YOLOv4, YOLOv5 uses a CSP backbone and a PA-NET neck. The major improvements include mosaic data augmentation and auto-learning bounding box anchors. GitHub:- https://github.com/ultralytics/yolov5
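Mosaic augmentation stitches four training images into one, exposing the model to objects at varied scales and contexts within a single sample. Below is a heavily simplified NumPy sketch (fixed mosaic center, no label handling), not YOLOv5's full implementation:

```python
import numpy as np

def simple_mosaic(images, out_size=640):
    """Paste four images into the quadrants of one canvas (simplified:
    the mosaic center is fixed at the middle and labels are ignored)."""
    assert len(images) == 4
    half = out_size // 2
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)  # gray padding
    corners = [(0, 0), (0, half), (half, 0), (half, half)]
    for img, (y0, x0) in zip(images, corners):
        h, w = img.shape[:2]
        h, w = min(h, half), min(w, half)
        canvas[y0:y0 + h, x0:x0 + w] = img[:h, :w]   # crop to fit the quadrant
    return canvas

# Toy usage with four random "images" of different sizes.
imgs = [np.random.randint(0, 255, (s, s, 3), dtype=np.uint8) for s in (320, 400, 280, 500)]
print(simple_mosaic(imgs).shape)   # (640, 640, 3)
```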
April 2020
YOLOv4
YOLOv4 takes influence from state-of-the-art "bag of freebies" (BoF) and several "bag of specials" (BoS) techniques. BoF methods improve the detector's accuracy without increasing inference time; they only increase the training cost. BoS methods, on the other hand, increase the inference cost by a small amount but significantly improve the accuracy of object detection. GitHub:- https://github.com/AlexeyAB/darknet
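A typical bag-of-freebies item is label smoothing: it changes only the training targets, so the inference cost is untouched. The sketch below shows the generic technique with an arbitrary smoothing factor, not YOLOv4's exact training recipe:

```python
import torch
import torch.nn.functional as F

def smoothed_targets(labels, num_classes, eps=0.1):
    """Replace one-hot targets with (1 - eps) on the true class and
    eps / num_classes spread over all classes -- a training-only change."""
    one_hot = F.one_hot(labels, num_classes).float()
    return one_hot * (1.0 - eps) + eps / num_classes

logits = torch.randn(4, 80, requires_grad=True)
labels = torch.tensor([3, 17, 42, 0])
targets = smoothed_targets(labels, num_classes=80)
loss = torch.sum(-targets * F.log_softmax(logits, dim=-1), dim=-1).mean()
loss.backward()
print(loss.item())
```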
March 2018
YOLOv3
YOLOv3 improved on the previous version with what its authors themselves called an incremental improvement. With many object detection algorithms available by then, the competition centered on how accurately and how quickly objects could be detected. YOLOv3 provides what is needed for real-time object detection, localizing and classifying objects accurately. GitHub:- https://github.com/pjreddie/darknet
2017
YOLOv2 (YOLO9000)
The second version of YOLO, named YOLO9000, was published by Joseph Redmon and Ali Farhadi at the end of 2016. This version was better, faster, and stronger, designed to compete with Faster R-CNN, an object detection algorithm that uses a Region Proposal Network to identify objects in the input image, and with SSD (Single Shot MultiBox Detector). GitHub:- https://github.com/pjreddie/darknet
2016
YOLOv1
YOLOv1 uses the Darknet framework, with a backbone trained on the ImageNet-1000 dataset. It works as described above, but it has several limitations that restrict its use. It struggles to find small objects that appear in clusters. The architecture also has difficulty generalizing to objects with aspect ratios or configurations different from those seen during training. Its major source of error is the localization of objects in the input image. GitHub:- https://github.com/pjreddie/darknet
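The cluster limitation follows directly from how the output tensor is organized. The sketch below lays out YOLOv1's 7×7 grid with 2 boxes per cell and 20 classes (the values from the original paper) using a dummy tensor; it only illustrates why objects whose centers fall in the same cell must compete for the same prediction slots:

```python
import numpy as np

S, B, C = 7, 2, 20                        # grid size, boxes per cell, classes (YOLOv1)
pred = np.random.rand(S, S, B * 5 + C)    # dummy network output: (7, 7, 30)

# Each cell holds B boxes (x, y, w, h, confidence) plus ONE class distribution.
cell = pred[3, 3]
boxes = cell[:B * 5].reshape(B, 5)        # 2 candidate boxes for this cell
class_probs = cell[B * 5:]                # shared by both boxes

# Because every object is assigned to the single cell containing its center,
# a cluster of small objects whose centers fall in the same cell must share
# these 2 box slots and 1 class vector -- the source of the limitation above.
print(boxes.shape, class_probs.shape)     # (2, 5) (20,)
```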
Conclusion
It’s worth noting that beyond new YOLO versions, the timeline may have continued with even more exciting developments in the YOLO series or related object detection models. The field of computer vision and deep learning is constantly evolving, and researchers and developers continuously strive to improve the performance, speed, and versatility of object detection algorithms.
In conclusion, YOLO's journey has exemplified the potential of deep learning in the realm of object detection, inspiring advancements that have impacted a wide range of applications, from autonomous vehicles to surveillance systems and beyond. As we move forward, we can expect further progress and innovation, ultimately driving us closer to more comprehensive and efficient solutions for real-world challenges.