We will delve deeper into the technical details of each YOLO algorithm in the following sections. So, let’s embark on a journey to explore the world of YOLO and find the algorithm that best suits your needs!
May 2023
YOLO-NAS
YOLO-NAS is a new object detection model developed by researchers at Deci with the help of automated neural architecture search. YOLO-NAS's architecture employs quantization-aware blocks and selective quantization for optimized performance: quantization is skipped adaptively in specific layers based on the balance between latency/throughput improvement and accuracy loss. The YOLO-NAS architecture and pre-trained weights define a new frontier in low-latency inference and an excellent starting point for fine-tuning on downstream tasks. GitHub:- https://github.com/Deci-AI/super-gradients
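The idea of skipping quantization in the most sensitive layers can be pictured with a small sketch. Everything below is invented for illustration (the layer names, latency gains, accuracy drops, and the greedy selection rule are assumptions, not Deci's actual algorithm):

```python
# Toy sketch of selective quantization: keep the layers whose quantization
# hurts accuracy the most in full precision, quantize the rest.
# All numbers below are invented for illustration.

layers = [
    # (name, latency gain in ms if quantized, accuracy drop in mAP if quantized)
    ("stem",   0.10, 0.02),
    ("stage2", 0.40, 0.05),
    ("stage3", 0.60, 0.30),   # sensitive layer: big accuracy hit
    ("stage4", 0.80, 0.10),
    ("head",   0.30, 0.45),   # sensitive layer: big accuracy hit
]

MAX_ACCURACY_DROP = 0.25      # total mAP loss we are willing to accept

# Greedy rule (an assumption, not the real algorithm): quantize layers in
# order of least accuracy lost per unit of latency gained until the budget
# is exhausted; everything else stays in full precision.
plan = {}
budget = MAX_ACCURACY_DROP
for name, gain, drop in sorted(layers, key=lambda l: l[2] / l[1]):
    if drop <= budget:
        plan[name] = "int8"
        budget -= drop
    else:
        plan[name] = "fp16/fp32 (quantization skipped)"

for name, mode in plan.items():
    print(f"{name:8s} -> {mode}")
```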
January 2023
YOLOv8
YOLOv8 is the latest iteration of the YOLO family of models from Ultralytics. YOLO stands for You Only Look Once, and the series is so named because these models predict every object present in an image in a single forward pass. The main distinction introduced by the YOLO models was the framing of the task: the authors of the original paper reframed object detection as a regression problem (predicting bounding box coordinates directly) instead of a classification problem. GitHub:- https://github.com/ultralytics/ultralytics
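Since the point of the one-pass formulation is that boxes come straight out of a single forward pass, here is a minimal usage sketch with the Ultralytics package (the weight file name and image path are placeholders; check the repository for the current API):

```python
# pip install ultralytics
from ultralytics import YOLO

# Load a pretrained YOLOv8 model (the "n" nano variant is just an example).
model = YOLO("yolov8n.pt")

# One forward pass returns boxes, confidences, and class ids for the image.
results = model("image.jpg")  # replace with your own image path

for r in results:
    for box in r.boxes:
        print(box.xyxy, box.conf, box.cls)  # coordinates, confidence, class id
```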
November 2022
DAMO-YOLO
DAMO-YOLO was published by the Alibaba Group. Drawing on current techniques, DAMO-YOLO incorporates neural architecture search (NAS), a large neck, a small head, AlignedOTA label assignment, and knowledge distillation. GitHub:- https://github.com/tinyvision/DAMO-YOLO
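Of these ingredients, knowledge distillation is the easiest to show in isolation: a large teacher's soft predictions supervise a smaller student. The sketch below is a generic temperature-scaled distillation loss in PyTorch, not DAMO-YOLO's actual implementation:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic KL-divergence distillation loss (not DAMO-YOLO's exact recipe)."""
    # Soften both distributions with a temperature, then match them.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Toy usage with random logits for 8 predictions over 80 COCO-like classes.
student = torch.randn(8, 80, requires_grad=True)
teacher = torch.randn(8, 80)
loss = distillation_loss(student, teacher)
loss.backward()
print(loss.item())
```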
July 2022
YOLOv7
At the time of its release, YOLOv7 was presented as the fastest and most accurate real-time object detection model for computer vision tasks. The official paper, “YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,” was released in July 2022 by Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao, and became immensely popular within days. The source code was released as open source under the GPL-3.0 license, a free copyleft license, and can be found in the official YOLOv7 GitHub repository, which gathered over 4.3k stars in its first month. The paper also comes with a complete appendix. GitHub:- https://github.com/WongKinYiu/yolov7
June 2022
YOLOv6
MT-YOLOv6 was inspired by the original one-stage YOLO architecture and was thus (bravely) named YOLOv6 by its authors. Though it delivers outstanding results, it is important to note that MT-YOLOv6 is not part of the official YOLO series. YOLOv6 is a single-stage object detection framework dedicated to industrial applications, with a hardware-friendly, efficient design and high performance. Its authors report that it outperforms YOLOv5 in detection accuracy and inference speed, making it one of the strongest open-source YOLO-style architectures for production applications. GitHub:- https://github.com/meituan/YOLOv6
March 2022
PP-YOLOE
PP-YOLOE added improvements upon PP-YOLOv2, achieving 51.4% AP at 78.1 FPS on an NVIDIA V100. The main changes of PP-YOLOE with respect to PP-YOLOv2 are an anchor-free design, a new backbone and neck, Task Alignment Learning (TAL), an Efficient Task-aligned Head (ET-head), Varifocal Loss (VFL), and Distribution Focal Loss (DFL). GitHub:- https://github.com/PaddlePaddle/PaddleDetection
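As a concrete example of one of these pieces, the sketch below implements the Varifocal Loss as described in the VarifocalNet paper: an IoU-aware classification loss that down-weights negatives focal-style while weighting positives by their target quality score. Treat it as an illustration rather than PP-YOLOE's exact code:

```python
import torch
import torch.nn.functional as F

def varifocal_loss(pred_logits, target_score, alpha=0.75, gamma=2.0):
    """Varifocal Loss sketch. target_score is the IoU-aware quality target:
    0 for negatives, IoU with the ground truth for positives."""
    pred = pred_logits.sigmoid()
    # Positives are weighted by their quality score; negatives get a focal-style
    # down-weighting alpha * p^gamma so easy background does not dominate.
    weight = torch.where(target_score > 0,
                         target_score,
                         alpha * pred.detach().pow(gamma))
    bce = F.binary_cross_entropy_with_logits(pred_logits, target_score,
                                             reduction="none")
    return (weight * bce).sum()

# Toy usage: 5 predictions for one class, two positives with IoU targets.
logits = torch.randn(5, requires_grad=True)
targets = torch.tensor([0.0, 0.9, 0.0, 0.6, 0.0])
print(varifocal_loss(logits, targets))
```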
July 2021
YOLOX
YOLOX was published on ArXiv in July 2021 by Megvii Technology. Developed in PyTorch and using Ultralytics' YOLOv3 as a starting point, it has five principal changes: an anchor-free architecture, multiple positives, a decoupled head, advanced label assignment, and strong augmentations. It achieved state-of-the-art results in 2021 with an optimal balance between speed and accuracy: 50.1% AP at 68.9 FPS on a Tesla V100. GitHub:- https://github.com/Megvii-BaseDetection/YOLOX
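The decoupled head is the easiest of these changes to show: classification and box regression get separate branches instead of sharing one convolution. Below is a simplified PyTorch sketch; the channel counts and layer depths are arbitrary, not YOLOX's exact configuration:

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Simplified decoupled detection head: separate cls and reg branches."""
    def __init__(self, in_channels=256, num_classes=80):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, in_channels, 1)
        # Classification branch
        self.cls_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_channels, num_classes, 1),
        )
        # Regression branch: 4 box offsets plus 1 objectness score
        self.reg_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.SiLU(),
        )
        self.reg_pred = nn.Conv2d(in_channels, 4, 1)
        self.obj_pred = nn.Conv2d(in_channels, 1, 1)

    def forward(self, feat):
        feat = self.stem(feat)
        cls_out = self.cls_branch(feat)            # (N, num_classes, H, W)
        reg_feat = self.reg_branch(feat)
        return cls_out, self.reg_pred(reg_feat), self.obj_pred(reg_feat)

# Toy usage on one feature-pyramid level.
head = DecoupledHead()
cls, box, obj = head(torch.randn(1, 256, 20, 20))
print(cls.shape, box.shape, obj.shape)
```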
May 2021
YOLOR
YOLOR stands for You Only Learn One Representation. In this paper, the authors took a different approach: they developed a multi-task learning method that aims to create a single model for various tasks (e.g., classification, detection, pose estimation) by learning a general representation and using sub-networks to create task-specific representations. With the insight that the traditional joint learning method often leads to suboptimal feature generation, YOLOR aims to overcome this by encoding the implicit knowledge of neural networks so it can be applied to multiple tasks, similar to how humans use past experience to approach new problems. GitHub:- https://github.com/WongKinYiu/yolor
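To make the "one general representation, many task-specific sub-networks" idea concrete, here is a generic multi-task sketch in PyTorch: a shared backbone feeds small heads for classification and box regression. It illustrates the pattern only, not YOLOR's implicit/explicit knowledge mechanism:

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Shared representation with task-specific heads (generic illustration)."""
    def __init__(self, num_classes=80):
        super().__init__()
        # Shared backbone producing a general feature representation.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Task-specific sub-networks reuse the same shared features.
        self.cls_head = nn.Linear(64, num_classes)   # classification task
        self.box_head = nn.Linear(64, 4)             # crude box-regression task

    def forward(self, x):
        shared = self.backbone(x)
        return self.cls_head(shared), self.box_head(shared)

model = MultiTaskNet()
logits, boxes = model(torch.randn(2, 3, 224, 224))
print(logits.shape, boxes.shape)   # torch.Size([2, 80]) torch.Size([2, 4])
```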
April 2021
PP-YOLOv2
PP-YOLOv2 added several refinements to PP-YOLO that increased performance from 45.9% AP to 49.5% AP at 69 FPS on an NVIDIA V100. The changes of PP-YOLOv2 with respect to PP-YOLO are: the backbone was changed from ResNet50 to ResNet101; a Path Aggregation Network (PAN) replaced the FPN, similar to YOLOv4; the Mish activation function was adopted, though unlike YOLOv4 and YOLOv5 it was applied only in the detection neck, keeping the ReLU-based backbone unchanged; larger input sizes were used to improve performance on small objects; and the IoU-aware branch was modified. GitHub:- https://github.com/PaddlePaddle/PaddleDetection
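Since the Mish activation comes up here (and in YOLOv4/YOLOv5), a minimal definition is shown below; PyTorch also ships a built-in nn.Mish, so this sketch is only for illustration:

```python
import torch
import torch.nn.functional as F

def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * torch.tanh(F.softplus(x))

x = torch.linspace(-3, 3, 7)
print(mish(x))                 # smooth, non-monotonic near zero
print(torch.nn.Mish()(x))      # built-in equivalent for comparison
```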
November 2020
Scaled-YOLOv4
Later in the same year as YOLOv4, the same authors presented Scaled-YOLOv4, which was subsequently published at CVPR 2021. Unlike YOLOv4, Scaled-YOLOv4 was developed in PyTorch instead of Darknet. The main novelty was the introduction of scaling-up and scaling-down techniques. Scaling up means producing a model that increases accuracy at the expense of speed; scaling down, on the other hand, means producing a model that increases speed at the expense of accuracy. In addition, scaled-down models need less computing power and can run on embedded systems. GitHub:- https://github.com/WongKinYiu/ScaledYOLOv4
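A rough way to picture model scaling is through width and depth multipliers: scaling up multiplies channel counts and block repeats, scaling down shrinks them. The sketch below uses invented base numbers and rounding rules purely for illustration; it is not Scaled-YOLOv4's actual compound-scaling formula:

```python
import math

# Invented base configuration: (output channels, repeated blocks) per stage.
BASE_STAGES = [(64, 1), (128, 3), (256, 9), (512, 9), (1024, 3)]

def scale_model(width_mult, depth_mult):
    """Scale channels by width_mult and block repeats by depth_mult (toy rule)."""
    stages = []
    for channels, repeats in BASE_STAGES:
        c = max(8, int(round(channels * width_mult / 8)) * 8)  # keep multiples of 8
        r = max(1, int(math.ceil(repeats * depth_mult)))
        stages.append((c, r))
    return stages

# Scaling down -> smaller, faster model; scaling up -> larger, more accurate one.
print("small:", scale_model(width_mult=0.50, depth_mult=0.33))
print("large:", scale_model(width_mult=1.25, depth_mult=1.33))
```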
July 2020
PP-YOLO
PP-YOLO models have been developed in parallel with the YOLO models we have described. However, we decided to group them in a single section because they began with YOLOv3 and each release gradually improved upon the previous PP-YOLO version. Nevertheless, these models have been influential in the evolution of YOLO. PP-YOLO, like YOLOv4 and YOLOv5, was based on YOLOv3. It was published on ArXiv by researchers from Baidu Inc., who used the PaddlePaddle deep learning platform, hence the PP prefix. Following the trend that started with YOLOv4, PP-YOLO added ten existing tricks to improve the detector's accuracy while keeping the speed unchanged. GitHub:- https://github.com/PaddlePaddle/PaddleDetection
June 2020
YOLOv5
YOLOv5 differs from all prior releases in that it is a PyTorch implementation rather than a fork of the original Darknet. Like YOLOv4, YOLOv5 uses a CSP backbone and a PA-NET neck. The major improvements include mosaic data augmentation and auto-learning bounding box anchors. GitHub:- https://github.com/ultralytics/yolov5
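Mosaic augmentation stitches four training images into one, exposing the model to objects at varied scales and contexts within a single sample. Below is a heavily simplified NumPy sketch (fixed mosaic center, no label handling), not YOLOv5's full implementation:

```python
import numpy as np

def simple_mosaic(images, out_size=640):
    """Paste four images into the quadrants of one canvas (simplified:
    the mosaic center is fixed at the middle and labels are ignored)."""
    assert len(images) == 4
    half = out_size // 2
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)  # gray padding
    corners = [(0, 0), (0, half), (half, 0), (half, half)]
    for img, (y0, x0) in zip(images, corners):
        h, w = img.shape[:2]
        h, w = min(h, half), min(w, half)
        canvas[y0:y0 + h, x0:x0 + w] = img[:h, :w]   # crop to fit the quadrant
    return canvas

# Toy usage with four random "images" of different sizes.
imgs = [np.random.randint(0, 255, (s, s, 3), dtype=np.uint8) for s in (320, 400, 280, 500)]
print(simple_mosaic(imgs).shape)   # (640, 640, 3)
```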
April 2020
YOLOv4
YOLOv4 takes influence from state-of-the-art "bag of freebies" (BoF) and several "bag of specials" (BoS) techniques. BoF methods improve the detector's accuracy without increasing inference time; they only increase the training cost. BoS methods, on the other hand, increase the inference cost by a small amount but significantly improve the accuracy of object detection. GitHub:- https://github.com/AlexeyAB/darknet
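A typical bag-of-freebies item is label smoothing: it changes only the training targets, so the inference cost is untouched. The sketch below shows the generic technique with an arbitrary smoothing factor, not YOLOv4's exact training recipe:

```python
import torch
import torch.nn.functional as F

def smoothed_targets(labels, num_classes, eps=0.1):
    """Replace one-hot targets with (1 - eps) on the true class and
    eps / num_classes spread over all classes -- a training-only change."""
    one_hot = F.one_hot(labels, num_classes).float()
    return one_hot * (1.0 - eps) + eps / num_classes

logits = torch.randn(4, 80, requires_grad=True)
labels = torch.tensor([3, 17, 42, 0])
targets = smoothed_targets(labels, num_classes=80)
loss = torch.sum(-targets * F.log_softmax(logits, dim=-1), dim=-1).mean()
loss.backward()
print(loss.item())
```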
March 2018
YOLOv3
YOLOv3 improved on the previous version with what its authors themselves called an incremental improvement. With many object detection algorithms available by then, the competition centered on how accurately and how quickly objects could be detected. YOLOv3 provides what is needed for real-time object detection, localizing and classifying objects accurately. GitHub:- https://github.com/pjreddie/darknet
2017
YOLOv2 (YOLO9000)
The second version of YOLO, named YOLO9000, was published by Joseph Redmon and Ali Farhadi at the end of 2016. This version was better, faster, and stronger, designed to compete with Faster R-CNN, an object detection algorithm that uses a Region Proposal Network to identify objects in the input image, and with SSD (Single Shot MultiBox Detector). GitHub:- https://github.com/pjreddie/darknet
2016
YOLOv1
YOLOv1 uses the Darknet framework, with a backbone trained on the ImageNet-1000 dataset. It works as described above, but it has several limitations that restrict its use. It struggles to find small objects that appear in clusters. The architecture also has difficulty generalizing to objects with aspect ratios or configurations different from those seen during training. Its major source of error is the localization of objects in the input image. GitHub:- https://github.com/pjreddie/darknet
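The cluster limitation follows directly from how the output tensor is organized. The sketch below lays out YOLOv1's 7×7 grid with 2 boxes per cell and 20 classes (the values from the original paper) using a dummy tensor; it only illustrates why objects whose centers fall in the same cell must compete for the same prediction slots:

```python
import numpy as np

S, B, C = 7, 2, 20                        # grid size, boxes per cell, classes (YOLOv1)
pred = np.random.rand(S, S, B * 5 + C)    # dummy network output: (7, 7, 30)

# Each cell holds B boxes (x, y, w, h, confidence) plus ONE class distribution.
cell = pred[3, 3]
boxes = cell[:B * 5].reshape(B, 5)        # 2 candidate boxes for this cell
class_probs = cell[B * 5:]                # shared by both boxes

# Because every object is assigned to the single cell containing its center,
# a cluster of small objects whose centers fall in the same cell must share
# these 2 box slots and 1 class vector -- the source of the limitation above.
print(boxes.shape, class_probs.shape)     # (2, 5) (20,)
```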
Conclusion
It’s worth noting that beyond new YOLO versions, the timeline may have continued with even more exciting developments in the YOLO series or related object detection models. The field of computer vision and deep learning is constantly evolving, and researchers and developers continuously strive to improve the performance, speed, and versatility of object detection algorithms.
In conclusion, YOLO's journey has exemplified the potential of deep learning in the realm of object detection, inspiring advancements that have impacted a wide range of applications, from autonomous vehicles to surveillance systems and beyond. As we move forward, we can expect further progress and innovation, ultimately driving us closer to more comprehensive and efficient solutions for real-world challenges.