GitHub Repository: Tetragramm/opencv
Path: blob/master/samples/dnn/README.md

OpenCV deep learning module samples

Model Zoo

Object detection

| Model | Scale | Size WxH | Mean subtraction | Channels order |
|---|---|---|---|---|
| MobileNet-SSD, Caffe | 0.00784 (2/255) | 300x300 | 127.5 127.5 127.5 | BGR |
| OpenCV face detector | 1.0 | 300x300 | 104 177 123 | BGR |
| SSDs from TensorFlow | 0.00784 (2/255) | 300x300 | 127.5 127.5 127.5 | RGB |
| YOLO | 0.00392 (1/255) | 416x416 | 0 0 0 | RGB |
| VGG16-SSD | 1.0 | 300x300 | 104 117 123 | BGR |
| Faster-RCNN | 1.0 | 800x600 | 102.9801 115.9465 122.7717 | BGR |
| R-FCN | 1.0 | 800x600 | 102.9801 115.9465 122.7717 | BGR |
| Faster-RCNN, ResNet backbone | 1.0 | 300x300 | 103.939 116.779 123.68 | RGB |
| Faster-RCNN, InceptionV2 backbone | 0.00784 (2/255) | 300x300 | 127.5 127.5 127.5 | RGB |
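
As a rough illustration of how the columns above combine, the sketch below applies a row's parameters to a single pixel in plain Python. It assumes the convention used by OpenCV's `cv2.dnn.blobFromImage`, where the mean is subtracted first and the result is then multiplied by the scale. It also assumes the mean values are listed in the model's channel order, that is, after any red/blue swap. The helper name `preprocess_pixel` is hypothetical and not part of OpenCV.

```python
def preprocess_pixel(bgr_pixel, scale, mean, swap_rb):
    """Swap channels if requested, subtract the mean, then scale (one pixel).

    Hypothetical helper for illustration only; the order of operations is an
    assumption modelled on cv2.dnn.blobFromImage.
    """
    # OpenCV loads images as BGR, so RGB models need the channels reversed.
    channels = tuple(reversed(bgr_pixel)) if swap_rb else tuple(bgr_pixel)
    return tuple(scale * (c - m) for c, m in zip(channels, mean))


# MobileNet-SSD (Caffe) row: scale 2/255, mean 127.5 per channel, BGR order.
ssd_mean = (127.5, 127.5, 127.5)
white = preprocess_pixel((255, 255, 255), 2 / 255, ssd_mean, swap_rb=False)
black = preprocess_pixel((0, 0, 0), 2 / 255, ssd_mean, swap_rb=False)

# YOLO row: scale 1/255, zero mean, RGB order, so a pure-blue BGR pixel
# ends up with its blue value in the last position.
blue = preprocess_pixel((255, 0, 0), 1 / 255, (0, 0, 0), swap_rb=True)

print(white, black, blue)
```

With the MobileNet-SSD parameters the 0..255 input range lands on roughly -1..1, while the YOLO parameters map it to roughly 0..1.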

Face detection

The original model, which has single-precision floating-point weights, has been quantized using the TensorFlow framework. For the best accuracy, run the model on BGR images resized to 300x300, applying mean subtraction of (104, 177, 123) to the blue, green, and red channels respectively.
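
The preprocessing described above might be expressed as arguments to `cv2.dnn.blobFromImage`, as in the sketch below. The constant `FACE_DETECTOR_PREPROCESSING` and the helper `make_face_blob` are hypothetical names introduced here. The scale of 1.0 is taken from the face detector row of the object detection table, and `swapRB=False` assumes the input image is already in OpenCV's default BGR order.

```python
# Keyword arguments for cv2.dnn.blobFromImage, collected in one place.
# The values come from this README; the dictionary itself is illustrative.
FACE_DETECTOR_PREPROCESSING = {
    "scalefactor": 1.0,             # no rescaling of pixel values
    "size": (300, 300),             # network input size, width x height
    "mean": (104.0, 177.0, 123.0),  # subtracted from B, G, R respectively
    "swapRB": False,                # keep BGR order (assumes a BGR input)
    "crop": False,                  # plain resize, no center crop
}


def make_face_blob(image_bgr):
    """Return a 4-D input blob for the face detector (hypothetical helper)."""
    import cv2  # imported lazily so this module loads without OpenCV installed

    return cv2.dnn.blobFromImage(image_bgr, **FACE_DETECTOR_PREPROCESSING)
```

An image loaded with `cv2.imread` is already BGR, so it could be passed to `make_face_blob` directly and the result handed to the network's `setInput` method.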

The following accuracy metrics were obtained with the COCO object detection evaluation tool on the FDDB dataset (see script), both with images resized to 300x300 and with images kept at their original sizes.

AP: Average Precision. AR: Average Recall.

| Metric | IoU | Area | maxDets | FP32/FP16, 300x300 | UINT8, 300x300 | FP32/FP16, any size | UINT8, any size |
|---|---|---|---|---|---|---|---|
| AP | 0.50:0.95 | all | 100 | 0.408 | 0.408 | 0.378 | 0.328 (-0.050) |
| AP | 0.50 | all | 100 | 0.849 | 0.849 | 0.797 | 0.790 (-0.007) |
| AP | 0.75 | all | 100 | 0.251 | 0.251 | 0.208 | 0.140 (-0.068) |
| AP | 0.50:0.95 | small | 100 | 0.050 | 0.051 (+0.001) | 0.107 | 0.070 (-0.037) |
| AP | 0.50:0.95 | medium | 100 | 0.381 | 0.379 (-0.002) | 0.380 | 0.368 (-0.012) |
| AP | 0.50:0.95 | large | 100 | 0.455 | 0.455 | 0.412 | 0.337 (-0.075) |
| AR | 0.50:0.95 | all | 1 | 0.299 | 0.299 | 0.279 | 0.246 (-0.033) |
| AR | 0.50:0.95 | all | 10 | 0.482 | 0.482 | 0.476 | 0.436 (-0.040) |
| AR | 0.50:0.95 | all | 100 | 0.496 | 0.496 | 0.491 | 0.451 (-0.040) |
| AR | 0.50:0.95 | small | 100 | 0.189 | 0.193 (+0.004) | 0.284 | 0.232 (-0.052) |
| AR | 0.50:0.95 | medium | 100 | 0.481 | 0.480 (-0.001) | 0.470 | 0.458 (-0.012) |
| AR | 0.50:0.95 | large | 100 | 0.528 | 0.528 | 0.520 | 0.462 (-0.058) |

Classification

| Model | Scale | Size WxH | Mean subtraction | Channels order |
|---|---|---|---|---|
| GoogLeNet | 1.0 | 224x224 | 104 117 123 | BGR |
| SqueezeNet | 1.0 | 227x227 | 0 0 0 | BGR |

Semantic segmentation

| Model | Scale | Size WxH | Mean subtraction | Channels order |
|---|---|---|---|---|
| ENet | 0.00392 (1/255) | 1024x512 | 0 0 0 | RGB |
| FCN8s | 1.0 | 500x500 | 0 0 0 | BGR |

References