YOLOv5 Adaptive Image Scaling

Adaptive image scaling

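Conventionally, object detection algorithms resize every image to one fixed size in both training and inference. YOLOv5 instead uses an adaptive image scaling trick at inference time.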

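The official YOLOv5 GitHub repository explains that inference on a rectangular input whose sides are multiples of 32 takes much less time than inference on an image resized to a square with equal sides. In its example the input shape shrinks from (416, 416) to (256, 416).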
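
As a rough illustration of where the saving comes from, the sketch below compares the pixel counts of the two input shapes. It assumes that compute cost scales roughly linearly with the number of input pixels, which is only an approximation for a convolutional network and not a measured benchmark.

```python
# Rough estimate only: assumes compute scales linearly with input pixel count.
square = (416, 416)  # (height, width) after a square resize
rect = (256, 416)    # (height, width) after rectangular scaling

square_pixels = square[0] * square[1]
rect_pixels = rect[0] * rect[1]
ratio = rect_pixels / square_pixels

print(f"square: {square_pixels} px, rect: {rect_pixels} px, ratio: {ratio:.3f}")
```

Under that assumption the rectangular input needs a bit over 60% of the work of the square one.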

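Training stage

Assume the original image size is (523, 699), given as (height, width).

(1) Compute the scale ratio from the long side: r = 416 / 699 = 0.5951

(2) Scale the original image, keeping the aspect ratio: (523, 699) -> (311, 416)

(3) Pad to (416, 416). The total padding along H is 416 - 311 = 105, so the top and bottom each get pad = 105 / 2 = 52.5 (top 52, bottom 53)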
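
The three training-stage steps can be sketched as a small function. The helper name `square_letterbox_params` is hypothetical and used only for illustration. It computes the numbers without touching any image, and it borrows the -0.1 / +0.1 rounding from the code at the end of this post to split an odd total padding.

```python
def square_letterbox_params(shape, target=416):
    """Return (r, (new_h, new_w), (top, bottom, left, right)).

    shape is (height, width). Hypothetical helper, for illustration only.
    """
    h, w = shape
    r = min(target / h, target / w)            # the long side decides the ratio
    new_h, new_w = int(round(h * r)), int(round(w * r))
    dh = (target - new_h) / 2                  # padding per side along H
    dw = (target - new_w) / 2                  # padding per side along W
    # An odd total puts the extra pixel on the bottom/right side.
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    return r, (new_h, new_w), (top, bottom, left, right)


r, new_size, pads = square_letterbox_params((523, 699))
print(round(r, 4), new_size, pads)
```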

Inference stage

(1) Compute the scale ratio from the long side: r = 416 / 699 = 0.5951

(2) Scale the original image, keeping the aspect ratio: (523, 699) -> (311, 416)

(3) Round the scaled resolution up to multiples of 32: np.ceil(0.5951 x 523 / 32) x 32, np.ceil(0.5951 x 699 / 32) x 32 = (320, 416)

(4) Compute the padding needed. Width padding = (416 - 416) / 2 = 0, height padding = (320 - 311) / 2 = 4.5 (top 4, bottom 5)

(5) Fill the padding with gray pixels, value (114, 114, 114)

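The inference resolution is therefore (320, 416) rather than (416, 416). The image is not distorted, while the amount of computation drops noticeably and inference gets faster.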
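
The inference-stage steps can be sketched the same way. The helper name `min_rect_params` is hypothetical, and `math.ceil` stands in for the `np.ceil` used in step (3) so that the snippet needs only the standard library. When the target size is itself a multiple of the stride, rounding the scaled size up gives the same padding as taking the square padding modulo the stride.

```python
import math


def min_rect_params(shape, target=416, stride=32):
    """Return ((out_h, out_w), (top, bottom, left, right)).

    shape is (height, width). Hypothetical helper, for illustration only.
    """
    h, w = shape
    r = min(target / h, target / w)
    new_h, new_w = int(round(h * r)), int(round(w * r))
    # Round each scaled side up to the next multiple of stride.
    out_h = math.ceil(new_h / stride) * stride
    out_w = math.ceil(new_w / stride) * stride
    dh, dw = (out_h - new_h) / 2, (out_w - new_w) / 2
    # An odd total puts the extra pixel on the bottom/right side.
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    return (out_h, out_w), (top, bottom, left, right)


out_size, pads = min_rect_params((523, 699))
print(out_size, pads)
```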

Code

import numpy as np
import cv2 as cv

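def letterbox(im, new_shape, color=(114, 114, 114), stride=32, auto=True):
    # im: BGR image as read by cv.imread; new_shape: target (height, width)
    # auto=True pads only up to the next multiple of stride (minimum rectangle);
    # auto=False pads all the way to new_shape
    shape = im.shape[:2]  # current shape [height, width]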

    # calculate scale ratio r
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])

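    # size after scaling, before padding: (width, height), the order cv.resize expects
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    print('new_unpad:', new_unpad)
    # total padding needed to reach new_shape along width and height
    dw = new_shape[1] - new_unpad[0]
    dh = new_shape[0] - new_unpad[1]

    # minimum rectangle: keep only the padding needed to reach the next multiple of stride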
    if auto:
        dw = np.mod(dw, stride)
        dh = np.mod(dh, stride)
    
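    # split the padding evenly between the two sides
    dw /= 2
    dh /= 2

    # resize only when the size actually changes
    if shape[::-1] != new_unpad:
        im = cv.resize(im, new_unpad, interpolation=cv.INTER_LINEAR)
    # when the total padding is odd, the -0.1 / +0.1 offsets put the extra pixel on the bottom/right
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    print('top, bottom:', top, bottom)
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))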

    im = cv.copyMakeBorder(im, top,bottom,left,right, cv.BORDER_CONSTANT, value=color)
    
    print(im.shape)
    
    return im, r, (dw, dh)


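if __name__ == '__main__':
    img_path = "D:\\person\\py_code\\list\\R-C.png"
    img = cv.imread(img_path)
    # cv.imread returns None instead of raising when the file cannot be read
    if img is None:
        raise FileNotFoundError(img_path)

    im, r, _ = letterbox(img, (416, 416))

    print(r)
    cv.imwrite('D:\\person\\py_code\\list\\new.jpg', im)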
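
One common use of the returned r and (dw, dh) is mapping a box predicted on the letterboxed image back to original-image coordinates. The sketch below assumes boxes in (x1, y1, x2, y2) pixel format. The helper name `unletterbox_box` and the sample box are hypothetical and not part of the code above.

```python
def unletterbox_box(box, r, pad):
    """Map an (x1, y1, x2, y2) box from the letterboxed image to the original.

    r and pad = (dw, dh) are assumed to be the values returned by letterbox().
    Hypothetical helper, for illustration only.
    """
    dw, dh = pad
    x1, y1, x2, y2 = box
    # Remove the padding offset first, then undo the scaling.
    return ((x1 - dw) / r, (y1 - dh) / r, (x2 - dw) / r, (y2 - dh) / r)


# Values from the (523, 699) inference example: r = 416 / 699, dw = 0, dh = 4.5
r = 416 / 699
pad = (0.0, 4.5)
box = (100.0, 104.5, 200.0, 204.5)  # made-up box on the (320, 416) image
orig_box = unletterbox_box(box, r, pad)
print(orig_box)
```

Since (dw, dh) are per-side floats such as 4.5 while the actual top border is an integer number of pixels, the mapping can be off by up to half a pixel before the division by r.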