Adaptive Image Scaling
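Object detection algorithms have conventionally resized every image to one fixed size in both training and inference. YOLOv5 instead uses an adaptive image scaling trick at inference time.
An explanation in the official YOLOv5 GitHub repository notes that inference on a rectangular input whose sides are multiples of 32 takes much less time than inference on an image resized to a square of equal height and width, for example (416, 416) -> (256, 416).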
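As a rough illustration of where the saving comes from, the sketch below compares pixel counts for the two input shapes quoted above. It assumes that the cost of a convolutional network scales roughly with the number of input pixels, which is a simplification; `pixel_ratio` is a hypothetical helper, not part of YOLOv5.

```python
def pixel_ratio(rect_shape, square_shape):
    """Return the pixel count of rect_shape as a fraction of square_shape.

    Both shapes are (height, width) tuples.
    """
    rect_h, rect_w = rect_shape
    sq_h, sq_w = square_shape
    return (rect_h * rect_w) / (sq_h * sq_w)


# Shapes taken from the example above: (416, 416) -> (256, 416)
ratio = pixel_ratio((256, 416), (416, 416))
print(f"{ratio:.3f}")  # 0.615
```

Under that assumption, the rectangular input carries about 62% of the pixels of the square one.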
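Training stage
Assume the original image size is (523, 699), given as (height, width).
(1) Compute the scaling ratio from the longer side: r = 416 / 699 = 0.5951
(2) Scale the original image proportionally: (523, 699) -> (311, 416)
(3) Pad to (416, 416). The padding needed on each of the top and bottom (the H dimension) is pad = (416 - 311) / 2 = 52.5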
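The three training-stage steps can be sketched in plain Python as follows. This is a minimal illustration of the arithmetic only, with no image involved; the helper name `square_letterbox_params` and its return layout are assumptions made for this example.

```python
def square_letterbox_params(shape, target=416):
    """Compute ratio, scaled size, and per-side padding for a square target.

    shape is (height, width). Returns (r, (new_h, new_w), (pad_h, pad_w)),
    where pad_h and pad_w are the padding on EACH side of that dimension.
    """
    h, w = shape
    r = target / max(h, w)                      # step 1: ratio from the longer side
    new_h, new_w = round(h * r), round(w * r)   # step 2: proportional scaling
    pad_h = (target - new_h) / 2                # step 3: padding, split evenly
    pad_w = (target - new_w) / 2
    return r, (new_h, new_w), (pad_h, pad_w)


r, new_size, pad = square_letterbox_params((523, 699))
print(round(r, 4), new_size, pad)  # 0.5951 (311, 416) (52.5, 0.0)
```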
Inference stage
(1) Compute the scaling ratio from the longer side: r = 416 / 699 = 0.5951
(2) Scale the original image proportionally: (523, 699) -> (311, 416)
(3) Round the scaled resolution up to the nearest multiple of 32: (np.ceil(0.5951 x 523 / 32) x 32, np.ceil(0.5951 x 699 / 32) x 32) = (320, 416)
(4) Compute the padding needed: width padding = (416 - 416) / 2 = 0, height padding = (320 - 311) / 2 = 4.5 (top 4, bottom 5)
(5) Fill the padding with gray pixels. YOLOv5's default fill value is (114, 114, 114).
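The inference resolution is therefore (320, 416). Because the aspect ratio is preserved, the image is not distorted, while the smaller input significantly reduces computation and speeds up inference.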
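The inference-stage arithmetic can be sketched the same way. The helper below is hypothetical and only reproduces the numbers above: it rounds each scaled side up to a multiple of the stride and splits any odd padding so that the extra pixel goes to the bottom or right.

```python
import math


def rect_letterbox_params(shape, target=416, stride=32):
    """Compute the stride-aligned output shape and the padding per side.

    shape is (height, width). Returns ((out_h, out_w), (top, bottom, left, right)).
    """
    h, w = shape
    r = min(target / h, target / w)             # ratio from the longer side
    new_h, new_w = round(h * r), round(w * r)   # proportional scaling

    # Round each side up to the next multiple of the stride
    out_h = math.ceil(new_h / stride) * stride
    out_w = math.ceil(new_w / stride) * stride

    # Split the padding; an odd remainder puts the extra pixel on bottom/right
    pad_h, pad_w = out_h - new_h, out_w - new_w
    top, left = pad_h // 2, pad_w // 2
    bottom, right = pad_h - top, pad_w - left
    return (out_h, out_w), (top, bottom, left, right)


out_shape, pads = rect_letterbox_params((523, 699))
print(out_shape, pads)  # (320, 416) (4, 5, 0, 0)
```

When the target size is itself a multiple of the stride, rounding up like this gives the same padding as taking the square padding modulo the stride, which is the approach used by the `letterbox` function in the code section.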
Code
```python
import cv2 as cv
import numpy as np


def letterbox(im, new_shape, color=(114, 114, 114), stride=32, auto=True):
    """Resize im to fit new_shape (height, width) without distortion, then pad."""
    shape = im.shape[:2]  # current shape as (height, width)

    # Scale ratio (new / old), limited by the tighter side
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])

    # Unpadded size after scaling, as (width, height) for cv.resize
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    print('new_unpad::', new_unpad)
    dw = new_shape[1] - new_unpad[0]  # total width padding
    dh = new_shape[0] - new_unpad[1]  # total height padding

    if auto:
        # Minimum rectangle: pad only up to the next multiple of stride
        dw = np.mod(dw, stride)
        dh = np.mod(dh, stride)

    # Split the padding between the two sides
    dw /= 2
    dh /= 2

    if shape[::-1] != new_unpad:  # resize only when the size actually changes
        im = cv.resize(im, new_unpad, interpolation=cv.INTER_LINEAR)

    # The -0.1 / +0.1 sends the extra pixel of an odd padding to the bottom/right
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    print('top', top, bottom)
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))

    im = cv.copyMakeBorder(im, top, bottom, left, right, cv.BORDER_CONSTANT, value=color)
    print(im.shape)
    return im, r, (dw, dh)


if __name__ == '__main__':
    img_path = "D:\\person\\py_code\\list\\R-C.png"
    img = cv.imread(img_path)

    im, r, _ = letterbox(img, (416, 416))
    print(r)
    cv.imwrite('D:\\person\\py_code\\list\\new.jpg', im)
```
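`letterbox` returns the ratio `r` and the per-side padding `(dw, dh)` alongside the image. One plausible use for them, sketched here as an assumption rather than something stated above, is mapping a point from the padded image back to the original image. The helper name `to_original_coords` is hypothetical, and the result is approximate because the resized width and height are each rounded to whole pixels.

```python
def to_original_coords(x, y, r, pad):
    """Map a point (x, y) in the letterboxed image back to the original image.

    r and pad = (dw, dh) are the values returned by letterbox().
    """
    dw, dh = pad
    # Same rounding as letterbox() uses for the left and top borders
    left = int(round(dw - 0.1))
    top = int(round(dh - 0.1))
    return (x - left) / r, (y - top) / r


# Values from the (523, 699) example: r = 416 / 699, dw = 0, dh = 4.5
r = 416 / 699
origin = to_original_coords(0, 4, r, (0.0, 4.5))
print(origin)  # (0.0, 0.0): the first non-padded pixel maps to the original's corner
```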