
YOLOv5 Adaptive Image Scaling

Adaptive image scaling

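As a rule, object detection algorithms resize every image to one fixed size during both training and inference. YOLOv5 instead uses an adaptive image scaling trick at inference time.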


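The official YOLOv5 GitHub repository explains that running inference on a rectangular input whose sides are multiples of 32 takes much less time than resizing to a square of equal height and width, for example (416, 416) -> (256, 416).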
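To get a rough feel for the saving, the pixel counts of the two input shapes can be compared. This is only a sketch: it assumes that the cost of a convolutional network grows roughly in proportion to the input area, and the real speedup depends on the model and hardware. `pixel_count` is a helper name made up for this illustration.

```python
def pixel_count(height, width):
    """Number of pixels the network has to process for one input."""
    return height * width


square = pixel_count(416, 416)  # square input
rect = pixel_count(256, 416)    # rectangular input, both sides multiples of 32

# Fraction of pixels that no longer have to be processed
saving = 1 - rect / square
print(f"square: {square}, rectangular: {rect}, saved: {saving:.1%}")
```

For the (416, 416) -> (256, 416) example, the rectangular input has a bit more than a third fewer pixels.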

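Training stage

Assume the original image size is (523, 699), given as (height, width).

(1) Compute the scale ratio for the longer side: r = 416 / 699 = 0.5951

(2) Scale the original image, preserving the aspect ratio: (523, 699) -> (311, 416)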


(3) Pad to (416, 416). The top and bottom sides along the height each need pad = (416 - 311) / 2 = 52.5
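The three training-stage steps can be checked with a few lines of plain Python. This is a minimal sketch of the arithmetic only (no image is touched); the function name `square_letterbox_geometry` is an assumption made for this example.

```python
def square_letterbox_geometry(shape, new_shape=(416, 416)):
    """Return the scale ratio, resized (h, w) and per-side padding (h, w)."""
    h, w = shape
    # (1) one ratio for both sides; the longer side decides it
    r = min(new_shape[0] / h, new_shape[1] / w)
    # (2) resized size, aspect ratio preserved
    new_h, new_w = int(round(h * r)), int(round(w * r))
    # (3) padding needed on each side to reach the square target
    pad_h = (new_shape[0] - new_h) / 2
    pad_w = (new_shape[1] - new_w) / 2
    return r, (new_h, new_w), (pad_h, pad_w)


r, resized, pad = square_letterbox_geometry((523, 699))
print(round(r, 4), resized, pad)
```

For the (523, 699) image this reproduces the ratio 0.5951, the resized shape (311, 416) and the 52.5-pixel padding on the top and bottom.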


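Inference stage

(1) Compute the scale ratio for the longer side: r = 416 / 699 = 0.5951

(2) Scale the original image, preserving the aspect ratio: (523, 699) -> (311, 416)

(3) Round the scaled resolution up to a multiple of 32: (np.ceil(0.5951 x 523 / 32) x 32, np.ceil(0.5951 x 699 / 32) x 32) = (320, 416)

(4) Compute the padding needed: width padding = (416 - 416) / 2 = 0, height padding = (320 - 311) / 2 = 4.5 (top 4, bottom 5)

(5) Fill the padding with gray pixels, value (114, 114, 114)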

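So the inference resolution is (320, 416). Because the aspect ratio is preserved, the image is not distorted, while the smaller input significantly reduces computation and speeds up inference.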
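The inference-stage steps above can be sketched the same way. The example rounds the resized shape up to the next multiple of the stride and then splits the leftover padding between the two sides; the name `rect_letterbox_geometry` is an assumption for this illustration. When the target size is itself a multiple of the stride (as 416 is for stride 32), rounding up like this gives the same padding as taking the square padding modulo the stride.

```python
import math


def rect_letterbox_geometry(shape, new_shape=(416, 416), stride=32):
    """Return the padded output (h, w) and the borders (top, bottom, left, right)."""
    h, w = shape
    r = min(new_shape[0] / h, new_shape[1] / w)
    new_h, new_w = int(round(h * r)), int(round(w * r))
    # Round each side up to the nearest multiple of the stride
    out_h = math.ceil(new_h / stride) * stride
    out_w = math.ceil(new_w / stride) * stride
    # Padding per side; it may end in .5 when the total is odd
    pad_h = (out_h - new_h) / 2
    pad_w = (out_w - new_w) / 2
    # The +/- 0.1 pushes the odd pixel to the bottom/right side
    top, bottom = int(round(pad_h - 0.1)), int(round(pad_h + 0.1))
    left, right = int(round(pad_w - 0.1)), int(round(pad_w + 0.1))
    return (out_h, out_w), (top, bottom, left, right)


out_shape, borders = rect_letterbox_geometry((523, 699))
print(out_shape, borders)
```

For the (523, 699) image the output shape is (320, 416), with 4 pixels added on top, 5 on the bottom, and none on the left or right.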


Code

import numpy as np
import cv2 as cv


def letterbox(im, new_shape, color=(114, 114, 114), stride=32, auto=True):
    """Resize im to fit inside new_shape (h, w) without distortion, then pad."""
    shape = im.shape[:2]  # current shape [height, width]

    # scale ratio r (new / old), decided by the longer side
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])

    # resized size before padding, new_unpad: (w, h)
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    print('new_unpad:', new_unpad)
    dw = new_shape[1] - new_unpad[0]  # total width padding
    dh = new_shape[0] - new_unpad[1]  # total height padding

    # minimum rectangle: keep only the padding needed to reach a multiple of stride
    if auto:
        dw = np.mod(dw, stride)
        dh = np.mod(dh, stride)

    # split the padding between the two sides
    dw /= 2
    dh /= 2

    # resize only if the size actually changes
    if shape[::-1] != new_unpad:
        im = cv.resize(im, new_unpad, interpolation=cv.INTER_LINEAR)

    # the +/- 0.1 sends the odd pixel of an x.5 padding to the bottom/right side
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    print('top, bottom:', top, bottom)
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))

    # pad with a constant gray border
    im = cv.copyMakeBorder(im, top, bottom, left, right, cv.BORDER_CONSTANT, value=color)

    print(im.shape)

    return im, r, (dw, dh)


if __name__ == '__main__':
    img_path = "D:\\person\\py_code\\list\\R-C.png"
    img = cv.imread(img_path)

    im, r, _ = letterbox(img, (416, 416))

    print(r)
    cv.imwrite('D:\\person\\py_code\\list\\new.jpg', im)
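Besides the image, letterbox returns the ratio r and the per-side padding (dw, dh). One typical use of these values is mapping a coordinate found on the padded image back to the original image. The following is a minimal sketch, assuming the same rounding of the left and top borders as in the function above; `to_original_coords` is a hypothetical helper written for this example, not a YOLOv5 function.

```python
def to_original_coords(x, y, r, pad):
    """Map a point (x, y) on the letterboxed image back to the original image."""
    dw, dh = pad
    # Same rounding that letterbox uses for the left and top borders
    left = int(round(dw - 0.1))
    top = int(round(dh - 0.1))
    # Remove the border offset, then undo the scaling
    return (x - left) / r, (y - top) / r


# Values from the (523, 699) example: r = 416 / 699, pad = (dw, dh) = (0.0, 4.5)
r = 416 / 699
x0, y0 = to_original_coords(208, 164, r, (0.0, 4.5))
print(x0, y0)
```

Here the point (208, 164) lies in the middle of the (320, 416) padded image, and it maps back to approximately the middle of the original (523, 699) image.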