Where the problem came from

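Sigh. The most painful part of last semester was the Software Cup, which just wrapped up: thanks to a pile of mistakes we only took a national third prize. While working on it I ran into a problem. We originally used the CTPN from YCG09/chinese_ocr, then switched to the CTPN from eragonruan/text-detection-ctpn, and the switch caused the following issues: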

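The two CTPNs preprocess images differently, so the image fed to the Densenet+CTC recognition model differs in grayscale handling and in how it is resized, and this has a large effect on recognition results. The fix is to take the detection results from eragonruan/text-detection-ctpn and use them to deskew and crop the image as preprocessed by chinese_ocr.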

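# From YCG09/chinese_ocr
import cv2

def resize_im(im, scale, max_scale=None):
    # Scale so the shorter side equals `scale`, keeping the aspect ratio
    f = float(scale) / min(im.shape[0], im.shape[1])
    # If the longer side would then exceed `max_scale`, scale to that limit instead
    if max_scale is not None and f * max(im.shape[0], im.shape[1]) > max_scale:
        f = float(max_scale) / max(im.shape[0], im.shape[1])
    return cv2.resize(im, None, None, fx=f, fy=f, interpolation=cv2.INTER_LINEAR), f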

# From eragonruan/text-detection-ctpn
import numpy as np

def resize_image(img):
    img = cv2.resize(img,(816,608))
    img_size = img.shape
    im_size_min = np.min(img_size[0:2])
    im_size_max = np.max(img_size[0:2])

    im_scale = float(600) / float(im_size_min)
    if np.round(im_scale * im_size_max) > 1200:
        im_scale = float(1200) / float(im_size_max)
    new_h = int(img_size[0] * im_scale)
    new_w = int(img_size[1] * im_scale)

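    # Round each side up to a multiple of 16. Note: `// 16 == 0` only holds for sizes
    # below 16, so `% 16 == 0` was probably intended; kept here as quoted.
    new_h = new_h if new_h // 16 == 0 else (new_h // 16 + 1) * 16
    new_w = new_w if new_w // 16 == 0 else (new_w // 16 + 1) * 16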

    re_im = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    return re_im, (new_h / img_size[0], new_w / img_size[1])
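To make the difference concrete, here is a minimal sketch that reproduces only the size arithmetic of the two functions above, without touching any pixels. The helper names, the 1000 x 1637 input photo, and the values `scale=900` and `max_scale=1500` are assumptions for illustration (they were picked so that the output matches the (1473, 900) size mentioned later in this post); they are not taken from either repository's configuration.

```python
def chinese_ocr_size(h, w, scale, max_scale=None):
    """Output (height, width) that resize_im would produce for an h x w input."""
    f = float(scale) / min(h, w)
    if max_scale is not None and f * max(h, w) > max_scale:
        f = float(max_scale) / max(h, w)
    # With dsize=None, cv2.resize rounds fx*width and fy*height to integers
    return round(h * f), round(w * f)


def text_detection_ctpn_size():
    """Output (height, width) of resize_image. The input size is irrelevant
    because the image is first forced to 816 x 608 (width x height)."""
    h, w = 608, 816
    im_scale = 600.0 / min(h, w)
    if round(im_scale * max(h, w)) > 1200:
        im_scale = 1200.0 / max(h, w)
    new_h, new_w = int(h * im_scale), int(w * im_scale)
    # Same rounding-up-to-16 logic as quoted above
    new_h = new_h if new_h // 16 == 0 else (new_h // 16 + 1) * 16
    new_w = new_w if new_w // 16 == 0 else (new_w // 16 + 1) * 16
    return new_h, new_w


# Hypothetical 1000 x 1637 photo; scale and max_scale are assumed values
print(chinese_ocr_size(1000, 1637, scale=900, max_scale=1500))  # (900, 1473)
print(text_detection_ctpn_size())                               # (608, 816)
```

So chinese_ocr keeps the aspect ratio while this version of text-detection-ctpn always ends up at 608 x 816, which means the horizontal and vertical scale factors between the two coordinate systems are generally different.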

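Both CTPNs output bounding_boxes (the boxes variable) whose first 8 values are the xy coordinates of the four corners, but the order of the corners within those values differs. From testing:
- chinese_ocr, boxes index to xy coordinate mapping:
- text-detection-ctpn, boxes index to xy coordinate mapping: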

In short, what we need are the box coordinates as they would come out after cv2.resize(…, interpolation=cv2.INTER_LINEAR).
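As an arithmetic cross-check on that idea, a point can also be mapped through a resize directly. The sketch below assumes the pixel-center convention that, as far as I understand, cv2.resize uses for INTER_LINEAR (destination = (source + 0.5) * scale - 0.5); the function name `map_point_through_resize` and the sample corner are hypothetical, and the sizes are the 816 x 608 and 1473 x 900 that appear in this post.

```python
def map_point_through_resize(x, y, src_size, dst_size):
    """Map a pixel coordinate from an image of src_size to the same image
    resized to dst_size. Both sizes are (width, height) tuples."""
    sx = dst_size[0] / src_size[0]  # horizontal scale factor
    sy = dst_size[1] / src_size[1]  # vertical scale factor
    # Pixel centers sit at +0.5, so shift, scale, then shift back
    return (x + 0.5) * sx - 0.5, (y + 0.5) * sy - 0.5


# Hypothetical corner at (100, 50) in the 816 x 608 text-detection-ctpn image,
# mapped into a 1473 x 900 chinese_ocr image
new_x, new_y = map_point_through_resize(100, 50, (816, 608), (1473, 900))
print(round(new_x), round(new_y))
```

The two scale factors are computed separately because the two images do not share an aspect ratio.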

Approach

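Feed the detection results of eragonruan/text-detection-ctpn into the recognition part of YCG09/chinese_ocr:
Create an all-zero mask the same size as the text-detection-ctpn output, and set the 4 corner points of each box to 255 on it
Resize the mask so that the marked points land where chinese_ocr's own boxes would be
Keep the original chinese_ocr preprocessed image in a separate variable and run the later steps on it using the transformed boxes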

Implementation

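# Build the mask: 608 rows x 816 columns, the size of the text-detection-ctpn image
zeros = np.zeros((608,816))
# box holds (x, y) pairs, so index the mask as [y, x]
zeros[box[1],box[0]] = 255
zeros[box[3],box[2]] = 255
zeros[box[5],min(box[4], zeros.shape[1]-1)] = 255  # clamp x to the last column
zeros[box[7],box[6]+2] = 255

# Resize to the size of chinese_ocr's preprocessed image (cv2.resize defaults to INTER_LINEAR)
zeros = cv2.resize(zeros, (image.shape[1], image.shape[0]))  # (1473, 900)
np.where(zeros>0)  # inspect the nonzero pixels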

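# Extract the four transformed points. INTER_LINEAR (bilinear) interpolation smears each
# marked pixel into a small blob, so take the middle element of each quarter of the nonzero pixels.
# np.where returns (row indices, column indices), so x holds rows and y holds columns here.
# This relies on the four blobs having equal pixel counts and not sharing rows.
x = np.where(zeros>0)[0]
y = np.where(zeros>0)[1]
d1 = [x[(len(x)//4)*0+(len(x)//4)//2],y[(len(y)//4)*0+(len(y)//4)//2]]
d2 = [x[(len(x)//4)*1+(len(x)//4)//2],y[(len(y)//4)*1+(len(y)//4)//2]]
d3 = [x[(len(x)//4)*2+(len(x)//4)//2],y[(len(y)//4)*2+(len(y)//4)//2]]
d4 = [x[(len(x)//4)*3+(len(x)//4)//2],y[(len(y)//4)*3+(len(y)//4)//2]]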

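# Relabel the four points as left/right and top/bottom: split them into two pairs first,
# then order the points within each pair.
# takeX / takeY are not shown in this post; they are assumed to be the helpers below.
def takeX(elem):
    return elem[0]  # row index (see the note on x above)
def takeY(elem):
    return elem[1]  # column index

dd = [d1,d2,d3,d4]
dd.sort(key=takeY)
dd1 = dd[:2]
dd2 = dd[2:]
dd1.sort(key=takeX)
dd2.sort(key=takeX)
zuoshang = dd1[0]  # top-left
zuoxia = dd1[1]    # bottom-left
youshang = dd2[0]  # top-right
youxia = dd2[1]    # bottom-right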

# The detection result converted to chinese_ocr's boxes layout
adjust = False
box = [zuoshang[1],zuoshang[0],youshang[1],youshang[0],zuoxia[1],zuoxia[0],youxia[1],youxia[0], 0]
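The relabeling step can also be written as a small standalone function. This is only a sketch of the same idea (split into a left pair and a right pair, then order each pair top to bottom), working on ordinary (x, y) points rather than the [row, column] pairs used above. The names `order_corners` and `to_flat_box` and the sample points are hypothetical; the output layout simply mirrors the `box` list above, including its trailing 0.

```python
def order_corners(points):
    """Sort four (x, y) points into
    (top_left, top_right, bottom_left, bottom_right).
    Image coordinates are assumed: y grows downward."""
    by_x = sorted(points, key=lambda p: p[0])
    left, right = by_x[:2], by_x[2:]          # two leftmost, two rightmost
    top_left, bottom_left = sorted(left, key=lambda p: p[1])
    top_right, bottom_right = sorted(right, key=lambda p: p[1])
    return top_left, top_right, bottom_left, bottom_right


def to_flat_box(points, trailing=0):
    """Flatten the ordered corners into [x1, y1, ..., x4, y4, trailing]."""
    flat = []
    for x, y in order_corners(points):
        flat.extend([x, y])
    return flat + [trailing]


# Hypothetical, slightly tilted box given in arbitrary corner order
corners = [(90, 12), (10, 10), (12, 40), (92, 42)]
print(to_flat_box(corners))  # [10, 10, 90, 12, 12, 40, 92, 42, 0]
```

Splitting by x first works as long as the box is wider than it is tilted, which is the usual case for lines of text.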

Results

This is a visualization of the original text-detection-ctpn boxes:

result-bofore

This is the result of the later steps when run with the converted boxes:

result-after