How mAP and AP Are Computed in Object Detection
1. Basic Concepts
1.1 The four sample types in classification
The concepts TP, FN, FP, and TN come from the two tables below, which are the same thing under different names: the first table uses machine-learning terminology, the second the terminology used in medicine. In a binary classification problem, comparing the prediction against the ground truth yields the following four cases.
| Ground truth | Predicted positive | Predicted negative |
|---|---|---|
| Positive | TP (true positive) | FN (false negative) |
| Negative | FP (false positive) | TN (true negative) |

| Ground truth | Test positive (阳性) | Test negative (阴性) |
|---|---|---|
| Positive (阳性) | TP (true positive) | FN (false negative) |
| Negative (阴性) | FP (false positive) | TN (true negative) |
- True positive (TP)
- False negative (FN)
- False positive (FP)
- True negative (TN)
Beginners easily confuse these four cases, but they are simple to remember once two points are clear:
(1) True or False says whether the prediction was correct.
(2) Positive or Negative is the predicted label.
1.2 Precision
- Precision (查准率) originated in information retrieval as a measure of a retrieval system's signal-to-noise ratio: the percentage of retrieved documents that are actually relevant.
- In classification, precision measures how many of the samples an algorithm labels as positive are truly positive: precision = TP / (TP + FP).
- In object detection, precision describes how accurate the predictions for one class are; precision is meaningless without specifying a class. In binary classification there are only two classes, so the class need not be stated, but in object detection bounding boxes can belong to many classes, so a precision value must always say which class it refers to.
1.3 Recall
- Recall (查全率) originated in information retrieval as a measure of how completely a system retrieves the relevant documents: the percentage of all relevant documents that are actually retrieved.
- In classification, recall measures the algorithm's ability to find all positive samples: recall = TP / (TP + FN).
- In object detection, recall is, for a given class, the fraction of all GT boxes of that class that are matched by a correct prediction. As with precision, recall is meaningless without specifying a class.
1.4 Accuracy
- Accuracy = (number of correctly predicted samples) / (total number of samples), covering both correctly predicted positives and correctly predicted negatives: Acc = (TP + TN) / AllSamples. Object detection has no notion of a correctly predicted negative (TN), so accuracy is not used there.
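To make the three formulas above concrete, here is a minimal sketch in Python (the labels are made-up toy data, not from the article's example):

```python
# Toy binary-classification example (made-up labels): 1 = positive, 0 = negative
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

precision = tp / (tp + fp)            # TP / (TP + FP)
recall = tp / (tp + fn)               # TP / (TP + FN)
accuracy = (tp + tn) / len(y_true)    # (TP + TN) / AllSamples
print(precision, recall, accuracy)    # 0.75 0.75 0.75
```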
1.5 Other metrics
- True positive rate: TPR = TP / (TP + FN), i.e., TP divided by the total number of actual positives (sensitivity)
- True negative rate: TNR = TN / (FP + TN) (specificity)
1.6 Discussion
1. Why must both precision and recall be used when judging how good a classifier is?
Precision and recall assess a classification algorithm from two different angles. Image classification is often judged by accuracy, as in the ImageNet evaluation, but for a single class the two metrics expose different failure modes. High recall with low precision means, say, most cars were found but many trucks were also misidentified as cars: the predictions are complete but noisy. Low recall with high precision means, say, the detected airplanes are almost all correct, but many airplanes were missed: the predictions are clean but incomplete.
Recall measures completeness: were all the positive samples found? In tumor screening, for example, the model should have high recall so that no tumor is missed.
Precision measures correctness: of the detected positives, how many are actually positive? In spam filtering, for example, the model should have high precision so that everything moved to the trash really is spam.
2. Measuring object detection accuracy with mAP
Take a single test image as an example. It contains objects of three classes, labeled 1, 2, and 3, with 6 GT boxes in total: 1 object of class 1, 2 objects of class 2, and 3 objects of class 3. Running the detector on this image produces 10 DT boxes.
2.1 Computing the AP of one class at a given IoU threshold
2.1.1 Labeling each DT box as positive or negative
(1) Collect all DT boxes predicted as the same class and sort them by score in descending order.
(2) Label the top box: compute the IoU between the highest-scoring DT box and every GT box of that class. If the largest of these IoUs is below the given threshold, the DT box is a negative sample; otherwise, match the DT box to the GT box giving the largest IoU, meaning the DT box correctly detects that GT box at this IoU threshold. This step pairs the DT box with its best-matching GT box.
(3) Remove the matched GT box from the pool, take the DT box with the next-highest score, and repeat the test against the remaining unmatched GT boxes.
(4) Repeat (2) and (3) until every DT box has been labeled or every GT box has been matched; any DT box left unlabeled at that point is negative.
In pseudocode (a runnable sketch follows):
Input: the DT-box set DTSet and GT-box set GTSet of one class, and an IoU threshold iou_thresh
DTSet = sort(DTSet) # sort by predicted score, descending
while DTSet is non-empty and GTSet is non-empty do
DT = DTSet(0) # take the highest-scoring remaining box from DTSet, call it DT
i = argmax IoU(DT, GTSet) # index of the GT box with the largest IoU against DT
if IoU(DT, GTSet(i)) > iou_thresh:
mark DT as positive
GTSet = GTSet - GTSet(i) # remove the successfully matched GT box
else:
mark DT as negative
DTSet = DTSet - DT # remove the DT box once it has been labeled
# after the loop, any DT box not yet labeled is treated as negative
for DT in DTSet:
mark DT as negative
Output: the set of DT boxes, each labeled positive or negative
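And a runnable sketch of the same procedure, assuming boxes are axis-aligned (x1, y1, x2, y2) tuples and detections are (score, box) pairs; the function names, box format, and demo values are illustrative, not from any particular library:

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-10)

def label_detections(dts, gts, iou_thresh=0.5):
    """dts: list of (score, box); gts: list of box.
    Returns (score, is_positive) pairs in descending-score order."""
    dts = sorted(dts, key=lambda d: -d[0])   # highest score first
    unmatched = list(range(len(gts)))        # indices of GT boxes still free
    labels = []
    for score, box in dts:
        best_i, best_iou = -1, 0.0
        for i in unmatched:
            v = iou(box, gts[i])
            if v > best_iou:
                best_i, best_iou = i, v
        if best_i >= 0 and best_iou > iou_thresh:
            labels.append((score, True))     # positive: matched a free GT box
            unmatched.remove(best_i)         # each GT box can be matched once
        else:
            labels.append((score, False))    # negative: no free GT box overlaps enough
    return labels

# Made-up demo boxes reproducing the pattern of the walkthrough below
dts = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (1, 1, 11, 11))]
gts = [(0, 0, 10, 10), (2, 2, 12, 12), (40, 40, 55, 55)]
print(label_detections(dts, gts))  # [(0.9, True), (0.8, False), (0.7, True)]
```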
Running this algorithm for pred_label = 3 on the example image labels each class-3 DT box as positive or negative (for simplicity, the actual GT and DT boxes are not shown); the first three rows of the resulting table are walked through below.
2.1.2 Computing P and R
P and R are evaluated row by row in descending score order; when a lower-scored box is evaluated, all boxes whose score is greater than or equal to it are treated as one set:
(1) Row 1: the DT box with id = 10 is positive, so the positive count is 1; with 3 GT boxes in total, P = 1/1 and R = 1/3.
(2) Row 2: taking id = 10 and id = 6 together, the positive count is still 1, so P = 1/2 and R = 1/3.
(3) Row 3: taking id = 10, id = 6, and id = 4 together, the positive count is 2, so P = 2/3 and R = 2/3.
The remaining rows follow in the same way; a sketch of the computation is given below.
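The cumulative bookkeeping is straightforward in code. A sketch, reusing the (score, is_positive) labels produced by label_detections above (the scores are made up):

```python
def precision_recall_curve(labels, num_gt):
    """labels: (score, is_positive) pairs already sorted by descending score;
    num_gt: total number of GT boxes of this class."""
    precisions, recalls = [], []
    tp = 0
    for rank, (_, is_pos) in enumerate(labels, start=1):
        if is_pos:
            tp += 1
        precisions.append(tp / rank)   # P over the top-`rank` boxes
        recalls.append(tp / num_gt)    # R over all GT boxes of the class
    return precisions, recalls

# the example's first three rows: id=10 positive, id=6 negative, id=4 positive
P, R = precision_recall_curve([(0.9, True), (0.8, False), (0.7, True)], num_gt=3)
print(P)  # [1.0, 0.5, 0.666...]
print(R)  # [0.333..., 0.333..., 0.666...]
```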
2.1.3 Computing AP
AP is the area under the PR curve. For a table like this, AP = Σᵢ Pᵢ · ΔRᵢ, the dot product of the precision column with the per-row recall increments ΔRᵢ = Rᵢ − Rᵢ₋₁ (with R₀ = 0); rows where recall does not increase contribute nothing to the sum.
The APs of the other classes are computed in the same way, and averaging AP over all classes gives mAP (COCO additionally averages over the 10 IoU thresholds 0.50:0.05:0.95).
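A sketch of that sum, using the P and R values from the three rows above:

```python
def average_precision(precisions, recalls):
    """Area under the PR curve: sum over rows of P_i * (R_i - R_{i-1})."""
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)  # only rows where recall increases contribute
        prev_r = r
    return ap

# the example's first three rows: P = [1, 1/2, 2/3], R = [1/3, 1/3, 2/3]
print(average_precision([1.0, 0.5, 2/3], [1/3, 1/3, 2/3]))  # ≈ 0.556
```

Note that COCO's accumulate() shown below does two extra things before integrating: it first makes the precision column monotonically non-increasing (pr[i-1] = max(pr[i-1], pr[i]), scanning right to left) and then samples it at the 101 recall thresholds p.recThrs, so official COCO AP values differ slightly from this raw area.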

Note:
- DT boxes are labeled positive/negative per image and per class.
- The P, R, and AP of a class are computed over all DT boxes of that class gathered from all images.
2.2 Discussion
Finally, here is the COCO API evaluation source (cocoeval.py from pycocotools) with plain-language annotations.
__author__ = 'tsungyi'
import numpy as np
import datetime
import time
from collections import defaultdict
from . import mask as maskUtils
import copy
class COCOeval:
# Interface for evaluating detection on the Microsoft COCO dataset.
#
# The usage for CocoEval is as follows:
# cocoGt=..., cocoDt=... # load dataset and results
# E = CocoEval(cocoGt,cocoDt); # initialize CocoEval object
# E.params.recThrs = ...; # set parameters as desired
# E.evaluate(); # run per image evaluation
# E.accumulate(); # accumulate per image results
# E.summarize(); # display summary metrics of results
# For example usage see evalDemo.m and http://mscoco.org/.
#
# The evaluation parameters are as follows (defaults in brackets):
# imgIds - [all] N img ids to use for evaluation
# catIds - [all] K cat ids to use for evaluation
# iouThrs - [.5:.05:.95] T=10 IoU thresholds for evaluation
# recThrs - [0:.01:1] R=101 recall thresholds for evaluation
# areaRng - [...] A=4 object area ranges for evaluation
# maxDets - [1 10 100] M=3 thresholds on max detections per image
# iouType - ['segm'] set iouType to 'segm', 'bbox' or 'keypoints'
# iouType replaced the now DEPRECATED useSegm parameter.
# useCats - [1] if true use category labels for evaluation
# Note: if useCats=0 category labels are ignored as in proposal scoring.
# Note: multiple areaRngs [Ax2] and maxDets [Mx1] can be specified.
#
# evaluate(): evaluates detections on every image and every category and
# concats the results into the "evalImgs" with fields:
# dtIds - [1xD] id for each of the D detections (dt)
# gtIds - [1xG] id for each of the G ground truths (gt)
# dtMatches - [TxD] matching gt id at each IoU or 0
# gtMatches - [TxG] matching dt id at each IoU or 0
# dtScores - [1xD] confidence of each dt
# gtIgnore - [1xG] ignore flag for each gt
# dtIgnore - [TxD] ignore flag for each dt at each IoU
#
# accumulate(): accumulates the per-image, per-category evaluation
# results in "evalImgs" into the dictionary "eval" with fields:
# params - parameters used for evaluation
# date - date evaluation was performed
# counts - [T,R,K,A,M] parameter dimensions (see above)
# precision - [TxRxKxAxM] precision for every evaluation setting
# recall - [TxKxAxM] max recall for every evaluation setting
# Note: precision and recall==-1 for settings with no gt objects.
#
# See also coco, mask, pycocoDemo, pycocoEvalDemo
#
# Microsoft COCO Toolbox. version 2.0
# Data, paper, and tutorials available at: http://mscoco.org/
# Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
# Licensed under the Simplified BSD License [see coco/license.txt]
def __init__(self, cocoGt=None, cocoDt=None, iouType='segm'):
'''
Initialize CocoEval using coco APIs for gt and dt
:param cocoGt: coco object with ground truth annotations
:param cocoDt: coco object with detection results
:return: None
'''
if not iouType:
print('iouType not specified. use default iouType segm')
self.cocoGt = cocoGt # ground truth COCO API
self.cocoDt = cocoDt # detections COCO API
self.params = {} # evaluation parameters
self.evalImgs = defaultdict(list) # per-image per-category evaluation results [KxAxI] elements
self.eval = {} # accumulated evaluation results
self._gts = defaultdict(list) # gt for evaluation
self._dts = defaultdict(list) # dt for evaluation
self.params = Params(iouType=iouType) # parameters
self._paramsEval = {} # parameters for evaluation
self.stats = [] # result summarization
self.ious = {} # ious between all gts and dts
if not cocoGt is None:
self.params.imgIds = sorted(cocoGt.getImgIds())
self.params.catIds = sorted(cocoGt.getCatIds())
def _prepare(self):
'''
Prepare ._gts and ._dts for evaluation based on params
:return: None
'''
def _toMask(anns, coco):
# modify ann['segmentation'] by reference
for ann in anns:
rle = coco.annToRLE(ann)
ann['segmentation'] = rle
p = self.params
if p.useCats:
gts=self.cocoGt.loadAnns(self.cocoGt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
dts=self.cocoDt.loadAnns(self.cocoDt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
else:
gts=self.cocoGt.loadAnns(self.cocoGt.getAnnIds(imgIds=p.imgIds))
dts=self.cocoDt.loadAnns(self.cocoDt.getAnnIds(imgIds=p.imgIds))
# for object detection this conversion can be ignored
# convert ground truth to mask if iouType == 'segm'
if p.iouType == 'segm':
_toMask(gts, self.cocoGt)
_toMask(dts, self.cocoDt)
# set ignore flag
for gt in gts:
gt['ignore'] = gt['ignore'] if 'ignore' in gt else 0
gt['ignore'] = 'iscrowd' in gt and gt['iscrowd']
if p.iouType == 'keypoints':
gt['ignore'] = (gt['num_keypoints'] == 0) or gt['ignore']
self._gts = defaultdict(list) # gt for evaluation
self._dts = defaultdict(list) # dt for evaluation
for gt in gts:
self._gts[gt['image_id'], gt['category_id']].append(gt)
for dt in dts:
self._dts[dt['image_id'], dt['category_id']].append(dt)
self.evalImgs = defaultdict(list) # per-image per-category evaluation results
self.eval = {} # accumulated evaluation results
def evaluate(self):
'''
Run per image evaluation on given images and store results (a list of dict) in self.evalImgs
:return: None
'''
tic = time.time()
print('Running per image evaluation...')
# 1. preprocess the evaluation parameters self.params
p = self.params
# add backward compatibility if useSegm is specified in params
# p.useSegm defaults to None; for object detection, p.iouType == 'bbox'
if not p.useSegm is None:
p.iouType = 'segm' if p.useSegm == 1 else 'bbox'
print('useSegm (deprecated) is not None. Running {} evaluation'.format(p.iouType))
print('Evaluate annotation type *{}*'.format(p.iouType))
# p.imgIds may contain duplicates, so deduplicate
p.imgIds = list(np.unique(p.imgIds))
# detection is evaluated per category by default, so deduplicate the category ids too
if p.useCats:
p.catIds = list(np.unique(p.catIds))
# p.maxDets defaults to [1, 10, 100]; sort it
p.maxDets = sorted(p.maxDets)
# assign the processed parameters back to self.params
self.params=p
# 2. group the gt and dt annotations by image id and category id
self._prepare()
# 3.loop through images, area range, max detection number
# for object detection, catIds is simply p.catIds
catIds = p.catIds if p.useCats else [-1]
# for object detection, self.computeIoU is used
if p.iouType == 'segm' or p.iouType == 'bbox':
computeIoU = self.computeIoU
elif p.iouType == 'keypoints':
computeIoU = self.computeOks
# for COCO 2017 val there are N=5000 images and K=80 categories, hence 5000*80 distinct (imgId, catId) tuples
# these tuples are the keys of the dict self.ious and the IoU matrices are the values
self.ious = {(imgId, catId): computeIoU(imgId, catId) \
for imgId in p.imgIds
for catId in catIds}
evaluateImg = self.evaluateImg
maxDet = p.maxDets[-1] # default value 100
# p.areaRng defaults to [[0 ** 2, 1e5 ** 2], [0 ** 2, 32 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]
self.evalImgs = [evaluateImg(imgId, catId, areaRng, maxDet)
for catId in catIds
for areaRng in p.areaRng
for imgId in p.imgIds
]
self._paramsEval = copy.deepcopy(self.params)
toc = time.time()
print('DONE (t={:0.2f}s).'.format(toc-tic))
def computeIoU(self, imgId, catId):
# compute the IoU matrix for one image and one category
p = self.params
if p.useCats:
# for object detection only these two lines matter: for the given image imgId and
# category catId, fetch the corresponding gt boxes and dt boxes; four cases can occur:
# (1) len(gt) == 0 and len(dt) == 0: the "good" case, nothing to do
# (2) len(gt) == 0 and len(dt) != 0: false detections
# (3) len(gt) != 0 and len(dt) == 0: missed detections
# (4) len(gt) != 0 and len(dt) != 0: the category was at least detected; quality depends on localization
gt = self._gts[imgId,catId]
dt = self._dts[imgId,catId]
else:
gt = [_ for cId in p.catIds for _ in self._gts[imgId,cId]]
dt = [_ for cId in p.catIds for _ in self._dts[imgId,cId]]
if len(gt) == 0 and len(dt) ==0:
return [] # nothing to evaluate, return immediately
# get the indices that sort the detections dt by score in descending order (stable mergesort)
inds = np.argsort([-d['score'] for d in dt], kind='mergesort')
# reorder dt by those indices, i.e., sort the detected boxes by descending score
dt = [dt[i] for i in inds]
# p.maxDets[-1] defaults to 100: if more than 100 boxes were detected, keep only the 100 highest-scoring ones
if len(dt) > p.maxDets[-1]:
dt=dt[0:p.maxDets[-1]]
if p.iouType == 'segm':
g = [g['segmentation'] for g in gt]
d = [d['segmentation'] for d in dt]
elif p.iouType == 'bbox':
# for object detection these two lines run
g = [g['bbox'] for g in gt]
d = [d['bbox'] for d in dt]
else:
raise Exception('unknown iouType for iou computation')
# compute iou between each dt and gt region
iscrowd = [int(o['iscrowd']) for o in gt]
# compute the IoU between every dt and every gt
ious = maskUtils.iou(d,g,iscrowd)
return ious
def computeOks(self, imgId, catId):
p = self.params
# dimension here should be Nxm
gts = self._gts[imgId, catId]
dts = self._dts[imgId, catId]
inds = np.argsort([-d['score'] for d in dts], kind='mergesort')
dts = [dts[i] for i in inds]
if len(dts) > p.maxDets[-1]:
dts = dts[0:p.maxDets[-1]]
# if len(gts) == 0 and len(dts) == 0:
if len(gts) == 0 or len(dts) == 0:
return []
ious = np.zeros((len(dts), len(gts)))
sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62,.62, 1.07, 1.07, .87, .87, .89, .89])/10.0
vars = (sigmas * 2)**2
k = len(sigmas)
# compute oks between each detection and ground truth object
for j, gt in enumerate(gts):
# create bounds for ignore regions(double the gt bbox)
g = np.array(gt['keypoints'])
xg = g[0::3]; yg = g[1::3]; vg = g[2::3]
k1 = np.count_nonzero(vg > 0)
bb = gt['bbox']
x0 = bb[0] - bb[2]; x1 = bb[0] + bb[2] * 2
y0 = bb[1] - bb[3]; y1 = bb[1] + bb[3] * 2
for i, dt in enumerate(dts):
d = np.array(dt['keypoints'])
xd = d[0::3]; yd = d[1::3]
if k1>0:
# measure the per-keypoint distance if keypoints visible
dx = xd - xg
dy = yd - yg
else:
# measure minimum distance to keypoints in (x0,y0) & (x1,y1)
z = np.zeros((k))
dx = np.max((z, x0-xd),axis=0)+np.max((z, xd-x1),axis=0)
dy = np.max((z, y0-yd),axis=0)+np.max((z, yd-y1),axis=0)
e = (dx**2 + dy**2) / vars / (gt['area']+np.spacing(1)) / 2
if k1 > 0:
e=e[vg > 0]
ious[i, j] = np.sum(np.exp(-e)) / e.shape[0]
return ious
def evaluateImg(self, imgId, catId, aRng, maxDet):
'''
perform evaluation for single category and image
:return: dict (single image results)
'''
p = self.params
if p.useCats:
# for object detection only these two lines matter: for the given image imgId and
# category catId, fetch the corresponding gt boxes and dt boxes
gt = self._gts[imgId,catId]
dt = self._dts[imgId,catId]
else:
gt = [_ for cId in p.catIds for _ in self._gts[imgId,cId]]
dt = [_ for cId in p.catIds for _ in self._dts[imgId,cId]]
if len(gt) == 0 and len(dt) ==0:
return None
# gt boxes whose area lies outside the given range are ignored; this is how the
# area-specific metrics (AP_small, AP_medium, AP_large, AP_all) are obtained
for g in gt:
if g['ignore'] or (g['area']<aRng[0] or g['area']>aRng[1]):
g['_ignore'] = 1
else:
g['_ignore'] = 0
# sort dt highest score first, sort gt ignore last
# ascending sort, so the gt boxes flagged _ignore end up at the back
gtind = np.argsort([g['_ignore'] for g in gt], kind='mergesort')
gt = [gt[i] for i in gtind]
# descending sort, so higher-scoring dt boxes come first
dtind = np.argsort([-d['score'] for d in dt], kind='mergesort')
dt = [dt[i] for i in dtind[0:maxDet]]
iscrowd = [int(o['iscrowd']) for o in gt]
# load computed ious
# len(self.ious[imgId, catId]) > 0 only in case (4) above (both gt and dt present);
# here the gt columns of the IoU matrix are reordered to match the sorted gt list
ious = self.ious[imgId, catId][:, gtind] if len(self.ious[imgId, catId]) > 0 else self.ious[imgId, catId]
T = len(p.iouThrs) # T= 10
G = len(gt)
D = len(dt)
gtm = np.zeros((T,G))
dtm = np.zeros((T,D))
gtIg = np.array([g['_ignore'] for g in gt])
dtIg = np.zeros((T,D))
if not len(ious)==0:
for tind, t in enumerate(p.iouThrs):
for dind, d in enumerate(dt):
# information about best match so far (m=-1 -> unmatched)
iou = min([t,1-1e-10])
m = -1
for gind, g in enumerate(gt):
# if this gt already matched, and not a crowd, continue
if gtm[tind,gind]>0 and not iscrowd[gind]:
continue
# if dt matched to reg gt, and on ignore gt, stop
# if dt[dind] already has a match (m > -1) with a valid gt and the current gt is ignored, stop searching:
# ignored gts are sorted last, so only ignored gts remain; a still-unmatched dt may keep looking among ignored gts
if m>-1 and gtIg[m]==0 and gtIg[gind]==1:
break
# continue to next gt unless better match made
if ious[dind,gind] < iou:
# reaching here means IoU(dt[dind], gt[gind]) is below the current best (initially the IoU threshold),
# so dt[dind] keeps searching the remaining unmatched gt boxes until the search space is exhausted
continue
# if match successful and best so far, store appropriately
iou=ious[dind,gind]
m=gind
# if match made store id of match for both dt and gt
if m ==-1:
continue
# reaching here means that at IoU threshold p.iouThrs[tind], dt[dind] found its best-matching gt[m]
dtIg[tind,dind] = gtIg[m]
# dtm[tind,dind] records that at IoU threshold p.iouThrs[tind] the match of dt[dind] is gt[m], so gt[m]'s id is stored
dtm[tind,dind] = gt[m]['id']
# gtm[tind,m] records that at IoU threshold p.iouThrs[tind] the match of gt[m] is dt[dind], so dt[dind]'s id is stored
gtm[tind,m] = d['id']
# note that the matching above ignores area; area is only taken into account from here on
# set unmatched detections outside of area range to ignore
a = np.array([d['area']<aRng[0] or d['area']>aRng[1] for d in dt]).reshape((1, len(dt)))
dtIg = np.logical_or(dtIg, np.logical_and(dtm==0, np.repeat(a,T,0)))
# store results for given image and category
return {
'image_id': imgId,
'category_id': catId,
'aRng': aRng,
'maxDet': maxDet,
'dtIds': [d['id'] for d in dt],
'gtIds': [g['id'] for g in gt],
'dtMatches': dtm,
'gtMatches': gtm,
'dtScores': [d['score'] for d in dt],
'gtIgnore': gtIg,
'dtIgnore': dtIg,
}
def accumulate(self, p = None):
'''
Accumulate per image evaluation results and store the result in self.eval
:param p: input params for evaluation
:return: None
'''
print('Accumulating evaluation results...')
tic = time.time()
if not self.evalImgs:
print('Please run evaluate() first')
# allows input customized parameters
if p is None:
p = self.params
p.catIds = p.catIds if p.useCats == 1 else [-1]
T = len(p.iouThrs)
R = len(p.recThrs)
K = len(p.catIds) if p.useCats else 1
A = len(p.areaRng)
M = len(p.maxDets)
precision = -np.ones((T,R,K,A,M)) # -1 for the precision of absent categories
recall = -np.ones((T,K,A,M))
scores = -np.ones((T,R,K,A,M))
# create dictionary for future indexing
_pe = self._paramsEval
catIds = _pe.catIds if _pe.useCats else [-1]
setK = set(catIds)
setA = set(map(tuple, _pe.areaRng))
setM = set(_pe.maxDets)
setI = set(_pe.imgIds)
# get inds to evaluate
k_list = [n for n, k in enumerate(p.catIds) if k in setK]
# k_list = [0, 1, 2, ..., 79]
m_list = [m for n, m in enumerate(p.maxDets) if m in setM]
# m_list = [1, 10, 100]
a_list = [n for n, a in enumerate(map(lambda x: tuple(x), p.areaRng)) if a in setA]
# a_list = [0, 1, 2, 3]
i_list = [n for n, i in enumerate(p.imgIds) if i in setI]
# i_list = [0, 1, 2, ..., 4999]
I0 = len(_pe.imgIds) # I0 = 5000
A0 = len(_pe.areaRng) # A0 = 4
# retrieve E at each category, area range, and max number of detections
for k, k0 in enumerate(k_list):
Nk = k0*A0*I0
for a, a0 in enumerate(a_list):
Na = a0*I0
for m, maxDet in enumerate(m_list):
# map the (category, area, image) indices to the flat 1-D index into self.evalImgs
E = [self.evalImgs[Nk + Na + i] for i in i_list]
E = [e for e in E if not e is None]
if len(E) == 0:
continue
dtScores = np.concatenate([e['dtScores'][0:maxDet] for e in E])
# different sorting method generates slightly different results.
# mergesort is used to be consistent as Matlab implementation.
inds = np.argsort(-dtScores, kind='mergesort')
dtScoresSorted = dtScores[inds]
dtm = np.concatenate([e['dtMatches'][:,0:maxDet] for e in E], axis=1)[:,inds]
dtIg = np.concatenate([e['dtIgnore'][:,0:maxDet] for e in E], axis=1)[:,inds]
gtIg = np.concatenate([e['gtIgnore'] for e in E])
npig = np.count_nonzero(gtIg==0 )
if npig == 0:
continue
tps = np.logical_and( dtm, np.logical_not(dtIg) )
fps = np.logical_and(np.logical_not(dtm), np.logical_not(dtIg) )
tp_sum = np.cumsum(tps, axis=1).astype(dtype=float)
fp_sum = np.cumsum(fps, axis=1).astype(dtype=float)
for t, (tp, fp) in enumerate(zip(tp_sum, fp_sum)):
tp = np.array(tp)
fp = np.array(fp)
nd = len(tp)
rc = tp / npig
pr = tp / (fp+tp+np.spacing(1))
q = np.zeros((R,))
ss = np.zeros((R,))
if nd:
recall[t,k,a,m] = rc[-1]
else:
recall[t,k,a,m] = 0
# numpy is slow without cython optimization for accessing elements
# use python array gets significant speed improvement
pr = pr.tolist(); q = q.tolist()
for i in range(nd-1, 0, -1):
if pr[i] > pr[i-1]:
pr[i-1] = pr[i]
inds = np.searchsorted(rc, p.recThrs, side='left')
try:
for ri, pi in enumerate(inds):
q[ri] = pr[pi]
ss[ri] = dtScoresSorted[pi]
except:
pass
precision[t,:,k,a,m] = np.array(q)
scores[t,:,k,a,m] = np.array(ss)
self.eval = {
'params': p,
'counts': [T, R, K, A, M],
'date': datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
'precision': precision,
'recall': recall,
'scores': scores,
}
toc = time.time()
print('DONE (t={:0.2f}s).'.format( toc-tic))
def summarize(self):
'''
Compute and display summary metrics for evaluation results.
Note this function can *only* be applied on the default parameter setting
'''
def _summarize( ap=1, iouThr=None, areaRng='all', maxDets=100 ):
p = self.params
iStr = ' {:<18} {} @[ IoU={:<9} | area={:>6s} | maxDets={:>3d} ] = {:0.3f}'
titleStr = 'Average Precision' if ap == 1 else 'Average Recall'
typeStr = '(AP)' if ap==1 else '(AR)'
iouStr = '{:0.2f}:{:0.2f}'.format(p.iouThrs[0], p.iouThrs[-1]) \
if iouThr is None else '{:0.2f}'.format(iouThr)
aind = [i for i, aRng in enumerate(p.areaRngLbl) if aRng == areaRng]
mind = [i for i, mDet in enumerate(p.maxDets) if mDet == maxDets]
if ap == 1:
# dimension of precision: [TxRxKxAxM]
s = self.eval['precision']
# IoU
if iouThr is not None:
t = np.where(iouThr == p.iouThrs)[0]
s = s[t]
s = s[:,:,:,aind,mind]
else:
# dimension of recall: [TxKxAxM]
s = self.eval['recall']
if iouThr is not None:
t = np.where(iouThr == p.iouThrs)[0]
s = s[t]
s = s[:,:,aind,mind]
if len(s[s>-1])==0:
mean_s = -1
else:
mean_s = np.mean(s[s>-1])
print(iStr.format(titleStr, typeStr, iouStr, areaRng, maxDets, mean_s))
return mean_s
def _summarizeDets():
stats = np.zeros((12,))
stats[0] = _summarize(1)
stats[1] = _summarize(1, iouThr=.5, maxDets=self.params.maxDets[2])
stats[2] = _summarize(1, iouThr=.75, maxDets=self.params.maxDets[2])
stats[3] = _summarize(1, areaRng='small', maxDets=self.params.maxDets[2])
stats[4] = _summarize(1, areaRng='medium', maxDets=self.params.maxDets[2])
stats[5] = _summarize(1, areaRng='large', maxDets=self.params.maxDets[2])
stats[6] = _summarize(0, maxDets=self.params.maxDets[0])
stats[7] = _summarize(0, maxDets=self.params.maxDets[1])
stats[8] = _summarize(0, maxDets=self.params.maxDets[2])
stats[9] = _summarize(0, areaRng='small', maxDets=self.params.maxDets[2])
stats[10] = _summarize(0, areaRng='medium', maxDets=self.params.maxDets[2])
stats[11] = _summarize(0, areaRng='large', maxDets=self.params.maxDets[2])
return stats
def _summarizeKps():
stats = np.zeros((10,))
stats[0] = _summarize(1, maxDets=20)
stats[1] = _summarize(1, maxDets=20, iouThr=.5)
stats[2] = _summarize(1, maxDets=20, iouThr=.75)
stats[3] = _summarize(1, maxDets=20, areaRng='medium')
stats[4] = _summarize(1, maxDets=20, areaRng='large')
stats[5] = _summarize(0, maxDets=20)
stats[6] = _summarize(0, maxDets=20, iouThr=.5)
stats[7] = _summarize(0, maxDets=20, iouThr=.75)
stats[8] = _summarize(0, maxDets=20, areaRng='medium')
stats[9] = _summarize(0, maxDets=20, areaRng='large')
return stats
if not self.eval:
raise Exception('Please run accumulate() first')
iouType = self.params.iouType
if iouType == 'segm' or iouType == 'bbox':
summarize = _summarizeDets
elif iouType == 'keypoints':
summarize = _summarizeKps
self.stats = summarize()
def __str__(self):
self.summarize()
class Params:
'''
Params for coco evaluation api
'''
def setDetParams(self):
self.imgIds = []
self.catIds = []
# np.arange causes trouble. the data point on arange is slightly larger than the true value
self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
# 10 IoU thresholds: [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]
self.recThrs = np.linspace(.0, 1.00, int(np.round((1.00 - .0) / .01)) + 1, endpoint=True)
# 101 recall thresholds: 0, 0.01, ..., 1
self.maxDets = [1, 10, 100]
self.areaRng = [[0 ** 2, 1e5 ** 2], [0 ** 2, 32 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]
self.areaRngLbl = ['all', 'small', 'medium', 'large']
self.useCats = 1
def setKpParams(self):
self.imgIds = []
self.catIds = []
# np.arange causes trouble. the data point on arange is slightly larger than the true value
self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
self.recThrs = np.linspace(.0, 1.00, int(np.round((1.00 - .0) / .01)) + 1, endpoint=True)
self.maxDets = [20]
self.areaRng = [[0 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]
self.areaRngLbl = ['all', 'medium', 'large']
self.useCats = 1
def __init__(self, iouType='segm'):
if iouType == 'segm' or iouType == 'bbox':
self.setDetParams()
elif iouType == 'keypoints':
self.setKpParams()
else:
raise Exception('iouType not supported')
self.iouType = iouType
# useSegm is deprecated
self.useSegm = None
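For reference, a typical end-to-end use of the class above, matching the usage sketched in the class comment (annFile and resFile are placeholder paths for a COCO-format ground-truth file and a detection-results JSON):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

annFile = 'instances_val2017.json'  # placeholder: COCO-format ground truth
resFile = 'detections.json'         # placeholder: detection results JSON

cocoGt = COCO(annFile)              # ground-truth annotations
cocoDt = cocoGt.loadRes(resFile)    # detections indexed against the same images
cocoEval = COCOeval(cocoGt, cocoDt, iouType='bbox')
cocoEval.evaluate()                 # per-image, per-category matching
cocoEval.accumulate()               # build the [T,R,K,A,M] precision array
cocoEval.summarize()                # print the 12 standard AP/AR metrics
```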