Extracting Human Keypoint Detection Images and Labels from the COCO Dataset

Downloading the COCO Dataset
COCO is a widely used computer-vision dataset covering a rich variety of object categories in everyday scenes.
Official download page: https://cocodataset.org/#download
This article works with COCO 2017 and separates out the images and labels that carry keypoint annotations. The extracted labels are converted either into .json files that can be visualized with Labelme, or into txt label files for model training.
Download the COCO 2017 training and validation images, along with the corresponding annotations:
- COCO 2017 train/val images and labels:
- Training set: http://images.cocodataset.org/zips/train2017.zip (118,287 images)
- Validation set: http://images.cocodataset.org/zips/val2017.zip (5,000 images)
- Annotations: http://images.cocodataset.org/annotations/annotations_trainval2017.zip
The training and validation sets together contain about 123K images. The archive annotations_trainval2017.zip contains the following json files:
annotations_trainval2017.zip
- captions_train2017.json (87.6 MB)
- captions_val2017.json (3.69 MB)
- instances_train2017.json (448 MB)
- instances_val2017.json (19.0 MB)
- person_keypoints_train2017.json (227 MB)
- person_keypoints_val2017.json (9.55 MB)
Among these, the keypoint labels live in person_keypoints_train2017.json and person_keypoints_val2017.json. Here is a share link for the smaller file, person_keypoints_val2017.json:
- person_keypoints_val2017.json
- Link: https://wwte.lanzouu.com/iIcN72trb4ba
- Password: 2w23
The official releases of Labelme can be downloaded from https://github.com/wkentaro/labelme/releases
This article uses Labelme version 5.2.1 (57.7 MB); a share link:
- Labelme.exe
- Link: https://wwte.lanzouu.com/iSqCt2trpejc
- Password: byw7
Installing the Required Libraries
This article uses a Python 3.8 environment, with the pip index set to the Tsinghua mirror:
set PIP_INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple
Install the required libraries:
pip install lxml==4.6.3
pip install numpy==1.20.3
pip install pillow==9.5.0
Filtering Human Keypoint Images and Labels from COCO
Because COCO contains many images besides those with human keypoints, the images must be filtered precisely according to their annotations. Taking the validation set val2017 as an example, its label file is person_keypoints_val2017.json. Only images containing people carry annotations, so the presence of an annotation entry is the first filter. Keypoint labels are stored in the "keypoints" field (17 body keypoints per person), and the per-person entries live under "annotations". The number of labeled keypoints is stored in the "num_keypoints" field of each annotation; when it is 0, that annotation definitely has no keypoints. This does not mean the image has no person: COCO marks distant people with a bounding box only and no keypoints. In addition, each image has a unique id, which exactly matches the image to its annotation entries.
Note: images that contain only person bounding boxes and no keypoints at all are not treated as part of the keypoint dataset and are skipped.
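The filtering rule above (keep an image only if at least one of its annotations has num_keypoints > 0) can be sketched on a minimal COCO-shaped structure. The tiny dict below is made-up illustrative data, not real COCO content:

```python
# Minimal illustration of the filtering rule: an image qualifies only if
# at least one annotation with its image_id has num_keypoints > 0.
data = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "annotations": [
        {"image_id": 1, "num_keypoints": 5},   # person with labeled keypoints
        {"image_id": 2, "num_keypoints": 0},   # distant person, box only
    ],
}

def has_keypoints(img_id, annotations):
    # True if any annotation for this image carries keypoints
    return any(a["image_id"] == img_id and a["num_keypoints"] > 0
               for a in annotations)

kept = [img["file_name"] for img in data["images"]
        if has_keypoints(img["id"], data["annotations"])]
print(kept)  # ['a.jpg']
```

Image b.jpg is dropped even though it contains a person, matching the note above.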
Filtering json Label Files
The following program automatically generates Labelme-style json keypoint label files and copies the images that contain keypoints. Before running it, adjust the input person_keypoints_val2017.json path, the val2017 image directory, the output json folder, and the copied-image folder; the two output folders can be named freely.
from json import dumps
import os
import json
import base64
import shutil
import numpy as np
from PIL import Image
import io

def create_json_file(img_name, img_info, keypoint_list, img_ID, data_annotations,
                     output_keypoint_json_path, output_image_path):
    # Labelme json skeleton
    json_dict = {"version": "5.2.1", "flags": {}, "shapes": [], "imagePath": "",
                 "imageData": None, "imageHeight": 0, "imageWidth": 0}
    json_dict["imageHeight"], json_dict["imageWidth"] = img_info['height'], img_info['width']
    json_dict["imagePath"] = img_name
    # Skip images that only contain person boxes and no keypoints
    exist_keypoints = False
    for ann in data_annotations:
        if ann['image_id'] == img_ID:
            if ann['num_keypoints'] > 0:  # at least one labeled keypoint
                exist_keypoints = True
                break
    # Walk the annotations
    for ann in data_annotations:
        # same image id, and the image has keypoints
        if ann['image_id'] == img_ID and exist_keypoints:
            shapes_dict = {"label": "person", "points": [], "group_id": None,
                           "description": "", "shape_type": "rectangle", "flags": {}}
            box = ann['bbox']
            # box corners [x0, y0], [x1, y1]
            boxes = [box[0], box[1]], [box[0] + box[2], box[1] + box[3]]
            boxes = np.array(boxes)
            shapes_dict["points"] = boxes.tolist()
            json_dict["shapes"].append(shapes_dict)
            if ann['num_keypoints'] > 0:
                # keypoint triples (x, y, visibility), 17 per person
                keypoint = ann['keypoints']
                keypoint = np.array(keypoint).reshape((17, 3))
                for i, (x, y, visibility) in enumerate(keypoint):
                    if visibility != 0:  # 2 = visible, 1 = occluded, 0 = not labeled
                        shapes_dict_point = {"label": "", "points": [], "group_id": None,
                                             "description": "", "shape_type": "point", "flags": {}}
                        point_list = np.array([[x, y]]).tolist()
                        shapes_dict_point["label"] = keypoint_list[i]
                        shapes_dict_point["points"] = point_list
                        json_dict["shapes"].append(shapes_dict_point)
    image_path = os.path.join(image_dir, img_name)
    json_name = os.path.splitext(os.path.basename(image_path))[0]
    with Image.open(image_path) as img:
        buf = io.BytesIO()
        img.save(buf, format='JPEG', optimize=True)  # re-encode the image into the BytesIO buffer
        base64_data = base64.b64encode(buf.getvalue()).decode()
        json_dict["imageData"] = base64_data  # embedded image data for Labelme
    if len(json_dict['shapes']) > 0:
        # copy the image into the new dataset folder
        shutil.copy(image_path, output_image_path)
        json_fp = open(os.path.join(output_keypoint_json_path, json_name + '.json'), 'w')
        json_str = dumps(json_dict, indent=2)
        json_fp.write(json_str)
        json_fp.close()
json_keypoint_file = 'person_keypoints_val2017.json'  # input COCO keypoint json
data = json.load(open(json_keypoint_file, 'r'))  # load the json file
image_dir = 'val2017/'  # COCO validation image directory
# images in the directory
image_files = [f for f in os.listdir(image_dir) if f.endswith(('.jpg', '.png', '.jpeg'))]
# output folder for the json label files
output_keypoint_json_path = "person_keypoint_json"
os.makedirs(output_keypoint_json_path, exist_ok=True)
# output folder for the corresponding images
output_image_path = "person_keypoint_image_dataset"
os.makedirs(output_image_path, exist_ok=True)
# walk the images
for img_file in image_files:
    # walk the image entries in the json
    for i, img_info in enumerate(data['images']):
        # image file name matches 'file_name' in the json
        if img_info['file_name'] == img_file:
            # image height and width
            height, width = img_info['height'], img_info['width']
            # ordered keypoint name list
            keypoint_list = data['categories'][0]['keypoints']
            # unique image id
            img_ID = img_info['id']
            print(img_file)
            # write the json label file
            create_json_file(img_file, img_info, keypoint_list, img_ID,
                             data['annotations'], output_keypoint_json_path, output_image_path)
            # once written, break and move on to the next image
            break
In total, 2,346 validation images and their json labels are extracted. The generated json files can be visualized with Labelme; the original article shows the keypoint visualization of image 000000000785.jpg as an example.
Running the same filtering on the training set train2017 takes roughly 5 to 7 hours and extracts 56,599 images with their json labels.
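Part of that multi-hour runtime comes from the nested loop, which rescans all of data['images'] once per image file. A possible speed-up, my own suggestion rather than part of the original program, is to index the entries by file_name once up front (the dict below is illustrative COCO-shaped data):

```python
# Build a one-time file_name -> image-entry index instead of rescanning
# data['images'] for every file on disk. Illustrative data, not real COCO.
data = {
    "images": [
        {"id": 1, "file_name": "000000000036.jpg", "height": 640, "width": 481},
        {"id": 2, "file_name": "000000000785.jpg", "height": 425, "width": 640},
    ]
}

img_index = {img["file_name"]: img for img in data["images"]}

# O(1) lookup per file instead of an O(N) scan
info = img_index.get("000000000785.jpg")
print(info["id"])  # 2
```

The annotations could be grouped by image_id in the same way, turning the per-image annotation scans into dictionary lookups as well.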
Filtering txt Label Files
The following program automatically generates txt keypoint label files and copies the images that contain keypoints. Before running it, adjust the input person_keypoints_val2017.json path, the val2017 image directory, and the copied-image folder; the output folder can be named freely.
import os
import json
import shutil
import numpy as np

# Normalize box coordinates to YOLO format: center point plus width and height
def coordinates2yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    x = abs(xmin + xmax) / (2.0 * img_w)
    y = abs(ymin + ymax) / (2.0 * img_h)
    w = abs(xmax - xmin) / (1.0 * img_w)
    h = abs(ymax - ymin) / (1.0 * img_h)
    return x, y, w, h

def create_keypoint_txt(img_name, img_info, img_ID, data_annotations, output_image_path):
    # output folder for the txt label files
    output_txt_path = "keypoint_txt"
    os.makedirs(output_txt_path, exist_ok=True)
    image_path = os.path.join(image_dir, img_name)
    txt_name = os.path.splitext(os.path.basename(img_name))[0]
    file = open(os.path.join(output_txt_path, txt_name + '.txt'), mode='w')
    # Skip images that only contain person boxes and no keypoints
    exist_keypoints = False
    for ann in data_annotations:
        if ann['image_id'] == img_ID:
            if ann['num_keypoints'] > 0:  # at least one labeled keypoint
                exist_keypoints = True
                break
    # Walk the annotations
    for ann in data_annotations:
        # same image id, and the image has keypoints
        if ann['image_id'] == img_ID and exist_keypoints:
            box = ann['bbox']
            file.write(str(0))  # class id 0 (person)
            file.write(" ")
            xmin, ymin, xmax, ymax = box[0], box[1], box[0] + box[2], box[1] + box[3]
            x, y, w, h = coordinates2yolo(xmin, ymin, xmax, ymax, img_info['width'], img_info['height'])
            file.write(str(round(x, 6)))
            file.write(" ")
            file.write(str(round(y, 6)))
            file.write(" ")
            file.write(str(round(w, 6)))
            file.write(" ")
            file.write(str(round(h, 6)))
            file.write(" ")
            if ann['num_keypoints'] > 0:
                keypoint = ann['keypoints']
                # reshape to 17 x 3 (x, y, visibility)
                keypoint = np.array(keypoint).reshape((17, 3))
                for i, (x, y, visibility) in enumerate(keypoint):
                    if visibility != 0:  # 2 = visible, 1 = occluded
                        file.write(str(round(x / img_info['width'], 6)))
                        file.write(" ")
                        file.write(str(round(y / img_info['height'], 6)))
                        file.write(" ")
                        file.write(str(int(visibility)) + '.000000')
                        file.write(" ")
                    else:
                        # unlabeled keypoint: write a zero triple as a placeholder
                        file.write('0.000000')
                        file.write(" ")
                        file.write('0.000000')
                        file.write(" ")
                        file.write('0.000000')
                        file.write(" ")
            file.write('\n')
    file.close()
    if os.path.getsize(os.path.join(output_txt_path, txt_name + '.txt')) == 0:
        # nothing was written: remove the empty txt file
        os.remove(os.path.join(output_txt_path, txt_name + '.txt'))
    else:
        # copy the image into the new dataset folder
        shutil.copy(image_path, output_image_path)
json_keypoint_file = 'person_keypoints_val2017.json'  # input COCO keypoint json
data = json.load(open(json_keypoint_file, 'r'))  # load the json file
image_dir = 'val2017/'  # COCO validation image directory
# images in the directory
image_files = [f for f in os.listdir(image_dir) if f.endswith(('.jpg', '.png', '.jpeg'))]
# output folder for the corresponding images
output_image_path = "person_keypoint_image_dataset"
os.makedirs(output_image_path, exist_ok=True)
# walk the images
for img_file in image_files:
    # walk the image entries in the json
    for i, img_info in enumerate(data['images']):
        # image file name matches 'file_name' in the json
        if img_info['file_name'] == img_file:
            # image height and width
            height, width = img_info['height'], img_info['width']
            # ordered keypoint name list
            keypoint_list = data['categories'][0]['keypoints']
            # unique image id
            img_ID = img_info['id']
            print(img_file)
            # write the txt label file
            create_keypoint_txt(img_file, img_info, img_ID, data['annotations'], output_image_path)
            # once written, break and move on to the next image
            break
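The coordinate conversion at the heart of the txt export can be checked by hand. For a box running from (50, 100) to (150, 300) in a 640x480 image, the normalized center is ((50+150)/2/640, (100+300)/2/480) and the normalized size is (100/640, 200/480):

```python
# Same normalization as coordinates2yolo in the program above:
# center coordinates and box size, each divided by the image size.
def coordinates2yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    x = abs(xmin + xmax) / (2.0 * img_w)
    y = abs(ymin + ymax) / (2.0 * img_h)
    w = abs(xmax - xmin) / (1.0 * img_w)
    h = abs(ymax - ymin) / (1.0 * img_h)
    return x, y, w, h

# box from (50, 100) to (150, 300) in a 640x480 image
x, y, w, h = coordinates2yolo(50, 100, 150, 300, 640, 480)
print(round(x, 6), round(y, 6), round(w, 6), round(h, 6))
# 0.15625 0.416667 0.15625 0.416667
```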
In the generated txt label file, each keypoint consists of three values: the first two are normalized coordinates, and the last is the visibility flag (2 = visible, 1 = occluded, 0 = not labeled). Below is the txt file generated for image 000000000036.jpg; the first five values are the object class (0) and the person's box coordinates.
0 0.671279 0.617945 0.645759 0.726859 0.519751 0.38125 2.000000 0.550936 0.348438 2.000000 0.488565 0.367188 2.000000 0.642412 0.354688 2.000000 0.488565 0.395312 2.000000 0.738046 0.526562 2.000000 0.446985 0.534375 2.000000 0.846154 0.771875 2.000000 0.442827 0.8125 2.000000 0.925156 0.964062 2.000000 0.507277 0.698438 2.000000 0.702703 0.942188 2.000000 0.555094 0.95 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
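As a sanity check, such a line can be parsed back into its box and keypoint triples. The small parser below is an illustrative sketch, not part of the original programs (a shortened version of the line above is used for brevity):

```python
# Parse one YOLO-pose style line: class cx cy w h, then (x, y, vis) triples.
line = ("0 0.671279 0.617945 0.645759 0.726859 "
        "0.519751 0.38125 2.000000 0.550936 0.348438 2.000000")

values = [float(v) for v in line.split()]
cls = int(values[0])          # object class (0 = person)
cx, cy, w, h = values[1:5]    # normalized box center and size
# the remaining values come in (x, y, visibility) triples
keypoints = [tuple(values[i:i + 3]) for i in range(5, len(values), 3)]
print(cls, len(keypoints))  # 0 2
```

A full line from the file yields 17 triples, one per COCO body keypoint.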
Filtering a Pedestrian Detection Dataset from COCO
The program below filters a pedestrian dataset from COCO and automatically converts it to xml and txt files. Before running it, adjust the input person_keypoints_val2017.json path, the val2017 image directory, the output xml label folder, and the copied-image folder; the two output folders can be named freely.
import json
import os
import shutil
import numpy as np
from xml.etree import ElementTree as ET

def create_tree(image_path, imgdir, h, w):
    image_name = os.path.basename(image_path)
    global annotation
    annotation = ET.Element('annotation')
    folder = ET.SubElement(annotation, 'folder')
    folder.text = imgdir
    filename = ET.SubElement(annotation, 'filename')
    filename.text = image_name
    path = ET.SubElement(annotation, 'path')
    path.text = '{}'.format(image_path)
    source = ET.SubElement(annotation, 'source')
    database = ET.SubElement(source, 'database')
    database.text = 'Unknown'
    size = ET.SubElement(annotation, 'size')
    width = ET.SubElement(size, 'width')
    width.text = str(w)
    height = ET.SubElement(size, 'height')
    height.text = str(h)
    depth = ET.SubElement(size, 'depth')
    depth.text = '3'
    segmented = ET.SubElement(annotation, 'segmented')
    segmented.text = '0'
    return annotation

def create_object(root, xi, yi, xa, ya, obj_name):
    _object = ET.SubElement(root, 'object')
    name = ET.SubElement(_object, 'name')
    name.text = str(obj_name)
    pose = ET.SubElement(_object, 'pose')
    pose.text = 'Unspecified'
    truncated = ET.SubElement(_object, 'truncated')
    truncated.text = '0'
    difficult = ET.SubElement(_object, 'difficult')
    difficult.text = '0'
    bndbox = ET.SubElement(_object, 'bndbox')
    xmin = ET.SubElement(bndbox, 'xmin')
    xmin.text = '%s' % xi
    ymin = ET.SubElement(bndbox, 'ymin')
    ymin.text = '%s' % yi
    xmax = ET.SubElement(bndbox, 'xmax')
    xmax.text = '%s' % xa
    ymax = ET.SubElement(bndbox, 'ymax')
    ymax.text = '%s' % ya

def create_xml_file(img_name, object_name, img_ID, data_annotations, output_xml_path, output_image_path):
    # check whether the image contains any person
    exist_person = False
    for ann in data_annotations:
        if ann['image_id'] == img_ID:
            exist_person = True
            # height and width come from the globals set in the main loop
            annotation = create_tree(os.path.join(image_dir, img_name), os.path.dirname(image_dir), height, width)
            break
    # walk the annotations
    for ann in data_annotations:
        # matching image id
        if ann['image_id'] == img_ID:
            box = ann['bbox']
            # [x0, y0, x1, y1]
            boxes = [int(box[0]), int(box[1]), int(box[0] + box[2]), int(box[1] + box[3])]
            create_object(annotation, boxes[0], boxes[1], boxes[2], boxes[3], object_name)
    if exist_person:
        # write the element tree to an xml file
        tree = ET.ElementTree(annotation)
        # use splitext rather than strip('.jpg'), which would also eat
        # trailing 'j'/'p'/'g' characters from the file name
        tree.write(os.path.join(output_xml_path, '%s.xml' % os.path.splitext(img_name)[0]))
        shutil.copy(os.path.join(image_dir, img_name), output_image_path)

# Normalize box coordinates to YOLO format: center point plus width and height
def coordinates2yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    x = abs(xmin + xmax) / (2.0 * img_w)
    y = abs(ymin + ymax) / (2.0 * img_h)
    w = abs(xmax - xmin) / (1.0 * img_w)
    h = abs(ymax - ymin) / (1.0 * img_h)
    return x, y, w, h

def create_object_txt(img_name, img_info, img_ID, data_annotations, output_image_path):
    # output folder for the txt label files
    output_txt_path = "object_detection_txt"
    os.makedirs(output_txt_path, exist_ok=True)
    image_path = os.path.join(image_dir, img_name)
    txt_name = os.path.splitext(os.path.basename(img_name))[0]
    # check whether the image contains any person
    exist_person = False
    for ann in data_annotations:
        if ann['image_id'] == img_ID:
            exist_person = True
            break
    if exist_person:
        file = open(os.path.join(output_txt_path, txt_name + '.txt'), mode='w')
        # copy the image into the new dataset folder
        shutil.copy(image_path, output_image_path)
        # walk the annotations
        for ann in data_annotations:
            if ann['image_id'] == img_ID:
                box = ann['bbox']  # person box
                file.write(str(0))  # class id 0
                file.write(" ")
                xmin, ymin, xmax, ymax = box[0], box[1], box[0] + box[2], box[1] + box[3]
                x, y, w, h = coordinates2yolo(xmin, ymin, xmax, ymax, img_info['width'], img_info['height'])
                file.write(str(round(x, 6)))
                file.write(" ")
                file.write(str(round(y, 6)))
                file.write(" ")
                file.write(str(round(w, 6)))
                file.write(" ")
                file.write(str(round(h, 6)))
                file.write(" ")
                file.write('\n')
        file.close()
json_keypoint_file = 'person_keypoints_val2017.json'  # input COCO keypoint json
data = json.load(open(json_keypoint_file, 'r'))  # load the json file
image_dir = 'val2017/'  # COCO validation image directory
# images in the directory
image_files = [f for f in os.listdir(image_dir) if f.endswith(('.jpg', '.png', '.jpeg'))]
# output folder for the xml label files
output_xml_path = "person_detection_xml"
os.makedirs(output_xml_path, exist_ok=True)
# output folder for the corresponding images
output_image_path = "person_object_image_dataset"
os.makedirs(output_image_path, exist_ok=True)
# walk the images
for img_file in image_files:
    for i, img_info in enumerate(data['images']):
        if img_info['file_name'] == img_file:
            # image height and width (also read inside create_xml_file)
            height, width = img_info['height'], img_info['width']
            object_name = data['categories'][0]['name']
            # unique image id
            img_ID = img_info['id']
            print(img_file)
            # write the xml label file
            create_xml_file(img_file, object_name, img_ID, data['annotations'], output_xml_path, output_image_path)
            # write the txt label file
            create_object_txt(img_file, img_info, img_ID, data['annotations'], output_image_path)
            # once written, break and continue with the next image
            break
Filtering the COCO training and validation sets separately yields 64,115 training images (taking 4 to 5 hours) and 2,693 validation images. The generated xml files can be visualized with labelImg.exe; official download link: https://github.com/tzutalin/labelImg/files/2638199/windows_v1.8.1.zip. A share link:
- labelImg.exe
- Link: https://wwte.lanzouu.com/icmLk2u46cmh
- Password: he9i
After downloading, copy the file to a path containing only English characters, otherwise labelImg.exe will not run properly.
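To double-check the generated Pascal VOC xml labels without opening labelImg, the boxes can be read back with ElementTree. The snippet below parses a tiny in-memory xml that mirrors the fields written by create_tree and create_object above (the file name and coordinates are made-up examples):

```python
import xml.etree.ElementTree as ET

# A minimal VOC-style annotation, shaped like the output of the program above
xml_text = """<annotation>
  <filename>000000000036.jpg</filename>
  <size><width>481</width><height>640</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox><xmin>167</xmin><ymin>157</ymin><xmax>478</xmax><ymax>622</ymax></bndbox>
  </object>
</annotation>"""

root = ET.fromstring(xml_text)
boxes = []
for obj in root.iter('object'):
    b = obj.find('bndbox')
    boxes.append((obj.find('name').text,
                  int(b.find('xmin').text), int(b.find('ymin').text),
                  int(b.find('xmax').text), int(b.find('ymax').text)))
print(boxes)  # [('person', 167, 157, 478, 622)]
```

The same loop, with ET.parse(path) instead of ET.fromstring, works on the xml files written to the output folder.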
