Computing a confusion matrix in Python with NumPy

mynaiskey

Published 2022-04-18 19:13:37

In a confusion matrix, each row represents the true class and each column represents the predicted class.

Suppose a model predicts three classes, A, B and C, and running it on the test set gives the following results:

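(Table: test-set confusion matrix; rows = true class, columns = predicted class)

              Predicted A   Predicted B   Predicted C
    True A         30             7             3
    True B         15            22             3
    True C          5             1            14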

We read the table column by column, starting with the first column: 30, 15, 5.

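The test set here contains exactly three classes, A, B and C. The true classes A, B and C therefore cover the entire test set, and the prediction for any sample can only be one of these three.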

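A sample the model predicts as A is not necessarily A in reality, but its true class must be one of A, B or C. Among the samples predicted as A, 30 are truly A, 15 are truly B and 5 are truly C; only the 30 on the diagonal are correct predictions.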

The table above can be expressed in NumPy as follows:

import numpy as np
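# Confusion matrix: rows = true class (A, B, C), columns = predicted class (A, B, C)
c_matrix = np.array([[30,  7,  3],
                     [15, 22,  3],
                     [ 5,  1, 14]])
print(c_matrix.shape)  # (3, 3)
print(c_matrix[0][1])  # 7: true class A, predicted as B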
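The column-by-column reading above can be reproduced with the same array. A minimal sketch: summing over axis 0 counts how many samples were predicted as each class, the diagonal holds the correct predictions, and dividing one by the other gives the share of each class's predictions that were right (commonly called precision). Because every test sample lands in exactly one cell, the grand total is the test-set size, and the diagonal sum over that total is the overall accuracy.

```python
import numpy as np

# Rows = true class (A, B, C), columns = predicted class (A, B, C)
c_matrix = np.array([[30, 7, 3],
                     [15, 22, 3],
                     [5, 1, 14]])

# Column 0: true-class breakdown of everything predicted as A
print(c_matrix[:, 0])                   # [30 15  5]

# Number of samples predicted as each class (sum down each column)
predicted_totals = c_matrix.sum(axis=0)
print(predicted_totals)                 # [50 30 20]

# Diagonal: samples whose predicted class equals the true class
correct = np.diag(c_matrix)             # 30, 22, 14
precision = correct / predicted_totals  # approximately 0.6, 0.733, 0.7

# Every test sample is counted exactly once, so the total is the test-set size
total = c_matrix.sum()                  # 100
accuracy = correct.sum() / total        # 66 / 100 = 0.66
print(precision, accuracy)
```

Reading along the rows instead (dividing by `c_matrix.sum(axis=1)`) gives the per-true-class view used in the normalization that follows.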

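The confusion matrix can also be normalized so that its values lie between 0 and 1. Dividing each row by its sum turns entry (i, j) into the fraction of true-class-i samples that were predicted as class j, so every row sums to 1: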

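# Normalize the confusion matrix (row-wise L1 normalization: each row is divided by its sum)
print(c_matrix.sum(axis=1))                  # row sums, shape (3,)
print(c_matrix.sum(axis=1)[:, np.newaxis])   # reshaped to a (3, 1) column so it broadcasts across each row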
c_matrix = c_matrix / c_matrix.sum(axis=1)[:, np.newaxis]
print(c_matrix)

[40 40 20]
[[40]
 [40]
 [20]]
[[0.75  0.175 0.075]
 [0.375 0.55  0.075]
 [0.25  0.05  0.7  ]]
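One caveat with the row-wise division: if a class never appears as a true label in the test set, its row sums to 0 and the division yields `nan` (with a runtime warning). A minimal sketch of a guarded version, assuming such rows should simply stay at 0; the helper name `normalize_rows` is hypothetical. After normalization the diagonal (0.75, 0.55, 0.7 in the output above) is the fraction of each true class that was classified correctly, often called per-class recall.

```python
import numpy as np

def normalize_rows(c_matrix):
    """Divide each row by its sum; rows summing to 0 stay all-zero (hypothetical helper)."""
    c_matrix = np.asarray(c_matrix, dtype=float)
    row_sums = c_matrix.sum(axis=1, keepdims=True)  # shape (n, 1), broadcasts over columns
    out = np.zeros_like(c_matrix)
    # Divide only where the row sum is non-zero; other entries keep the 0 already in `out`
    np.divide(c_matrix, row_sums, out=out, where=row_sums != 0)
    return out

# Class C never appears as a true label here, so its row is all zeros
c_matrix = np.array([[30, 7, 3],
                     [15, 22, 3],
                     [0, 0, 0]])
print(normalize_rows(c_matrix))
```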

Accumulating the matrix while evaluating the model on the test set:

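import numpy as np
import tensorflow as tf

# Assumed to be defined earlier: class_names (list of class names), test_ds (a tf.data.Dataset
# yielding (images, integer labels) batches), model (a trained Keras model) and total_batch
c_matrix = np.zeros((len(class_names), len(class_names)))  # confusion matrix
for images, labels in test_ds.take(total_batch):
    labels = labels.numpy()
    predictions = model.predict(images)
    score = tf.nn.softmax(predictions)  # turns logits into probabilities; does not change the argmax
    for index, elem in enumerate(score):
        col, row = np.argmax(elem), labels[index]  # column = predicted class, row = true class
        c_matrix[row][col] += 1
# Row-wise normalization, as above
c_matrix = c_matrix / c_matrix.sum(axis=1)[:, np.newaxis]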
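Once all true labels and predicted labels have been collected into integer arrays (for example by concatenating `labels` and `np.argmax(predictions, axis=1)` over the batches), the per-sample loop above can be replaced by a vectorized NumPy computation. A minimal sketch, assuming labels are integers in `range(num_classes)`; `confusion_matrix_np` is a hypothetical helper name. Each (true, predicted) pair is flattened into one index `true * num_classes + pred`, counted with `np.bincount`, and reshaped into the matrix:

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, num_classes):
    """Count (true, predicted) pairs; rows = true class, columns = predicted class."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    # Flatten each (row, col) pair into a single index so one bincount does all the counting
    flat_index = y_true * num_classes + y_pred
    counts = np.bincount(flat_index, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 1, 1, 0, 2, 1]
print(confusion_matrix_np(y_true, y_pred, num_classes=3))
# [[2 0 0]
#  [0 1 1]
#  [0 2 0]]
```

With these inputs the counts match those of `tf.math.confusion_matrix` for the same labels and predictions. `np.bincount` requires non-negative integers, so labels outside `range(num_classes)` would silently land in the wrong cell or raise an error.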

TensorFlow also provides a function that computes the confusion matrix directly, tf.math.confusion_matrix:

import tensorflow as tf

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 1, 1, 0, 2, 1]

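# Rows are true labels, columns are predicted labels (same convention as above)
c_matrix = tf.math.confusion_matrix(labels=y_true, predictions=y_pred, num_classes=3)
print(c_matrix)  # int32 tf.Tensor with values [[2 0 0], [0 1 1], [0 2 0]]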
