Graph_convolution_Deep learning basics_Implementing an MLP, a CNN, and an autoencoder on the MNIST dataset

Deep learning basics: using MNIST data to implement a feedforward neural network, a convolutional neural network, and an autoencoder

斯外戈的小白 · Published 2022-10-14 10:27:18

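Note: these notes come from an online course and are recorded only to make later review easier; they are not plagiarized. The code for this lesson is kept in Google Colaboratory under the name graph_convolution_lesson20.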

Table of contents

1. Loading the data
2. Feedforward neural network
3. Convolutional neural network (CNN)
4. Autoencoder

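1. Loading the data

We use the MNIST dataset to show how data is loaded in PyTorch. MNIST contains handwritten images of 10 different digits, 0 through 9, and the label of each image is the digit it shows. Our task on this dataset is to predict which digit an image corresponds to, so this is a classification problem.

torchvision bundles many datasets, so we can call this package directly to download the data we need.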

import matplotlib.pyplot as plt
from torchvision import datasets, transforms
import torch.nn as nn
import torch
from torch.nn import functional as F


# Download the MNIST training set and save it to the data folder
data_train = datasets.MNIST('data', train=True, download=True, transform=transforms.ToTensor())
# Load the MNIST test set
data_test = datasets.MNIST('data', train=False, download=True, transform=transforms.ToTensor())

# Take the first image in MNIST
imag, label = data_train[0]
print(imag.size())
# torch.Size([1, 28, 28])

# Visualize 10 samples
for i in range(10):
  imag, label = data_train[i]
  imag = imag.numpy().squeeze() # reshape the data from [1, 28, 28] to [28, 28]
  plt.subplot(1, 10, i+1)
  plt.imshow(imag)

2. Feedforward neural network

2.1. Building the neural network

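2.1.1 Building neural network layers with matrix operations

Because the computations in a neural network are really operations on matrices, we can express the network directly with matrix operations. For example, given an input x, the operation in a linear layer can be written as x' = Wx + b, where W and b are the parameters of that layer. (In the code below each sample is stored as a row, so the same operation is computed as xW + b with W of shape in_features by out_features.) Following the principles of object-oriented programming (OOP), we wrap the functionality we need in classes: first a linear layer, and then a two-layer feedforward network built from it. The design looks like this: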

class Linear(nn.Module): # inherit from torch.nn.Module
    def __init__(self, in_features, out_features):
        super(Linear, self).__init__() # super() calls the parent class; this line is equivalent to nn.Module.__init__(self)
        self.weight = nn.Parameter(torch.randn(in_features, out_features))
        self.bias = nn.Parameter(torch.randn(out_features))

    def forward(self, x): # x is the model input, shape (batch, in_features)
        x = x.mm(self.weight)
        return x + self.bias

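Some notes:
1. The Linear layer must inherit from nn.Module (torch.nn.Module). After inheriting, the Linear class can use the methods already defined in nn.Module, such as eval() and to(). It must also call the parent class's initializer, i.e. super(Linear, self).__init__().
2. We define two kinds of parameters here, self.weight and self.bias.
3. When defining each parameter we use torch.randn(), which amounts to initializing the parameters.
4. The forward function describes the forward pass of the network and produces the output. Because the class inherits from nn.Module, once we instantiate it, e.g. layer = Linear(…), calling layer(x) invokes layer.forward(x).
5. Note that we do not use a Softmax layer here, because the loss function we use later is torch.nn.CrossEntropyLoss, which already includes the Softmax operation.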
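
Note 5 can be checked numerically. The following is a minimal sketch, assuming made-up logits and labels (`logits` and `targets` are hypothetical values, not taken from the course code): it compares nn.CrossEntropyLoss against an explicit log-softmax followed by the negative log-likelihood loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # hypothetical raw scores: 4 samples, 10 classes
targets = torch.tensor([5, 0, 4, 1])  # hypothetical class labels

# Built-in loss applied directly to the raw scores
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# The same quantity computed in two explicit steps
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

same = torch.allclose(loss_ce, loss_manual)
print(loss_ce.item(), loss_manual.item(), same)
```

If the two values agree, adding an extra Softmax layer before this loss would apply the normalization twice, which is why the model returns raw scores.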

class MLP(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super(MLP, self).__init__()
        self.in_features = in_features
        self.layer1 = Linear(in_features, hidden_features)  # Linear() here is the custom Linear class defined above
        self.layer2 = Linear(hidden_features, out_features)

    def forward(self, x):
        x = x.view(-1, self.in_features) # flatten each image into a vector
        x = self.layer1(x)
        x = F.relu(x) # nonlinear activation layer, the ReLU function
        return self.layer2(x)

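Some notes:
1. As before, the MLP class must inherit from nn.Module.
2. We need a nonlinear activation between layers; here we use ReLU.
3. Not every MLP needs the step x = x.view(-1, self.in_features). It is here only because our input data are two-dimensional images, which must be turned into vectors.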
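
The effect of the reshaping step in note 3 can be seen on dummy tensors. This is a small sketch, assuming zero-filled tensors that only mimic the shape of MNIST images (`batch` and `single` are hypothetical names):

```python
import torch

in_features = 28 * 28

# A hypothetical batch shaped like 32 MNIST images: (batch, channel, height, width)
batch = torch.zeros(32, 1, 28, 28)
flat = batch.view(-1, in_features)
print(flat.shape)         # torch.Size([32, 784])

# A single image of shape (1, 28, 28) becomes a batch of one
single = torch.zeros(1, 28, 28)
flat_single = single.view(-1, in_features)
print(flat_single.shape)  # torch.Size([1, 784])
```

The -1 lets PyTorch infer the batch dimension, so the same forward function works for one image or for a whole batch.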

2.1.2 Using PyTorch's predefined neural network layers

class MLP(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super(MLP, self).__init__()
        self.in_features = in_features
        self.layer1 = nn.Linear(in_features, hidden_features) # nn.Linear here is the linear layer defined in PyTorch
        self.layer2 = nn.Linear(hidden_features, out_features)

    def forward(self, x):
        x = x.view(-1, self.in_features) # flatten each image into a vector
        x = self.layer1(x)
        x = F.relu(x) # nonlinear activation layer, the ReLU function
        return self.layer2(x)

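The only difference between this MLP and the hand-written one above is that self.layer1 and self.layer2 call nn.Linear, which PyTorch already provides. torch.nn defines many other layers as well, such as convolutional layers and pooling layers. See the torch.nn documentation for details.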
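
As a rough illustration of what nn.Linear holds, the sketch below builds two layers with the sizes used later in these notes (784 inputs, 256 hidden units, 10 classes) and counts their parameters. One difference from the hand-written layer: nn.Linear stores its weight with shape (out_features, in_features).

```python
import torch.nn as nn

layer1 = nn.Linear(784, 256)
layer2 = nn.Linear(256, 10)

# nn.Linear keeps the weight as (out_features, in_features)
print(layer1.weight.shape)  # torch.Size([256, 784])

# Total trainable parameters: 784*256 + 256 + 256*10 + 10
n_params = sum(p.numel() for layer in (layer1, layer2) for p in layer.parameters())
print(n_params)             # 203530
```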

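2.1.3 The forward pass of the network (forward propagation)

# We keep using the first image to demonstrate the forward pass of the network
# First we instantiate an MLP
img, label = data_train[0]
img = img.view(-1)  # view(-1) flattens the tensor into a 1-D vector: (1,28,28) --> (784,)
feat_dim = len(img) # dimension of the input features (number of pixels in the image)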
num_classes = 10
model = MLP(in_features=feat_dim, hidden_features=256, out_features=num_classes)
print(model)

# Define a device variable that decides whether the data and model run on the GPU
# If CUDA is not available, use the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)
#'cuda'

Move the model and the data to device (CPU or GPU):

model = model.to(device) 
img = img.to(device)

output = model(img)
print(output)

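The model outputs 10 values, one per class (the tensor has shape (1, 10) because forward reshapes the input into a batch of one). Each value can be read as the score of the corresponding class; these are unnormalized scores (logits), not yet probabilities.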
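
To turn such scores into normalized probabilities, one option is to apply softmax over the class dimension. A minimal sketch, assuming a made-up three-class score vector (`logits` here is hypothetical, not the model output above):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])  # hypothetical scores: 1 sample, 3 classes
probs = F.softmax(logits, dim=1)          # normalize across the class dimension

print(probs.sum().item())                 # approximately 1.0
# Softmax preserves the ordering, so argmax gives the same class either way
print(logits.argmax(dim=1).item(), probs.argmax(dim=1).item())
```

Because the ordering is preserved, taking argmax of the raw output is enough for prediction.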

# Take the class with the largest value and compare it with the true class
predicted = output.argmax()
print('Predicted label:', predicted.item(), '; true label:', label)
# Predicted label: 1 ; true label: 5

2.2 Training the neural network

# We define a train function to encapsulate the training procedure
def train(model, data, num_epochs=5, learning_rate=1e-3, batch_size=32):
    # Define an optimizer; Adam is a variant of gradient descent
    optimizer = torch.optim.Adam(model.parameters(),
                                 lr=learning_rate,
                                 weight_decay=1e-5) # weight_decay is the L2 regularization term

    # Wrap the training data in a DataLoader, which makes batching and shuffling easy
    train_loader = torch.utils.data.DataLoader(data,
                                               batch_size=batch_size,
                                               shuffle=True)
    # Define the loss function
    criterion = nn.CrossEntropyLoss()

    for epoch in range(num_epochs):
        loss_total = 0 # accumulates the loss over all batches of this epoch
        for img, label in train_loader:
            # Zero the gradients
            optimizer.zero_grad()

            img = img.to(device)
            label = label.to(device)

            # Forward and backward passes
            output = model(img)
            loss = criterion(output, label)
            loss.backward()

            # Update the parameters
            optimizer.step()

            # Accumulate inside the batch loop so that every batch is counted
            loss_total += loss.item()
        # Report the mean per-batch loss for this epoch
        print('Epoch: {}, Training Loss: {:.4f}'.format(epoch+1, loss_total / len(train_loader)))

train(model, data_train)

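I used to hit a frequent error saying that two tensors were on different devices and should both be on cpu or both on cuda. Today I found the cause: both the model and the data need .to(device).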
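
A quick way to check for this kind of mismatch is to compare the device of the model's parameters with the device of the input. The sketch below uses a small stand-in layer and random input (both hypothetical) rather than the MLP above. One detail worth keeping in mind: .to() on a tensor returns a new tensor, so the result has to be assigned back.

```python
import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = nn.Linear(4, 2).to(device)  # hypothetical stand-in model
x = torch.randn(3, 4)
x = x.to(device)                    # Tensor.to returns a new tensor; reassign it

# Parameters and input must live on the same device before the forward pass
param_device = next(model.parameters()).device
print(param_device, x.device)

out = model(x)
print(out.shape)                    # torch.Size([3, 2])
```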

Test function

@torch.no_grad() # no gradients are needed at test time, so autograd is disabled to save time and memory
def test(model, data, batch_size=128):
    num_correct = 0 # number of correctly predicted images
    num_total = 0 # total number of images

    test_loader = torch.utils.data.DataLoader(data,
                                              batch_size=batch_size,
                                              shuffle=False)
    for img, label in test_loader: # iterate over the data batch by batch
        img = img.to(device)
        label = label.to(device)
        output = model(img)
        predicted = output.argmax(1)
        num_total += len(label)
        num_correct += (predicted == label).sum().item()

    print('{} images in total, accuracy: {:.2f}%'.format(num_total, 100 * num_correct / num_total))

test(model, data_train)
test(model, data_test)

3. Convolutional neural network (CNN)

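A CNN can be split into two blocks: a feature extractor and a classifier. The feature extractor consists of several convolutional layers, while the classifier consists of fully connected layers, i.e. a feedforward neural network. We can therefore define the CNN in these two parts.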

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.features = nn.Sequential( 
            nn.Conv2d(1, 16, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, 7)
        )
        self.classifier = nn.Sequential(
            nn.Linear(64, 120),
            nn.ReLU(),
            nn.Linear(120, 10)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(-1, 64) # flatten into a vector
        x = self.classifier(x)
        return x

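When using a CNN, pay attention to how the tensor shape changes. Here a 1x28x28 input becomes 16x14x14, then 32x7x7, and finally 64x1x1, which is why forward flattens the result to 64 features.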

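Notes:
1. We no longer define the convolutional layer by hand and call nn.Conv2d directly. In nn.Conv2d(1, 16, 3, stride=2, padding=1), the 1 means the input has 1 channel, the 16 means the layer outputs 16 channels, the 3 is the kernel size, stride=2 means the kernel moves in steps of 2, and padding=1 is the amount of padding on each side.

2. In defining self.features we use nn.Sequential(), a container that chains several layers (and activation functions) together. You can use nn.Sequential() to assemble whatever network you want to build.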
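
The shape changes in this CNN follow the standard convolution output-size formula, floor((n + 2*padding - kernel) / stride) + 1. The helper below is a small sketch of that formula (`conv_out_size` is a hypothetical name, not a PyTorch function), applied to the three convolutional layers in self.features:

```python
def conv_out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

s1 = conv_out_size(28, kernel=3, stride=2, padding=1)  # first Conv2d
s2 = conv_out_size(s1, kernel=3, stride=2, padding=1)  # second Conv2d
s3 = conv_out_size(s2, kernel=7)                       # third Conv2d, stride 1, no padding

print(s1, s2, s3)  # 14 7 1
```

The final 1x1 spatial size with 64 channels is what makes x.view(-1, 64) valid in forward.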

model = CNN().to(device)

train(model, data_train) # call the train function defined earlier

test(model, data_train)
test(model, data_test)

4. Autoencoder

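An autoencoder can be seen as a neural network that tries to reconstruct its input at its output. It has two parts: an encoder and a decoder. The encoder encodes the input, and the decoder reconstructs the input from the encoded information. As an example, we build the following autoencoder out of convolutional layers.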

class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        # We reuse the feature extractor from the earlier CNN as the encoder (the last layer here outputs 32 channels instead of 64)
        self.encoder = nn.Sequential( 
            nn.Conv2d(1, 16, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 32, 7)
        )
        
        # For the decoder we need nn.ConvTranspose2d
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 32, 7),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

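The encoder is built from convolutional layers, and the decoder is built entirely from transposed convolution (deconvolution) layers.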
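
To see how the decoder grows a 1x1 code back to a 28x28 image, the sketch below rebuilds the same decoder layers on their own and records the shape after each transposed convolution. The input is a zero tensor standing in for an encoded batch of two images (a hypothetical placeholder, not real encoder output).

```python
import torch
import torch.nn as nn

decoder = nn.Sequential(
    nn.ConvTranspose2d(32, 32, 7),
    nn.ReLU(),
    nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),
    nn.ReLU(),
    nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),
    nn.Sigmoid(),
)

x = torch.zeros(2, 32, 1, 1)  # placeholder code: batch of 2, 32 channels, 1x1
shapes = []
for layer in decoder:
    x = layer(x)
    if isinstance(layer, nn.ConvTranspose2d):
        shapes.append(tuple(x.shape))

print(shapes)  # [(2, 32, 7, 7), (2, 16, 14, 14), (2, 1, 28, 28)]
```

Each size follows (n - 1) * stride - 2 * padding + kernel + output_padding, so output_padding=1 is what makes the stride-2 layers land exactly on 14 and 28. The final Sigmoid keeps outputs in [0, 1], matching the pixel range produced by ToTensor.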

autoencoder = Autoencoder().to(device) # device is either 'cpu' or 'cuda'

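# Largely the same as the earlier train() function; only the loss function and its inputs differ
def train_autoencoder(model, data, num_epochs=10, learning_rate=1e-3, batch_size=32):
    # Define an optimizer; Adam is a variant of gradient descent
    optimizer = torch.optim.Adam(model.parameters(),
                                 lr=learning_rate,
                                 weight_decay=1e-5) # weight_decay is the L2 regularization term

    # Wrap the training data in a DataLoader, which makes batching and shuffling easy
    train_loader = torch.utils.data.DataLoader(data,
                                               batch_size=batch_size,
                                               shuffle=True)
    # Define the loss function; MSE loss measures the difference between output and input
    criterion = nn.MSELoss()

    for epoch in range(num_epochs):
        loss_total = 0 # accumulates the loss over all batches of this epoch
        for img, _ in train_loader: # labels are not needed for reconstruction
            # Zero the gradients
            optimizer.zero_grad()

            img = img.to(device)

            # Forward and backward passes
            output = model(img)
            loss = criterion(output, img)
            loss.backward()

            # Update the parameters
            optimizer.step()

            # Accumulate inside the batch loop so that every batch is counted
            loss_total += loss.item()
        # Report the mean per-batch loss for this epoch
        print('Epoch: {}, Training Loss: {:.4f}'.format(epoch+1, loss_total / len(train_loader)))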

train_autoencoder(autoencoder, data_train)

# Inspect the quality of the reconstructed images (top row: reconstructions, bottom row: originals)
test_loader = torch.utils.data.DataLoader(data_test, 
                                          batch_size=8, 
                                          shuffle=False)


for i, data in enumerate(test_loader):
    img, _ = data
    autoencoder = autoencoder.to('cpu')
    img_new = autoencoder(img).detach().numpy()
    img = img.numpy()
    plt.figure(figsize=(8, 2))
    for j in range(8):
        plt.subplot(2, 8, j+1)
        plt.imshow(img_new[j].squeeze())
        plt.subplot(2, 8, 8+j+1)
        plt.imshow(img[j].squeeze())
    if i >= 2:
        break