Classic Deep Learning Models: VGGNet

铭瑾熙

Published 2024-11-08 02:00:36


1 VGGNet

1.1 Model Overview

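VGGNet is a deep convolutional network architecture proposed by the Visual Geometry Group (VGG) at the University of Oxford, and its name comes from the group's abbreviation. With a top-5 error rate of 7.32% it took second place in the ILSVRC 2014 classification task (first place went to GoogLeNet with 6.65%), and with an error rate of 25.32% it won the localization task (GoogLeNet: 26.44%) $^{[5]}$. VGGNet was among the first models to push the image classification error rate below 10%, and its idea of building the network from $3\times3$ convolution kernels became the basis of many later models. The paper was published at the 2015 International Conference on Learning Representations (ICLR) and has since been cited more than 14,000 times.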

1.2 Model Architecture

Figure 1 VGG16 network architecture

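The original paper describes six configurations of VGGNet, corresponding to VGG11, VGG11-LRN, VGG13, VGG16-1, VGG16-3 and VGG19. The numeric suffix is the number of weight layers. VGG11-LRN is VGG11 with Local Response Normalization (LRN) applied after the first layer; VGG16-1 uses a $1\times1$ kernel for the last convolution in each of the last three convolutional blocks, whereas VGG16-3 uses a $3\times3$ kernel there. The VGG16 described in this section is VGG16-3. The VGG16 in Figure 1 captures the core idea of VGGNet: replacing large convolution kernels with stacks of $3\times3$ convolutions (two stacked $3\times3$ convolutions have the same receptive field as one $5\times5$ convolution). The network's parameters are listed in Table 2.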
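To make the six variants concrete, here is a minimal Python sketch that writes each configuration as a list of layers, following the variant descriptions above, and counts the weight layers (convolutional layers plus the three fully connected layers). The names `VGG_CONFIGS` and `weight_layers`, and the `"M"` / `"LRN"` / `"1x1:N"` markers, are hypothetical conventions for this illustration, not a library API.

```python
# Hypothetical encoding of the six VGG configurations described above.
# An integer N is a 3x3 conv with N output channels, "1x1:N" is a 1x1 conv
# with N output channels (VGG16-1), "M" is a 2x2/2 max pool and "LRN" is
# local response normalization. Pooling and LRN carry no weights.
VGG_CONFIGS = {
    "VGG11":     [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG11-LRN": [64, "LRN", "M", 128, "M", 256, 256, "M", 512, 512, "M",
                  512, 512, "M"],
    "VGG13":     [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M",
                  512, 512, "M"],
    "VGG16-1":   [64, 64, "M", 128, 128, "M", 256, 256, "1x1:256", "M",
                  512, 512, "1x1:512", "M", 512, 512, "1x1:512", "M"],
    "VGG16-3":   [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                  512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19":     [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
                  512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

NUM_FC_LAYERS = 3  # every variant ends with FC-4096, FC-4096, FC-1000


def weight_layers(cfg):
    """Count layers with learnable weights: all convs plus the 3 FC layers."""
    convs = sum(1 for item in cfg if item not in ("M", "LRN"))
    return convs + NUM_FC_LAYERS


for name, cfg in VGG_CONFIGS.items():
    print(name, weight_layers(cfg))
```

Running it prints 11, 11, 13, 16, 16 and 19 for the six variants, matching the numeric suffixes; each variant has exactly five pooling stages, so only the number of convolutions per block changes.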

Table 2 VGG16 layer configuration

Layer	Input size	Kernel size/stride	Output size	Parameters
Convolution $C_{11}$	$224\times224\times3$	$3\times3\times64/1$	$224\times224\times64$	$(3\times3\times3+1)\times64$
Convolution $C_{12}$	$224\times224\times64$	$3\times3\times64/1$	$224\times224\times64$	$(3\times3\times64+1)\times64$
Max pooling $S_{max1}$	$224\times224\times64$	$2\times2/2$	$112\times112\times64$	$0$
Convolution $C_{21}$	$112\times112\times64$	$3\times3\times128/1$	$112\times112\times128$	$(3\times3\times64+1)\times128$
Convolution $C_{22}$	$112\times112\times128$	$3\times3\times128/1$	$112\times112\times128$	$(3\times3\times128+1)\times128$
Max pooling $S_{max2}$	$112\times112\times128$	$2\times2/2$	$56\times56\times128$	$0$
Convolution $C_{31}$	$56\times56\times128$	$3\times3\times256/1$	$56\times56\times256$	$(3\times3\times128+1)\times256$
Convolution $C_{32}$	$56\times56\times256$	$3\times3\times256/1$	$56\times56\times256$	$(3\times3\times256+1)\times256$
Convolution $C_{33}$	$56\times56\times256$	$3\times3\times256/1$	$56\times56\times256$	$(3\times3\times256+1)\times256$
Max pooling $S_{max3}$	$56\times56\times256$	$2\times2/2$	$28\times28\times256$	$0$
Convolution $C_{41}$	$28\times28\times256$	$3\times3\times512/1$	$28\times28\times512$	$(3\times3\times256+1)\times512$
Convolution $C_{42}$	$28\times28\times512$	$3\times3\times512/1$	$28\times28\times512$	$(3\times3\times512+1)\times512$
Convolution $C_{43}$	$28\times28\times512$	$3\times3\times512/1$	$28\times28\times512$	$(3\times3\times512+1)\times512$
Max pooling $S_{max4}$	$28\times28\times512$	$2\times2/2$	$14\times14\times512$	$0$
Convolution $C_{51}$	$14\times14\times512$	$3\times3\times512/1$	$14\times14\times512$	$(3\times3\times512+1)\times512$
Convolution $C_{52}$	$14\times14\times512$	$3\times3\times512/1$	$14\times14\times512$	$(3\times3\times512+1)\times512$
Convolution $C_{53}$	$14\times14\times512$	$3\times3\times512/1$	$14\times14\times512$	$(3\times3\times512+1)\times512$
Max pooling $S_{max5}$	$14\times14\times512$	$2\times2/2$	$7\times7\times512$	$0$
Fully connected $FC_{1}$	$7\times7\times512$	$(7\times7\times512)\times4096$	$1\times4096$	$(7\times7\times512+1)\times4096$
Fully connected $FC_{2}$	$1\times4096$	$4096\times4096$	$1\times4096$	$(4096+1)\times4096$
Fully connected $FC_{3}$	$1\times4096$	$4096\times1000$	$1\times1000$	$(4096+1)\times1000$
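The arithmetic in Table 2 can be checked with a short Python sketch. It assumes $3\times3$ convolutions with stride 1 and padding 1 (so the spatial size is preserved, as the table's input and output sizes imply), $2\times2$ max pooling with stride 2, and one bias per output channel or neuron; `VGG16` and `vgg16_table` are illustrative names, not part of any library.

```python
# Minimal sketch recomputing Table 2 for VGG16 (VGG16-3).
# Assumptions: 224x224x3 input, 3x3 convs with stride 1 and padding 1,
# 2x2 max pooling with stride 2, one bias per filter / output neuron.
VGG16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_table(cfg=VGG16, size=224, channels=3, num_classes=1000):
    rows = []  # (layer kind, output shape, parameter count)
    for item in cfg:
        if item == "M":
            size //= 2  # pooling halves height and width, adds no parameters
            rows.append(("maxpool", (size, size, channels), 0))
        else:
            # 3x3xC_in weights plus one bias, for each of the `item` filters
            params = (3 * 3 * channels + 1) * item
            channels = item
            rows.append(("conv3x3", (size, size, channels), params))
    in_features = size * size * channels  # 7 * 7 * 512 after the last pool
    for out_features in (4096, 4096, num_classes):
        rows.append(("fc", (1, out_features), (in_features + 1) * out_features))
        in_features = out_features
    return rows


rows = vgg16_table()
for kind, shape, params in rows:
    print(f"{kind:8s} {str(shape):18s} {params:>12,d}")
print("total parameters:", sum(p for _, _, p in rows))
```

Summing the per-layer counts gives 138,357,544 parameters. Of these, 123,642,856 (about 89%) sit in the three fully connected layers and only 14,714,688 in the 13 convolutional layers, which is why $FC_{1}$ dominates the model's size.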

1.3 Model Characteristics

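The entire network uses a single convolution kernel size, $3\times3$, and a single max-pooling size, $2\times2$.
The $1\times1$ convolution (used in VGG16-1) acts mainly as a linear transformation: the numbers of input and output channels are equal, so it performs no dimensionality reduction, while the ReLU that follows it adds an extra nonlinearity.
Two stacked $3\times3$ convolutional layers have the same $5\times5$ receptive field as one $5\times5$ convolutional layer; likewise, three stacked $3\times3$ layers have the same receptive field as one $7\times7$ layer. This arrangement needs fewer parameters (with $C$ input and output channels, three $3\times3$ layers have $27C^2$ weights versus $49C^2$ for one $7\times7$ layer), and the activation functions between the layers make the network better at learning features.
A training trick used by VGGNet: first train the shallow VGG11, which can be trained from random initialization, then reuse its weights to initialize the deeper networks. In the original paper, the first four convolutional layers and the last three fully connected layers of the deeper networks are initialized from VGG11, while the intermediate layers are initialized randomly. This avoids training stalls caused by poor initialization and helps the deeper networks converge faster.
During training, the original images are augmented with multi-scale transformations (scale jittering), which makes the model less prone to overfitting.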
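The receptive-field and parameter claims above can be checked numerically. This minimal Python sketch assumes stride-1 convolutions, the same number of channels $C$ at the input and output of every layer, and ignores biases (the same simplification used in the parameter comparison above); the function names are hypothetical.

```python
# Minimal sketch: a stack of n 3x3 convolutions versus one k x k convolution
# with the same receptive field. Assumes C input and C output channels per
# layer and ignores biases.
def stacked_receptive_field(n_layers, kernel=3, stride=1):
    rf, jump = 1, 1  # receptive field, and product of the strides so far
    for _ in range(n_layers):
        rf += (kernel - 1) * jump  # each layer widens the field by (k-1)*jump
        jump *= stride
    return rf


def conv_weights(kernel, channels, n_layers=1):
    """Weights in n_layers conv layers of size kernel x kernel, C -> C."""
    return n_layers * kernel * kernel * channels * channels


C = 512
for n in (2, 3):
    rf = stacked_receptive_field(n)
    stacked = conv_weights(3, C, n)
    single = conv_weights(rf, C)
    print(f"{n} x 3x3 -> receptive field {rf}x{rf}: "
          f"{stacked:,} weights vs {single:,} for one {rf}x{rf} conv "
          f"({stacked / single:.0%})")
```

The `jump` variable tracks the cumulative stride, so the same function also gives the receptive field of a stack whose layers share a stride greater than 1.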
