90d09c621c2722e1445f00781b46e4b0.png

Segmentation is a hard computer vision task, mainly because semantic information is difficult to obtain. Recent advances in deep learning have driven breakthroughs in this area: semantic segmentation, instance segmentation, and the recently popular panoptic segmentation. Semantic segmentation predicts the semantic class of every pixel; instance segmentation predicts the pixel region belonging to each object instance; panoptic segmentation assigns both a class and an instance id to every pixel, producing a complete segmentation of the scene.
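The "class plus instance per pixel" output of panoptic segmentation can be illustrated with a toy encoding. The maps below and the `class_id * 1000 + instance_id` packing (the convention used by the COCO panoptic format) are illustrative assumptions, not taken from any specific paper:

```python
import numpy as np

# Toy per-pixel maps: semantic classes (0 = road, 1 = car) and
# instance ids (0 = "stuff", no instance).
semantic = np.array([[0, 0, 1],
                     [0, 1, 1]])
instance = np.array([[0, 0, 1],
                     [0, 2, 2]])

# Pack both into a single panoptic id per pixel, COCO-panoptic style:
# id = class_id * 1000 + instance_id.
panoptic = semantic * 1000 + instance
# e.g. pixel (0, 2) -> 1001: class 1 (car), instance 1
```

Decoding is the inverse: `panoptic // 1000` recovers the class, `panoptic % 1000` the instance.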

The 2017 survey "A Review on Deep Learning Techniques Applied to Semantic Segmentation" summarizes the methods in this area; the figure below is its overview of the field:

3ac8fc7042bac71e82dd94e0fbe1ece4.png

It is fair to say that deep learning work on semantic segmentation began with the FCN (fully convolutional network), the first truly pixel-to-pixel approach to image understanding. Instance segmentation is essentially a combination of object detection and semantic segmentation, and its methods largely fall into two directions: detection-based and segmentation-based. The former is exemplified by Mask R-CNN, the latter by SGN (Sequential Grouping Networks).

Panoptic segmentation was proposed recently; the figure below compares the three segmentation tasks. Panoptic segmentation amounts to scene understanding in the full sense, and is the complementary combination of instance and semantic segmentation.

04ccc8277caa7bfd87151f8a56efb8f3.png

DeeperLab, proposed in a joint MIT and Google paper, can be seen as a representative method of this kind. The flow of the algorithm is shown below: a single FCN produces per-pixel semantic and instance predictions, which a fast fusion algorithm then merges into the final panoptic result.
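The fusion step can be sketched in a simplified form: give each predicted instance the majority semantic vote of its pixels. The toy maps and the plain majority-vote rule below are a simplification of DeeperLab's actual (keypoint-based) fusion, used only to show the idea:

```python
import numpy as np

# Hypothetical FCN outputs: per-pixel semantic classes and
# class-agnostic instance ids.
semantic = np.array([[1, 1, 2],
                     [1, 1, 2]])
instance = np.array([[1, 1, 2],
                     [1, 2, 2]])

# Fast fusion: every instance adopts the majority semantic label
# among its own pixels, making class and instance maps consistent.
panoptic_class = np.zeros_like(semantic)
for inst_id in np.unique(instance):
    mask = instance == inst_id
    labels, counts = np.unique(semantic[mask], return_counts=True)
    panoptic_class[mask] = labels[np.argmax(counts)]
```

In this toy case the pixel at (1, 1) disagreed with the rest of its instance and gets relabeled by the vote.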

decbdc27f935f50d97ef401ce9cac3d4.png

In an autonomous driving system, segmentation is valuable in two ways: localization against an HD map, and recognition of the drivable road.

...... to be continued


"Estimating Drivable Collision-Free Space from Monocular Video"

2c358a2ec9097583f3e33d72d9c4e371.png

379333625abf9c152638ece0685cfb03.png

4be952b004d7efa678368a58abb288b6.png

"Minimizing Supervision for Free-space Segmentation"

First, the road region is rather distinctive: its texture is simple and it sits near the center of the image. A notable advantage of the proposed method is its ability to segment automatically under weak supervision. The figure gives an overview of the method: feature maps are extracted from a pre-trained dilated ResNet, refined using superpixels from an image segmentation, clustered with a road-location prior, and finally used to generate masks for training a SegNet segmentation network, all without any manual annotation.
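The "cluster superpixel features, then pick the road cluster by position" idea can be sketched as follows. This is a minimal sketch under stated assumptions: random stand-in vectors replace the dilated-ResNet superpixel features, a hand-rolled two-means replaces the paper's clustering, and superpixel 0 is assumed to lie at the bottom-center of the image (the location prior):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in features: 6 superpixels, each with an averaged 4-d feature
# vector (in the paper these come from a pre-trained dilated ResNet).
feats = np.vstack([rng.normal(0.0, 0.1, (3, 4)),   # road-like superpixels
                   rng.normal(1.0, 0.1, (3, 4))])  # non-road superpixels

# Two-means clustering of the superpixel features.
centers = feats[[0, 3]].copy()
for _ in range(10):
    dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)
    centers = np.vstack([feats[assign == k].mean(0) for k in (0, 1)])

# Location prior: the road cluster is the one containing superpixel 0,
# assumed to sit at the bottom-center of the image.
road_cluster = assign[0]
road_mask = assign == road_cluster   # pseudo-labels for training SegNet
```

The resulting mask serves as free supervision: it is rasterized back to pixels and used as the training target for the segmentation network.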

195f60f9bc11b39b3817bd004d59d6b0.png

c433bf315650af07a5eefb4a70766061.png
Clustering

3a8f8450b614cbe718bb9a634e54b735.png

271e36d784f68f4b4737639b41bd15d2.png

"Modeling of Drivable Free Space with Fused Camera Data for Autonomous Driving"

16225452cbe3ee047134bd174778d3e5.png

97b8366b0de45924b930ad1b5058524e.png

Note: Digital Elevation Maps (DEM)

8a683d36926ab68d83aa710ab9fab1fa.png

"Perception Pipeline for Mapless Driving on an Ego-Lane Corridor"

6fe1aca627252addb6e957cfcab4a1fb.png

8be42d7109c43d1a78ba59c36af2cc1a.png

30654ca8bbf641597ff3591482320849.png

"Classification of Point Cloud for Road Scene Understanding with Multiscale Voxel Deep Network"

f836c50d21735810b14f600ef2482454.png
Application of classified point cloud for map-based localization of an autonomous vehicle

da1946146104737a2b8611d6ca28ce97.png
Multi-Scale Voxel Network architecture

f9b37cb5b2a4750657cdd9c7e86e8b0d.png
Classified point cloud example of Semantic3D test set

6bb41b531cc291c3ecc276875a43cd14.png
Example of classified point cloud on Paris-Lille-3D dataset.

"Real-Time Road Segmentation Using LiDAR Data Processing on an FPGA"

19155c72204beedd3e84d5f55663fbd7.png

ac99a937b3cda44b6492be207f0dcc0a.png

f10b646d66508a687f908115048e49fd.png

c5dc31bbc15696f429e7b95e02bde483.png


"Fusing Lidar and Semantic Image Information in Octree Maps"

b3ff6afcacace61cfa6bb65662035362.png
Top-level pipeline

351368778231eeb0403da64948e672bb.png
Projecting labels of the visual classifier and its uncertainty to the point cloud

84a7744dd8826a547c6d242eed7b3a92.png

43c3394463f89c0f5934a16f13f322c6.png

"Sensor Fusion for Semantic Segmentation of Urban Scenes"

9bf347f1da5d21e919717f8206e48640.png
Top-level pipeline

fe8765a7646e87271f66fed5231aaf14.png
(a) image

68c96f21765d60d53af6cca7f1ca39a0.png
(b) point cloud projected onto the image colored by distance

a4a32092f2a54441b0e907653dde6ea8.png
(c) point cloud, colored by height

dc72792335c5997680467d879d983951.png

fd98f738fbc383cb2c19435ef794bdff.png
(a) Ground truth; (b) point cloud unimodal classification; (c) point cloud unimodal semantic segmentation projected onto image; (d) image unimodal semantic segmentation;

396d5dee526806c7de0c9ca878e3da6d.png
(e) fused semantic segmentation; (f) the overall system.

"Fusion Based Holistic Road Scene Understanding"

f5d16a5e8c1a9e156ef15c54eca2e9fe.png
overview

17b54a5806bfea1299aa2f8b4737d0ed.png
CRF

bee621bf2fbdf34844c687e02f346efa.png
category potential

903a43f74ae23aad247d6d1d751358f6.png

"Road detection based on the fusion of Lidar and image data"

f1c2f03e9502285f5a976be35310fafd.png

2a76fbfc2fc40b7b116936c818599068.png

fb6dc1419a0d06f7b5bf3fe8ad0028be.png

a4f63699c74f82e7dd5ce5fb19b41511.png

"Fusing Geometry and Appearance for Road Segmentation"

4ac5700cc38d9652e1adb6223c0c3f42.png
Geometric Prior

33c8d76c16d167a144fb1e5d64fa7a50.png
Selecting an appearance model

de1bf5ea124fee70e89d49d8e83c02c2.png
fusion of geometric and appearance cues

35b05b546be1ef5af091f76b86969e35.png
Example results. (a) Test image (b) Ground truth segmentation. (c) Geometric method. (d) Fusion method (initial). (e) Fusion method (refined). (f) SegNet method (road + pavement)

"LIDAR-Camera Fusion for Road Detection Using Fully Convolutional Neural Networks"

This work uses deep learning to combine lidar and camera for road detection. The 3D point cloud is first projected onto the image plane and then upsampled into a dense depth image. Multiple fully convolutional networks (FCNs) are trained to perform road detection in different ways: with a single sensor, or with one of three fusion schemes, namely early, late, and cross fusion. Early/late fusion integrates information at a fixed network depth, whereas the cross-fusion FCN learns directly from data where to integrate information, by training cross connections between the lidar branch and the camera branch.
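The projection step can be sketched with a pinhole model. The intrinsics `K`, the image size, and the point coordinates below are hypothetical, and the points are assumed to be already expressed in the camera frame; a real pipeline also applies the calibrated lidar-to-camera extrinsics before this step:

```python
import numpy as np

# Hypothetical pinhole intrinsics (fx = fy = 500, principal point 320/240).
K = np.array([[500.,   0., 320.],
              [  0., 500., 240.],
              [  0.,   0.,   1.]])

# Toy lidar points already in the camera frame, (x, y, z) with z forward.
points = np.array([[ 1.0, 0.5, 10.0],
                   [-2.0, 1.0, 20.0]])

# Project: pixel = K @ p, then perspective divide by depth.
uv = (K @ points.T).T
uv = uv[:, :2] / uv[:, 2:3]

# Scatter depths into a sparse depth image; the paper then upsamples
# this into a dense depth image for the lidar branch.
depth_map = np.zeros((480, 640))
cols = uv[:, 0].astype(int)
rows = uv[:, 1].astype(int)
depth_map[rows, cols] = points[:, 2]
```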

002937474d473f9aeea1f98b3931f03f.png

The three fusion schemes: 1) Early fusion: the sensor data are merged directly at the input. 2) Late fusion: at layer 20, the two outputs are concatenated along the depth (channel) dimension, and a convolutional layer performs the high-level fusion. 3) Cross fusion: the two branches are connected by trainable scalar cross connections, the coefficients a_j and b_j at depths j ∈ {1, ..., 20}; at each depth, the inputs to the two branches (camera data and lidar data) are combined as shown in the figure.
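The cross-fusion scheme can be sketched as follows. This is a toy numpy version under stated assumptions: a linear-plus-ReLU `layer` stands in for the FCN's convolutional layers, depth is 3 rather than 20, and a_j, b_j are fixed constants rather than learned parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

def layer(x, w):
    # Stand-in for one FCN layer (linear map + ReLU); the real networks
    # use convolutions at every depth j = 1..20.
    return np.maximum(x @ w, 0.0)

depth = 3                        # 20 in the paper
w_cam = [rng.normal(size=(4, 4)) for _ in range(depth)]
w_lid = [rng.normal(size=(4, 4)) for _ in range(depth)]
a = np.full(depth, 0.1)          # trainable scalars a_j (fixed here)
b = np.full(depth, 0.1)          # trainable scalars b_j (fixed here)

x_cam = rng.normal(size=(1, 4))  # toy camera features
x_lid = rng.normal(size=(1, 4))  # toy dense-depth (lidar) features

for j in range(depth):
    # Each branch receives a scaled copy of the other branch's activations;
    # with a = b = 0 this degenerates to two independent branches.
    nxt_cam = layer(x_cam + a[j] * x_lid, w_cam[j])
    nxt_lid = layer(x_lid + b[j] * x_cam, w_lid[j])
    x_cam, x_lid = nxt_cam, nxt_lid
```

Because a_j and b_j are trained, the network itself decides at which depths (early, late, or anywhere in between) the two modalities should be integrated.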

5da0fc78a4f1108d45934126e77e4dbd.png

6464417737ccdb8deaecf91dc8ee0bb4.png

5a3f23971437e62e1180c72ab508ee75.png

8d7f5700f4d3a0979a545871aa83b798.png

266c14f267d8c25844610bbf2f64492d.png

3939605a35ede15d04ffa2a58886af63.png

"Detecting Drivable Area for Self-driving Cars: An Unsupervised Approach"

10811bd676f0c288b3795b0ac1d532e3.png
unsupervised approach

c2b39e5596d4bcdce47eceaae07dcb19.png
obstacle classification

... to be continued ...
