Introduction to Deep Learning (8) - Generative Models

andyc_03


andyc_03 · Published 2024-04-28 19:16:00

Generative models

Supervised vs. Unsupervised

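Discriminative vs. Generative vs. Conditional Generative Models

Discriminative: models $p(y|x)$. Only the possible labels for each input compete for probability mass; there is no competition between images

Generative: models $p(x)$. All possible images compete with each other for probability mass

Conditional generative: models $p(x|y)$. Each label induces its own competition among images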

Usage

Discriminative:

Feature learning
Assign labels to data

Generative:

Detect outliers
Feature learning
Sample to generate new data

Conditional Generative:

Assign labels while rejecting outliers
Generate new data conditioned on input labels


Autoregressive

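Goal: write down an explicit function for the density, $p(x) = f(x, W)$

Using the chain rule of probability, we can factor the joint density into a product of conditionals: $p(x_1, x_2, \ldots, x_T) = \prod_{t=1}^{T} p(x_t | x_1, x_2, \ldots, x_{t-1})$

Each conditional depends on all previous elements, which is exactly the structure an RNN models, so we can use an RNN to parameterize the density function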
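To make the factorization concrete, here is a minimal sketch in plain Python. The conditional `conditional_prob_one` is a hypothetical stand-in for the RNN (it only looks at the sum of the prefix), and the weights `w` and `b` are made-up values. The point is only that summing the log-conditionals gives a properly normalized joint density.

```python
import itertools
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def conditional_prob_one(prefix, w=0.8, b=-0.5):
    """Hypothetical conditional p(x_t = 1 | x_1..x_{t-1}); depends on the prefix sum."""
    return sigmoid(b + w * sum(prefix))


def log_likelihood(x):
    """log p(x) = sum_t log p(x_t | x_1..x_{t-1}) for a binary sequence x."""
    total = 0.0
    for t, x_t in enumerate(x):
        p_one = conditional_prob_one(x[:t])
        total += math.log(p_one if x_t == 1 else 1.0 - p_one)
    return total


# Every conditional is normalized, so the joint sums to 1 over all sequences.
T = 4
total_mass = sum(math.exp(log_likelihood(seq))
                 for seq in itertools.product([0, 1], repeat=T))
print(total_mass)
```

Training such a model amounts to maximizing `log_likelihood` over the training sequences with respect to the parameters.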

PixelRNN

Generate image pixels one at a time, starting at the upper-left corner

Compute a hidden state for each pixel that depends on the hidden states and RGB values of the pixels to the left and above (LSTM recurrence)

$h_{x,y} = f(h_{x-1,y}, h_{x,y-1}, W)$

At each pixel, predict red, then green, then blue (the channel order used in the PixelRNN paper)

Each pixel depends implicitly on all pixels above and to the left

Problem: very slow during both training and testing, because the hidden states have to be computed sequentially
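The sequential cost can be illustrated with a toy version of the recurrence. This is only a sketch: the `tanh` update and the weights `w_left`, `w_up`, `w_in` are assumptions standing in for the LSTM, and it uses a single scalar per pixel instead of RGB. Pixels on the same anti-diagonal depend only on earlier anti-diagonals, so even with perfect parallelism an N x N image needs 2N - 1 sequential steps.

```python
import math


def raster_hidden_states(pixels, w_left=0.5, w_up=0.5, w_in=1.0):
    """Toy stand-in for h[x,y] = f(h[x-1,y], h[x,y-1], W); the real model uses an LSTM."""
    n = len(pixels)
    h = [[0.0] * n for _ in range(n)]
    for y in range(n):          # top to bottom
        for x in range(n):      # left to right
            left = h[y][x - 1] if x > 0 else 0.0
            up = h[y - 1][x] if y > 0 else 0.0
            h[y][x] = math.tanh(w_left * left + w_up * up + w_in * pixels[y][x])
    return h


def wavefronts(n):
    """Group coordinates (x, y) by anti-diagonal x + y; each group only needs earlier groups."""
    return [[(x, d - x) for x in range(n) if 0 <= d - x < n]
            for d in range(2 * n - 1)]


pixels = [[0.1 * (x + y) for x in range(4)] for y in range(4)]
h = raster_hidden_states(pixels)
steps = wavefronts(4)
print(len(steps))  # number of sequential steps for a 4 x 4 image
```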

PixelCNN

The dependency on previous pixels is now modeled using a CNN over a context region instead of a recurrence

on new CIFAR images
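The notes do not say how the CNN is prevented from seeing pixels that have not been generated yet. One common approach, assumed here rather than taken from the text, is a masked convolution: the kernel weights at and after the current pixel (in raster order) are multiplied by zero. A minimal sketch of building such a mask, with `causal_mask` and `include_center` as illustrative names:

```python
def causal_mask(kernel_size, include_center=False):
    """Return a kernel_size x kernel_size 0/1 mask in raster-scan order.

    Positions in rows above the center, and to the left of the center in the
    center row, are visible (1). Everything else is hidden (0).
    """
    c = kernel_size // 2
    mask = []
    for row in range(kernel_size):
        mask_row = []
        for col in range(kernel_size):
            before_center = row < c or (row == c and col < c)
            is_center = row == c and col == c
            visible = before_center or (is_center and include_center)
            mask_row.append(1 if visible else 0)
        mask.append(mask_row)
    return mask


for row in causal_mask(3):
    print(row)
```

During training all context pixels are known from the training image, so the masked convolutions can run in parallel. Generation still has to proceed pixel by pixel.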

Autoregressive Models: PixelRNN/CNN

Pros:

Can explicitly compute the likelihood $p(x)$
The explicit likelihood of the data gives a good evaluation metric
Good samples

Cons:

Sequential generation -> slow

Variational Autoencoders

VAEs define an intractable density that we cannot compute or optimize directly.

Instead, we optimize a lower bound on this density.

(non-variational) Autoencoders

The features should capture useful information that can be used for downstream tasks

Problem: how can we learn this feature transform from raw data?

Idea: use the features to reconstruct the input data with a decoder

Loss: L2 distance between the input and the reconstructed data
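As a minimal sketch of the encode, decode, and compare pipeline described above: the linear `encode` and `decode` functions and the weight matrices `w_enc` and `w_dec` are hypothetical and hand-picked, not learned. In practice both would be neural networks trained to minimize this loss.

```python
def encode(x, w_enc):
    """Linear encoder: project input x onto each row of w_enc to get features z."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in w_enc]


def decode(z, w_dec):
    """Linear decoder: map features z back to input space."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in w_dec]


def l2_reconstruction_loss(x, x_hat):
    """Squared L2 distance between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


# Hypothetical weights: a 3-D input squeezed through a 2-D feature bottleneck.
w_enc = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0]]
w_dec = [[1.0, 0.0],
         [0.0, 1.0],
         [0.0, 0.0]]

x = [0.5, -1.0, 2.0]
z = encode(x, w_enc)        # features
x_hat = decode(z, w_dec)    # reconstruction
loss = l2_reconstruction_loss(x, x_hat)
print(z, x_hat, loss)
```

Because the bottleneck has fewer dimensions than the input, the third coordinate is lost here and shows up as reconstruction error. After training, the decoder is discarded and the encoder is kept for downstream tasks.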

Variational Autoencoders

learn latent features z from raw data
sample from the model to generate new data


How to train this model: maximize the likelihood of the data

However, the likelihood $p(x) = \int p(x|z) p(z) dz$ requires integrating over all latent codes z, which is intractable

So we need to find a computable lower bound on the likelihood

The hope is that as we push the lower bound up, the true likelihood increases as well
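The lower bound in question is usually the evidence lower bound (ELBO). The notation here is an assumption, since the notes do not fix one: $q_\phi(z|x)$ denotes the encoder and $p_\theta(x|z)$ the decoder. A sketch of the standard result:

```latex
\begin{aligned}
\log p_\theta(x)
  &= E_{z \sim q_\phi(z|x)}\big[\log p_\theta(x|z)\big]
   - D_{KL}\big(q_\phi(z|x) \,\|\, p(z)\big)
   + D_{KL}\big(q_\phi(z|x) \,\|\, p_\theta(z|x)\big) \\
  &\ge E_{z \sim q_\phi(z|x)}\big[\log p_\theta(x|z)\big]
   - D_{KL}\big(q_\phi(z|x) \,\|\, p(z)\big)
\end{aligned}
```

The last KL term in the first line involves the intractable posterior $p_\theta(z|x)$, but it is never negative, so dropping it leaves a bound that can be computed and optimized.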


Process:

1. Run the input data through the encoder to get a distribution over latent codes
2. The encoder output should match the prior $p(z)$
3. Sample a code z from the encoder output
4. Run the sampled code through the decoder to get a distribution over data samples
5. The original input data should be likely under the distribution output in step 4

For computational convenience, we assume in step 2 that both the distribution we learn and the prior $p(z)$ are diagonal Gaussian distributions.

Steps 2 and 5 correspond to the two terms of the loss we minimize: the KL divergence between the encoder output and the prior, and the reconstruction error of the input under the decoder output
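The five steps can be sketched for a single data point as follows. Several things here are assumptions for illustration: the prior is taken to be a unit Gaussian $N(0, I)$, the decoder outputs the mean of a unit-variance Gaussian, the encoder output (`mu`, `log_var`) is hard-coded instead of computed by a network, and `toy_decoder` is a made-up function.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def sample_latent(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def reconstruction_nll(x, x_mean):
    """Negative log-likelihood of x under N(x_mean, I), dropping the constant term."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_mean))


def toy_decoder(z):
    """Hypothetical decoder: maps a 2-D code to the mean of a 3-D Gaussian over x."""
    return [z[0], z[1], z[0] + z[1]]


rng = random.Random(0)
x = [0.2, -0.4, 0.1]
mu, log_var = [0.1, -0.3], [-1.0, -0.5]   # step 1: pretend encoder output for x
kl = kl_to_standard_normal(mu, log_var)   # step 2: encoder output should match the prior
z = sample_latent(mu, log_var, rng)       # step 3: sample a code
x_mean = toy_decoder(z)                   # step 4: distribution over data samples
recon = reconstruction_nll(x, x_mean)     # step 5: x should be likely under it
loss = recon + kl
print(kl, recon, loss)
```

With diagonal Gaussians the KL term has the closed form used in `kl_to_standard_normal`, which is the computational convenience the assumption buys.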

Generative Adversarial Networks

Setup: Assume we have data $x$ drawn from a distribution $p_{data}(x)$. We want to sample from $p_{data}(x)$.

Idea: Introduce a latent variable z with a simple prior $p(z)$.

Sample z from $p(z)$ and pass it to a Generator Network: $x = G(z)$

Then x is a sample from the generator distribution $p_G$, and we want $p_G = p_{data}$

We train the Generator Network G to convert z into fake data x sampled from $p_G$ by fooling the discriminator D

We train the Discriminator Network D to classify data as real or fake

The two networks are trained jointly and compete against each other.

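Loss: D and G play a minimax game over the objective

$\min_G \max_D \left( E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))] \right)$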

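Problem: at the beginning of training the generator is poor, so D easily rejects its samples and $D(G(z))$ is close to 0. In that region $\log(1 - D(G(z)))$ is nearly flat, which gives vanishing gradients for G

Solution: instead of training G to minimize $\log(1 - D(G(z)))$, train G to minimize $-\log(D(G(z)))$. G then gets strong gradients at the beginning of training. Nice idea!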
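The effect can be checked numerically. The sketch below assumes the discriminator ends in a sigmoid, so $D(G(z)) = \sigma(a)$ for some logit `a`, and compares the gradient with respect to `a` of the two generator objectives, both written as quantities to minimize. The function names are illustrative.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def grad_saturating(a):
    """d/da of log(1 - sigmoid(a)): the original generator objective."""
    return -sigmoid(a)


def grad_non_saturating(a):
    """d/da of -log(sigmoid(a)): the alternative generator objective."""
    return -(1.0 - sigmoid(a))


# Columns: logit, D(G(z)), gradient of original loss, gradient of alternative loss.
for a in (-6.0, -2.0, 0.0, 2.0):
    print(a,
          round(sigmoid(a), 4),
          round(grad_saturating(a), 4),
          round(grad_non_saturating(a), 4))
```

When D confidently rejects a sample (very negative logit), the original objective has a gradient near 0 while the alternative has a gradient near -1, so the generator keeps receiving a learning signal.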

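With some math, we can prove that the minimax objective reaches its global optimum when $p_G = p_{data}$.

But caveats remain: G and D are networks with a fixed architecture, so they may not be able to represent the optimal solution, and the proof says nothing about whether training converges to it.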

DC-GAN

interpolating between points in latent z space

We can even do latent vector math!
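Both operations are simple element-wise arithmetic on latent codes. In the sketch below, `lerp`, `interpolation_path`, and `latent_arithmetic` are illustrative helper names, and the generator itself is omitted: each resulting code would be passed through $G$ to obtain an image.

```python
def lerp(z0, z1, alpha):
    """Linear interpolation between latent codes: alpha=0 gives z0, alpha=1 gives z1."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z0, z1)]


def interpolation_path(z0, z1, steps):
    """Evenly spaced codes from z0 to z1 (inclusive); each would be fed to G."""
    return [lerp(z0, z1, i / (steps - 1)) for i in range(steps)]


def latent_arithmetic(z_a, z_b, z_c):
    """Vector math in latent space: z_a - z_b + z_c, element-wise."""
    return [a - b + c for a, b, c in zip(z_a, z_b, z_c)]


z0, z1 = [0.0, 1.0, -1.0], [2.0, 1.0, 1.0]
path = interpolation_path(z0, z1, steps=5)
print(path[0], path[2], path[-1])
```

For vector math, the codes passed to `latent_arithmetic` would typically be averages over several samples that share an attribute, which is an assumption about usage rather than something the notes state.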

Conditional GANs

We can use conditional Batch Normalization to feed the label into the generator

Learn a separate set of scale and shift parameters for each label
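A minimal sketch of the idea on a batch of scalar features: normalize with the batch statistics as usual, then apply the scale and shift that belong to each sample's label. The function name and the parameter values in `gamma` and `beta` are hypothetical. In a real network they would be learned, with one pair per label and per channel.

```python
import math


def conditional_batch_norm(features, labels, gamma, beta, eps=1e-5):
    """Normalize a batch of scalar features, then scale/shift with per-label parameters.

    gamma[y] and beta[y] are the scale and shift for label y.
    """
    n = len(features)
    mean = sum(features) / n
    var = sum((f - mean) ** 2 for f in features) / n
    normalized = [(f - mean) / math.sqrt(var + eps) for f in features]
    return [gamma[y] * v + beta[y] for v, y in zip(normalized, labels)]


# Hypothetical parameters for two labels.
gamma = {0: 1.0, 1: 2.0}
beta = {0: 0.0, 1: 0.5}

features = [1.0, 2.0, 3.0, 4.0]
labels = [0, 1, 0, 1]
out = conditional_batch_norm(features, labels, gamma, beta)
print(out)
```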

Spectral Normalization

We can then generate images of a specific label

…
