ModulatedConv2d and torch.autograd.Function

0. Contents of this chapter
1. An intuitive understanding of Function
2. Differences between Function and Module, and when to use each
3. A ReLU Function

ModulatedConv2d


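from math import sqrt

import torch
import torch.nn as nn
import torch.nn.functional as F

# EqualLinear and PixelNorm are not defined in this section; Blur is defined further below.
class ModulatedConv2d(nn.Module):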
    def __init__(self, fin, fout, kernel_size, padding_type='reflect', upsample=False, downsample=False, latent_dim=256, normalize_mlp=False):
        super(ModulatedConv2d, self).__init__()
        self.in_channels = fin
        self.out_channels = fout
        self.kernel_size = kernel_size
        self.upsample = upsample
        self.downsample = downsample
        padding_size = kernel_size // 2
        if kernel_size == 1:
            self.demudulate = False
        else:
            self.demudulate = True
        self.weight = nn.Parameter(torch.Tensor(fout, fin, kernel_size, kernel_size))
        self.bias = nn.Parameter(torch.Tensor(1, fout, 1, 1))
        self.conv = F.conv2d
        if normalize_mlp:
            self.mlp_class_std = nn.Sequential(EqualLinear(latent_dim, fin), PixelNorm())
        else:
            self.mlp_class_std = EqualLinear(latent_dim, fin)
        self.blur = Blur(fout)
        if padding_type == 'reflect':
            self.padding = nn.ReflectionPad2d(padding_size)
        else:
            self.padding = nn.ZeroPad2d(padding_size)
        if self.upsample:
            self.upsampler = nn.Upsample(scale_factor=2, mode='nearest')
        if self.downsample:
            self.downsampler = nn.AvgPool2d(2)
        self.weight.data.normal_()
        self.bias.data.zero_()

    def forward(self, input, latent):
        # He-style scaling applied at run time: the stored weight is N(0, 1).
        fan_in = self.weight.data.size(1) * self.weight.data[0][0].numel()
        weight = self.weight * sqrt(2 / fan_in)
        weight = weight.view(1, self.out_channels, self.in_channels, self.kernel_size, self.kernel_size)
        # Modulation: one scale per sample and per input channel, predicted from the latent.
        s = 1 + self.mlp_class_std(latent).view(-1, 1, self.in_channels, 1, 1)
        weight = s * weight
        if self.demudulate:
            # Demodulation: rescale every output filter of every sample to unit L2 norm.
            d = torch.rsqrt((weight ** 2).sum(4).sum(3).sum(2) + 1e-5).view(-1, self.out_channels, 1, 1, 1)
            weight = (d * weight).view(-1, self.in_channels, self.kernel_size, self.kernel_size)
        else:
            weight = weight.view(-1, self.in_channels, self.kernel_size, self.kernel_size)
        if self.upsample:
            input = self.upsampler(input)
        if self.downsample:
            input = self.blur(input)
        # Fold the batch into the channel axis so that a single grouped convolution
        # (groups=b) applies a different kernel to each sample.
        b,_,h,w = input.shape
        input = input.view(1,-1,h,w)
        input = self.padding(input)
        out = self.conv(input, weight, groups=b).view(b, self.out_channels, h, w) + self.bias
        if self.downsample:
            out = self.downsampler(out)
        if self.upsample:
            out = self.blur(out)
        return out

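from torch.autograd import Function

# Backward pass of the blur, written as its own Function so that the
# gradient computation is itself differentiable.
class BlurFunctionBackward(Function):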
    @staticmethod
    def forward(ctx, grad_output, kernel, kernel_flip):
        ctx.save_for_backward(kernel, kernel_flip)
        grad_input = F.conv2d(
            grad_output, kernel_flip, padding=1, groups=grad_output.shape[1]
        )
        return grad_input
    @staticmethod
    def backward(ctx, gradgrad_output):
        kernel, kernel_flip = ctx.saved_tensors
        grad_input = F.conv2d(
            gradgrad_output, kernel, padding=1, groups=gradgrad_output.shape[1]
        )
        return grad_input, None, None
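
# Depthwise 3x3 blur: groups equals the number of channels, so every
# channel is filtered independently with its own copy of the kernel.
class BlurFunction(Function):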
    @staticmethod
    def forward(ctx, input, kernel, kernel_flip):
        ctx.save_for_backward(kernel, kernel_flip)
        output = F.conv2d(input, kernel, padding=1, groups=input.shape[1])
        return output
    @staticmethod
    def backward(ctx, grad_output):
        kernel, kernel_flip = ctx.saved_tensors
        grad_input = BlurFunctionBackward.apply(grad_output, kernel, kernel_flip)
        return grad_input, None, None

# A custom Function is called through its apply method.
blur = BlurFunction.apply

class Blur(nn.Module):
    def __init__(self, channel):
        super().__init__()
        weight = torch.tensor([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=torch.float32)
        weight = weight.view(1, 1, 3, 3)
        weight = weight / weight.sum()
        weight_flip = torch.flip(weight, [2, 3])
        self.register_buffer('weight', weight.repeat(channel, 1, 1, 1))
        self.register_buffer('weight_flip', weight_flip.repeat(channel, 1, 1, 1))
    def forward(self, input):
        return blur(input, self.weight, self.weight_flip)

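0. Contents of this chapter

In this chapter we learn how to write a custom torch.autograd.Function. The main topics are:

An intuitive understanding of Function
Differences between Function and Module, and when to use each
Writing a simple ReLU Function

1. An intuitive understanding of Function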

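From the earlier chapters we know that PyTorch builds its computation graph from Variables and Functions. Recall that a Variable is like a node in the graph: it stores results (the activations from the forward pass and the gradients from the backward pass). A Function is like an edge: it performs a computation on Variables and outputs new Variables. Since PyTorch 0.4, Variable has been merged into Tensor, so a Tensor with requires_grad=True plays this role.
Put simply, a Function is an operation on Variables, such as addition, subtraction, multiplication, division, relu, or pooling.
But it is more than a plain operation. Unlike ordinary Python or numpy operations, a Function belongs to the computation graph and must be able to produce gradients for backpropagation. So besides performing the operation itself (the forward pass), it has to keep whatever forward inputs the gradient needs and implement the backward pass. If you have done the assignments of the open course CS231n, you will remember that every operation there defines forward and backward and passes a cache between them for backpropagation. The two designs are similar.
From the earlier study of Variable we know that, after an operation, the creator of the output Variable is the operation that produced it. For y = relu(x), y.creator is the relu Function. In current PyTorch this attribute is called grad_fn.
We can extend Function to suit our own needs. Doing so means defining the forward computation and the corresponding backward computation, and saving in forward the input values that backward will need.
In summary, Function and Variable together make up PyTorch's automatic differentiation mechanism; Function defines the computational relationships between Variables.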
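
The forward/backward-with-cache pattern described above can be sketched with two plain functions and checked against autograd. This is only an illustration: relu_forward and relu_backward are hypothetical names, not part of PyTorch, and the tensor API of current PyTorch is assumed.

```python
import torch

def relu_forward(x):
    """Forward pass: compute the output and cache what backward needs."""
    out = x.clamp(min=0)
    cache = x  # the input is enough to compute the ReLU gradient
    return out, cache

def relu_backward(grad_out, cache):
    """Backward pass: chain rule, using the cached forward input."""
    x = cache
    return grad_out * (x > 0).to(grad_out.dtype)

x = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)

# Manual route: explicit forward, then explicit backward with the cache.
out, cache = relu_forward(x.detach())
manual_grad = relu_backward(torch.ones_like(out), cache)

# Autograd route: the graph records the operation and differentiates it for us.
y = torch.relu(x)
y.sum().backward()

print(manual_grad)
print(x.grad)
print(y.grad_fn)  # the graph node that produced y
```

Both routes give the same gradient; a custom Function packages the manual route so that autograd can call it.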

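2. Differences between Function and Module, and when to use each
Both Function and Module let you extend PyTorch to meet the needs of a network, but they differ in important ways:
A Function usually defines a single operation. It cannot hold parameters, so it suits activation functions, pooling, and similar operations. A Module holds parameters, so it suits defining a layer, such as a linear or convolutional layer, and also a whole network.
A Function must define forward and backward, and you write the derivative formula yourself (the MyReLU example below needs no init). A Module only needs init and forward; its backward is handled by the autograd mechanism.
Loosely speaking, a Module is composed of a series of Functions. During its forward pass these Functions and the Variables form the computation graph; during the backward pass it is enough to call each Function's backward, so a Module does not need to define backward itself.
A Module contains not only Functions but also their parameters and other functions and variables, which a Function does not have.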
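
A minimal sketch of this division of labour, assuming current PyTorch: Square is a stateless Function with a hand-written derivative, while ScaledSquare is a Module that owns a parameter and gets its backward from autograd. Both names are made up for illustration.

```python
import torch
import torch.nn as nn

class Square(torch.autograd.Function):
    """A single operation with no parameters; the derivative is written by hand."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 2 * x  # d(x^2)/dx = 2x

class ScaledSquare(nn.Module):
    """A layer: holds a learnable parameter and defines only forward."""

    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(3.0))

    def forward(self, x):
        return self.scale * Square.apply(x)

layer = ScaledSquare()
x = torch.tensor([1.0, 2.0], requires_grad=True)
layer(x).sum().backward()

names = [name for name, _ in layer.named_parameters()]
print(names)
print(x.grad)            # scale * 2x
print(layer.scale.grad)  # sum of x^2
```

The Module never mentions backward: autograd chains the hand-written Square.backward with the built-in multiplication.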
3. A ReLU Function

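First we define a ReLU class that inherits from Function.
Then we check whether the creator of a Variable produced by the operation is the corresponding Function.
Finally, to make the ReLU class convenient to use, we wrap it in a function so that we do not have to create a new object explicitly on every call.

3.1 Defining a ReLU class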

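import torch
from torch.autograd import Variable

# Legacy Function style from the Variable era (before PyTorch 0.4): forward and
# backward are instance methods. Current PyTorch expects static methods that
# take a ctx argument, as in BlurFunction above.
class MyReLU(torch.autograd.Function):
    def forward(self, input_):
        # forward defines the forward computation of the MyReLU operation.
        # It can also save any values that the backward pass will need.
        self.save_for_backward(input_)         # keep the input for use in backward
        output = input_.clamp(min=0)           # ReLU clips negatives: every negative value becomes 0
        return output

    def backward(self, grad_output):
        # Chain rule from backpropagation: dloss/dx = (dloss/doutput) * (doutput/dx).
        # dloss/doutput is the grad_output argument, so we only need the
        # derivative of ReLU, multiplied by grad_output.
        input_, = self.saved_tensors
        grad_input = grad_output.clone()
        # In the backward pass ReLU acts as a gate: every unit whose input
        # was below 0 gets a gradient of 0.
        grad_input[input_ < 0] = 0
        return grad_input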
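
In current PyTorch the same operation is written with static methods and a ctx object, and is called through apply. The following is a sketch of that style under the name MyReLUStatic (a name chosen here to avoid clashing with the class above); gradcheck compares the hand-written backward against numerical gradients.

```python
import torch
from torch.autograd import gradcheck

class MyReLUStatic(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input_):
        ctx.save_for_backward(input_)      # keep the input for backward
        return input_.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (input_,) = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input_ < 0] = 0         # no gradient where the input was negative
        return grad_input

# gradcheck needs double precision; the points avoid 0, where ReLU has a kink.
x = torch.tensor([-2.0, -0.5, 0.7, 3.0], dtype=torch.double, requires_grad=True)
ok = gradcheck(MyReLUStatic.apply, (x,), eps=1e-6, atol=1e-4)
print(ok)

y = MyReLUStatic.apply(x)
y.sum().backward()
print(y)
print(x.grad)
```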

3.2 Verifying the relationship between Variable and Function

from torch.autograd import Variable
input_ = Variable(torch.randn(1))
relu = MyReLU()
output_ = relu(input_)
# This relu object is output_.creator: it connects output_ to input_ and so forms a computation graph
print(relu)
print(output_.creator)

Output:

<__main__.MyReLU object at 0x7fd0b2d08b30>
<__main__.MyReLU object at 0x7fd0b2d08b30>

As we can see, a Function connects one Variable to another and carries out the computation between them.
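
The same check in current PyTorch reads grad_fn instead of creator. The sketch below assumes a static-method version of MyReLU and an input created with requires_grad=True, since otherwise no graph is recorded.

```python
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input_):
        ctx.save_for_backward(input_)
        return input_.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (input_,) = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input_ < 0] = 0
        return grad_input

input_ = torch.randn(1, requires_grad=True)
output_ = MyReLU.apply(input_)

print(input_.grad_fn)   # None: a leaf tensor was not produced by any Function
print(output_.grad_fn)  # the backward node that autograd created for MyReLU.apply
```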
3.3 Wrapping ReLU in a function
We can wrap the custom ReLU class in a function so that it can be called directly.

def relu(input_):
    # MyReLU() creates a MyReLU object.
    # Function implements Python's __call__, so the object itself can be called.
    # __call__ dispatches to forward, so MyReLU()(input_) below is roughly equivalent to
    # return MyReLU().forward(input_)
    return MyReLU()(input_)
input_ = Variable(torch.linspace(-3, 3, steps=5))
print(input_)
print(relu(input_))

Output:

Variable containing:
-3.0000
-1.5000
 0.0000
 1.5000
 3.0000
[torch.FloatTensor of size 5]
Variable containing:
 0.0000
 0.0000
 0.0000
 1.5000
 3.0000
[torch.FloatTensor of size 5]
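
With a static-method Function the wrapper reduces to binding apply, in the same way that blur = BlurFunction.apply is defined earlier. A sketch, again assuming the static-method form of MyReLU and the tensor API of current PyTorch:

```python
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input_):
        ctx.save_for_backward(input_)
        return input_.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (input_,) = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input_ < 0] = 0
        return grad_input

# No object is created per call; apply is the entry point of the Function.
relu = MyReLU.apply

input_ = torch.linspace(-3, 3, steps=5)
result = relu(input_)
print(input_)
print(result)
```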
