Transposed Convolution
The CNN layers we have seen so far, such as convolutional layers (:numref:sec_conv_layer) and pooling layers (:numref:sec_pooling), typically reduce (downsample) the spatial dimensions (height and width) of the input, or keep them unchanged. In semantic segmentation, which classifies at the pixel level, it is convenient if the spatial dimensions of the input and output are the same: the channel dimension at one output pixel can then hold the classification results for the input pixel at the same spatial position.
To achieve this, especially after the spatial dimensions are reduced by CNN layers, we can use another type of CNN layer that can increase (upsample) the spatial dimensions of intermediate feature maps. In this section, we will introduce transposed convolution, which is also called fractionally-strided convolution [204], for reversing downsampling operations by the convolution.
using Pkg;
Pkg.activate("../../d2lai")
using Flux, d2lai

Basic Operation
Ignoring channels for now, let's begin with the basic transposed convolution operation with stride of 1 and no padding. Suppose that we are given a $n_h \times n_w$ input tensor and a $k_h \times k_w$ kernel. Sliding the kernel window with stride of 1 for $n_w$ times in each row and $n_h$ times in each column yields a total of $n_h n_w$ intermediate results. Each intermediate result is a $(n_h + k_h - 1) \times (n_w + k_w - 1)$ tensor that is initialized as zeros. To compute each intermediate tensor, each element in the input tensor is multiplied by the kernel so that the resulting $k_h \times k_w$ tensor replaces a portion in each intermediate tensor. Note that the position of the replaced portion in each intermediate tensor corresponds to the position of the element in the input tensor used for the computation. In the end, all the intermediate results are summed over to produce the output.
As an example, the figure below illustrates how transposed convolution with a $2\times 2$ kernel is computed for a $2\times 2$ input tensor.
Transposed convolution with a $2\times 2$ kernel. The shaded portions are a portion of an intermediate tensor as well as the input and kernel tensor elements used for the computation.
We can implement this basic transposed convolution operation trans_conv for an input matrix X and a kernel matrix K.
function trans_conv(X::AbstractArray, K::AbstractArray)
    # The output is larger than the input: (n_h + k_h - 1) × (n_w + k_w - 1)
    Y = zeros(size(X) .+ size(K) .- 1)
    h, w = size(K)
    for i in 1:size(X, 1)
        for j in 1:size(X, 2)
            # Broadcast each input element over the kernel and accumulate
            # into the corresponding window of the output
            Y[i:(i+h-1), j:(j+w-1)] += X[i, j] .* K
        end
    end
    Y
end

trans_conv (generic function with 1 method)

In contrast to the regular convolution (in :numref:sec_conv_layer) that reduces input elements via the kernel, the transposed convolution broadcasts input elements via the kernel, thereby producing an output that is larger than the input. We can construct the input tensor X and the kernel tensor K from the figure above to validate the output of the above implementation of the basic two-dimensional transposed convolution operation.
K = [0. 1.; 2. 3.]
X = [0. 1.; 2. 3.]
trans_conv(X, K)

3×3 Matrix{Float64}:
0.0 0.0 1.0
0.0 4.0 6.0
4.0 12.0 9.0

Alternatively, when the input X and kernel K are both four-dimensional tensors, we can use the high-level ConvTranspose API to perform the same operation. Note that Flux's ConvTranspose follows the convolution convention and flips the kernel, unlike our cross-correlation-style trans_conv, so the output below equals trans_conv applied with the kernel rotated by 180°.
K = reshape(K, 2, 2, 1, 1)
X = reshape(X, 2, 2, 1, 1)
tconv = ConvTranspose(K)
tconv(X)

3×3×1×1 Array{Float64, 4}:
[:, :, 1, 1] =
0.0 3.0 2.0
6.0 14.0 6.0
2.0 3.0 0.0
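As a quick check of the flipped-kernel convention, the following sketch (hypothetical; it re-creates the 2×2 matrices, since X and K were rebound to four-dimensional tensors above) reproduces the ConvTranspose result by rotating the kernel:

K2 = [0. 1.; 2. 3.]
X2 = [0. 1.; 2. 3.]
# reverse without dims reverses both dimensions, i.e., rotates K2 by 180°
trans_conv(X2, reverse(K2))  # == [0 3 2; 6 14 6; 2 3 0], as printed above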
Padding, Strides, and Multiple Channels

Unlike the regular convolution, where padding is applied to the input, in the transposed convolution it is applied to the output. For example, when specifying the padding number on either side of the height and width as 1, the first and last rows and columns will be removed from the transposed convolution output.
tconv = ConvTranspose(K; pad = 1)
tconv(X)

1×1×1×1 Array{Float64, 4}:
[:, :, 1, 1] =
14.0

In the transposed convolution, strides are specified for intermediate results (and thus the output), not for the input. Using the same input and kernel tensors from the figure above, changing the stride from 1 to 2 increases both the height and width of the intermediate tensors, and hence of the output tensor, as shown in the figure below.
Transposed convolution with a $2\times 2$ kernel with stride of 2. The shaded portions are a portion of an intermediate tensor as well as the input and kernel tensor elements used for the computation.
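To see how a stride enters the basic implementation, here is a hedged sketch (trans_conv_stride is a hypothetical helper, not part of d2lai): each input element now places its scaled kernel at positions spaced stride apart, so each output dimension grows to $s(n-1)+k$.

function trans_conv_stride(X::AbstractArray, K::AbstractArray; stride = 1)
    h, w = size(K)
    # Output size per dimension: stride * (n - 1) + k
    Y = zeros(stride .* (size(X) .- 1) .+ (h, w))
    for i in 1:size(X, 1), j in 1:size(X, 2)
        # Kernel placements are spaced `stride` apart
        ii, jj = stride * (i - 1) + 1, stride * (j - 1) + 1
        Y[ii:(ii+h-1), jj:(jj+w-1)] += X[i, j] .* K
    end
    Y
end

With stride = 2 and the 180°-rotated kernel, this sketch reproduces the Flux output validated next.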
The following code snippet can validate the transposed convolution output for stride of 2 shown in the figure above.
tconv = ConvTranspose(K; stride = 2)
tconv(X)

4×4×1×1 Array{Float64, 4}:
[:, :, 1, 1] =
0.0 0.0 3.0 2.0
0.0 0.0 1.0 0.0
6.0 4.0 9.0 6.0
2.0 0.0 3.0 0.0

For multiple input and output channels, the transposed convolution works in the same way as the regular convolution. Suppose that the input has $c_i$ channels, and that the transposed convolution assigns a $k_h\times k_w$ kernel tensor to each input channel. When multiple output channels are specified, we will have a $c_i\times k_h\times k_w$ kernel for each output channel.
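As a quick shape check (a hedged sketch with arbitrary sizes, using Flux's width × height × channel × batch layout): with stride 1 and no padding, each spatial dimension grows from $n$ to $n + k - 1$, independently of the channel counts.

Xc = rand(Float32, 8, 8, 4, 1)           # 8×8 spatial, 4 channels, batch of 1
tconv_c = ConvTranspose((3, 3), 4 => 8)  # 4 input channels => 8 output channels
size(tconv_c(Xc))                        # (10, 10, 8, 1): each spatial dim is 8 + 3 - 1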
Altogether, if we feed $\mathsf{X}$ into a convolutional layer $f$ to output $\mathsf{Y}=f(\mathsf{X})$ and create a transposed convolutional layer $g$ with the same hyperparameters as $f$ except for the number of output channels being the number of channels in $\mathsf{X}$, then $g(\mathsf{Y})$ will have the same shape as $\mathsf{X}$. This is illustrated in the following example.
X = rand(16, 16, 10, 1)
conv = Conv((5,5), 10 => 20, pad = 2, stride = 3)
tconv = ConvTranspose((5,5), 20 => 10, pad = 2, stride = 3)
size(tconv(conv(X))) == size(X)

true

Connection to Matrix Transposition
:label:subsec-connection-to-mat-transposition
The transposed convolution is named after the matrix transposition. To explain, let's first see how to implement convolutions using matrix multiplications. In the example below, we define a $3\times 3$ input X and a $2\times 2$ convolution kernel K, and then use the corr2d function to compute the convolution output Y.
X = reshape(collect(1:9), 3,3)'
K = [1 2; 3 4]
Y = d2lai.corr2d(X, K)

2×2 Matrix{Float64}:
37.0 47.0
67.0 77.0

Next, we rewrite the convolution kernel K as a sparse weight matrix W containing many zeros. The shape of the weight matrix is (4, 9), where the nonzero elements come from the convolution kernel K.

function kernel2matrix(K)
    # Unrolled kernel for one output position: [K[1,1], K[1,2], 0, K[2,1], K[2,2]]
    k = zeros(5)
    W = zeros((4, 9))
    k[1:2] .= K[1, :]
    k[4:5] .= K[2, :]
    # Each row of W applies the kernel at one of the four output positions
    W[1, 1:5] .= k
    W[2, 2:6] .= k
    W[3, 4:8] .= k
    W[4, 5:end] .= k
    return W
end

kernel2matrix (generic function with 1 method)

W = kernel2matrix(K)

4×9 Matrix{Float64}:
1.0 2.0 0.0 3.0 4.0 0.0 0.0 0.0 0.0
0.0 1.0 2.0 0.0 3.0 4.0 0.0 0.0 0.0
0.0 0.0 0.0 1.0 2.0 0.0 3.0 4.0 0.0
0.0 0.0 0.0 0.0 1.0 2.0 0.0 3.0 4.0

Concatenate the input X row by row to get a vector of length 9. Then the matrix multiplication of W and the vectorized X gives a vector of length 4. After reshaping it, we can obtain the same result Y from the original convolution operation above: we just implemented convolutions using matrix multiplications.
Y == reshape(W * reshape(X', :, 1), 2, 2)'

true

Likewise, we can implement transposed convolutions using matrix multiplications. In the following example, we take the $2\times 2$ output Y from the above regular convolution as input to the transposed convolution. To implement this operation by multiplying matrices, we only need to transpose the weight matrix W to the new shape $(9, 4)$.
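Here is that example as a minimal sketch, reusing trans_conv, Y, K, and W from above (the reshapes vectorize row by row, mirroring how W was constructed):

# Transposed convolution of Y by K equals W' times the row-major
# vectorization of Y, reshaped back to 3×3
Z = trans_conv(Y, K)
Z == reshape(W' * reshape(Y', :, 1), 3, 3)'   # should evaluate to true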
Consider implementing the convolution by multiplying matrices. Given an input vector $\mathbf{x}$ and a weight matrix $\mathbf{W}$, the forward propagation function of the convolution can be implemented by multiplying its input with the weight matrix and outputting a vector $\mathbf{y}=\mathbf{W}\mathbf{x}$. Since backpropagation follows the chain rule and $\nabla_{\mathbf{x}}\mathbf{y}=\mathbf{W}^\top$, the backpropagation function of the convolution can be implemented by multiplying its input with the transposed weight matrix $\mathbf{W}^\top$. Therefore, the transposed convolutional layer can just exchange the forward propagation function and the backpropagation function of the convolutional layer: its forward propagation and backpropagation functions multiply their input vector with $\mathbf{W}^\top$ and $\mathbf{W}$, respectively.
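To make this exchange concrete, here is a small sketch (assuming the W from kernel2matrix above and the gradient function that Flux re-exports): backpropagation through $\mathbf{y}=\mathbf{W}\mathbf{x}$ multiplies the upstream gradient by $\mathbf{W}^\top$, which is exactly the forward computation of the transposed convolution.

x = collect(1.0:9.0)                  # row-major vectorization of the 3×3 input
dy = ones(4)                          # upstream gradient w.r.t. the 2×2 output
dx = gradient(x -> sum(W * x), x)[1]  # backprop through the convolution
dx ≈ W' * dy                          # same as multiplying by the transpose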
Summary
In contrast to the regular convolution that reduces input elements via the kernel, the transposed convolution broadcasts input elements via the kernel, thereby producing an output that is larger than the input.
If we feed $\mathsf{X}$ into a convolutional layer $f$ to output $\mathsf{Y}=f(\mathsf{X})$ and create a transposed convolutional layer $g$ with the same hyperparameters as $f$ except for the number of output channels being the number of channels in $\mathsf{X}$, then $g(\mathsf{Y})$ will have the same shape as $\mathsf{X}$.
We can implement convolutions using matrix multiplications. The transposed convolutional layer can just exchange the forward propagation function and the backpropagation function of the convolutional layer.