Image Augmentation

In :numref:sec_alexnet, we mentioned that large datasets are a prerequisite for the success of deep neural networks in various applications. Image augmentation generates similar but distinct training examples by applying a series of random changes to the training images, thereby expanding the size of the training set. Image augmentation can also be motivated by the fact that random tweaks of training examples make models rely less on particular attributes, thereby improving their generalization ability. For example, we can crop an image in different ways so that the object of interest appears in different positions, reducing the model's dependence on the position of the object. We can also adjust factors such as brightness and color to reduce a model's sensitivity to color. Image augmentation was arguably indispensable for the success of AlexNet at the time. In this section we discuss this widely used technique in computer vision.

julia
using Pkg
Pkg.activate("../../d2lai")
using DataAugmentation, Images
using d2lai, Flux, MLDatasets
using MPI, NCCL, CUDA
using Random
using Flux.Optimisers
using Flux.Zygote
using Statistics
using Plots
  Activating project at `~/d2l-julia/d2lai`

Common Image Augmentation Methods

In our investigation of common image augmentation methods, we will use the following 400×500 image as an example.

julia
img = load("../img/cat1.jpg")

Most image augmentation methods have a certain degree of randomness. To make it easier for us to observe the effect of image augmentation, next we define an auxiliary function apply_. This function runs the image augmentation method aug multiple times on the input image img and shows all the results in a grid.

julia
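function apply_(img, aug; nrows = 2, ncols = 4, scale = 1.5)
    # Wrap the raw image in a DataAugmentation item so transforms can be applied
    item = Image(img)
    # Each call to apply draws fresh random parameters for the transform
    Y = map(1:(nrows*ncols)) do _ 
        apply(aug, item) |> itemdata
    end
    # Arrange the augmented copies in a grid with nrows rows
    mosaicview(Y...; nrow = nrows)
end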
apply_ (generic function with 1 method)

Flipping and Cropping

Flipping the image left and right usually does not change the category of the object. This is one of the earliest and most widely used methods of image augmentation. Next, we wrap the FlipX transform from DataAugmentation.jl in Maybe, which applies the flip with a 50% chance by default.

julia
aug = DataAugmentation.compose(Maybe(FlipX{2}()))
apply_(img, aug)
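
To make the operation concrete, here is a minimal sketch of a random left-right flip on a plain array, without DataAugmentation.jl. The helper name maybe_flip_lr is hypothetical, and the sketch assumes the array is stored as height × width, so reversing the second axis mirrors the image horizontally.

```julia
using Random

# Hypothetical helper (not part of DataAugmentation.jl): reverse the columns
# of a height × width array with probability p, otherwise return a copy.
function maybe_flip_lr(rng::AbstractRNG, A::AbstractMatrix; p = 0.5)
    rand(rng) < p ? reverse(A; dims = 2) : copy(A)
end

rng = MersenneTwister(0)
A = [1 2 3;
     4 5 6]

flipped = reverse(A; dims = 2)            # deterministic left-right flip
samples = [maybe_flip_lr(rng, A) for _ in 1:1000]
frac_flipped = count(==(flipped), samples) / length(samples)

println(flipped)
println(frac_flipped)                     # close to 0.5 for p = 0.5
```

Over many draws roughly half of the samples come back mirrored, which is the behavior the grid above illustrates on the cat image.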

Flipping up and down is not as common as flipping left and right. But at least for this example image, flipping up and down does not hinder recognition. Next, we wrap FlipY in Maybe to flip an image up and down with a 50% chance.

julia
aug = Maybe(FlipY{2}())
apply_(img, aug)

In the example image we used, the cat is in the middle of the image, but this may not be the case in general. In :numref:sec_pooling, we explained that the pooling layer can reduce the sensitivity of a convolutional layer to the target position. In addition, we can also randomly crop the image to make objects appear in different positions in the image at different scales, which can also reduce the sensitivity of a model to the target position.

In the code below, we randomly crop a 200×200 region from the image each time, so the position of the window changes between draws while the scale of the content stays the same. A common variant, random resized cropping, instead samples a region covering 10%∼100% of the original area with a width-to-height ratio drawn from 0.5∼2, and then rescales the region to 200×200 pixels, which varies the scale as well as the position. Unless otherwise specified, the random number between a and b in this section refers to a continuous value obtained by random and uniform sampling from the interval [a,b].

julia
aug = RandomCrop((200,200))
apply_(img, aug)
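
Cropping a region with a random area and aspect ratio and then rescaling it to a fixed size can be sketched in plain Julia as follows. The helper sample_crop, its keyword names, and the nearest-neighbor resize are illustrative assumptions, not part of DataAugmentation.jl; the random array stands in for a 400×500 grayscale image.

```julia
using Random

# Hypothetical helper: sample a crop window whose area is a uniform fraction
# of the image area (scale) and whose width/height ratio is uniform in ratio.
# Falls back to the full image if no sampled window fits.
function sample_crop(rng::AbstractRNG, H::Int, W::Int;
                     scale = (0.1, 1.0), ratio = (0.5, 2.0), attempts = 10)
    for _ in 1:attempts
        area = H * W * (scale[1] + rand(rng) * (scale[2] - scale[1]))
        r = ratio[1] + rand(rng) * (ratio[2] - ratio[1])    # width / height
        w = round(Int, sqrt(area * r))
        h = round(Int, sqrt(area / r))
        if 1 <= h <= H && 1 <= w <= W
            top = rand(rng, 1:(H - h + 1))
            left = rand(rng, 1:(W - w + 1))
            return (top = top, left = left, h = h, w = w)
        end
    end
    return (top = 1, left = 1, h = H, w = W)
end

rng = MersenneTwister(1)
fake_img = rand(rng, Float32, 400, 500)   # stand-in for a 400×500 image
win = sample_crop(rng, size(fake_img)...)

# Nearest-neighbor rescaling of the window to 200×200 pixels
rows = win.top .+ round.(Int, range(0, win.h - 1; length = 200))
cols = win.left .+ round.(Int, range(0, win.w - 1; length = 200))
resized = fake_img[rows, cols]
println(size(resized))
```

Because both the window size and its offset are random, the same object can end up at different positions and scales in the 200×200 output.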

Changing Colors

Another augmentation method is changing colors. We can change four aspects of the image color: brightness, contrast, saturation, and hue. In the example below, AdjustBrightness(0.5) randomly scales the brightness of the image to between 50% (1−0.5) and 150% (1+0.5) of the original, and AdjustContrast(0.5) varies the contrast by a factor drawn from the same range.

julia
aug = DataAugmentation.compose(AdjustBrightness(0.5), AdjustContrast(0.5))
apply_(img, aug)
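
The arithmetic behind these two adjustments can be sketched on a small array of intensities in [0, 1]. This is one common formulation, written here as an assumption rather than the library's exact implementation: brightness multiplies every value by a factor, and contrast scales each value's distance from the mean. The helper names are hypothetical.

```julia
using Random, Statistics

# Hypothetical helpers operating on arrays with values in [0, 1].
random_factor(rng, δ) = 1 - δ + 2δ * rand(rng)     # uniform in [1-δ, 1+δ]

# Brightness: scale every value, then clip back into [0, 1]
adjust_brightness(x, f) = clamp.(x .* f, 0, 1)

# Contrast: scale the deviation from the mean intensity
function adjust_contrast(x, f)
    μ = mean(x)
    clamp.(μ .+ f .* (x .- μ), 0, 1)
end

rng = MersenneTwister(2)
x = Float32[0.2 0.4; 0.6 0.8]

f = random_factor(rng, 0.5)                # a random factor in [0.5, 1.5]
darker = adjust_brightness(x, 0.5f0)       # fixed factor for a predictable result
flatter = adjust_contrast(x, 0.5f0)

println(f)
println(darker)
println(flatter)
```

With a factor of 0.5, the brightness adjustment halves every value, while the contrast adjustment pulls values toward the mean of 0.5 and leaves the mean itself unchanged.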

Training with Image Augmentation

Let's train a model with image augmentation. Here we use the CIFAR-10 dataset instead of the Fashion-MNIST dataset that we used before. This is because the position and size of the objects in the Fashion-MNIST dataset have been normalized, while the color and size of the objects in the CIFAR-10 dataset have more significant differences. The first 32 training images in the CIFAR-10 dataset are shown below.

To obtain deterministic results during prediction, we usually apply image augmentation only to training examples and do not use augmentation with random operations during prediction. Here we only use the simplest random left-right flipping method. In addition, we use ImageToTensor to convert the images into the format required by the deep learning framework, i.e., 32-bit floating point numbers between 0 and 1; the constructor below then permutes the result into the (width, height, number of channels, batch size) layout that Flux expects.

julia

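struct CIFAR10Data{T,V,L,A} <: d2lai.AbstractData 
    train::T
    val::V
    labels::L
    args::A
    function CIFAR10Data(; batchsize = 64, flatten = false, aug = nothing)
        dataset = MLDatasets.CIFAR10
        t = dataset(:train)[:]
        v = dataset(:test)[:]

        l = dataset().metadata["class_names"]
        train, val = map((t, v)) do x
            # Move channels first so the array can be viewed as a stack of RGB images
            item = colorview(RGB, permutedims(x.features[:,:,:,:], (3, 1, 2, 4)))
            # Note: the whole split is wrapped as a single 3-D item, so a random
            # transform such as Maybe draws once per split, not once per image,
            # and the same aug is applied to both the training and test splits.
            item = Image(item)
            data = itemdata(apply(aug, item))
            # ImageToTensor puts channels last; restore the (W, H, C, N) layout
            data = permutedims(data, (1, 2, 4, 3)) |> collect
            (features = data, targets = x.targets)
        end
        args = (batchsize = batchsize, flatten = flatten)
        new{typeof(train), typeof(val), typeof(l), typeof(args)}(train, val, l, args)
    end
end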
julia
aug = DataAugmentation.compose(
    Maybe(FlipX{3}()),
    ImageToTensor()
)
data = CIFAR10Data(; aug)
Data object of type CIFAR10Data
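
Random augmentation is usually drawn independently for every training example. A minimal sketch of per-example flipping on an array in (width, height, channels, batch) layout is shown below. The helper flip_batch is hypothetical, and treating the first axis as the left-right direction is an assumption about how the images are stored.

```julia
using Random

# Hypothetical helper: flip each image of a (W, H, C, N) array independently
# with probability p by reversing its first axis.
function flip_batch(rng::AbstractRNG, X::AbstractArray{T,4}; p = 0.5) where {T}
    Y = copy(X)
    for n in axes(X, 4)
        if rand(rng) < p
            Y[:, :, :, n] = reverse(view(X, :, :, :, n); dims = 1)
        end
    end
    return Y
end

rng = MersenneTwister(3)
X = rand(rng, Float32, 32, 32, 3, 8)      # stand-in for a CIFAR-10 minibatch
Y = flip_batch(rng, X)

was_flipped = [Y[:, :, :, n] != X[:, :, :, n] for n in axes(X, 4)]
println(size(Y))
println(count(was_flipped), " of ", length(was_flipped), " images flipped")
```

Calling such a helper on each training minibatch, and never on validation data, matches the usual practice described above: random operations during training, deterministic inputs during prediction.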

Multi-GPU Training

We train the ResNet-18 model from :numref:sec_resnet on the CIFAR-10 dataset. Recall the introduction to multi-GPU training in :numref:sec_multi_gpu_concise. In the following, we define a function to train and evaluate the model using multiple GPUs.

julia
arch = ((2, 64), (2, 128), (2, 256), (2, 512))
model = d2lai.ResNet(arch, 10, (32, 32, 3)) |> gpu ;
julia
CUDA.allowscalar(false)
julia

function train_ch13(model, train_iter, test_iter, trainer;  num_epochs = 100, batchsize = 256, verbose = true)
  # Set up the NCCL backend used for communication between processes
  DistributedUtils.initialize(NCCLBackend)
  backend = DistributedUtils.get_distributed_backend(NCCLBackend)
  rank = DistributedUtils.local_rank(backend)
  # Broadcast the parameters from the root process so all replicas start identical
  model = DistributedUtils.synchronize!!(backend, DistributedUtils.FluxDistributedModel(model); root=0) 
  # Wrap the optimizer so that gradients are averaged across processes before each update
  opt = DistributedUtils.DistributedOptimizer(backend, trainer.opt)
  st_opt = Optimisers.setup(opt, model)
  st_opt = DistributedUtils.synchronize!!(backend, st_opt; root=0) 
  train_data = DistributedUtils.DistributedDataContainer(
            backend, train_iter
        )

  train_loader = Flux.DataLoader(train_data, batchsize = batchsize, shuffle = true)

  val_data = DistributedUtils.DistributedDataContainer(
            backend, test_iter
        )

  val_loader = Flux.DataLoader(val_data, batchsize = batchsize)

  for i in 1:num_epochs 
    losses = (train_losses = [], val_losses = [], val_acc = [])
    for batch in train_loader 
      l, back = Zygote.pullback(d2lai.training_step, model, batch)
      g = back(one(l))[1]
      st_opt, model = Optimisers.update(st_opt, model, g)
      push!(losses.train_losses, d2lai.training_step(model, batch))
    end
    for batch in val_loader 
      val_loss, val_acc = d2lai.validation_step(model, batch)
      push!(losses.val_losses, val_loss)
      push!(losses.val_acc, val_acc)
    end
    verbose && @info "Epoch: $i Training Loss: $(mean(losses.train_losses)) Val Loss: $(mean(losses.val_losses)) Val Acc: $(mean(losses.val_acc))" 

    d2lai.draw_metrics(model, i, trainer, losses)
  end
  verbose && Plots.display(trainer.board.plt)
end
train_ch13 (generic function with 1 method)
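
The idea behind data-parallel training is that each process computes gradients on its own shard of the minibatch and the gradients are then averaged. The sketch below checks this on a tiny linear regression problem in plain Julia, with no GPUs or Flux involved. The two "workers" are simulated by index ranges, and all names are illustrative assumptions.

```julia
using Random, Statistics

# Gradient of the mean squared error mean((X'w - y).^2) with respect to w
grad(w, X, y) = 2 .* X * (X' * w .- y) ./ length(y)

rng = MersenneTwister(4)
X = randn(rng, 3, 8)        # 3 features, 8 examples (one per column)
y = randn(rng, 8)
w = randn(rng, 3)

# Split the minibatch evenly across two hypothetical workers
shards = [1:4, 5:8]
local_grads = [grad(w, X[:, idx], y[idx]) for idx in shards]

# Averaging the per-worker gradients plays the role of the all-reduce step
g_avg = mean(local_grads)
g_full = grad(w, X, y)

println(maximum(abs.(g_avg .- g_full)))   # tiny, up to floating point error
```

With equally sized shards and a loss that averages over examples, the averaged gradient matches the full-batch gradient, so every replica applies the same update and the model copies stay in sync.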
julia
trainer = Trainer(model, nothing, Optimisers.Adam(0.01));
julia
function load_cifar10(aug, is_train)
  data = CIFAR10Data(; aug)
  is_train ? data.train : data.val 
end
load_cifar10 (generic function with 1 method)
julia
train_iter = load_cifar10(aug, true) |> gpu
test_iter = load_cifar10(aug, false) |> gpu

train_ch13(model, train_iter, test_iter, trainer; num_epochs= 20)
[Info] Epoch: 1 Training Loss: 4.067707 Val Loss: 2.4243784 Val Acc: 0.28056640625
[Info] Epoch: 2 Training Loss: 3.026581 Val Loss: 1.7421306 Val Acc: 0.3640625
[Info] Epoch: 3 Training Loss: 1.95483 Val Loss: 1.4842434 Val Acc: 0.44404296875
[Info] Epoch: 4 Training Loss: 1.4267793 Val Loss: 1.4293945 Val Acc: 0.4763671875
[Info] Epoch: 5 Training Loss: 1.3861558 Val Loss: 1.475838 Val Acc: 0.45078125
[Info] Epoch: 6 Training Loss: 1.2854872 Val Loss: 1.3778646 Val Acc: 0.50888671875
[Info] Epoch: 7 Training Loss: 1.1668421 Val Loss: 1.2478685 Val Acc: 0.56455078125
[Info] Epoch: 8 Training Loss: 1.0165958 Val Loss: 1.1656833 Val Acc: 0.583203125
[Info] Epoch: 9 Training Loss: 0.93660057 Val Loss: 1.0494362 Val Acc: 0.65009765625
[Info] Epoch: 10 Training Loss: 0.85998106 Val Loss: 1.094136 Val Acc: 0.63466796875
[Info] Epoch: 11 Training Loss: 0.76146543 Val Loss: 1.2776625 Val Acc: 0.5943359375
[Info] Epoch: 12 Training Loss: 0.7236295 Val Loss: 1.0485957 Val Acc: 0.64833984375
[Info] Epoch: 13 Training Loss: 0.6517101 Val Loss: 1.0409178 Val Acc: 0.6763671875
[Info] Epoch: 14 Training Loss: 0.5535658 Val Loss: 0.93536675 Val Acc: 0.71259765625
[Info] Epoch: 15 Training Loss: 0.49900094 Val Loss: 0.9871677 Val Acc: 0.7025390625
[Info] Epoch: 16 Training Loss: 0.46305642 Val Loss: 1.2034806 Val Acc: 0.6869140625
[Info] Epoch: 17 Training Loss: 0.43134528 Val Loss: 1.1011393 Val Acc: 0.7064453125
[Info] Epoch: 18 Training Loss: 0.40690297 Val Loss: 1.0475357 Val Acc: 0.72294921875
[Info] Epoch: 19 Training Loss: 0.3579023 Val Loss: 1.1345057 Val Acc: 0.7115234375
[Info] Epoch: 20 Training Loss: 0.26337913 Val Loss: 1.1019546 Val Acc: 0.72138671875